Where do statistics fit into the world of analysis? Well, analysis allows us to make objective arguments to discover the truth based on actual observations – that would be data – as opposed to subjective claims based purely on intuition or potentially biased points of view.
But observed data still comes with an inherent degree of uncertainty. Where math tends to be “black and white” precise, statistics deals in the realm of “gray.” Statistics allows us to factor in that uncertainty so we can draw meaningful conclusions with confidence. In this blog post, we will take a look at the fundamental concepts and tools of statistics.
Our analytical conclusions are typically based on a subset of data, known as a sample, that we use to better understand an entire population. The patterns in those samples allow us to estimate things about the overall population. Understanding those patterns, the limitations of samples, and how to draw defensible conclusions is at the heart of numeracy. Pure math certainly is important as well, but as analysts, we’ll be dealing in the “gray” of statistics as often as not, so it’s essential to know the difference.
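As a rough illustration of the sample-versus-population idea, the sketch below simulates a population and compares its mean with the mean of a small random sample. The population (10,000 order values centered on 50) is invented for the example, and the fixed seed is only there to make the run repeatable.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: 10,000 simulated order values (mean 50, spread 10)
population = [random.gauss(50, 10) for _ in range(10_000)]

# A sample is a subset we can actually observe; here, 100 random members
sample = random.sample(population, 100)

population_mean = mean(population)
sample_mean = mean(sample)

# The sample mean is an estimate: close to the population mean, rarely identical
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The gap between the two printed values is the kind of uncertainty that the rest of these concepts help us quantify.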
Statistics consists of a set of tools and techniques that help us to identify and interpret patterns. Deborah Rumsey is a useful guide to understanding them. Her book “Statistics for Dummies” offers a helpful introduction to the fundamental concepts. In fact, the organization of the book provides a good way to ground ourselves in the basics.
The first idea is about “discovering the middle” of a population. It’s the realm of averages – means, medians, and modes – and is typically the starting point for much of our analysis.
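To make the three kinds of average concrete, here is a minimal sketch using Python’s standard `statistics` module. The ticket counts are made-up data, chosen so that one unusually large value shows how the mean and median can disagree.

```python
from statistics import mean, median, mode

# Hypothetical sample: support tickets closed per day over one week
tickets = [3, 5, 5, 6, 7, 8, 30]

center_mean = mean(tickets)      # arithmetic average, pulled upward by the 30
center_median = median(tickets)  # middle value of the sorted data: 6
center_mode = mode(tickets)      # most frequent value: 5

print(f"mean={center_mean:.2f} median={center_median} mode={center_mode}")
```

Because a single extreme day drags the mean above most of the observations, the median is often the safer description of “the middle” when the data is skewed.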
Variation and distribution
Though most populations have a concentration in the middle, typically there’s variation to both sides. Often that variation is regular and predictable. In fact, the variation in many populations falls into a pattern called the normal distribution, which allows us to estimate the probability that a specific observation falls within a given distance of the middle.
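As a sketch of how the normal distribution turns distance from the middle into a probability, the example below uses `NormalDist` from the standard library. The mean of 170 and standard deviation of 10 are assumed values for an imaginary population of heights in centimeters.

```python
from statistics import NormalDist

# Hypothetical population: heights with mean 170 cm and standard deviation 10 cm
heights = NormalDist(mu=170, sigma=10)

# Probability that an observation lies within one standard deviation of the mean
within_one_sd = heights.cdf(180) - heights.cdf(160)

# Probability that it lies within two standard deviations
within_two_sd = heights.cdf(190) - heights.cdf(150)

# Probability of an observation more than two standard deviations above the mean
above_190 = 1 - heights.cdf(190)

print(f"within 1 sd: {within_one_sd:.3f}")
print(f"within 2 sd: {within_two_sd:.3f}")
print(f"above 190:   {above_190:.3f}")
```

For any normal distribution, roughly 68% of observations fall within one standard deviation of the mean and roughly 95% within two, whatever the actual mean and spread happen to be.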
Rank and relative standing
This area provides one of the most straightforward and easy to comprehend approaches to analyzing a population. Once you’ve put all the members of a population in order, you can then divide them into equal-sized groups: quartiles for four groups, deciles for 10 groups, percentiles for 100 groups and so on. The distribution of these groups, the boundaries between them, and where the middle falls can reveal some important things with minimal work.
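The boundaries between such groups can be computed directly. The sketch below uses `statistics.quantiles` on a small, invented list of test scores; note that different tools interpolate the cut points slightly differently, so exact boundary values may vary between libraries.

```python
from statistics import median, quantiles

# Hypothetical sample: twelve test scores
scores = [52, 58, 61, 64, 67, 70, 73, 75, 78, 81, 85, 92]

# Dividing into 4 equal-sized groups needs 3 cut points (the quartile boundaries)
quartile_cuts = quantiles(scores, n=4)

# Dividing into 10 groups needs 9 cut points (the decile boundaries)
decile_cuts = quantiles(scores, n=10)

# The middle quartile boundary is the median of the data
middle = median(scores)

print("quartile boundaries:", quartile_cuts)
print("decile boundaries:  ", decile_cuts)
print("median:             ", middle)
```

Comparing how far apart the boundaries sit gives a quick read on where the data is bunched together and where it is spread out.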
Hypothesis tests and confidence intervals
Once we’ve done our preliminary analysis, it’s time to draw conclusions. Hypothesis tests provide a quantitative basis for judging whether the data supports our claims. And since this is an uncertain business to begin with, we use “confidence intervals” to state a range of plausible values for an estimate at a chosen level of confidence, such as 95%.
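A minimal sketch of both ideas follows, using only the standard library. It assumes a normal approximation (a z-test), which is a simplification; with a sample this small, a t-based method would be the more usual choice. The processing times and the claimed mean of 12.0 minutes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample: 30 order-processing times in minutes
sample = [12.1, 13.4, 11.8, 12.9, 14.2, 13.1, 12.7, 13.8, 12.3, 13.5,
          11.9, 13.0, 12.8, 14.0, 13.3, 12.6, 13.7, 12.2, 13.9, 12.5,
          13.2, 12.4, 14.1, 13.6, 12.0, 13.0, 12.9, 13.4, 12.7, 13.1]

claimed_mean = 12.0  # hypothetical claim to test (the null hypothesis)

n = len(sample)
sample_mean = mean(sample)
standard_error = stdev(sample) / sqrt(n)  # uncertainty of the sample mean

# 95% confidence interval for the population mean (normal approximation)
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
ci_low = sample_mean - z_crit * standard_error
ci_high = sample_mean + z_crit * standard_error

# Two-sided z-test: how surprising is this sample if the claim were true?
z_stat = (sample_mean - claimed_mean) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"sample mean: {sample_mean:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
```

The two results tell the same story from different angles: a p-value below 0.05 corresponds to the claimed mean falling outside the 95% confidence interval.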
Among the important patterns in a set of data is correlation, or association, between different types of data. Some observations, such as height and weight, may have a very strong correlation. This presents opportunities to use the value of certain observations to predict the value of others. That does not mean one directly causes the other, but the relationship may still be helpful in making predictions.
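Taking the height and weight example, the sketch below computes a Pearson correlation coefficient and fits a simple least-squares line for prediction. The measurements are invented, and `pearson_r` and `fit_line` are helper names introduced here for illustration only.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired observations for seven people
heights_cm = [150, 160, 165, 170, 175, 180, 190]
weights_kg = [52, 58, 63, 68, 72, 77, 88]


def pearson_r(xs, ys):
    """Pearson correlation: +1 or -1 is a perfect linear relationship, 0 is none."""
    mx, my = mean(xs), mean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


r = pearson_r(heights_cm, weights_kg)
slope, intercept = fit_line(heights_cm, weights_kg)

# Use the fitted line to predict the weight of someone 172 cm tall
predicted_weight = slope * 172 + intercept

print(f"correlation r = {r:.3f}")
print(f"predicted weight at 172 cm: {predicted_weight:.1f} kg")
```

A coefficient near 1 says the two measurements move together closely in this sample; it says nothing about whether one causes the other.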
These concepts form the foundation for understanding the patterns within a data set that represents a population. In the next Practical Analysis post, we will look at some of the tools that an analyst can use to apply these concepts to confidently draw conclusions.