How to Separate Noise from Meaning in Big Data

by | Nov 17, 2017 | General BI


When we rely on data to make decisions, how do we tell what is a meaningful signal and what is merely noise? Data is neither, in and of itself, as Stephen Few reminds us in his latest book: “Signal: Understanding What Matters in a World of Noise.”


Few has written a series of books about harnessing visualization to aid in analysis. In “Signal,” he takes a broader look at analysis, focusing on the idea of “sensemaking” – that is, deriving meaning from data that can be used to empower decision makers. This is especially relevant for data sets that are large and unfamiliar. Here’s how you can apply some of these techniques to help you understand a set of data and what it might be telling you.


The word “signal” is a metaphor for the patterns and meaning that are hiding in data. In electronics, signals must be separated from noise to be useful. In the age of big, and ever-growing, data, more data means more noise and bigger challenges in isolating the signals.

Go back to basics

Few is one of the experts I introduced earlier in this Practical Analysis series.  In “Signal,” he suggests a somewhat back-to-basics approach, emphasizing techniques that proved effective in the days of smaller data. He argues that these are just as relevant to big data due to their potential to amplify signals. The book covers many of the same techniques I discussed in my posts on tools and concepts, though with a greater emphasis on using visualization to both explore and explain.

Explore the data

One important premise of the book is that you need to do a certain amount of analysis just to understand the data to begin with. This is known as exploratory data analysis, and the author likens it to an explorer getting his bearings in a newly discovered land. “When we survey the land, we begin to understand its norms…. This sense of normality – what’s routine – can then serve as a backdrop against which signals – often departures from the norm – will stand out.” Few writes that he usually begins his statistical journeys by examining variation within important categories. For example, a list of products does not tell us much until we add numbers showing which items generate the most sales revenue.
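As a rough illustration of that first step, here is a minimal Python sketch that totals revenue per product and ranks the results. The product names and figures are invented for illustration, and `revenue_by_product` is a hypothetical helper, not something from the book.

```python
from collections import defaultdict

# Hypothetical sales records as (product, revenue) pairs.
# These values are made up purely for illustration.
sales = [
    ("Widget", 1200.0),
    ("Gadget", 300.0),
    ("Widget", 800.0),
    ("Gizmo", 150.0),
    ("Gadget", 450.0),
]


def revenue_by_product(records):
    """Total revenue per product, sorted from highest to lowest."""
    totals = defaultdict(float)
    for product, revenue in records:
        totals[product] += revenue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = revenue_by_product(sales)
grand_total = sum(total for _, total in ranked)

# Print each product with its total revenue and its share of all revenue.
for product, total in ranked:
    share = total / grand_total
    print(f"{product:<8} {total:>10.2f} {share:>7.1%}")
```

Ranking the categories and showing each one's share of the total is what turns a flat product list into something you can compare, which is the kind of variation within a category the paragraph above describes.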

Celebrate 3S’s

While big data is usually associated with the 3Vs – volume, velocity, and variety – Few emphasizes the virtues of the 3S’s – small, slow, and sure. He argues that only a small amount of data will ever function as signals and that, while data now comes in many varieties, only a few of these are sure. Few also urges analysts to work slowly and deliberately. He writes, “We must take our time to understand information and act upon it wisely. Speed will, in most cases, lead to mistakes.”

Statistical Process Control

Few’s book introduces readers to the useful practices of statistical process control (SPC), a bit of a departure from pure statistics. SPC helps analysts separate routine variation from exceptional variation (also known as common cause and special cause variation). Not all outliers are signals, and some of the “noise” in big data shows up as ordinary variation. SPC is an easy-to-apply tool for discerning signals within variation over time. Here is an example that shows the results of an initiative to reduce hospital mortality in England.
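To make the routine-versus-exceptional distinction concrete, here is a minimal Python sketch of one common SPC tool, the XmR (individuals and moving range) chart. The post does not say which chart type the book's example uses, so treat the choice of XmR as an assumption. The monthly figures are invented and are not the hospital data mentioned above, and `xmr_limits` and `exceptional_points` are hypothetical helper names.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is the standard XmR scaling constant (3 / 1.128).
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def exceptional_points(values, baseline):
    """Return (index, value) pairs in `values` outside the baseline's limits."""
    _, lower, upper = xmr_limits(baseline)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly counts, invented for illustration only.
baseline = [50, 52, 49, 51, 50, 48, 52, 50]
recent = [51, 49, 62, 50]

center, lower, upper = xmr_limits(baseline)
print(f"center={center:.2f} lower={lower:.2f} upper={upper:.2f}")
print(exceptional_points(recent, baseline))
```

Points inside the limits are treated as routine variation, and points outside them are candidates for exceptional variation worth investigating. This sketch applies only the simplest detection rule (a single point beyond the limits). Fuller SPC practice also looks at runs and trends.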

Stewardship

The book ends with a bit of a pep talk emphasizing the responsibility of analysts as stewards of not just the data, but the truth. Those who have the skills and knowledge to effectively organize, analyze, and interpret data have a great responsibility to seek and defend the truth to ensure the best possible use of data toward better informed decisions. It actually makes you want to be an analyst!

“Signal” covers many ways to explore data sets and discover interesting patterns, and its practical examples shed light on how you would do the same with your own data. Overall, the book gives readers a comprehensive understanding of data and practical ways to apply it to real-world problems.

Note: This blog is part of my Practical Analysis series, in which I explore three topics integral to understanding information: analysis, interpretation, and communication. You can find my initial post here. Next, we will investigate some of the most compelling ways to present and share your analysis. Stay tuned.
