Practice making actionable insights by replicating the insights in these 3 data sources
If you are learning an analytics tool such as SQL, Python or R, you might ask yourself how to apply your knowledge to real business use cases. You might’ve also come across datasets (of which there are many sources), but wondered what to do with them.
Generating interesting questions is indeed hard. Data scientists, analysts and programmers know the solutions, but they often don’t know which problems need solving! Once you spend a year or two in a single industry (such as healthcare, ecommerce or mining), you might get a feel for which questions are interesting.
Your aim should be to learn how to ask good questions. To practice how to make good insights, I recommend replicating insights that other people have made.
To replicate an insight, you want to have both the story and the data. When you read the story, ask yourself questions like:
Always try checking the conclusion first. Checking a conclusion is harder than checking a graph or a single number, because you need to think about which numbers support the conclusion.
Any story that is supported by data would work. Here are three recommended sources where the underlying data is (usually) available.
FiveThirtyEight has stories about politics and sports and sometimes they publish data that backs up those stories. Pick any story, download the data, and start checking the insights.
For example, in this story about non-voters, the heading of a graph is ‘Those who almost always vote and those who sometimes vote aren’t that different’. Before checking the graph, think about how you would answer this question from the data. Try plotting some answers that would describe the different voter groups. And after you have tried that, check the graphs they have made, and try recreating those same graphs from the data.
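As a first pass at "are these voter groups different?", one option is to compare how each group answered a single survey question. The sketch below uses only the Python standard library and a handful of rows invented for illustration; the column names `voter_category` and `trust_in_gov` are hypothetical and will not match the published file, so swap in the real columns once you have downloaded the data.

```python
from collections import Counter, defaultdict

# Toy rows invented for illustration; these are NOT the real survey responses.
# "voter_category" and "trust_in_gov" are hypothetical column names.
rows = [
    {"voter_category": "always", "trust_in_gov": "low"},
    {"voter_category": "always", "trust_in_gov": "high"},
    {"voter_category": "always", "trust_in_gov": "low"},
    {"voter_category": "always", "trust_in_gov": "high"},
    {"voter_category": "sporadic", "trust_in_gov": "low"},
    {"voter_category": "sporadic", "trust_in_gov": "high"},
    {"voter_category": "sporadic", "trust_in_gov": "low"},
    {"voter_category": "rarely", "trust_in_gov": "low"},
    {"voter_category": "rarely", "trust_in_gov": "low"},
    {"voter_category": "rarely", "trust_in_gov": "high"},
    {"voter_category": "rarely", "trust_in_gov": "low"},
]


def answer_shares(rows, group_col, answer_col):
    """Return, for each group, the share of rows giving each answer."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_col]][row[answer_col]] += 1

    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {answer: n / total for answer, n in counter.items()}
    return shares


shares = answer_shares(rows, "voter_category", "trust_in_gov")

# Print one line per group so the distributions can be compared side by side.
for group, dist in sorted(shares.items()):
    print(group, {answer: round(share, 2) for answer, share in sorted(dist.items())})
```

If the shares look alike across two groups for most questions, that supports the heading; if they diverge on some questions, you have found something worth plotting.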
Similar to FiveThirtyEight, BuzzFeedNews has mostly political and sports content.
In this story about swimmers, you will come across the sentence ‘And how near is she [Ledecky] to closing the gender gap in athletic performance?’. Try answering that question by poking at the available data. The real world won’t be much different: a bunch of datasets that might or might not contain an answer to somebody’s question. This exercise lets you practice on questions that journalists found interesting!
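One way to turn "how near is she to closing the gap" into a number is to express a swimmer's time as a percentage above a reference time and watch how that percentage changes across years. The sketch below is only an illustration: the times are placeholders I made up, not real race results, and `percent_gap` is a hypothetical helper name.

```python
def percent_gap(time_seconds, reference_seconds):
    """Percent by which a time is slower (positive) or faster (negative) than a reference."""
    return (time_seconds - reference_seconds) / reference_seconds * 100


# Placeholder numbers invented for illustration; replace with times from the real data.
results = [
    {"year": 2012, "swimmer_time": 500.0, "reference_time": 460.0},
    {"year": 2014, "swimmer_time": 495.0, "reference_time": 462.0},
    {"year": 2016, "swimmer_time": 490.0, "reference_time": 463.0},
]

# Gap per year, keyed by year so it is easy to plot or compare.
gaps = {
    r["year"]: percent_gap(r["swimmer_time"], r["reference_time"])
    for r in results
}

for year, gap in sorted(gaps.items()):
    print(year, f"{gap:.1f}% slower than the reference")
```

Choosing the reference is part of the question: the best men's time that year, the median finalist, or a fixed record each tell a slightly different story.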
The dataisbeautiful subreddit surfaces graphs that people find beautiful and insightful.
In this post about median income, a comment mentions that the variability in income between races is explained mostly by education. How would you check that from the data?
Many posts don’t have data as nicely prepared as BuzzFeedNews and FiveThirtyEight, but don’t let that stop you. Digging out data and getting it ready for analysis is a common part of the job.
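A rough way to check a claim like this is to compare the gap between groups overall with the gap inside each education level. If the overall gap is large but the within-level gaps are small, education accounts for much of the difference. The sketch below assumes made-up records with generic group labels `A` and `B` and incomes in arbitrary units; it is a simple stratified comparison, not a regression, and `median_by` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median

# Invented records for illustration: (group, education, income). Not real data.
people = [
    ("A", "college", 80), ("A", "college", 90), ("A", "college", 85),
    ("A", "high_school", 40),
    ("B", "college", 82),
    ("B", "high_school", 42), ("B", "high_school", 38), ("B", "high_school", 41),
]


def median_by(rows, key):
    """Median income for each bucket, where key(row) decides the bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row)].append(row[2])
    return {bucket: median(incomes) for bucket, incomes in buckets.items()}


# Gap between groups when education is ignored.
overall = median_by(people, key=lambda r: r[0])
overall_gap = overall["A"] - overall["B"]

# Gap between groups inside each education level.
within = median_by(people, key=lambda r: (r[0], r[1]))
within_gaps = {
    level: within[("A", level)] - within[("B", level)]
    for level in ("college", "high_school")
}

print("overall gap:", overall_gap)
print("gap within each education level:", within_gaps)
```

In this toy data the overall gap is large while the within-level gaps nearly vanish, which is the pattern the comment describes. With real data, also check how many records fall in each bucket before trusting a median.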
As a bonus, also take a look at r/dataisugly, and think about how the ugly graphs could be improved.
Lastly, whenever you encounter conclusions that are supported by data, try to check those conclusions yourself. You might be surprised at the conclusions you arrive at by taking a more detailed look.