The DataHero Blog

5 Beginner Steps to Investigating Your Dataset

November 20th, 2013


Data analysis is a nuanced discipline, and there are enough ways to slice and dice data to make a beginner’s head spin. A common question data analysts hear is “where do I even start in my analysis?” The hints below offer a methodical way to start uncovering answers in your data.

1.) Ask the right questions

Whether it’s survey results, sales data, or an email campaign, you’ve collected data for a specific purpose. By extension, apply this purpose to the questions you’re asking of the data itself. Beginning with some specific questions can keep your research focused and allow you to see the forest for the trees. A question like “what does my revenue look like for the past 3 years” is vague: it allows for exploration, but also for confusion. Instead, something like “which channel brought in the most revenue over the past 3 years” has a clearer answer. Subsequent questions may be “which department brings in the most revenue per year” or “are sales in climbing gear increasing or decreasing this year?” Having a specific question in mind when you begin gives your analysis some structure and helps you avoid stumbling into false positives.

 DataHero Channel Revenue by Year
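To make the channel question concrete, here is a minimal sketch in plain Python. It is not how DataHero computes anything; the rows, channel names, and the `revenue_by_channel` helper are all made up for illustration.

```python
from collections import defaultdict

# Made-up sample rows for illustration only: (year, channel, revenue)
sales = [
    (2011, "Online", 120_000),
    (2011, "Retail", 150_000),
    (2012, "Online", 180_000),
    (2012, "Retail", 140_000),
    (2013, "Online", 210_000),
    (2013, "Retail", 130_000),
]

def revenue_by_channel(rows, first_year, last_year):
    """Sum revenue per channel for years in [first_year, last_year]."""
    totals = defaultdict(int)
    for year, channel, revenue in rows:
        if first_year <= year <= last_year:
            totals[channel] += revenue
    return dict(totals)

# "Which channel brought in the most revenue over the past 3 years?"
totals = revenue_by_channel(sales, 2011, 2013)
top_channel = max(totals, key=totals.get)
print(top_channel, totals[top_channel])
```

Because the question is specific, the code has a single, checkable answer: one channel name and one total.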

2.) Analyze different subsets of data

It’s easier to spot relationships if you analyze different subsets of the data. For example, segment your revenue data by channel like the chart above, or by department. Experiment with the subsets and variables that best fit the questions you developed in the previous step. Using DataHero you can easily add or remove different variables or even filters. This design lets you stay within your train of thought and move smoothly from question to question, without tripping up on formatting or equations. It can also be helpful to use what Excel calls a pivot table. In our outdoor gear retailer example, you can switch from a quarterly view to revenue by quarter of the year just by selecting it from a drop-down menu. The graph below is then an aggregate of each quarter’s revenue between 2010 and 2013.

DataHero Quarterly Revenue by Promotion
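For readers who want to see what a pivot by quarter of the year does under the hood, the sketch below builds one by hand. The order rows, promotion names, and helper functions are hypothetical and only illustrate the idea of aggregating every Q1, Q2, and so on across all years.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample orders: (order date, promotion, revenue)
orders = [
    (date(2010, 2, 14), "Email", 500),
    (date(2011, 2, 3), "Email", 700),
    (date(2011, 8, 21), "Coupon", 400),
    (date(2012, 11, 30), "Email", 900),
    (date(2013, 8, 9), "Coupon", 600),
]

def quarter_of_year(d):
    """Return 1-4 for the calendar quarter containing date d."""
    return (d.month - 1) // 3 + 1

def pivot_by_quarter(rows):
    """Build {quarter: {promotion: total revenue}}, summed across all years."""
    table = defaultdict(lambda: defaultdict(int))
    for order_date, promotion, revenue in rows:
        table[quarter_of_year(order_date)][promotion] += revenue
    return {q: dict(cols) for q, cols in sorted(table.items())}

pivot = pivot_by_quarter(orders)
for quarter, cols in pivot.items():
    print(f"Q{quarter}", cols)
```

Swapping `promotion` for department or channel in the loop gives a different subset of the same data, which is the kind of quick re-slicing this step is about.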

3.) Explore trends

Experiment with your time variables. Look at quarter, month, or week, whichever makes sense for what you’re looking for. Sometimes what is missing is just as important as what is there. If there are holes in your data, take note. It can also be helpful to keep notes throughout your analysis as reminders of what you’d like to research or discuss with colleagues later.
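One simple way to notice holes is to list the time periods that have no data at all. The sketch below does this for monthly figures; the revenue numbers and the `missing_months` helper are invented for illustration.

```python
# Hypothetical monthly revenue keyed by (year, month); two months are absent
monthly_revenue = {
    (2013, 1): 42_000,
    (2013, 2): 39_500,
    (2013, 4): 47_250,
    (2013, 5): 51_000,
    (2013, 7): 48_300,
}

def missing_months(data):
    """Return (year, month) pairs between the first and last key with no entry."""
    first, last = min(data), max(data)
    gaps = []
    year, month = first
    while (year, month) <= last:
        if (year, month) not in data:
            gaps.append((year, month))
        # Step forward one calendar month, rolling over at December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gaps

gaps = missing_months(monthly_revenue)
print(gaps)
```

A gap may mean no sales happened, or that the data was never recorded. The code cannot tell the difference, so it is worth writing down for later discussion.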

Take a look at this quarterly analysis of revenue by department. It’s not very helpful because it’s hard to spot trends.

DataHero Quarterly Revenue by Department

This yearly line graph makes it much easier to see that Climbing is the fastest growing department and Running sales have been decreasing for the past three years.

DataHero Yearly Revenue by Department
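The same move from a quarterly to a yearly view can be sketched in a few lines of Python. The figures below are invented (they only mimic the shape of the Climbing and Running example), and `yearly_totals` and `trend` are hypothetical helpers, not DataHero features.

```python
from collections import defaultdict

# Invented quarterly figures: (year, quarter, department, revenue)
quarterly = [
    (2011, 1, "Climbing", 20), (2011, 3, "Climbing", 25),
    (2012, 1, "Climbing", 30), (2012, 3, "Climbing", 35),
    (2013, 1, "Climbing", 45), (2013, 3, "Climbing", 50),
    (2011, 1, "Running", 40), (2011, 3, "Running", 38),
    (2012, 1, "Running", 35), (2012, 3, "Running", 33),
    (2013, 1, "Running", 30), (2013, 3, "Running", 28),
]

def yearly_totals(rows):
    """Roll quarterly rows up into {department: {year: total revenue}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for year, _quarter, department, revenue in rows:
        totals[department][year] += revenue
    return {dept: dict(sorted(years.items())) for dept, years in totals.items()}

def trend(totals_by_year):
    """Label a series of yearly totals by its year-over-year direction."""
    values = list(totals_by_year.values())
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    if not changes:
        return "not enough data"
    if all(change > 0 for change in changes):
        return "increasing"
    if all(change < 0 for change in changes):
        return "decreasing"
    return "mixed"

totals = yearly_totals(quarterly)
for department, years in totals.items():
    print(department, years, trend(years))
```

Rolling up to years smooths out seasonal swings, which is why the direction of each department is easier to read than in the quarterly view.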

4.) Find your blind spots

Do you regularly bump up against a particular question that your data can’t answer? If so, you can either find a way to gather that information from your users, or at least write it on a data collection wish list for later discussion. There is a fine line between collecting as much data as you can to get answers and frustrating your users with too many questions, so weigh this consideration when deciding how much data you’d like to collect.

5.) Investigate the whys

After your daily, weekly or quarterly analysis, take your charts, notes and conclusions to the rest of the team and start piecing together as much as you can. The data can tell you what is happening, but not why. The why requires piecing together the backstory. Because so many factors play into your sales data, coming together with your team to discuss insights from your data can lead to a lot more understanding. The marketing manager may know something about the third quarter’s climbing gear sales that the business analyst doesn’t.

Data analysis is a continual process, and the best way to approach it is to try to get less and less wrong. You probably won’t ever have all the data you want or need to answer every question about your business, but you can at least push toward more answers and better decisions. This continual feedback loop (question, analyze, investigate, repeat) can be improved but will never be perfect. At DataHero we’re constantly evaluating the data we collect, what we need more of, and how to get the answers we really need for our own internal analysis. If you’d like to give your dataset a shot, sign up today and try out DataHero.

DataHero helps you unmask the answers in your data. There’s nothing to download or install. Simply create an account and connect to the data services you use every day (like Salesforce, Stripe, MailChimp, Dropbox and Box). DataHero automatically decodes your data and shows you the answers you need through dynamic visualizations.


By Kelli Simpson
