


Thinking Like a Data Scientist

Identify, brainstorm and/or uncover new variables that are better predictors of business performance

Thinking Like a Data Scientist: Part I

One question I frequently get is: "How do I become a data scientist?"  Wow, tough question.  There are several new books that outline the different skills, capabilities and technologies that a data scientist is going to need to learn and eventually master.  I've read several of these books and am impressed with the depth of the content.

Unfortunately, these books spend the vast majority of their time reviewing or teaching data science processes (such as CRISP-DM, the Cross-Industry Standard Process for Data Mining), basic and advanced statistics, and data mining and data visualization techniques and tools.

Yes, these are very important data science skills, but they are not nearly sufficient to make our data science teams effective.  The data science teams still need help from the business users - or subject matter experts (SMEs) - to understand the decisions the business is trying to make, the hypotheses that it wants to test and the predictions that it needs to produce in support of those decisions and hypotheses.  In essence, to improve the overall effectiveness of our data science teams, we need to teach the business users to think like a data scientist.

So the objective of this blog (which, if successful, will make its way into my Big Data MBA curriculum for the University of San Francisco School of Management fall semester) is to define a process that helps business users "think like a data scientist."

I am also going to test this concept and methodology at my session at EMC World, where I am presenting "Expert Guidance To Achieve Big Data Maturity" on Monday, May 4th at 4:30.  So sharpen your pencils and let's begin the exercise!

Thinking Like a Data Scientist Process
The goal of the "thinking like a data scientist" process is to identify, brainstorm and/or uncover new variables that are better predictors of business performance.  But "business performance" of what?  Our key business initiative, of course.

Step 1:  Identify Key Business Initiative.  Would you expect anything different from me than starting with what's important to the business?  So, how can you spot a key business initiative?

A key business initiative is characterized as:

  • Critical to the immediate-term performance of the organization
  • Documented (communicated either internally or publicly)
  • Cross-functional (involves more than one business function)
  • Owned/championed by a senior business executive
  • Has a measurable financial goal
  • Has a well-defined delivery timeframe (9 to 12 months)
  • Undertaken to deliver significant, compelling and/or distinguishable financial or competitive advantage

I am a big stickler about targeting business initiatives that are focused on the next 9 to 12 months.  Anything longer than 12 months can quickly degenerate into a "Battlestar Galactica" or "cure world hunger" project that may have incredible business value, but little chance of success.
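To make the checklist concrete, here is a minimal Python sketch that screens a candidate initiative against the characteristics listed above.  The `Initiative` class, its field names and the sample values are all hypothetical, invented for illustration.  I also assume that the 12-month ceiling is the hard limit, so a shorter timeframe still passes.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """Hypothetical record for screening a candidate business initiative."""
    name: str
    critical_to_near_term_performance: bool
    documented: bool
    business_functions: int            # number of business functions involved
    executive_owner: str               # empty string if nobody champions it
    has_measurable_financial_goal: bool
    delivery_months: int               # expected delivery timeframe in months
    delivers_competitive_advantage: bool


def failed_criteria(initiative: Initiative) -> list:
    """Return the checklist items the initiative fails (empty list = passes)."""
    checks = {
        "critical to immediate-term performance": initiative.critical_to_near_term_performance,
        "documented": initiative.documented,
        "cross-functional": initiative.business_functions > 1,
        "owned by a senior executive": bool(initiative.executive_owner.strip()),
        "measurable financial goal": initiative.has_measurable_financial_goal,
        # Assumption: only the 12-month ceiling is enforced.
        "delivery within 12 months": initiative.delivery_months <= 12,
        "financial or competitive advantage": initiative.delivers_competitive_advantage,
    }
    return [label for label, passed in checks.items() if not passed]


# Invented sample values, for illustration only.
merchandising = Initiative(
    name="Improve Merchandising Effectiveness",
    critical_to_near_term_performance=True,
    documented=True,
    business_functions=3,
    executive_owner="(hypothetical) senior merchandising executive",
    has_measurable_financial_goal=True,
    delivery_months=12,
    delivers_competitive_advantage=True,
)
moonshot = Initiative(
    name="Cure world hunger",
    critical_to_near_term_performance=False,
    documented=True,
    business_functions=5,
    executive_owner="",
    has_measurable_financial_goal=False,
    delivery_months=60,
    delivers_competitive_advantage=True,
)

print(failed_criteria(merchandising))
print(failed_criteria(moonshot))
```

Returning the list of failed items, rather than a simple yes or no, gives the business users something to discuss: an initiative that misses only one item may be worth rescoping instead of discarding.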

For a refresher on how to identify an organization's key business initiatives, read my blog "Big Data MBA: Reading the Annual Report for Big Data Opportunities."  That blog outlines how to leverage publicly available information (e.g., annual reports, analyst calls, executive speeches, company blogs) to uncover an organization's key business initiatives.

For purposes of this exercise, I'm going to pretend that our client is Foot Locker, and that our target business initiative is "Improve Merchandising Effectiveness" as highlighted in their annual report (see Figure 1).


Figure 1: Identifying and Understanding Organization's Key Business Initiatives

Step 2:  Identify Strategic Nouns. Strategic nouns are the key business entities that either impact or are impacted by the organization's key business initiative.  These strategic nouns are critical to our data scientist thinking process because these are the entities for which we want to uncover or gain new, actionable insights, and around which we will ultimately build our analytic profiles.  Examples of strategic nouns include customers, patients, students, employees, stores, products, medication, trucks, wind turbines, etc.

For the Foot Locker "Improve Merchandising Effectiveness" business initiative, the strategic nouns upon which we will focus are:

  • Customers
  • Products
  • Campaigns
  • Stores

Step 3:  Brainstorm Strategic Noun Questions. Probably the hardest part of this exercise - and maybe the hardest part of the whole "thinking like a data scientist" process - is to brainstorm the different questions that you want to ask in support of the targeted business initiative.  For this part of the exercise, we want the business users to brainstorm the business questions for each of the strategic nouns from the perspectives of:

  • Descriptive Analytics:  Understanding what happened
  • Predictive Analytics:  Predicting what is likely to happen
  • Prescriptive Analytics:  Recommending what to do next

See Figure 2 for an example of the evolution from Descriptive to Predictive to Prescriptive.

Figure 2:  Evolution of The Analytic Questions
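As a rough illustration of how the three perspectives build on one another, here is a small Python sketch.  Everything in it is an assumption made for illustration: the records are invented toy data (not Foot Locker data), the function names are hypothetical, and the "predictive" step simply reuses the historical response rate, where a data science team would fit a proper model.

```python
from collections import defaultdict

# Invented toy records: (customer_segment, offer, responded)
history = [
    ("student", "back_to_school", True),
    ("student", "back_to_school", True),
    ("student", "bogof", False),
    ("runner", "back_to_school", False),
    ("runner", "bogof", True),
    ("runner", "bogof", True),
    ("runner", "bogof", False),
]


def response_rates(records):
    """Descriptive: observed response rate per (segment, offer) pair."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, offer, did_respond in records:
        sent[(segment, offer)] += 1
        responded[(segment, offer)] += int(did_respond)
    return {key: responded[key] / sent[key] for key in sent}


def predict_response(rates, segment, offer, default=0.0):
    """Predictive (naive): reuse the historical rate as the estimated likelihood."""
    return rates.get((segment, offer), default)


def recommend_offer(rates, segment, offers):
    """Prescriptive: pick the offer with the highest estimated likelihood."""
    return max(offers, key=lambda offer: predict_response(rates, segment, offer))


rates = response_rates(history)
offers = ["back_to_school", "bogof"]
print(rates)                                    # what happened
print(predict_response(rates, "runner", "bogof"))  # what is likely to happen
print(recommend_offer(rates, "runner", offers))    # what to do next
```

The point of the sketch is the dependency chain: the recommendation is only as good as the prediction, and the prediction is only as good as the variables captured in the descriptive step.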

In our Foot Locker "Improve Merchandising Effectiveness" example, we want to brainstorm the "Customer" strategic noun questions as such:

Descriptive Analytics (Understanding what happened)

  • Which customers are most receptive to which types of merchandising campaigns?
  • What are the characteristics of customers (e.g., age, gender, customer tenure, life stage, favorite sports) who are most responsive to merchandising offers?
  • Are there certain times of year when certain customers are more responsive?

Predictive Analytics (Predicting what is likely to happen)

  • Which customers are most likely to respond to a Back to School event?
  • Which customers are most likely to respond to a buy-one-get-one-free (BOGOF) offer?
  • Which customers are most likely to respond to a 50% off in-store markdown?

Prescriptive Analytics (Recommending what to do next)

  • What personalized offers (recommendations) should I deliver to Anne Smith to get her to come into the store?
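One simple way to keep a brainstorming session honest is to track the questions in a grid of strategic nouns by analytic perspective and look for the empty cells.  The Python sketch below is only an illustration: the function names are hypothetical, and it is seeded with three of the "Customers" questions above.

```python
ANALYTIC_TYPES = ("descriptive", "predictive", "prescriptive")
STRATEGIC_NOUNS = ("Customers", "Products", "Campaigns", "Stores")


def build_matrix(questions):
    """Group (noun, analytic_type, question) tuples into a nested dict."""
    matrix = {
        noun: {kind: [] for kind in ANALYTIC_TYPES}
        for noun in STRATEGIC_NOUNS
    }
    for noun, kind, text in questions:
        matrix[noun][kind].append(text)
    return matrix


def uncovered_cells(matrix):
    """List the (noun, analytic_type) pairs that have no questions yet."""
    return [
        (noun, kind)
        for noun in STRATEGIC_NOUNS
        for kind in ANALYTIC_TYPES
        if not matrix[noun][kind]
    ]


questions = [
    ("Customers", "descriptive",
     "Which customers are most receptive to which types of merchandising campaigns?"),
    ("Customers", "predictive",
     "Which customers are most likely to respond to a Back to School event?"),
    ("Customers", "prescriptive",
     "What personalized offers should I deliver to Anne Smith to get her to come into the store?"),
]

matrix = build_matrix(questions)
for noun, kind in uncovered_cells(matrix):
    print(f"Still to brainstorm: {kind} questions about {noun}")
```

With only the "Customers" row filled in, the sketch flags every cell for Products, Campaigns and Stores, which is exactly the brainstorming work that remains.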

Part II of the "Thinking Like a Data Scientist" blog series will conclude this "thinking like a data scientist" process and hopefully help us uncover new data sources and metrics that may be better predictors of business performance.

Bill Schmarzo


Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehousing Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.