“Dolls? You’re working at a Doll Factory?!” asked my exasperated father. He was somewhat skeptical of statistics graduate school to begin with—after all, “What’s wrong with farming?” I had finished my Master’s Degree at Ohio State and landed my first job as a Quantitative Analyst within Mattel’s American Girl division (my LinkedIn simply says “Mattel”). The year was 2005 and the analytics space was just heating up. I represented a new breed in the corporate environment. A quantitative analyst could query data like engineers, find predictive relationships in data, present results via scientific-looking ggplot output, yet speak with business language. The world was about to change and I was in for a ride. A (Gartner) Hype Cycle ride, to be specific.

In 2005, Mattel’s e-commerce business was rather new. Engineers had logged millions of events that took place on the website, but no one had analyzed them. A database connection string, a SAS prompt, and a few “PROC SQL SELECT …” commands later, I was quite sure I had found the modern-day equivalent of Sutter’s Mill. Over the following years, dozens of analyses addressed long-standing questions from the business. What was the value of marketing dollars invested into Paid Search? What was the probability of an individual buying a doll given they had browsed 10 pages vs. 5? What was the sales impact of a lifestyle magazine sent to households with young kids? Data volumes grew quickly, but expectations grew faster.

Several cities, companies, and a few gray hairs later...data science, née analytics, has simply exploded. Google Trends suggests a 4x multiple of data science search activity today vs. 2012.  Glassdoor rates data scientist as the best job in America. Data science Master’s programs (or equivalent) are offered at almost every major university despite not existing 15 years ago. As the ultimate indication of legitimacy, Indeed.com even lists career opportunities for “Chief Data Science Officers”! We’ve arrived.  

But with the golden opportunity to bring disciplined, logical, data-driven decision making to complex problems, I’ve also observed a frightening abdication of common sense and analytic principles by some of our number-crunching brethren. I’m not talking about Bonferroni experiment-wide alpha adjustments for multiple comparisons either. I’m talking about leveraging buzzwords and unnecessary complexity to disempower, confuse, and otherwise alienate our colleagues while not providing business value ourselves. Receiving praise for automating machine learning workflows while ignoring “spurious perfect predictors” in our feature set. Building clustering algorithms that yield output, but not holding ourselves accountable for actually driving business value. Or, in a common yet still egregious violation of our most basic statistical tenets, presenting observed correlations as cause and effect. I’ve lost count of the times that well-reasoned, data-less arguments were overruled by “bad science” masquerading as analytical fact.

There is too much legitimate value in data science for us to let it fade like the promise of an overvalued startup post-acquisition. We can’t let the temptation of pseudoscience “quick wins” overshadow the need to deliver substance.

So what do we do?  Admittedly, it feels strange to admonish or advise fellow industry colleagues, many of whom are more experienced, talented, or accomplished than myself.  Hence may I suggest remembering some guidance from others:

Nate Silver reminded us that while “...ice cream sales and forest fires are correlated… you don’t light a patch of Montana brush on fire when you buy a pint of Häagen-Dazs.” Treat correlative analytical discoveries as opportunities to test hypotheses; avoid the temptation to make your PowerPoint victory lap with only rho as your evidence.
The quote “If you can’t explain it simply, you don’t understand it well enough” is attributed to Einstein. We should empower and enhance those around us through data science, not belittle or confuse them.
Ben Franklin reminded us to “never confuse motion with action.” Don’t allow yourself to be satisfied with automation, refactored code, and elegant end-to-end solutions until value has been established. In other words, focus on the outcome, not just the method.

Like many of you, I’ve worked very hard to be considered a fortunate data scientist. We have such power—and potential—to impact almost every facet of human life. From predicting a malaria outbreak and suggesting preventive action, to the incomprehensibly more common application of predicting doll purchase patterns and the like, data science matters. Let’s take advantage of this opportunity and humbly focus on substance despite the temptation of anything less.

Mike Schumacher

Mike Schumacher currently serves as Vice President of Data Science at Oracle. At Oracle, specifically within the Data Cloud division, Mike oversees several groups of data scientists who build and deploy analytical solutions for some of the world's largest companies on massive amounts of data. In total, Mike has 13 years of applied analytics experience, previously holding analytical leadership positions at Conversant (acquired by Alliance Data), Datalogix (acquired by Oracle), and Mattel. Prior to his working career, Mike earned an MS in Applied Statistics at The Ohio State University and an Economics degree from the University of North Dakota.