The digital age has fostered constant innovation in marketing and business decision-making, but arguably no single development has been as influential as the adoption of the analytical practices known as data science. This approach to understanding global markets would not be possible without the combination of the seasoned and trusted art of statistics and the ever-booming practice of computer science. Both of these disciplines have deep roots in scientific ideology and, consequently, data science does too. It is the same ideology we rely on when we place our trust in doctors, engineers, and scientists. It promises accuracy, reliability, and insightful conclusions, all supported by the scientific method. However, when data science began to emerge in financial firms, media, and retail companies, the promises needed to include profit. Widespread adoption happened quickly after executives were sold on the idea that data storage rooms (containing product history, customer information, and transaction data) could be turned into consumable bites of information that promote prudent, lucrative decision-making. Imagining data science to be a cure-all money-printing machine, these executives often find themselves disappointed with the initial products data scientists produce, which leads to skepticism about investment in the department. Although data science practices can lead to immediate positive results for a company’s bottom line, they often require patience, perseverance, and respect for the integrity of the science itself before the analytical methods at work truly begin to pay dividends.
The surfacing of data science practices in business is due first, of course, to the dramatic development of computational tools, but also in part to effective marketing: give an old idea a new look and sell it as revolutionary. That is not to say that data science is some sort of scheme or bubble. Instead, it is a beneficial rebranding of science. The scientist has historically been stereotyped as an old man isolated in a cluttered laboratory, surrounded by notebooks stacked ceiling-high; science itself is often remembered as a least favorite subject from school, or thought to be enjoyable only for the Big Bang Theory type. Data science, by contrast, is thought to be new, powerful, intelligent, and something to show off. The mistake is buying into these stereotypes and thinking that they describe two different lines of work. Science is a constantly refining practice in which one studies the structure and behavior of the surrounding world. The scientific method is the term we use for the reliable process that makes this possible. In its simplest form, it goes build, test, review, and rebuild. This is a process that never stops; it builds on its previous forms over and over again, and data science is no different. In this sense, it is hardly an up-and-coming industry. The only differences are the word “data” and a new class of people making use of it. Scientists have used this evidence-based approach to build models of their surrounding environment since ancient times. Whether a scientist is uncovering the mysteries of the outer universe or discovering profitable business insights from transaction data, the scientific process of investigation is the same.
In 2014, Amazon (a company known for hosting prominent data scientists and for being a pioneer in bringing data science to the forefront) began experimenting with an AI tool that scores and sorts incoming resumes to evaluate applicants for open positions within the company. This project exemplifies the cyclical nature of the work data scientists do and how closely they follow the scientific method. Resumes contain organized text data about potential employees, which makes this task a natural fit for a data scientist. Specifically, they contain an applicant’s personal information, education, experience, and skills, and a well-equipped data scientist at Amazon was able to turn the massive volume of resumes the company receives every hour into a clean, well-formatted spreadsheet with information on thousands of applicants. This data scientist then looked at the history of the open position: what education, experience, and skills did previously successful employees have? An algorithm was then built to apply those filters to the applicant spreadsheet and, voilà, out came a condensed list of seemingly well-qualified candidates. Hiring applicants whose qualifications match those of past successful hires has historically been a sound selection method, and intuitively it makes perfect sense. In practice, however, the algorithm was initially flawed.
In the years preceding this project, most technical and coding positions had been filled by men. So, instructed to select candidates who matched previous hires, the selection algorithm penalized resumes that included any reference to women, including the word “women” in phrases such as “volunteered with Women Who Code.” This revelation might lead people to believe that AI is inherently inclined to change the world for the worse and cannot be controlled, as this story is commonly spun; however, these views lack scientific rigor. This story is really one of a successful iteration of the scientific method. Amazon tried to define what the best applicant’s resume contains (call it a hypothesis) and implemented its model in a real-life experiment. The company found that the model was wrong: because its training data was imbalanced, it was effectively excluding half the world’s workforce. In other words, the team conducted tests, reviewed the results, and then used that information to rebuild the model. It should be noted that Amazon states it never used this algorithm in its actual hiring practices. Still, the outcome of this experiment is further evidence that data science is a science, not a cure-all, instantaneous problem-solver. It shows us that data science is not a holy grail but a painstaking process that requires continuous refinement, just like any other field of science.
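To make the mechanism concrete, here is a minimal sketch of how a scorer trained on imbalanced hiring history can end up penalizing a single word. It is not Amazon’s model: the toy resumes, the function names `token_weights` and `score`, and the choice of a smoothed log-odds weight per word are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy history, invented for illustration: (resume words, was hired).
# The word "women" happens to appear only on resumes that were not hired.
past_resumes = [
    ({"python", "java", "chess"}, True),
    ({"python", "sql"}, True),
    ({"java", "sql", "chess"}, True),
    ({"python", "women", "sql"}, False),
    ({"java", "women"}, False),
    ({"sql", "chess"}, False),
]

def token_weights(resumes, smoothing=1.0):
    """Return a smoothed log-odds weight per word: positive if the word
    is more common among hired resumes, negative if less common."""
    hired, rejected = Counter(), Counter()
    n_hired = n_rejected = 0
    for tokens, was_hired in resumes:
        if was_hired:
            hired.update(tokens)
            n_hired += 1
        else:
            rejected.update(tokens)
            n_rejected += 1
    vocabulary = set(hired) | set(rejected)
    return {
        t: math.log((hired[t] + smoothing) / (n_hired + 2 * smoothing))
        - math.log((rejected[t] + smoothing) / (n_rejected + 2 * smoothing))
        for t in vocabulary
    }

def score(tokens, weights):
    """Score a resume as the sum of the weights of its words."""
    return sum(weights.get(t, 0.0) for t in tokens)

weights = token_weights(past_resumes)

# Two resumes that differ only in the word "women".
with_word = score({"python", "sql", "women"}, weights)
without_word = score({"python", "sql"}, weights)
print(f"weight for 'women': {weights['women']:.2f}")
print(f"score with the word: {with_word:.2f}, without: {without_word:.2f}")
```

Nothing in the code mentions gender; the penalty comes entirely from the history the scorer was given. Reviewing per-word weights like these is one way the “review” step of the cycle could surface such a flaw before the model is rebuilt.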
For the economic benefits of the scientific process to be fully realized, new data science departments must truly adhere to their stated goals of being “data-driven” and “evidence-based”. To see why, it is important to recognize what an organization’s decision-making looks like before the adoption of scientific thinking. An example from 2011 illustrates the flaws in a conventional business hierarchy. When Ron Johnson was appointed turnaround CEO of JC Penney, the aging American retailer was experiencing hard times. With the rapid growth of online commerce, the golden age of massive department store brands was facing a mortal challenge. Prior to his tenure at JC Penney, Johnson had earned himself the reputation of a retail legend. He was instrumental in resurrecting Target’s brand image with the launch of the Michael Graves line of consumer products. Further, he was one of the idea men behind the opening of Apple Stores, perhaps the biggest retail success story of the 2000s. It was assumed that, given his experience, previous successes, connections, and insights, he was qualified to make the hard decisions that would drive JC Penney out of the sales ditch it was falling into. However, Johnson also had a reputation for refusing to experiment with rebranding, ignoring market research results, and being apathetic toward the company’s oldest and most loyal customers. These refusals to recognize and use the data science tools available led Johnson to make a series of near-catastrophic decisions for the company. He was ousted 17 months later. Johnson’s downfall demonstrates how businesses have conventionally relied on direction from senior executives, working under the assumption that experience translates directly into the ability to make profitable business decisions. History has proven this assumption wrong in numerous cases.
But even if that assumption were true, it is unrealistic to believe that these executives always make decisions free of selfishness or political influence. Scientific processes are what allow us to confidently rely on doctors to prescribe safe and effective medicine and on engineers to build steady bridges and skyscrapers. Likewise, these processes can improve confidence and accountability in business decision-making. Data science is a popular and effective way to implement those processes if done correctly. Simply stating that a company is “data-driven” or “evidence-based” will produce no positive effect on the bottom line. Ignoring the results of quantitative research and relying on intuition instead nullifies the positive potential of data science and can even do more harm than good.
Still, the thought of conducting science in a business context might sound like a contradiction. Trends in business are always changing. Demand is uncertain, financial markets are unpredictable, and science is just that: a science. However, this view neglects an important quality of science: it, too, is constantly changing. Scientists will always question whether the current model of their environment is the best possible understanding, and so a new investigation will begin, whether the context is business or not. Intuit, a company known for its ubiquitous accounting software for small businesses, exemplifies a strong dedication to the order of scientific thinking. For example, company data scientists wanted to probe consumer interest in a new online marketplace, similar to Craigslist, that would allow small businesses to connect with potential customers under the halo of being “Intuit-certified.” To do so, they ran a low-cost internet ad campaign. When people clicked on the ads for a service that did not yet exist, they were taken to a web page thanking them for taking part in a study. The team hypothesized that the campaign would generate a 10% click-through rate. Instead, the result was double that. The test showed that the idea was worth pursuing. More importantly, though, the team had evidence, not just intuition, with which to evaluate its hypothesis. Following this test, the team responsible for the idea further developed its business model, testing all of its assumptions along the way. The culture in this company is one focused on continual learning, the main driver behind scientific thinking. In order for companies to make the best use of data science, they must similarly adapt their culture to accept the value data science has to offer.
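The comparison in the Intuit example, a hypothesized 10% click-through rate against an observed rate of double that, can be sketched as a simple one-sample test of a proportion. The source does not report how many people saw the ads, so the 1,000 impressions and 200 clicks below are hypothetical numbers chosen only to reproduce the 10% versus 20% contrast; the function name `ctr_z_test` and the use of a normal approximation are likewise assumptions, not a description of what Intuit did.

```python
import math

def ctr_z_test(clicks, impressions, hypothesized_rate):
    """One-sample z-test for a proportion (normal approximation).

    Returns the observed click-through rate, the z statistic, and a
    two-sided p-value under the hypothesized rate.
    """
    observed = clicks / impressions
    # Standard error of the rate if the hypothesized rate were true.
    standard_error = math.sqrt(
        hypothesized_rate * (1 - hypothesized_rate) / impressions
    )
    z = (observed - hypothesized_rate) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return observed, z, p_value

# Hypothetical campaign size; only the 10% hypothesis and the
# "double that" result come from the text.
observed, z, p_value = ctr_z_test(
    clicks=200, impressions=1000, hypothesized_rate=0.10
)
print(f"observed rate: {observed:.1%}, z = {z:.2f}, p = {p_value:.3g}")
```

With a sample of this assumed size, a rate of 20% would be very unlikely if the true rate were 10%, which is the kind of evidence that lets a team revise its model of customer interest rather than rely on a hunch. With far fewer impressions, the same doubling could be much less conclusive.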
In summary, data science does have the potential to solve great problems, and the scope of that potential will only grow with further development of the technology. Undoubtedly, though, it will fall short of what popular culture often makes it out to be if it is not properly implemented. The scientific method is not a process that easily meets deadlines or fulfills value quotas. Unfortunately, in highly competitive markets, these objectives are what drive expectations. The pressure sometimes triggers undesirable behavior from executives and data scientists alike, who may jump to conclusions based on poorly supported insights, or may ignore insights altogether if they do not match their intuition. The slow, thoughtful, and often mild tweaks between each of the many cycles are what guarantee the reliability that science has brought to the world for centuries, and they have led the way for nearly all the technological progress of humankind. As the adoption of data science continues, it is crucial to recognize exactly what characteristics make it such a valuable tool. The scientific method is powerful. Its order has reigned quietly for centuries, and it will continue to do so. Only those who respect its integrity are properly equipped to last.