Big Data: Different From Small Data
Three factors distinguish big data from the analytics that many executive leaders are familiar with: volume, velocity, and variety. In a recent article in Harvard Business Review, McAfee and Brynjolfsson¹ draw the distinction and open a window on how two companies are harnessing big data to make more accurate predictions, better decisions, and more precise interventions, all on an accelerated timetable. To convey the sheer volume of data now available, the authors explain that more data cross the Internet each second than were stored anywhere on the Internet in 1992. The retailer Wal-Mart Stores, Inc, for instance, collects more than 2.5 petabytes of customer data every day from its checkout registers. How much information does a petabyte represent? It is equivalent to 20 million filing cabinets of text, the authors explain; multiply that by 1,000 for an exabyte. The authors estimate that 2.5 exabytes of data are created each day. Velocity, or speed, is the second key differentiator, and in many applications it is more important than volume. The authors report that a colleague at the Massachusetts Institute of Technology Media Lab used location data from mobile phones to estimate Black Friday sales at Macy’s by inferring how many people were in Macy’s parking lot that day. “Rapid insights like that can provide an obvious competitive advantage to Wall Street analysts and Main Street managers,” the authors write. Variety is the third characteristic that distinguishes big data from traditional analytic activities: big data draw on many sources that didn’t exist 10 years ago, such as the messages, updates, and images posted to social networks; readings from sensors; and GPS data from cell phones.
Purely through the tools and activities that we engage with today (cell phones, social networks, GPS, and online shopping), each of us is now a walking data generator, the authors point out. Because the data are unstructured, the traditional structured databases that store much corporate and health care information are unsuited to analyzing big data.
Data in Action
To answer skeptics of the notion that having data improves business results, the authors interviewed executives at 330 public North American companies about their organizational and management practices, compared the responses with performance data, and found that the most data-driven companies were, on average, 5% more productive and 6% more profitable. How, specifically, are managers using big data to improve performance? In time-sensitive industries such as aviation (and health care, for that matter), improving productivity often turns on finding and eliminating wasted minutes. Historically, airlines have relied on pilots, who are distracted by the responsibilities of landing an airplane, to provide estimated arrival times. If the plane lands early, pilots and passengers sit on the tarmac, waiting for the ground crew; if it lands late, the ground crew stands idle, waiting for the passengers. PASSUR Aerospace, a provider of decision-support technologies to the aviation industry, is helping airlines eliminate this disconnect by providing more precise estimated arrival times. It collects data from public sources such as weather reports and flight schedules, as well as proprietary data that include feeds from a network of 155 radar stations that it installed near airports. The company believes that enabling an airline to know exactly when its planes will land results in several million dollars of savings at each airport.
Several years ago, the combination brick-and-mortar and online retailer Sears Holdings Corp began an initiative to generate greater value from the data collected from sales of its Sears, Craftsman, and Lands’ End brands, and it ran into an obstacle familiar to health care: The data required to make decisions were highly fragmented, housed in many databases and data warehouses maintained by the various brands. “Sears required about eight weeks to generate personalized promotions, at which point many of them were no longer optimal for the company,” the authors explain. For Sears, the answer was to borrow techniques from big data: It set up an Apache Hadoop cluster, a group of inexpensive, off-the-shelf servers coordinated by an emerging software framework (Hadoop), and it started feeding data from each of its brands (including data from existing data warehouses) into the cluster. The time needed to plan a promotion dropped from eight weeks to one, and the promotions themselves are of higher quality because they are more granular and timely. An added benefit is that the data are processed at a fraction of the cost of a comparable standard data warehouse.
Cultural Changes
The managerial challenges of using big data are greater than the technical challenges, the authors believe. One of the most critical is muting the HiPPO, the highest-paid person’s opinion. When data were expensive and hard to get, relying on the intuition of upper-level managers made sense, but times have changed. To reinforce a data-driven decision-making culture, managers must begin by asking what the data say, then drill down to question the source of the data, the types of analyses performed, and the confidence in the data. They can also allow themselves to be overruled by the data. “Few things are more powerful for changing a decision-making culture than seeing a senior executive concede when data have disproved a hunch,” the authors write.
The role of the domain experts will shift as big-data use advances; the experts’ value lies in their questions, not their answers. In conclusion, the authors have this advice: “In sector after sector, companies that figure out how to combine domain expertise with data science will pull away from their rivals.”