Issues of the Day: Big Data Blues

Amid the storm of protests arising from revelations that the US National Security Agency routinely collects metadata about "people of interest" from online sources, it is important to note that the threat of surveillance is not confined to governments: Big Data, the accumulation of massive amounts of data relating to unknowing individuals, has become a hallmark of the network society.


According to the International Business Machines (IBM) Corporation, "Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone." This data "comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data," IBM relates.


Since former intelligence staffer Edward Snowden revealed the NSA's online monitoring last week, the backlash has sent Facebook, Twitter, Google and other online mammoths scurrying to offer clarifications about their compliance with secret federal orders to hand over data, according to The Wall Street Journal. These digital giants plead that they are subject to force majeure.

Standard Business Practice


The mammoths' plea is ironic given that these companies routinely monitor their customers' communication and purchasing patterns as standard business practice. A report by CBS News avers that "for years, Google's computers have scanned the content of millions of Gmails … in order to figure out what ads the users might respond to … It's also recording everything you type on the Google search engine and, if you own a smartphone, Google is probably recording where you are."


CBS notes that Google is not unique in using personal data to fine-tune its targeted advertising; it also cites Facebook as similarly engaged in such practices, which have become a mainstay of the information industry.


Record keeping concerning the affairs of ordinary citizens was formerly the mandate of governments. It was generally confined to conducting censuses, recording marriages and deaths, registering births and soliciting tax-related information. In today's digital world, however, big business has arrogated similar functions to itself, not in the public interest but for commercial purposes, and without the consent of its millions of unwitting subjects.


These trends are likely to deepen as the Big Data project becomes widely endorsed and adopted, even as its objectives and validity are called into question. 

Big Data Mania

In the US and elsewhere, government has embraced Big Data. In late March 2012, the President's Office of Science and Technology Policy issued a White House press release announcing the "Big Data Research and Development Initiative," with initial funding of $200 million. The Obama Administration undertakes this enterprise because it believes that "By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some [of] the Nation’s most pressing challenges."

A background paper released by the Office of Science and Technology Policy notes that the Big Data initiative will be applied at the Departments of Defense, Homeland Security, Energy, and Health and Human Services, as well as at the Veterans Administration, the Food and Drug Administration, the National Archives, NASA, the National Endowment for the Humanities, the National Institutes of Health, the National Science Foundation, the National Security Agency and the US Geological Survey.


Fraught with Problems

A piece published by the ACLU notes eight major problems with Big Data; these are not restricted to issues relating to individual privacy and civil liberties alone.

Big Data, the mining of information from huge and growing data sets in order to draw conclusions about behavior, patterns and trends, is fraught with potential flaws. As Alex Pentland, the director of MIT's Human Dynamics Laboratory, noted in the Harvard Business Review's Blog Forum, Big Data makes generalizations and infers connections based on statistical correlations that are not subject to empirical testing and are prone to errors of interpretation.
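The worry about untested correlations can be made concrete with a small simulation. The sketch below is illustrative only: the function names, the sample sizes and the use of random noise are assumptions made for this example and are not drawn from Pentland's piece. It searches a thousand series of pure noise for the one that best "explains" another series of pure noise.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_points=20, n_candidates=1000, seed=0):
    """Return the strongest correlation between a random target series
    and any of n_candidates unrelated random series (hypothetical setup)."""
    rng = random.Random(seed)
    # The "behavior" we want to explain: pure noise.
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate "predictor" is also pure noise, unrelated to target.
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = strongest_chance_correlation()
    print(f"strongest correlation found by chance: {r:+.2f}")
```

None of the candidate series has any connection to the target, yet with enough candidates to choose from, the best match will look like a meaningful relationship. The more variables a data set holds, the more such accidental matches it offers.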


Further, Big Data runs the risk of generalizing relationships characteristic of one point in time to others. Because it lacks controls for the distorting effects of advertising, market manipulation, political developments and stochastic events on individual and social behavior, descriptions of underlying realities based on Big Data must be regarded with skepticism. One example of the kinds of errors produced by data mining and analysis was reported last February in Nature: Google Flu Trends significantly overestimated the incidence of influenza in the US last winter.
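The risk of carrying a relationship from one period into another can be shown with a toy calculation. Every number below is invented for illustration, and the sketch makes no claim about how Google Flu Trends actually works; it simply fits a straight line to one season and applies it to a later season in which, by assumption, news coverage has changed how often people search.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented training season: ten flu-related searches per actual case.
train_searches = [1000, 2000, 3000, 4000]
train_cases = [100, 200, 300, 400]
intercept, slope = fit_line(train_searches, train_cases)

# Invented later season: heavy news coverage doubles searches per case.
later_searches = 6000
later_actual_cases = 300
later_predicted_cases = intercept + slope * later_searches

overestimate_ratio = later_predicted_cases / later_actual_cases

if __name__ == "__main__":
    print(f"predicted {later_predicted_cases:.0f} cases, "
          f"actual {later_actual_cases}")
    print(f"overestimated by a factor of {overestimate_ratio:.1f}")
```

The line fits the first season perfectly, yet it predicts about 600 cases where only 300 occurred, because the link between searching and illness shifted while the model did not.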


Reductionist and Risk-Laden


Microsoft Research, in its book The Fourth Paradigm: Data-Intensive Scientific Discovery, avers that e-science, the mining and analysis of big data, represents a groundbreaking shift in the way science works and supersedes empirical testing, modeling, and computational simulation as a knowledge discovery tool. The approach seems dubious on ontological and operational grounds. It is also part of the reductionist approach that has become prevalent in the digital age, whereby complex phenomena, including climate and biological systems, social and individual behavior, and economies, are believed to be fully describable solely in statistical terms.


Government surveillance and business monitoring of individuals put us on a slippery slope, and the Big Data project is laden with problems: ethical, social, philosophical, economic and operational. It not only evokes fears of Big Brother, but also demands that we question whether its use is more disruptive than instructive and whether the society it portends is one we want. As Adam Frank writes on NPR's Cosmos & Culture blog, "how we deal with the nascent Big Data issues over the next decade or so may very well define what the next stage of culture looks like for [a] long, long time."

© Yosef Gotlieb. All rights reserved.