Welcome to the Big Data Series on Viewpoint West Partners, LLC. In this series, we will explore various facets of Big Data including emerging Big Data trends in science, engineering, society and public policy. The material for this abridged series comes from a variety of consultations, private talks and seminars hosted by Viewpoint West Partners, LLC.
Big Data is a term used by marketeers and technology wonks alike to describe the situation in which data sets grow large and complex enough to become awkward, or even impossible, to work with using conventional database management tools. In this sense, the bad news is that Big Data describes a negative scenario rooted in the failure of prevailing database technology to cope with the onslaught of massive data volumes produced by the success of other technologies (mobile devices, social networks, sensor networks, the internet, etc.). The good news is that Big Data is a pervasive 'problem looking for a solution' rather than the much more common converse that is the signature of many (failed) technology companies.
The term 'Big Data' emerged because massive data volumes are now being generated in nearly every sector of society and business on a daily, hourly and minute-to-minute basis. This situation renders conventional tools impotent for the business dashboards and realtime analytics needed to make the data-driven decisions that CEOs take so much pride in. As such, Big Data technology has become a competitive weapon in business, as well as a fundamental aspect of doing Big Science and making effective public policy decisions worldwide.
The Data Sphere: Beyond The Big Data Paradigm Shift
For the first time in human history, we are seeing a massive shift in decision-making perspective that comes from the significant patterns emerging and coalescing in the 'Data Sphere', a term I use to highlight the data ocean that we all swim in. Each of us continually adds to the Data Sphere, leaving data wakes that are time-series gold to be mined by anyone properly equipped, whether properly authorized (hopefully) or not. It has become increasingly difficult, if not impossible, not to be a human data source if one participates in any aspect of modern society. And in many cases, we are involuntarily participating in decision processes and analytics that in a very real sense will determine our future reality, especially in terms of privacy, economic opportunity, health care, and security. We will explore this Big Data reality in detail as we progress through this series.
Effectively handling Big Data is a three-fold problem that is more than merely a data size problem: Big Data involves data volume (the size of the dataset), data velocity (the incoming data rate), and data variety (the myriad data sources, both structured - Call Detail Records - and unstructured - Images, Audio, Video). Taken together, this triad defines the crux of the problem that most conventional database tools and techniques encounter when dealing with the brave new world of Big Data.
So how big is Big Data? Big Data sizing is typically in terms of petabytes, exabytes and zettabytes. For the purpose of this series, let's focus on the smallest size, the petabyte. One petabyte is one million gigabytes or one thousand terabytes.
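For readers who want to verify the unit arithmetic, here is a minimal sketch in Python using the standard decimal (SI) conversion factors; the constant names are just for illustration:

```python
# Big Data unit sizes in bytes, using decimal (SI) prefixes.
GB = 10**9   # gigabyte
TB = 10**12  # terabyte
PB = 10**15  # petabyte
EB = 10**18  # exabyte

print(PB // GB)  # 1000000 -> one petabyte is a million gigabytes
print(PB // TB)  # 1000    -> ...or a thousand terabytes
print(EB // TB)  # 1000000 -> an exabyte is a million terabytes
```

(Storage vendors typically quote these decimal units; operating systems sometimes report the slightly larger binary units, e.g. a pebibyte is 2^50 bytes.)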
Again, it's important to note not just the size (volume), but also the data rate (velocity) and the variety of the data sources involved in Big Data. Here are some Big Data benchmarks to consider:
- The world's effective capacity to exchange information through two-way telecommunication networks was 281 petabytes of (optimally compressed) information in 1986, 471 petabytes in 1993, 2,200 petabytes in 2000, and 65,000 (optimally compressed) petabytes in 2007 (this is the informational equivalent of every person in the world exchanging 6 newspapers per day).
- AT&T transfers about 19 petabytes of data through their networks each day. Google processes about 24 petabytes of data per day. The BBC's iPlayer is reported to use 7 petabytes of bandwidth each month. Imgur transfers about 3 petabytes every month. The Internet Archive contained about 5.8 petabytes of data as of December 2010, and was growing at a rate of about 100 terabytes per month as of March 2009.
- World of Warcraft uses 1.3 petabytes of storage to maintain its game. Steam, a digital gaming service developed by Valve, delivers over 30 petabytes of content monthly. The 2009 movie Avatar took over 1 petabyte of local storage at Weta Digital for rendering its 3D CGI effects.
- The experiments in the Large Hadron Collider produce about 15 petabytes of data per year, which is distributed over the LHC Computing Grid. It is estimated that the human brain's ability to store memories is equivalent to about 2.5 petabytes of binary data. The German Climate Computing Centre (DKRZ) has a storage capacity of 60 petabytes of climate data.
- The Teradata Database 12 has a capacity of 50 petabytes of compressed data. In August 2011, IBM built the largest storage array to date, with a capacity of 120 petabytes. And in January 2012, Cray began construction of the Blue Waters supercomputer, whose 500-petabyte capacity will claim that title once the system is complete.
One can plainly see that we are squarely in the 'Petabyte Age' of Big Data, rapidly moving to exabyte scale. And note that an exabyte is a billion gigabytes or one million terabytes.
Measuring The Noosphere Via The 'Data Sphere'
At this point in human history, we are generating more data from more sources faster than ever before, at an accelerating rate that is constantly challenging our technological and societal capacities to deal with the ramifications of it all. Why is all this happening?
Pierre Teilhard de Chardin SJ was a French philosopher and Jesuit priest who trained as a paleontologist and geologist and took part in the discovery of Peking Man. Teilhard conceived the idea of the Omega Point and developed Vladimir Vernadsky's concept of the Noosphere. Teilhard set forth a sweeping account of the unfolding of the cosmos and abandoned traditional interpretations of creation in the Book of Genesis in favor of the idea of the Universe as a "Living Host".
For Teilhard, the Noosphere emerges through and is constituted by the interaction of human minds. The Noosphere has grown in step with the organization of the human mass in relation to itself as it populates the earth. The more complex the social networks in which mankind organizes itself, the higher the Noosphere grows in awareness. This concept is an extension of Teilhard's Law of Complexity/Consciousness, the law describing the nature of evolution in the universe. Teilhard argued the Noosphere is growing towards an ever greater integration and unification, culminating in the Omega Point, which he saw as the goal of human history: an apex of thought/consciousness. In this context, Big Data is actually the Data Sphere, which models the Noosphere in much the same manner as footprints in the sand model the beachcombers.
Next Stop: The Omega Point
The Omega Point is a maximum level of complexity and consciousness towards which Teilhard believed the universe was evolving. The universe is constantly developing towards higher levels of material complexity and consciousness, a theory of evolution that Teilhard called the Law of Complexity/Consciousness. For Teilhard, the universe can only move in the direction of more complexity and consciousness if it is being drawn by a Supreme Point of complexity and consciousness.
Thus Teilhard postulates the Omega Point as this supreme point of complexity and consciousness, which in his view is the actual cause for the universe to grow in complexity and consciousness. The Omega Point exists as supremely complex and conscious, transcendent and independent of the evolving universe. Teilhard argued that the Omega Point resembles the Christian Logos, namely Christ, who draws all things into Himself, who in the words of the Nicene Creed, is "God from God", "Light from Light", "True God from true God," and "through Him all things were made." Does the Big Data milieu of today actually foreshadow the approaching Omega Point of Teilhard?
The Prediction Wall: Singularity On Board
Transhumanists argue that the accelerating technological progress inherent in the Law of Accelerating Returns will lead to what Vernor Vinge called a technological singularity or "prediction wall." We will soon enter a time in which we must make the transition to a "runaway positive feedback loop" of very high-level autonomous machine computation, and our technological and computational tools will eventually surpass human capacities completely. Some transhumanist writings refer to this moment as the Omega Point, paying homage to Teilhard's prior use of the term. Other transhumanists, in particular Ray Kurzweil, refer to the technological singularity as simply "The Singularity".
We can plainly see that as Big Data gets even bigger, our computational tools will, by absolute necessity, become far more sophisticated in terms of pattern recognition, machine learning, and most importantly, complete automation. Already we have High Frequency Trading (HFT) computers that can cause enormous market swings (at the drop of an errant button press) that are beyond the capacity of human traders to fully understand and control. This is just a mere prelude to a computational reality that would make even the best science fiction writers give up their craft and tend sheep.
Big Data And The Seldon Plan
Science Fiction always predicts the future, for better or worse. At the very least, it supplies us with cautionary tales that become important parts of the zeitgeist. In this tradition, we have some very prescient material to work with as we ponder the Big Data age.
The Foundation series is a science fiction saga by Isaac Asimov. The premise of the series is that mathematician Hari Seldon spent his life developing a branch of mathematics known as psychohistory, a sort of mathematical sociology. Using the laws of mass action, psychohistory can predict the future, but only on a very large scale; it is error-prone on a small scale. Psychohistory works on the principle that the behavior of a mass of people is predictable if that mass is very large (on the order of the population of the galaxy: quadrillions of humans inhabiting millions of star systems). The larger the number, the more predictable the future.
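The statistical intuition behind psychohistory is essentially the law of large numbers: individual behavior is noisy, but aggregates become steadier as the population grows. A minimal illustrative sketch in Python (the coin-flipping 'citizens' are my own stand-in model, not anything from Asimov):

```python
import random

def max_drift(population_size, trials=100, seed=42):
    """Simulate `trials` independent populations of coin-flipping
    'citizens' and return the worst observed deviation of the
    population average from its true value (0.5)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(population_size))
        worst = max(worst, abs(heads / population_size - 0.5))
    return worst

# The bigger the population, the smaller the worst-case drift --
# i.e. the more predictable the aggregate becomes.
for n in (10, 1_000, 10_000):
    print(n, max_drift(n))
```

The deviation shrinks roughly as 1/sqrt(N), which is why Seldon needs a galaxy's worth of humans before prediction becomes reliable.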
Using psychohistory techniques, Seldon foresees the imminent fall of the Galactic Empire, which encompasses the entire Milky Way, and a dark age lasting thirty thousand years before a second great empire arises. Seldon's psychohistory also foresees an alternative in which the intervening period lasts only one thousand years. To ensure his vision of a second great Empire comes to fruition, Seldon creates two Foundations—small, secluded havens of all human knowledge—at "opposite ends of the galaxy".
The focus of the series is on the First Foundation and its attempts to overcome various obstacles on the path to the establishment of the Second Empire, all the while being silently guided by the unknown specifics of The Seldon Plan.
Another fundamental assumption of psychohistory is that the population itself is unaware of the existence of psychohistory. We shall come back to this fascinating twist later in the Big Data Series when we discuss individual privacy and anonymity.
That's A Wrap
From petabytes, exabytes, zettabytes and the Data Sphere, to the Noosphere, the Omega Point, Psychohistory and the Singularity, Big Data is much more significant than a mere marketing term on a glossy brochure (or PDF file). The rate of Big Data acceleration is itself accelerating, and we are all on a runaway technology ride that is already changing our everyday reality. What will come of it?
Next in the Big Data Series, we will explore Reality Mining, Honest Signals, the Quantified Self, and much, much more. Please stay tuned.