Tuesday
May 29, 2012

The Big Data Series: Reality Mining, Honest Signals, and the Quantified Self

In the previous episode of the Big Data Series, we defined the somewhat elusive term, Big Data, and gave some perspective on the emergence and nature of Big Data, its dimensions and salient characteristics, and its cosmological and galactic relevance. Additionally, we explored the notion of the ubiquitous Data Sphere, its emergence as part of our modern technological existence and how it was adumbrated by both a priest and a science fiction writer.

Now we will look inside the content of some Big Data efforts to explore their potential economic and social value. Since it is all too easy to raise warning flags regarding abuses of Big Data, we will save that discussion for a later episode of the Big Data Series. Instead we will focus on how Big Data is reaching out and touching each one of us for the good of all, presumably.

Specifically, we will now explore Reality Mining, Honest Signals, and the Quantified Self in this installment of the Big Data Series. Although one might consider these topics as distinct from Big Data per se, it will hopefully become clear that the nature and scale of Big Data fundamentally transforms each of these concerns in terms of their impact and intrinsic contribution to the ubiquitous Data Sphere that we all live in.


Reality Mining: The Social Spelunkers

MIT professor Sandy Pentland says Reality Mining "is all about paying attention to patterns in life and using that information to help with things like setting privacy patterns, sharing things with people, notifying people -- basically, to help you live your life."

Pentland's Reality Mining is now a common occurrence, thanks to the proliferation of sophisticated smartphones. These handheld devices now have the processing power of low-end desktop computers and collect varied data, thanks to components such as GPS chips that track location. Researchers such as Pentland are getting better at making sense of all that information, detecting underlying patterns that can reveal much more than the raw data alone may suggest. For example, with the aid of some statistical algorithms, that information can identify things to do or new people to meet. It can also make devices easier to use -- by automatically determining security settings, for example. Smartphone data can also shed light on workplace dynamics and on the well-being of communities. It can even help project the course of disease outbreaks and provide clues about individuals' health.

To create an accurate model of a person's social network, Pentland's team combines a phone's call logs with information about its proximity to other people's devices, which is continuously collected by Bluetooth sensors. With the help of factor analysis, a statistical technique commonly used in the social sciences to explain correlations among multiple variables, the team identifies patterns in the data and translates them into maps of social relationships. Such maps are used to accurately categorize the people in your address book as friends, family members, acquaintances, or coworkers. In turn, this information is used to automatically establish privacy settings -- for instance, allowing only your family to view your schedule. With location data added in, the smartphone can predict when you will be near someone in your network.
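
As a rough, hedged sketch of the kind of analysis involved (not Pentland's actual pipeline), the snippet below runs factor analysis over a hypothetical per-contact feature matrix built from call counts and Bluetooth co-presence hours; every feature name, number, and labeling rule here is an assumption made purely for illustration.

```python
# A minimal sketch, not Pentland's actual pipeline: factor analysis over
# hypothetical per-contact features (all names and numbers are illustrative).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Rows = contacts in an address book; columns = machine-sensed features,
# e.g. [calls per week, evening Bluetooth co-presence (hrs),
#       weekday 9-to-5 co-presence (hrs), weekend co-presence (hrs)].
X = rng.poisson(lam=[3, 4, 2, 5], size=(200, 4)).astype(float)

# Standardize, then look for a small number of latent factors that explain
# the correlations among the observed features.
X = (X - X.mean(axis=0)) / X.std(axis=0)
fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X)     # each contact's position on the latent factors
loadings = fa.components_        # how each observed feature loads on each factor

# A crude labeling rule (purely illustrative): contacts scoring higher on the
# first factor get tagged one way, contacts scoring higher on the second
# factor another way -- standing in for friend/family vs. coworker maps.
labels = np.where(scores[:, 0] > scores[:, 1], "friend/family", "coworker")
print(loadings.round(2))
print(labels[:10])
```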

Reality Mining refers to the collection of machine-sensed environmental data pertaining to human social behavior. This new paradigm of data mining makes possible the modeling of conversation context, proximity sensing, and temporospatial location throughout large communities of individuals. Mobile phones (and other innocuous devices) are used for data collection, opening social network analysis to brave new methods of empirical stochastic modeling. These datasets are enormous, and Big Data techniques are fundamental to the Reality Mining effort.

Some fundamental social questions that Reality Mining can answer:

  • How do social networks evolve over time?

  • How entropic (predictable) are most people's social lives?

  • How does social information flow?

  • Can the topology of a social network be inferred from only proximity data?

  • How can we change a group's social interactions to promote better functioning?


All Roads Lead To Big Data

Reality Mining can help city planners unravel traffic snarls and public health officials track and prevent the spread of illnesses, such as severe acute respiratory syndrome, or SARS. "There is so much societal good that can come from this," says Pentland. "Suddenly we have the ability to know what is happening with the mass of humanity and adapt society to accommodate the trends we can detect, and make society work better." Hari Seldon himself could not have said it any better.

Inrix tracks some 750,000 vehicles traversing 55,000 miles of roadway in 129 cities to gather real-time traffic congestion data that is then used in a variety of ways, such as providing live traffic information to devices made by Garmin (GRMN) and TomTom. Over time, all that data shows useful patterns. "We can build a model for major sporting events that shows what happens if you build the stadium in one place or another," Inrix CEO Bryan Mistele says. "We've found that in most cities the biggest determining factor in traffic is school schedules. In other cases, like Washington, D.C., the legislative calendar is very important. We can correlate our data to practically any other variable."

Smartphones can be useful in gathering health-related information, says Alex Kass, a researcher at Accenture. "It's one of the application areas that focus well both on the individual and on large groups," he says. "Researchers can use data on a sample population over a given period—say, a week or a month—and then assume some of them are sick, to provide a more accurate picture of how widely an illness could spread," Kass says. "Information on a particular individual or group could help build more accurate models to predict how an illness spreads from one person to another. People could also use the data to keep better tabs on themselves," Kass says. (More on the Quantified Self later in this episode.)

So as a scientific field of inquiry, Reality Mining is a solid step toward a mathematical sociology that resonates with Hari Seldon's psychohistory, as described in the previous episode of this series. Big Data makes Reality Mining much more powerful in its reach and ultimate omniscience. The Big Data influence is so transformative that some believe that this "Petabyte Age" in which we live, the Data Sphere, is all you need and that scientific theories, including those used in Reality Mining, be damned. Huh?


Theory?  What Theory?

Chris Anderson is the editor in chief of Wired magazine. He believes that Big Data's "Data Deluge" makes the scientific method obsolete. Spoken just like a disgruntled former physicist.

The scientific method is built around testable hypotheses. Models are proposed, and experiments confirm or falsify them as accounts of how the world works. This is the way science has worked for hundreds of years. Scientists are trained to recognize that correlation is not causation: no conclusions should be drawn simply on the basis of a correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanism -- the model -- that connects the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise. Big Data without models is an overwhelming cacophony. But according to Anderson, faced with massive data, this approach to science -- hypothesize, model, test -- is becoming obsolete.

Anderson advises to forget taxonomy, ontology, and psychology. "Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves."

Anderson suggests a better way. He suggests the petabytes themselves allow us to say: "Correlation is enough." He says we can stop looking for models and can analyze the data without hypotheses about what it might show. Anderson says we can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. Whaaa?!?

Unfortunately, Anderson neglects to mention that the underlying "statistical algorithms" used to sift through the petabytes are not exactly "model-free estimators". Many statistical models are formalizations of relationships between variables in the form of mathematical equations: a statistical model describes how one or more random variables are related to one or more other variables, and the model is statistical because the variables are related stochastically rather than deterministically. So Mr. Anderson, "Thar Be Models" in many of the statistical algorithms sifting through the petabytes of collected data. And a model is just a theory, after all. D'oh!
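
To make the point concrete, here is a minimal sketch of about the simplest statistical model there is: ordinary least squares assumes an explicit equation relating the variables, so even a humble "pattern-finding" fit is theory-laden. The data below are simulated purely for illustration.

```python
# A minimal sketch: even the humblest statistical "algorithm" carries a model.
# Ordinary least squares assumes Y = b0 + b1*X + noise -- an equation, i.e. a theory.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=500)
Y = 2.0 + 0.5 * X + rng.normal(scale=1.0, size=500)   # data generated by the model

# Fitting "lets the numbers speak", but only through the assumed equation.
b1, b0 = np.polyfit(X, Y, deg=1)
print(f"estimated model: Y = {b0:.2f} + {b1:.2f} * X + noise")
```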


Honest Signaling

Statistical models of intention, goals and values form the basis of analysis and prediction in Honest Signaling, the study of the patterns in how we interact socially. Researchers say they can get a more accurate picture of what people do, where they go, and with whom they communicate from a device they carry than from more subjective sources, including what people say about themselves. In short, people lie -- smartphones don't. Or so the thinking goes.

But how can you know when someone is bluffing? Paying attention? Genuinely interested? The answer, according to Sandy Pentland in Honest Signals, is that the subtle patterns of how we interact with other people reveal our attitudes toward them. These unconscious social signals are not just a back channel or a complement to our conscious language; they form a distinct communication network. Biologically based “honest signaling,” which evolved from ancient animal signaling mechanisms, offers an unmatched window into our intentions, goals, and values. If we understand this ancient animal channel of communication, Pentland claims, we can accurately predict the outcomes of situations ranging from job interviews to first dates.

Pentland monitored and analyzed the back-and-forth patterns of signaling among groups of people and found that this second channel of communications, revolving not around words but around social relations, profoundly influences the major decisions of our lives -- even though we remain largely unaware of it. Pentland shows how by “reading” our social networks we can become more successful at pitching an idea, getting a job, or closing a deal. Using this “network intelligence” theory of social signaling, Pentland describes how we can harness the intelligence of our social network to become better managers, workers, and communicators.

The key component of Pentland's methodology is that observational data collected from social groups can be "normalized" along the lines of Honest Signaling, so that patterns gleaned from this second channel of communication support or discount the raw observations. This makes the collection of massive amounts of data more robust and "noise-free" in its predictive power, and it reinforces any social analysis with an almost uncanny accuracy that transcends individual actions per se and the apparent volition of any sole individual. This is how Big Data can become Big Brother in a most insidious manner. Our animal biology can betray us. As Curly has prophetically stated on many occasions, "I'm a victim of circumstance!"


Are You Receiving Me?  Cues and Signals

In biology, signals are traits, including structures and behaviors, that have evolved specifically because they change the behavior of receivers in ways that benefit the signaller. Traits or actions that benefit the receiver exclusively are called cues. When an alert bird deliberately gives a warning call to a stalking predator and the predator gives up the hunt, the sound is a signal. When a foraging bird inadvertently makes a rustling sound in the leaves that attracts predators and increases the risk of predation, the sound is a 'cue'.

Signaling systems are shaped by the extent to which signallers and receivers have mutual interests. An alert bird warning off a stalking predator is communicating something useful to the predator: that it has been detected by the prey; it might as well quit wasting its time stalking this alerted prey, which it is unlikely to catch. When the predator gives up, the signaller can get back to other important tasks. Once the stalking predator is detected, the signalling prey and receiving predator have a mutual interest in terminating the hunt. Within species, mutual interests generally increase with kinship. Kinship is central to models of signalling between relatives, for instance when broods of nestling birds beg and compete for food from their parents.

Biologists use the phrase “honest signals” in a statistical sense. Biological signals, like warning calls or resplendent tail feathers, are considered honest if they are correlated with, or reliably predict, something useful to the receiver. In this usage, honesty is a useful correlation between the signal trait (which economists call “public information” because it is readily apparent) and the unobservable thing of value to the receiver (which economists refer to as “private information” and biologists often refer to as “quality”). Honest biological signals do not need to be perfectly informative, reducing uncertainty to zero; they only need to be honest “on average” to be potentially useful.
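
A toy simulation (all of the numbers are invented) makes this statistical sense of "honest" concrete: the public signal only needs to correlate with the hidden quality on average, not reveal it perfectly.

```python
# Toy simulation of an "honest on average" signal (all numbers are invented).
import numpy as np

rng = np.random.default_rng(2)
quality = rng.normal(size=10_000)                       # private information ("quality")
signal = quality + rng.normal(scale=1.5, size=10_000)   # noisy public display

r = np.corrcoef(signal, quality)[0, 1]
print(f"signal-quality correlation: {r:.2f}")           # well below 1, yet still useful
```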

Honest Signaling in animal communication has direct correlates to human communication, especially in social groups. To the extent that human signals are in fact honest, group behavior not only becomes coordinated but also predictable, and can reliably define social affinity groups. These affinity groups can, in turn, predict mass behavior in ways that social researchers exploit, especially using volumes of Big Data. But what happens when bad actors engage in intentional deception, such as fraud?


Dishonest Signals

Because there are both mutual and conflicting interests in most animal signalling systems, the fundamental problem in evolutionary signalling games is dishonesty or cheating. Why don’t foraging birds just give warning calls all the time, at random (false alarms), just in case a predator is nearby? If peacocks with bigger tails are preferred by peahens, why don’t all peacocks display big tails? Too much cheating would disrupt the correlation at the foundation of the system, causing it to collapse. Receivers should ignore the signals if they are not useful to them and signallers shouldn’t invest in costly signals if they won’t alter the behavior of receivers in ways that benefit the signaller. What prevents cheating from destabilizing signalling systems? It might be apparent that the costs of displaying signals must be an important part of the answer. However, understanding how costs can stabilize an “honest” correlation between the public signal trait and the private signalled quality has turned out to be a long, interesting process. If many animals in a group send too many dishonest signals, then their entire signalling system will collapse, leading to much poorer fitness of the group as a whole. Every dishonest signal weakens the integrity of the signalling system, and thus weakens the fitness of the group.

An example of dishonest signalling comes from fiddler crabs, which have been shown to bluff with regard to their fighting ability. Upon regrowing a lost claw, a crab will occasionally regrow a weaker but larger claw that intimidates crabs with smaller but stronger claws. The proportion of dishonest signals is low enough that it is not worthwhile for crabs to test the honesty of such signals, as combat can be dangerous and costly.

So the key observation gleaned from animal signaling systems that can be directly applied to human social networks is that every dishonest signal directly weakens both the integrity of the signalling system and the overall effectiveness of the group. Bad actors can thus be detected and partitioned using these social criteria to model dishonest behavior and preserve the overall integrity of the data set itself, however large. Big Data actually makes it even harder for bad actors to hide in a statistical sense since outliers are in fact liars, from the social group perspective.
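
The sketch below is speculative and not a method drawn from Pentland's work: one simple way to flag candidate bad actors at scale is to treat group members whose signalling behavior falls far outside the group's distribution as outliers. The feature and the cutoff are assumptions made for illustration.

```python
# Speculative sketch: flag group members whose signalling rate is a statistical
# outlier relative to their group (the feature and threshold are assumptions).
import numpy as np

rng = np.random.default_rng(3)
signals_per_day = rng.normal(loc=20, scale=4, size=1000)
signals_per_day[:5] = [95, 88, 102, 110, 91]        # a few "dishonest" bad actors

median = np.median(signals_per_day)
mad = np.median(np.abs(signals_per_day - median))   # robust estimate of spread
robust_z = 0.6745 * (signals_per_day - median) / mad

suspects = np.flatnonzero(np.abs(robust_z) > 3.5)   # classic MAD-based cutoff
print("flagged indices:", suspects)
```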


The Quantified Self

The Quantified Self is a social movement to incorporate technical data acquisition on aspects of a person's daily life in terms of inputs (e.g. food consumed, quality of surrounding air), states (e.g. mood, arousal, blood oxygen levels), and performance (mental and physical). The movement was started by Wired Magazine editors Gary Wolf and Kevin Kelly in 2007, as "a collaboration of users and tool makers who shared an interest in self knowledge through self-tracking". The primary methodology of self-quantification is data collection, followed by visualization, cross-referencing and the discovery of correlations. “Almost everything we do generates data,” says Mr Wolf. At the moment, he says, "data from phones, computers and credit cards are mostly used by companies to target advertising, recommend products or spot fraud. But tapping into the stream of data they generate can give people new ways to deal with medical problems or improve their quality of life in other ways." Mr Wolf draws an analogy with the Homebrew Computer Club, which met in Silicon Valley in the 1970s and went from a hobbyists’ group to the basis of a new industry. “We were inspired by our knowledge of this history of personal computing,” he says. “We asked ourselves what would happen if we convened advanced users of self-tracking technologies to see what we could learn from each other.”
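
As a minimal, hedged illustration of that collect-then-correlate workflow (the column names and values below are invented), cross-referencing two personal data streams can be as simple as:

```python
# A minimal self-tracking sketch (column names and data are invented): collect
# two personal data streams, cross-reference them, and look for correlations.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
days = pd.date_range("2012-05-01", periods=30, freq="D")
sleep_hours = rng.normal(7, 1, size=30)
mood = 3 + 0.6 * (sleep_hours - 7) + rng.normal(scale=0.5, size=30)  # self-rated score

log = pd.DataFrame({"sleep_hours": sleep_hours, "mood": mood}, index=days)
print(log.corr())   # cross-referencing: does more sleep track better mood?
```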

With the exception of people who are trying to lose weight or improve their fitness, most people do not routinely record or otherwise measure their moods, sleeping patterns or activity levels, track how much alcohol or caffeine they drink or chart how often they walk the dog or have a bowel movement. But some people are doing these things, and they are an eclectic mix of early adopters, fitness freaks, technology evangelists, personal-development junkies, hackers and patients suffering from a wide variety of health problems. They share a belief that gathering and analyzing data about their everyday activities can help them improve their lives — an approach known as “self-tracking”, “body hacking” or “self-quantifying”. New technologies make it simpler than ever to gather and analyze personal data. Sensors have shrunk and become cheaper. Accelerometers, which measure changes in direction and speed, are now cheap and small enough to be routinely included in smartphones. All this makes it much easier to take the quantitative methods used in science and business and apply them to the personal Data Sphere.


Quantified Self Business Opportunities

As populations age and health care costs increase, there will be a greater emphasis on monitoring, prevention and maintaining “wellness” in the future, with patients taking a more active self-monitoring role. This approach is sometimes called “Health 2.0,” and the devotees of self-tracking could end up as pioneers of this health care model. Quantified Self start-ups in Silicon Valley are launching new devices and software aimed at self-trackers. The future of health care places a greater emphasis on self-monitoring, using a variety of sensors and mobile devices to reduce medical costs by preventing disease and prolonging lives.

Tens of thousands of patients around the world are already sharing information about symptoms and treatments for hundreds of conditions on websites such as PatientsLikeMe and CureTogether. This has yielded valuable results, such as the finding that patients who suffered from vertigo during migraines were four times more likely to have painful side effects when using a particular migraine drug. The growing number of self-tracking devices now reaching the market will increase the scope for large-scale data collection, i.e. Big Data, enabling users to analyze their own readings and compare and aggregate them with those of other people.
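
As a hedged illustration of how a "four times more likely" result falls out of aggregated self-reports (the counts below are invented and are not the actual patient data), such a finding is essentially a relative risk computed from a simple two-by-two tally:

```python
# Hedged sketch of how a "four times more likely" finding emerges from
# aggregated patient reports (all counts below are invented).
vertigo_reports = {"side_effects": 80, "no_side_effects": 120}      # 200 patients
no_vertigo_reports = {"side_effects": 20, "no_side_effects": 180}   # 200 patients

risk_vertigo = vertigo_reports["side_effects"] / sum(vertigo_reports.values())
risk_other = no_vertigo_reports["side_effects"] / sum(no_vertigo_reports.values())
print(f"relative risk ≈ {risk_vertigo / risk_other:.1f}x")          # ≈ 4.0x
```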

The thumb-sized Fitbit clips onto a belt and uses an accelerometer and altimeter to measure activity levels and sleep patterns. A readout shows steps walked, stairs climbed and calories burned. Information is also uploaded wirelessly to a website that analyzes and displays the data and lets users compare notes with their friends. Jawbone has released the Up, a wristband that communicates with an iPhone and can also measure physical activity and sleep patterns. Basis is about to launch a wristwatch-like device capable of measuring heart rate, skin conductance (related to stress levels) and sleep patterns, all of which can then be displayed on a “health dashboard”.

GreenGoose offers tiny motion sensors that can be attached to everyday items, sending a wireless signal to a base-station whenever the item is used. A sensor can be attached to a toothbrush, a watering can, the collar of a dog, or a toilet seat, making it possible to measure and track how often you brush your teeth, water your plants, walk your dog, or use the toilet. The company’s aim is to establish a platform for the gamification of everyday activities.

Large technology companies are also keeping an eye on self-tracking technology. Philips, Vodafone and Intel all regard health-tracking as a promising area for future growth. Philips has launched Vital Signs, an experimental app for Apple devices that uses the built-in camera to measure the user’s heart rate and breathing rate, and chart them over time. Intel has developed an app called Mobile Therapy that pops up randomly and asks users to record their mood, to see how it varies during the week.

The examples listed represent millions of dollars already invested in various attempts to monetize the Quantified Self. The expectations are that "Health 2.0" will ultimately leverage the billions of dollars being spent on health care now and in the near future as well as inform policy makers with detailed, massive amounts of health-related data and predictive models of uncanny accuracy. This could be a boon to future highly effective health care programs or a bust in the abuse of data privacy and trust by private parties trying to gain an unfair economic advantage. Whatever the outcome, Big Data will no doubt play a major role.

The one thing all of these Quantified Self startups have in common is the highly personal data they are collecting, at a massive Big Data scale. This very personal data also comes with significant hazards. Fitbit inadvertently published the sexual activity of its members because of a privacy oversight in its social-sharing strategy, which led to very public and thus very embarrassing moments for the company and its customers. More on Big Data privacy concerns in the next installment of the Big Data Series.

Finally, as a next generation Big Data goal and thought experiment for our collective Quantified Selves, suppose we could encode all the genomes of the 7 billion humans on Earth, requiring about 5.6 exabytes of storage for our collective global genetic information. (Recall from the previous episode of this series, an exabyte is a billion gigabytes or one million terabytes.) Imagine the types of correlations and predictions that would be gleaned from this colossal data source, especially when the genotypes are correlated with phenotype data. That's not just Big Data, that's HUGE DATA. And HUGE social issues as well. Again, we will explore some of these social issues in the next episode.
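
Here is the back-of-the-envelope arithmetic behind that 5.6 exabyte figure, assuming roughly 3.2 billion base pairs per genome stored at 2 bits per base (both deliberately round numbers):

```python
# Back-of-the-envelope arithmetic behind the ~5.6 exabyte figure
# (assumes ~3.2 billion base pairs per genome at 2 bits per base).
bases_per_genome = 3.2e9
bytes_per_genome = bases_per_genome * 2 / 8     # 2 bits per base -> ~0.8 GB per person
people = 7e9

total_bytes = people * bytes_per_genome
print(total_bytes / 1e18, "exabytes")           # ~5.6
```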


That's A Wrap!

That's all for this episode of the Big Data Series. Next time we will explore Big Data and Predictive Crime, Total Information Awareness, and the Stellar Wind. Spoiler Alert: Stellar Wind has nothing to do with astronomy in this case. Stay Tuned!

Tuesday
May 22, 2012

The Big Data Series: Noosphere, The Omega Point and The Seldon Plan

Welcome to the Big Data Series on Viewpoint West Partners, LLC. In this series, we will explore various facets of Big Data including emerging Big Data trends in science, engineering, society and public policy. The material for this abridged series comes from a variety of consultations, private talks and seminars hosted by Viewpoint West Partners, LLC.

Big Data is a term used by marketeers and technology wonks alike to describe the situation when data sets grow large and complex enough to become awkward or even impossible to work with using conventional database management tools. In this sense, the bad news is that Big Data describes a negative scenario that is rooted in the failure of prevailing database technology to deal with the onslaught of massive data volumes being produced by the success of other technologies (mobile devices, social networks, sensor networks, the internet, etc.) that produce said data onslaught. The good news is that Big Data is a pervasive 'problem looking for a solution' rather than the much more common converse that is the signature of many (failed) technology companies.

The emergence of the term, 'Big Data,' is due to the prevalence of the problem of massive data volumes now being generated in nearly every sector of society and business on a daily, hourly and minute-to-minute basis. This situation renders conventional tools impotent in dealing with business dashboards and realtime analytics needed to make the data-driven decisions that CEOs take so much pride in. As such, Big Data technology has now become a competitive weapon in business, as well as a fundamental aspect of doing Big Science and making effective public policy decisions worldwide.

 

The Data Sphere: Beyond The Big Data Paradigm Shift
For the first time in human history, we are seeing a massive shift in decision making perspective that comes from the significant patterns emerging and coalescing in the 'Data Sphere', a term I use to highlight the data ocean that we all swim in. In a quantum sense, each of us continually adds to the Data Sphere, as we leave data wakes that are time-series gold to be mined by those properly equipped and properly authorized (hopefully) or not. It has become increasingly difficult, if not impossible, to not be a human data source if one participates in any aspect of modern society. And in many cases, we are involuntarily participating in decision processes and analytics that in a very real sense will determine our future reality, especially in terms of privacy, economic opportunity, health care, and security. We will explore this Big Data reality in detail as we progress through this series.

 

Size Matters
Effectively handling Big Data is a three-fold problem that is more than merely a data size problem: Big Data involves data volume (the size of the dataset), data velocity (the incoming data rate), and data variety (the myriad data sources, both structured, such as call detail records, and unstructured, such as images, audio, and video). Taken together, this triad defines the crux of the problem that most conventional database tools and techniques encounter when dealing with the brave new world of Big Data.

So how big is Big Data? Big Data sizing is typically in terms of petabytes, exabytes and zettabytes. For the purpose of this series, let's focus on the smallest size, the petabyte. One petabyte is one million gigabytes or one thousand terabytes.
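
For readers who like their units pinned down, the decimal (SI) conventions used throughout this series can be written out in a few lines:

```python
# Decimal (SI) storage units used throughout this series, in bytes.
GB = 10**9          # gigabyte
TB = 10**3 * GB     # terabyte  = 1,000 GB
PB = 10**6 * GB     # petabyte  = 1,000,000 GB = 1,000 TB
EB = 10**9 * GB     # exabyte   = 1,000,000,000 GB = 1,000,000 TB
ZB = 10**12 * GB    # zettabyte

print(PB // TB, "TB in a PB")   # 1000
```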

Again, it's important to note not just the size (volume), but also the data rate (velocity) and the variety of the data sources involved in Big Data. Here are some Big Data benchmarks to consider:

  • The world's effective capacity to exchange information through two-way telecommunication networks was 281 petabytes of (optimally compressed) information in 1986, 471 petabytes in 1993, 2,200 petabytes in 2000, and 65,000 (optimally compressed) petabytes in 2007 (this is the informational equivalent of every person in the world exchanging 6 newspapers per day).

  • AT&T transfers about 19 petabytes of data through their networks each day. Google processes about 24 petabytes of data per day. The BBC's iPlayer is reported to use 7 petabytes of bandwidth each month. Imgur transfers about 3 petabytes every month. The Internet Archive contains about 5.8 petabytes of data as of December 2010, growing at the rate of about 100 terabytes per month in March 2009.

  • World of Warcraft uses 1.3 petabytes of storage to maintain its game. Steam, a digital gaming service developed by Valve, delivers over 30 petabytes of content monthly. The 2009 movie Avatar took over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.

  • The experiments in the Large Hadron Collider produce about 15 petabytes of data per year, which will be distributed over the LHC Computing Grid. It is estimated that the human brain's ability to store memories is equivalent to about 2.5 petabytes of binary data. The German Climate Computing Centre (DKRZ) has a storage capacity of 60 petabytes of climate data.

  • The Teradata Database 12 has a capacity of 50 petabytes of compressed data. In August 2011, IBM built the largest storage array to date, with a capacity of 120 petabytes. And in January 2012, Cray began construction of the Blue Waters supercomputer, which, once built, will have a storage capacity of 500 petabytes, making it the largest storage array yet.

One can plainly see that we are squarely in the 'Petabyte Age' of Big Data, rapidly moving to exabyte scale. And note that an exabyte is a billion gigabytes or one million terabytes. 

 

Measuring The Noosphere Via The 'Data Sphere'
At this point in human history, we are generating more data from more sources faster than ever before, at an accelerating rate that is constantly challenging our technological and societal capacities to deal with the ramifications of it all. Why is all this happening?

Pierre Teilhard de Chardin SJ was a French philosopher and Jesuit priest who trained as a paleontologist and geologist and took part in the discovery of Peking Man. Teilhard conceived the idea of the Omega Point and developed Vladimir Vernadsky's concept of Noosphere. Teilhard set forth a sweeping account of the unfolding of the cosmos and abandoned traditional interpretations of creation in the Book of Genesis in favor of the idea of the Universe as a "Living Host".

For Teilhard, the Noosphere emerges through and is constituted by the interaction of human minds. The Noosphere has grown in step with the organization of the human mass in relation to itself as it populates the earth. As mankind organizes itself into more complex social networks, the Noosphere grows in awareness. This concept is an extension of Teilhard's Law of Complexity/Consciousness, the law describing the nature of evolution in the universe. Teilhard argued the Noosphere is growing towards an ever greater integration and unification, culminating in the Omega Point, which he saw as the goal of human history: an apex of thought/consciousness. In this context, Big Data is actually the Data Sphere, which models the Noosphere in much the same manner as footprints in the sand model the beachcombers.

 

Next Stop: The Omega Point
The Omega Point is a maximum level of complexity and consciousness towards which Teilhard believed the universe was evolving. The universe is constantly developing towards higher levels of material complexity and consciousness, a theory of evolution that Teilhard called the Law of Complexity/Consciousness. For Teilhard, the universe can only move in the direction of more complexity and consciousness if it is being drawn by a Supreme Point of complexity and consciousness.

Thus Teilhard postulates the Omega Point as this supreme point of complexity and consciousness, which in his view is the actual cause for the universe to grow in complexity and consciousness. The Omega Point exists as supremely complex and conscious, transcendent and independent of the evolving universe. Teilhard argued that the Omega Point resembles the Christian Logos, namely Christ, who draws all things into Himself, who in the words of the Nicene Creed, is "God from God", "Light from Light", "True God from true God," and "through Him all things were made." Does the Big Data milieu of today actually foreshadow the approaching Omega Point of Teilhard?

 

The Prediction Wall: Singularity On Board
Transhumanists argue that the accelerating technological progress inherent in the Law of Accelerating Returns will lead to what Vernor Vinge called a technological singularity, or "prediction wall." We will soon enter a time in which we must make the transition to a "runaway positive feedback loop" of very high-level autonomous machine computation, with our technological and computational tools eventually surpassing human capacities entirely. Some transhumanist writings refer to this moment as the Omega Point, paying homage to Teilhard's prior use of the term. Other transhumanists, in particular Ray Kurzweil, refer to the technological singularity as simply "The Singularity".

We can plainly see that as Big Data gets even bigger, our computational tools will, by absolute necessity, become far more sophisticated in terms of pattern recognition, machine learning, and most importantly, complete automation. Already we have High Frequency Trading (HFT) computers that can cause enormous market swings (at the drop of an errant button press) that are beyond the capacity of human traders to fully understand and control. This is just a mere prelude to a computational reality that would make even the best science fiction writers give up their craft and tend sheep.

 

Big Data And The Seldon Plan
Science Fiction always predicts the future, for better or worse. At the very least, it supplies us with cautionary tales that become important parts of the zeitgeist. In this tradition, we have some very prescient material to work with as we ponder the Big Data age.

The Foundation Series is a science fiction series by Isaac Asimov. The premise of the series is that mathematician Hari Seldon spent his life developing a branch of mathematics known as psychohistory, a sort of mathematical sociology. Using the laws of mass action, psychohistory can predict the future, but only on a very large scale; it is error-prone on a small scale. Psychohistory works on the principle that the behavior of a mass of people is predictable if that mass is very large (on the order of the population of the galaxy: quadrillions of humans inhabiting millions of star systems). The larger the number, the more predictable the future.
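
The statistical intuition behind "the larger the number, the more predictable" is the law of large numbers: a crowd's average behavior fluctuates far less than any individual's. A toy simulation (purely illustrative numbers) shows the spread shrinking like one over the square root of N:

```python
# Toy illustration of why mass behavior is more predictable than individual
# behavior: the spread of a crowd's average shrinks like 1/sqrt(N).
import numpy as np

rng = np.random.default_rng(5)
trials = 1000
for n in (1, 100, 10_000):
    # each of n "people" acts with a lot of individual randomness;
    # we measure how much the crowd-level average varies across trials
    crowd_averages = rng.normal(size=(trials, n)).mean(axis=1)
    print(f"N={n:>6}: spread of the crowd average ≈ {crowd_averages.std():.3f}")
```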

Using psychohistory techniques, Seldon foresees the imminent fall of the Galactic Empire, which encompasses the entire Milky Way, and a dark age lasting thirty thousand years before a second great empire arises. Seldon's psychohistory also foresees an alternative in which the intervening period will last only one thousand years. To ensure his vision of a second great Empire comes to fruition, Seldon creates two Foundations—small, secluded havens of all human knowledge—at "opposite ends of the galaxy".

The focus of the series is on the First Foundation and its attempts to overcome various obstacles during the formation and installation of the Second Empire, all the while being silently guided by the unknown specifics of The Seldon Plan.

Another fundamental assumption of psychohistory is that the population itself is unaware of the existence of psychohistory. We shall come back to this fascinating twist later in the Big Data Series when we discuss individual privacy and anonymity.

 

That's A Wrap
From petabytes, exabytes, zettabytes and the Data Sphere, to the Noosphere, the Omega Point, Psychohistory and the Singularity, Big Data is much more significant than a mere marketing term on a glossy brochure (or PDF file). The rate of Big Data acceleration is itself accelerating, and we are all on a runaway technology ride that is already changing our everyday reality. What will come of it?

 

Next in the Big Data Series, we will explore Reality Mining, Honest Signals, the Quantified Self, and much, much more. Please Stay Tuned.

Sunday
May 20, 2012

Let's Party Like It's 1999

No matter how many times I check, the calendar still says 2012.

But honestly, I just can't shake that 1999 feeling. Facebook finally IPOed at a valuation of over $100 billion, but not before making a billion-dollar purchase of its own, namely, Instagram.

Other recent Facebook acquisitions, though smaller, were mobile social discovery service Glancee and the mobile social gift-giving app Karma.

With all the bubbly flowing, it's hard not to ponder the dreaded Bubble Question, especially for those of us who witnessed the last 'big pop.'  But hey, exuberance is nothing if not irrational.  Given all the economic bad news, it is nice to have a little island in the sun here in Silicon Valley.  Even if it is temporary.

So tonite I'm gonna party like it's 1999!

 

UPDATE: Well, that was a very short Facebook party. Facebook stock lost more than 10% on Monday, May 21st, 2012:

By Monday, sentiment had apparently turned against Facebook so thoroughly that underwriters seeking to unload the shares were forced to take substantial losses as the market marked the shares down.

Having seen a number of investment funds buy the shares on Friday as fund managers loaded up in the expectation that Facebook would bring a boost to their portfolio, the remaining buyers in the market on Monday were less willing to pay a premium – leaving the underwriters with no option but to accept a loss if they wanted to pass the shares on. 

Henry Blodget has this to say:

Henry Blodget, the former Wall Street analyst who, ahead of the IPO, called the shares "muppet bait", said on his Business Insider site on Monday that the lack of big jump in first-day trading was probably good news for millions of small investors, who had been discouraged from piling into the stock. He reckoned that a fair value for the company would be somewhere between $16 and $24 a share, depending on its results. That would value Facebook at between $50bn and $85bn – a substantial amount, but far from the $104bn (£65.8bn) that the $38 share price put on it.