The Impact of Big Data

Part 1 of The Big Data Analytics Series

Aarsh Koul
MIETCRIE

--


Reports suggest that an estimated 1 billion people entered the “middle class” between 1990 and 2005, and the rise in literacy that accompanied this shift fueled a surge in information growth. The extent to which this confluence stretched the already faltering information infrastructure toward a virtual standstill is intriguing: per Hilbert and López (2011), users pushed the world’s effective telecommunication capacity from a measly 281 petabytes in 1986 to a projected 667 exabytes by 2014. Around a third of this data consists solely of alphanumeric text and still images, the formats most eagerly sought after by Big Data analysts, closely followed by video and audio data, both brimming with potential as compelling as the former. However implausible it may sound, Big Data can also help tremendously with the restoration of digital artifacts, piecing together fragmentary data on a common subject through Data Fusion, a process that requires Big Data repositories.

Merely collecting data, or creating such repositories, isn’t fruitful unless it helps researchers and organizations better understand the needs of their customers. At least, that is how retail outlets put Big Data analytics to work. American retail giant Walmart processes in excess of 1 million customer transactions every hour. This data is imported into databases estimated to hold 2,560 terabytes (2.5 petabytes), which are then mined to weed out products yielding little or no sale and/or profit, or to widen a product’s reach; a toy version of that kind of analysis is sketched below.
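
To make the idea concrete, here is a minimal sketch of such an analysis in Python with pandas. The column names, figures and thresholds are all invented for illustration; a real retail pipeline would run at a vastly larger scale on distributed infrastructure.

```python
import pandas as pd

# Hypothetical transaction log: one row per item sold.
transactions = pd.DataFrame({
    "product_id": ["P1", "P1", "P2", "P3", "P3", "P3"],
    "units":      [2,    1,    1,    5,    4,    6],
    "revenue":    [4.0,  2.0,  1.5,  25.0, 20.0, 30.0],
    "cost":       [3.0,  1.5,  1.4,  15.0, 12.0, 18.0],
})

# Aggregate per product and compute profit.
summary = transactions.groupby("product_id").agg(
    total_units=("units", "sum"),
    total_revenue=("revenue", "sum"),
    total_cost=("cost", "sum"),
)
summary["profit"] = summary["total_revenue"] - summary["total_cost"]

# Flag products yielding little or no sale and/or profit (invented cutoffs).
laggards = summary[(summary["total_units"] < 3) | (summary["profit"] <= 0.5)]
print(laggards)
```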

The general consensus regarding Big Data and its impact on the populace is two-pronged. While some, like Mayer-Schonberger and Cukier (2013), maintain that “datafication”, as they call it, is an inevitable human tendency that long predates digitization, others, like Zuboff (2015), have expressed concerns, going as far as calling it “surveillance capitalism”. Recall that the mammoth data ceiling that nearly brought the worldwide web to a standstill was, in all respects, a close call. Yet data can only continue to grow tenfold so long as there are distributed file systems like Hadoop, and a programming model that goes by the name of MapReduce, to come to the rescue.
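
MapReduce’s canonical illustration is counting words. The sketch below simulates its three phases (map, shuffle, reduce) on a single machine in plain Python; Hadoop’s contribution is to run exactly these phases across thousands of cheap servers.

```python
from collections import defaultdict
from itertools import chain

documents = ["big data grows", "data grows tenfold", "big big data"]

# Map phase: emit a (key, 1) pair for every word in every document.
def mapper(doc):
    return [(word, 1) for word in doc.split()]

mapped = list(chain.from_iterable(mapper(d) for d in documents))

# Shuffle phase: group all emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: collapse each key's values into a single count.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'big': 3, 'data': 3, 'grows': 2, 'tenfold': 1}
```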

They say whatever goes up must come down. Perhaps that was the case with the world wide web in its infancy. But with the dawn of an age of data retrieval and storage built on indexing, including, but not limited to, the hyper-textual search engine, the brainchild of Google’s co-founders Sergey Brin and Larry Page, which assembles an index of “every word on every page of every site”, and with the storage of such vast amounts of data on inexpensive servers using technologies such as Hadoop, the proverb seems a distant thing of the past, at least in the realm of data science. In the words of John Deighton (2019), “the experience of privacy is paradoxical”. According to a recent Pew report, 91% of adults in the US agree that “consumers have lost control of their personal data”. One can add a tool such as Ghostery to shield oneself from the prying eyes of the online surveillance entities that feed on this data to deliver personalized services and the like.
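
At its core, the index Brin and Page describe is an inverted index: a mapping from each word to the set of pages containing it, so that a query becomes a cheap set intersection instead of a scan of every page. A toy Python version, with made-up URLs and page text, might look like this:

```python
from collections import defaultdict

# Hypothetical crawled pages: url -> page text.
pages = {
    "site.com/a": "big data will transform how we live",
    "site.com/b": "how search engines index the web",
    "site.com/c": "the web runs on big data",
}

# Build the inverted index: every word maps to the pages that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

# A multi-term query intersects the posting sets of its terms.
def search(*terms):
    results = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*results) if results else set()

print(search("big", "data"))  # {'site.com/a', 'site.com/c'}
```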

India, too, has set sail for Big Data utilization, even as most of us remain oblivious to the phenomenon. According to a recent study, Big Data was pivotal in ensuring a landslide mandate for the BJP in the 2014 Lok Sabha general elections, taking the political world by storm. The government utilized data from vast numbers of users, in an action purportedly within the confines of national interest, to ascertain how the electorate, young voters in particular, was responding to government action, and to chalk out ideas for policy augmentation.

We might be standing at the threshold of an imminent breakthrough in the domain of primary data generators, with mobile sensors and chip-based cameras as the forerunners. As more people generate digital data, it isn’t too far-fetched to hypothesize that each person will one day be matched by a hundred cost-effective sensors devoted to different tasks: monitoring pets and children, looking after senior citizens, tracking vehicular and pedestrian traffic, and a whole lot more. This outburst of data will challenge our notions of a society. Imagine your car being as powerful a data tool as your smartphone. Communicating with other cars in the region over a local network, it could route you past road construction, tell you whether there’s a charging station nearby and whether another car is charging there at the moment, or find the nearest available parking spot, as sketched below.
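
As a toy illustration of that last scenario, the sketch below picks the nearest unoccupied charging station from reports that nearby cars might share over such a network. All fields, units and station names are invented:

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Station:
    name: str
    x: float        # position on a local grid, in km (invented units)
    y: float
    occupied: bool  # whether another car is charging right now

# Hypothetical reports gathered from nearby cars over the local network.
stations = [
    Station("Mall-A", 1.0, 2.0, occupied=True),
    Station("Lot-B",  3.0, 1.0, occupied=False),
    Station("Park-C", 0.5, 4.0, occupied=False),
]

def nearest_free(car_x, car_y, stations):
    # Filter out busy stations, then minimize straight-line distance.
    free = [s for s in stations if not s.occupied]
    return min(free, key=lambda s: hypot(s.x - car_x, s.y - car_y), default=None)

print(nearest_free(0.0, 0.0, stations))  # Lot-B (~3.16 km) beats Park-C (~4.03 km)
```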

As the world comes to rely more on the technological prowess of the IoT and its associated concepts, the amount of data is expected to double every two years. Analyzing this data requires automated systems that draw on techniques such as cloud computing, data stream processing, quantum computing, data mining, machine learning, statistical analysis and intelligent analysis. Exploiting the parallelism of modern computer architectures for data mining may still be a far cry for data analysts, but it is worth researching: models suited to such parallelism, like neural networks and the hybrids obtained by combining two seemingly unrelated models, have proven exceptionally good at representing data. Technologies that unearth only the relevant data, along the lines of the Application Programming Interfaces (APIs) provided by corporations excelling in Big Data, such as Google, are also the need of the hour. Even a comparatively simple source like Google Trends enabled Tobias Preis and his colleagues to devise a method for identifying online precursors of stock market moves, sketched below. Designing programming-language abstractions that expose such parallelism is thus an immediate requirement, both to expedite the processing of Big Data and to reach the other areas of interest sprouting from it.
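
Preis and colleagues compared each week’s search volume against a trailing average and traded against rises in volume. The sketch below reproduces only that signal logic on synthetic data; the window size, both series and all numbers are stand-ins, not the paper’s actual dataset or parameters:

```python
import random

random.seed(0)

# Synthetic weekly series standing in for Google Trends search volume and an
# asset price; real inputs would come from Trends exports and market data.
volume = [50 + random.gauss(0, 5) for _ in range(60)]
price = [100 + 0.1 * t + random.gauss(0, 2) for t in range(60)]

WINDOW = 3  # trailing-average window, a stand-in for the paper's delta-t weeks

pnl = 0.0
for t in range(WINDOW, len(volume) - 1):
    trailing = sum(volume[t - WINDOW:t]) / WINDOW
    # Rising search volume relative to its trailing average -> go short;
    # falling volume -> go long (the direction reported by Preis et al.).
    position = -1 if volume[t] > trailing else +1
    pnl += position * (price[t + 1] - price[t])

print(f"toy cumulative P&L: {pnl:.2f}")
```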

—Written by Aarsh Koul, Student, CSE, MIET Jammu and edited by Purnendu Prabhat, Asst. Prof., CSE, MIET Jammu

References:

  • Deighton, John (2019): Big Data. Consumption Markets & Culture 22(1): 68–73
  • Acharjya, D.P. and Kauser Ahmed P. (2016): A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools
  • Hilbert, Martin and Priscila López (April 2011): The World’s Technological Capacity to Store, Communicate, and Compute Information. Science 332(6025): 60–65
  • Data, Data Everywhere. The Economist (http://www.economist.com/node/15557443)
  • Hilbert, Martin (2014): What Is the Content of the World’s Technologically Mediated Information and Communication Capacity: How Much Text, Image, Audio, and Video?
  • Kitchin, Rob and McArdle, Gavin (17 February 2016): What Makes Big Data, Big Data? Exploring the Ontological Characteristics of 26 Datasets. Big Data & Society 3(1)
  • Mayer-Schonberger, Victor and Kenneth Cukier (2013): Big Data: A Revolution That Will Transform How We Live, Work, and Think
  • Zuboff, Shoshana (2015): Big Other: Surveillance Capitalism and the Prospects of an Information Civilization. Journal of Information Technology 30(1): 75–89
  • Brin, Sergey and Lawrence Page (1998): The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems 30(1–7): 107–117
  • Are Indian Companies Making Enough Sense of Big Data? Live Mint (http://www.livemint.com/industry/bUQo8xQ3gStSAy5II9lxoK/Are-Indian-companies-making-enough-sense-of-Big-Data.html)
  • Acharjya, D.P.; Dehuri, S. and S. Sanyal (2015): Computational Intelligence for Big Data Analysis
  • Reips, Ulf-Dietrich and Matzat, Uwe (2014): Mining “Big Data” Using Big Data Services. International Journal of Internet Science 9(1): 1–8 (http://www.ijis.net/ijis9_1/ijis9_1_editorial_pre.html)
  • Preis T, Moat HS, Stanley HE (2013): Quantifying Trading Behaviour in Financial Markets Using Google Trends. Scientific Reports 3: 1684 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3635219)
