Data in the current perspective:

From a usage perspective, data has intrinsic value, but it is of no use until that value is discovered. Most organizations have used data mainly to reaffirm what is already obvious. Increasingly, however, they are finding that beneath the surface-level data lies a treasure trove of useful information which, if extracted, can transform a business. As one published report put it, if data is the currency of business, then data analytics is its new language.

Analytics has emerged as a useful scientific process of sourcing data, turning that data into information, using that information to generate insights, and then implementing those insights to monetize the data. The process is mechanized through algorithms designed to work over the data sets.

Big data refers to the field that deals with ways to analyze and extract information from data sets that are too large and complex for conventional tools. Whereas Business Intelligence (BI) uses mathematical tools and statistical analyses to detect trends in data sets that mostly cover the present and the past, Big data uses mathematical analyses and optimizations to reveal relationships and dependencies and to make predictions about future data.

The field of Data Science, which includes Big data and Analytics, is an inter-disciplinary field that uses scientific methods, processes and algorithms to extract knowledge and insights from structured and unstructured data sets. It is thus related to data curation, data mining, statistics, machine learning and artificial intelligence.

Challenges and implications of Big data:

The important attributes associated with the data sets are:

  • Data is transmissible and storable computer information, and knowledge is the understanding derived from it; the more data there is, the more information and therefore the more knowledge is embedded in it.
  • Traditionally, data was mostly structured and small enough to be analyzed for information and knowledge with simple BI tools such as data warehouses and RDBMSs, but these are inadequate for the large and complex data sets now found in many fields of business and social activity.
  • Today the available data is mostly unstructured or semi-structured and extremely large (the definition of size varies by use, but may fall in the terabyte, petabyte or even exabyte range).
  • The data is generated from many types of widely dispersed sources, so its collection and dissemination pose a challenge.
  • The data, with its 3Vs (volume, velocity and variety) and a further 3Vs (veracity, variability and value), as defined by Gartner, represents great wealth for businesses to monetize and is the single biggest driver of Big data investments.
  • Data processing (cleaning, parsing, analyzing, mining) of large data sets is extremely complex and time-consuming. It therefore needs advanced tools to determine patterns and relationships, and to apply statistics to test whether the hypotheses (algorithms) are correct, so that the right decisions and predictions are made.
  • Data from the various fields of activity will continue to grow (currently at 40% a year as per an IDC estimate, to reach 163 zettabytes by 2025 from the present 44 zettabytes), and will continue to place demands on systems to secure the data, analyze it and put it to effective use.
  • Data as a raw material is recognized as a potential wealth-generating component of any analytics system.
  • Keeping pace with data generation is indeed a struggle. Curating large data sets is time-consuming, because it is important to determine which data represents signal and which is noise. A quality check on the data before it is exploited thus assumes importance.
  • Data ownership, privacy and security issues pose a constant threat to the development and deployment of usable and safe applications.
  • While efforts have largely focused on protecting IT application systems against intrusions that seek to steal data or compromise systems, a continuing concern is the integrity and reliability of the data used to train machine learning tools.
  • Attackers use data poisoning to influence the training data, manipulate the results of a predictive model and create a misguided algorithm. It is achieved by methods such as logic corruption, modification, manipulation, data injection and transfer learning. As per a Microsoft estimate, 30% of AI cyber attacks are expected to leverage training-data poisoning. Efforts are underway to minimize the impact through data health checks and by correlating new training data with previously used training data that resulted in a known algorithm.
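
To make the data injection form of poisoning concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any real attack or product: the one-dimensional "model" (a threshold halfway between two class means), the function names and all the numbers are hypothetical.

```python
from statistics import mean

def train_threshold(samples):
    """Learn a decision threshold halfway between the two class means.

    `samples` is a list of (value, label) pairs with labels 0 and 1.
    """
    mean_0 = mean(value for value, label in samples if label == 0)
    mean_1 = mean(value for value, label in samples if label == 1)
    return (mean_0 + mean_1) / 2

def predict(value, threshold):
    """Classify a value as 1 if it lies above the threshold, else 0."""
    return 1 if value > threshold else 0

# Hypothetical clean training data (illustrative values only).
clean = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]

# Data injection: the attacker adds high values deliberately mislabeled as class 0.
injected = [(9, 0), (10, 0), (11, 0)]
poisoned = clean + injected

clean_threshold = train_threshold(clean)        # class means are 2 and 8
poisoned_threshold = train_threshold(poisoned)  # class-0 mean is dragged upward

print(predict(6, clean_threshold), predict(6, poisoned_threshold))
```

With these numbers the learned threshold moves from 5 to 7, so the value 6 is classified as class 1 by the clean model and as class 0 by the poisoned one. A data health check of the kind mentioned above could, for instance, compare the new threshold against the one obtained from previously trusted training data.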


Underlying Technologies:

Having discussed the context of data in the current time, let us look at the technologies, available and emerging, that handle data of ever-increasing size and complexity and make it useful in different application environments.

The technologies are used primarily for analysis, processing and visualization. Among them, data analytics is at the core. Four types of data analytics are known:

Descriptive - determines the events that have happened,

Diagnostic - analyses why a particular event has occurred,

Predictive - forecasts what event is likely to happen, and

Prescriptive - suggests a course of action.
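
The four types can be sketched on a toy data set. The Python below is only an illustration: the advertising and sales figures are invented, the names `fit_line`, `predict_sales` and `recommend_spend` are hypothetical, and a straight-line least-squares fit stands in for whatever model a real analytics system would use.

```python
from statistics import mean

# Hypothetical monthly advertising spend and sales (illustrative numbers only).
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Descriptive: what happened? Summarize the observed sales.
average_sales = mean(sales)

# Diagnostic (crude): why did it happen? Measure how sales move with ad spend.
slope, intercept = fit_line(ad_spend, sales)

# Predictive: what is likely to happen at a spend level not yet observed?
def predict_sales(spend):
    return slope * spend + intercept

# Prescriptive: which action is suggested? Pick the smallest candidate
# spend whose predicted sales reach the target.
def recommend_spend(target, candidates):
    for spend in sorted(candidates):
        if predict_sales(spend) >= target:
            return spend
    return None

print(average_sales, slope, predict_sales(60), recommend_spend(120, [40, 60, 80]))
```

Here each stage builds on the previous one: the fitted slope from the diagnostic step drives the forecast, and the forecast drives the recommendation.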

For analysis, the techniques of machine learning (ML) and deep learning (DL), as subsets of artificial intelligence (AI), and natural language processing (NLP) are popularly used. These make it possible to unearth hidden values, trends and relationships in the data sets. A model is first built using training data, and predictive analysis is then carried out on the complete data sets. Text mining, information retrieval and speech processing are also germane to this technology concept. It is here that the data is analyzed at as fine a level of granularity as possible.
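
The "train first, then predict on the complete data" workflow can be sketched as follows. This is a minimal Python illustration, assuming a one-dimensional nearest-neighbour classifier and invented readings; the function names and the every-other-row split are hypothetical choices, not a recommended practice.

```python
def nearest_label(point, training_set):
    """Return the label of the training example closest to `point` (1-NN)."""
    closest = min(training_set, key=lambda example: abs(example[0] - point))
    return closest[1]

def accuracy(examples, training_set):
    """Fraction of `examples` whose label matches the nearest-neighbour guess."""
    hits = sum(nearest_label(x, training_set) == label for x, label in examples)
    return hits / len(examples)

# Hypothetical labelled readings as (value, label) pairs.
data = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (2.5, "low"),
        (7.0, "high"), (7.5, "high"), (8.0, "high"), (8.5, "high")]

training = data[::2]   # every other example is used for training
held_out = data[1::2]  # the rest is held back to check the learned model

# Step 1: check the model on data it has not seen.
held_out_accuracy = accuracy(held_out, training)

# Step 2: once the check is acceptable, predict over the complete data set.
predictions = [nearest_label(x, training) for x, _ in data]

print(held_out_accuracy, predictions)
```

Holding some labelled data back is what makes the first step meaningful: accuracy measured on the training examples themselves would say little about how the model behaves on new data.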

For processing, the available business intelligence (BI) tools and Cloud computing are deployed. The Cloud has emerged as a convenient and cost-effective platform for processing data, however large, using the large computing power available there. Here hundreds or thousands of computers are dispersed over a network in a distributed processing architecture. Hadoop and Spark, along with multi-layer designs, are popular distributed processing frameworks. The Cloud has become so inseparable from data, computing power and smart algorithms that AI tools are now made available on the Cloud to service the intended applications. To this end, Software as a Service (SaaS) is gaining momentum in the information technology ecosystem, and the Cloud is at the center of innovation serving a host of data processing applications. Parallel processing architectures are also used in applications that require huge computing power, of the order of petaflops or more, with extremely low inter-processor communication latencies, of the order of nanoseconds.
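
The split, process and merge pattern behind such distributed frameworks can be sketched in a few lines of Python. This is not the Hadoop or Spark API: threads on a single machine stand in for the networked computers, and the function names and sample lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Map step: one worker counts the words in its own chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def distributed_word_count(lines, workers=3):
    """Split the input, count each part concurrently, then merge the results."""
    # Deal the lines out round-robin so each worker gets its own share.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Reduce step: combine the partial counts into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

lines = ["big data needs big tools", "data drives decisions", "tools process data"]
totals = distributed_word_count(lines)
print(totals.most_common(3))
```

Because each chunk is counted independently, the same structure scales from three threads to thousands of machines; only the merge step needs to see every partial result.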

For visualization, techniques of data reporting and display are used. 3D displays and dashboards are popular examples of presenting the results of data analysis in convenient visual formats. Visualization also serves an important decision-support function.

Data generating devices and Sources of data:

Data is generated by and sourced from multiple, dispersed sources, depending on the application to be serviced.

The data-generating devices include, but are not limited to:

  • IoT devices - laptops, RFID tags, mobile phones, built-in sensors, wireless sensors, cameras
  • Computers and network devices
  • Moving platforms, e.g. automobiles, drones, airplanes, satellites, ships
  • Robots
  • Home appliances
  • Manufacturing lines equipped with monitoring devices
  • Service lines equipped with data capture devices

The sources of data, as visualized, are:

  • Products we manufacture
  • Transactions we deal with
  • Processes we adopt
  • Services we avail of from government and other organizations
  • Consumers we interact with
  • Civic infrastructure we use
  • Government agencies we interact with
  • Banks & Financial institutions we engage with
  • Utility services we are served with
  • Communications leaving a digital footprint on the internet
  • The society we live in, where we interact with various social systems

Range of Activities and Application Opportunities of Big Data:

Given the multitude of devices and sources that generate data, the range of Big data activities includes:

  • Develop predictive models
  • Teach machines from data rather than program them, as in machine learning
  • Help monetize data towards business ends
  • Carry out predictive maintenance of plant and machines
  • Improve the operational efficiency of manufacturing lines
  • Detect fraud and ensure compliance with security regulations
  • Drive innovation based on the range and size of information
  • Gauge customer experience and preferences
  • Produce better-quality reporting for decision support
  • Support scientific experimentation and the dissemination of useful information

Example application areas in the social and economic sectors, where opportunities exist for Big data to be usefully monetized, are as wide-ranging as:

  • Supply chain - one of the foremost application areas of data, enabling cost-effective, quality sourcing at one end and efficient supply at the user end.
  • Retail - generates large volumes of consumer data, which makes it a useful application area for AI/ML/DL.
  • Banking & financial services - a traditionally recognized area in which data is put to use for the safe and efficient servicing of customers' banking needs.
  • Insurance - deals with reconciliation, claim processing, fraud detection, health checks etc.
  • Healthcare - the enormity of the data makes computer-aided diagnostics and predictive analysis possible, servicing an important social need.
  • e-Governance - a variety of public services, e.g. public distribution, records & registration, tax collection, infrastructure creation and use.
  • Media - serves relevant information to a large cross-section of society based on consumer needs and preferences.
  • Manufacturing - industries generate data from their various operations and consume it to improve productivity and minimize downtime.
  • Security services - disaster management and mitigation, and the protection of public assets and information, all of which deal with enormous amounts of data.
  • Farming & agriculture - data on weather patterns, sowing patterns, soil characteristics, plant diseases and land holdings are useful inputs for analysis.
  • Education - data on content delivery, supply of and demand for professionals, pedagogies, institutions and teachers lends itself to useful analysis.
  • IoT - these devices generate enormous amounts of largely unstructured data, which is curated and put to use in many areas, for example on manufacturing lines, where devices supplying relevant data make it possible to know when to repair and to predict possible failures.
  • Scientific experiments - CERN's Large Hadron Collider, human genome sequencing, giant radio telescope observations and weather prediction are some examples where the enormous data generated has been leveraged to draw very useful inferences.


Big Data - a sought-after discipline:

Given the range of potential applications of Big data in the social and economic sectors, and the leverage that the available technologies provide to service such applications, it is a much sought-after discipline with the following attributes:

  • It is considered an ‘Upper Class’ discipline.
  • It offers some of the most sought-after jobs of the twenty-first century.
  • It offers some of the highest-paid jobs in the industry.
  • It has the potential to transform any industry or service line that relies on data.
  • Demand for professionals has doubled over the year and is poised to grow exponentially.
  • It needs a range of professionals: data curators and technologists for machine learning, deep learning and artificial intelligence.
  • It also needs professionals in natural language processing, information retrieval, text mining and speech processing, as well as data architects, statisticians, mathematical modellers and smart programmers.
  • It is a multi-disciplinary field spanning computer science, cognitive science, mathematics, statistics and analytics, and therefore requires a team working together to address a plethora of data-related issues and their applications.
  • There is a role for everyone.

A Peep into the future:

A look at the Gartner Magic Quadrant for BI and Analytics companies reveals a large number of them in the third and fourth quadrants, as Visionaries and Leaders. Companies are building analytics systems that unlock data insights for their customers using AI, ML, DL, NLP, BI and the Cloud, pursuing opportunities that did not exist five years ago. Their efforts are directed to:

Modernize - legacy reporting solutions, which were largely built on using data at surface level,

Engage - build a data culture and increase user acceptance of the utility of analytics,

Automate - discover and identify every critical change in data that represents value worth realizing, and

Innovate - create new products, services and revenue streams for customers by using data to its best.

As companies engage in this task, they have come up against a woeful shortage of data professionals, running into millions in the global market. In India, NASSCOM estimates that 200,000 professionals are in short supply. A global opportunity of US$ 200 billion, rising at 10% a year, is waiting to be tapped, as per an IDC estimate. Data and technologies will play a crucial role in realizing this large business in the social and economic sectors, while research efforts continue to address the data-related issues discussed in this paper.



 




