Nowadays, when we or our family members are sick, how often do we go to a doctor with just an MBBS degree for diagnosis or treatment? At most, we might seek their opinion in an emergency or ask them to direct us to the right specialist. Doctors with just an MBBS degree may also be hard to find these days. Today, there are specialists for individual parts of the body, such as cardiologists, neurologists, and pulmonologists. There are specialists for different age groups as well, such as pediatric and adolescent medicine. There are also separate doctors for other living creatures. The human body is so complex that the more we discover about it through advances in technology, the greater the need for specialists becomes.

Now, let's try to understand this in the context of a data scientist. Data can be of several types (image, audio, text, sensor, etc.), and data is always associated with a domain. Medicine is one such domain, and there are hundreds more: agriculture, manufacturing, design, defense, marketing, economics, politics, art, and so on. A domain is basically a field that plays a role in society. To make proper use of data, it is important to have adequate knowledge of the domain. An AI expert is one who has sufficient knowledge of math, a domain, and computers. Too much emphasis on any one of these would push that person towards being a mathematician, a domain expert, or a software engineer. Knowledge of math (algorithms) and computers is a basic skill that any data scientist needs, and the domain is the specialization. Knowledge of a specific domain helps interpret the data, knowledge of math and algorithms helps formulate a solution, and knowledge of computers helps process the data.

In the digital world, data is a numerical quantity that is subject to interpretation by the domain expert. A sequence of numbers on its own means nothing. A computer stores data as 1’s and 0’s irrespective of its type (an image, audio, time series, or text).

There are two kinds of transformation that data goes through before it makes sense. The first is the conversion from binary to the format it belongs to. An image stored in a computer is converted from 1’s and 0’s into a raw image format. The computer would never complain if the same sequence were converted into raw audio instead. At its core, a computer does not distinguish data as an image, text, audio, or time series. It is all just 1’s and 0’s. The second is the presentation of the data for interpretation. A raw image makes sense to a human when it is displayed on a screen. Again, the computer would never complain if this image were converted to raw audio and sent to the speaker. From a human point of view, though, the result would not make any sense. The domain plays a significant role in data interpretation. Data can be generic to a machine but not to a human, and precisely because computers don't understand data, humans must.
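To make the first transformation concrete, here is a minimal Python sketch that reads one and the same byte sequence in two ways. The eight byte values, the 2x4 image layout, and the 16-bit little-endian audio format are all assumptions chosen for illustration. They do not come from any particular file format.

```python
import struct

# Eight arbitrary bytes. To the computer these are just bits.
raw = bytes([0, 64, 128, 255, 255, 128, 64, 0])

# Interpretation 1 (assumed layout): a grayscale image that is
# 4 pixels wide and 2 pixels tall, one unsigned byte per pixel.
width = 4
pixels = list(raw)
image = [pixels[i:i + width] for i in range(0, len(pixels), width)]

# Interpretation 2 (assumed format): four signed 16-bit
# little-endian audio samples built from the very same bytes.
samples = list(struct.unpack("<4h", raw))

print("as image rows:   ", image)
print("as audio samples:", samples)
```

Both readings succeed without any error. Nothing in the bytes says which one is right, so that choice has to come from someone who knows what the data represents.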

If a problem in the medical domain is presented to a data scientist with no knowledge of that domain, it is unlikely that he would come up with a superior solution. It is important to be able to mine the relevant data from whatever has been collected. Modern deep learning algorithms are so powerful at fitting data that they will fit garbage just as readily: garbage in, garbage out. No wonder neural networks are called universal function approximators.

When medicine, just one of the many domains that data science serves, demands specialists, why isn’t there a mandate for specialization within data science? An aspiring data scientist should start by acquiring skills in math and computers, the science required to process data. He should then pick a domain to work in. The choice of a particular domain usually comes from passion. Each domain is so vast that he would need to specialize in a particular category of problems within it. He should then pick a specific type of data, such as image or audio, to work with. Any broadening of his scope (if at all) should happen from right to left rather than from left to right in the figure below.

A typical designation of a data scientist who specializes in cancer and ortho would be

Now, anyone wanting to solve problems in a specific field would know exactly which data scientist to approach.
