DATA SCIENCE
What is Data Science?
“We have lots of data – now what?”
(How can we unlock real value from our data?)
Data science is a multidisciplinary blend of data inference, algorithm development, and technology, used to solve analytically complex problems.
At the core is data. Troves of raw information, streaming in and stored in enterprise data warehouses. Much to learn by mining it. Advanced capabilities we can build with it. Data science is ultimately about using this data in creative ways to generate business value:
That value comes from two kinds of work:
- Discovery of Data Insight
Quantitative data analysis to help steer strategic business decisions
- Development of Data Product
Algorithm solutions in production, operating at scale (e.g. recommendation engines)
Together, these two produce business value.
Data science – discovery of data insight
This aspect of data science is all about uncovering findings from data. Diving in at a granular level to mine and understand complex behaviors, trends, and inferences. It’s about surfacing hidden insight that can help enable companies to make smarter business decisions. For example:
- Netflix data mines movie viewing patterns to understand what drives user interest, and uses that to make decisions on which Netflix original series to produce.
- Target identifies the major customer segments within its base and the unique shopping behaviors within those segments, which helps guide messaging to different market audiences.
- Procter & Gamble uses time series models to understand future demand more clearly, which helps it plan production levels more optimally.
How do data scientists mine out insights? It starts with data exploration. When given a challenging question, data scientists become detectives. They investigate leads and try to understand patterns or characteristics within the data. This requires a big dose of analytical creativity.
Then, as needed, data scientists may apply quantitative techniques to get a level deeper, e.g. inferential models, segmentation analysis, time series forecasting, synthetic control experiments, etc. The intent is to scientifically piece together a forensic view of what the data is really saying.
This data-driven insight is central to providing strategic guidance. In this sense, data scientists act as consultants, guiding business stakeholders on how to act on findings.
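As a rough illustration of the time series forecasting mentioned above, here is a minimal sketch of simple exponential smoothing. The weekly demand figures and the smoothing factor `alpha` are invented for the example; real demand models are usually far richer than this.

```python
def exponential_smoothing(series, alpha):
    """Return the exponentially smoothed series.

    Each smoothed value blends the newest observation with the previous
    smoothed value; the last element can serve as a one-step-ahead forecast.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly demand figures, made up for illustration.
weekly_demand = [100, 110, 105, 120, 130]
forecast = exponential_smoothing(weekly_demand, alpha=0.5)[-1]
print(forecast)
```

A higher `alpha` makes the forecast react faster to recent changes, while a lower one smooths out noise.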
Data science – development of data product
A “data product” is a technical asset that: (1) utilizes data as input, and (2) processes that data to return algorithmically-generated results. The classic example of a data product is a recommendation engine, which ingests user data, and makes personalized recommendations based on that data. Here are some examples of data products:
- Amazon’s recommendation engines suggest items for you to buy, determined by their algorithms. Netflix recommends movies to you. Spotify recommends music to you.
- Gmail’s spam filter is a data product: an algorithm behind the scenes processes incoming mail and determines whether a message is junk.
- Computer vision used for self-driving cars is also a data product: machine learning algorithms are able to recognize traffic lights, other cars on the road, pedestrians, etc.
This is different from the “data insights” section above, where the outcome is perhaps advice to an executive on making a smarter business decision. In contrast, a data product is technical functionality that encapsulates an algorithm and is designed to integrate directly into core applications. Respective examples of applications that incorporate a data product behind the scenes: Amazon’s homepage, Gmail’s inbox, and autonomous driving software.
Data scientists play a central role in developing data products. This involves building out algorithms, as well as testing, refinement, and technical deployment into production systems. In this sense, data scientists serve as technical developers, building assets that can be leveraged at wide scale.
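To make the idea of a data product concrete, here is a toy recommendation engine. It is only a sketch of one simple approach (scoring items by how much their buyers overlap with the target user), not how Amazon, Netflix, or Spotify actually work. The `purchases` data and the `recommend` function are hypothetical.

```python
from collections import Counter


def recommend(purchases, user, top_n=2):
    """Suggest items bought by users whose purchases overlap with `user`'s.

    Each candidate item is scored by the total overlap of the users who
    own it; ties are broken alphabetically to keep the output stable.
    """
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # users with nothing in common contribute no signal
        for item in items - owned:
            scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories: user name -> set of items bought.
purchases = {
    "ana": {"book", "lamp"},
    "ben": {"book", "lamp", "desk"},
    "cat": {"book", "pen"},
    "dan": {"mug"},
}
print(recommend(purchases, "ana"))
```

The input is user data and the output is an algorithmically generated result, which is exactly the two-part definition of a data product given above.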
What is data science – the requisite skill set
Data science is a blend of skills in three major areas:

Mathematics and Statistics Expertise
At the heart of mining data insight and building data product is the ability to view the data through a quantitative lens. There are textures, dimensions, and correlations in data that can be expressed mathematically. Finding solutions utilizing data becomes a brain teaser of heuristics and quantitative technique. Solutions to many business problems involve building analytic models grounded in hard math, where being able to understand the underlying mechanics of those models is key to success in building them.
Also, a common misconception is that data science is all about statistics. While statistics is important, it is not the only type of math utilized. First, there are two branches of statistics: classical statistics and Bayesian statistics. When most people refer to stats they are generally referring to classical stats, but knowledge of both types is helpful. Furthermore, many inferential techniques and machine learning algorithms lean on knowledge of linear algebra. For example, a popular method to discover hidden characteristics in a data set is singular value decomposition (SVD), which is grounded in matrix math and has much less to do with classical stats. Overall, it is helpful for data scientists to have breadth and depth in their knowledge of mathematics.
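To show why SVD is a matrix-math technique rather than a statistical one, here is a minimal sketch that estimates only the largest singular value of a small matrix by power iteration. In practice a full SVD would come from a numerical library; the `ratings` matrix and the function name are assumptions made for the example.

```python
import math


def top_singular_value(matrix, iterations=100):
    """Estimate the largest singular value and its right singular vector.

    Repeatedly applies A^T A to a starting vector and renormalizes
    (power iteration), then measures how much A stretches the result.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u, which equals (A^T A) v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, v  # the matrix maps v to zero; nothing to stretch
        v = [x / norm for x in w]
    stretched = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in stretched))
    return sigma, v


# Hypothetical 3x2 matrix (e.g. users by items), made up for illustration.
ratings = [
    [3.0, 0.0],
    [0.0, 2.0],
    [0.0, 0.0],
]
sigma, direction = top_singular_value(ratings)
print(round(sigma, 6), [round(x, 6) for x in direction])
```

The dominant direction it finds is the kind of "hidden characteristic" the paragraph above refers to: the single axis along which the data varies most.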
Technology, Coding and Hacking
First, let’s clarify that we are not talking about hacking as in breaking into computers. We’re referring to the tech programmer subculture meaning of hacking, i.e., creativity and ingenuity in using technical skills to build things and find clever solutions to problems.
Why is hacking ability important? Because data scientists utilize technology in order to wrangle enormous data sets and work with complex algorithms, and it requires tools far more sophisticated than Excel. Data scientists need to be able to code — prototype quick solutions, as well as integrate with complex data systems. Core languages associated with data science include SQL, Python, R, and SAS. On the periphery are Java, Scala, Julia, and others. But it is not just knowing language fundamentals. A hacker is a technical ninja, able to creatively navigate their way through technical challenges in order to make their code work.
Along these lines, a data science hacker is a solid algorithmic thinker, having the ability to break down messy problems and recompose them in ways that are solvable. This is critical because data scientists operate within a lot of algorithmic complexity. They need a strong mental comprehension of high-dimensional data and tricky data control flows, and full clarity on how all the pieces come together to form a cohesive solution.
Substantive Business and Marketing Acumen
It is important for a data scientist to be a tactical business consultant. Working so closely with data, data scientists are positioned to learn from data in ways no one else can. That creates the responsibility to translate observations into shared knowledge, and to contribute to strategy on how to solve core business problems. This means a core competency of data science is using data to cogently tell a story. No data-puking; rather, present a cohesive narrative of problem and solution, using data insights as supporting pillars, that leads to guidance.
Having this business acumen is just as important as having acumen for tech and algorithms. There needs to be clear alignment between data science projects and business goals. Ultimately, the value doesn’t come from data, math, and tech themselves. It comes from leveraging all of the above to build valuable capabilities and have strong business influence.
Let’s Understand Why We Need Data Science

- Traditionally, the data that we had was mostly structured and small in size, and could be analyzed using simple BI tools. Today, by contrast, most data is unstructured or semi-structured. The data trends in the image given below show that by 2020, more than 80% of data will be unstructured.
This data is generated from different sources like financial logs, text files, multimedia forms, sensors, and instruments. Simple BI tools are not capable of processing this huge volume and variety of data. This is why we need more complex and advanced analytical tools and algorithms for processing, analyzing and drawing meaningful insights out of it.
This is not the only reason why Data Science has become so popular. Let’s dig deeper and see how Data Science is being used in various domains.
- What if you could understand the precise requirements of your customers from existing data such as their past browsing history, purchase history, age, and income? No doubt you had all this data earlier too, but now, with the vast amount and variety of data, you can train models more effectively and recommend products to your customers with more precision. Wouldn’t that be amazing, given that it will bring more business to your organization?
- Let’s take a different scenario to understand the role of Data Science in decision making. What if your car had the intelligence to drive you home? Self-driving cars collect live data from sensors, including radars, cameras, and lasers, to create a map of their surroundings. Based on this data, the car makes decisions such as when to speed up, when to slow down, when to overtake, and where to turn, using advanced machine learning algorithms.
- Let’s see how Data Science can be used in predictive analytics, taking weather forecasting as an example. Data from ships, aircraft, radars, and satellites can be collected and analyzed to build models. These models not only forecast the weather but also help predict natural calamities, which makes it possible to take appropriate measures beforehand and save many precious lives.
What is a data scientist – curiosity and training
The Mindset
A common personality trait of data scientists is that they are deep thinkers with intense intellectual curiosity. Data science is all about being inquisitive: asking new questions, making new discoveries, and learning new things. Ask data scientists most obsessed with their work what drives them in their job, and they will not say “money”. The real motivator is being able to use their creativity and ingenuity to solve hard problems and constantly indulge their curiosity. Deriving complex reads from data goes beyond just making an observation; it is about uncovering “truth” that lies hidden beneath the surface. Problem solving is not a task, but an intellectually stimulating journey to a solution. Data scientists are passionate about what they do, and reap great satisfaction from taking on challenges.
Training
There is a glaring misconception out there that you need a sciences or math Ph.D. to become a legitimate data scientist. That view misses the point that data science is multidisciplinary. Highly focused study in academia is certainly helpful, but doesn’t guarantee that graduates have the full set of experiences and abilities to succeed. For example, a Ph.D. statistician may still need to pick up a lot of programming skills and gain business experience to complete the trifecta.
In fact, data science is such a relatively new and rising discipline that universities have not caught up in developing comprehensive data science degree programs, meaning that no one can really claim to have “done all the schooling” to become a data scientist. Where does much of the training come from? The unyielding intellectual curiosity of data scientists pushes them to be motivated autodidacts, driven to self-learn the right skills, guided by their own determination.
Analytics and machine learning – how it ties to data science
There is a slew of terms closely related to data science that we hope to clarify.
What is Analytics?
Analytics has risen quickly in popular business lingo over the past several years; the term is used loosely, but is generally meant to describe critical thinking that is quantitative in nature. Technically, analytics is the “science of analysis”: put another way, the practice of analyzing information to make decisions.
Is “analytics” the same thing as data science? Depends on context. Sometimes it is synonymous with the definition of data science that we have described, and sometimes it represents something else. A data scientist using raw data to build a predictive algorithm falls into the scope of analytics. At the same time, a non-technical business user interpreting pre-built dashboard reports (e.g. GA) is also in the realm of analytics, but does not cross into the skill set needed in data science. Analytics has come to have fairly broad meaning. At the end of the day, as long as you understand beyond the buzzword level, the exact semantics don’t matter much.
What is the difference between an analyst and a data scientist?
“Analyst” is a somewhat ambiguous job title that can represent many different types of roles (data analyst, marketing analyst, operations analyst, financial analyst, etc.). What does this mean in comparison to data scientist?
- Data Scientist: Specialty role with abilities in math, technology, and business acumen. Data scientists work at the raw database level to derive insights and build data products.
- Analyst: This can mean a lot of things. The common thread is that analysts look at data to try to gain insights. Analysts may interact with data at either the database level or the summarized report level.
- A Data Scientist needs robust business acumen and visualization skills to turn insights into a business story, whereas a Data Analyst does not need specialized business skills, and basic visualization skills suffice.
- A Data Scientist should be very proficient in machine learning and in building statistical models. Such models are widely applied in spatial modeling, recommendation systems, predictive modeling, supervised classification, and clustering. A Data Analyst, however, is not required to be proficient in these techniques.
- Predictive analytics is a process in which the Data Scientist needs to excel; deriving highly accurate predictions from past datasets is one of the role’s primary responsibilities. A Data Analyst, on the other hand, derives valuable insights from large volumes of data.
- A Data Scientist’s job is to make sense of the unknown aspects of the business, while a Data Analyst looks at known business aspects from fresh perspectives. This is one reason why being a Data Scientist takes about twice as much work as being a Data Analyst, and why Data Scientists are paid almost twice as much as Data Analysts.
- A Data Scientist not only addresses business issues but also picks out the issues with the greatest business value, while a Data Analyst simply addresses the business issues at hand.
- A Data Scientist should be well grounded in statistics, mathematics, data mining, and correlation analysis. A Data Analyst needs to excel in the tools and components of data architecture.
- Applying analytical functions such as rank and median to data sets is one of a Data Scientist’s many jobs. A Data Analyst needs only to excel in data storage and retrieval tools.
- A Data Scientist requires expertise in database systems, especially NoSQL systems. A Data Analyst needs to know business intelligence and data warehousing concepts.
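The rank and median functions mentioned in the list above can be sketched in a few lines. The `sales` figures are made up, and `dense_rank` is a hypothetical helper written for this example (ties share a rank and no ranks are skipped).

```python
from statistics import median


def dense_rank(values):
    """Rank values from highest (rank 1) to lowest; equal values share a rank."""
    ordered = sorted(set(values), reverse=True)
    rank_of = {value: rank for rank, value in enumerate(ordered, start=1)}
    return [rank_of[value] for value in values]


# Hypothetical sales figures per store, made up for illustration.
sales = [250, 400, 400, 150]
print(median(sales))      # middle of the sorted values
print(dense_rank(sales))  # position of each store by sales
```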

Thus, “analyst” and “data scientist” are not exactly synonymous, but also not mutually exclusive. Here is our interpretation of how these job titles map to skills and scope of responsibilities:

The Applications of Each Field
Applications of Data Science:
- Internet search: Search engines make use of data science algorithms to deliver best results for search queries in a fraction of seconds.
- Digital Advertisements: The entire digital marketing spectrum uses data science algorithms, from display banners to digital billboards. This is the main reason digital ads get a higher CTR than traditional advertisements.
- Recommender systems: Recommender systems not only make it easy to find relevant products among the billions available but also add a lot to the user experience. Many companies use these systems to promote their products and suggestions in accordance with the user’s demands and the relevance of information. The recommendations are based on the user’s previous search results.
Applications of Data Analysis:
- Healthcare: As cost pressures tighten, the main challenge for hospitals is to treat as many patients as they can efficiently while also improving the quality of care. Instrument and machine data is increasingly used to track and optimize patient flow, treatment, and equipment use in hospitals. It is estimated that a 1% efficiency gain could yield more than $63 billion in global healthcare savings.
- Travel: Data analytics can optimize the buying experience through analysis of mobile and weblog data and social media data. Travel sites can gain insights into customers’ desires and preferences. Products can be up-sold by correlating current sales with subsequent browsing, and customized packages and offers can increase browse-to-buy conversions. Data analytics can also deliver personalized travel recommendations based on social media data.
- Gaming: Data analytics helps in collecting data to optimize spend within as well as across games. Game companies gain insight into the likes, dislikes, and relationships of their users.
- Energy Management: Most firms are using data analytics for energy management, including smart-grid management, energy optimization, energy distribution, and building automation in utility companies. The application here centers on controlling and monitoring network devices and dispatch crews, and on managing service outages. Utilities gain the ability to integrate millions of data points on network performance, which lets engineers use analytics to monitor the network.
The Skills you Require
To become a Data Scientist:
- Education: 88% have a Master’s Degree and 46% have PhDs
- In-depth knowledge of SAS and/or R: For Data Science, R is generally preferred.
- Python coding: Python is the most common coding language that is used in data science along with Java, Perl, C/C++.
- Hadoop platform: Although not always a requirement, knowing the Hadoop platform is still preferred for the field. Having a bit of experience in Hive or Pig is also a huge selling point.
- SQL database/coding: Though NoSQL and Hadoop have become a major part of the Data Science background, it is still preferred if you can write and execute complex queries in SQL.
- Working with unstructured data: It is essential that a Data Scientist be able to work with unstructured data, whether it comes from social media, video feeds, or audio.
To become a Data Analyst:
- Programming skills: Knowing programming languages such as R and Python is extremely important for any data analyst.
- Statistical skills and mathematics: Descriptive and inferential statistics and experimental design are a must for data analysts.
- Machine learning skills
- Data wrangling skills: The ability to map raw data and convert it into another format that allows for a more convenient consumption of the data.
- Communication and Data Visualization skills
- Data Intuition: It is extremely important for a professional to be able to think like a data analyst.
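The data wrangling skill in the list above, mapping raw data into a more convenient format, might look like the following minimal sketch. It turns messy CSV text into clean records and prints them as JSON. The `RAW` sample, its column names, and the `wrangle` function are all invented for the example.

```python
import csv
import io
import json

# Hypothetical raw export with stray whitespace and a missing age.
RAW = """name, age ,city
 Ana ,34, Lisbon
Ben,,Oslo
"""


def wrangle(raw_text):
    """Parse raw CSV text into a list of cleaned dictionaries."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [column.strip() for column in next(reader)]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, (cell.strip() for cell in row)))
        # Convert age to an integer, keeping missing values explicit as None.
        record["age"] = int(record["age"]) if record["age"] else None
        records.append(record)
    return records


records = wrangle(RAW)
print(json.dumps(records, indent=2))
```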
The Responsibility of Each Field
Responsibilities of Data Science:
- Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques, and business strategies
- Use predictive modeling to increase and optimize customer experiences, revenue generation, ad targeting, and more
- Develop custom data models and algorithms
- Develop processes and tools to monitor and analyze model performance and data accuracy
- Assess the effectiveness and accuracy of new data sources and data-gathering techniques
- Develop the company’s A/B testing framework and test model quality
- Coordinate with different functional teams to implement models and monitor outcomes
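One building block of the A/B testing responsibility listed above is checking whether two variants really differ. Here is a minimal sketch using a two-proportion z-test. The conversion counts are invented, and a real framework would also handle sample-size planning, multiple metrics, and repeated looks at the data.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for B's rate versus A's."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 3), round(p_value, 4))
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone.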
Responsibilities of Data Analyst:
- Conduct consumer data research and analytics
- Work with customer-centric algorithm models and tailor them to each customer as required
- Extract actionable insights from large databases
- Perform recurring and ad hoc quantitative analysis to support day-to-day decision making
- Support reporting and analytics, such as KPIs, financial reports, and creating and improving dashboards
- Help translate data into visualizations, metrics, and goals
- Write SQL queries to extract data from the data warehouse.
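As a small illustration of the SQL work listed above, the sketch below runs an aggregate query against an in-memory SQLite database standing in for a data warehouse. The `orders` table, its columns, and its rows are hypothetical.

```python
import sqlite3


def total_spend_by_customer(rows):
    """Load order rows into an in-memory table and total them per customer."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = (
            "SELECT customer, SUM(amount) AS total "
            "FROM orders GROUP BY customer ORDER BY total DESC"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical order data, made up for illustration.
orders = [("ana", 20.0), ("ben", 50.0), ("ana", 15.0)]
totals = total_spend_by_customer(orders)
print(totals)
```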
Data Scientist vs. Data Analyst: How Much Do They Earn?
How Much Does a Data Analyst Make?
A recent study by PwC estimated that there will be 2.7 million job postings for data analyst and data science roles by 2020. The study goes on to say that candidates must be “T-shaped,” which means they must not only have the analytical and technical skills, but also “soft skills such as communication, creativity, and teamwork.”
Finding someone who has the ideal blend of right-brain and left-brain skills is not an easy task, which is one reason why data analysts are paid well. According to Glassdoor, the average salary for a data analyst is $84,000. Like all jobs, however, data analyst salaries vary by industry.
How Much Does a Data Scientist Make?
We previously gave some examples of what a data scientist in Silicon Valley and New York City can make, and it’s not far from the average. According to Glassdoor, the average annual salary for a data scientist is $162,000.
Becoming a data scientist isn’t easy, yet the demand for data science skills continues to grow. According to LinkedIn’s August 2018 Workforce Report, “data science skills shortages are present in almost every large U.S. city. Nationally, we have a shortage of 151,717 people with data science skills, with particularly acute shortages in [tech hubs such as] New York City, the San Francisco Bay Area, and Los Angeles.” Given the demand, it’s not surprising that it’s such a lucrative career.
Final Thoughts
To summarize the questions we posed at the beginning:
- Data analyst vs. data scientist: do they require an advanced degree?
- A data scientist does, but a data analyst does not.
- Data analyst vs. data scientist: what do they actually do?
- A data scientist works with programs, code, and more in addition to analyzing numbers, while a data analyst is more likely to just analyze numbers.
- Data analyst vs. data scientist: which has a higher average salary?
- A data scientist has a higher average salary.
More work goes into becoming a data scientist than a data analyst, but the reward is a lot greater as well. If you excel in math, statistics, and programming and have an advanced degree in one of those fields, then it sounds like you’d be a perfect candidate for a career in data science.
However, if you are early in your career and are great with numbers but still need to hone your data modeling and coding skills, then you’d be better suited for a job as a data analyst. You can think of a data analyst as a stepping stone to becoming a data scientist, if that is your final goal.