Data Analyst, Engineer, Scientist

It’s no secret that big data is, well, getting bigger. As more companies recognize the vital importance of having a robust data intelligence practice, opportunities within the field have exploded. The McKinsey Global Institute has estimated that, by 2018, the U.S. could have 1.5 million positions in the world of data that go unfilled due to a lack of adequate training.

This raises an important question that many are too afraid to ask: what exactly is the difference between data analysts, data scientists, and data engineers? While the three positions often have overlapping duties, they are distinct jobs with distinct skillsets and workloads.

Data Analysts

Data analysts are, in essence, junior data scientists. While data scientists will often be responsible for the development of data algorithms and higher-level decisions about how to engage in data analysis, data analysts are largely in charge of the nitty-gritty of data analysis. They will be the figurative on-the-ground employees of the data world, ensuring that a company is working with accurate, well-scraped data. They will then use the tools given to them to engage in data analysis of those datasets.

This isn’t to say, however, that data analysts don’t need a well-rounded toolkit of skills in order to excel at their job. Data analysts should be competent at programming and data visualization in addition to their core statistics skills, and since they will be responsible for ensuring the fidelity and accuracy of massive datasets, attention to detail is also a must.
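To make the attention-to-detail point concrete, here is a minimal sketch of the kind of sanity check an analyst might run before summarizing a dataset. The sample values and the validity range are hypothetical, chosen only for illustration; real checks depend on the dataset at hand.

```python
import statistics

# Hypothetical survey responses; ages outside 0-120 are treated as entry errors.
ages = [34, 29, 41, -1, 38, 250, 45]


def split_valid(values, low=0, high=120):
    """Separate values inside [low, high] from those that look like errors."""
    valid = [v for v in values if low <= v <= high]
    invalid = [v for v in values if not (low <= v <= high)]
    return valid, invalid


valid_ages, rejected = split_valid(ages)
print("rejected:", rejected)
print("mean:", statistics.mean(valid_ages))
print("median:", statistics.median(valid_ages))
```

Running the summary statistics only on the values that pass the check keeps a single mistyped entry from distorting the result.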

Data Scientists

The difference between data scientists and data analysts is largely one of degree, not of kind. This is often literally true: data scientists will regularly have an advanced degree in a quantitative field (e.g. a Ph.D. in computer science or physics), and will also have a job that is an order of magnitude more complex than that of a data analyst. Data scientists are the ones who give form to the analysis that they, along with data analysts, will conduct: they tackle open-ended questions and attempt to find important trends within datasets.

The skillset required of a data scientist is much broader than that required of an analyst. While a data analyst may simply have an understanding of several programming languages, data scientists have to work with a wide swath of data tools. They may be required to have a familiarity with statistical programming languages (e.g. R), database systems (e.g. MySQL and various NoSQL stores), visualization tools (e.g. D3.js, Tableau), and far too many other toolsets to list here.

Data Engineers

While similar in name to data analysts and data scientists, data engineers have a set of duties more distinct from those of the other two positions discussed. Data engineers are basically software engineers, responsible for the construction of the pipeline that funnels messy raw datasets into easily usable and well-organized databases and applications. Data engineers, in essence, construct the infrastructure upon which data analysts and data scientists conduct their analyses.

Data engineers require a deep skillset in the construction and efficient maintenance of databases. They will use SQL databases such as MySQL and NoSQL databases like MongoDB, work to both query and build out APIs, and look at ways to create streamlined databases with human-fault-tolerant pipelines. By building a data infrastructure that integrates with a variety of tools, data engineers can make the lives of the rest of the data team vastly easier.
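As a rough illustration of the kind of pipeline described above, the following sketch cleans a few messy raw records and loads them into a SQLite table. The table name, field names, and sample records are hypothetical; a production pipeline would typically involve far more tooling, and this is only one simple way to structure the idea.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from an upstream source.
RAW_RECORDS = [
    {"name": "  Alice ", "signup": "2017-01-05", "spend": "120.50"},
    {"name": "BOB", "signup": "2017-02-11", "spend": "n/a"},
    {"name": "", "signup": "2017-03-02", "spend": "15"},
]


def clean(record):
    """Return a normalized (name, signup, spend) tuple, or None if unusable."""
    name = record["name"].strip().title()
    if not name:
        return None  # drop rows with no usable name
    try:
        spend = float(record["spend"])
    except ValueError:
        spend = None  # keep the row, but store an unparseable amount as NULL
    return (name, record["signup"], spend)


def load(records, conn):
    """Clean the records and insert the usable ones; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (name TEXT, signup TEXT, spend REAL)"
    )
    rows = [row for row in map(clean, records) if row is not None]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load(RAW_RECORDS, connection), "rows loaded")
```

Analysts and scientists can then query the `customers` table directly instead of re-cleaning the raw records each time they need them.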

In Conclusion...

Ultimately, data analysts, data scientists, and data engineers all have a vital role to play on any data team. Data engineers set the stage by building the infrastructure that stores and maintains all the data, and data scientists then use that infrastructure to make the top-level decisions which guide data analysis. Finally, data analysts take the ball over the goal line by ensuring that the analysis itself goes smoothly. By understanding the different strengths and capabilities of all three positions, a company can maximize the value of its data.