This post was reviewed by: Luis Moneda, Tiago Magalhães, Jessica Sousa, Cristiano Breuel and Henrique Lopes
Data Scientists (DS) and Machine Learning Engineers (MLE) have been around for some time now (at least by tech standards), but that doesn’t mean the industry as a whole agrees on the specific definitions and expectations for each role.
Far from it. Very often people aren’t sure how exactly they differ – and where they overlap.
In this post we will share our take on this issue – that is, the scope of these roles at Nubank.
As you’ll see in the next sections, there are several dimensions to analyze here, so we give a short answer with the most important insights and a long answer that explains them in more detail.
So how do DS and MLE roles differ, and where do they overlap?
Short answer: It’s a spectrum of skills, and they overlap
Long answer: It depends
It depends a lot – mostly on the type of team you’re in.
Although general guidelines for each role can be defined, the typical day-to-day activities of a DS or MLE at Nubank vary a lot, mostly depending on the type of team they work in.
The two most important distinctions are usually:
- a) how mature the team is with respect to DS/ML solutions and
- b) whether that team is more focused on real-time or batch models.
As a rule of thumb, the more experience a team has in applying ML to business problems, the less overlap there is between data scientists and machine learning engineers.
In higher-maturity teams, the focus usually changes from ad-hoc implementations to scalable and cost-effective solutions.
The DS/BA overlap is present in all teams. Although DSs get to work on more specialized modelling as team maturity grows, these two roles remain key because they connect the “data world” to the “business world”.
In real-time/streaming-oriented squads, MLEs are closest to regular software engineers, since they perform many of the same tasks. In squads where models run in batch or in long-running jobs, by contrast, MLEs are closer in scope to analytics engineers (AEs) and data engineers.
The diagram below summarizes how we see these team differences. We’ll analyze them in detail over the next sections.
Types of teams
1) Low-maturity, real-time focus
In a squad that is working on its first real-time model, there is a lot of ambiguity in the air. It’s not yet clear what tasks need to be done, or by whom, which results in lots of overlap between roles. Everyone is expected to play a more “generalist” part throughout.
- The MLE connects the “engineering world” and the “data world”. Communication skills are key here.
- They will need to wear several hats, so lots of overlap with Software Development Engineers (SDEs) and Site Reliability Engineers (SREs)
- Considerable overlap between data scientists and machine learning engineers!
- They will have to work together to agree on implementation details, iron out tradeoffs (performance vs. speed, performance vs. complexity, performance vs. time-to-market, etc.) and fix issues such as train-serve skew.
- Lots of overlap between data scientists (DS) and business analysts (BA) – lots of interaction needed to translate sometimes vague business problems into models, policies and actions to be taken.
- DSs will probably be expected to provide Product Managers (PMs) with technical expertise to help them prioritize tasks and define the squad backlog.
2) High-maturity, real-time focus
These are teams that have already had experience applying real-time models to a couple of business problems. People understand what each role is responsible for and what the usual challenges are. Focus shifts from implementation to maintenance, optimization and efficiency.
- Overlap between DS and MLE is greatly decreased, but there are still some tasks they may share, such as monitoring and debugging data problems.
- MLEs will spend a lot of time maintaining, debugging and monitoring existing models, so SRE skills gain in importance
- SDE skills are still very useful for MLEs, especially for building tools that make processes simpler and more efficient (e.g. extracting common patterns into libraries, refactoring code, etc.)
- Overlap between DS and BA decreases – the business problems are now clearer and data scientists can usually spend more time optimizing and tuning models.
- There will probably be no more overlap between DS and PM at this point – the BA will probably be able to translate business objectives into technical tasks (a tech manager may also participate here)
3) Low-maturity, batch focus
Batch-focused teams with no previous ML models will generally try to adapt their data routines (e.g. ETL flows) and/or schedulers (e.g. cron jobs, Airflow) to support batch scoring. Once again, lots of overlap between several roles is to be expected.
- As in the real-time scenario, there’s considerable overlap between MLE and DS roles: they need to communicate frequently so that each knows what the other is doing.
- Again, BAs and DSs will need to be in close contact to make sure that ML models will actually deliver business value.
- Analytics Engineering (AE) skills will come in handy for MLEs.
- Data preprocessing, cleaning and feature engineering take a lot of time to build, and this work is usually done in tools like Spark or in relational databases (i.e. lots of SQL).
4) High-maturity, batch focus
A high-maturity, batch-focused team will already have several models in production, and most of the initial problems (deployment, data integrity, monitoring) will already have been solved for individual models, so the focus turns to scaling and efficiency. Overlap between DS and MLE decreases.
- Less overlap between MLE and DS, as the team already understands the types of tasks needed and people have clearer expectations.
- Slightly less overlap between BA and DS, but they will still need to communicate, as in high-maturity, real-time focus teams.
- DSs will be able to focus on more technical tasks such as optimizing models, adding new features, etc.
- Similarly, less overlap between DS and PM, for the same reason.
- Overlap between MLE and Analytics Engineers does not go away. Once you reach a given level of maturity in batch-oriented teams, the focus usually turns to making things more precise, more efficient and cheaper.
- This usually includes things like optimizing queries, ETL routines, and other flows related to data management.
- General software engineering (i.e. SDE) skills become important again for MLEs
- Efficiency and cost-effectiveness are also achieved through writing ad-hoc tools and refactoring/deduplicating responsibilities across models.
5) Horizontal/support teams
There are Machine Learning Engineers (rarely, also Data Scientists) who work in horizontal support teams. It’s hard to fit them into the above descriptions, so what happens to them?
Well, first of all, what do we mean by horizontal teams?
In the realm of data science/machine learning, horizontal teams are cross-squad teams that work with several business units at a time, providing support and building tooling and platforms for the rest of the company to use.
This has recently also been referred to as MLOps (ML Operations).
- In support/platform teams, there usually are no data scientists. Most platform/tooling work is not so different from regular SDE/SRE work, so it can be done by machine learning engineers.
- It’s important, however, for horizontal MLEs to at least understand the basics of data scientists’ work so they can best support them.
- The SDE overlap is big here – horizontal teams are usually responsible for building/maintaining tools used by other teams – so general software engineering skills are essential.
- The same goes for AE skills: batch flows also need tooling, support, etc.
- SRE skills are also important, because cross-team tools and processes will inevitably fail, requiring support, on-call rotations, monitoring, debugging, etc.
- Even if the team has a dedicated PM, MLEs will need to wear a PM hat to help prioritize what needs to be done.
Suggested Role Descriptions
With all of these specificities in mind, and regardless of team type, there are still some core activities that fall unambiguously within the scope of Data Scientists and Machine Learning Engineers, respectively.
Data Scientist: Suggested Role Description
- Data preparation: data preprocessing, data cleansing, building tables, feature extraction, feature selection
- Modeling work: sampling strategies, modelling, training, evaluating, optimization
- Communication: explain and present analyses, conclusions and tradeoffs to stakeholders and decision-makers.
- Business focus: Help business experts define which problems should be handled by DS/ML (and how).
- Model monitoring/debugging
- Analyses and estimates related to the business impact of models (e.g. define policies, score thresholds, etc)
Machine Learning Engineer: Suggested Role Description
- Implementation: All work related to taking a model and integrating it into wherever it will run in production (whether batch or real-time).
- Deployment/lifecycle support: Take care of CI/CD routines and fix problems as they arise. Develop ad-hoc solutions such as scripts.
- Communication: bridge the gap between engineering (batch and real-time), data scientists and business stakeholders.
- Business focus: Help DS and stakeholders understand the implementation effort/cost of decisions, and suggest tradeoffs.
- Model monitoring/debugging
- Help stakeholders and leaders define which problems should be handled by DS/ML (and how).
A modern company is made up of many roles, and of course we did not include all of them in the diagrams. Those that may interact with DSs/MLEs to some degree include:
- Software Development Engineers (SDE)
- The SDE is your regular generalist software engineer. An SDE builds and maintains regular (non-ML) systems and performs related tasks. SDE responsibilities overlap mainly with those of MLEs.
- Site Reliability Engineers (SRE) (aka “Production Engineer”)
- SREs are responsible for monitoring, troubleshooting systems, dealing with outages, on-call support, etc. Again, MLEs will often wear the “hat” of SREs – mostly where ML-based systems and tools are concerned.
- Business Analysts (BA)
- BAs are generalist analysts whose main job is to push business objectives using data as the main source of information. Very often, data scientists interact closely with BAs to work out business objectives and how they relate to models, etc.
- Analytics Engineers (AE)
- Also called “data analysts” in some other companies, they are usually tasked with maintaining data integrity from a semantic point of view. Their responsibilities include database management/maintenance, query optimization, SQL-based tooling and general ETL routines.
- Product managers (PM)
- These are people who think about the customer’s needs and they usually drive the backlog. A data scientist may be expected to communicate/translate modelling concepts to PMs. Senior data scientists may even “wear a PM hat” from time to time.
- Data Engineers (DE)
- Data engineers work to ensure the integrity and overall quality of the data used by the company as a whole (not just by DSs and MLEs). Although DEs usually work at a lower level of abstraction (optimizing databases, tables and the like), there may be times when an MLE (or even a DS) has to interact with them to troubleshoot data problems (usually in batch-focused squads).
- Operations Analysts (Ops)
- Operations teams fill the user-facing positions in B2C companies, dealing directly with end users and solving their problems individually via social media, CRM tools, chat/email, etc. Both data scientists and machine learning engineers may interact with Ops Analysts because they are sometimes the downstream consumers or stakeholders of models.
All information here is to be taken as a rough guideline only! While we have tried to make the text as widely applicable as possible, what works for us at Nubank may not necessarily work for everyone.