This post was reviewed by: Luis Moneda, Tiago Magalhães, Jessica Sousa, Cristiano Breuel and Henrique Lopes
Data Scientists (DS) and Machine Learning Engineers (MLE) have been around for some time now (at least by tech standards), but that doesn't mean the industry as a whole agrees on the specific definitions and expectations for each role.
Far from it. Very often people aren’t sure how exactly they differ – and where they overlap.
In this post we will share our take on this issue – that is, the scope of these roles at Nubank.
As you'll see in the next sections, there are several dimensions to consider, so we give a short answer with the most important insights and a long answer that explains things in more detail.
Short answer: It’s a spectrum of skills, and they overlap
Long answer: It depends
It depends a lot – mostly on the type of team you’re in.
Although general guidelines for each role can be defined, the typical day-to-day activities of a DS or MLE at Nubank vary considerably, mostly depending on the type of team they work in.
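The two most important distinctions are usually the team's maturity in applying ML (low vs. high) and whether its models run in real time or in batch.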
As a rule of thumb, the more experience a team has in applying ML to business problems, the less overlap there is between data scientists and machine learning engineers.
In higher-maturity teams, the focus usually changes from ad-hoc implementations to scalable and cost-effective solutions.
The overlap between DSs and business analysts (BAs) is present in all teams. Although DSs get to work on more specialized modelling as team maturity grows, these two roles remain key because they connect the "data world" to the "business world".
In real-time/streaming-oriented squads, MLEs are closest to regular software engineers, since they perform many of the same tasks. In contrast, in squads where models run in batch or in long-running jobs, MLEs are closer in scope to analytics engineers (AEs) and data engineers.
The diagram below shows what we understand these team differences to be. We’ll analyze them in detail over the next sections.
Types of teams
1) Low-maturity, real-time focus
In a squad that is working on its first real-time model, there is a lot of ambiguity in the air. It's not yet clear which tasks need to be done, or by whom, which results in a lot of overlap between roles. Everyone is expected to play a more "generalist" part throughout.
Key points
2) High-maturity, real-time focus
These are teams that have already had experience applying real-time models to a couple of business problems. People understand what each role is responsible for and what the usual challenges are. Focus shifts from implementation to maintenance, optimization and efficiency.
Key points
3) Low-maturity, batch focus
Batch-focused teams with no previous ML models will generally try to adapt their data routines (i.e. ETL flows) and/or schedulers (e.g. cron jobs, Airflow) to support batch scoring. Once again, a lot of overlap between several roles is to be expected.
Key points
4) High-maturity, batch focus
A high-maturity, batch-focused team will already have several models in production, and most of the initial problems (deployment, data integrity, monitoring) will already have been solved for individual models, so the focus turns to scaling and efficiency. Overlap between DSs and MLEs decreases.
Key points
5) Horizontal/support teams
There are Machine Learning Engineers (and, more rarely, Data Scientists) who work in horizontal support teams. It's hard to fit them into the descriptions above, so where do they stand?
Well, first of all, what do we mean by horizontal teams?
In the realm of data science/machine learning, horizontal teams are cross-squad teams that work with several business units at a time, providing support and building tooling and platforms for the rest of the company to use.
This work has recently also come to be known as MLOps (ML Operations).
Key points
Suggested Role descriptions
With all of these specificities in mind, and regardless of team type, there are still some core activities that are unambiguously within the scope of Data Scientists and Machine Learning Engineers, respectively.
Data Scientist: Suggested Role Description
Should do
May do
Machine Learning Engineer: Suggested Role Description
Should do
May do
Related
Other roles
A modern company is made up of many roles, and of course we did not include all of them in the diagrams. Among those that interact with DSs and MLEs to some degree, we have:
All information here is to be taken as a rough guideline only! While we have tried to make the text as widely applicable as possible, what works for us at Nubank may not necessarily work for everyone.