How we scale our data platform efficiently and reliably

Integrating Analytics Engineers into all cross-functional teams: a key success factor to scale a data platform efficiently and reliably

Nubank Office

Nubank is one of the world’s largest digital banks with 35M customers in Latin America as of April 2021. Its products range from credit cards and loans to bank accounts and life insurance. Data is an inherent part of every decision: we use it for analytics, monitoring, automation and product personalization.

As the company grew, we kept evolving our data organization to meet its needs. First, we created a central team to build the data infrastructure. Second, we decentralized the creation and consumption of data to users in cross-functional teams (squads). Third, we increased the central team’s scope and staffing to provide company-wide tools and standards.

Because the platform was self-service, data ownership was spread across all business units (BUs); however, it wasn’t clear who was accountable for maintaining datasets and applying those new standards.

Yet maintaining high-quality data was key for all the products and reporting that depended on it, as well as for the efficiency of the teams consuming that data on a daily basis.

We needed to solve that challenge: how can we make sure our organizational structure reflects the company’s data goals of agility and quality?

In this article, we’ll share with you: 

1 – The governance challenges of a self-service data platform

2 – Designing a hybrid organization to scale innovation with quality guarantees

3 – Implementation and impact

The focus of the data team over time

Part 1 : The governance challenges of a self-service data platform

Quality controls vs. low friction

The goal of a data platform is to foster innovation with a user-friendly and efficient way to find, use, add and create data.

It needs to do so while limiting any risk that could come from data, whether that’s the quality of contributions and metadata, the security and protection of personal and confidential data, the integrity of the data processing, or the processing costs.

In our platform, we decided to provide a very low friction experience to add tables to our data pipelines. This resulted in many tables created – in April 2021, we had 35K non-raw tables created by 500 contributors across the company’s squads – and many opportunities seized thanks to this data. 

However, while we didn’t want to impose strict quality rules on all contributions, we did expect the key data to follow our high-quality standards and tools, to control risks and improve data consumers’ efficiency.

The main complexity lay in defining and assigning those quality responsibilities to reach our goals, without creating unnecessary bureaucracy.

Data quality ownership

As explained in a previous article, the data platform team successfully kicked off a data quality initiative by creating a layer of quality data (“core datasets”), new libraries to structure the code and documentation, and tools to alert on quality-check anomalies.
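To give a feel for what such checks look like, here is a minimal, hypothetical sketch: each check validates one property of a dataset and returns a pass/fail result, and failing results would feed an alerting channel. The check names, row structure and thresholds are our own illustration, not the platform’s actual tooling.

```python
# Hypothetical sketch of quality checks on a dataset: each check
# returns a result dict, and failed results ("anomalies") would be
# routed to an alerting channel. Names and data are illustrative.
def check_not_null(rows, column):
    nulls = sum(1 for r in rows if r.get(column) is None)
    return {"check": f"not_null:{column}", "passed": nulls == 0,
            "failures": nulls}

def check_row_count(rows, min_rows):
    return {"check": "row_count", "passed": len(rows) >= min_rows,
            "failures": max(0, min_rows - len(rows))}

def run_checks(rows, checks):
    results = [check(rows) for check in checks]
    anomalies = [r for r in results if not r["passed"]]
    return results, anomalies  # anomalies would trigger an alert

rows = [{"customer_id": "a1", "limit": 500},
        {"customer_id": None, "limit": 800}]
_, anomalies = run_checks(rows, [
    lambda r: check_not_null(r, "customer_id"),
    lambda r: check_row_count(r, 1),
])
print([a["check"] for a in anomalies])  # → ['not_null:customer_id']
```

The point is less the checks themselves than the contract: a dataset declares its expectations, and the platform surfaces any violation to its owners automatically.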

To lead by example, the central team created the first “core datasets” – half a year later, 75% of all datasets were built on top of those high-quality datasets. However, the central team aimed to focus on platform tooling and hand over ownership of the high-quality datasets to the business units.

The Data BU had neither the domain expertise nor the capacity to curate and maintain all the key datasets, while most cross-functional BUs focused on new applications and analyses for their products rather than on data quality work.

We had to make sure the responsibility for data quality implementation would not fall through the cracks of the organization.

Data impact awareness

As the platform was self-service, most teams owned some tables, most often created by analysts and data scientists. However, teams did not necessarily realize the impact their data could have on others, how much they depended on other teams’ datasets, or how much time was spent on data crunching.

We understood we needed to provide more visibility on the responsibilities and risks at stake when creating or using a dataset.

We created a data governance visibility dashboard that showed, for each team or business unit, the main data inventory metrics and their main upstream and downstream interdependencies.

We also gave recommendations on what to do: which datasets to maintain, create, review, add documentation to, and so on. Moreover, we ran surveys showing that the time spent finding data kept increasing as the data inventory grew in volume and complexity.
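As a rough sketch of how such interdependency metrics can be derived: if each dataset records its owning team and its direct upstream datasets, team-level dependencies follow from a single pass over that graph. The team and dataset names below are invented for illustration, not taken from our actual catalog.

```python
from collections import defaultdict

# Hypothetical sketch: given each dataset's owning team and its direct
# upstream datasets, compute which teams a team depends on (upstream)
# and which teams depend on it (downstream).
def team_interdependencies(owners, upstreams):
    """owners: {dataset: team}; upstreams: {dataset: [dataset, ...]}"""
    up = defaultdict(set)    # team -> teams it depends on
    down = defaultdict(set)  # team -> teams that depend on it
    for dataset, deps in upstreams.items():
        consumer = owners[dataset]
        for dep in deps:
            producer = owners[dep]
            if producer != consumer:  # ignore within-team dependencies
                up[consumer].add(producer)
                down[producer].add(consumer)
    return up, down

owners = {"cards_core": "cards", "loans_core": "loans",
          "marketing_emails": "marketing"}
upstreams = {"marketing_emails": ["cards_core", "loans_core"],
             "cards_core": [], "loans_core": []}
up, down = team_interdependencies(owners, upstreams)
print(sorted(up["marketing"]))  # → ['cards', 'loans']
print(sorted(down["cards"]))    # → ['marketing']
```

Aggregates like these make the conversation concrete: a team can see at a glance that, say, three other BUs break if its core table degrades.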

As the importance of data became more measurable and visible, it became clearer that we needed an accountable team to implement those data governance recommendations and design a coherent data strategy with each business unit.

Extract of our data governance visibility dashboard 

Part 2: A hybrid organization to scale innovation with quality guarantees

The central data platform team

Our data platform team focuses on providing the infrastructure, tools and rules that enable consumption and production of data in the most effective and safe way, rather than creating datasets for consumers. It is accountable for the platform objectives of user satisfaction, data quality, integrity, privacy and costs.

We considered scaling the team focused on data quality and strategy within the central data team; however, we opted for a distributed organization for two main reasons:

  • First, the company has a culture of multi-functional teams, which has proven very effective for team motivation and delivery.
  • Second, the diversity and complexity of our financial products increased the importance of domain expertise.

The Analytics Engineering function

That’s where the Analytics Engineer chapter (i.e. function) comes into play: its main responsibility is to define a data strategy and to create and maintain datasets and pipelines, making sure they are robust, accurate, compliant with data protection, cost-efficient and well documented.

The Analytics Engineers are embedded in cross-functional teams but report primarily to the chapter. They are accountable for delivering value to the squads they are embedded in, but also for the quality of data for the company as a whole.

Not only do they have the skills to translate a business need into a data strategy with deliverables, they also have the engineering skills to create robust code, with testing. They understand the intricacies of data pipelines, the impact of dependencies between datasets as well as database architecture and modeling concepts.

Keeping the chapter close-knit while being integrated in multi-functional teams enables us to define a coherent data strategy and modeling across the company.

One of the key aspects of guaranteeing quality of the data throughout the company, and subsequently on all the decisions made on top of it, was to scale the Analytics Engineering chapter into all the different business units.

Simplified representation of our organisation: in practice, we have ~50 Analytics Engineers and ~60 Data Platform Team members and we keep growing

Shared objectives and responsibilities

The data platform central team and the Analytics Engineering chapter have common goals and shared responsibilities to reach them, formalized in a data governance policy.


In each business unit, the Analytics Engineering team makes sure there is a good balance between prioritizing creating new data for new applications, and having the right quality guarantees on the data that this team owns.

The contribution process and rules are also coordinated by both teams, which evolve the rules as the company changes.

We kept the general contribution rules to a minimum and enforced stricter rules for certain kinds of datasets, such as the high-quality layer of core datasets. In this case, the analytics engineers conduct a more thorough review covering business logic, code optimization, documentation of the datasets and their columns, etc.
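Part of such a review rule can be automated in the contribution pipeline. The sketch below is a hypothetical illustration of the idea, not our actual tooling: documentation gaps block a contribution to the core layer but are merely reported for ordinary datasets.

```python
# Hypothetical sketch of a tiered contribution rule: core datasets must
# document the table and every column; for other datasets the same
# issues are surfaced as warnings only. Field names are illustrative.
def review_contribution(dataset):
    issues = []
    if not dataset.get("description"):
        issues.append("missing dataset description")
    for col in dataset.get("columns", []):
        if not col.get("description"):
            issues.append(f"column '{col['name']}' is undocumented")
    blocking = dataset.get("layer") == "core"  # stricter rules for core
    return {"issues": issues, "blocking": blocking and bool(issues)}

core = {"name": "customers_core", "layer": "core",
        "description": "One row per customer.",
        "columns": [{"name": "customer_id", "description": "Primary key"},
                    {"name": "signup_at", "description": None}]}
result = review_contribution(core)
print(result["blocking"], result["issues"])
# → True ["column 'signup_at' is undocumented"]
```

Automating the mechanical part of the review leaves the analytics engineers free to focus on what only they can judge: the business logic and the modeling.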

Part 3: Implementation and impact

Explaining the role and impact

Analytics Engineers’ work is not necessarily as well known as that of other functions like Business Analysts or Software Engineers. So a key part of increasing data responsibilities in each business unit was to explain the importance of that work, create incentives to prioritize it, and hire Analytics Engineers.

By the end of 2020, we had successfully deployed Analytics Engineers in various multi-functional teams. Together with the different business units, we planned to increase analytics engineering coverage in 2021 by doubling our team to over 70 AEs.

The impact is higher-quality data, with more structure, metadata, documentation and verified calculations. This translates into more efficiency, lower risk and more collaboration for the users who consume this curated data. Moreover, we can design and implement a coherent global data strategy, ultimately resulting in more agility to create innovative products and to make data-driven decisions reliably.

A successful implementation example

The Marketing business unit sends personalized communications to engage our customers with relevant content. Marketing analysts, data scientists, communicators and designers are all involved.

Our customers’ data is key to making sure the models built by the data scientists and the analyses by our marketing analysts are accurate. The analytics engineers in the marketing team are responsible for designing the architecture of the tables, naming and calculating the metrics, documenting and classifying data, making sure any personal data is safely treated, and maintaining the dataset so the whole company can build upon it too.

Being embedded in the BU has enabled the analytics engineers to suggest well-adapted solutions with full context, while being part of the chapter organization ensures that the reliability and quality of the data are taken into account and the processing of personal data is minimized.

The project kicked off successfully, and many other teams now have access to the source of truth for marketing communications.

Conclusion

As it scales, a data platform needs to adapt not only its technology stack but also the people organisation around it: how teams consume and publish data, and who is accountable for quality, integrity, maintenance, costs, user experience, etc. Over time, the governance system adapts to the company’s stage, organisational structure and business goals.

After focusing our efforts on making the data platform frictionless and easy to use, we created a hybrid data organisation to reach our data governance goals: a central data platform team responsible for the infrastructure, tooling and visibility, and an analytics engineering team, embedded in the business units, accountable for data quality, privacy and strategy.

One of the challenges is making sure all the BUs have the incentives to hire those Analytics Engineers into their teams, and collaborate on defining their roadmap. So far, visibility, tooling and successful cases in many BUs have convinced the others to scale those capabilities, and we are excited about growing the team across the whole organization! 

Many credits to the teams of the Data BU as well as the Analytics Engineering function!

