Global expansion of Machine Learning Models: how to distribute them as software products?

How Software Engineering has helped us to abstract the domain of Customer Experience in the context of Machine Learning to easily distribute models.


Written by: Ricardo Ocampo, Carlos Pegueros and Gabriel Ferreira

At Nu, we have a customer-centric culture: we seek to give customers a delightful experience at every step of their journey with us. Every aspect of our products is designed to be as simple and straightforward as possible, with the goal of empowering customers and making them love us fanatically 💜. As you would expect, we take Customer Support very seriously: incredible customer service is one of the pillars of Nu and has already earned us several awards.

Customer Service depends on numerous systems working behind the scenes to provide the fantastic experience that our customers deserve and expect.

Are you curious about these systems and how challenging it is to apply them to new geographies? Keep reading this article! 

Ticket routing

Currently, Nu has three main channels where customers can contact us: phone, email and chat. Every time a customer contacts us via any of those channels, we have the task of directing that customer to the most appropriate team to answer their question. We call this process “routing”.

Good routing matters because it lets us serve customers better: they are answered by people who specialize in the subject of their question, and they don’t have to wait while being transferred between teams (which can be very annoying). In general, when this process works well, service is faster and customer satisfaction is higher.

If you want to go deeper into how this system works at Nu, don’t miss this other article.

Automatic replies

Whenever customers contact us by chat or email, we want to provide thoughtful and fast answers to simple and common questions. This helps ensure that customers don’t have to wait long to have their questions answered. We trigger this kind of message when our systems determine that a pre-defined, personalized message could be enough to clarify the issue at hand. Our focus, again, is always on customer satisfaction: solving the customer’s problem comes first. That’s why we add no friction when the customer wants to talk to an agent.

The two systems mentioned above (Routing and Auto-reply) are powered by Machine Learning (ML) models. The models receive customer inputs (i.e., what they typed in the chat or in the email body) and return a topic classification, which identifies the subject the customer is talking about. Our systems then use that output to decide whether to send an automatic reply or route the customer to an agent.
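The decision described above can be sketched roughly as follows. This is a simplified illustration and not Nu’s actual implementation: it collapses routing and auto-reply into a single prediction, and the `TopicPrediction` type, the confidence threshold, the topic names, the reply text and the team names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TopicPrediction:
    """Hypothetical model output: a topic plus a confidence score."""
    topic: str
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical mappings; real topics, replies and teams are not in the article.
AUTO_REPLIES: Dict[str, str] = {
    "card_delivery_status": "Your card is on its way. You can track it in the app.",
}
TEAM_BY_TOPIC: Dict[str, str] = {
    "card_delivery_status": "logistics",
    "disputed_charge": "disputes",
}
DEFAULT_TEAM = "general_support"
AUTO_REPLY_THRESHOLD = 0.9  # assumed cut-off, chosen only for illustration


def decide(prediction: TopicPrediction, wants_agent: bool = False) -> Dict[str, str]:
    """Send an automatic reply when one is likely to help, otherwise route."""
    reply = AUTO_REPLIES.get(prediction.topic)
    confident = prediction.confidence >= AUTO_REPLY_THRESHOLD
    if reply is not None and confident and not wants_agent:
        return {"action": "auto_reply", "message": reply}
    # No friction: anything else goes to the most appropriate team.
    team = TEAM_BY_TOPIC.get(prediction.topic, DEFAULT_TEAM)
    return {"action": "route", "team": team}
```

Under these assumptions, a confident `card_delivery_status` prediction produces an automatic reply, while the same prediction with `wants_agent=True` is routed to a team instead.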

Figure: Models’ outputs and possible outcomes

Global complexity

Currently, Nu operates in Brazil, Colombia and Mexico and has over 70M customers. When we internationalize these systems, the decision-making systems and microservices are usually the simpler parts to adapt and reuse. The Machine Learning models, however, can become a bottleneck, as it’s not easy to take a model created in one country and reuse it directly in another.

This difficulty in reusing models happens mainly due to:

  • Data sharing limitations related to each country’s regulations;
  • Differences in language across countries (e.g.: Portuguese vs Spanish);
  • Even among Spanish-speaking countries, differences in vocabulary make reusing a model a non-trivial task;
  • Differences in the “stage of operations” in different countries:
    • products available (e.g.: credit card, savings account, investments);
    • maturity (e.g.: volume of tickets, size of the customer service team).

In the rest of this article, we’ll explain in more detail the reusability problem of Machine Learning models, how it affects scalability, and how we are solving it at Nu.

Facing Scalability Problems

In the beginning, the team focused on fast iterations and acquired some technical debt along the way. As a result, each combination of downstream task (ticket routing and ticket auto-reply) and channel (email, chat, phone) had a model implemented with its own code.

Having different implementations of the same problem caused scalability issues that led not only to harder code maintenance but also to duplicated work across the team. Because each downstream task and channel had its own model, we needed to assign more people to maintain them as we expanded to Mexico and Colombia.

But all was not lost, because Nubank recognizes the value that MLOps adds to the organization. By the time we started to tackle the scalability issues, a framework called Sheep was already in place. The framework handles the lifecycle of a model’s development, is model-agnostic, and provides the necessary tools to train and deploy models to our environments.

The framework has enabled us to perform lots of experiments and embrace the quick-iterating nature of Data Science.

By the time we started operations in Mexico, Brazil’s customer experience team already had around 6 independent models covering ticket routing and auto-reply across the three channels. We followed the same pattern and added another 2 in Mexico (chat routing and chat auto-reply). After deploying these models, we started to run into issues caused by code duplication. For example, if we found a bug in the ticket routing model, we also had to fix it in the auto-reply model. Worse, sometimes a bug was fixed in only one of the code bases, and they started to diverge, which made them very difficult to maintain.

Besides that, the amount of duplicated code was alarming. The preprocessing, training, inference and even the unit tests were almost identical; the only differences were the target and parts of the preprocessing. Debugging was very time-consuming because, even though the models solved similar problems, each person who implemented a model added their own flavor: naming conventions, function definitions, their scope and their implementation differed across models. This made even Pull Requests very hard to review.

Fast forward to today, and it seems that treating models as completely independent from each other, as Sheep’s original design did, isn’t enough. What helps is an intermediate layer of abstraction that treats not only the code and structure of the models, but whole domains, as black boxes.

How did we solve this?

The problem can be generalized as text classification applied to different domains or downstream tasks. For example: (1) for ticket routing, the input is the customer’s description of the problem they are facing (text) and the output is the specialized team that can help them; (2) for ticket auto-reply, the input is the same and the output is the response that attempts to answer their question.

With all these considerations, we decided to tackle this problem focusing on the 4 key principles of how we build Machine Learning models at Nubank, plus 2 new additions:

  1. Validation should reflect real-life situations;
  2. Production models should match validated models;
  3. Models should be production-ready with few extra steps;
  4. Reproducibility and in-depth analysis of model results should be easy to achieve;
  5. Don’t Repeat Yourself (new);
  6. Small changes are preferable to large changes (new).

So, we decided to build a library to centralize all the common code across these models, following a config-driven approach composed of configurable steps and pipelines.

Why config-driven?

Because it allows us to expose models via a declarative configuration framework which comes with two advantages: configuration files don’t need to be tested (apart from making sure the configurations are valid) and the less code we expose to our users, the less room for bugs we leave.

Why steps and pipelines?

Because machine learning models can be well understood as a series of operations (a.k.a. steps) chained together in pipelines that make up a model. This approach has two constraints:

  • To make the framework extensible, pipelines must be treated as a particular case of a step to make sure they can be composed if necessary;
  • Care must be taken when deciding what makes it into the library (i.e., what gets its own “step”). We want the library to be versatile and extensible, but only up to the point where we don’t sacrifice simplicity by overfitting to specific use cases that don’t add value domain-wise. Maintainers of the library play the role of gatekeepers, which is key to keeping the library aligned with the main goal of reducing the maintenance costs of the models in production.
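The first constraint, treating a pipeline as a particular case of a step, is essentially the composite pattern. The sketch below shows one way it could look. The class names (`Step`, `Pipeline`, `Lowercase`, `Tokenize`, `RemoveStopwords`) and the `apply` method are hypothetical and are not taken from Nu’s library or from Sheep.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List, Sequence


class Step(ABC):
    """A single operation that transforms its input."""

    @abstractmethod
    def apply(self, data: Any) -> Any:
        ...


class Lowercase(Step):
    def apply(self, data: str) -> str:
        return data.lower()


class Tokenize(Step):
    def apply(self, data: str) -> List[str]:
        return data.split()


class RemoveStopwords(Step):
    def __init__(self, stopwords: Iterable[str]) -> None:
        self.stopwords = frozenset(stopwords)

    def apply(self, data: List[str]) -> List[str]:
        return [token for token in data if token not in self.stopwords]


class Pipeline(Step):
    """A pipeline is itself a Step, so pipelines can be nested inside pipelines."""

    def __init__(self, steps: Sequence[Step]) -> None:
        self.steps = tuple(steps)

    def apply(self, data: Any) -> Any:
        for step in self.steps:
            data = step.apply(data)
        return data


# A pipeline reused as a single step inside a larger pipeline.
normalize = Pipeline([Lowercase(), Tokenize()])
preprocess = Pipeline([normalize, RemoveStopwords({"mi", "no"})])
tokens = preprocess.apply("Mi tarjeta NO llegó")
```

Because `Pipeline` subclasses `Step`, `normalize` can be dropped into `preprocess` without any special handling, which is what makes composition possible.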

Because the library also had to play well with our current infrastructure, the declarative API was built on top of pydantic (a library that provides data validation and settings management using Python type annotations). Pydantic offers three core benefits that make it a perfect bridge between our philosophy and Sheep:

  1. Pydantic instances are easy to serialize, as they convert directly to and from JSON;
  2. Pydantic comes with a powerful validation framework out-of-the-box;
  3. Pydantic can enforce the immutability of its instances (models are mutable by default, but can be declared frozen).

With these 3 ideas in mind, you can picture the library as an abstraction framework to enclose Learner pipelines (the abstraction for models we use at Nubank) into immutable JSON-serializable configurations that provide completeness, in the sense that they are enough to instantiate a model.

In practice, these pipelines are the black boxes that contain the domain knowledge of Customer Excellence models, and they can be understood as serializable factories of Learner pipelines. Because these factories are immutable, reproducibility is guaranteed: two factories with the same configuration will always create the same model. The configurations are also validated upon creation, so that anything that can be predicted to fail does so up front.
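The three pydantic properties this design relies on (JSON serialization, validation on creation, and immutability) can be illustrated with a small configuration model. This is a minimal sketch assuming the pydantic v2 API; the class names, fields and allowed values are hypothetical and do not reflect the actual configuration schema of Nu’s library.

```python
from typing import List, Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class PreprocessingConfig(BaseModel):
    """Hypothetical preprocessing options."""

    model_config = ConfigDict(frozen=True)  # instances reject attribute assignment

    lowercase: bool = True
    max_tokens: int = Field(default=128, gt=0)


class TextClassifierConfig(BaseModel):
    """Hypothetical declarative description of one model."""

    model_config = ConfigDict(frozen=True)

    task: Literal["routing", "auto_reply"]
    channel: Literal["chat", "email", "phone"]
    country: Literal["BR", "CO", "MX"]
    labels: List[str] = Field(min_length=2)  # a classifier needs at least two classes
    preprocessing: PreprocessingConfig = PreprocessingConfig()


config = TextClassifierConfig(
    task="routing", channel="chat", country="MX", labels=["cards", "payments"]
)

# 1. Serialization: the config round-trips through JSON without loss.
payload = config.model_dump_json()
restored = TextClassifierConfig.model_validate_json(payload)

# 2. Validation: an unsupported channel is rejected when the config is created.
try:
    TextClassifierConfig(
        task="routing", channel="fax", country="MX", labels=["cards", "payments"]
    )
    rejected_invalid = False
except ValidationError:
    rejected_invalid = True

# 3. Immutability: a frozen instance cannot be changed after creation.
try:
    config.country = "CO"
    rejected_mutation = False
except ValidationError:
    rejected_mutation = True
```

In this sketch, `restored` compares equal to `config`, which is the property that lets the same JSON always describe the same model.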

Once a factory is serialized, distributing models to be trained and deployed for specific tasks is just a matter of distributing JSON documents and plugging them into our infrastructure. They are passed to Sheep along with the appropriate parser functions, built into the library, that know how to turn them into Customer Excellence models.
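To make the idea of a parser function concrete, here is a minimal sketch that turns a JSON document into a runnable pipeline using a registry of step factories. Everything in it is an assumption made for illustration: the JSON layout, the `register` decorator, the step names and `parse_pipeline` are hypothetical, and the sketch does not show how Sheep or the library actually consume configurations.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping step names to factories that build callables.
STEP_REGISTRY: Dict[str, Callable[..., Callable[[Any], Any]]] = {}


def register(name: str):
    """Register a step factory under the name used in the JSON config."""
    def decorator(factory):
        STEP_REGISTRY[name] = factory
        return factory
    return decorator


@register("lowercase")
def make_lowercase() -> Callable[[str], str]:
    return str.lower


@register("truncate")
def make_truncate(max_chars: int) -> Callable[[str], str]:
    return lambda text: text[:max_chars]


def parse_pipeline(payload: str) -> Callable[[Any], Any]:
    """Turn a JSON document into a pipeline, failing early on unknown steps."""
    config = json.loads(payload)
    steps: List[Callable[[Any], Any]] = []
    for spec in config["steps"]:
        name = spec["name"]
        if name not in STEP_REGISTRY:
            raise ValueError(f"unknown step: {name!r}")
        steps.append(STEP_REGISTRY[name](**spec.get("params", {})))

    def run(data: Any) -> Any:
        for step in steps:
            data = step(data)
        return data

    return run


# The same library code serves any task, channel or country; only the JSON changes.
chat_routing_json = (
    '{"steps": [{"name": "lowercase"},'
    ' {"name": "truncate", "params": {"max_chars": 12}}]}'
)
pipeline = parse_pipeline(chat_routing_json)
result = pipeline("Mi Tarjeta No Llegó")
```

An unknown step name raises an error while the JSON is being parsed, before any data is processed, in line with the goal of failing as early as possible.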

Centralizing the code has made it easy to distribute models across different use cases, and it improves maintenance while keeping the models clean. It also allows faster code reviews, easier validation and fewer merge conflicts, at the cost of some flexibility. Finding the sweet spot for how much flexibility is worth giving up took several months of trial and error with different levels of abstraction. That process taught us a lot about the Customer Excellence domain and, even though there is always room for improvement, the benefits were visible as early as the prototyping stage.

The Success Story

When the first version of the library was released, it was used in 4 different models in Mexico that were deployed to production. Shortly after, Nubank decided to start integrating similar models in Colombia. Even with all the uncertainty and low headcount that come with opening a business in a new territory, we were able to launch 2 different models quickly, and with such a low maintenance cost that they could be seamlessly managed by the existing team. We are currently working with the Brazil operations team to test the solution there.

Today, this adds up to 6 different models, covering Routing and Auto-reply (for Chat and Email), deployed in Mexico and Colombia using the library. Most surprisingly, all of them use exactly the same code (thanks to the library); the only differences are the specific configuration and the underlying data, which vary according to the downstream task, channel and country.

This has reduced code duplication by 85% and, as our models are mostly composed of declarative configuration files, spreading a new feature (or a bugfix) to all of our models is just a matter of upgrading the version of the library.

This ability to quickly pay down technical debt or propagate big changes without writing much code reduces costs by cutting the engineering hours dedicated to maintenance. It also gives the team time to improve our data and to experiment with the new ideas emerging in the fast-paced field of Natural Language Processing, which has been key to delivering what has been a signature of Nubank since its creation: great customer service.
