There is a book by three economists that I really like called Prediction Machines. They nailed what machine learning does and why it is so valuable. In simple terms, ML provides us with cheap predictions.
This might not sound very impressive, but predictions show up in more places than we might realize. Prediction is not only about the future (although it can be, for example, when you predict future demand to manage your storage); it’s also about generating information about the present and the past.
This happens when a model flags a credit card transaction as fraudulent, classifies a tumor in an image as malignant, or decides whether the person holding an iPhone is its owner.
It’s no different at Nubank.
Prediction Machines at Nubank
We use prediction machines to predict the topic of a customer’s email message, allowing us to better route them to the most adequate customer support specialist; we predict if someone will default on a loan before giving them credit; and, of course, we use prediction to figure out if a credit card transaction is fraudulent.
Knowing that ML is, essentially, prediction also allows us to better communicate with other teams. We can explain that, if a business problem can be framed as a prediction problem, we can probably solve it. Not only that, but when we explain ML as this superpower of prediction, non-technical leaders can better allocate data scientists, without getting caught up in the hype that we can do all sorts of magical miracles.
When all you have is a Hammer…
Seeing ML through the lens of prediction is a powerful limitation. It binds us to a specific set of problems and makes us, as data scientists, specialists in them. But it is, nonetheless, a limitation. As it turns out, a modern company has tons of problems that can’t be framed as prediction tasks (I suspect the majority of them).
A lot of the questions a company like Nubank needs to answer involve tricky decision-making optimization: we need to choose, from a set of possible options, the best action to take. What credit limit should I give to each customer in order to maximize long-term value (NPV)? What interest rate should I charge on a loan? How many marketing emails should I send so that I can maximize conversion without pissing people off?
Generally speaking, these are decision-making problems where you have a bunch of options to choose from and you want to know which one is the best. Often, you have some sort of business metric that you can’t directly control (sales, # of customers, PNL...), but that you can influence through some sort of lever (advertisement, price, customer service). The question then becomes how you should set this lever.
At first, we tried to answer those questions with what we were good at: predictive models. But we quickly realized that not only did that approach not work, it also made the problem harder. As our Chief Data Officer once told me: “when all you have is a hammer, everything starts to look like a thumb”.
Attracting more customers
To see why that is, consider a problem every tech company has: attracting more customers.
Here, let’s say you have some costly marketing strategy, like giving a coupon to new users. Some people will use the coupon and then become long-term customers (user-1 in the image), but some will just use the coupon and churn right away (user-2).
The obvious question is then: who should you send the coupons to? What is not so obvious is how to answer this question. One might think that we can use a predictive ML model here. OK. Let’s say we do. Let’s say we create a super accurate model that predicts who will convert and who will churn. We use a bunch of features and boosted trees, do the cross-validation correctly, and manage to get a model that predicts, with very high AUC, who will convert.
So, for each new customer, your model gives a number that tells who is more likely to convert. Now what? You still haven’t answered the original question: who should you give coupons to?
Should you give them to the customers that are likely to convert? But what if they will convert anyway, why waste money giving them a coupon? Should you then give them to those with low propensity to convert? But what if they won’t convert regardless, even with the coupons? Then you are just wasting money again!
If it looks hard to answer this question, that’s because your model is not predicting what you want. Your model is very good at distinguishing those who will convert from those who won’t. But what you really want is to distinguish those whose conversion will increase the most with a coupon from those whose conversion will increase the least.
In other words, you don’t care about conversion probabilities, you care about how they change with coupons. To make this clear, let’s walk through an example.
Prediction machines: which customer would convert?
Let’s say you have two types of customer. The first type (green) has a higher baseline conversion and doesn’t need a ton of coupons to convert, while the second type (blue) has a lower baseline conversion and will only convert with more coupons (or higher-valued coupons, depending on how you frame the problem).
Your model predicts conversion, so, if you use it to optimize your coupon strategy, say, by giving them to those with higher conversion, you will be giving them mostly to customers who don’t need a ton of coupons to convert (green).
What if you give them instead to those with lower conversion probabilities? In this case, you will target those who in fact need more coupons to convert, so you would be doing the correct thing.
But notice how this is accidental. Here, it turns out the type with lower baseline conversion also needs more coupons to convert, but you could very well have a situation where the type with lower conversion (blue) needs fewer coupons to convert, as in the following image.
In this second case, the type with higher baseline conversion (green) needs more coupons to move the needle, while the type with lower conversion (blue) needs fewer coupons to be happy.
The key takeaway here is that conversion does not necessarily tell you anything about delta conversion. Or, in other words, a customer’s conversion probability does not necessarily tell you how much that customer will respond to more or fewer coupons.
As a consequence, using predictions for an optimization task like the above can be suboptimal at best and very costly at worst. Prediction models are fascinating tools, but that doesn’t mean they are suitable for every task.
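To make the two scenarios concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the logistic shape of the conversion curve, the `baseline` and `sensitivity` parameters, and their values are made up, and the customer types are modeled by how strongly they react to one extra coupon rather than by the exact curves in the images.

```python
import math

def conversion(coupons, baseline, sensitivity):
    """Toy conversion probability: a logistic curve in the number of coupons."""
    return 1.0 / (1.0 + math.exp(-(baseline + sensitivity * coupons)))

def uplift(params, coupons=1):
    """Change in conversion when going from 0 coupons to `coupons`."""
    return conversion(coupons, **params) - conversion(0, **params)

# Made-up parameters on the log-odds scale; not estimated from any real data.
scenario_1 = {
    "green": {"baseline": 0.5, "sensitivity": 0.1},   # high baseline, barely reacts
    "blue": {"baseline": -1.5, "sensitivity": 0.8},   # low baseline, reacts a lot
}
scenario_2 = {
    "green": {"baseline": 0.5, "sensitivity": 0.8},   # high baseline, reacts a lot
    "blue": {"baseline": -1.5, "sensitivity": 0.1},   # low baseline, barely reacts
}

def rank_by_prediction(customers):
    """Order customer types by predicted conversion with no coupon, highest first."""
    return sorted(customers, key=lambda name: conversion(0, **customers[name]), reverse=True)

def rank_by_uplift(customers):
    """Order customer types by how much one coupon changes conversion, highest first."""
    return sorted(customers, key=lambda name: uplift(customers[name]), reverse=True)

for label, customers in [("scenario 1", scenario_1), ("scenario 2", scenario_2)]:
    print(label, "| by prediction:", rank_by_prediction(customers),
          "| by uplift:", rank_by_uplift(customers))
```

In the first toy scenario the two rankings are reversed, and in the second they agree. The predicted conversion is the same in both, so it cannot tell you which world you are in.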
Causal inference to the rescue
Fortunately, you don’t need to go on hammering your thumb. I’m here to show you a more fruitful path. The trick here is realizing that you don’t need to estimate conversion. Instead, you want to estimate the derivative of conversion or marginal conversion, as economists would call it.
Marginal conversion tells you how much conversion will increase given a small (unit) increase in coupons. This is great for optimization because it shows where you can get more bang for your buck.
For instance, the point where the derivative is maximized corresponds to the point in the curve where conversion is the steepest. There, an increase in coupons will give a huge increase in conversion.
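As a rough illustration of marginal conversion, the sketch below approximates the derivative of a conversion curve with a finite difference and looks for the coupon level where it is largest. The logistic curve and its parameters (`baseline=-3.0`, `sensitivity=1.0`) are hypothetical, chosen only so the example has a known answer. In practice the curve itself is not known.

```python
import math

def conversion(coupons, baseline=-3.0, sensitivity=1.0):
    """Hypothetical conversion curve (logistic); the parameters are made up."""
    return 1.0 / (1.0 + math.exp(-(baseline + sensitivity * coupons)))

def marginal_conversion(coupons, step=1e-4):
    """Central finite-difference approximation of d(conversion)/d(coupons)."""
    return (conversion(coupons + step) - conversion(coupons - step)) / (2 * step)

# Evaluate the derivative on a grid of coupon levels from 0.0 to 6.0.
grid = [i / 10 for i in range(0, 61)]
steepest = max(grid, key=marginal_conversion)

print("conversion is steepest at", steepest, "coupons")
print("marginal conversion there is about", round(marginal_conversion(steepest), 4))
```

For this particular curve the steepest point is where conversion crosses 50%, so an extra coupon buys the most conversion there and progressively less on either side.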
There is only one little problem. It is not like people go out with sensitivity tattooed on their foreheads. As it turns out, this is not an observable quantity. While we DO observe if people converted or not given a coupon quantity, we DON’T observe how sensitive they are to coupons. This is what we call the fundamental problem of causal inference.
We can never see the same person under two different treatment regimes (coupons here). As a consequence, we can’t solve this problem with a predictive model where we plug in X as the features and Y as the outcome, simply because the Y that we want, that is, sensitivity, is not observable.
For that, we turn to causal inference, which is precisely the technique we can use to estimate sensitivity (also called the treatment effect) and answer what-if questions. Causal Inference appears at the top of the list of most important statistical ideas in the last 50 years, which is well deserved.
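One way to see both the problem and a way around it is a small simulation. The sketch below is a toy example under stated assumptions, not a description of how this is done at Nubank: the customers and their conversion probabilities are simulated, the coupon is assigned by a coin flip, and the estimator is a plain difference in means between the two groups. Inside the simulation we can cheat and look at both potential outcomes per customer; with real data only one of them is ever observed.

```python
import random

random.seed(0)

def simulate_customer():
    """Return both potential outcomes (without coupon, with coupon) for one made-up customer."""
    p_without = random.uniform(0.1, 0.4)            # chance to convert with no coupon
    p_with = p_without + random.uniform(0.0, 0.3)   # in this toy world a coupon never hurts
    noise = random.random()                         # shared noise couples the two outcomes
    return int(noise < p_without), int(noise < p_with)

n = 20_000
customers = [simulate_customer() for _ in range(n)]

# Only possible in a simulation: the true average treatment effect,
# computed from both potential outcomes of every customer.
true_ate = sum(y1 - y0 for y0, y1 in customers) / n

# What we would see in reality: each customer gets a coupon or not at random,
# and we observe only the outcome under the treatment they received.
treated, control = [], []
for y0, y1 in customers:
    if random.random() < 0.5:
        treated.append(y1)
    else:
        control.append(y0)

estimated_ate = sum(treated) / len(treated) - sum(control) / len(control)

print("true average effect:     ", round(true_ate, 3))
print("estimated average effect:", round(estimated_ate, 3))
```

Because assignment is random, the two groups are comparable on average, so the difference in their conversion rates recovers the average effect of the coupon even though no individual sensitivity is ever observed.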
Causal inference: answering what-if questions
We have all heard that correlation doesn’t imply causation. Causal inference helps us say what does. It is a fundamental concept in epidemiology and econometrics, but it is now becoming widespread in the Data Science field. I guess that is because it solves a huge class of problems that traditional ML struggles with.
Causal inference answers what would happen if I do this instead of that, which is a key thing to know in every action-taking or personalization question we need to answer.
Learning from data what would happen under the different decisions we could make is a need in every one of Nubank’s business units, which is why we research causal inference so hard. We need to know how the spend level of a particular type of customer will change given a limit increase and whether that will compensate for the associated risk increase.
We need to know which type of customers will respond better to a Lending cross-sell email versus an investment email. We care a lot how customers respond to a change in interest rates. All of those are causal problems because they ask what-if questions.
Cracking them is much more involved than throwing everything into an ML pipeline and getting predictions. We need to craft well-designed randomized controlled trials, control for biases that trick us into finding causation where there is only correlation, and leverage natural experiments.
Causal inference is such a huge and interesting topic that I can’t cover everything in one post. But rest assured. As causal inference has become more and more relevant to the Data Science team at Nubank, you can bet we will be releasing more stuff on it. In the meantime, if you want to learn more about it, I’ve curated a list of interesting (and obviously biased towards Economics) resources. Enjoy!