Debanjan Mahata, Ph.D.
Senior Machine Learning Research Engineer
Bloomberg
For a long time, “machine learning” seemed to the average person like one of those techy buzzwords with a vague meaning and an even vaguer relevance. But in the few short months since AI jumped out of sci-fi movies and into our everyday lives, the technology behind it has taken its rightful place at the center of our existence, especially our work. To try to get a basic understanding of ML, we turned to an old friend, Debanjan Mahata, who earned his doctorate at UA Little Rock and now applies his knowledge at Bloomberg.
When you first got interested in machine learning, what was it about this particular field that grabbed you so much?
It started back when I was doing work for my graduate degree in India. At that time, machine learning was not as popular as it is now. I was looking for a good thesis topic that I could build on, and I was mostly interested in areas like computer networking and operating systems. What intrigued me about machine learning at the time was that it presented a lot of opportunities for me to work on and develop it. The more I read about it, the more I thought that machine learning was probably the right area for me, so I started exploring that field and have never looked back.
You studied with Dr. Nitin Agarwal at UA Little Rock, right?
Yes, for my initial two years. At that time, I was looking for a Ph.D. position, and he had been a student of Huan Liu at Arizona State University. After Dr. Agarwal graduated and moved to UA Little Rock, I approached his former adviser, who said, “Why don’t you go work with Dr. Agarwal? He has a new lab and he wants students.” So I moved to Arkansas and started my studies with Dr. Agarwal. I especially remember the social media mining course that he taught, but the course that was closest to machine learning was the data mining course taught by Dr. Ningning Wu. Later on, I completed my Ph.D. under the supervision of Dr. John Talburt, applying machine learning in the area of social media mining.
Well, tell me—why isn’t machine learning called machine teaching? Aren’t you teaching the machine?
Yes, we do teach machines by feeding them lots of data, and the machines learn from it. Data is certainly the new fuel.
The term “machine learning” was first coined in the 1950s by Arthur Samuel, who was a pioneer in the field of artificial intelligence, and that’s the term which got adopted by the community. What we do is produce algorithms to automate the process of helping the machine learn. In this field, a “machine” is more like a software application or program that learns from data. And just as we would teach any small child that, “This is A, this is B, and this is C; this is 1, this is 2, and this is 3,” we can show pictures of the different letters and digits and have the machine learn to distinguish between them. So, through these algorithms, the machine forms an internal latent representation of those pictures.
There’s another component to that part, which is known as a loss function. In a way, this is the teacher. It gives feedback to the machine. We start by getting the machine to randomly guess something. Say we show a picture of a cat and the machine randomly guesses that it’s something which is not a cat. We say, “Okay, this was wrong.” Then the machine takes this feedback and starts to learn about the actual representation, which is narrowed to the representation of a cat.
Similarly, when it starts learning and starts predicting in a correct way, we also give it feedback that this is correct. The idea is to optimize these loss functions in such a way that the machine can learn from the feedback and slowly converge toward the correct representation.
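As an editorial aside, the feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in any particular system: the "machine" is a single number `weight`, the loss function is mean squared error, and the function names, learning rate, and toy data are all hypothetical choices made for this example.

```python
import random

def loss(weight, examples):
    """Mean squared error: scores how wrong the current guesses are."""
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)

def gradient(weight, examples):
    """Slope of the loss with respect to the weight (the direction of the feedback)."""
    return sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)

def train(examples, steps=200, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    weight = rng.uniform(-1.0, 1.0)  # start from a random guess
    history = [loss(weight, examples)]
    for _ in range(steps):
        # Nudge the weight in the direction that lowers the loss.
        weight -= learning_rate * gradient(weight, examples)
        history.append(loss(weight, examples))
    return weight, history

# Toy data (an assumption for illustration): the hidden rule is y = 3 * x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
weight, history = train(examples)
print(f"learned weight: {weight:.4f}")
print(f"loss before: {history[0]:.4f}, loss after: {history[-1]:.8f}")
```

The loss starts high because the first guess is random, then shrinks as each round of feedback moves the weight closer to the rule hidden in the data.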
Is it very hard to do, very tedious? The machine learning that we know now can do all kinds of things, but how long has it taken to get to this stage?
It has taken some time. Many of the techniques that we use today were developed in the 1990s. By that, I am referring to the core underlying techniques and the math behind them. However, for those techniques to become effective, they require lots of data to learn from, and this data was not readily available at that time. When giant tech companies like Google, Meta (previously Facebook), Yahoo, and others started looking into attracting consumers, things like recommendation systems, marketing, and especially search generated a lot of revenue for them. So, they started thinking about how to make more money by understanding everything they could about consumer behavior. The route they took for this was to leverage machine learning by developing infrastructure that could not only generate data, but also store and process it, and make it easier for these machine learning algorithms to consume this data and learn from it.
That’s how these technologies got developed over the last decade. Now we’re in a position where we’re throwing all the data from the entire Internet into these machine learning algorithms, and you can see the impact. ChatGPT is one outcome.
For industry, the main motivation was obviously to make money. For academia, it was the problem of how to mimic how the human brain learns. What we currently know about the human brain is that there are these connections between neurons. So, when we learn some things, certain connections get triggered, and these connections get stronger over time as we learn more and practice something.
Neural networks, which are the most popular machine learning models today, are built following the same concept. All these neurons are connected with one another, and as these networks see more and more data, they learn from the data and some connections get strengthened, and other connections get loosened, depending on what you want to learn.
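As an editorial aside, the idea of connections getting stronger or weaker can be shown with a single artificial neuron. The sketch below uses the classic perceptron learning rule, which is far simpler than the training used for modern neural networks; the task (learning logical AND), the function names, and the number of passes are assumptions made for illustration.

```python
def predict(weights, bias, inputs):
    """The neuron fires (returns 1) when its weighted input is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20):
    weights = [0, 0]  # connection strengths, all neutral at the start
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # A wrong answer strengthens or weakens the connections
            # that were active; a right answer leaves them unchanged.
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias

# Toy task (an assumption): output 1 only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
for inputs, target in samples:
    print(inputs, "->", predict(weights, bias, inputs), "(expected", target, ")")
```

Real networks stack many such units and adjust the connections with calculus-based methods, but the principle of adjusting connection strengths in response to errors is the same.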
There’s a lot of math under the hood here. All the algorithms and all the mechanisms that we’ve seen in machine learning are taken from three major areas of mathematics. One is linear algebra, another is calculus, and the third is probability and statistical theory. We try to develop new algorithms that can learn better representations. To do that, a researcher or someone who wants to develop these new algorithms must have quite a good grasp of these mathematical fields. They also need to have a good grip on computer science, which gives one the ability to engineer these solutions, because you’re going to need to use computing infrastructure to train these models and deploy them.
I’ve heard the term prompt engineer—is that what you are?
No, I don’t do prompt engineering.
What’s the difference between what you do and prompt engineering? Prompt engineering is apparently a very hot job right now.
As you’ve said, prompt engineering is a new and quite lucrative area, and the entry barrier is quite low. You don’t have to have any kind of deep machine learning knowledge to be a prompt engineer.
Why do they get paid so much if they don’t have to know all this stuff?
The prompt engineering job became more prominent as large language models gained attention. The machine learning community has been studying language models for quite some time, and the major motivation behind these language models is for the machine to learn human language—the way we speak, the way we write, and so on. One of the fundamental ways language models learn is by being given a piece of a sentence and then asked to predict the other part or predict the next word.
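As an editorial aside, the "predict the next word" objective can be illustrated with a toy model that simply counts which word follows which. This is a deliberately simplified stand-in: large language models use neural networks rather than counting, and the corpus, function names, and tie-breaking behavior here are assumptions made for this sketch.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus (an assumption for illustration).
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))   # "cat" follows "the" most often in this corpus
print(predict_next(model, "sat"))
print(predict_next(model, "fish"))  # last word of the corpus, so nothing follows it
```

Scaling this idea from word counts on one sentence to neural networks trained on vast amounts of text is, loosely speaking, the step that separates this toy from the models discussed in the interview.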
So now people have played around with this kind of methodology and objectives and they’ve come up with various new objectives, and they’ve finally developed the technological innovations necessary to enable them to scale this process to massive amounts of data—all the data on the Internet, as I was saying earlier. So, these systems are capable of processing and learning from all this data so that the machine can mimic our writing and our speech.
This area is known as generative AI. What people do is deploy these models and provide them as a service, and then it’s necessary to “prompt” these models. When I say prompt, you can think of it as questioning these models and expecting the model to give back the right answer. Suppose you’re a writer and you have a piece of writing, and you prompted a large language model—for example, ChatGPT—and you said, “Check the grammar of this particular paragraph.” In this way, you’re interacting with this large language model by questioning it or asking it to do something. In layman’s terms, that’s what we call prompting.
So now this prompt is converted into a representation which the language model understands, and it triggers certain layers and representations in those layers within the language model, and it gives you back the answer, which could be a grammatically correct piece of text. This is prompting, and all the engineering behind it is known as prompt engineering.
What’s the engineering part? Different large language models behave differently depending upon how they are trained, what data they are trained on, what the number of parameters is, and what different tasks they’re trained on. So given that these are very general models, and they have knowledge about everything, if you want to get something very specific out of it, you must ask an intelligent, well-formed question. Asking this question or prompting the language model in an intelligent way with a proper understanding of how the underlying model works is the essence of prompt engineering.
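As an editorial aside, the difference between a vague question and a well-formed one can be sketched in code. The helper below only assembles text; it does not call any model or service, and its name, its fields, and the prompt layout are hypothetical choices for illustration rather than a format required by any particular language model.

```python
def build_prompt(task, text, constraints=()):
    """Assemble a specific, well-formed prompt from its parts."""
    lines = [f"Task: {task}"]
    for rule in constraints:
        lines.append(f"Constraint: {rule}")
    lines.append("Text:")
    lines.append(text)
    return "\n".join(lines)

# A vague prompt leaves the model to guess what is wanted.
vague_prompt = "Fix this."

# A specific prompt states the task, the limits, and the input.
specific_prompt = build_prompt(
    task="Check the grammar of the paragraph below and list each correction.",
    text="Me and him goes to the store yesterday.",
    constraints=[
        "Do not change the meaning.",
        "Return the corrected paragraph last.",
    ],
)
print(specific_prompt)
```

Which wording works best depends on the model being prompted, which is why the interview stresses understanding how the underlying model was trained.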
What makes a good prompt engineer, and what makes a bad one?
A bad prompt engineer could be any person asking random questions to a large language model without understanding what these models are capable of, what these models can give you, or how to properly frame the questions. There are so many of these models, and each one of them is trained in a different way with a different data set. Some of the models are instruction fine-tuned, which means they’ve already been trained on certain instructions; others are not instruction fine-tuned, and they can just spit out anything without properly understanding the context that you provide. A good prompt engineer would have the background knowledge to understand what they are prompting and which model to use in order to get their task done.
Tell me about the machine learning engineering work you do for Bloomberg.
I can talk in general terms about what a machine learning engineer at Bloomberg does. I joined Bloomberg as a machine learning research engineer. As that position title indicates, you need to do both research and engineering. There are many other companies where someone does only research, someone else does only engineering, but that’s not the culture at Bloomberg. It’s a very open culture, which is all about getting the work done, so we get to do things from end-to-end. This is something I particularly enjoy about the role.
I do machine learning research whenever I need to, because there are many tasks that I cannot solve just using the existing machine learning models out there. What we work on is very specific to Bloomberg’s clients’ needs, and the data is finance-focused, so it is very different from what is commonly used in academic ML tasks. That sometimes makes these tasks more challenging and I have to formulate how I’m going to solve these machine learning problems. That’s where my machine learning research background comes in. The engineering part comes when I execute the plan I’ve developed during my research. In order to make a product available to our clients, we have to do a lot of engineering—that’s where our computer science and engineering background really gets used.
How much better can machine learning become?
That’s a tough question, but I think at this point it’s quite clear that if you feed the machine a lot of data, it still has more capability to learn. Today there’s a trend of building extremely large models, which have 175 billion parameters (GPT-3) or so, and there’s a rumor that the newest version of GPT might have one trillion parameters. Even all the data on the Internet is not enough for these models.
We’ve already seen how machines can mimic human writing, human speaking, human vision, and many other things that humans do, so we know for sure that machine learning works. This makes it a very promising area to dive into in order to teach machines to do a lot more of the things that we humans do.
What does machine learning mean to people who don’t have a company and don’t create data-gathering products?
We’re slowly seeing machine learning being made increasingly available in different applications to different kinds of consumers like you and me for use in our daily lives. One of the most important use cases for machine learning is being an assistant. For example, a Tesla car has self-driving capabilities, which makes driving so comfortable and pleasant. It also acts as an assistant driver—if you look at the underlying technology inside it, it’s using things like object detection and many other machine learning technologies.
Likewise, for a writer, ChatGPT is already being used in many writing applications. It could be proofreading your writing: “Give me all the spelling mistakes, give me all the grammar corrections that I need, suggest something where I can improve.” All these functions can be easily automated.
Let me add one more thing. Although this isn’t something for a common person like you and me, we have made great advancements in modeling sequences. So, if you look at language models, they learn sequences of text, sequences of characters, sequences of words, and sequences of sentences. This fundamental technique is also being applied to sequences of amino acids for protein modeling, DNA sequences, and whatnot. These technologies are also becoming very advanced and are being applied by the pharmaceutical industry and various other industries, so it’ll also slowly get into medicine and various therapies, as well as various other ways that will influence our lives when we take treatments maybe 50 years or so in the future—or maybe even sooner than that. Hopefully, this will bring great improvements to healthcare, as well as to other parts of our lives.