March 28, 2022

Direct Line with Saurabh Tiwary: What’s Next For Large Foundational Models?


Todd Graham

Working with M12 means access to some of the amazing minds at Microsoft. This has given us the privilege of naming thought leaders like Saurabh Tiwary, Corporate Vice President for Microsoft Turing, as scientific advisors to M12.
I recently had the chance to sit down with Saurabh to discuss his views on large language models and their implications for natural language processing. In this third part of our three-part series, we discussed what the Microsoft Turing team sees as next for large language models.
Don’t miss the first two parts of our series, focused on why large language models matter, and how large language models work.
Large models depend on having the compute resources to train them. Where do you see this going in the future? If startups are only able to fine-tune large foundational models for their use cases, how do you see startups being able to control their own destiny?

Yes, there is a bit of uncertainty in the startup and academic communities about where this is heading. The economics are making it untenable for all but the most well-funded organizations to invest in large language models. I will make a comparison to the semiconductor ecosystem. If you look at fabrication economics for semiconductor chips, they cost tens to hundreds of millions of dollars and have relatively short lifetimes. One needs very large volume usage to justify manufacturing a custom ASIC (application-specific integrated circuit). Thus, we do not have that many companies fabricating chips. However, we have an entire software and systems ecosystem that relies on these chips and has built massive industries around them. And if you look at the biggest companies in the world (except, maybe, Apple), they have very little to do with ASIC design and fabrication as part of their core business. I think a similar ecosystem will pan out in the large-scale modeling space as well. We will have a few well-funded companies training these extremely large, reusable models, and other companies will build applications and services by reusing and customizing them. The recent Azure OpenAI API announcement is a first step in that direction. I think there will be more activity along similar lines, where these large-scale AI capabilities are further democratized. There is significant opportunity for startups to build the ecosystem – both in terms of platform capabilities and new applications.

MSFT recently announced GPT-3 inference access on Azure. Is there anything especially relevant to the startup community you would like to share around this new offering?

As I alluded to in my previous response, this is a step toward democratizing AI capabilities and moving them from walled gardens to general access. Today, one can build an MVP based on cutting-edge AI in days to weeks. We still need to work on product-market fit and the right experience. But the advanced technology substrate can be leveraged through these APIs in an almost frictionless fashion. There are already quite a few startups that are well along this path and benefiting from this change.

On a more philosophical note, I think we, as a human society, might be transformed in terms of how we live and work by the impact that these large-scale AI models are going to have. Most of the applications that we talk about today are still remnants of traditional machine learning. We will very likely see a spurt in the creation of new experiences, like the Copilot code-completion offering. As an example, think of the promise of a personalized tutor for kids that helps them learn at their grade level. Given the strength of these models, experiences like that are very much in the realm of feasibility. Also, as the models themselves become more democratized, I think they will reach and impact almost all aspects of our lives – kind of like what the semiconductor revolution has done for us by ushering in an era of computers and the internet. The only difference is that the changes might happen a lot more quickly, rather than over multiple decades.

Looking ahead, what do you see as the next key achievement for large deep learning models?

As deep learning becomes more commonplace, there are a few trends that are worth calling out. First, the glass-half-empty view. Even though we are noticing large improvements in the quality of results, we will start noticing saturation in some of these jumps for classical tasks in the next one to two years. By classical tasks, I am collectively referring to a large set of machine learning problems that were developed in the pre-deep-learning era: things like search, classification, entity tagging, etc. Most of these have already benefited from deep learning improvements. However, we will start hitting saturation due to limited headroom.

On the other hand, the trends for large, strong models will very likely continue, with two specific outcomes – unified models and stateful AI experiences.
Large models will start converging across modalities. Currently, English-language pre-trained models generally outperform universal models on English tasks. However, we are already observing some initial results in which universal models outperform English-specific models on English tasks. Similarly, multimodal models are currently built to support multimodal tasks. However, we will see convergence there as well. Ultimately, we should have a distilled, compact and, ideally, single pre-trained model that is then adapted for various applications.

On the application side, we should see stateful AI experiences. If we think about it, most of the experiences today are stateless. If one takes search as an example, every time you type a query, it goes through the same process of retrieval, ranking, localization, freshness, etc. over billions of documents. We humans are stateful. Most likely, our first search query and the second one have some relationship with one another. Things like personalization or localization introduce some notion of state, but it is bare bones. In the future, as models become more powerful, we will see new applications where users interact with machines in a highly interactive way. The emergent behavior that we observe in large language models at particular sizes shows trends pointing more and more in this direction. It will open up a new set of opportunities for applications and services to be built. These are really exciting, early days in this space.