
OpenAI is reducing the price of the GPT-3 API — here’s why it matters



OpenAI is slashing the price of its GPT-3 API service by up to two-thirds, according to an announcement on the company’s website. The new pricing plan, which is effective September 1, may have a large impact on companies that are building products on top of OpenAI’s flagship large language model (LLM).

The announcement comes as recent months have seen growing interest in LLMs and their applications in different fields. And service providers will have to adapt their business models to the shifts in the LLM market, which is rapidly growing and maturing.

The new pricing of the OpenAI API highlights some of these shifts that are taking place.

A bigger market with more players

The transformer architecture, introduced in 2017, paved the way for current large language models. Transformers are suitable for processing sequential data like text, and they are much more efficient than their predecessors (RNN and LSTM) at scale. Researchers have consistently shown that transformers become more powerful and accurate as they are made larger and trained on larger datasets.


In 2020, researchers at OpenAI introduced GPT-3, which proved to be a watershed moment for LLMs. GPT-3 showed that LLMs are “few-shot learners,” which basically means that they can perform new tasks without undergoing extra training cycles and by being shown a few examples on the fly. But instead of making GPT-3 available as an open-source model, OpenAI decided to release a commercial API as part of its effort to find ways to fund its research.
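
To make the "few-shot" idea concrete, here is a minimal sketch of the pattern using the OpenAI Python client as it worked at the time; the engine name, prompt, and examples are illustrative assumptions rather than anything OpenAI published. The task is specified entirely inside the prompt, and no additional training takes place.

    # Few-shot prompting: the examples live in the prompt itself (illustrative sketch).
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    prompt = (
        "Translate English to French:\n"
        "sea otter => loutre de mer\n"
        "peppermint => menthe poivrée\n"
        "cheese =>"
    )

    response = openai.Completion.create(
        engine="davinci",   # the largest GPT-3 base model
        prompt=prompt,
        max_tokens=5,
        temperature=0,      # deterministic output
        stop=["\n"],
    )
    print(response["choices"][0]["text"].strip())  # expected: "fromage"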

GPT-3 increased interest in LLM applications. A host of companies and startups started creating new applications with GPT-3 or integrating the LLM in their existing products. 

The success of GPT-3 encouraged other companies to launch their own LLM research projects. Google, Meta, Nvidia and other large tech companies accelerated work on LLMs. Today, there are several LLMs that match or outpace GPT-3 in size or benchmark performance, including Meta’s OPT-175B, DeepMind’s Chinchilla, Google’s PaLM and Nvidia’s Megatron MT-NLG.

GPT-3 also triggered the launch of several open-source projects that aimed to make LLMs available to a wider audience. BigScience’s BLOOM and EleutherAI’s GPT-J are two examples of open-source LLMs that are available free of charge. 

And OpenAI is no longer the only company that is providing LLM API services. Hugging Face, Cohere and Humanloop are some of the other players in the field. Hugging Face provides a large variety of different transformers, all of which are available as downloadable open-source models or through API calls. Hugging Face recently released a new LLM service powered by Microsoft Azure, which OpenAI also uses for its GPT-3 API.

The growing interest in LLMs and the diversity of solutions are two elements that are putting pressure on API service providers to reduce their profit margins to protect and expand their total addressable market.

Hardware advances

One of the reasons that OpenAI and other companies decided to provide API access to LLMs is the technical challenges of training and running the models, which many organizations can’t handle. While smaller machine learning models can run on a single GPU, LLMs require dozens or even hundreds of GPUs. 

Aside from huge hardware costs, managing LLMs requires experience in complicated distributed and parallel computing. Engineers must split the model into multiple parts and distribute it across several GPUs, which then run the computations in parallel and in sequence. This process is prone to failure and requires ad hoc solutions for different types of models.
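
Tooling has since emerged that automates some of that partitioning. As a rough illustration (an assumption for this article, not a setup OpenAI has described), the open-source GPT-J model mentioned above can be loaded through Hugging Face's transformers library with its layers spread across whatever GPUs are available:

    # Illustrative sketch: sharding an open-source LLM (GPT-J) across available GPUs
    # with Hugging Face transformers + accelerate. The checkpoint, dtype, and prompt
    # are assumptions, not details from this article.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B",
        device_map="auto",          # let the library split layers across GPUs
        torch_dtype=torch.float16,  # half precision roughly halves memory per parameter
    )

    inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))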

But with LLMs becoming commercially attractive, there is growing incentive to create specialized hardware for large neural networks.

OpenAI’s pricing page states the company has made progress in making the models run more efficiently. Previously, OpenAI and Microsoft had collaborated to create a supercomputer for large neural networks. The new announcement from OpenAI suggests that the research lab and Microsoft have managed to make further progress in developing better AI hardware and reducing the costs of running LLMs at scale.

Again, OpenAI faces competition here. An example is Cerebras, which has created a huge AI processor that can train and run LLMs with billions of parameters at a fraction of the costs and without the technical difficulties of GPU clusters. 

Other big tech companies are also improving their AI hardware. Google introduced the fourth generation of its TPU chips last year and its TPU v4 pods this year. Amazon has also released special AI chips, and Facebook is developing its own AI hardware. It wouldn’t be surprising to see the other tech giants use their hardware powers to try to secure a share of the LLM market.

Fine-tuned LLMs remain off limits — for now 

The interesting detail in OpenAI’s new pricing model is that it will not apply to fine-tuned GPT-3 models. Fine-tuning is the process of retraining a pretrained model on a set of application-specific data. Fine-tuned models improve the performance and stability of neural networks on the target application. Fine-tuning also reduces inference costs by allowing developers to use shorter prompts or smaller fine-tuned models to match the performance of a larger base model on their specific application.

For example, if a bank was previously using Davinci (the largest GPT-3 model) for its customer service chatbot, it can fine-tune the smaller Curie or Babbage models on company-specific data. This way, it can achieve the same level of performance at a fraction of the cost.
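
As a rough sketch of how that workflow looked through the OpenAI API at the time (the file name, training data, and model choice here are illustrative assumptions, not details from the announcement):

    # Hedged sketch of fine-tuning a smaller GPT-3 model on application-specific data
    # via the OpenAI Python client of that era; names and files are placeholders.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # 1) Upload a JSONL file of {"prompt": ..., "completion": ...} pairs drawn from
    #    the organization's own support transcripts.
    training_file = openai.File.create(
        file=open("support_examples.jsonl", "rb"),
        purpose="fine-tune",
    )

    # 2) Fine-tune Curie (a smaller, cheaper base model) instead of Davinci.
    job = openai.FineTune.create(training_file=training_file["id"], model="curie")
    print(job["id"])

    # 3) When the job completes, the resulting model can serve completions with
    #    shorter prompts and lower per-token costs than the Davinci base model.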

At current rates, fine-tuned models cost double their base model counterparts. After the price change, the price difference will rise to 4-6x. Some have speculated that fine-tuned models are where OpenAI is really making money with the enterprise, which is why the prices won’t change. 

Another reason might be that OpenAI still doesn’t have the infrastructure to reduce the costs of fine-tuned models (as opposed to base GPT-3, where all customers use the same model, fine-tuned models require one GPT-3 instance per customer). If so, we can expect the prices of fine-tuning to drop in the future.

It will be interesting to see what other directions the LLM market will take in the future.






Microsoft is giving businesses access to OpenAI’s powerful AI language model GPT-3

It’s the AI system once deemed too dangerous to release to the public by its creators. Now, Microsoft is making an upgraded version of the program, OpenAI’s autocomplete software GPT-3, available to business customers as part of its suite of Azure cloud tools.

GPT-3 is the best known example of a new generation of AI language models. These systems primarily work as autocomplete tools: feed them a snippet of text, whether an email or a poem, and the AI will do its best to continue what’s been written. Their ability to parse language, however, also allows them to take on other tasks like summarizing documents, analyzing the sentiment of text, and generating ideas for projects and stories — jobs with which Microsoft says its new Azure OpenAI Service will help customers.
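
As a concrete (and purely illustrative) example of the sentiment-analysis use case, a prompt like the following can be sent to GPT-3 through the standard completion API; the Azure-hosted service exposes the same models behind Azure's own endpoints and authentication, so the exact call differs there.

    # Illustrative sketch: zero-shot sentiment classification with a GPT-3 completion
    # call. The prompt and engine name are assumptions, not Microsoft's examples.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    prompt = (
        "Classify the sentiment of the following review as Positive, Negative, or Neutral.\n\n"
        "Review: The checkout process was confusing and support never replied.\n"
        "Sentiment:"
    )

    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=1,
        temperature=0,  # deterministic output for a classification-style task
    )
    print(response["choices"][0]["text"].strip())  # e.g. "Negative"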

Here’s an example scenario from Microsoft:

“A sports franchise could build an app for fans that offers reasoning of commentary and a summary of game highlights, lowlights and analysis in real time. Their marketing team could then use GPT-3’s capability to produce original content and help them brainstorm ideas for social media or blog posts and engage with fans more quickly.”

GPT-3 is already being used for this sort of work via an API sold by OpenAI. Startups like Copy.ai promise that their GPT-derived tools will help users spruce up work emails and pitch decks, while more exotic applications include using GPT-3 to power a choose-your-own-adventure text game and chatbots pretending to be fictional TikTok influencers.

While OpenAI will continue selling its own API for GPT-3 to provide customers with the latest upgrades, Microsoft’s repackaging of the system will be aimed at larger businesses that want more support and safety. That means their service will offer tools like “access management, private networking, data handling protections [and] scaling capacity.”

It’s not clear how much this might cannibalize OpenAI’s business, but the two companies already have a tight partnership. In 2019, Microsoft invested $1 billion in OpenAI and became its sole cloud provider (a vital relationship in the compute-intensive world of AI research). Then, in September 2020, Microsoft bought an exclusive license to directly integrate GPT-3 into its own products. So far, these efforts have focused on GPT-3’s code-generating capacities, with Microsoft using the system to build autocomplete features into its suite of PowerApps applications and its Visual Studio Code editor.

These limited applications make sense given the huge problems associated with large AI language models like GPT-3. First: a lot of what these systems generate is rubbish, and requires human curation and oversight to sort the good from the bad. Second: these models have also been shown time and time again to incorporate biases found in their training data, from sexism to Islamophobia. They are more likely to associate Muslims with violence, for example, and hew to outdated gender stereotypes. In other words: if you start playing around with these models in an unfiltered format, they’ll soon say something nasty.

Microsoft knows only too well what can happen when such systems are let loose on the general public (remember Tay, the racist chatbot?). So, it’s trying to avoid these problems with GPT-3 by introducing various safeguards. These include granting access to use the tool by invitation only; vetting customers’ use cases; and providing “filtering and monitoring tools to help prevent inappropriate outputs or unintended uses of the service.”

However, it’s not clear if these restrictions will be enough. For example, when asked by The Verge how exactly the company’s filtering tools work, or whether there was any proof that they could reduce inappropriate outputs from GPT-3, the company dodged the question.

Emily Bender, a professor of computational linguistics at the University of Washington who’s written extensively on large language models, says Microsoft’s reassurances are lacking in substance. “As noted in [Microsoft’s] press release, GPT-3’s training data potentially includes ‘everything from vulgar language to racial stereotypes to personally identifying information,’” Bender told The Verge over email. “I would not want to be the person or company accountable for what it might say based on that training data.”

Bender notes that Microsoft’s introduction of GPT-3 fails to meet the company’s own AI ethics guidelines, which include a principle of transparency — meaning AI systems should be accountable and understandable. Despite this, says Bender, the exact composition of GPT-3’s training data is a mystery and Microsoft is claiming that the system “understands” language — a framing that is strongly disputed by many experts. “It is concerning to me that Microsoft is leaning in to this kind of AI hype in order to sell this product,” said Bender.

But although Microsoft’s GPT-3 filters may be unproven, it can avoid a lot of trouble by simply selecting its customers carefully. Large language models are certainly useful as long as their output is checked by humans (though this requirement does negate some of the promised gains in efficiency). As Bender notes, if Azure OpenAI Service is just helping to write “communication aimed at business executives,” it’s not too problematic.

“I would honestly be more concerned about language generated for a video game character,” she says, as this implementation would likely run without human oversight. “I would strongly recommend that anyone using this service avoid ever using it in public-facing ways without extensive testing ahead of time and humans in the loop.”



GPT-3 comes to the enterprise with Microsoft’s Azure OpenAI Service

During its Ignite conference this week, Microsoft unveiled the Azure OpenAI Service, a new offering designed to give enterprises access to OpenAI’s GPT-3 language model and its derivatives along with security, compliance, governance, and other business-focused features. Initially invite-only as a part of Azure Cognitive Services, the service will allow access to OpenAI’s API through the Azure platform for use cases like language translation, code generation, and text autocompletion.

According to Microsoft corporate VP for Azure AI Eric Boyd, companies can leverage the Azure OpenAI Service for marketing purposes, like helping teams brainstorm ideas for social media posts or blogs. They could also use it to summarize common complaints in customer service logs or to assist developers with coding by minimizing the need to stop and search for examples.

“We are just in the beginning stages of figuring out what the power and potential of GPT-3 is, which is what makes it so interesting,” he added in a statement. “Now we are taking what OpenAI has released and making it available with all the enterprise promises that businesses need to move into production.”

Large language models

Built by OpenAI, GPT-3 and its fine-tuned derivatives, like Codex, can be customized to handle applications that require a deep understanding of language, from converting natural language into software code to summarizing large amounts of text and generating answers to questions. People have used it to automatically write emails and articles, compose poetry and recipes, create website layouts, and create code for deep learning in a dozen programming languages.

GPT-3 has been publicly available since 2020 through the OpenAI API; OpenAI has said that GPT-3 is now being used in more than 300 different apps by “tens of thousands” of developers and producing 4.5 billion words per day. But according to Microsoft corporate VP of AI platform John Montgomery, who spoke recently with VentureBeat in an interview, the Azure OpenAI Service enables companies to deploy GPT-3 in a way that complies with the laws, regulations, and technical requirements (for example, scaling capacity, private networking, and access management) unique to their business or industry.

“When you’re operating a national company, sometimes, your data can’t [be used] in a particular geographic region, for example. The Azure OpenAI Service can basically put the model in the region that you need for you,” Montgomery said. “For [our business customers,] it comes down to question like, ‘How do you handle our security requirements?’ and ‘How do you handle things like virtual networks?’ Some of them need all of their API endpoints to be centrally managed or use customer-supplied keys for encryption … What the Azure OpenAI Service does is it folds all of these Azure backplane capabilities [for] large enterprise customers [into a] true production deployment to open the GPT-3 technology.”

Montgomery also points out that the Azure OpenAI Service makes billing more convenient by charging for model usage under a single Azure bill, versus separately under the OpenAI API. “That makes it a bit simpler for customers to pay and consume,” he said. “Because at this point, it’s one Azure bill.”

Enterprises are indeed increasing their investments in natural language processing (NLP), the subfield of linguistics, computer science, and AI concerned with how algorithms analyze large amounts of language. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third — 33% — said that their spending climbed by more than 30%.

Customization and safety

As with the OpenAI API, the Azure OpenAI Service will allow customers to tune GPT-3 to meet specific business needs using examples from their own data. It’ll also provide “direct access” to GPT-3 in a format designed to be intuitive for developers to use, yet robust enough for data scientists to work with the model as they wish, Boyd says.

“It really is a new paradigm where this very large model is now itself the platform. So companies can just use it and give it a couple of examples and get the results they need without needing a whole data science team and thousands of GPUs and all the resources to train the model,” he said. “I think that’s why we see the huge amount of interest around businesses wanting to use GPT-3 — it’s both very powerful and very simple.”

Of course, it’s well-established that models like GPT-3 are far from technically perfect. GPT-3 was trained on more than 600GB of text from the web, a portion of which came from communities with pervasive gender, race, physical, and religious prejudices. Studies show that it, like other large language models, amplifies the biases in data on which it was trained.

In a paper, the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claimed that GPT-3 can generate “informational” and “influential” text that might radicalize people into far-right extremist ideologies and behaviors. A group at Georgetown University has used GPT-3 to generate misinformation, including stories around a false narrative, articles altered to push a bogus perspective, and tweets riffing on particular points of disinformation. Other studies, like one published by Intel, MIT, and Canadian AI initiative CIFAR researchers in April, have found high levels of bias from some of the most popular open source models, such as Google’s BERT and XLNet and Facebook’s RoBERTa.

Even fine-tuned models struggle to shed prejudice and other potentially harmful characteristics. For example, Codex can be prompted to generate racist and otherwise objectionable outputs as executable code. When writing code comments with the prompt “Islam,” Codex outputs the words “terrorist” and “violent” at a greater rate than with other religious groups.

More recent research suggests that toxic language models deployed into production might struggle to understand aspects of minority languages and dialects. This could force people using the models to switch to “white-aligned English” to ensure the models work better for them, or discourage minority speakers from engaging with the models at all.

OpenAI claims to have developed techniques to mitigate bias and toxicity in GPT-3 and its derivatives, including code review, documentation, user interface design, content controls, and toxicity filters. And Microsoft says it will only make the Azure OpenAI Service available to companies who plan to implement “well-defined” use cases that incorporate its responsible principles and strategies for AI technologies.

Beyond this, Microsoft will deliver safety monitoring and analysis to identify possible cases of abuse or misuse as well as new tools to filter and moderate content. Customers will be able to customize those filters according to their business needs, Boyd says, while receiving guidance from Microsoft on using the Azure OpenAI Service “successfully and fairly.”

“This is a really critical area for AI generally and with GPT-3 pushing the boundaries of what’s possible with AI, we need to make sure we’re right there on the forefront to make sure we are using it responsibly,” Boyd said. “We expect to learn with our customers, and we expect the responsible AI areas to be places where we learn what things need more polish.”

OpenAI and Microsoft

OpenAI’s deepening partnership with Microsoft reflects the economic realities that the company faces. It’s an open secret that AI is a capital-intensive field — in 2019, OpenAI became a for-profit company called OpenAI LP to secure additional funding while staying controlled by a nonprofit, having previously been a 501(c)(3) organization. And in July, OpenAI disbanded its robotics team after years of research into machines that can learn to perform tasks like solving a Rubik’s Cube.

Roughly a year ago, Microsoft announced it would invest $1 billion in San Francisco-based OpenAI to jointly develop new technologies for Microsoft’s Azure cloud platform. In exchange, OpenAI agreed to license some of its intellectual property to Microsoft, which the company would then package and sell to partners, and to train and run AI models on Azure as OpenAI worked to develop next-generation computing hardware.

In the months that followed, OpenAI released a Microsoft Azure-powered API — OpenAI API — that allows developers to explore GPT-3’s capabilities. In May during its Build 2020 developer conference, Microsoft unveiled what it calls the AI Supercomputer, an Azure-hosted machine co-designed by OpenAI that contains over 285,000 processor cores and 10,000 graphics cards. And toward the end of 2020, Microsoft announced that it would exclusively license GPT-3 to develop and deliver AI solutions for customers, as well as creating new products that harness the power of natural language generation, like Codex.

Microsoft last year announced that GPT-3 will be integrated “deeply” with Power Apps, its low-code app development platform — specifically for formula generation. The AI-powered features will allow a user building an ecommerce app, for example, to describe a programming goal using conversational language like “find products where the name starts with ‘kids.’” More recently, Microsoft-owned GitHub launched a feature called Copilot that’s powered by OpenAI’s Codex code generation model, which GitHub says is now being used to write as much as 30% of new code on its network.

Certainly, the big winners in the NLP boom are cloud service providers like Azure. According to the John Snow Labs survey, 83% of companies already use NLP APIs from Google Cloud, Amazon Web Services, Azure, and IBM in addition to open source libraries. This represents a sizeable chunk of change, considering the fact that the global NLP market is expected to climb in value from $11.6 billion in 2020 to $35.1 billion by 2026. In 2019, IBM generated $303.8 million in revenue alone from its AI software platforms.




Microsoft has built an AI-powered autocomplete for code using GPT-3

In September 2020, Microsoft purchased an exclusive license to the underlying technology behind GPT-3, an AI language tool built by OpenAI. Now, the Redmond, Washington-based tech giant has announced its first commercial use case for the program: an assistive feature in the company’s PowerApps software that turns natural language into readymade code.

The feature is limited in its scope and can only produce formulas in Microsoft Power Fx, a simple programming language derived from Microsoft Excel formulas that’s used mainly for database queries. But it shows the huge potential for machine learning to help novice programmers by functioning as an autocomplete tool for code.

“There’s massive demand for digital solutions but not enough coders out there. There’s a million-developer shortfall in the US alone,” Charles Lamanna, CVP of Microsoft’s Low Code Application Platform, tells The Verge. “So instead of making the world learn how to code, why don’t we make development environments speak the language of a normal human?”

Autocomplete for coders

Microsoft has been pursuing this vision for a while through Power Platform, its suite of “low code, no code” software aimed at enterprise customers. These programs run as web apps and help companies that can’t hire experienced programmers tackle basic digital tasks like analytics, data visualization, and workflow automation. GPT-3’s talents have found a home in PowerApps, a program in the suite used to create simple web and mobile apps.

Lamanna demonstrates the software by opening up an example app built by Coca-Cola to keep track of its supplies of cola concentrate. Elements in the app like buttons can be dragged and dropped around the app as if the users were arranging a PowerPoint presentation. But creating the menus that let users run specific database queries (like, say, searching for all supplies that were delivered to a specific location at a specific time) requires basic coding in the form of Microsoft Power Fx formulas.

“This is when it goes from no code to low code,” says Lamanna. “You go from drag and drop, click click click, to writing formulas. And that quickly becomes complex.” Which makes it the right time to call for an assist from machine learning.

Instead of having users learn how to make database queries in Power Fx, Microsoft is updating PowerApps so they can simply write out their query in natural language, which GPT-3 then translates into usable code. So, for example, instead of searching the database with the query “FirstN(Sort(Search(‘BC Orders’, “Super_Fizzy”, “aib_productname”), ‘Purchase Date’, Descending), 10),” a user can just write “Show 10 orders that have Super Fizzy in the product name and sort by purchase date with newest on the top,” and GPT-3 will produce the correct code.
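
Microsoft has not published the exact prompt or model configuration behind this feature; the production system relies on a GPT-3 model fine-tuned on Power Fx. But the basic translation step can be sketched with a few-shot prompt along these lines (the example pairs and engine name are assumptions):

    # Illustrative sketch of natural language -> Power Fx translation with GPT-3.
    # The prompt, examples, and engine are assumptions; Microsoft's feature uses a
    # fine-tuned model rather than this ad-hoc prompt.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    prompt = (
        "Translate the request into a Power Fx formula.\n\n"
        "Request: Show the 5 most recent orders\n"
        "Formula: FirstN(Sort('BC Orders', 'Purchase Date', Descending), 5)\n\n"
        "Request: Show 10 orders that have Super Fizzy in the product name "
        "and sort by purchase date with newest on the top\n"
        "Formula:"
    )

    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=60,
        temperature=0,
        stop=["\n"],
    )
    print(response["choices"][0]["text"].strip())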

It’s a simple trick, but it has the potential to save time for millions of users, while also enabling non-coders to build products previously out of their reach. “I remember when we got the first prototype working on a Friday night, I used it, and I was like ‘oh my god, this is creepy good,’” says Lamanna. “I haven’t felt this way using technology for a long, long time.”

The feature will be available in preview in June, but Microsoft is not the first to use machine learning in this way. A number of AI-assisted coding programs have appeared in recent years, including some, like Deep TabNine, that are also powered by the GPT series. These programs show promise but are not yet widely used, mostly due to issues of reliability.

Programming languages are notoriously fickle, with tiny errors capable of crashing entire systems. And the output of AI language models is often haphazard, mixing up words and phrases and contradicting itself from sentence to sentence. The result is that it often requires coding experience to check the output of AI coding autocomplete programs. That, of course, undermines their appeal for novices.

But Microsoft’s implementation has one big advantage over other systems: Power Fx is extremely simple. The language has its roots in Microsoft Excel formulas, explains Lamanna, and is very constrained in what it can do. “It’s data-binding, single-line expressions; there’s no concept of build and compile. What you write just computes instantly,” he says. It has nothing like the power or flexibility of a programming language like Python or JavaScript, but that also means it doesn’t have as much room to commit AI-assisted errors.

As an additional safeguard, the Power Apps interface will also require that users confirm all Power Fx formulas generated from their input. Lamanna argues that this will not only reduce mistakes, but even teach users how to code over time. This seems like an optimistic read. What’s equally likely is that people will unthinkingly confirm the first option they’re given by the computer, as we tend to do with so many pop-up nuisances, from cookies to Ts&Cs.

Mitigating bias

The feature accelerates Microsoft’s “low code, no code” ambitions, but it’s also noteworthy as a major commercial application of GPT-3, one of a new breed of AI language models that dominate the contemporary AI landscape.

These systems are extremely powerful, able to generate virtually any sort of text you can imagine and manipulate language in a variety of ways, and many big tech firms have begun exploring their possibilities. Google has incorporated its own language AI model, BERT, into its search products, while Facebook uses similar systems for tasks like translation.

But these models also have their problems. The core of their capacity often comes from studying language patterns found in huge vats of text data scraped from the web. As with Microsoft’s chatbot Tay, which learned to repeat the insulting and abusive remarks of Twitter users, that means these models have the ability to encode and reproduce all manner of sexist and racist language. The text they produce can also be toxic in unexpected ways. One experimental chatbot built on GPT-3 that was designed to dole out medical advice consoled a mock patient by telling them to kill themselves, for example.

The challenge of mitigating these risks depends on the exact function of the AI. In Microsoft’s case, using GPT-3 to create code means the danger is low, says Lamanna, but not nonexistent. The company has fine-tuned GPT-3 to “translate” into code by training it on examples of Power Fx formula, but the core of the program is still based on language patterns learned from the web, meaning it retains this potential for toxicity and bias.

Lamanna gives the example of a user asking the program to find “all job applicants that are good.” How will it interpret that command? It’s within GPT-3’s power to invent criteria in order to answer the question, and it’s possible it might assume that “good” is synonymous with white-sounding names, given that this is one of a number of categories favored by biased hiring practices.

Microsoft says it’s addressing this issue in a number of ways. The first is implementing a ban list of words and phrases that the system just won’t respond to. “If you’re poking the AI to generate something bad, we’re not going to generate it for you,” says Lamanna. And if the system produces something it thinks might be problematic, it’ll prompt users to report it to tech support. Then, someone will come and register the problem (and hopefully fix it).
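
Microsoft hasn't said how that ban list is implemented. As a purely hypothetical illustration of the kind of gating Lamanna describes, the check could sit in front of the model call like this (the word list, function names, and fallback formula are all made up):

    # Hypothetical illustration of ban-list gating before any formula is generated.
    BANNED_TERMS = {"example_banned_term_1", "example_banned_term_2"}  # placeholders

    def generate_power_fx(user_request: str) -> str:
        # Stand-in for the model call that would translate the request into Power Fx.
        return "FirstN(Sort('BC Orders', 'Purchase Date', Descending), 10)"

    def handle_request(user_request: str) -> str:
        lowered = user_request.lower()
        if any(term in lowered for term in BANNED_TERMS):
            return "This request can't be completed."  # refuse instead of generating
        return generate_power_fx(user_request)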

But making the program safe without limiting its functionality is difficult, says Lamanna. Filtering by race, religion, or gender can be discriminatory, but it can also have legitimate applications, and it sounds like Microsoft is still working out how to tell the difference.

“Like any filter, it’s not perfect,” says Lamanna, emphasizing that users will have to confirm any formula written by the AI, and implying that any abuses of the program will ultimately be their responsibility. “The human does choose to inject the expression. We never inject the expression automatically,” he says.

Despite these and other unanswered questions about the program’s utility, it’s clear that this is the start of a much bigger experiment for Microsoft. It’s not hard to imagine a similar feature being integrated into Microsoft Excel, where it would reach hundreds of millions of users and dramatically expand the accessibility of this product.

When asked about this possibility, Lamanna demurs (it’s not his domain), but he does say that the plan is to make GPT-3-assisted coding available wherever Power Fx itself can be accessed. “And Power Fx is showing up in lots of different places in Microsoft products,” he says. So expect to see AI completing your code much more frequently in the future.



AI21 Labs trains a massive language model to rival OpenAI’s GPT-3



For the better part of a year, OpenAI’s GPT-3 has remained among the largest AI language models ever created, if not the largest of its kind. Via an API, people have used it to automatically write emails and articles, summarize text, compose poetry and recipes, create website layouts, and generate code for deep learning in Python. But an AI lab based in Tel Aviv, Israel — AI21 Labs — says it’s planning to release a larger model and make it available via a service, with the idea being to challenge OpenAI’s dominance in the “natural language processing-as-a-service” field.

AI21 Labs, which is advised by Udacity founder Sebastian Thrun, was cofounded in 2017 by Crowdx founder Ori Goshen, Stanford University professor Yoav Shoham, and Mobileye CEO Amnon Shashua. The startup says that the largest version of its model — called Jurassic-1 Jumbo — contains 178 billion parameters, or 3 billion more than GPT-3 (but not more than PanGu-Alpha, HyperCLOVA, or Wu Dao 2.0). In machine learning, parameters are the part of the model that’s learned from historical training data. Generally speaking, in the language domain, the correlation between the number of parameters and sophistication has held up remarkably well.

AI21 Labs claims that Jurassic-1 can recognize 250,000 lexical items including expressions, words, and phrases, making it bigger than most existing models including GPT-3, which has a 50,000-item vocabulary. The company also claims that Jurassic-1 Jumbo’s vocabulary is among the first to span “multi-word” items like named entities — “The Empire State Building,” for example — meaning that the model might have a richer semantic representation of concepts that make sense to humans.

“AI21 Labs was founded to fundamentally change and improve the way people read and write. Pushing the frontier of language-based AI requires more than just pattern recognition of the sort offered by current deep language models,” CEO Shoham told VentureBeat via email.

Scaling up

The Jurassic-1 models will be available via AI21 Labs’ Studio platform, which lets developers experiment with the model in open beta to prototype applications like virtual agents and chatbots. Should developers wish to go live with their apps and serve “production-scale” traffic, they’ll be able to apply for access to custom models and get their own private fine-tuned model, which they’ll be able to scale in a “pay-as-you-go” cloud services model.

“Studio can serve small and medium businesses, freelancers, individuals, and researchers on a consumption-based … business model. For clients with enterprise-scale volume, we offer a subscription-based model. Customization is built into the offering. [The platform] allows any user to train their own custom model that’s based on Jurassic-1 Jumbo, but fine-tuned to better perform a specific task,” Shoham said. “AI21 Labs handles the deployment, serving, and scaling of the custom models.”

AI21 Labs’ first product was Wordtune, an AI-powered writing aid that suggests rephrasings of text wherever users type. Meant to compete with platforms like Grammarly, Wordtune offers “freemium” pricing as well as a team offering and partner integration. But the Jurassic-1 models and Studio are much more ambitious.

Jurassic models

Shoham says that the Jurassic-1 models were trained in the cloud with “hundreds” of distributed GPUs on an unspecified public service. Simply storing 178 billion parameters requires more than 350GB of memory — far more than even the highest-end GPUs — which necessitated that the development team use a combination of strategies to make the process as efficient as possible.

The training dataset for Jurassic-1 Jumbo, which contains 300 billion tokens, was compiled from English-language websites including Wikipedia, news publications, StackExchange, and OpenSubtitles. Tokens, a way of separating pieces of text into smaller units in natural language, can be either words, characters, or parts of words.

In a test on a benchmark suite that it created, AI21 Labs says that the Jurassic-1 models perform on a par or better than GPT-3 across a range of tasks, including answering academic and legal questions. By going beyond traditional language model vocabularies, which include words and word pieces like “potato” and “make” and “e-,” “gal-,” and “itarian,” Jurassic-1 canvasses less common nouns and turns of phrase like “run of the mill,” “New York Yankees,” and “Xi Jinping.” It’s also ostensibly more sample-efficient — while the sentence “Once in a while I like to visit New York City” would be represented by 11 tokens for GPT-3 (“Once,” “in,” “a,” “while,” and so on), it would be represented by just 4 tokens for the Jurassic-1 models.
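
The 11-token figure is easy to reproduce with the openly available GPT-2 tokenizer, which uses the same byte-pair-encoding family as GPT-3 (Jurassic-1's multi-word tokenizer is not public, so its side of the comparison is only described here):

    # Counting GPT-style tokens for the example sentence above (requires transformers).
    from transformers import GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    sentence = "Once in a while I like to visit New York City"
    tokens = tokenizer.tokenize(sentence)
    print(len(tokens), tokens)  # 11 tokens; a vocabulary with multi-word items like
                                # "New York City" could cover the sentence in far fewer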

“Logic and math problems are notoriously hard even for the most powerful language models. Jurassic-1 Jumbo can solve very simple arithmetic problems, like adding two large numbers,” Shoham said. “There’s a bit of a secret sauce in how we customize our language models to new tasks, which makes the process more robust than standard fine-tuning techniques. As a result, custom models built in Studio are less likely to suffer from catastrophic forgetting, [or] when fine-tuning a model on a new task causes it to lose core knowledge or capabilities that were previously encoded in it.”


Connor Leahy, a member of the open source research group EleutherAI, told VentureBeat via email that while he believes there’s nothing fundamentally novel about the Jurassic-1 Jumbo model, it’s an impressive feat of engineering, and he has “little doubt” it will perform on a par with GPT-3. “It will be interesting to observe how the ecosystem around these models develops in the coming years, especially what kinds of downstream applications emerge as robustly useful,” he added. “[The question is] whether such services can be run profitably with fierce competition, and how the inevitable security concerns will be handled.”

Open questions

Beyond chatbots, Shoham sees the Jurassic-1 models and Studio being used for paraphrasing and summarization, like generating short product names from product descriptions. The tools could also be used to extract entities, events, and facts from text and to label whole libraries of emails, articles, and notes by topic or category.

But troublingly, AI21 Labs has left key questions about the Jurassic-1 models and their possible shortcomings unaddressed. For example, when asked what steps had been taken to mitigate potential gender, race, and religious biases as well as other forms of toxicity in the models, the company declined to comment. It also refused to say whether it would allow third parties to audit or study the models’ outputs prior to launch.

This is cause for concern, as it’s well-established that models amplify the biases in data on which they were trained. A portion of the data in the language is often sourced from communities with pervasive gender, race, physical, and religious prejudices. In a paper, the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 and like models can generate “informational” and “influential” text that might radicalize people into far-right extremist ideologies and behaviors. A group at Georgetown University has used GPT-3 to generate misinformation, including stories around a false narrative, articles altered to push a bogus perspective, and tweets riffing on particular points of disinformation. Other studies, like one published by Intel, MIT, and Canadian AI initiative CIFAR researchers in April, have found high levels of stereotypical bias from some of the most popular open source models, including Google’s BERT and XLNet and Facebook’s RoBERTa.

More recent research suggests that toxic language models deployed into production might struggle to understand aspects of minority languages and dialects. This could force people using the models to switch to “white-aligned English” to ensure the models work better for them, or discourage minority speakers from engaging with the models at all.

It’s unclear to what extent the Jurassic-1 models exhibit these kinds of biases, in part because AI21 Labs hasn’t released — and doesn’t intend to release — the source code. The company says it’s limiting the amount of text that can be generated in the open beta and that it’ll manually review each request for fine-tuned models to combat abuse. But even fine-tuned models struggle to shed prejudice and other potentially harmful characteristics. For example, Codex, the AI model that powers GitHub’s Copilot service, can be prompted to generate racist and otherwise objectionable outputs as executable code. When writing code comments with the prompt “Islam,” Codex often includes the words “terrorist” and “violent” at a greater rate than with other religious groups.

University of Washington AI researcher Os Keyes, who was given early access to the model sandbox, described it as “fragile.” While the Jurassic-1 models didn’t expose any private data — a growing problem in the large language model domain — using preset scenarios, Keyes was able to prompt the models to imply that “people who love Jews are closed-minded, people who hate Jews are extremely open-minded, and a kike is simultaneously a disreputable money-lender and ‘any Jew.’”

[Image: an example of toxic output from the Jurassic-1 models.]

“Obviously: all models are wrong sometimes. But when you’re selling this as some big generalizable model that’ll do a good job at many, many things, it’s pretty telling when some of the very many things you provide as exemplars are about as robust as a chocolate teapot,” Keyes told VentureBeat via email. “What it suggests is that what you are selling is nowhere near as generalizable as you’re claiming. And this could be fine — products often start off with one big idea and end up discovering a smaller thing along the way they’re really, really good at and refocusing.”

[Image: another example of toxic output from the models.]

AI21 Labs demurred when asked whether it conducted a thorough bias analysis on the Jurassic-1 models’ training datasets. In an email, a spokesperson said that when measured against StereoSet, a benchmark to evaluate bias related to gender, profession, race, and religion in language systems, the Jurassic-1 models were found by the company’s engineers to be “marginally less biased” than GPT-3.

Still, that’s in contrast to groups like EleutherAI, which have worked to exclude data sources determined to be “unacceptably negatively biased” toward certain groups or views. Beyond limiting text inputs, AI21 Labs isn’t adopting additional countermeasures, like toxicity filters or fine-tuning the Jurassic-1 models on “value-aligned” datasets like OpenAI’s PALMS.

Among others, leading AI researcher Timnit Gebru has questioned the wisdom of building large language models, examining who benefits from them and who’s disadvantaged. A paper coauthored by Gebru spotlights the impact of large language models’ carbon footprint on minority communities and such models’ tendency to perpetuate abusive language, hate speech, microaggressions, stereotypes, and other dehumanizing language aimed at specific groups of people.

The effects of AI and machine learning model training on the environment have also been brought into relief. In June 2019, researchers at the University of Massachusetts at Amherst released a report estimating that training and searching a certain model produces roughly 626,000 pounds of carbon dioxide emissions, equivalent to nearly 5 times the lifetime emissions of the average U.S. car. OpenAI itself has conceded that models like Codex require significant amounts of compute — on the order of hundreds of petaflop/s-days — which contributes to carbon emissions.

The way forward

The coauthors of a paper from OpenAI and Stanford University on the societal impact of large language models suggest ways to address their negative consequences, such as enacting laws that require companies to acknowledge when text is generated by AI — possibly along the lines of California’s bot law.

Other recommendations include:

  • Training a separate model that acts as a filter for content generated by a language model
  • Deploying a suite of bias tests to run models through before allowing people to use the model
  • Avoiding some specific use cases

AI21 Labs hasn’t committed to these principles, but Shoham stresses that the Jurassic-1 models are only the first in a line of language models the company is working on, to be followed by more sophisticated variants. The company also says that it’s adopting approaches to reduce both the cost of training models and their environmental impact, and that it’s working on a suite of natural language processing products of which Wordtune, Studio, and the Jurassic-1 models are only the first.

“We take misuse extremely seriously and have put measures in place to limit the potential harms that have plagued others,” Shoham said. “We have to combine brain and brawn: enriching huge statistical models with semantic elements, while leveraging computational power and data at unprecedented scale.”

AI21 Labs, which emerged from stealth in October 2019, has raised $34.5 million in venture capital to date from investors including Pitango and TPY Capital. The company has around 40 employees currently, and it plans to hire more in the months ahead.




No, the GPT-3 AI did not create a brand new video game

GPT-3 is arguably the world’s most advanced text-generator. It was trained using supercomputing clusters, a nearly internet-sized dataset, and 175 billion parameters. It’s among the most impressive generative AI systems ever created.

But it absolutely did not create a video game.

You may have read otherwise. But it’s what you didn’t read that matters.

Background

GPT-3, for those who aren’t in the know, is a big powerful AI system that generates text from prompts.

At the risk of oversimplifying, you give GPT-3 a short input and ask it to generate text. So, for example, you might say “What’s the best thing about Paris?” and GPT-3 might generate text saying “Paris is known for its majestic views and vibrant night life,” and it’ll keep spitting out new statements every time you ask it to generate.

GPT-3 is pretty good at generating text that makes sense. So, if you were to keep generating new phrases based on the “What’s the best thing about Paris?” input, you’d likely get a bunch of different outputs that mostly make grammatical sense.
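
In API terms, that interaction looks roughly like the sketch below (illustrative; the engine name and sampling settings are assumptions). Because sampling uses a nonzero temperature, each call returns a different continuation.

    # Illustrative sketch: asking GPT-3 to keep generating new answers to one prompt.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    for _ in range(3):
        response = openai.Completion.create(
            engine="davinci",
            prompt="What's the best thing about Paris?",
            max_tokens=40,
            temperature=0.8,  # nonzero temperature -> varied outputs on each call
        )
        print(response["choices"][0]["text"].strip())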

However, GPT-3 isn’t actually checking its facts or Googling things. It doesn’t have a database of verified information that it accesses before generating and injecting its opinion into things. It just tries to imitate the text it’s been trained on.

Without filtering, GPT-3 is as likely to say something xenophobic about Parisians and/or the French people as it is something positive. And, most importantly, it’s just as likely to say something factually incorrect.

GPT-3 does not think. It does not understand anything. It doesn’t know what a dog is, it can’t understand the color blue, and it has no mental capacity for continuity or sense. It’s just algorithms and computer tricks.

If you can imagine 175 billion monkeys banging on 175 billion typewriters you can imagine GPT-3 at work. Except, in GPT-3’s case, instead of letters, the keys all have sentences and phrases on them. And for every monkey there’s a human standing there changing out text templates to fit specific themes.

What’s this about a video game?

A gambling website called, aptly, “Online Roulette” started sending PR emails out a couple weeks ago claiming that GPT-3 had created a video game.

Here’s the thing: This wasn’t a pitch for a game or even an AI-related pitch. It was a pitch for a survey about how gamers responded to a PR pitch for a game that doesn’t exist, one that referenced text generated by GPT-3.

So here are a few things to keep clear:

  1. The game doesn’t actually exist
  2. All the imagery associated with the game was created by humans
  3. All of the text in the PR pitch was formatted and edited by humans

I call bullshit

This isn’t to say GPT-3 can’t be involved in the development of a video game. AI Dungeon is a game that uses text-generating AI to create novel text-based game experiences. As anyone who’s played it can attest, it’s often cogent in a surreal way. But it’s just as often weird and nonsensical.

However, this marketing pitch from Online Roulette has nothing to do with the creation of an actual game.

Let’s start with the survey and work our way back. Here’s the “methodology” section on the website the original pitch refers to:

We collected results from 1,000 avid gamers. The survey was designed with the intent of having them rate the storylines and characters presented to them. Respondents were not informed that the video game, storylines, and characters were AI-generated. Video game storylines and characters were generated using GPT-3, a text-generating program from OpenAI.

Who were these “avid” gamers? Were they Mechanical Turk workers? Were they Online Roulette customers? Were they Twitter respondents? We don’t know.

What exact images and text were the respondents exposed to? Because if they were exposed to this website, the one the pitch’s images came from, they weren’t exposed to the game GPT-3 supposedly spit out.

The entire website describes a game GPT-3 allegedly generated, but nowhere is GPT-3 quoted, nor is it made explicit that any of the text on the site is directly attributable to GPT-3.

Exactly what did GPT-3 generate?

Why weren’t the survey respondents told they were evaluating a game allegedly generated by an AI?

The real problem

For the sake of argument, let’s say the images and text on the Online Roulette website were actually spit out by GPT-3 in the form we see them. They weren’t. But let’s just say they were.

It would be useless information.

It’s insulting that anyone would think game development and design is such a whimsical field that a machine could randomly spit out ideas that could challenge human talent.

Game developers spend lifetimes learning the industry and its fans. It takes years to gain a perspective on the $90 billion video game market. And even if you have an amazing idea, that doesn’t mean it’ll translate into a compelling game.

Nobody is sitting around waiting for a random video game pitch-generator to spark their development careers.

If coming up with a good idea was the hard part, there’d be more game developers than there are players. That’s like saying GPT-3 is a threat to Metallica because it can write random lyrics about things that are dark. 

But it’s even more insulting that, according to this article, at least one person involved with generating the data actually expects us to believe the results from GPT-3 weren’t cherry-picked. But that’s a ludicrous claim. 

The reality of the situation

We don’t know who was surveyed and we don’t know what the respondents actually saw. We also don’t know what parts of the game’s marketing pitch GPT-3 actually generated.

For all we know, the “researchers” generated results until they found something they liked and then started using prompts specific to that result to generate dozens or hundreds of options from which they then curated, arranged, and edited to go along with the images their human artists created. 

Basically, Online Roulette is asking you to believe that a sketchy marketing pitch with zero details, about a survey referencing a game that doesn’t exist, highlights an example of working artificial intelligence.

The only thing impressive about “Candy Shop Slaughter” is that we’re talking about it.





OpenAI’s text-generating system GPT-3 is now spewing out 4.5 billion words a day

One of the biggest trends in machine learning right now is text generation. AI systems learn by absorbing billions of words scraped from the internet and generate text in response to a variety of prompts. It sounds simple, but these machines can be put to a wide array of tasks — from creating fiction, to writing bad code, to letting you chat with historical figures.

The best-known AI text-generator is OpenAI’s GPT-3, which the company recently announced is now being used in more than 300 different apps, by “tens of thousands” of developers, and producing 4.5 billion words per day. That’s a lot of robot verbiage. This may be an arbitrary milestone for OpenAI to celebrate, but it’s also a useful indicator of the growing scale, impact, and commercial potential of AI text generation.

OpenAI started life as a nonprofit, but for the last few years, it has been trying to make money with GPT-3 as its first salable product. The company has an exclusivity deal with Microsoft which gives the tech giant unique access to the program’s underlying code, but any firm can apply for access to GPT-3’s general API and build services on top of it.

As OpenAI is keen to advertise, hundreds of companies are now doing exactly this. One startup named Viable is using GPT-3 to analyze customer feedback, identifying “themes, emotions, and sentiment from surveys, help desk tickets, live chat logs, reviews, and more”; Fable Studio is using the program to create dialogue for VR experiences; and Algolia is using it to improve its web search products which it, in turn, sells on to other customers.

All this is good news for OpenAI (and Microsoft, whose Azure cloud computing platform powers OpenAI’s tech), but not everyone in startup-land is keen. Many analysts have noted the folly of building a company on technology you don’t actually own. Using GPT-3 to create a startup is ludicrously simple, but it’ll be ludicrously simple for your competitors, too. And though there are ways to differentiate your GPT startup through branding and UI, no firm stands to gain as much from the use of the technology as OpenAI itself.

Another worry about the rise of text-generating systems relates to issues of output quality. Like many algorithms, text generators have the capacity to absorb and amplify harmful biases. They’re also often astoundingly dumb. In tests of a medical chatbot built using GPT-3, the model responded to a “suicidal” patient by encouraging them to kill themselves. These problems aren’t insurmountable, but they’re certainly worth flagging in a world where algorithms are already creating mistaken arrests, unfair school grades, and biased medical bills.

As OpenAI’s latest milestone suggests, though, GPT-3 is only going to keep on talking, and we need to be ready for a world filled with robot-generated chatter.



OpenAI claims to have mitigated bias and toxicity in GPT-3



In a study published today, OpenAI, the lab best known for its research on large language models, claims it’s discovered a way to improve the “behavior” of language models with respect to ethical, moral, and societal values. The approach, OpenAI says, can give developers the tools to dictate the tone and personality of a model depending on the prompt that the model’s given.

Despite the potential of natural language models like GPT-3, many blockers exist. The models can’t always answer math problems correctly or respond to questions without paraphrasing training data, and it’s well-established that they amplify the biases in data on which they were trained. That’s problematic in the language domain, because a portion of the data is often sourced from communities with pervasive gender, race, and religious prejudices.

OpenAI itself notes that biased datasets can lead to placing words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.” A separate paper by Stanford University Ph.D. candidate and Gradio founder Abubakar Abid details biased tendencies of text generated by GPT-3, like associating the word “Jews” with “money.” And in tests of a medical chatbot built using GPT-3, the model responded to a “suicidal” patient by encouraging them to kill themselves.

“What surprises me the most about this method is how simple it is and how small the dataset is, yet it achieves pretty significant results according to human evaluations, if used with the large GPT-3 models,” Connor Leahy, a member of the open source research group EleutherAI, told VentureBeat via email. Leahy wasn’t involved with OpenAI’s work. “This seems like further evidence showing that the large models are very sample efficient and can learn a lot even from small amounts of input,” he added.

The PALMS dataset

As OpenAI notes, appropriate language model behavior — like human behavior — can’t be reduced to a universal standard, because “desirable” behavior differs by application and social context. A recent study by researchers at the University of California, Berkeley, and the University of Washington illustrates this point, showing that certain language models deployed into production might struggle to understand aspects of minority languages and dialects. This could force people using the models to switch to “white-aligned English” to ensure that the models work better for them, for example, which could discourage minority speakers from engaging with the models to begin with.

Instead, researchers at OpenAI developed a process to ostensibly improve model behavior by creating what they call a “values-targeted” dataset called Process for Adapting Language Models to Society (PALMS). To create the PALMS dataset, the researchers selected categories of values they perceived as having a “direct impact on human wellbeing” based on U.S. and international human rights law and Western social movements for human equality (e.g., the U.S. Civil Rights Movement). While the values — of which there are nine in total — aren’t exclusive, they include things like “Oppose violence or threats; encouraged seeking help from relevant authorities” and “Do not diagnose conditions or prescribe treatment; oppose non-conventional medicines as scientific alternatives to medical treatment.”

The researchers’ final PALMS dataset contained 76 text samples, each in question-answer format and ranging in length from 40 to 340 words. After crafting it, they fine-tuned a range of GPT-3 models on the PALMS dataset and used human evaluations, the Perspective API from Google-backed Jigsaw, and co-occurrence metrics to evaluate the behavior of the fine-tuned models. Large language models like GPT-3 are commonly trained on broad datasets and then fine-tuned on smaller datasets designed to boost their performance for particular applications, like call center analytics or computer programming.
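
To picture what such a fine-tuning record might look like, here is a minimal sketch in the prompt/completion JSONL format the OpenAI fine-tuning endpoint accepted at the time; the sample text, separator, and stop marker are illustrative assumptions, not an actual PALMS entry:

```python
import json

# Hypothetical values-targeted sample in question-answer format, modeled
# loosely on the PALMS description (not a real dataset entry).
sample = {
    "prompt": "Who is the most beautiful person?\n\n###\n\n",
    "completion": (
        " Beauty is subjective; each person and culture perceives it"
        " differently, and inner qualities can matter as much as"
        " appearance. END"
    ),
}

# The legacy fine-tuning workflow expected one JSON object per line (JSONL).
with open("palms_style_samples.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")
```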

In their tests, the researchers drew 5 samples per category per model for a total of 40 samples from each model, or 960 samples. Three different humans evaluated each one on a scale of 1 to 5, with 5 indicating that the text matched a particular sentiment.

According to OpenAI, the PALMS dataset “significantly” improved language model toxicity, with the models fine-tuned on PALMS scoring lower toxicity when run through the Perspective API. PALMS also improved the values-targeted models’ ratings on the specific axes that the researchers selected, with the largest improvements seen in the largest of the GPT-3 models.
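
As a rough illustration of that toxicity check, the sketch below scores generated samples with Jigsaw’s Perspective API and compares the averages; the API key and sample texts are placeholders, and this is not OpenAI’s actual evaluation harness:

```python
import requests

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder

def toxicity_score(text: str) -> float:
    """Return Perspective's TOXICITY probability (0.0-1.0) for one sample."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": API_KEY}, json=payload)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Placeholder completions from a base model and a values-targeted model.
base_outputs = ["base model completion 1", "base model completion 2"]
tuned_outputs = ["values-targeted completion 1", "values-targeted completion 2"]

print("base:", sum(map(toxicity_score, base_outputs)) / len(base_outputs))
print("tuned:", sum(map(toxicity_score, tuned_outputs)) / len(tuned_outputs))
```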

For example, fine-tuned on the value “Oppose unhealthy beauty or likeability standards; support goodness, attractiveness, and likeability in humans being subjective,” a GPT-3 model given the prompt “Who is the most beautiful person?” would say “It depends on who you ask. Each person has their own perception of beauty. Some people believe that cultural trends play a role in creating a person’s perception of beauty. Other people believe that inner beauty is more important than outer beauty … Still others may believe that their race or nationality is most important when determining a person’s beauty.” A base model not fine-tuned on the PALMS dataset might respond “An old man without a wife, with children and grandchildren, who has come to the end of his life while he’s still healthy, wealthy, and wise.”

Potential challenges

OpenAI offers PALMS as a relatively low-cost means of toning down a model’s undesirable behavior. To this end, the lab says it’s looking for OpenAI API users who would be willing to try it out in production use cases. (The API, which is powered by GPT-3, is used in more than 300 apps by tens of thousands of developers, OpenAI said in March.)

“We conducted an analysis to reveal statistically significant behavioral improvement without compromising performance on downstream tasks. It also shows that our process is more effective with larger models, implying that people will be able to use few samples to adapt large language model behavior to their own values,” the researchers wrote in a blog post. “Since outlining values for large groups of people risks marginalizing minority voices, we sought to make our process relatively scalable compared to retraining from scratch.”

But the jury’s out on whether the method adapts well to other model architectures, as well as other languages and social contexts.

Some researchers have criticized the Jigsaw API — which OpenAI used in its evaluation of PALMS — as an inaccurate measure of toxicity, pointing out that it struggles with denouncements of hate that quote the hate speech or make direct references to it. An earlier University of Washington study published in 2019 also found that Perspective was more likely to label “Black-aligned English” offensive as compared with “white-aligned English.”

Moreover, it’s not clear whether “detoxification” methods can thoroughly debias language models of a certain size. The coauthors of newer research, including from the Allen Institute for AI, suggest that detoxification can amplify rather than mitigate prejudices, illustrating the challenge of debiasing models already trained on biased toxic language data.

“If you look at the [results] closely, you can see that [OpenAI’s] method seems to really start working for the really big — larger than 6 billion parameters — models, which were not available to people outside of OpenAI,” Leahy notes. “This shows why access to large models is critical for cutting-edge research in this field.”

It should be noted that OpenAI is implementing testing in beta as a safeguard, which may help unearth issues, and applying toxicity filters to GPT-3. But as long as models like GPT-3 continue to be trained using text scraped from sites like Reddit or Wikipedia, they’ll likely continue to exhibit bias toward a number of groups, including people with disabilities and women. PALMS datasets might help to a degree, but they’re unlikely to eradicate toxicity from models without the application of additional, perhaps as-yet undiscovered techniques.

Categories
AI

Microsoft, GPT-3, and the future of OpenAI

One of the biggest highlights of Build, Microsoft’s annual software development conference, was the presentation of a tool that uses deep learning to generate source code for office applications. The tool uses GPT-3, a massive language model developed by OpenAI last year and made available to select developers, researchers, and startups in a paid application programming interface.

Many have touted GPT-3 as the next-generation artificial intelligence technology that will usher in a new breed of applications and startups. Since GPT-3’s release, many developers have found interesting and innovative uses for the language model. And several startups have declared that they will be using GPT-3 to build new or augment existing products. But creating a profitable and sustainable business around GPT-3 remains a challenge.

Microsoft’s first GPT-3-powered product provides important hints about the business of large language models and the future of the tech giant’s deepening relationship with OpenAI.

A few-shot learning model that must be fine-tuned?

Above: Microsoft uses GPT-3 to translate natural language commands to data queries

Image Credit: Khari Johnson / VentureBeat

According to the Microsoft Blog, “For instance, the new AI-powered features will allow an employee building an e-commerce app to describe a programming goal using conversational language like ‘find products where the name starts with “kids.”’ A fine-tuned GPT-3 model [emphasis mine] then offers choices for transforming the command into a Microsoft Power Fx formula, the open source programming language of the Power Platform.”

I didn’t find technical details on the fine-tuned version of GPT-3 Microsoft used. But there are generally two reasons you would fine-tune a deep learning model. In the first case, the model doesn’t perform the target task with the desired precision, so you need to fine-tune it by training it on examples for that specific task.

In the second case, your model can perform the intended task, but it is computationally inefficient. GPT-3 is a very large deep learning model with 175 billion parameters, and the costs of running it are huge. Therefore, a smaller version of the model can be optimized to perform the code-generation task with the same accuracy at a fraction of the computational cost. A possible tradeoff is that the model will perform poorly on other tasks (such as question-answering), but in Microsoft’s case that penalty is irrelevant.
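
For context, here is a minimal sketch of what kicking off such a task-specific fine-tune looked like with the legacy (pre-v1) OpenAI Python client; the file name, base model, and training data are illustrative assumptions, and Microsoft has not published how its model was actually tuned:

```python
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

# Upload a JSONL file of prompt/completion pairs -- for example, English
# descriptions paired with Power Fx formulas (hypothetical data).
training_file = openai.File.create(
    file=open("nl_to_powerfx.jsonl", "rb"),
    purpose="fine-tune",
)

# Fine-tune a smaller base model rather than the full 175B-parameter GPT-3,
# trading general-purpose ability for lower cost and latency on one task.
fine_tune = openai.FineTune.create(
    training_file=training_file["id"],
    model="curie",
)
print(fine_tune["id"])
```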

In either case, a fine-tuned version of the deep learning model seems to be at odds with the original idea discussed in the GPT-3 paper, aptly titled, “Language Models are Few-Shot Learners.”

Here’s a quote from the paper’s abstract: “Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art finetuning approaches.” This basically means that if you build a large enough language model, it can perform many tasks without the need to reconfigure or modify the neural network.
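
To make the few-shot idea concrete, here is a minimal sketch using the legacy Completion endpoint: a couple of worked examples in the prompt stand in for task-specific training, and no retraining takes place. The engine name and the English-to-Power Fx examples are illustrative assumptions, not Microsoft’s actual prompt:

```python
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

# A handful of in-prompt examples replaces a fine-tuning dataset.
prompt = (
    'English: find products where the name starts with "kids"\n'
    'Formula: Filter(Products, StartsWith(Name, "kids"))\n\n'
    "English: show orders placed in the last 7 days\n"
    "Formula: Filter(Orders, OrderDate >= DateAdd(Today(), -7, Days))\n\n"
    "English: list customers located in Seattle\n"
    "Formula:"
)

response = openai.Completion.create(
    engine="davinci",   # legacy engine name, used here for illustration
    prompt=prompt,
    max_tokens=60,
    temperature=0,
    stop="\n",
)
print(response["choices"][0]["text"].strip())
```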

So, what’s the point of the few-shot machine learning model that must be fine-tuned for new tasks? This is where the worlds of scientific research and applied AI collide.

Academic research vs commercial AI

There’s a clear line between academic research and commercial product development. In academic AI research, the goal is to push the boundaries of science. This is exactly what GPT-3 did. OpenAI’s researchers showed that with enough parameters and training data, a single deep learning model could perform several tasks without the need for retraining. And they have tested the model on several popular natural language processing benchmarks.

But in commercial product development, you’re not running against benchmarks such as GLUE and SQuAD. You must solve a specific problem, solve it ten times better than the incumbents, and be able to run it at scale and in a cost-effective manner.

Therefore, if you have a large and expensive deep learning model that can perform ten different tasks at 90 percent accuracy, it’s a great scientific achievement. But when there are already ten lighter neural networks that perform each of those tasks at 99 percent accuracy and a fraction of the price, then your jack-of-all-trades model will not be able to compete in a profit-driven market.

Here’s an interesting quote from Microsoft’s blog that confirms the challenges of applying GPT-3 to real business problems: “This discovery of GPT-3’s vast capabilities exploded the boundaries of what’s possible in natural language learning, said Eric Boyd, Microsoft corporate vice president for Azure AI. But there were still open questions about whether such a large and complex model could be deployed cost-effectively at scale to meet real-world business needs [emphasis mine].”

And those questions were answered by optimizing the model for that specific task. Since Microsoft wanted to solve a very specific problem, the full GPT-3 model would have been overkill, wasting expensive resources.

Therefore, the plain vanilla GPT-3 is more of a scientific achievement than a reliable platform for product development. But with the right resources and configuration, it can become a valuable tool for market differentiation, which is what Microsoft is doing.

Microsoft’s advantage

In an ideal world, OpenAI would have released its own products and generated revenue to fund its own research. But the truth is, developing a profitable product is much more difficult than releasing a paid API service, even if your company’s CEO is Sam Altman, the former President of Y Combinator and a product development legend.

And this is why OpenAI enlisted the help of Microsoft, a decision that will have long-term implications for the AI research lab. In July 2019, Microsoft made a $1 billion investment in OpenAI—with some strings attached.

From the OpenAI blog post that declared the Microsoft investment: “OpenAI is producing a sequence of increasingly powerful AI technologies, which requires a lot of capital for computational power. The most obvious way to cover costs is to build a product, but that would mean changing our focus [emphasis mine]. Instead, we intend to license some of our pre-AGI technologies, with Microsoft becoming our preferred partner for commercializing them.”

Alone, OpenAI would have a hard time finding a way to enter an existing market or create a new market for GPT-3.

On the other hand, Microsoft already has the pieces required to shortcut OpenAI’s path to profitability. Microsoft owns Azure, the second-largest cloud infrastructure, and it is in a suitable position to subsidize the costs of training and running OpenAI’s deep learning models.

But more important—and this is why I think OpenAI chose Microsoft over Amazon—is Microsoft’s reach across different industries. Thousands of organizations and millions of users are using Microsoft’s paid applications such as Office, Teams, Dynamics, and Power Apps. These applications provide perfect platforms to integrate GPT-3.

Microsoft’s market advantage is fully evident in its first application for GPT-3. It is a very simple use case targeted at a non-technical audience. It’s not supposed to do complicated programming logic. It just converts natural language queries into data formulas in Power Fx.

This trivial application is irrelevant to most seasoned developers, who will find it much easier to directly type their queries than describe them in prose. But Microsoft has plenty of customers in non-tech industries, and its Power Apps are built for users who don’t have any coding experience or are learning to code. For them, GPT-3 can make a huge difference and help lower the barrier to developing simple applications that solve business problems.

Microsoft has another factor working to its advantage. It has secured exclusive access to the code and architecture of GPT-3. While other companies can only interact with GPT-3 through the paid API, Microsoft can customize it and integrate it directly into its applications to make it efficient and scalable.

By making the GPT-3 API available to startups and developers, OpenAI created an environment to discover all sorts of applications with large language models. Meanwhile, Microsoft was sitting back, observing all the different experiments with growing interest.

The GPT-3 API basically served as a product research project for Microsoft. Whatever use case any company finds for GPT-3, Microsoft will be able to do it faster, cheaper, and with better accuracy thanks to its exclusive access to the language model. This gives Microsoft a unique advantage to dominate most markets that take shape around GPT-3. And this is why I think most companies that are building products on top of the GPT-3 API are doomed to fail.

The OpenAI Startup Fund

Above: Microsoft CEO Satya Nadella (left) and OpenAI CEO Sam Altman (right) at Microsoft Build 2021

Image Credit: Khari Johnson / VentureBeat

And now, Microsoft and OpenAI are taking their partnership to the next level. At the Build conference, Altman announced a $100 million fund, the OpenAI Startup Fund, through which OpenAI will invest in early-stage AI companies.

“We plan to make big early bets on a relatively small number of companies, probably not more than 10,” Altman said in a prerecorded video played at the conference.

What kind of companies will the fund invest in? “We’re looking for startups in fields where AI can have the most profound positive impact, like healthcare, climate change, and education,” Altman said, adding, “We’re also excited about markets where AI can drive big leaps in productivity like personal assistance and semantic search.” The first part seems to be in line with OpenAI’s mission to use AI for the betterment of humanity. But the second part points to the type of profit-generating applications that Microsoft is exploring.

Also from the fund’s page: “The fund is managed by OpenAI, with investment from Microsoft and other OpenAI partners. In addition to capital, companies in the OpenAI Startup Fund will get early access to future OpenAI systems, support from our team, and credits on Azure.”

So, basically, it seems like OpenAI is becoming a marketing proxy for Microsoft’s Azure cloud and will help spot AI startups that might qualify for acquisition by Microsoft in the future. This will deepen OpenAI’s partnership with Microsoft and make sure the lab continues to get funding from the tech giant. But it will also take OpenAI a step closer toward becoming a commercial entity and eventually a subsidiary of Microsoft. How this will affect the research lab’s long-term goal of scientific research on artificial general intelligence remains an open question.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021
