DeepMind’s new AI model helps decipher, date, and locate ancient inscriptions

Machine learning techniques are providing new tools that could help archaeologists understand the past — particularly when it comes to deciphering ancient texts. The latest example is an AI model created by Alphabet subsidiary DeepMind that not only helps restore missing text in ancient Greek inscriptions but also suggests when the text was written (to within a 30-year period) and where it may have come from.

“Inscriptions are really important because they are direct sources of evidence … written directly by ancient people themselves,” Thea Sommerschield, a historian and machine learning expert who helped create the model, told journalists in a press briefing.

Due to their age, these texts are often damaged, making restoration a painstaking challenge. And because they are usually inscribed on inorganic materials like stone or metal, methods like radiocarbon dating can’t be used to establish when they were written. “To solve these tasks, epigraphers look for textual and contextual parallels in similar inscriptions,” said Sommerschield, who was co-lead on the work alongside DeepMind staff research scientist Yannis Assael. “However, it’s really difficult for a human to harness all existing, relevant data and to discover underlying patterns.”

That’s where machine learning can help.

Ancient Greek inscriptions are often fragmented. The software Ithaca can suggest what letters are missing.
Image: DeepMind

The new software, named Ithaca, is trained on a dataset of 78,608 ancient Greek inscriptions, each labeled with metadata describing where and when it was written (to the best of historians’ knowledge). Like all machine learning systems, Ithaca looks for patterns in this data, encodes them in a complex mathematical model, and uses the resulting inferences to suggest missing text, dates, and geographic origins.

In a paper published in Nature that describes Ithaca, the scientists who created the model say it is 62 percent accurate when restoring letters in damaged texts. It can attribute an inscription’s geographic origins to one of 84 regions of the ancient world with 71 percent accuracy and can date a text to within, on average, 30 years of its known year of writing.
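Conceptually, those three outputs can share a single encoder. Below is a minimal, hypothetical PyTorch sketch of such a multi-task setup — an illustration of the idea only, not DeepMind’s published Ithaca architecture; every name and size here is invented.

```python
# Toy multi-task model: restore characters, attribute region, estimate date.
# NOT DeepMind's Ithaca -- just a sketch of one encoder feeding three heads.
import torch
import torch.nn as nn

class EpigraphyNet(nn.Module):
    def __init__(self, vocab_size=40, d_model=256, n_regions=84):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # character tokens, incl. a [MASK] id
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.restore_head = nn.Linear(d_model, vocab_size)  # per-position character logits
        self.region_head = nn.Linear(d_model, n_regions)    # one of 84 ancient regions
        self.date_head = nn.Linear(d_model, 1)              # predicted year of writing

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))   # (batch, seq_len, d_model)
        pooled = h.mean(dim=1)                 # crude whole-inscription summary
        return self.restore_head(h), self.region_head(pooled), self.date_head(pooled)

model = EpigraphyNet()
dummy = torch.randint(0, 40, (2, 128))         # two fake inscriptions, 128 characters each
char_logits, region_logits, year = model(dummy)
```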

These are promising statistics, but it’s important to remember that Ithaca is not capable of operating independently of human expertise. Its suggestions are ultimately based on data collected by traditional archaeological methods, and its creators are positioning it as simply another tool in a wider set of forensic methods, rather than a fully automated AI historian. “Ithaca was designed as a complementary tool to aid historians,” said Sommerschield.

Ithaca is the first model to combine geographical and chronological attribution with textual restoration.
Image: DeepMind

Eleanor Dickey, a professor of classics at the University of Reading who specializes in ancient Greek and Latin sociolinguistics, told The Verge that Ithaca was an “exciting development that may improve our knowledge of the ancient world.” But she added that 62 percent accuracy for restoring lost text was not reassuringly high — “when people rely on it they will need to keep in mind that it is wrong about one third of the time” — and that she was not sure how the software would fit into existing academic methodologies.

For example, DeepMind highlighted tests that showed the model helped improve the accuracy of historians restoring missing text in ancient inscriptions from 25 percent to 72 percent. But Dickey notes that those being tested were students, not professional epigraphers. She says that AI models may be broadly accessible, but that doesn’t mean they can or should replace the small cadre of specialized academics who decipher texts.

“It is not yet clear to what extent use of this tool by genuinely qualified editors would result in an improvement in the editions generally available — but it will be interesting to find out,” said Dickey. She added that she was looking forward to trying the Ithaca model out for herself. The software, along with its open-source code, is available online for anyone to test.

Ithaca and its predecessor (named Pythia and released in 2019) have already been used to inform recent archaeological debates — including helping to date inscriptions discovered in the Acropolis of Athens. However, the true potential of the software has yet to be seen.

Sommerschield stresses that the real value of Ithaca may be in its flexibility. Although it was trained on ancient Greek inscriptions, it could be easily configured to work with other ancient scripts. “Ithaca’s architecture makes it really applicable to any ancient language, not just Latin, but Mayan, cuneiform; really any written medium — papyri, manuscripts,” she said. “There’s a lot of opportunities.”


DeepMind tests the limits of large AI language systems with 280-billion-parameter model

Language generation is the hottest thing in AI right now, with a class of systems known as “large language models” (or LLMs) being used for everything from improving Google’s search engine to creating text-based fantasy games. But these programs also have serious problems, including regurgitating sexist and racist language and failing tests of logical reasoning. One big question is: can these weaknesses be overcome by simply adding more data and computing power, or are we reaching the limits of this technological paradigm?

This is one of the topics that Alphabet’s AI lab DeepMind is tackling in a trio of research papers published today. The company’s conclusion is that scaling up these systems further should deliver plenty of improvements. “One key finding of the paper is that the progress and capabilities of large language models is still increasing. This is not an area that has plateaued,” DeepMind research scientist Jack Rae told reporters in a briefing call.

DeepMind, which regularly feeds its work into Google products, has probed the capabilities of these LLMs by building a language model with 280 billion parameters named Gopher. Parameters are a quick measure of a language model’s size and complexity, meaning that Gopher is larger than OpenAI’s GPT-3 (175 billion parameters) but not as big as some more experimental systems, like Microsoft and Nvidia’s Megatron model (530 billion parameters).
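For a concrete sense of what “parameters” means, here is a hedged toy example in PyTorch that counts the learned weights in a single transformer block; a full LLM stacks many, far larger blocks.

```python
# Counting parameters -- the size measure quoted for Gopher, GPT-3, and Megatron.
# A toy example: one transformer encoder block, not a full language model.
import torch.nn as nn

block = nn.TransformerEncoderLayer(d_model=512, nhead=8)
n_params = sum(p.numel() for p in block.parameters())
print(f"{n_params:,} learned parameters in one block")
# Gopher's 280 billion parameters come from stacking far wider blocks, dozens of times over.
```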

It’s generally true in the AI world that bigger is better, with larger models usually offering higher performance. DeepMind’s research confirms this trend and suggests that scaling up LLMs does offer improved performance on the most common benchmarks testing things like sentiment analysis and summarization. However, researchers also cautioned that some issues inherent to language models will need more than just data and compute to fix.

“I think right now it really looks like the model can fail in [a] variety of ways,” said Rae. “Some subset of those ways are because the model just doesn’t have sufficiently good comprehension of what it’s reading, and I feel like, for those class of problems, we are just going to see improved performance with more data and scale.”

But, he added, there are “other categories of problems, like the model perpetuating stereotypical biases or the model being coaxed into giving mistruths, that […] no one at DeepMind thinks scale will be the solution [to].” In these cases, language models will need “additional training routines” like feedback from human users, he noted.

To come to these conclusions, DeepMind’s researchers evaluated a range of different-sized language models on 152 language tasks or benchmarks. They found that larger models generally delivered improved results, with Gopher itself offering state-of-the-art performance on roughly 80 percent of the tests selected by the scientists.

In another paper, the company also surveyed the wide range of potential harms involved with deploying LLMs. These include the systems’ use of toxic language, their capacity to share misinformation, and their potential to be used for malicious purposes, like sharing spam or propaganda. All these issues will become increasingly important as AI language models become more widely deployed — as chatbots and sales agents, for example.

However, it’s worth remembering that performance on benchmarks is not the be-all and end-all in evaluating machine learning systems. In a recent paper, a number of AI researchers (including two from Google) explored the limitations of benchmarks, noting that these datasets will always be limited in scope and unable to match the complexity of the real world. As is often the case with new technology, the only reliable way to test these systems is to see how they perform in reality. With large language models, we will be seeing more of these applications very soon.


Microsoft is giving businesses access to OpenAI’s powerful AI language model GPT-3

It’s the AI system once deemed too dangerous to release to the public by its creators. Now, Microsoft is making an upgraded version of the program, OpenAI’s autocomplete software GPT-3, available to business customers as part of its suite of Azure cloud tools.

GPT-3 is the best known example of a new generation of AI language models. These systems primarily work as autocomplete tools: feed them a snippet of text, whether an email or a poem, and the AI will do its best to continue what’s been written. Their ability to parse language, however, also allows them to take on other tasks like summarizing documents, analyzing the sentiment of text, and generating ideas for projects and stories — jobs with which Microsoft says its new Azure OpenAI Service will help customers.

Here’s an example scenario from Microsoft:

“A sports franchise could build an app for fans that offers reasoning of commentary and a summary of game highlights, lowlights and analysis in real time. Their marketing team could then use GPT-3’s capability to produce original content and help them brainstorm ideas for social media or blog posts and engage with fans more quickly.”

GPT-3 is already being used for this sort of work via an API sold by OpenAI. Startups like Copy.ai promise that their GPT-derived tools will help users spruce up work emails and pitch decks, while more exotic applications include using GPT-3 to power a choose-your-own-adventure text game and chatbots pretending to be fictional TikTok influencers.
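As a hedged illustration of that autocomplete workflow, here is roughly what a call to OpenAI’s Python client looked like at the time; the prompt and key are placeholders, and the Azure OpenAI Service wraps the same idea behind Azure authentication and endpoints.

```python
# Sketch of GPT-3-style text completion via OpenAI's Python client
# as it existed in this era; the prompt and API key are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"

completion = openai.Completion.create(
    engine="davinci",                      # a GPT-3 model
    prompt="Write a short, upbeat summary of tonight's game highlights:",
    max_tokens=64,
    temperature=0.7,
)
print(completion.choices[0].text)          # the model's continuation
```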

While OpenAI will continue selling its own API for GPT-3 to provide customers with the latest upgrades, Microsoft’s repackaging of the system will be aimed at larger businesses that want more support and safety. That means its service will offer tools like “access management, private networking, data handling protections [and] scaling capacity.”

It’s not clear how much this might cannibalize OpenAI’s business, but the two companies already have a tight partnership. In 2019, Microsoft invested $1 billion in OpenAI and became its sole cloud provider (a vital relationship in the compute-intensive world of AI research). Then, in September 2020, Microsoft bought an exclusive license to directly integrate GPT-3 into its own products. So far, these efforts have focused on GPT-3’s code-generating capacities, with Microsoft using the system to build autocomplete features into its suite of PowerApps applications and its Visual Studio Code editor.

These limited applications make sense given the huge problems associated with large AI language models like GPT-3. First: a lot of what these systems generate is rubbish, and requires human curation and oversight to sort the good from the bad. Second: these models have also been shown time and time again to incorporate biases found in their training data, from sexism to Islamophobia. They are more likely to associate Muslims with violence, for example, and hew to outdated gender stereotypes. In other words: if you start playing around with these models in an unfiltered format, they’ll soon say something nasty.

Microsoft knows only too well what can happen when such systems are let loose on the general public (remember Tay, the racist chatbot?). So, it’s trying to avoid these problems with GPT-3 by introducing various safeguards. These include granting access to use the tool by invitation only; vetting customers’ use cases; and providing “filtering and monitoring tools to help prevent inappropriate outputs or unintended uses of the service.”

However, it’s not clear if these restrictions will be enough. For example, when asked by The Verge how exactly the company’s filtering tools work, or whether there was any proof that they could reduce inappropriate outputs from GPT-3, the company dodged the question.

Emily Bender, a professor of computational linguistics at the University of Washington who’s written extensively on large language models, says Microsoft’s reassurances are lacking in substance. “As noted in [Microsoft’s] press release, GPT-3’s training data potentially includes ‘everything from vulgar language to racial stereotypes to personally identifying information,’” Bender told The Verge over email. “I would not want to be the person or company accountable for what it might say based on that training data.”

Bender notes that Microsoft’s introduction of GPT-3 fails to meet the company’s own AI ethics guidelines, which include a principle of transparency — meaning AI systems should be accountable and understandable. Despite this, says Bender, the exact composition of GPT-3’s training data is a mystery and Microsoft is claiming that the system “understands” language — a framing that is strongly disputed by many experts. “It is concerning to me that Microsoft is leaning in to this kind of AI hype in order to sell this product,” said Bender.

But although Microsoft’s GPT-3 filters may be unproven, it can avoid a lot of trouble by simply selecting its customers carefully. Large language models are certainly useful as long as their output is checked by humans (though this requirement does negate some of the promised gains in efficiency). As Bender notes, if Azure OpenAI Service is just helping to write “communication aimed at business executives,” it’s not too problematic.

“I would honestly be more concerned about language generated for a video game character,” she says, as this implementation would likely run without human oversight. “I would strongly recommend that anyone using this service avoid ever using it in public-facing ways without extensive testing ahead of time and humans in the loop.”


For AI model success, utilize MLops and get the data right

It’s critical to adopt a data-centric mindset and support it with ML operations 

Artificial intelligence (AI) in the lab is one thing; in the real world, it’s another. Many AI models fail to yield reliable results when deployed. Others start well, but then results erode, leaving their owners frustrated. Many businesses do not get the return on AI they expect. Why do AI models fail and what is the remedy? 

As companies have experimented with AI models more, there have been some successes, but numerous disappointments. Dimensional Research reports that 96% of AI projects encounter problems with data quality, data labeling and building model confidence.

AI researchers and developers for business often use the traditional academic method of boosting accuracy. That is, hold the model’s data constant while tinkering with model architectures and fine-tuning algorithms. That’s akin to mending the sails when the boat has a leak — it is an improvement, but the wrong one. Why? Good code cannot overcome bad data.

Instead, they should ensure the datasets are suited to the application. Traditional software is powered by code, whereas AI systems are built using both code (models + algorithms) and data. Take facial recognition, for instance: AI-driven apps were trained mostly on Caucasian faces rather than ethnically diverse ones. Not surprisingly, results were less accurate for non-Caucasian users. 

Good training data is only the starting point. In the real world, AI applications are often initially accurate, but then deteriorate. When accuracy degrades, many teams respond by tuning the software code. That doesn’t work because the underlying problem was changing real-world conditions. The answer: to increase reliability, improve the data rather than the algorithms. 

Since AI failures are usually related to data quality and data drifts, practitioners can use a data-centric approach to keep AI applications healthy. Data is like food for AI. In your application, data should be a first-class citizen. Endorsing this idea isn’t sufficient; organizations need an “infrastructure” to keep the right data coming. 

MLops: The “how” of data-centric AI

Continuous good data requires ongoing processes and practices known as MLops, for machine learning (ML) operations. The key mission of MLops: make high-quality data available because it’s essential to a data-centric AI approach.

MLops works by tackling the specific challenges of data-centric AI, which are complicated enough to ensure steady employment for data scientists. Here is a sampling: 

  • The wrong amount of data: Noisy data can distort smaller datasets, while larger volumes of data can make labeling difficult. Both issues throw models off. The right size of dataset for your AI model depends on the problem you are addressing. 
  • Outliers in the data: A common shortcoming in data used to train AI applications, outliers can skew results. 
  • Insufficient data range: This can cause an inability to properly handle outliers in the real world. 
  • Data drift: gradual changes in real-world data that often degrade model accuracy over time. 

These issues are serious. A Google survey of 53 AI practitioners found that “data cascades—compounding events causing negative, downstream effects from data issues — triggered by conventional AI/ML practices that undervalue data quality… are pervasive (92% prevalence), invisible, delayed, but often avoidable.”

How does MLops work?

Before deploying an AI model, researchers need to plan to maintain its accuracy with new data. Key steps: 

  • Audit and monitor model predictions to continuously ensure that the outcomes are accurate
  • Monitor the health of data powering the model; make sure there are no surges, missing values, duplicates, or anomalies in distributions (a minimal sketch of such checks follows this list)
  • Confirm the system complies with privacy and consent regulations
  • When the model’s accuracy drops, figure out why
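Below is a minimal sketch of those data-health checks using pandas; the file path and the three-sigma threshold are hypothetical stand-ins for whatever actually feeds your model.

```python
# Hedged sketch: basic health checks on a model's input feature table.
# "model_input_features.csv" and the 3-sigma rule are hypothetical choices.
import pandas as pd

df = pd.read_csv("model_input_features.csv")

report = {
    "rows": len(df),
    "missing_values": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
}
# Crude anomaly check: flag numeric columns whose most recent rows drift
# more than three standard deviations from the column's overall mean.
for col in df.select_dtypes("number").columns:
    recent = df[col].tail(1000)
    if abs(recent.mean() - df[col].mean()) > 3 * df[col].std():
        report[f"anomaly:{col}"] = True
print(report)
```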

To practice good MLops and responsibly develop AI, here are several questions to address: 

  • How do you catch data drifts in your pipeline? Data drift can be more difficult to catch than data quality shortcomings. Data changes that appear subtle may have an outsized impact on particular model predictions and particular customers. (A minimal drift-detection sketch follows this list.)
  • Does your system reliably move data from point A to point B without jeopardizing data quality? Thankfully, moving data in bulk from one system to another has become much easier, as tools for ML improve.
  • Can you track and analyze data automatically, with alerts when data quality issues arise? 
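As one concrete, hedged way to catch drift: a two-sample Kolmogorov–Smirnov test can compare a feature’s training-time distribution with what the live pipeline sees. The synthetic arrays below are stand-ins for real logged values.

```python
# Hedged sketch: statistical drift detection with a two-sample KS test.
# The normal distributions below are stand-ins for logged feature values.
import numpy as np
from scipy.stats import ks_2samp

training_values = np.random.normal(0.0, 1.0, 10_000)    # feature at training time
production_values = np.random.normal(0.3, 1.0, 10_000)  # same feature in live traffic

stat, p_value = ks_2samp(training_values, production_values)
if p_value < 0.01:
    print(f"Possible drift detected (KS statistic = {stat:.3f})")  # raise an alert
```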

MLops: How to start now

You may be thinking: how do we gear up to address these problems? Building an MLops capability can begin modestly, with a data expert and your AI developer. As an early-stage discipline, MLops is still evolving. There is no gold standard or approved framework yet to define a good MLops system or organization, but here are a few fundamentals:

  • In developing models, AI researchers need to consider data at each step, from product development through deployment and post-deployment. The ML community needs mature MLops tools that help make high-quality, reliable and representative datasets to power AI systems.
  • Post-deployment maintenance of the AI application cannot be an afterthought. Production systems should implement ML-equivalents of devops best practices including logging, monitoring and CI/CD pipelines which account for data lineage, data drifts and data quality. 
  • Structure ongoing collaboration across stakeholders, from executive leadership to subject-matter experts, ML/data scientists, ML engineers, and SREs.

Sustained success for AI/ML applications demands a shift from “get the code right and you’re done” to an ongoing focus on data. Systematically improving data quality for a basic model is better than chasing state-of-the-art models with low-quality data.

Not yet a defined science, MLops encompasses practices that make data-centric AI workable. We will learn much in the upcoming years about what works most effectively. Meanwhile, you and your AI team can proactively – and creatively – devise an MLops framework and tune it to your models and applications. 

Alessya Visnjic is the CEO of WhyLabs.

Naver’s large language model is powering shopping recommendations

In June, Naver, the Seongnam, South Korea-based company that operates the eponymous search engine, announced that it had trained one of the largest AI language models of its kind, called HyperCLOVA. Naver claimed that the system was trained on 6,500 times more Korean data than OpenAI’s GPT-3 and contained 204 billion parameters, the parts of the machine learning model learned from historical training data. (GPT-3 has 175 billion parameters.)

HyperCLOVA was seen as a notable achievement because of the scale of the model and since it fits into the trend of generative model “diffusion,” with multiple actors developing GPT-3-style models, like Huawei’s PanGu-Alpha (stylized PanGu-α). The benefits of large language models — including the ability to generate human-like text for marketing and customer support purposes — were previously limited to English because companies lacked the resources to train these models in other languages.

In the months since HyperCLOVA was developed, Naver has begun using it to personalize search results on the Naver platform, Naver executive officer Nako Sung told VentureBeat in an interview. It’ll also soon become available in private beta through HyperCLOVA Studio, a no-code tool that’ll allow developers to access the model for text generation and classification tasks.

“Initially used to correct typos in search queries on Naver Search, [HyperCLOVA] is now enabling many new features on our ecommerce platform, Naver Shopping, such as summarizing multiple consumer reviews into one line, recommending and curating products to user shopping preferences, or generating trendy marketing phrases for featured shopping collections,” Sung said. “We also launched CLOVA CareCall, a … conversational agent for elderly citizens who live alone. The service is based on the HyperCLOVA’s natural conversation generation capabilities, allowing it to have human-like conversations.”

Large language models

Training HyperCLOVA, which can understand English and Japanese in addition to Korean, required large-scale datacenter infrastructure, according to Sung. Naver leveraged a server cluster of 140 Nvidia DGX A100 nodes (a SuperPOD configuration), which the company claims can deliver up to 700 petaflops of compute power.

It took months to train HyperCLOVA on 2TB of Korean text data, much of which came from user-generated content on Naver’s platforms. For example, one source was Knowledge iN, a Quora-like, Korean-language community where users can ask questions on topics to receive answers from experts. Another was public blog posts from people who use free web hosting services provided through Naver.

Sung says that this differentiates HyperCLOVA from previous large language models like GPT-3, which have a limited ability to understand the nuances of languages besides English. He claims that by having the model draw on the “collective intelligence of Korean culture and society,” it can better serve Korean users — and at the same time reduce Naver’s dependence on other, less Asia Pacific-centric AI services.

In a recent issue of his Import AI newsletter, former OpenAI policy director Jack Clark asserted that because generative models ultimately reflect and magnify the data they’re trained on, different nations care a lot about how their own culture is represented in these models. “[HyperCLOVA] is part of a general trend of different nations asserting their own AI capacity [and] capability via training frontier models like GPT-3,” he continued. “[We’ll] await more technical details to see if [it’s] truly comparable to GPT-3.”

Some experts have argued that because the companies developing influential AI systems are predominantly located in the U.S., China, and the E.U., a disproportionate share of economic benefit will fall inside these regions — potentially exacerbating inequality. In an analysis of publications at two major machine learning conferences, NeurIPS 2020 and ICML 2020, none of the top 10 countries in terms of publication index were located in Latin America, Africa, or Southeast Asia. Moreover, a recent report from Georgetown University’s Center for Security and Emerging Technology found that while 42 of the 62 major AI labs are located outside of the U.S., 68% of the staff are located within the United States.

“These large amounts of collective intelligence are continuously enriching and fortifying HyperCLOVA,” Sung said. “The most well-known hyperscale language model is GPT-3, and it is trained mainly with English data, and is only taught 0.016% of Korean data out of the total input … [C]onsidering the impact of hyperscale AI on industries and economies in the near future, we are confident that building a Korean language-based AI is very important for Korea’s AI sovereignty.”

Challenges in developing models

Among others, leading AI researcher Timnit Gebru has questioned the wisdom of building large language models, examining who benefits from them and who is harmed. It’s well-established that models can amplify the biases in data on which they were trained, and the effects of model training on the environment have been raised as serious concerns.

To address the issues around bias, Sung says that Naver is in discussions with “external experts” including researchers at Seoul National University’s AI Policy Initiative and plans to form an advisory committee on AI ethics in Korea this year. The company also released a benchmark — Korean Language Understanding Evaluation (KLUE) — to evaluate the natural language understanding capabilities of Korean language models including HyperCLOVA.

“We recognize that while AI can make our lives convenient, it is also not infallible like all other technologies used today,” he added. “While pursuing convenience in the service we provide, Naver will also endeavor to explain our AI service in a manner that users can easily understand upon their request or when necessary … We will pay attention to safety during all stages of designing and testing our services, including after the service is deployed, to prevent a situation where AI as a daily tool threatens life or causes physical harm to people.”

Real-world applications

Currently, Naver says that HyperCLOVA is being tapped for various Naver services including Naver Smart Stores, the company’s ecommerce marketplace, where it’s “correcting” the names of products by generating “more attractive” names versus the original search-engine-optimized SKUs. In another ecommerce use case, Naver is applying HyperCLOVA to create product recommendation systems tailored to shoppers’ individual preferences.

“While HyperCLOVA doesn’t specifically learn users’ purchase logs, we discovered that it was able to recommend products on our marketplace to some extent. So, we fine-tuned this capability and introduced it as one of our ecommerce features. Unlike the existing recommendation algorithms, this model shows the ‘generalized’ ability to perform well on cold items, cold users and cold services,” Sung said. “Recommending a certain gift to someone is not a suitable problem for traditional machine learning to solve. That’s because there is no information about the recipient of the gift … [But] with HyperCLOVA, we were able to make this experience possible.”

HyperCLOVA is also powering an AI-driven call service for senior citizens who live alone, which Naver says it plans to refine to provide more personalized conversations in the future. Beyond this, Naver says it’s developing a multilingual version of HyperCLOVA that can understand two or more languages at the same time and an API that will allow developers to build apps and services on top of the model.

The pandemic has accelerated the world’s digital transformation, pushing businesses to become more reliant on software to streamline their processes. As a result, the demand for natural language technology is now higher than ever — particularly in the enterprise. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their natural language processing budgets grew by at least 10% compared to 2020, while a third — 33% — said that their spending climbed by more than 30%.

The global NLP market is expected to climb in value to $35.1 billion by 2026.

“The most interesting thing about HyperCLOVA is that its usability is not limited only to AI experts, such as engineers and researchers, but it has also been used by service planners and business managers within our organization,” Sung added. “Most of the winners [in a recent HyperCLOVA hackathon] were from non-AI developer positions, which I believe proves that HyperCLOVA’s no-code AI platform will empower everyone with AI capabilities, significantly accelerating the speed of AI transformation and changing its scope in the future.”

New Relic allows enterprises to monitor ML model performance

San Francisco-based New Relic, a company that offers a cloud-based observability platform to help enterprises visualize, analyze, and optimize their entire software stack, has announced a solution to monitor the performance and accuracy of machine learning models in real time.

In today’s data-driven landscape, organizations are heavily leaning towards AI and machine learning applications to improve business resilience and gain a competitive advantage. A recent survey conducted by IBM revealed that almost one-third of businesses are now using artificial intelligence, and as many as 43% have accelerated the rollout of AI as a result of COVID-19.

However, as the adoption continues to increase, the gap between data science teams developing ML models and DevOps teams operating those models is also increasing. The reason? Most engineers build and train models in siloed environments, resulting in reduced collaboration to monitor and govern the models in production. Such situations mean teams could fail to notice models that might be becoming irrelevant over time, particularly models based on static data, and consequently lose out on millions.

New Relic integrates model performance monitoring

To prevent this, New Relic is extending the capabilities of its flagship observability platform — New Relic One. The company said on Wednesday that the solution can now be enhanced with model performance monitoring integrations, providing data science and DevOps teams a single place to monitor and visualize model performance telemetry data, including critical signals such as recall, precision, and accuracy.
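Those “critical signals” are standard classification metrics. As a hedged sketch, here is how a team might compute them with scikit-learn before emitting them as telemetry to a platform like New Relic One; the labels below are toy data.

```python
# Hedged sketch: computing the model-performance signals named above.
# Toy labels -- a real pipeline would pull ground truth from production logs.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # ground truth, once it becomes available
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # what the deployed model predicted

signals = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
}
print(signals)  # emit these as telemetry on each evaluation window
```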

The platform, as New Relic’s General Manager for AIOps Guy Fighel explained in a blog post, is getting support to integrate popular MLOps frameworks such as AWS SageMaker, DataRobot (Algorithmia), Aporia, Superwise, Comet, Dagshub, Mona, and TruEra. Each of these would appear within New Relic Instant Observability (I/O) — an open-source ecosystem of quickstarts, integrations, and resources in New Relic One — and could be integrated within minutes, complete with custom performance dashboards and other observability building blocks.

This will ultimately allow companies to monitor their ML models and interdependencies with the rest of the application components and make necessary changes to ensure that the algorithms remain relevant in the long run — for maximum business impact.

New Relic also notes that data science and DevOps teams can use the offering to enable predictive alerts for unusual model-related changes in advance. This way, once an issue is detected, they can collaborate in the production environment to contextualize the situation and make decisions to address the problem.

“We are committed to making observability a daily best practice for every engineer, and with the launch of New Relic Model Performance Monitoring, we deliver the only unified data observability platform that gives Data Science and DevOps teams unprecedented visibility into the performance of their machine-learning-based applications,” Fighel said.

Growing space

The development comes as the latest step from New Relic to strengthen its footprint in the enterprise observability space and take on players like Dynatrace and DataDog. Back in February, the company added a visualization tool called Explorer to make it simpler for IT professionals to discover the root cause of issues.

Globally, the IT monitoring and observability market is estimated to be a $17 billion market opportunity.

Amazon launches SageMaker Canvas for no-code AI model development

During a keynote address today at its re:Invent 2021 conference, Amazon announced SageMaker Canvas, which enables users to create machine learning models without having to write any code. Using SageMaker Canvas, Amazon Web Services (AWS) customers can run a machine learning workflow with a point-and-click user interface to generate predictions and publish the results.

Low- and no-code platforms allow developers and non-developers alike to create software through visual dashboards instead of traditional programming. Adoption is on the rise, with a recent OutSystems report showing that 41% of organizations were using a low- or no-code tool in 2019/2020, up from 34% in 2018/2019.

“Now, business users and analysts can use Canvas to generate highly accurate predictions using an intuitive, easy-to-use interface,” AWS CEO Adam Selipsky said onstage. “Canvas uses terminology and visualizations already familiar to [users] and complements the data analysis tools that [people are] already using.”

AI without code

With Canvas, Selipsky says that customers can browse and access petabytes of data from both cloud and on-premises data sources, such as Amazon S3 and Redshift databases, as well as local files. Canvas uses automated machine learning technology to create models; once the models are created, users can explain and interpret them and share them with each other to collaborate and enrich insights.

“With Canvas, we’re making it even easier to prepare and gather data for machine learning to train models faster and expand machine learning to an even broader audience,” Selipsky added. “It’s really going to enable a whole new group of users to leverage their data and to use machine learning to create new business insights.”

Canvas follows on the heels of SageMaker improvements released earlier in the year, including Data Wrangler, Feature Store, and Pipelines. Data Wrangler recommends transformations based on data in a target dataset and applies these transformations to features. Feature Store acts as a storage component for features and can access features in either batches or subsets. As for Pipelines, it allows users to define, share, and reuse each step of an end-to-end machine learning workflow with preconfigured customizable workflow templates while logging each step in SageMaker Experiments.

With upwards of 82% of firms saying that custom app development outside of IT is important, Gartner predicts that 65% of all apps will be created using low- and no-code platforms like Canvas by 2024. Another study reports that 85% of 500 engineering leaders think that low- and no-code will be commonplace within their organizations as soon as 2021.

If the current trend holds, the market for low- and no-code could climb to between $13.3 billion and $17.7 billion in 2021 and between $58.8 billion and $125.4 billion in 2027.

Microsoft’s Tutel optimizes AI model training

Microsoft this week announced Tutel, a library to support the development of mixture of experts (MoE) models — a particular type of large-scale AI model. Tutel, which is open source and has been integrated into fairseq, one of Facebook’s toolkits in PyTorch, is designed to enable developers across AI disciplines to “execute MoE more easily and efficiently,” a statement from Microsoft explained.

MoE models are made up of small clusters of “neurons” that are only active under special, specific circumstances. Lower “layers” of the MoE model extract features, and experts are called upon to evaluate those features. For example, MoEs can be used to create a translation system, with each expert cluster learning to handle a separate part of speech or special grammatical rule.

Compared with other model architectures, MoEs have distinct advantages. They can respond to circumstances with specialization, allowing the model to display a greater range of behaviors. The experts can receive a mix of data, and when the model is in operation, only a few experts are active — even a huge model needs only a small amount of processing power.
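To make the routing idea concrete, here is a minimal, hedged top-1 gated MoE layer in PyTorch. It is a toy illustration of “only a few experts are active,” not Tutel’s optimized implementation; all sizes are invented.

```python
# Toy top-1 gated mixture-of-experts layer: each token is routed to exactly
# one expert, so most expert parameters sit idle on any given token.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)       # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                    # x: (tokens, d_model)
        scores = self.gate(x)                # (tokens, n_experts)
        top1 = scores.argmax(dim=-1)         # each token routed to ONE expert
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask])  # only selected tokens hit this expert
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)                     # torch.Size([10, 64])
```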

In fact, MoE is one of the few approaches demonstrated to scale to more than a trillion parameters, paving the way for models capable of powering computer vision, speech recognition, natural language processing, and machine translation systems, among others. In machine learning, parameters are the part of the model that’s learned from historical training data. Generally speaking, especially in the language domain, the correlation between the number of parameters and sophistication has held up well.

Tutel mainly focuses on the optimizations of MoE-specific computation. In particular, the library is optimized for Microsoft’s new Azure NDm A100 v4 series instances, which provide a sliding scale of Nvidia A100 GPUs. Tutel has a “concise” interface intended to make it easy to integrate into other MoE solutions, Microsoft says. Alternatively, developers can use the Tutel interface to incorporate standalone MoE layers into their own DNN models from scratch.

Chart: end-to-end performance of Meta’s MoE language model on Azure NDm A100 v4 nodes, scaling from 8 to 512 A100 (80GB) GPUs, with and without Tutel. Tutel consistently achieves higher throughput than fairseq alone.

Above: For a single MoE layer, Tutel achieves an 8.49 times speedup on an NDm A100 v4 node with 8 GPUs and a 2.75 times speedup on 64 NDm A100 v4 nodes with 512 A100 GPUs, Microsoft claims.

“Because of the lack of efficient implementations, MoE-based models rely on a naive combination of multiple off-the-shelf operators provided by deep learning frameworks such as PyTorch and TensorFlow to compose the MoE computation. Such a practice incurs significant performance overheads thanks to redundant computation,” Microsoft wrote in a blog post. (Operators are the basic computational building blocks, such as matrix multiplication, that a framework provides.) “Tutel designs and implements multiple highly optimized GPU kernels to provide operators for MoE-specific calculation.”

Tutel is available in open source on GitHub. Microsoft says that the Tutel development team will “be actively integrating” various emerging MoE algorithms from the community into future releases.

“MoE is a promising technology. It enables holistic training based on techniques from many areas, such as systematic routing and network balancing with massive nodes, and can even benefit from GPU-based acceleration. We demonstrate an efficient MoE implementation, Tutel, that resulted in significant gain over the fairseq framework. Tutel has been integrated [with our] DeepSpeed framework, as well, and we believe that Tutel and related integrations will benefit Azure services, especially for those who want to scale their large models efficiently,” Microsoft added.

OpenAI rival Cohere launches language model API

Cohere, a startup creating large language models to rival those from OpenAI and AI21 Labs, today announced the general availability of its commercial platform for app and service development. Through an API, customers can access models fine-tuned for a range of natural language applications, in some cases at a fraction of the cost of rival offerings.

The pandemic has accelerated the world’s digital transformation, pushing businesses to become more reliant on software to streamline their processes. As a result, the demand for natural language technology is now higher than ever — particularly in the enterprise. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their natural language processing (NLP) budgets grew by at least 10% compared to 2020, while a third — 33% — said that their spending climbed by more than 30%.

The global NLP market is expected to climb in value from $11.6 billion in 2020 to $35.1 billion by 2026.

“Language is essential to humanity and arguably its single greatest invention — next to the development of computers. Ironically, computers still lack the ability to fully comprehend language, finding it difficult to parse the syntax, semantics, and context that all work together to give words meaning,” Cohere CEO Aidan Gomez told VentureBeat via email. “However, the latest in NLP technology is continuously improving our ability to communicate seamlessly with computers.”

Cohere

Headquartered in Toronto, Canada, Cohere was founded in 2019 by a pedigreed team including Gomez, Ivan Zhang, and Nick Frosst. Gomez, a former intern at Google Brain, coauthored the academic paper “Attention Is All You Need,” which introduced the world to a fundamental AI model architecture called the Transformer. (Among other high-profile systems, OpenAI’s GPT-3 and Codex are based on the Transformer architecture.) Zhang, alongside Gomez, is a contributor at FOR.ai, an open AI research collective involving data scientists and engineers. As for Frosst, he, like Gomez, worked at Google Brain, publishing research on machine learning alongside Turing Award winner Geoffrey Hinton.

In a vote of confidence, even before launching its commercial service, Cohere raised $40 million from institutional venture capitalists as well as Hinton, Google Cloud AI chief scientist Fei-Fei Li, UC Berkeley AI lab co-director Pieter Abbeel, and former Uber autonomous driving head Raquel Urtasun. “Very large language models are now giving computers a much better understanding of human communication. The team at Cohere is building technology that will make this revolution in natural language understanding much more widely available,” Hinton said in a statement to Fast Company in September.

Unlike some of its competitors, Cohere offers two types of English NLP models, generation and representation, in sizes that include Large, Medium, and Small. The generation models can complete tasks that involve generating text — for example, writing product descriptions or extracting document metadata. By contrast, the representation models are about understanding language, driving apps like semantic search, chatbots, and sentiment analysis.
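As a hedged sketch of the two model families, here is roughly how Cohere’s Python SDK was documented around launch; the exact method signatures may have changed since, and the key and prompts are placeholders.

```python
# Hedged sketch of Cohere's generation and representation models via its
# Python SDK as documented at the time; key and prompts are placeholders.
import cohere

co = cohere.Client("YOUR_API_KEY")

# Generation model: produce text, e.g. a product description.
gen = co.generate(
    model="large",
    prompt="Write a one-line description of a handmade ceramic mug:",
    max_tokens=40,
)
print(gen.generations[0].text)

# Representation model: embed text for semantic search or classification.
emb = co.embed(model="small", texts=["where is my order?", "track my package"])
print(len(emb.embeddings), "embedding vectors")
```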

Cohere is already providing the NLP capability for Ada, a company in the chatbot space. Ada leverages a Cohere model to match customer chat requests with available support information.

“By being in both [the generative and representative space], Cohere has the flexibility that many enterprise customers need, and can offer a range of model sizes that allow customers to choose the model that best fits their needs across the spectrums of latency and performance,” Gomez said. “[Use] cases across industries include the ability to more accurately track and categorize spending, expedite data entry for medical providers, or leverage semantic search for legal cases, insurance policies and financial documents. Companies can easily generate product descriptions with minimal input, draft and analyze legal contracts, and analyze trends and sentiment to inform investment decisions.”

To keep its technology relatively affordable, Cohere charges for access on a per-character basis, determined by the size of the model and the number of characters an app uses (ranging from $0.0025 to $0.12 per 10,000 characters for generation and $0.019 per 10,000 characters for representation). Only the generation models charge on both input and output characters; the other models charge on output characters alone. All fine-tuned models, meanwhile — i.e., models tailored to particular domains, industries, or scenarios — are charged at twice the baseline model rate.
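A quick worked example of that pricing, taking the article’s rates at face value (treat the usage figure as hypothetical):

```python
# Worked example of the per-character pricing described above. The rates and
# the 2x fine-tune multiplier come from the article; the volume is invented.
RATE_PER_10K = 0.0025           # smallest generation model, $ per 10,000 characters
chars_used = 1_000_000          # hypothetical characters billed in a month

base_cost = chars_used / 10_000 * RATE_PER_10K
finetuned_cost = 2 * base_cost  # fine-tuned models bill at twice the baseline rate
print(f"baseline: ${base_cost:.2f}, fine-tuned: ${finetuned_cost:.2f}")
# baseline: $0.25, fine-tuned: $0.50
```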

“The problem remains that the only companies able to capitalize on NLP technology require seemingly bottomless resources in order to access the technology for large language models — which is due to the cost of these models ranging from the tens to hundreds of millions of dollars to build,” Gomez said. “Cohere is easy-to-deploy. With just three lines of code, companies can apply [our] full-stack engine to power all their NLP needs. The models themselves are … already pre-trained.”

To Gomez’s point, training and deploying large language models into production isn’t an easy feat, even for enterprises with massive resources. For example, Nvidia’s recently released Megatron 530B model was originally trained across 560 Nvidia DGX A100 servers, each hosting 8 Nvidia A100 80GB GPUs. Microsoft and Nvidia say that they observed between 113 to 126 teraflops per second per GPU while training Megatron 530B, which would put the training cost in the millions of dollars. (A teraflop rating measures the performance of hardware including GPUs.)

Inference — actually running the trained model — is another challenge. On two of its costly DGX SuperPod systems, Nvidia claims that inference (e.g., autocompleting a sentence) with Megatron 530B only takes half a second. But it can take over a minute on a CPU-based on-premises server. While cloud alternatives might be cheaper, they’re not dramatically so — one estimate pegs the cost of running GPT-3 on a single Amazon Web Services instance at a minimum of $87,000 per year.

Training the models

To build Cohere’s models, Gomez says that the team scrapes the web and feeds billions of ebooks and web pages (e.g., WordPress, Tumblr, Stack Exchange, Genius, the BBC, Yahoo, and the New York Times) to the models so that they learn to understand the meaning and intent of language. (The training dataset for the generation models amounts to 200GB after some filtering, while the dataset for the representation models, which wasn’t filtered, totals 3TB.) Like all AI models, Cohere’s models train by ingesting a set of examples to learn patterns among data points, like grammatical and syntactical rules.

It’s well-established that models can amplify the biases in data on which they were trained. In a paper, the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 and similar models can generate text that might radicalize people into far-right extremist ideologies. A group at Georgetown University has used GPT-3 to generate misinformation, including stories around a false narrative, articles altered to push a bogus perspective, and tweets riffing on particular points of disinformation. Other studies, like one published by Intel, MIT, and Canadian AI initiative CIFAR researchers in April, have found high levels of stereotypical bias from some of the most popular open source models, including Google’s BERT and XLNet and Facebook’s RoBERTa.

Cohere, for its part, claims that it’s committed to safety and trains its models “to minimize bias and toxicity.” Customers must abide by the company’s usage guidelines or risk having their access to the API revoked. And Cohere — which has an external advisory council in addition to an internal safety team — says that it plans to monitor “evolving risks” with tools designed to identify harmful outputs.

But Cohere’s NLP models aren’t perfect. In its documentation, the company admits that the models might generate “obscenities, sexually explicit content, and messages that mischaracterize or stereotype groups of people based on problematic historical biases perpetuated by internet communities.” For example, when fed prompts about people, occupations, and political/religious ideologies, the API’s output could be toxic 5 to 6 times per 1,000 generations and discuss men twice as much as it does women, Cohere says. Meanwhile, the Otter model in particular tends to associate men and women with stereotypically “male” and “female” occupations (e.g., male scientist versus female housekeeper).

In response, Gomez says that the Cohere team “puts substantial effort into filtering out toxic content and bad text,” including running adversarial attacks and measuring the models against safety research benchmarks. “[F]iltration is done at the keyword and domain levels in order to minimize bias and toxicity,” he added. “[The team has made] meaningful progress that sets Cohere apart from other [companies developing] large language models …  [W]e’re confident in the impact it will have on the future of work over the course of this transformative era.”

AI model development platform Abacus.ai lands $50M

Abacus.ai, a platform creating dev tools to develop and deploy enterprise AI technologies, today announced that it raised $50 million in a series C round led by Tiger Global with participation from Coatue, Index Ventures, and Alkeon Ventures. The raise brings the company’s total funding to $90.3 million to date, and CEO Bindu Reddy says it’ll be used to further develop Abacus’ AI technologies while growing the company’s workforce.

While the percentage of firms investing greater than $50 million in big data and AI initiatives reached 64.8% in 2020 (up from 39.7% in 2018), organizations of all sizes still struggle to implement AI expeditiously — and successfully. About 80% of AI projects never reach deployment, according to Gartner, and those that do are only profitable about 60% of the time.

Founded in 2019 by Arvind Sundararajan, Siddartha Naidu, and Reddy, Abacus provides a service for organizations to develop AI models via modules that can stream, monitor, debias, merge, store, and transform data. According to Reddy, users without advanced data science knowledge and with limited budgets can use it to iterate on end-to-end systems comparable to Twitter’s and TikTok’s content feeds and Gmail’s autocomplete feature.

“We have seen rapid adoption of our platform as customers generate orders of magnitude more data, move all their operations to the digital realm, and are looking to AI models to make decisions,” Reddy told VentureBeat via email. “We will soon see an inflection point in AI adoption, as it becomes easier and easier to develop models and operationalize them.”

AutoML

Abacus embraces elements of “AutoML,” or the process of automating the application of machine learning to real-world problems. AutoML covers the complete pipeline, from raw datasets to deployable machine learning models, and data science teams are increasingly adopting it to overcome blockers in their organizations. Forrester reports that 25% of data and analytics decision makers whose firms are adopting AI said that they’re planning to implement AutoML software within the next year. Sixty-one percent said that they’d already implemented AutoML software or were in the process of implementing it, according to the study.
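As a toy stand-in for what AutoML automates, the hedged sketch below uses scikit-learn’s GridSearchCV to try model configurations automatically; a real AutoML engine also automates feature engineering, model selection, and deployment.

```python
# AutoML in miniature: automated hyperparameter search with scikit-learn.
# A toy stand-in for a full AutoML engine, using a bundled demo dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 200], "max_depth": [3, None]},
    cv=3,
)
search.fit(X, y)  # tries each configuration and cross-validates automatically
print(search.best_params_, round(search.best_score_, 3))
```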

Abacus conducts research and offers cloud AI services to help companies embed machine learning models into their processes. Customers pick a use case and point to their data, after which Abacus’ engine creates an AI system that can be used to make and share predictions.

Above: Abacus’ model management dashboard.

Image Credit: Abacus

Abacus says its system applies the startup’s research on generative models and neural architecture search to deal with noisy or incomplete data. It ostensibly identifies the best neural network that models a customer’s proprietary dataset and use cases spanning IT operations, marketing and sales, fraud and security, and forecasting and planning.

In addition, the system is good at configuring pipelines, scheduling model retraining on new data, provisioning model serving from raw data, and providing explanations for models’ predictions, Reddy says. “Common enterprise AI use cases like churn modeling, lead scoring, and anomaly detection have seen exponential growth [on our platform],” she added. “The pandemic has been great for AI companies — and specifically for us.”

Pulling from multiple data sources

Beyond the new funding, Abacus today announced what it’s calling “vision AI-as-a-service,” along with support for hybrid AI models that can generate predictions from language, vision, and tabular data. According to Reddy, customers can now use a combination of datasets to create models that extract intelligence from all of the available data on hand.

“For example, you can predict the closing price of homes based on unstructured data like listing description and house photos along with structured tabular data including number of bedrooms, bathrooms, and more by combining all this data and using the Abacus predictive workflow to generate a hybrid predictive model that combines all the data types,” Reddy explained. “This is a powerful way to extract intelligence from data.”
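Here is a hedged PyTorch sketch of that hybrid pattern: fuse text, image, and tabular signals into one price regressor. It illustrates the idea Reddy describes, not Abacus’ actual system; the embedding sizes are invented, and upstream text and image encoders are assumed to exist.

```python
# Toy hybrid model: concatenate text, image, and tabular features, then
# regress a closing price. An illustration only, not Abacus' implementation.
import torch
import torch.nn as nn

class HybridPricePredictor(nn.Module):
    def __init__(self, text_dim=768, img_dim=512, tab_dim=8):
        super().__init__()
        # Assumes upstream encoders already embedded the listing description
        # (text_dim) and the house photos (img_dim).
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + img_dim + tab_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),          # predicted closing price
        )

    def forward(self, text_emb, img_emb, tabular):
        return self.fuse(torch.cat([text_emb, img_emb, tabular], dim=-1))

model = HybridPricePredictor()
price = model(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 8))
print(price.shape)  # torch.Size([4, 1])
```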

Despite competition from platforms like Amazon SageMaker, Google’s Cloud AutoML, and startups such as DataRobot and H2O.ai, Abacus says that over 10,000 developers across more than 6,000 customers including 1-800-Flowers have used its products to train roughly 20,000 real-time personalization, time-series forecasting, and anomaly detection models to date. The San Francisco, California-based company currently has 45 employees and plans to expand to 80 by the end of the year.

“Abacus has several vertically integrated workflows for common enterprise use cases, including natural language processing,” Reddy continued. “The new money is going to be used to continue to build out more vertical use cases like computer vision and to create more horizontal platform capabilities such as machine learning and deep learning operations modules.”
