
How Moveworks’ AI platform broke through the multilingual NLP barrier 

Chatbots have a checkered past of often not delivering the performance their providers have promised. This is especially true in the IT service management (ITSM) and multilingual NLP spaces, where service desks found support teams deluged with complaints — yes, about the support chatbots.

Getting English-language nuance right, and capturing how a given enterprise communicates, often requires chatbots to be custom-programmed with constraint and logic workflows backed by natural language processing (NLP) and machine learning. If that sounds like a science project, it is, and IT users are the test subjects. Because of this complexity, chatbots were contributing to already overflowing trouble-ticket queues.

Moveworks’ announcement this week that its platform now supports French, Italian, German, Spanish, and Portuguese breaks through this multilingual NLP barrier. The company believes its approach to scaling languages in an enterprise context, which requires no scripting or system-integration prework, is the future of conversational AI in the workplace.

Moreover, the platform goes well beyond translation to understand support issues in any language, leveraging bespoke natural language understanding (NLU) models trained to make sense of enterprise jargon. To resolve requests end-to-end, Moveworks employs techniques such as cross-lingual information retrieval, which ranks all available answers according to their relevance.
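Moveworks has not published its retrieval models, but the core idea of cross-lingual information retrieval can be sketched with open source tools: embed the query and all candidate answers in a shared multilingual vector space, then rank answers by similarity regardless of which language either side is written in. A minimal illustration, assuming the open source sentence-transformers library as a stand-in for the company's proprietary models:

```python
# Sketch of cross-lingual retrieval: embed a query and candidate answers in a
# shared multilingual vector space, then rank answers by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

answers = [
    "To reset your password, open the self-service portal and click 'Forgot password'.",
    "New laptops can be requested through the hardware catalog.",
    "Guest Wi-Fi access is granted through the visitor registration form.",
]
query = "J'ai oublie mon mot de passe"  # French: "I forgot my password"

answer_vecs = model.encode(answers, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# With normalized embeddings, the dot product equals cosine similarity.
scores = answer_vecs @ query_vec
best = int(np.argmax(scores))
print(answers[best])  # the password-reset answer should rank first
```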

Above: Moveworks uses context to surface the most relevant answer to each employee in their language of choice.

Multilingual NLP at scale

Moveworks took a different approach than its competitors when creating its AI platform, designing it to streamline workflows and break down the barriers that keep companies from automating common support issues. The approach rests on an intelligence engine that serves as the foundation of the conversational AI platform; the multilingual NLP package includes a series of machine-learning models synchronized around the nuance and meaning of words in an enterprise context. That is tedious and hard to do reliably, which is why no one else has accomplished it yet with an AI platform.

Rest assured that a lot of companies are trying. The following gaggle of vendors from Gartner’s Hype Cycle for Natural Language Technologies, 2021, has developed chatbots for production: Amazon, Amelia, Cognigy, Google, IBM, Kore.ai, Microsoft, Nuance, Pypestream, ServisBot, and Uniphore. There are certainly others working on this functionality.

One factor contributing to the checkered track record of conventional chatbots is the amount of setup and training involved; it’s yet another new system that service desk IT staffs need to configure and employees need to learn. By contrast, Moveworks’ approach learns each company’s terminology on the fly and factors in an employee’s location, department, and language preference without requiring any setup or training.

Moveworks uses 250 million issues to train its machine-learning models. This technique, among others, enables the Moveworks AI platform to adapt to language preferences in real time and ensure natural back-and-forth communication, the company says, even when requests include multiple languages. In addition, Moveworks employs machine learning to decide when to adjust and switch between languages to get users a solution as quickly as possible. The result, the company claims, is a real-time response to any employee, across any communication channel, at any time, in their native language.
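Moveworks has not detailed its switching logic, but the basic shape of per-message language handling can be sketched as follows, using the open source langdetect package as a stand-in for the company's learned models (the supported-language set and fallback rule below are illustrative assumptions):

```python
# Toy sketch: detect the language of each incoming request and answer in that
# language, falling back to the employee's profile preference when detection
# is unreliable (e.g., the message is only an error code).
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

SUPPORTED = {"en", "fr", "it", "de", "es", "pt"}

def response_language(message: str, profile_preference: str = "en") -> str:
    """Pick the language to respond in for a single chat message."""
    try:
        guess = detect(message)
    except LangDetectException:
        return profile_preference
    return guess if guess in SUPPORTED else profile_preference

print(response_language("Mein Laptop startet nicht"))  # typically "de"
```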

Above: The Moveworks Intelligence Engine uses machine learning to determine the underlying structure of employees’ support issues, across multiple languages.

Real-world results

The truest tests of any conversational AI platform are the adoption rates, interactions, and cost savings it delivers to enterprise accounts. Moveworks has developed an approach that lets enterprises track the platform’s contributions, a core requirement for becoming part of the ongoing workflows of any business. Leading customers include Broadcom, DocuSign, and Western Digital.

“As a global company, we need to provide the same quality of support to every employee at Albemarle to empower their potential, no matter which languages they speak,” said Patrick Thompson, CIO at Albemarle, a global specialty-chemicals maker. “Moveworks gives our people 24/7 help in their native language, just by having a natural conversation with the bot. Now, they can get support right away, without us needing localized service desks in each location.”

In addition, LinkedIn, Palo Alto Networks, Slack, Hearst, Autodesk, Broadcom, and other Moveworks customers are deploying multilingual support; Albemarle was among the first pilot customers to do so.

Additional customer results include the following:

  • Palo Alto Networks: More than 90% of employees use Moveworks, logging 122,000 interactions per day with its bot, Sheldon.
  • DocuSign: It boasts 89% employee adoption of Moveworks. Its bot, Hearo, handles the workload equivalent of eight full-time help desk agents, freeing them up for high-impact projects. Saran Mandair, its VP of global IT, said, “Moveworks provides the automation we need to focus on the challenging projects that matter.”
  • Unity: Ninety-one percent of its employees express satisfaction with Moveworks, and Unity has 92% adoption. Its bot, Ninja Unicorn, allowed CIO Brian Hoyt to expand support to 45 offices worldwide, “keeping up the same quality and speed without increasing headcount.”
  • Verisk: Verisk boasts 96% adoption of its Moveworks bot, Vic. David Lewis, AVP of computer services, said, “Moveworks has meant so much more than cost savings. The bot has completely changed our employee experience; it’s easily the best business decision I’ve made.”

Gartner Research senior director analyst Annette Jump wrote in a whitepaper, “Emerging Technologies: Top Use Cases for Conversational UI,” (May 2020): “Despite the growing proliferation of CUI uses, Gartner research indicates that the two most mature and prevalent use cases are chatbots for customer support (phone, mobile and a variety of online platforms) and call centers.” These are the two markets to which Moveworks is bringing multilingual support.

Gartner analysts Steve White and Venkat Rayapudi wrote in an ITSM best-practices report last March, “Solutions such as ServiceNow and Moveworks offer intelligent routing capabilities natively. These tools leverage natural-language understanding and ML to process structured data from machines and/or unstructured data from humans to automatically classify and route incident tickets to the assignee or invoke the automation with the greatest likelihood of being able to resolve the incident.”

Conclusion

Conversational AI needs to be just that: conversational. To support a global workforce, companies must ensure that their AI is as easy to use in any language as it is in English, with no system integration or expensive customization needed. With six languages launched and more on the way in 2022, Moveworks is a company to watch for any enterprise with an international presence.


Capital One uses NLP to discuss potential fraud with customers over SMS

Capital One has a 99% success rate when it comes to understanding customer responses to an SMS fraud alert, according to Ken Dodelin, the company’s VP of mobile, web, conversational AI, and messaging products. In a conversation with VentureBeat senior reporter Sage Lazzaro at VentureBeat’s Transform 2021 virtual conference today, Dodelin described how the bank harnesses the power of personalization and automation.

When Capital One notices an anomaly in a customer’s transactions, it reaches out over SMS and asks the customer to verify those transaction details. If the customer doesn’t recognize the transaction, Capital One can proceed to treat it as fraudulent.

By adding a third-party natural language processing/understanding solution, Capital One’s AI assistant Eno can understand customers’ written responses, such as “that was me shopping in Philadelphia,” phrasing that machines typically struggle to parse, Dodelin said.
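Capital One's third-party NLU stack is not public, but as a rough illustration of the task, a general-purpose zero-shot classifier can map free-form SMS replies onto the handful of outcomes a fraud-alert workflow cares about. A sketch using Hugging Face's zero-shot classification pipeline (a stand-in, not the bank's actual system; the label set is invented for illustration):

```python
# Map a free-form SMS reply onto fraud-workflow outcomes without training a
# task-specific model, using zero-shot classification over candidate labels.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

reply = "that was me shopping in Philadelphia"
labels = [
    "customer confirms they made the transaction",
    "customer says the transaction is fraudulent",
    "customer is unsure and wants to talk to an agent",
]

result = classifier(reply, candidate_labels=labels)
print(result["labels"][0])  # highest-scoring label: the customer confirms it
```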

Capital One first considered using AI for customer service in 2016, when the company was among the early recipients of Amazon Echo devices. Back then, Amazon was searching for partners across industries to see how they might create a conversational experience. Capital One said it became the first bank to develop a skill — a special program that enabled customers to accomplish tasks on Amazon platforms. In the following years, Capital One started to incorporate natural language understanding into its SMS alerts, as well as its website and mobile apps.

The current AI assistant has evolved a lot from the initial version, Dodelin said. First, the assistant is available to chat with customers in more places; whether a customer is inquiring about a bank account or a car loan, they can ask questions about it. Second, the company hasn’t restricted chats to conversations initiated by the customer: it relies on its advanced data infrastructure to anticipate customer needs and reach out proactively, through push notifications or email, with the information and actions customers would expect from a human assistant.

One challenge Capital One had to address was what to do if the customer wanted something not included in the options displayed on the screen. “Now we have to not just design experiences for the things we expect them to get, but continuously learn about all the other things that are coming in and the different ways they are coming in,” Dodelin said.

Context matters when applying AI technologies to customer service. In many cases, scripts are relatively consistent, regardless of who the customer is or their specific circumstances. But when creating an experience, it is important to remember that customers are being contacted under very different circumstances. Levity may not be appropriate during a moment of emotional and financial stress, for example.

Capital One has continued to enhance the service so it will proactively anticipate where a customer might need help and respond in an appropriate tone, Dodelin said.

Another challenge is anticipating the breadth of questions customers have. Customers who encounter issues often lack an outlet to express their frustration beyond having a human assistant pick up the phone, he said. Learning more about those experiences helps the AI assistant provide better answers and lets the company adjust which options are included in the user interface.

“As we learn more, we got better and expanded the audience that [the AI assistant] is available to,” Dodelin said. Capital One did not make the service available to all customers but started with a small segment of its credit card business. Over time, the company has opened the service to more customers.

“It’s a lot of work done by some very talented people here at Capital One to try to make it successful in all these different circumstances,” Dodelin concluded.


NLP needs to be open. 500+ researchers are trying to make it happen

The acceleration in artificial intelligence (AI) and natural language processing (NLP) will have a fundamental impact on society, because these technologies are at the core of the tools many of us use daily. However, the resources necessary to create the best-performing AI and NLP models are concentrated in the hands of technology giants.

The stranglehold tech giants have on this transformative technology poses a number of problems, from who decides which research gets shared to its environmental and ethical impacts. For example, while recent NLP models such as GPT-3 (from OpenAI and Microsoft) show behaviors that are interesting from a research point of view, the models are private, and many academic organizations get only restricted access or no access at all, making it impossible to answer important questions about their capabilities, limitations, potential improvements, bias, and fairness.

A group of more than 500 researchers from 45 different countries — from France, the US, and Japan to Indonesia, Ghana, and Ethiopia — has come together to work towards tackling some of these problems. The project, which the authors of this article are all involved in, is called BigScience, and our goal is to improve the scientific understanding of the capabilities and limitations of large-scale neural network models in NLP and to create a diverse and multilingual dataset and a large-scale language model as research artifacts, open to the scientific community.

BigScience was inspired by large-scale collaborative schemes in other scientific fields, such as CERN and the LHC in particle physics, in which open scientific collaborations facilitate the creation of large-scale artifacts useful to the entire research community. So far, a broad range of institutions and disciplines have joined the project’s year-long effort, which started in May 2021.

The project has more than 20 working groups and subgroups tackling different aspects of language modeling in parallel, some of which are closely related and interdependent. Data plays a crucial role in the process. In machine learning, a model learns to make predictions based on data it has seen before. The datasets that large language models are typically trained on are massive, mostly English-centric, and sourced from the web, which raises questions about bias, fairness, ethics, and privacy, among others.

Thus, the collective is intentionally curating the training dataset to favor linguistic, geographic, and social representativeness rather than the opportunistic practices that currently define the training data used in very large models. Our data effort also strives to identify the rights of the language owners, subjects, and communities; this is as much an organizational and social challenge as a technical one. The engineering and modeling groups are dedicated to determining architecture design and scaling laws, with the concrete goal of training a language model of up to 210 billion parameters on the French Jean Zay supercomputer at IDRIS.

One of our objectives is to uncover and understand the mechanisms that enable a language model to produce valid output on any natural task description it has been given without explicitly being trained to do so (an ability known as zero-shot behavior). Another point of interest is studying how a language model can be updated through time. We also have a group of researchers working on tokenization strategies for a diverse set of languages and modeling multilinguality to ensure that all NLP capabilities are transposed to languages other than English. Others are working on the social impact, carbon footprint, data governance, and legal implications of NLP models and how to extrinsically and intrinsically evaluate them for accuracy.
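For readers unfamiliar with the term, zero-shot behavior is easy to demonstrate: the model receives only a natural-language task description and must produce valid output without task-specific training. A minimal sketch with a small open model via the Hugging Face Transformers pipeline (small models follow such prompts only weakly; understanding how this ability emerges at scale is one of the goals described above):

```python
# Zero-shot prompting: the task is described in the prompt itself, and the
# model has never been explicitly trained on "translate English to French".
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate English to French.\n"
    "English: Where is the train station?\n"
    "French:"
)
out = generator(prompt, max_new_tokens=12, do_sample=False)
print(out[0]["generated_text"])  # gpt2 follows this only weakly; larger models do better
```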

As the output of this enormous effort, BigScience aims to share a very large multilingual corpus constituted in a way that is responsible, diverse, and mindful of ethical and legal issues; a large-scale multilingual language model exhibiting non-trivial zero-shot behaviors, accessible to all researchers; and the code and tools associated with these artifacts to enable easy use. Beyond that, the project is an opportunity to create a blueprint for how to run large-scale research initiatives in AI. Our effort keeps evolving and growing, with more researchers joining every day, making it the biggest open-science contribution in artificial intelligence to date.

Much like the tensions between proprietary and open-source software in the early 2000s, AI is at a turning point where it can either go in a proprietary direction, where large-scale state-of-the-art models are increasingly developed internally in companies and kept private, or in an open, collaborative, community-oriented direction, marrying the best aspects of open-source and open-science. It’s essential that we make the most of this current opportunity to push AI onto that community-oriented path so that it can benefit society as a whole.

Yacine Jernite is a Research Scientist at Hugging Face. He coordinates the Data effort of the BigScience project as area chair and co-organizer of the data governance group.

Matthias Gallé leads various research teams at Naver Labs Europe, focused on developing AI for our Digital World. His focus for BigScience is on how to inspect, control, and update large pre-trained models.

Victor Sanh is a Research Scientist at Hugging Face. His research focuses on making NLP systems more robust for production scenarios and mechanisms behind generalization.

Samson Tan is a final year computer science PhD candidate at the National University of Singapore and co-chair of the Tokenization working group in BigScience.

Thomas Wolf is co-founder and Chief Science Officer of Hugging Face and co-leader of the BigScience initiative.

Suzana Ilic is a Technical Program Manager at Hugging Face, co-leading the organization of BigScience.

Margaret Mitchell is an industrial AI research scientist and co-chair of the Data Governance working group in BigScience.


Amex bets on AI and NLP for customer service

The customer draws the AI roadmap at American Express (Amex), at least according to two of the company’s top AI leaders. When describing their latest project, Josh Pizzaro, the company’s director of AI, and Cong Liu, the VP of natural language processing and conversational AI, couldn’t stress this enough.

“We were looking to apply machine learning and advanced analytics to create frictionless and seamless customer experiences. And so when we looked across the enterprise, we looked for opportunities to inject machine learning, and we found one such opportunity in search,” Pizzaro told VentureBeat.

Contextual search is rising as a use case for natural language processing (NLP), which is booming overall. This year, Amex will debut a contextual and predictive search capability inside its app. Trained on an NLP model initially intended for the company’s customer service chatbots, the feature will “understand” various scenarios and, if all goes right, predict what customers need before they type anything at all. If a customer opens search while en route to the airport, for example, the system (equipped with their transaction and previous search data) might predict they’re looking for the lounge finder. Or in the case of a user opening search after noticing duplicate transactions, it can determine they’re likely interested in disputing a charge.
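Amex's production system is a proprietary deep neural network trained on transaction and search data, but the underlying idea of predicting a search intent from context signals before the user types anything can be sketched with a toy classifier. Every feature name and training row below is invented for illustration:

```python
# Toy sketch of context-to-intent prediction: given a snapshot of context
# signals, predict which help topic the user most likely wants.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical context snapshots paired with the help topic the user wanted.
contexts = [
    {"near_airport": 1, "duplicate_charge": 0, "card_declined": 0},
    {"near_airport": 0, "duplicate_charge": 1, "card_declined": 0},
    {"near_airport": 0, "duplicate_charge": 0, "card_declined": 1},
    {"near_airport": 1, "duplicate_charge": 0, "card_declined": 0},
]
intents = ["lounge_finder", "dispute_charge", "payment_issue", "lounge_finder"]

vec = DictVectorizer()
X = vec.fit_transform(contexts)
model = LogisticRegression().fit(X, intents)

# A traveler near the airport opens search without typing anything yet.
new_context = {"near_airport": 1, "duplicate_charge": 0, "card_declined": 0}
print(model.predict(vec.transform([new_context]))[0])  # likely "lounge_finder"
```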

The company started the project in early 2020 and recently launched a U.K. pilot for the elevated search function, with a U.S. launch set to follow later this year. To learn more about the problem they were trying to solve, challenges they encountered, and the technology’s potential impact, VentureBeat spoke with Liu and Pizzaro.

This interview has been edited for brevity and clarity.

VentureBeat: What was the impetus for creating this? What problem were you trying to solve? 

Cong Liu: For this specific capability, what we really wanted to do is anticipate a customer’s need at any given point.

Josh Pizzaro: And I would say, from a more agnostic perspective, we started building the model because if you think about where the world was, it was in a place where we would ask our card members how they’re feeling and what they wanted. And now today, in the machine learning era, we just need to know, and we do know based on the data that we have. And so we look across the different services that we provide and try to reduce the burden on the customer, and in this case, search and present things in that contextual and fast way so they get what they want faster. Because ultimately, great customer experience is about speed.

VentureBeat: Why did you lean into AI, specifically a deep neural net? What was the decision process?

Liu: We started this journey [of leveraging AI] long ago, applying machine learning first to more mature use cases, including our fraud models and some credit risk models. And in the past couple years, especially in the past five years or so, we started to see with certainty that deep neural network models started to outperform almost every other machine learning model when it comes to high-dimensional data and highly unstructured data. We not only deal with some of the traditional fields, like customer transactions, but also [text sequences] and volume history data. Neural network models can effectively deal with all of that.

VentureBeat: What internal challenges, perceived opportunities, or other factors did you consider when launching this search project? Was there anything in particular that tipped the scale for whether or not to do this, or how to approach it?

Pizzaro: First, I think it’s really about recognizing patterns. And if you look at certain use cases where you have customer behavior that’s being repeated and you can expedite that behavior, then that tends to be a real sweet spot for machine learning capabilities. The other thing I would add is we take the decision to apply machine learning techniques quite seriously. We have an entire AI governance board that cross-checks all the models that we build for bias and privacy concerns. So even taking the approach of AI, we have to justify to a number of internal teams why it makes sense.

VentureBeat: The NLP model used to train this neural network was originally developed to advance your chatbots. What was the process of extending its use? And what did you learn about applying models created for a specific purpose to a new use case?

Liu: When we started developing this model, we started with tags and focused on improving the personalization of the data and making the bot smarter. Later, we identified it could power search as well, because both in search and in chat, the goal is to help customers with better and more proactive services. So from a data science perspective, it’s a natural extension.

Pizzaro: For what we learned, I need to take a step back and say we developed an in-house annotation team that retagged data where our models went wrong. It was all American Express customer service experts. And a lot of other folks, you know, farm this out to different companies. And what we realized is that by actually having the customer service experts tag the data, accuracy is just so much higher. So it’s an investment, but it’s an investment in accuracy and progress.

VentureBeat: So you think that’s your real differentiator?

Pizzaro: We absolutely do. It’s been key to the success of the accuracy of our models.

Liu: Sometimes people overlook the effort they need to spend on the simple tasks, such as labeling. But without accurate data, you’re not going anywhere. You’re not going to build an accurate model.

VentureBeat: So that’s worked well for you. But I know you feel that building this type of one-to-one search capability is more difficult than it sounds. What was the biggest challenge you ran into along the way, and how did you overcome it?

Liu: I think the biggest challenge for this particular capability is that, in general, when you open a browser and do a search, you’re looking at 10 or 20 different links and have to find what you want. We really wanted to build a one-shot journey. If the customer searches and is already happy with what we provided, that’s great. But otherwise, we’d love to get it right with as few inputs as possible. So that’s the challenge: How do you get the model right with very limited input?

VentureBeat: Are you finding any limitations with your current model or approach?

Pizzaro: One of the things we have not done today is create generative models. And so that’s something we know is a technology we’re capable of working with and creating, but it’s not something we feel is in our customers’ best interest at this time. And so we haven’t explored it much in production.

Liu: And another thing I want to add here is that when you talk about limitations of machine learning models, there’s one common limitation, or I would say, an opportunity. How do you keep improving the model? Because as long as it’s a machine learning model, it’s not 100% accurate.

VentureBeat: Let’s talk about the impact. What’s the most significant result you’re seeing?

Pizzaro: Search just launched as a pilot in the U.K., and we’ll be launching later this year in the U.S., but we can speak to how the predictive machine learning capability is working in chat. Over the past six to eight months, we’ve seen our RTS scores, which is essentially a proxy for NPS scores for the bot experience, go up significantly. And so obviously there’s a number of things that we’ve done in order to move some of those results, but we do believe that some of these advanced machine learning models are helping that score.

We’re also seeing higher engagement with the responses that we send back to our customers, which refers to them clicking on a link or the information that we’re providing. It’s greatly improved. Our chat function is a bot-human hybrid, and so we’ve been reducing some of the chat handling time on the agent side. We’ve also seen more fully automated experiences.


EleutherAI claims new NLP model approaches GPT-3-level performance

AI-powered language systems have transformative potential, particularly in the enterprise. They’re already being used to drive chatbots, translate natural language into structured query language, create application layouts and spreadsheets, and improve the accuracy of web search products. OpenAI’s GPT-3, which may be the best-known AI text generator, is currently used in more than 300 apps by tens of thousands of developers, producing 4.5 billion words per day.

As business interest in AI rises, advisory firm Mordor Intelligence forecasts that the natural language processing (NLP) market will more than triple its revenue by 2025. But noncommercial, open source efforts are concurrently gaining steam, as evidenced by the progress made by EleutherAI. A grassroots collection of AI researchers, EleutherAI this week released GPT-J-6B (GPT-J), a model the group claims performs nearly on par with an equivalent-sized GPT-3 model on various tasks.

“We think it’s probably fair to say this is currently the best open source autoregressive language model you can get by a pretty wide margin,” Connor Leahy, one of the founding members of EleutherAI, told VentureBeat.

GPT-J is what’s known as a Transformer model, which means it weighs the influence of different parts of input data rather than treating all the input data the same. Transformers don’t need to process the beginning of a sentence before the end. Instead, they identify the context that confers meaning on a word in the sentence, enabling them to process input data in parallel.
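That weighing of input parts is the scaled dot-product attention mechanism at the heart of the architecture. A self-contained numpy sketch of a single attention layer (random weights, purely illustrative):

```python
# Minimal scaled dot-product self-attention: every token attends to every
# other token at once, which lets the model weigh context in parallel rather
# than reading strictly left to right.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); returns contextualized token representations."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # attention weights before softmax
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                  # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 16)
```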

The Transformer architecture forms the backbone of language models including GPT-3 and Google’s BERT, but EleutherAI claims GPT-J took less time to train than comparable large-scale models. The researchers attribute this to the use of JAX, Google’s Python library for high-performance machine learning research, as well as training on Google’s tensor processing units (TPUs), application-specific integrated circuits (ASICs) developed specifically to accelerate AI.

Training GPT-J

EleutherAI says GPT-J contains roughly 6 billion parameters, the parts of the machine learning model learned from historical training data. It was trained over the course of five weeks on 400 billion tokens from a dataset created by EleutherAI called The Pile, an 825GB collection of 22 smaller datasets — including academic sources (e.g., Arxiv, PubMed), communities (StackExchange, Wikipedia), code repositories (GitHub), and more. (Tokens are a way of separating pieces of text into smaller units in natural language; they can be words, characters, or parts of words.)

Above: GPT-J can solve basic math problems. (Image Credit: EleutherAI)

For compute, EleutherAI was able to leverage the TPU Research Cloud, a Google Cloud initiative that supports projects with the expectation that the results of the research will be shared via code and models. GPT-J’s code and the trained model are open-sourced under the MIT license and can be used for free via HuggingFace’s Transformers platform or EleutherAI’s website.
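Loading the released weights takes only a few lines, assuming a recent version of the Transformers library with GPT-J support (the full-precision checkpoint needs roughly 24GB of memory; the model ID below is the one EleutherAI published on the Hub):

```python
# Load the open GPT-J checkpoint and generate a continuation greedily.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("Every cyclic group is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```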

GPT-J is more capable than the two models EleutherAI previously released: GPT-Neo 1.3B and GPT-Neo 2.7B. For example, it can perform addition and subtraction and prove simple mathematical theorems, like “Any cyclic group is abelian.” It can also answer yes/no reading-comprehension questions from a popular test dataset (BoolQ) and generate pseudocode.

Above: GPT-J proving a theorem. (Image Credit: EleutherAI)

“[OpenAI’s] GPT-2 was about 1.5 billion parameters and doesn’t have the best performance since it’s a bit old. GPT-Neo was about 2.7 billion parameters but somewhat underperforms equal-sized GPT-3 models. GPT-J, the new one, is now 6B — sized similar to the Curie model of OpenAI, we believe,” Leahy said.

Looking ahead

EleutherAI plans to eventually deliver the code and weights needed to run a model similar, though not identical, to the full “DaVinci” GPT-3. (Weights are parameters within a neural network that transform input data.) Compared with GPT-J, the full GPT-3 contains 175 billion parameters and was trained on 499 billion tokens from a 45TB dataset.

Language models like GPT-3 often amplify biases encoded in their training data, a portion of which is commonly sourced from communities with pervasive gender, race, and religious prejudices. OpenAI notes that this can lead to placing words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.” Other studies, like one published in April by researchers at Intel, MIT, and the Canadian Institute for Advanced Research (CIFAR), have found high levels of stereotypical bias in some of the most popular models.

Above: GPT-J answering a word problem. (Image Credit: EleutherAI)

But EleutherAI claims to have performed “extensive bias analysis” on The Pile and made “tough editorial decisions” to exclude datasets they felt were “unacceptably negatively biased” toward certain groups or views.

While EleutherAI’s model might not be cutting edge in terms of its capabilities, it could go a long way toward solving a common tech problem: the disconnect between research and engineering teams. As Hugging Face CEO Clément Delangue told VentureBeat in a recent interview, tech giants provide black-box NLP APIs while also releasing open source repositories that can be hard to use or aren’t well-maintained. EleutherAI’s efforts could help enterprises realize the business value of NLP without having to do much of the legwork themselves.


Facebook’s Dynabench now scores NLP models for metrics like ‘fairness’

Last September, Facebook introduced Dynabench, a platform for AI data collection and benchmarking that uses humans and models “in the loop” to create challenging test datasets. Leveraging a technique called dynamic adversarial data collection, Dynabench measures how easily humans can fool AI, which Facebook believes is a better indicator of a model’s quality than any provided by current benchmarks.

Today, Facebook updated Dynabench with Dynaboard, an evaluation-as-a-service platform for conducting evaluations of natural language processing models on demand. The company claims Dynaboard makes it possible to perform apples-to-apples comparisons of models without problems from bugs in test code, inconsistencies in filtering test data, and other reproducibility issues.

“Importantly, there is no single correct way to rank models in AI research,” Facebook wrote in a blog post. “Since launching Dynabench, we’ve collected over 400,000 examples, and we’ve released two new, challenging datasets. Now we have adversarial benchmarks for all four of our initial official tasks within Dynabench, which initially focus on language understanding … Although other platforms have addressed subsets of current issues, like reproducibility, accessibility, and compatibility, [Dynabench] addresses all of these issues in one single end-to-end solution.”

Dynabench

A number of studies imply that commonly used benchmarks do a poor job of estimating real-world AI performance. One recent report found that 60-70% of answers given by natural language processing (NLP) models were embedded somewhere in the benchmark training sets, indicating that the models were often simply memorizing answers. Another study — a meta-analysis of over 3,000 AI papers — found that metrics used to benchmark AI and machine learning models tended to be inconsistent, irregularly tracked, and not particularly informative.

Facebook’s solution to this is what it calls the Dynascore, a metric designed to capture model performance on the axes of accuracy, compute, memory, robustness, and “fairness.” The Dynascore allows AI researchers to tailor an evaluation by placing greater or lesser emphasis (or weight) on each of those tests.

As users employ the Dynascore to gauge the performance of models, Dynabench tracks which examples fool the models and lead to incorrect predictions across the core tasks of natural language inference, question answering, sentiment analysis, and hate speech. These examples improve the systems and become part of more challenging datasets that train the next generation of models, which can in turn be benchmarked with Dynabench to create a “virtuous cycle” of research progress.

Crowdsourced annotators connect to Dynabench and receive feedback on a model’s response. This enables them to employ tactics like making the model focus on the wrong word or attempt to answer questions requiring real-world knowledge. All examples on Dynabench are validated by other annotators, and if the annotators don’t agree with the original label, the example is discarded from the test set.

Metrics

In the new Dynascore on Dynaboard, “accuracy” refers to the percentage of examples the model got right. The exact accuracy metric is task-dependent; while tasks can have multiple accuracy metrics, only one, chosen by the task owners, can be used as part of the ranking function.

Compute, another component of the Dynascore, measures the computational efficiency of an NLP model. To account for computation, Dynascore measures the number of examples the model can process per second on its instance in Facebook’s evaluation cloud.

To calculate memory usage, Dynascore measures the amount of memory a model requires, in gigabytes. Usage is averaged over the time the model runs, with measurements taken at set intervals of seconds.

Above: Users can adjust various metrics in Dynaboard to see how these might affect an NLP model’s performance. (Image Credit: Facebook)

Dynascore also measures robustness, or how well a model copes with typographical errors and local paraphrases in its input. The platform measures changes after adding “perturbations” to the examples, testing whether, for example, a model can capture that a “baaaad restaurant” is not a good restaurant.

Finally, Facebook claims Dynascore can evaluate a model’s fairness with a test that substitutes, among other things, noun phrase gender (e.g., replacing “sister” with “brother” or “he” with “they”) in datasets and names with others that are statistically predictive of another race or ethnicity. For the purposes of Dynaboard scoring, a model is considered “fair” if its predictions remain stable after these changes.

Facebook admits that this fairness metric isn’t perfect. Replacing “his” with “hers” or “her” might make sense in English, for example, but can sometimes result in contextual mistakes. If Dynaboard were to replace “his” with “her” in the sentence “this cat is his,” the result would be “this cat is her,” which doesn’t maintain the original meaning.
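A stripped-down version of that substitution test is easy to write, and it exhibits exactly the limitation Facebook describes: a naive word swap ignores context. The word list and sentiment model below are stand-ins, not Dynaboard's actual components:

```python
# Substitution-based fairness probe: swap gendered terms in an input and
# check whether the model's prediction stays stable.
from transformers import pipeline

SWAPS = {"sister": "brother", "brother": "sister",
         "she": "he", "he": "she", "her": "his", "his": "her"}

def gender_swap(text: str) -> str:
    # Naive word-level swap; as noted above, it can produce contextual mistakes.
    return " ".join(SWAPS.get(w, w) for w in text.lower().split())

classifier = pipeline("sentiment-analysis")

original = "my sister is a brilliant engineer"
swapped = gender_swap(original)

a, b = classifier(original)[0], classifier(swapped)[0]
# "Fair" under this metric: the predicted label should not flip after the swap.
print(a["label"], b["label"], "stable" if a["label"] == b["label"] else "flipped")
```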

“At the launch of Dynaboard, we’re starting off with an initial metric relevant to NLP tasks that we hope serves as a starting point for collaboration with the broader AI community,” Facebook wrote. “Because the initial metric leaves room for improvement, we hope that the AI community will build on Dynaboard’s … platform and make progress on devising better metrics for specific contexts for evaluating relevant dimensions of fairness in the future.”

Calculating a score

To combine the disparate metrics into a single score that can be used to rank models in Dynabench, Dynaboard finds an “exchange rate” between metrics that standardizes their units. The platform then takes a weighted average to calculate the Dynascore, so models can be dynamically re-ranked in real time as the weights are adjusted.

To compute the rate at which the adjustments are made, Dynaboard uses the “marginal rate of substitution” (MRS), which in economics is the amount of one good a consumer is willing to give up for another good while maintaining the same utility. Arriving at the default Dynascore involves estimating the average rate at which users are willing to trade off each metric for a one-point gain in performance, then using that rate to convert all metrics into units of performance.
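Facebook has not published the exact computation, but the mechanics of converting metrics into common units and weighting them can be sketched as follows; every rate and weight here is invented for illustration:

```python
# Back-of-the-envelope Dynascore-style aggregation: convert each raw metric
# into common "performance units" via an exchange rate, then take a weighted
# average so re-weighting instantly re-ranks models.
def dynascore(metrics, exchange_rates, weights):
    """metrics: raw values; exchange_rates: metric units per performance point."""
    converted = {m: metrics[m] / exchange_rates[m] for m in metrics}
    total_weight = sum(weights.values())
    return sum(weights[m] * converted[m] for m in converted) / total_weight

metrics = {"accuracy": 87.0, "throughput": 40.0, "robustness": 82.0, "fairness": 90.0}
# e.g., users would trade 2 examples/sec of throughput for 1 accuracy point:
rates = {"accuracy": 1.0, "throughput": 2.0, "robustness": 1.0, "fairness": 1.0}
weights = {"accuracy": 4, "throughput": 1, "robustness": 1, "fairness": 1}

print(round(dynascore(metrics, rates, weights), 2))
```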

Dynaboard is available for researchers to submit their own model for evaluation via a new command-line interface tool and library called Dynalab. In the future, the company plans to open Dynabench up so anyone can run their own task or models in the loop for data collection while hosting their own dynamic leaderboards.

“The goal of the platform is to help show the world what state-of-the-art NLP models can achieve today, how much work we have yet to do, and in doing so, help bring about the next revolution in AI research,” Facebook continued. “We hope Dynabench will help the AI community build systems that make fewer mistakes, are less subject to potentially harmful biases, and are more useful and beneficial to people in the real world.”


Cambridge Quantum pushes into NLP and quantum computing with new head of AI

Cambridge Quantum Computing (CQC) hiring Stephen Clark as head of AI last week could be a sign the company is boosting research into ways quantum computing could be used for natural language processing.

Quantum computing is still in its infancy but promises such significant results that dozens of companies are pursuing new quantum architectures. Researchers at technology giants such as IBM, Google, and Honeywell are making measured progress on demonstrating quantum supremacy for narrowly defined problems. Quantum computers with 50-100 qubits may be able to perform tasks that surpass the capabilities of today’s classical digital computers, “but noise in quantum gates will limit the size of quantum circuits that can be executed reliably,” California Institute of Technology theoretical physics professor John Preskill wrote in a recent paper. “We may feel confident that quantum technology will have a substantial impact on society in the decades ahead, but we cannot be nearly so confident about the commercial potential of quantum technology in the near term, say the next 5 to 10 years.”

CQC has been selling software focused on specific use cases, such as cybersecurity and pharmaceutical drug discovery, as the hardware becomes available. “We are very different from the other quantum software companies that we are aware of, which are primarily focused on consulting-based revenues,” CQC CEO Ilyas Khan told VentureBeat.

For example, amid concerns that improvements in quantum hardware will make it easier to break existing algorithms used in modern cryptography, CQC devised a method to generate quantum-resistant cryptographic keys that cannot be cracked by today’s methods. CQC partners with pharmaceutical and drug discovery companies to develop quantum algorithms for improving material discovery, such as working with Roche on drug development, Total on new materials for carbon capture and storage solutions, and CrownBio for novel cancer treatment biomarker discovery.

Moving into AI

The addition of Clark to CQC’s team signals the company will be shifting some of its research and development efforts toward quantum natural language processing (QNLP). Humans are good at composing meanings, but this process is not well understood. Recent research established that quantum computers, even with their current limitations, could learn to reason with the uncertainty that is part of real-world scenarios.

“We do not know how we compose meaning, and therefore we have not been sure how this process can be carried over to machines/computers,” Khan said.

QNLP could enable grammar-aware representations of language that make sense of text at a deeper level than is currently possible with state-of-the-art NLP models like BERT and GPT-3. The company has already demonstrated some early success in representing and processing text using quantum computers, suggesting that QNLP is within reach.

Clark was previously senior staff research scientist at DeepMind and led a team working on grounded language learning in virtual environments. He has a long history with CQC chief scientist Bob Coecke, with whom he collaborated 15 years ago to devise a novel approach for processing language. That research stalled due to the limitations of classical computers. Quantum computing could help address these bottlenecks, and there are plans to continue that research program, Clark said in a statement.

“The methods we developed to demonstrate this could improve a broad range of applications where reasoning in complex systems and quantifying uncertainty are crucial, including medical diagnoses, fault-detection in mission-critical machines, and financial forecasting for investment management,” Khan said.


NLP Cloud helps app developers add language processing

NLP tools and services are taking off, but developers often struggle with the hurdle of getting NLP models into production. NLP Cloud is a new AI startup focused on lowering the barriers for developers trying to create apps for sorting support tickets, extracting leads, analyzing social networks, and developing tools for economic intelligence.

NLP has been around for decades, but interest has seen a dramatic uptick with the recent introduction of transformers, a new type of neural network. Google researchers demonstrated in 2017 how transformers dramatically improved the speed, performance, and precision of NLP tools. Transformers made possible much larger models, such as Google’s BERT and OpenAI’s GPT-3, and their capabilities are available through open source libraries like Hugging Face’s Transformers and spaCy.

Developing accurate models and pushing models into production are two different processes. NLP Cloud intends to close this gap by reducing the barriers to production — providing NLP capabilities via an API, rather than a raw AI model that must be pushed into production. Developers only need to worry about integrating the API into their application.
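In practice, that means a developer's entire "deployment" collapses to an HTTP request. The endpoint and token below are hypothetical placeholders rather than NLP Cloud's documented API; the provider's docs define the real interface:

```python
# Hedged sketch of the API-first pattern: send text to a hosted NLP endpoint
# and get structured results back, with no model serving on the client side.
import requests

API_URL = "https://api.example-nlp-provider.com/v1/entities"  # hypothetical endpoint
headers = {"Authorization": "Token YOUR_API_TOKEN"}           # placeholder credential

resp = requests.post(
    API_URL,
    headers=headers,
    json={"text": "John Doe has been working for Microsoft in Seattle since 1999."},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g., a list of entities with labels and character offsets
```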

“Today, the main challenge remaining in NLP projects is clearly the production side,” NLP Cloud CTO and founder Julien Salinas told VentureBeat in an email. New NLP models make it easier for more types of developers to experiment with weaving language capabilities into their projects.

Above: Julien Salinas, CTO and founder of NLP Cloud. (Image Credit: NLPCloud.io)

Possible use cases include scanning web pages and other unstructured text to extract named entities for lead generation, conducting sentiment analysis on support tickets, and sorting those tickets by urgency. Content marketers can use the platform to summarize text and generate headlines.
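The named-entity piece of that pipeline is a few lines with spaCy, one of the two libraries NLP Cloud builds on (this sketch assumes the small English model has been installed with `python -m spacy download en_core_web_sm`):

```python
# Extract candidate lead fields (people, organizations, places) from raw text.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Jane Smith, VP of data at Acme Corp in Boston, asked for a demo.")

for ent in doc.ents:
    if ent.label_ in {"PERSON", "ORG", "GPE"}:  # GPE = geopolitical entity
        print(ent.text, ent.label_)
```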

Properly deploying and running AI models in production requires strong DevOps, programming, and AI capabilities, and few developers have mastered all three disciplines, especially within smaller companies. A team may have data science knowledge but lack DevOps capabilities, or it may consist of software engineers who need to deploy NLP without hiring a data science team.

The company is focusing on making the best available open source models easier to deploy rather than developing its own models. This allows it to focus on improving the developer experience rather than tweaking the underlying models. Salinas said the company selected Hugging Face and spaCy for their respective strengths.

Hugging Face’s transformers are more advanced and accurate than spaCy, Salinas said. Hugging Face is also building a huge open source repository for NLP models, which makes selecting the best model for a given use case more convenient.

spaCy is faster and less resource-intensive than other NLP libraries. The library has been around longer and recently added native support for transformer-based models.

In the future, Salinas plans to add conversational models for chatbots, new summarization models that can handle bigger pieces of text, and text generation models. He also hopes to eventually support more languages but believes non-English models still need more work.

Since its launch three months ago, NLP Cloud has been growing rapidly. It currently has around 500 users, 30 of them paying. While most users are startups, the company has begun to see some larger customers.


Aunalytics unifies siloed bank customer data with AI-driven data mart and NLP

Aunalytics announced an update to its Daybreak for Financial Services platform that employs machine learning algorithms to enable midrange banks and credit unions to more easily analyze data.

The latest update adds a data mart that automatically discovers and aggregates customer data residing in siloed lending, mobile banking, automated teller machine (ATM), customer relationship management (CRM), wealth management, and trust applications. The platform has also added support for a natural language processing (NLP) engine that eliminates the need to know SQL to query data. Companies can automatically create visualizations of those query results as well.

Finally, Aunalytics made it simpler to access external data via connectors and added a “smart features” capability that will, for example, automatically generate alerts anytime a customer’s credit score changes.

Midrange banks and credit unions are at a distinct AI disadvantage compared to larger financial services rivals that can afford to build their own AI models with specialists who program in Python or R, Aunalytics president Rich Carlton said. “They can’t afford to hire a team of data scientists,” he added.

Aunalytics is making a case for a platform that automates low-level data science tasks in a way that enables either end users or a small team of data scientists to maximize the value of the data any midrange bank or credit union routinely collects, Carlton said. The Daybreak for Financial Services platform is based on cloud-native technologies such as Hadoop, containers, and Kubernetes clusters that he said enable it to be deployed in the cloud or an on-premises IT environment.

Midrange financial services providers have realized they are losing touch with customers in the wake of the COVID-19 pandemic, as those customers rely more on digital services. The number of banking customers who visit their local branch has declined sharply as reliance on web and mobile applications increases. The challenge midrange providers face is that they already rely on a disjointed suite of applications to manage their business, and mobile applications in particular have added yet another silo that makes it difficult to correlate customer activity across a portfolio of services.

Awareness of AI and data science has never been higher. The issue organizations are trying to come to terms with is to what degree they are now at a competitive disadvantage because they lack these capabilities. Platforms and applications that embed AI capabilities may provide a way to close that gap at a time when many smaller financial services firms need to operate as efficiently as possible just to stay afloat.

As data science and AI continue to evolve, organizations will soon need to decide when it makes sense to employ advanced analytics that are baked into a platform such as Daybreak for Financial Services versus building and maintaining their own AI models. Given the general shortage of data science professionals, it’s especially difficult for smaller organizations to hire and retain in-house talent.

At the same time, it usually takes a data science team several months to successfully deploy an AI model in a production environment. Providers of applications and platforms may very well have added similar capabilities to their offerings before that custom AI project ever comes to fruition. In many cases, organizations will find they are gaining access to advanced analytics capabilities at no extra cost as new updates are made available under a subscription license.

Most end users, of course, are a lot more interested in the business outcomes AI models and data science enable than they are in the processes employed to build them. The fact that an independent provider of a platform or application is willing to vouch for the accuracy of those AI models adds yet another perceived level of comfort.
