Google is taking sign-ups for Relate, a voice assistant that recognizes impaired speech

Google launched a beta app today that people with speech impairments can use as a voice assistant while contributing to a multiyear research effort to improve Google’s speech recognition. The goal is to make Google Assistant, as well as other features that use speech to text and speech to speech, more inclusive of users with neurological conditions that affect their speech.

The new app is called Project Relate, and volunteers can sign up at g.co/ProjectRelate. To be eligible to participate, volunteers need to be 18 or older and “have difficulty being understood by others.” They’ll also need a Google account and an Android phone running Android 8.0 or later. For now, the app is only available to English speakers in the US, Canada, Australia, and New Zealand. Participants will be tasked with recording 500 phrases, which should take between 30 and 90 minutes.

After sharing their voice samples, volunteers will get access to three new features in the Relate app. It can transcribe their speech in real time. It also has a feature called “Repeat” that will restate what the user said in “a clear, synthesized voice.” That can help people with speech impairments in conversation or when using voice commands for home assistant devices. The app also connects to Google Assistant so users can turn on the lights or play a song with their voices.
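
Google hasn’t published how Relate works under the hood, but the “Repeat” feature boils down to a transcribe-then-resynthesize loop. A minimal sketch of that idea, using the open-source speech_recognition and pyttsx3 packages as stand-ins for Google’s models:

```python
import speech_recognition as sr  # open-source wrapper around several ASR APIs
import pyttsx3                   # offline text-to-speech engine

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
    audio = recognizer.listen(source)            # capture one utterance

text = recognizer.recognize_google(audio)        # speech-to-text (cloud call)
print(f"Heard: {text}")

engine = pyttsx3.init()
engine.say(text)                       # restate in a clear synthesized voice
engine.runAndWait()
```

The hard part, and the reason Relate needs each volunteer’s 500 recorded phrases, is making the speech-to-text step work for impaired speech that general-purpose models mishear.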

Without enough training data, other Google apps like Translate and Assistant haven’t been very accessible for people with conditions like ALS, traumatic brain injury (TBI), or Parkinson’s disease. In 2019, Google started Project Euphonia, a broad effort to improve its AI algorithms by collecting data from people with impaired speech. Google is also training its algorithms to recognize sounds and gestures so that it can better help people who cannot speak. That work is still ongoing; Google and its partners still appear to be collecting patients’ voices separately for Project Euphonia.

“I’m used to the look on people’s faces when they can’t understand what I’ve said,” Aubrie Lee, a brand manager at Google whose speech is affected by muscular dystrophy, said in a blog post today. “Project Relate can make the difference between a look of confusion and a friendly laugh of recognition.”


AI-powered voice transcription startup Verbit secures $250M

Verbit, a startup developing an AI-powered transcription platform, today announced that it secured $250 million, bringing its total capital raised to $550 million. The round — a series E, made up of a $150 million primary investment and $100 million in secondary transactions — was led by Third Point Ventures with participation from Sapphire Ventures, More Capital, Disruptive AI, Vertex Growth, 40North, Samsung Next, and TCP.

With the fresh capital, Verbit, which is now valued at $2 billion, plans to expand its workforce while supporting product research and development as well as customer acquisition efforts. Beyond this, CEO Tom Livne said that Verbit will pursue further mergers and acquisitions and “provide enhanced value” to its media, education, corporate, legal, and government clients.

During the pandemic, enterprises ramped up their adoption of voice technologies, including transcription, as remote videoconferencing became the norm. In a survey from Speechmatics, a little over two-thirds of companies said that they now have a voice technology strategy. While they cited accuracy and privacy as concerns, 60% without a strategy said that they’d consider one within five years — potentially driving the speech and voice recognition market to $22 billion in value by 2022.

Livne cofounded New York-based Verbit with Eric Shellef and Kobi Ben Tzvi in 2017. Shellef previously led speech recognition at Intel’s wearables group, while Ben Tzvi cofounded and served as CTO at facial recognition startup Foresight Solutions. As for Livne, who’s also a member of Verbit’s board, he was an early investor in counter-drone platform Convexum, which was acquired by NSO Group in 2020 for $60 million.

AI-powered transcription

Verbit’s voice transcription and captioning services aren’t novel — well-established players like Nuance, Cisco, Otter, Voicera, Microsoft, Amazon, and Google have offered rival products for years, including enterprise-focused platforms like Microsoft 365. But Verbit claims its adaptive speech recognition tech generates transcriptions with higher accuracy than its rivals’.

Verbit users upload audio or video to a dashboard for AI-powered processing. Then, a team edits and reviews the material — taking into account customer-supplied notes and guidelines.
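
As a rough illustration of that two-stage flow (machine draft first, human review second), here is a minimal human-in-the-loop sketch. The class and function names are hypothetical, and run_asr stands in for a real speech model; Verbit hasn’t published its internal API.

```python
from dataclasses import dataclass

def run_asr(audio_path: str) -> str:
    # Placeholder for a real speech model.
    return f"[machine draft of {audio_path}]"

@dataclass
class TranscriptionJob:
    audio_path: str
    guidelines: str = ""    # customer-supplied notes and style rules
    draft: str = ""         # machine-generated transcript
    final: str = ""         # human-reviewed transcript
    status: str = "queued"

def machine_transcribe(job: TranscriptionJob) -> None:
    job.draft = run_asr(job.audio_path)
    job.status = "awaiting_review"

def human_review(job: TranscriptionJob, editor_output: str) -> None:
    # An editor corrects the draft against the customer's guidelines.
    job.final = editor_output
    job.status = "complete"

job = TranscriptionJob("lecture_intro.wav",
                       guidelines="Spell out acronyms on first use.")
machine_transcribe(job)
human_review(job, editor_output="Welcome to Introduction to Economics ...")
print(job.status, "->", job.final)
```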

Finished transcriptions from Verbit are available for export to services like Blackboard, Vimeo, YouTube, Canvas, and Brightcove. A web frontend shows the progress of jobs and lets users edit and share files, define access permissions for each, add inline comments, request reviews, and view usage reports.

“Verbit’s in-house AI technology detects domain-specific terms, filters out background noise and echoes, and transcribes speakers regardless of accent to generate … transcripts and captions from both live and recorded video and audio. Acoustic, linguistic, and contextual data is … checked by our transcribers, who [incorporate] customer-supplied notes, guidelines, specific industry terms, and requirements,” Livne told VentureBeat via email. “By indexing video content for web searches, Verbit [can help] companies improve SEO and increase their site traffic. [In addition, the platform can] provide audio visual translation to help global businesses with translations and to reach international audiences with their products and offerings.”

The transcriber experience

Like its competition, Verbit relies on an army of crowdworkers to transcribe files. The company’s roughly 35,000 freelancers and 600 professional captioners are paid in one of two ways: per audio minute or per word. While Verbit doesn’t post rates on its website, a source pegs transcription pay at $0.30 per audio minute. Two years ago, transcription service Rev faced a massive backlash when it slashed minimum rates for its transcribers from $0.45 to $0.30 per audio minute.

In some cases, pay can dip below $0.30 on Verbit, according to employee reviews on Indeed. The company reportedly started paying as little as $0.24 per audio minute last year for a standard job.

Transcription platforms also don’t always have the technology in place to prevent crowdworkers from seeing disturbing content. In a piece by The Verge, crowdworkers on Rev said that they were exposed to graphic or troubling material on multiple occasions with no warning, including violent police recordings, descriptions of child abuse, and graphic medical videos.

A spokesperson told VentureBeat via email: “Currently, we employ a mix of full-time transcribers and captioners, as well as freelancers that are paid per audio minute. We’ve established a ranking system based on efficiency and accuracy to incentivize and reward freelancers with higher compensation in exchange for consistently delivering high-quality transcripts … The company’s transcribers have a support system — chat and forum — that constantly relays feedback to Verbit management, and it has a bonus program to ensure proper compensation for its top performers.”

The spokesperson continued: “In addition to competitive pay and opportunities for advancement, our staff of full-time transcribers and captioners are eligible to receive healthcare benefits … Our transcriber community follows a ranking system based on tenure and number of hours worked, allowing freelancers to earn promotions to roles such as editor, reviewer, and supervisor.”

On the subject of graphic content, the spokesperson said: “Verbit does not take on any business associated with violent or graphic content. For example, an adult entertainment company recently requested our services, but we chose not to accept them as a customer.”

Growth year

Verbit’s platform has wooed a healthy base of over 2,000 customers, bolstered by its acquisition of captioning provider VITAC earlier this year. In recent months, Verbit has pursued contracts with educational institutions like Harvard and Stanford, which have stricter accommodation standards than organizations in other sectors.

Auto-captioning technologies on YouTube, Microsoft Teams, Google Meet, and similar platforms aren’t held to the accommodation standards outlined in the Americans with Disabilities Act; to meet federal guidelines, captioning must satisfy certain accuracy criteria. A recent survey conducted by Verbit found that only 14% of schools provided captions by default, while about 10% said that they only caption lessons when a student requests it.

Verbit also says that it’ll continue to explore verticals in the insurance, financial, media, and medical industries. The company — which currently has 470 employees, a number that it expects will grow to 750 by 2023 — recently launched a human-in-the-loop transcription service for media outlets and inked an agreement with the nonprofit Speech to Text Institute to invest in court reporting and legal transcription.

“With six times year-over-year revenue growth and close to $100 million in annual recurring revenue, Verbit continues to expand into new verticals at a hyper-growth pace. The shift to remote work and accelerated digitization amid the pandemic has been a major catalyst … and has further driven Verbit’s rapid growth,” Livne added. “In today’s digital era where audio and video content is a given, and many times the main method of conveying information, these AI tools are crucial to ensure that individuals and organizations of all sizes and forms can engage with their audiences and stakeholders more efficiently and effectively.”

Livne previously said that Verbit plans to file for an initial public offering in 2022.


New Anthony Bourdain documentary deepfakes his voice

In a new documentary, Roadrunner, about the life and tragic death of Anthony Bourdain, there are a few lines of dialogue in Bourdain’s voice that he might not have ever said out loud.

Filmmaker Morgan Neville used AI technology to digitally re-create Anthony Bourdain’s voice and have the software synthesize the audio of three quotes from the late chef and television host, Neville told the New Yorker.

The deepfaked voice was discovered when the New Yorker’s Helen Rosner asked how the filmmaker got a clip of Bourdain’s voice reading an email he had sent to a friend. Neville said he had contacted an AI company and supplied it with a dozen hours of Bourdain speaking.

“ … and my life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?” Bourdain wrote in an email, and an AI algorithm later narrated an approximation of his voice.

You can hear the line in the documentary’s trailer linked below, right around the 1:30 mark. The algorithm’s generation of Bourdain’s voice is especially audible when it says, “and I am successful.”

Neville told Rosner that there were three lines of dialogue he wanted to hear in Bourdain’s voice, but he couldn’t find existing audio to string together or otherwise make work.

There’s no shortage of companies that can achieve this kind of AI voice replication, and there’s actually a burgeoning industry of companies that can specifically generate voices for video game characters or allow you to clone your own voice.

But whether it’s ethical to clone a dead person’s voice and have them say things they hadn’t gotten on tape when they were alive is another question, and one Neville doesn’t seem too concerned with.

“We can have a documentary-ethics panel about it later,” he told the New Yorker.


‘Battlefield 2042’ won’t have voice chat when it debuts on November 19th

When Battlefield 2042 comes out in about a week, on November 19th, it won’t ship with built-in voice chat. Series developer DICE said the feature won’t be available until sometime after launch. The studio didn’t provide a reason for the decision. Whatever the case, the absence of voice chat will likely be keenly felt by Battlefield fans, especially on PC, Xbox Series X and S, and PlayStation 5, where matches will include as many as 128 players.

For what it’s worth, you can still use Discord or the party chat feature on your console to communicate with friends, but that won’t help you when you’re trying to play with strangers. In those instances, you’ll need to rely on the ping system, which is apparently on the cumbersome side. According to Polygon, you have to navigate through multiple menus before you can get to the right ping. All told, it sounds like Battlefield’s already chaotic matches will be a tad more unpredictable in the first few weeks that 2042 is available.


Microsoft updates Dynamics 365 Customer Service with first-party voice channel

At its Ignite developer event today, Microsoft announced the addition of a first-party voice channel to Dynamics 365 Customer Service, its end-to-end cloud product for customer support. According to the company, the new capabilities enable organizations to provide more consistent and personalized service to customers across channels with data-driven, AI-infused solutions.

“Service leaders know that 80% of consumers are more likely to purchase from companies that provide more personalized experiences. But for many contact centers, ensuring a continuous, personalized experience across all channels is difficult to achieve. Multiple tools and disconnected data silos prevent agents from having a complete view of the customer journey. But no more. No matter how your customers connect with you, now you can deliver a consistent, intelligent, and personalized service experience,” Dynamics 365 customer service and field service VP Jeff Comstock said in a statement.

AI-powered features

Prior to today’s upgrade, Dynamics 365 Customer Service provided case routing and management for customer service agents, add-ons for insights and omnichannel engagement, and authoring tools for knowledge base articles. With the addition of the voice channel, Power Virtual Agent chatbots can now be used for interactive voice response (IVR) or to respond to SMS, chat, and social messaging channels. Dynamics 365 Customer Service offers AI-based routing of incoming calls to voice agents, consistent with other support channels. And Microsoft Teams integration allows agents to collaborate with each other and with subject-matter experts on particular customer topics.
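
Microsoft hasn’t detailed the routing logic, but “consistent with other support channels” suggests one intent classifier feeding queues across voice, SMS, and chat. A toy sketch of that pattern, with made-up queue names and keyword matching standing in for a real NLU model:

```python
# Made-up queue names; keyword matching stands in for a real NLU model.
ROUTES = {"billing": "billing_queue",
          "outage": "priority_queue",
          "other": "general_queue"}

def classify_intent(utterance: str) -> str:
    text = utterance.lower()
    if "bill" in text or "charge" in text:
        return "billing"
    if "down" in text or "outage" in text:
        return "outage"
    return "other"

def route(utterance: str, channel: str) -> str:
    # The same classifier serves a voice transcript, an SMS, or a chat message.
    queue = ROUTES[classify_intent(utterance)]
    print(f"[{channel}] -> {queue}")
    return queue

route("My internet has been down all day", channel="voice")  # priority_queue
route("Why was I charged twice?", channel="sms")             # billing_queue
```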

“AI is infused throughout our first-party voice channel to enrich the customer and agent experience by automating routine tasks and offering insights and recommendations to increase the agent’s focus on the customer,” Comstock continued. “Dynamics 365 Customer Service breaks down traditional data silos between channels with a single, secure data platform, elegantly connecting customer conversations across all channels.”

The updated Dynamics 365 Customer Service offers real-time transcription and live sentiment analysis in addition to AI-driven recommendations for similar cases and knowledge articles. Transcripts can be translated in real time for agents assisting customers in different regions and across multiple languages, while AI analyzes conversations, identifying emerging issues and generating KPIs and insights that span live chat, social messaging, and voice.
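
Microsoft hasn’t disclosed which models power these features, but the live sentiment piece can be approximated with off-the-shelf tools. A sketch using the open-source transformers library, scoring utterances as they would arrive from a call transcript:

```python
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # downloads a default English model

# Imagine these arriving one at a time from a live call transcript:
utterances = [
    "I've been waiting forty minutes and nobody can help me.",
    "Oh, that actually fixed it. Thank you so much!",
]
for text in utterances:
    result = sentiment(text)[0]  # e.g. {'label': 'NEGATIVE', 'score': 0.998}
    print(f"{result['label']:<8} {result['score']:.2f}  {text}")
```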

“With the new voice channel, we are delivering an all-in-one digital contact center solution that brings together contact center channels, unified communications, leading AI, and customer service capabilities into a single, software-as-a-service solution, built on the Microsoft Cloud,” Comstock said. “And, when it comes to [businesses, they] have a choice. We continue to support integrations with key partners such as Five9, Genesys, NICE, Solgari, Tenfold, Vonage, and others who are building connectors to enable their voice solutions within Dynamics 365 Customer Service.”

The enhancements come roughly a year after Microsoft launched Azure Communication Services, a service that leverages the same network powering Teams to let developers add multimodal messaging to apps and websites while tapping into services like Azure Cognitive Services for translation, sentiment analysis, and more. The pandemic has accelerated the demand for distributed contact center setups, particularly those powered by AI — according to a 2020 report by Grand View Research, the contact center software market is anticipated to grow to $72.3 billion by 2027.

Amazon recently launched an AI-powered contact center product — Contact Lens — in general availability alongside several third-party solutions. And Google continues to expand Contact Center AI, which automatically responds to customer queries and hands them off to a person when necessary.


Veritone launches new platform to let celebrities and influencers clone their voice with AI

Recording advertisements and product endorsements can be lucrative work for celebrities and influencers. But is it too much like hard work? That’s what US firm Veritone is betting. Today, the company is launching a new platform called Marvel.AI that will let creators, media figures, and others generate deepfake clones of their voice to license as they wish.

“People want to do these deals but they don’t have enough time to go into a studio and produce the content,” Veritone president Ryan Steelberg tells The Verge. “Digital influencers, athletes, celebrities, and actors: this is a huge asset that’s part of their brand.”

With Marvel.AI, he says, anyone can create a realistic copy of their voice and deploy it as they see fit. While celebrity Y is sleeping, their voice might be out and about, recording radio spots, reading audiobooks, and much more. Steelberg says the platform will even be able to resurrect the voices of the dead, using archive recordings to train AI models.

“Whoever has the copyright to those voices, we will work with them to bring them to the marketplace,” he says. “That will be up to the rightsholder and what they feel is appropriate, but hypothetically, yes, you could have Walter Cronkite reading the nightly news again.”

Speech synthesis has improved rapidly in recent years, with machine learning techniques enabling the creation of ever-more realistic voices. (Just think about the difference between how Apple’s Siri sounded when it launched in 2011 and how it sounds now.) Many big tech firms like Amazon offer off-the-shelf text-to-speech models that generate voices at scale that are robotic but not unpleasant. But new companies are also making boutique voice clones that sound like specific individuals, and the results can be near-indistinguishable from the real thing. Just listen to this voice clone of podcaster Joe Rogan, for example.
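
For a concrete sense of the off-the-shelf tier, here is how Amazon’s stock text-to-speech looks in practice via boto3 and Polly. “Joanna” is one of Polly’s standard voices, and AWS credentials are assumed to be configured in the environment:

```python
import boto3

polly = boto3.client("polly")
response = polly.synthesize_speech(
    Text="This stock voice is robotic, but not unpleasant.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())  # raw MP3 bytes from the API
```

Boutique voice cloning of the kind Veritone is selling works differently: instead of picking a stock VoiceId, a custom model is trained on hours of one person’s recordings.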

It’s this leap forward in quality that motivated Veritone to create Marvel.AI, says Steelberg, as well as the potential for synthetic speech to dovetail with the firm’s existing businesses.

Although Veritone markets itself as an “AI company,” a big part of its revenue apparently comes from old-school advertising and content licensing. As Steelberg explains, its advertising subsidiary Veritone One is heavily invested in the podcast space, and every month places more than 75,000 “ad integrations” with influencers. “It’s mostly native integrations, like product placements,” he says. “It’s getting the talent to voice sponsorships and commercials. That’s extremely effective but very expensive and time consuming.”

Another part of the firm, Veritone Licensing, licenses video from a number of major archives. These include archives owned by broadcasters like CBS and CNN and sports organizations like the NCAA and US Open. “When you see the Apollo moon landing footage show up in movies, or Tiger Woods content in a Nike commercial, all that’s coming through Veritone,” says Steelberg. It’s this experience with licensing and advertising that will give Veritone the edge over AI startups focusing purely on technology, he says.

To customers, Marvel.AI will offer two streams. One will be a self-service model, where anyone can pick from a catalog of pre-generated voices and create speech on demand. (This is how Amazon, Microsoft, et al. have been doing it for years.) But the other stream — “the focus,” says Steelberg — will be a “managed, white-glove approach,” where customers submit training data, and Veritone will create a voice clone just for them. The resulting models will be stored on Veritone’s systems and available to generate audio as and when the client wants. Marvel.AI will also function as a marketplace, allowing potential buyers to submit requests to use these voices. (How all this will be priced isn’t yet clear.)

Steelberg makes a convincing case that the demand for these voices exists and that Veritone’s business model is ready to go. But one major factor will decide whether Marvel.AI succeeds: the quality of the AI voices the platform can generate. And this is much less certain.

When asked for examples of the company’s work, Veritone shared three short clips with The Verge, each a single line endorsement for a brand of mints. The first line is read by Steelberg himself, the second by his AI clone, and the third by a gender-swapped voice. You can listen to all three below:

The AI clone is, to my ear at least, a pretty good imitation, though not a perfect copy. It’s flatter and more clipped than the real thing. But it’s also not a full demonstration of what voices can do during an endorsement. Steelberg’s delivery lacks the enthusiasm and verve you’d expect of a real ad (we’re not faulting him for this — he’s an executive, not a voice actor), and so it’s not clear if Veritone’s voice models can capture a full range of emotion.

It’s also not a great sign that the voiceover for the platform’s sizzle reel (embedded at the top of the story) was done by Steelberg himself rather than an AI copy. Either the company didn’t think a voice clone was good enough for the job, or it ran out of time to generate one — either way, it’s not a great endorsement of the product.

The technology is moving quickly, though, and Steelberg is keen to stress that Veritone has the resources and expertise to adopt whatever new machine learning models emerge in the years to come. Where it’s going to differentiate itself, he says, is in managing the experience of customers and clients as they actually deploy synthetic speech at scale.

One problem Steelberg raises is how synthetic speech might dilute the power of endorsements. After all, the attraction of product endorsement hinges on the belief (however delusional) that this or that celebrity really does enjoy this particular brand of cereal / toothpaste / life insurance. If the celeb can’t be bothered to voice the endorsement themselves, doesn’t it take away from the ad’s selling power?

Steelberg’s solution is to create an industry standard for disclosure — some sort of audible tone that plays before synthetic speech to a) let listeners know it’s not a real voice, and b) reassure them that the voice’s owner endorses this use. “It’s not just about avoiding the negative connotations of tricking the consumer, but also wanting them to be confident that [this or that celebrity] really approved this synthetic content,” he says.

It’s questions like these that are going to be increasingly important as synthetic content becomes more common, and it’s clear Veritone has been thinking hard about these issues. Now the company just needs to convince the influencers, athletes, actors, podcasters, and celebrities of the world to lend it their voices.


Facebook Messenger is adding end-to-end encryption for voice and video calls

Facebook is adding end-to-end encryption for voice and video calls in Messenger. The company announced in a blog post that it’s rolling out the change today alongside new controls for its disappearing messages. Some users may also see new test features related to encryption.

Facebook Messenger got end-to-end encryption for text messages in 2016, when Facebook added a “secret conversation” option to its app. Now, that mode also supports calling. Facebook says it’s adding the feature as interest in voice and video calls grows; Messenger now sees more than 150 million video calls a day, according to the company.

Encrypted video calling on Messenger. (Image: Facebook)

Facebook-owned chat app WhatsApp already offers calling with end-to-end encryption, or E2EE, which prevents anyone but the sender and receiver from seeing the encrypted data. So do some other video calling apps, like Zoom, Signal, and Apple’s FaceTime. Facebook characterizes E2EE as “becoming the industry standard” across messaging services. Earlier rumors have suggested that Facebook might roll out a unified, end-to-end encrypted messaging system across WhatsApp, Messenger, and Instagram — but so far, that hasn’t happened.
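
As a toy illustration of what E2EE guarantees (only the endpoints hold the keys, so a relaying server sees nothing but ciphertext), here is a concept demo using PyNaCl’s public-key Box. This is not Messenger’s actual protocol, just the idea:

```python
from nacl.public import Box, PrivateKey

alice_key = PrivateKey.generate()
bob_key = PrivateKey.generate()

# Each endpoint combines its own private key with the other's public key.
alice_box = Box(alice_key, bob_key.public_key)
bob_box = Box(bob_key, alice_key.public_key)

frame = b"one frame of call audio"
ciphertext = alice_box.encrypt(frame)    # all a relaying server would see
plaintext = bob_box.decrypt(ciphertext)  # only Bob's key can recover this
assert plaintext == frame
```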

Text conversations are getting a smaller update. If you’re setting a message to disappear, you’ll see more options for picking when it expires, ranging from five seconds to 24 hours. (The feature originally offered one-minute, 15-minute, one-hour, four-hour, and 24-hour increments.)

While everyone will see the updates above, Facebook is running a limited beta test of other features. Some users will see an option for end-to-end encrypted group chats and calls between “friends and family that already have an existing chat thread or are already connected.” Others will get support for Facebook’s existing non-E2EE controls over who can reach them on Messenger. And finally, if you use Instagram, a “limited test” will offer opt-in E2EE for that app’s direct messages as well.


Juniper projects voice assistants will play a bigger role in shopping

Customers are increasingly making purchases through voice assistants — to the tune of a projected $19.4 billion by 2023. A new study from Juniper Research finds that these ecommerce transactions will grow from $4.6 billion this year as voice assistant devices with screens improve the efficiency of the checkout process.

“[G]rowing the size and accessibility of the content domain libraries will be critical to increasing the number of transactions processed by voice assistant services. In turn, this will increase the value proposition of voice commerce to third-party retailers and generate new revenue streams for voice assistant platforms,” the report reads.

Juniper expects that the global install base of smart speakers will rise by over 50% between 2021 and 2023. Similarly, Statista expects global smart speaker revenue will see an uptick, reaching $35.5 billion by 2025. While smartphone-based assistants will remain dominant in terms of usage, the rising number of standalone smart speakers means the potential for commerce will grow — supporting the adoption of new monetization strategies.

In 2020, 23 million consumers used voice assistants to make purchases, according to a survey by the publication PYMNTS and Visa. That was a 45% increase from 2018 and an 8% gain since 2019.

The pandemic and desire for contactless shopping options have fueled a rise in ecommerce and accelerated the shift toward omnichannel retail experiences. According to Opus Research, retailers are increasingly installing voice-enabled kiosks and contact centers. And 73% of retail respondents to the survey considered search via voice to be a top benefit of voice assistants.

According to a study by Gartner, brands that redesign their websites to support voice search stand to increase their digital commerce revenue by 30%. Forty-one percent of consumers would prefer a voice assistant over a website or app while shopping online, Capgemini reports. And the Opus study found that online shoppers who use voice spend $136 more on average than those who shop solely online.

Barriers to adoption

Some experts disagree with the findings in the Juniper report, disputing the notion that voice will become a major ecommerce revenue stream. A recent eMarketer study, for example, found that more than half of U.S. adults have never shopped for goods via voice and have no interest in trying voice shopping.

Even the Juniper report encourages leaders in the voice assistant space — particularly Amazon, Apple, and Google — to open up their ecommerce services to third-party retailers, in addition to leveraging their own ecosystems. A key hurdle to attracting third-party retailers is the absence of a screen in many smart speakers, according to Juniper, which limits the contextual information that can be presented to users.

The report also recommends implementing omnichannel retail strategies, where users’ interactions are managed across multiple channels, to enable retailers to display detailed info on a product. “Users will generally use voice assistants to initially explore a product, before completing the purchase via a device with a screen,” Meike Escherich, one of the report’s coauthors, said in a statement. “Voice assistant platforms must ensure that the user experience is so seamless that transactions are carried out via these platforms, rather than requiring additional devices.”


How voice biometrics can protect your customers from fraud

Voice identity verification is catching on, especially in finance. Talking is convenient, particularly for users already familiar with voice technologies like Siri and Alexa. Voice identification offers a level of security that PIN codes and passwords can’t, according to experts from two leading companies innovating in the voice biometrics space.

In a conversation at VentureBeat’s Transform 2021 virtual conference, Daniel Thornhill, senior VP at cybersecurity solutions company Validsoft, and Paul Magee, president of voice biometrics company Auraya, discussed the emerging field with Richard Dumas, Five9 VP of marketing.

Passive vs. active voice biometrics

Just like a fingerprint, an iris, or a face, voice biometrics are unique to an individual. To create a voiceprint, a speaker provides a sample of their voice.

“When you want to verify your identity, you use another sample of your voice to compare it to that initial sample,” Magee explained. “It’s as simple as that.”

What sets voice apart from other biometrics, Magee said, is that what a speaker says can be different every time they’re prompted. “Nobody can steal my voice because you can’t steal what I’m going to say next.”

When users are prompted to say their phone or account numbers or digits displayed on the screen, that’s active biometrics.

“Passive is more in the background,” Magee said. “So while I’m talking with the call center agent, my voice is being sampled and the agent is being provided with a confirmation that it really is me.”
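
Concretely, modern systems typically reduce an utterance to a fixed-length speaker embedding and compare embeddings with a similarity score. A sketch of that comparison step, with made-up vectors and an illustrative threshold (neither Validsoft nor Auraya has published theirs):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled: np.ndarray, sample: np.ndarray,
           threshold: float = 0.75) -> bool:
    # Accept the speaker if the new sample is close enough to the enrollment.
    return cosine_similarity(enrolled, sample) >= threshold

rng = np.random.default_rng(0)
enrolled_print = rng.normal(size=192)  # voiceprint captured at enrollment
same_speaker = enrolled_print + rng.normal(scale=0.2, size=192)
impostor = rng.normal(size=192)

print(verify(enrolled_print, same_speaker))  # True: close to the voiceprint
print(verify(enrolled_print, impostor))      # False: different speaker
```

Active and passive modes differ only in where the sample comes from: a prompted phrase in the first case, ambient conversation in the second.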

Voice biometrics security

An organization responsible for voice biometrics can store voiceprints with a trusted service provider, Magee said. “The last thing that we advocate is for the voiceprints to be flying around into some unknown place with limited security,” he added. “We think they should be locked up securely behind the clients’ firewall, like [companies] protect the rest of their clients’ information.”

Cheating the voice identification system

Thornhill described how the system can be cheated: someone can record a user and replay that audio, or use a computer to generate synthetic versions of people’s voices, also known as deepfakes.

But there are ways to prevent such fraud. “You can apply some kind of [live element], so maybe a random element of the phrase, or use passive voice biometrics so the user is continuously speaking,” Thornhill explained.

There’s also technology that looks at anomalies in speech. “Does this look like it’s being recorded and replayed? Does it look like it’s been synthetically produced or modified by a machine?” Thornhill said. “So there are ways that fraudsters can potentially try to subvert the system, but we do have measures in place that detect those and prevent them.”
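
The random-element defense Thornhill mentions is essentially a challenge-response scheme: prompt for content that cannot have been pre-recorded, then check both the words and the voice. A minimal sketch, with purely illustrative names and logic:

```python
import random

def make_challenge(n_digits: int = 6) -> str:
    return " ".join(str(random.randint(0, 9)) for _ in range(n_digits))

def check_response(challenge: str, transcript: str, voice_match: bool) -> bool:
    said_right_thing = transcript.strip() == challenge  # defeats replayed audio
    return said_right_thing and voice_match             # and it's the right voice

challenge = make_challenge()
print("Please say:", challenge)

# A live caller repeats the fresh digits and passes; a replayed recording of an
# earlier session almost certainly contains the wrong digits and fails.
print(check_response(challenge, transcript=challenge, voice_match=True))
print(check_response(challenge, transcript="1 2 3 4 5 6", voice_match=True))
```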

Industry-wide voice identification adoption

The greatest barrier to a successful biometric deployment is getting people to enroll their voice, Magee said. That’s why companies should avoid a one-size-fits-all approach.

If a customer often contacts a call center for their needs, that’s the best way to enroll them, Magee said. If they usually use an app, present them with the invitation there. A great time to enroll a voiceprint is while customers are entering their account details during onboarding.

Thornhill agreed. “It’s about understanding your client’s needs, their interactions with their customers, to help them get those enrollments up and help them achieve return on investment,” he said. “They’ll benefit from it, whether it’s from fraud reduction or customer experience.”


Speechmatics: Voice Technology Is Becoming a Critical Part of the Enterprise’s Toolkit

Recently, voice technology has surged in adoption among enterprises, with 68% of companies reporting they have a voice technology strategy in place, an 18% increase from 2019. And among the companies that don’t, 60% plan to adopt one in the next five years.

The pandemic forever altered enterprises’ tech stacks. Many companies already had countless pieces of software in place – from web conferencing to collaboration tools – that made the transition from face-to-face to remote a bit more seamless, but the pandemic spurred other technologies to grow rapidly in both importance and popularity throughout 2020. Voice technology, specifically, saw a marked increase in adoption among enterprises, with 68% of respondents reporting their company has a voice technology strategy – up 18% from 2019.

The pandemic has shown that the organizations that already integrated voice technology into their tech stacks had the ability to scale, pivot, adapt, and operate with the robustness to deal with unexpected changes. But barriers to adoption are still rampant, as outlined in Speechmatics’ annual Trends and Predictions for Voice Technology Report.

The biggest challenges with voice technology currently are accuracy (73%) and accent- or dialect-related issues (51%). Traditionally in speech recognition, the engine is trained to recognize one dialect of a language at a time, making that dialect the one it most accurately recognizes, comprehends, and, for speech-to-text, transcribes. In English, that dialect is usually American English, and error rates have typically been higher for Australian, British, Jamaican, and other accents. For companies leveraging the technology to interact with a global customer base, this presents a massive challenge. For speech technology to reach its full potential, it needs to understand everyone it interacts with.
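
In practice, the dialect problem often surfaces as a locale parameter: cloud ASR APIs are typically pointed at a single variant of a language per request. A sketch using the open-source speech_recognition package (“meeting.wav” is a placeholder file); the same audio decoded against different English locales can yield noticeably different error rates depending on the speaker’s accent:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("meeting.wav") as source:  # placeholder recording
    audio = recognizer.record(source)

# Decode the same audio against several English locales and compare outputs.
for locale in ("en-US", "en-GB", "en-AU"):
    try:
        print(locale, "->", recognizer.recognize_google(audio, language=locale))
    except sr.UnknownValueError:
        print(locale, "-> could not be recognized")
```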

But providers are making massive strides in this area to ensure that voice technology is applicable and useful for all end users, and organizations are starting to understand its future potential. Of the respondents who have not yet put a voice technology strategy in place, 60% reported it’s something they’ll consider in the next five years. With the underlying technology evolving, and new innovations making it increasingly reliable and accurate, voice is certain to become a crucial part of any enterprise’s tech stack.

The pandemic might have accelerated existing digital transformation, but it also showed organizations the acute importance of the new, must-have tools for the tech stack. There’s no doubt that in 2021 and beyond, voice technology will be a crucial one.

Methodology

Speechmatics collated data points from Owners/Executives/C-Level, Senior Management, Middle Management, Intermediate and Entry Level professionals from a range of industries and use cases in the UK, Europe, United States, Asia and Australia.

Read the full report from Speechmatics.
