Artificial intelligence might eventually write this article

I hope my headline is an overstatement, purely for the sake of my own job, but in this week’s Vergecast artificial intelligence episode, we explore the world of large language models and how they might be used to generate text in the future. Maybe the tech will give writers ideas for the next major franchise series, or write full blog posts, or, at the very least, fill up websites with copy that’s too arduous for humans to write.

Among the people we speak to is Nick Walton, the co-founder and CEO of Latitude, which makes AI Dungeon, a game that builds its plot around whatever you type into it. (That’s how Walton ended up in a band of traveling goblins — you’ll just have to listen to understand how that makes sense!) We also chat with Samanyou Garg, founder of Writesonic, a company that offers various AI-powered writing tools. The company can even have AI write a blog post — I’m shaking! But really.

Anyway, toward the end of the episode, I chat with James Vincent, The Verge’s AI and machine learning senior reporter, who calms me down and helps me understand what the future of text-generation AI might be. He’s great. Check out the episode above, and make sure you subscribe to the Vergecast feed for one more episode of this AI miniseries, as well as the regular show. See you there!

Meta announces plans to build an AI-powered ‘universal speech translator’

Meta, the owner of Facebook, Instagram, and WhatsApp, has announced an ambitious new AI research project to create translation software that works for “everyone in the world.” The project was announced as part of an event focusing on the broad range of benefits Meta believes AI can offer the company’s metaverse plans.

“The ability to communicate with anyone in any language — that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes,” said Meta CEO Mark Zuckerberg in an online presentation.

The company says that although commonly spoken languages like English, Mandarin, and Spanish are well catered to by current translation tools, roughly 20 percent of the world’s population does not speak a language covered by these systems. Often, these underserved languages lack the easily accessible corpora of written text needed to train AI systems, or they have no standardized writing system at all.

Meta says it wants to overcome these challenges by deploying new machine learning techniques in two specific areas. The first focus, dubbed No Language Left Behind, will concentrate on building AI models that can learn to translate language using fewer training examples. The second, Universal Speech Translator, will aim to build systems that directly translate speech in real-time from one language to another without the need for a written component to serve as an intermediary (a common technique for many translation apps).

In a blog post announcing the news, Meta researchers did not offer a timeframe for completing these projects or even a roadmap for major milestones in reaching their goal. Instead, the company stressed the utopian possibilities of universal language translation.

“Eliminating language barriers would be profound, making it possible for billions of people to access information online in their native or preferred language,” they write. “Advances in [machine translation] won’t just help those people who don’t speak one of the languages that dominates the internet today; they’ll also fundamentally change the way people in the world connect and share ideas.”

Crucially, Meta also envisions that such technology would hugely benefit its globe-spanning products — furthering their reach and turning them into essential communication tools for millions. The blog post notes that universal translation software would be a killer app for future wearable devices like AR glasses (which Meta is building) and would also break down boundaries in “immersive” VR and AR reality spaces (which Meta is also building). In other words, though developing universal translation tools may have humanitarian benefits, it also makes good business sense for a company like Meta.

It’s certainly true that advances in machine learning in recent years have hugely improved the speed and accuracy of machine translation. A number of big tech companies, from Google to Apple, now offer free AI translation tools that are used for work and tourism and undoubtedly provide incalculable benefits around the world. But the underlying technology has its problems, too, with critics noting that machine translation misses nuances critical for human speakers, injects gendered bias into its outputs, and is capable of throwing up those weird, unexpected errors only a computer can make. Some speakers of uncommon languages also say they fear losing hold of their speech and culture if the ability to translate their words is controlled solely by big tech.

Considering such errors is critical when massive platforms like Facebook and Instagram apply such translations automatically. Consider, for example, a case from 2017 when a Palestinian man was arrested by Israeli police after Facebook’s machine translation software mistranslated a post he shared. The man wrote “good morning” in Arabic, but Facebook translated this as “hurt them” in English and “attack them” in Hebrew.

And while Meta has long aspired to global access, the company’s own products remain biased towards countries that provide the bulk of its revenue. Internal documents published as part of the Facebook Papers revealed how the company struggles to moderate hate speech and abuse in languages other than English. These blind spots can have incredibly deadly consequences, as when the company failed to tackle misinformation and hate speech in Myanmar prior to the Rohingya genocide. And similar cases involving questionable translations occupy Facebook’s Oversight Board to this day.

So while a universal translator is an incredible aspiration, Meta will need to prove not only that its technology is equal to the task but that, as a company, it can apply its research fairly.

Everyone will be able to clone their voice in the future

Cloning your voice using artificial intelligence is simultaneously tedious and simple: hallmarks of a technology that’s just about mature and ready to go public.

All you need to do is talk into a microphone for 30 minutes or so, reading a script as carefully as you can (in my case: the voiceover from a David Attenborough documentary). After starting and stopping dozens of times to re-record your flubs and mumbles, you’ll send off the resulting audio files to be processed and, in a few hours’ time, be told that a copy of your voice is ready and waiting. Then, you can type anything you want into a chatbox, and your AI clone will say it back to you, with the resulting audio realistic enough to fool even friends and family — at least for a few moments. The fact that such a service even exists may be news to many, and I don’t believe we’ve begun to fully consider the impact easy access to this technology will have.

Speech synthesis has improved massively in recent years, thanks to advances in machine learning. Previously, the most realistic synthetic voices were created by recording audio of a human voice actor, cutting their speech into component sounds, and splicing these back together like letters in a ransom note to form new words. Now, neural networks can be trained on unsorted audio of a target voice to generate raw speech from scratch. The process is faster and easier, and the results are more realistic to boot. The quality definitely isn’t perfect straight out of the machine (though manual tweaking can improve it), but these voices are only going to get better in the near future.

There’s no special sauce to making these clones, which means dozens of startups are already offering similar services. Just Google “AI voice synthesis” or “AI voice deepfakes,” and you’ll see how commonplace the technology is: it’s available from specialist shops that focus solely on speech synthesis, like Resemble.AI and Respeecher, and it’s also built into larger platforms from companies like Veritone (where the tech is part of its advertising repertoire) and Descript (which uses it in the software it makes for editing podcasts).

These voice clones have simply been a novelty in the past, appearing as one-off fakes like this Joe Rogan fake, but they’re beginning to be used in serious projects. In July, a documentary about chef Anthony Bourdain stirred controversy when the creators revealed they’d used AI to create audio of Bourdain “speaking” lines he’d written in a letter. (Notably, few people noticed the deepfake until the creators revealed its existence.) And in August, the startup Sonantic announced it had created an AI voice clone of actor Val Kilmer, whose own voice was damaged in 2014 after he underwent a tracheotomy as part of his treatment for throat cancer. These examples also frame some of the social and ethical dimensions of this technology. The Bourdain use case was decried as exploitative by many (particularly as its use was not disclosed in the film), while the Kilmer work has been generally lauded, with the technology praised for delivering what other solutions could not.

Celebrity applications of voice clones are likely to be the most prominent in the next few years, with companies hoping the famous will want to boost their income with minimal effort by cloning and renting out their voices. One company, Veritone, launched just such a service earlier this year, saying it would let influencers, athletes, and actors license their AI voice for things like endorsements and radio idents, without ever having to go into a studio. “We’re really excited about what that means for a host of different industries because the hardest part about someone’s voice and being able to use it and being able to expand upon that is the individual’s time,” Sean King, executive vice president at Veritone One, told The Vergecast. “A person becomes the limiting factor in what we’re doing.”

Such applications are not yet widespread (or if they are, they’re not widely talked about), but it seems like an obvious way for celebrities to make money. Bruce Willis, for example, has already licensed his image to be used as a visual deepfake in mobile phone ads in Russia. The deal allows him to make money without ever leaving the house, while the advertising company gets an infinitely malleable actor (and, notably, a much younger version of Willis, straight out of his Die Hard days). These sorts of visual and audio clones could create new economies of scale for celebrity work, allowing the famous to capitalize on their fame — as long as they’re happy renting out a simulacrum of themselves.

In the here and now, voice synthesis technology is already being built into tools like the eponymous podcast editing software made by US firm Descript. The company’s “Overdub” feature lets a podcaster create an AI clone of their voice so producers can make quick changes to their audio, supplementing the program’s transcription-based editing. As Descript CEO Andrew Mason told The Vergecast: “You can not only delete words in Descript and have it delete the audio, you can type words and it will generate audio in your voice.”

Podcast editing software Descript uses AI voice clones to edit speech like a transcript.
Image: Descript

When I tried Descript’s Overdub feature myself, it was certainly easy enough to use — though, as mentioned above, recording the training data was a bit of a chore. (It was much easier for my colleague and regular Verge podcast host Ashley Carman, who had lots of pre-recorded audio ready to send the AI.) The voice clones made by Overdub are not flawless, certainly. They have an odd warble to their tone and lack the ability to really charge lines with emotion and emphasis, but they’re also unmistakably you. The first time I used my voice clone was a genuinely uncanny moment. I had no idea that this deeply personal thing — my voice — could be copied by technology so quickly and easily. It felt like a meeting with the future but was also strangely familiar. After all, life is already full of digital mirrors — of avatars and social media feeds that are supposed to embody “you” in various forms — so why not add a speaking automaton to the mix?

The initial shock of hearing a voice clone of yourself doesn’t mean human voices are redundant, though. Far from it. You can certainly improve on the quality of voice deepfakes with a little manual editing, but in their automated form, they still can’t deliver anywhere near the range of inflection and intonation you get from professionals. As voice artist and narrator Andia Winslow told The Vergecast, while AI voices might be useful for rote voice work — for internal messaging systems, automated public announcements, and the like — they can’t compete with humans in many use cases. “For big stuff, things that need breath and life, it’s not going to go that way because, partly, these brands like working with the celebrities they hire, for example,” said Winslow.

But what does this technology mean for the general public? For those of us who aren’t famous enough to benefit from the technology and are not professionally threatened by its development? Well, the potential applications are varied. It’s not hard to imagine a video game where the character creation screen includes an option to create a voice clone, so it sounds like the player is speaking all of the dialogue in the game. Or there might be an app for parents that allows them to copy their voice so that they can read bedtime stories to their children even when they’re not around. Such applications could be done with today’s technology, though the middling quality of quick clones would make them a hard sell.

There are also potential dangers. Fraudsters have already used voice clones to trick companies into moving money into their accounts, and other malicious uses are certainly lurking just beyond the horizon. Imagine, for example, a high school student surreptitiously recording a classmate to create a voice clone of them, then faking audio of that person bad-mouthing a teacher to get them in trouble. If the uses of visual deepfakes are anything to go by, where worries about political misinformation have proven largely misplaced but the technology has done huge damage creating nonconsensual pornography, it’s these sorts of incidents that pose the biggest threats.

One thing’s for sure, though: in the future, anyone will be able to create an AI voice clone of themselves if they want to. But the script this chorus of digital voices will follow has yet to be written.

A Meta prototype lets you build virtual worlds by describing them

Meta is testing an artificial intelligence system that lets people build parts of virtual worlds by describing them, and CEO Mark Zuckerberg showed off a prototype at a live event today. The proof of concept, called Builder Bot, could eventually draw more people into Meta’s Horizon “metaverse” virtual reality experiences. It could also advance creative AI tech that powers machine-generated art.

In a prerecorded demo video, Zuckerberg walked viewers through the process of making a virtual space with Builder Bot, starting with commands like “let’s go to the beach,” which prompts the bot to create a cartoonish 3D landscape of sand and water around him. (Zuckerberg describes this as “all AI-generated.”) Later commands range from broad demands like creating an island to extremely specific requests like adding altocumulus clouds and — in a joke poking fun at himself — a model of a hydrofoil. They also include playing sound effects like “tropical music,” which Zuckerberg suggests is coming from a boombox that Builder Bot created, although it could also have been general background audio. The video doesn’t specify whether Builder Bot draws on a limited library of human-created models or if the AI plays a role in generating the designs.

Several AI projects have demonstrated image generation based on text descriptions, including OpenAI’s DALL-E, Nvidia’s GauGAN2, and VQGAN+CLIP, as well as more accessible applications like Dream by Wombo. But these well-known projects involve creating 2D images (sometimes very surreal ones) without interactive components, although some researchers are working on 3D object generation.

As described by Meta and shown in the demo, Builder Bot appears to be using voice input to add 3D objects that users can walk around, and Meta is aiming for more ambitious interactions. “You’ll be able to create nuanced worlds to explore and share experiences with others with just your voice,” Zuckerberg promised during the event keynote. Meta made several other AI announcements during the event, including plans for a universal language translator, a new version of a conversational AI system, and an initiative to build new translation models for languages without large written data sets.

Zuckerberg acknowledged that sophisticated interactivity, including the kinds of usable virtual objects many VR users take for granted, poses major challenges. AI generation can pose unique moderation problems if users ask for offensive content or the AI’s training reproduces human biases and stereotypes about the world. And we don’t know the limits of the current system. So for now, you shouldn’t expect to see Builder Bot pop up in Meta’s social VR platform — but you can get a taste of Meta’s plans for its AI future.

Update 12:50PM ET: Added details about later event announcements from Meta.

Facebook says its AI mislabeling a video of Black men as “primates” was “unacceptable”

Facebook is apologizing for an incident where its AI mislabeled a video of Black men with a “primates” label, calling it an “unacceptable error” that it was examining to prevent it from happening again. As reported by the New York Times, users who watched a June 27th video posted by the UK tabloid Daily Mail received an auto-prompt asking whether they wanted to “keep seeing videos about Primates.”

Facebook disabled the entire topic recommendation feature as soon as it realized what was happening, a spokesperson said in an email to The Verge on Saturday.

“This was clearly an unacceptable error,” the spokesperson said. The company is investigating the cause to prevent the behavior from happening again, the spokesperson added. “As we have said, while we have made improvements to our AI we know it’s not perfect and we have more progress to make. We apologize to anyone who may have seen these offensive recommendations.”

The incident is just the latest example of artificial intelligence tools showing gender or racial bias, with facial recognition tools shown to have a particular problem of misidentifying people of color. In 2015, Google apologized after its Photos app tagged photos of Black people as “gorillas.” Last year, Facebook said it was studying whether its algorithms trained using AI — including those of Instagram, which Facebook owns — were racially biased.

In April, the US Federal Trade Commission warned that AI tools that have demonstrated “troubling” racial and gender biases may be in violation of consumer protection laws if they’re used for decision-making in credit, housing, or employment. “Hold yourself accountable — or be ready for the FTC to do it for you,” FTC privacy attorney Elisa Jillson wrote in a post on the agency’s website.

Listen to an AI voice actor try and flirt with you

The quality of AI-generated voices has improved rapidly in recent years, but there are still aspects of human speech that escape synthetic imitation. Sure, AI actors can deliver smooth corporate voiceovers for presentations and adverts, but more complex performances — a convincing rendition of Hamlet, for example — remain out of reach.

Sonantic, an AI voice startup, says it’s made a minor breakthrough in its development of audio deepfakes, creating a synthetic voice that can express subtleties like teasing and flirtation. The company says the key to its advance is the incorporation of non-speech sounds into its audio; training its AI models to recreate those small intakes of breath — tiny scoffs and half-hidden chuckles — that give real speech its stamp of biological authenticity.

“We chose love as a general theme,” Sonantic co-founder and CTO John Flynn tells The Verge. “But our research goal was to see if we could model subtle emotions. Bigger emotions are a little easier to capture.”

In the video below, you can hear the company’s attempt at a flirtatious AI — though whether or not you think it captures the nuances of human speech is a subjective question. On a first listen, I thought the voice was near-indistinguishable from that of a real person, but colleagues at The Verge say they instantly clocked it as a robot, pointing to the uncanny spaces left between certain words, and a slight synthetic crinkle in the pronunciation.

Sonantic CEO Zeena Qureshi describes the company’s software as “Photoshop for voice.” Its interface lets users type out the speech they want to synthesize, specify the mood of the delivery, and then select from a cast of AI voices, most of which are copied from real human actors. This is by no means a unique offering (rivals like Descript sell similar packages), but Sonantic says its level of customization is more in-depth than its rivals’.

Emotional choices for delivery include anger, fear, sadness, happiness, and joy, and, with this week’s update, flirtatious, coy, teasing, and boasting. A “director mode” allows for even more tweaking: the pitch of a voice can be adjusted, the intensity of delivery dialed up or down, and those little non-speech vocalizations like laughs and breaths inserted.

Sonantic’s software lets you adjust the delivery of AI-generated speech.
Image: Sonantic

“I think that’s the main difference — our ability to direct and control and edit and sculpt a performance,” says Flynn. “Our clients are mostly triple-A game studios, entertainment studios, and we’re branching out into other industries. We recently did a partnership with Mercedes [to customize its in-car digital assistant] earlier this year.”

As is often the case with such technology, though, the real benchmark for Sonantic’s achievement is the audio that comes fresh out of its machine learning models, rather than what’s used in polished, PR-ready demos. Flynn says the speech synthesized for its flirty video required “very little manual adjustment,” but the company did cycle through a few different renderings to find the very best output.

To try and get a raw and representative sample of Sonantic’s technology, I asked them to render the same line (directed to you, dear Verge reader) using a handful of different moods. You can listen to them yourself to compare.

First, here’s “flirty”:

Then “teasing”:

“Pleased”:

“Cheerful”:

And finally, “casual”:

To my ears, at least, these clips are a lot rougher than the demo. This suggests a few things. First, that manual polishing is needed to get the most out of AI voices. This is true of many AI endeavors, like self-driving cars, which have successfully automated very basic driving but still struggle with that last and all-important 5 percent that defines human competence. It means that fully automated, totally convincing AI voice synthesis is still a way off.

Second, I think it shows that the psychological concept of priming can do a lot to trick your senses. The video demo — with its footage of a real human actor being unsettlingly intimate towards the camera — may cue your brain to hear the accompanying voice as real. The best synthetic media, then, might be that which combines real and fake outputs.

Apart from the question of how convincing the technology is, Sonantic’s demo raises other issues — like, what are the ethics of deploying a flirtatious AI? Is it fair to manipulate listeners in this way? And why did Sonantic choose to make its flirting figure female? (It’s a choice that arguably perpetuates a subtle form of sexism in the male-dominated tech industry, where companies tend to code AI assistants as pliant — even flirty — secretaries.)

On the last question, the company said its choice of a female voice was simply inspired by Spike Jonze’s 2013 film Her, in which the protagonist falls in love with a female AI assistant named Samantha. On the others, Sonantic said it recognizes the ethical quandaries that accompany the development of new technology and that it’s careful in how and where it uses its AI voices.

“That’s one of the biggest reasons we’ve stuck to entertainment,” says CEO Qureshi. “CGI isn’t used for just anything — it’s used for the best entertainment products and simulations. We see this [technology] the same way.” She adds that all of the company’s demos include a disclosure that the voice is, indeed, synthetic (though this doesn’t mean much if clients want to use the company’s software to generate voices for more deceitful purposes).

Comparing AI voice synthesis to other entertainment products makes sense. After all, being manipulated by film and TV is arguably the reason we make those things in the first place. But there is also something to be said about the fact that AI will allow such manipulation to be deployed at scale, with less attention to its impact in individual cases. Around the world, for example, people are already forming relationships — even falling in love — with AI chatbots. Adding AI-generated voices to these bots will surely make them more potent, raising questions about how these and other systems should be engineered. If AI voices can convincingly flirt, what might they persuade you to do?

Sony’s new AI driver achieves ‘reliably superhuman’ race times in Gran Turismo

AI agents have bested humans at many games, from chess to Go to poker. Now, the machines can claim a new high score on the classic racing video game series Gran Turismo.

Sony announced today that its researchers have developed an AI driver named GT Sophy that is “reliably superhuman” — able to beat top human drivers in Gran Turismo Sport in back-to-back laps. You might think this an easy challenge. After all, isn’t racing simply a matter of speed and reaction time and therefore simple for a machine to master? But experts in both video game racing and artificial intelligence say GT Sophy’s success is a significant breakthrough, with the agent showing mastery of tactics and strategy.

“Outracing human drivers so skilfully in a head-to-head competition represents a landmark achievement for AI,” writes Stanford automotive professor J. Christian Gerdes in an editorial in the scientific journal Nature that accompanies a paper describing the work. “GT Sophy’s success on the track suggests that neural networks might one day have a larger role in the software of automated vehicles than they do today.”

GT Sophy was trained using a method known as reinforcement learning: essentially a form of trial-and-error in which the AI agent is thrown into an environment with no instructions and rewarded for hitting certain goals. In the case of GT Sophy, Sony’s researchers say they had to craft this “reward function” extremely carefully: for example, fine-tuning penalties for collisions in order to shape a driving style that was aggressive enough to win but that didn’t lead to the AI simply bullying other racers off the road.
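
Sony’s paper doesn’t spell out the exact reward terms in a form that can be reproduced here, but the general shape of such a function is easy to sketch. The following is a minimal, hypothetical example: the state fields, weights, and function name are illustrative assumptions, not values from the research.

```python
# Hypothetical sketch of a shaped reward for a racing agent.
# The terms and weights are illustrative; Sony's actual reward
# function for GT Sophy is more elaborate and tuned differently.

def shaped_reward(state, prev_state,
                  progress_weight=1.0,
                  off_course_penalty=5.0,
                  collision_penalty=2.0):
    """Reward course progress while discouraging crashes and corner-cutting."""
    # Reward the distance advanced along the track centerline this step.
    reward = progress_weight * (state["track_progress"] - prev_state["track_progress"])

    # Penalize leaving the track, so the agent doesn't learn to cut corners.
    if state["off_course"]:
        reward -= off_course_penalty

    # Penalize contact with other cars. Too large a penalty makes the agent
    # overly timid; too small lets it bully rivals off the road.
    if state["collision"]:
        reward -= collision_penalty

    return reward
```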

Using reinforcement learning, GT Sophy was able to navigate round a racetrack with just a few hours of training and “within a day or two” was faster than 95 percent of drivers in its training dataset. After some 45,000 total hours of training, GT Sophy was able to achieve superhuman performance on three tracks. (For Gran Turismo Sport players, the tracks in question were Dragon Trail Seaside, Lago Maggiore GP, and Circuit de la Sarthe.)

A common concern when testing AI agents against humans is that machines have a number of innate advantages, like perfect recall and fast reaction times. Sony’s researchers note that GT Sophy does have some advantages compared to human players, like a precise map of the course with coordinates of track boundaries and “precise information about the load on each tire, slip angle of each tire, and other vehicle state.” But, they say, they accounted for two particularly important factors: action frequency and reaction time.

GT Sophy’s inputs were capped at 10 Hz, compared to a theoretical maximum human input of 60 Hz. This sometimes led to human drivers displaying “much smoother actions” at high speeds, write the researchers. For reaction times, GT Sophy was able to respond to events in the game environment in 23–30 ms, which is much faster than an estimated top reaction time for professional athletes of 200–250 ms. To compensate, researchers added artificial delay, training GT Sophy with reaction times of 100 ms, 200 ms, and 250 ms. But as they found out: “All three of these tests achieved a superhuman lap time.”
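
One simple way to impose an artificial reaction time on an agent is to feed it observations that are a fixed number of control steps old. Here is a minimal sketch of that idea, assuming the 10 Hz action rate mentioned above; the class and its details are illustrative, not taken from Sony’s code.

```python
from collections import deque

class DelayedObservations:
    """Feed the agent observations that are delay_ms old, simulating reaction time."""

    def __init__(self, delay_ms, step_hz=10):
        # At 10 Hz, one control step lasts 100 ms, so a 200 ms delay is 2 steps.
        self.delay_steps = round(delay_ms / (1000 / step_hz))
        self.buffer = deque()

    def observe(self, obs):
        self.buffer.append(obs)
        # Once the buffer is full, return the observation from delay_steps ago.
        if len(self.buffer) > self.delay_steps:
            return self.buffer.popleft()
        # Until then, keep returning the oldest observation seen so far.
        return self.buffer[0]
```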

GT Sophy was tested against a trio of top e-sport drivers: Emily Jones, Valerio Gallo, and Igor Fraga. Although none of the humans were able to beat the AI in time trials, the match-ups led them to discover new tactics.

“It was really interesting seeing the lines where the AI would go, there were certain corners where I was going out wide and then cutting back in, and the AI was going in all the way around, so I learned a lot about the lines,” e-sports driver Emily Jones said in a testimonial in the Nature paper. “Going into turn 1, for example, I was braking later than the AI, but the AI would get a much better exit than me and beat me to the next corner. I didn’t notice that until I saw the AI and was like, ‘Okay, I should do that instead.’”

Sony says it’s currently working on integrating GT Sophy into future Gran Turismo titles but didn’t offer a schedule for when this might happen.

AI computers can’t patent their own inventions — yet — a US judge rules

Should an artificially intelligent machine be able to patent its own inventions? For a US federal judge, the larger implications of that question were irrelevant. In April 2020, the US Patent and Trademark Office (USPTO) ruled that only “natural persons” could be credited as the inventor of a patent, and a US court decided Thursday that yes, that’s what the law technically says (via Bloomberg).

Not every country agrees with that direction. South Africa and Australia have gone the other way, granting one patent and reinstating a second patent application filed by AI researcher Steven Thaler, whose AI system DABUS reportedly came up with a flashing light and a new type of food container. Thaler is the one who sued the US in this case as well — he’s part of a group called The Artificial Inventor Project that’s lobbying for legal recognition of AI inventors around the globe.

You can read the US’s whole decision against Thaler for yourself at the bottom of this post, but it’s pretty simple when you boil it down:

  • The US Patent Act says an inventor must be an “individual”
  • Previous legal decisions have clarified that “individuals” have to be people (not, say, companies)
  • It’s also pretty clear from context that the Patent Act was referring to people
  • AI systems are not people

Oh, and the court says it can only overrule a US agency’s decision if it’s arbitrary, capricious, or obviously illegal — but in this case, the USPTO already laid out its full reasoning for sticking with the status quo last April. It also asked for public comment in 2019, before it made its ruling.

As to the bigger question, US District Judge Leonie Brinkema had this to say:

“[T]here may come a time when artificial intelligence reaches a level of sophistication such that it might satisfy accepted meanings of inventorship. But that time has not yet arrived, and, if it does, it will be up to Congress to decide how, if at all, it wants to expand the scope of patent law.”

Quick phrases could let you skip ‘Hey, Google’ for common tasks

“Quick phrases” is a new feature currently under development for Google Assistant that could one day let you skip having to say “Hey, Google” for common phrases like “What time is it?” or “Turn the lights on,” 9to5Google reports. The feature is yet to be officially announced, and it’s unclear when it might launch or exactly which devices might support it.

The feature emerged back in April under the codename “Guacamole.” At the time it was called “Voice shortcuts,” and its capabilities seemed limited to silencing alarms and timers, or responding to incoming phone calls. But the new menu discovered by 9to5Google shows a much broader range of tasks, or “salsas” as Google is nicknaming them. These salsas include the ability to ask about the weather, skip songs, or set alarms and timers in addition to just silencing them.

A menu showing Quick Phrases that can be enabled.
Image: 9to5Google

From the settings menu, it appears as though you’ll need to individually enable specific commands to get them to work without a wake word, and then Voice Match will be used to ensure they only respond to your unique voice. Another menu item suggests that the phrases can be set to work across other Google Assistant devices in addition to your own phone.

9to5Google speculates that the feature works by expanding the list of wake phrases an Assistant device is actively listening for. By default, the software is only listening for a “Hey, Google” or “OK, Google” wake phrase, but presumably if you’ve added “What time is it?” as a Quick Phrase this effectively becomes a wake phrase of its own.
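
If that reading is right, the underlying logic is conceptually simple. Here is a toy sketch of the idea; the phrase lists and function are invented for illustration and are not Google’s actual implementation.

```python
# Toy illustration of the wake-phrase idea 9to5Google describes:
# enabled Quick Phrases are treated as extra wake phrases of their own.

DEFAULT_WAKE_PHRASES = {"hey google", "ok google"}
ENABLED_QUICK_PHRASES = {"what time is it", "stop the alarm", "turn on the lights"}

def should_respond(transcript: str) -> bool:
    """Return True if the locally recognized phrase should trigger the assistant."""
    phrase = transcript.lower().strip("?!. ")
    return phrase in DEFAULT_WAKE_PHRASES | ENABLED_QUICK_PHRASES

print(should_respond("What time is it?"))  # True: an enabled quick phrase, no wake word needed
print(should_respond("Play some jazz"))    # False: still requires "Hey, Google" first
```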

A similar feature, introduced in 2019 for Google’s Nest smart speakers and displays, already lets you silence an alarm without needing to say a wake word first. Quick Phrases expands this functionality dramatically to potentially encompass a wide variety of other common tasks.

It’s an intriguing feature, especially for smart home controls that are best activated quickly and without much thought. But Google’s software will have its work cut out if it wants to avoid mistaking other random sounds for its expanded list of wake phrases.

DeepMind says its new AI coding engine is as good as an average human programmer

DeepMind has created an AI system named AlphaCode that it says “writes computer programs at a competitive level.” The Alphabet subsidiary tested its system against coding challenges used in human competitions and found that its program achieved an “estimated rank” placing it within the top 54 percent of human coders. The result is a significant step forward for autonomous coding, says DeepMind, though AlphaCode’s skills are not necessarily representative of the sort of programming tasks faced by the average coder.

Oriol Vinyals, principal research scientist at DeepMind, told The Verge over email that the research was still in the early stages but that the results brought the company closer to creating a flexible problem-solving AI — a program that can autonomously tackle coding challenges that are currently the domain of humans only. “In the longer-term, we’re excited by [AlphaCode’s] potential for helping programmers and non-programmers write code, improving productivity or creating new ways of making software,” said Vinyals.

AlphaCode was tested against challenges curated by Codeforces, a competitive coding platform that shares weekly problems and issues rankings for coders similar to the Elo rating system used in chess. These challenges are different from the sort of tasks a coder might face while making, say, a commercial app. They’re more self-contained and require a wider knowledge of both algorithms and theoretical concepts in computer science. Think of them as very specialized puzzles that combine logic, maths, and coding expertise.

In one example challenge that AlphaCode was tested on, competitors are asked to find a way to convert one string of letters, s, into another string, t, using a limited set of inputs. Competitors cannot, for example, simply type new letters; instead, they have to use a “backspace” command that deletes letters from the string as it is typed. You can read a full description of the challenge below:

An example challenge titled “Backspace” that was used to evaluate DeepMind’s program. The problem is of medium difficulty, with the left side showing the problem description, and the right side showing example test cases.
Image: DeepMind / Codeforces
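
The full problem statement is shown in the image above. For a flavor of the reasoning these puzzles demand, here is a minimal sketch of one common greedy approach, assuming the challenge is the standard Codeforces “Backspace” problem (you type s from left to right and may press backspace instead of typing any given character; can the result equal t?). It is an illustration of the puzzle itself, not AlphaCode’s output.

```python
def can_obtain(s: str, t: str) -> bool:
    """Greedy check, matching from the back of both strings."""
    i, j = len(s) - 1, len(t) - 1
    while i >= 0:
        if j >= 0 and s[i] == t[j]:
            # s[i] was typed and matches the next needed character of t.
            i -= 1
            j -= 1
        else:
            # s[i] can't be the next matched character, so backspace was
            # pressed here, consuming s[i] and the character typed before it.
            i -= 2
    return j < 0  # True if every character of t was matched

print(can_obtain("ababa", "ba"))  # True
print(can_obtain("ab", "a"))      # False
```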

Ten of these challenges were fed into AlphaCode in exactly the same format they’re given to humans. AlphaCode then generated a larger number of possible answers and winnowed these down by running the code and checking the output just as a human competitor might. “The whole process is automatic, without human selection of the best samples,” Yujia Li and David Choi, co-leads of the AlphaCode paper, told The Verge over email.
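
DeepMind hasn’t released that pipeline as runnable code, but the generate-and-filter idea it describes can be sketched in a few lines. In this toy version, the hard-coded candidate strings stand in for programs sampled from a large model, and the example tests belong to a made-up “sum the numbers” problem.

```python
import subprocess
import sys

# Example tests for a made-up problem: read n, then n integers, print their sum.
EXAMPLE_TESTS = [
    ("3\n1 2 3\n", "6\n"),
    ("2\n10 5\n", "15\n"),
]

# Hard-coded stand-ins for candidate programs a large language model might sample.
CANDIDATES = [
    "n = input()\nxs = list(map(int, input().split()))\nprint(max(xs))",  # buggy: prints the max
    "n = input()\nxs = list(map(int, input().split()))\nprint(sum(xs))",  # correct: prints the sum
]

def passes_examples(program: str) -> bool:
    """Run a candidate program on every example test and compare its output."""
    for stdin, expected in EXAMPLE_TESTS:
        result = subprocess.run(
            [sys.executable, "-c", program],
            input=stdin, capture_output=True, text=True, timeout=5,
        )
        if result.returncode != 0 or result.stdout != expected:
            return False
    return True

survivors = [p for p in CANDIDATES if passes_examples(p)]
print(f"{len(survivors)} of {len(CANDIDATES)} candidates pass the example tests")
```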

AlphaCode was tested on 10 of these challenges, each of which had been tackled by 5,000 users on the Codeforces site. On average, it ranked within the top 54.3 percent of responses, and DeepMind estimates that this gives the system a Codeforces Elo of 1238, which places it within the top 28 percent of users who have competed on the site in the last six months.
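
For readers unfamiliar with Elo-style ratings, the classic chess formula converts a rating gap into an expected score. Codeforces uses its own rating variant, so treat the sketch below as a rough guide only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 1238-rated entrant against a 1500-rated opponent:
print(round(elo_expected_score(1238, 1500), 2))  # ~0.18, an expected score of about 18 percent
```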

“I can safely say the results of AlphaCode exceeded my expectations,” Codeforces founder Mike Mirzayanov said in a statement shared by DeepMind. “I was sceptical [sic] because even in simple competitive problems it is often required not only to implement the algorithm, but also (and this is the most difficult part) to invent it. AlphaCode managed to perform at the level of a promising new competitor.”

An example interface of AlphaCode tackling a coding challenge. The input is given as it is to humans on the left and the output generated on the right.
Image: DeepMind

DeepMind notes that AlphaCode’s skill set is currently applicable only within the domain of competitive programming, but its abilities open the door to creating future tools that make programming more accessible and, one day, fully automated.

Many other companies are working on similar applications. For example, Microsoft and the AI lab OpenAI have adapted the latter’s language-generating program GPT-3 to function as an autocomplete program that finishes strings of code. (Like GPT-3, AlphaCode is based on an AI architecture known as a transformer, which is particularly adept at parsing sequential text, both natural language and code.) For the end user, these systems work just like Gmail’s Smart Compose feature — suggesting ways to finish whatever you’re writing.

A lot of progress has been made developing AI coding systems in recent years, but these systems are far from ready to just take over the work of human programmers. The code they produce is often buggy, and because the systems are usually trained on libraries of public code, they sometimes reproduce material that is copyrighted.

In one study of an AI programming tool named Copilot developed by code repository GitHub, researchers found that around 40 percent of its output contained security vulnerabilities. Security analysts have even suggested that bad actors could intentionally write and share code with hidden backdoors online, which then might be used to train AI programs that would insert these errors into future programs.

Challenges like these mean that AI coding systems will likely be integrated slowly into the work of programmers — starting as assistants whose suggestions are treated with suspicion before they are trusted to carry out work on their own. In other words: they have an apprenticeship to carry out. But so far, these programs are learning fast.
