GitHub and OpenAI launch an AI Copilot tool that generates its own code

GitHub and OpenAI have launched a technical preview of a new AI tool called Copilot, which lives inside the Visual Studio Code editor and autocompletes code snippets.

Copilot does more than just parrot back code it’s seen before, according to GitHub. Instead, it analyzes the code you’ve already written and generates new, matching code, including calls to functions you’ve previously defined. Examples on the project’s website include automatically writing code to import tweets, draw a scatterplot, or grab a Goodreads rating.
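To illustrate the workflow described above: a developer writes only a comment and a function signature, and a Copilot-style completion proposes the body from that context. This is a minimal, hypothetical sketch, not actual Copilot output.

```python
# A developer types only the comment and signature; a Copilot-style
# assistant proposes the body from that context.
# (Illustrative sketch only; not real Copilot output.)

def average_rating(ratings):
    """Return the mean of a list of numeric ratings, or 0.0 if empty."""
    # --- a plausible generated completion begins here ---
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)


print(average_rating([4, 5, 3]))  # 4.0
```

The point is that the comment and surrounding code act as the prompt; the model suggests the implementation, which the developer can accept, edit, or reject.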

It works best with Python, JavaScript, TypeScript, Ruby, and Go, according to a blog post from GitHub CEO Nat Friedman.

GitHub sees this as an evolution of pair programming, where two coders work on the same project to catch each other’s mistakes and speed up the development process. With Copilot, one of those coders is virtual.

This project is the first major result of Microsoft’s $1 billion investment into OpenAI, the research firm led by former Y Combinator president Sam Altman. Since Altman took the reins, OpenAI has pivoted from nonprofit status to a “capped-profit” model, taken on the Microsoft investment, and started licensing its GPT-3 text-generation algorithm.

Copilot is built on a new algorithm called OpenAI Codex, which OpenAI CTO Greg Brockman describes as a descendant of GPT-3.

GPT-3 is OpenAI’s flagship language-generating algorithm, which can produce text that is sometimes indistinguishable from human writing. It’s able to write so convincingly because of its sheer size: 175 billion parameters, or adjustable knobs that allow the algorithm to learn relationships between letters, words, phrases, and sentences.

While GPT-3 generates English, OpenAI Codex generates code. OpenAI plans to release a version of Codex through its API later this summer so developers can build their own apps with the tech, a representative for OpenAI told The Verge in an email.

Codex was trained on terabytes of openly available code pulled from GitHub, as well as English language examples.

While testimonials on the site rave about the productivity gains Copilot provides, GitHub acknowledges that not all of the training code was vetted for bugs, insecure practices, or personal data. The company writes that it has put a few filters in place to prevent Copilot from generating offensive language, but they might not be perfect.

“Due to the pre-release nature of the underlying technology, GitHub Copilot may sometimes produce undesired outputs, including biased, discriminatory, abusive, or offensive outputs,” Copilot’s website says.

Given criticisms of GPT-3’s bias and abusive language patterns, it seems that OpenAI hasn’t found a way to prevent algorithms from inheriting their training data’s worst elements.

The company also warns that the model could suggest email addresses, API keys, or phone numbers, but says this is rare and that such data has usually turned out to be synthetic or pseudo-randomly generated by the algorithm. The code Copilot generates is largely original: a test performed by GitHub found that only 0.1 percent of generated code could be found verbatim in the training set.
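GitHub has not published the exact methodology behind that 0.1 percent figure, but the basic idea of a verbatim-recitation check can be sketched as follows: normalize whitespace in each generated snippet, then test whether it appears verbatim in the training corpus. All names and the toy corpus below are illustrative assumptions.

```python
# Hedged sketch of a verbatim-recitation check: collapse whitespace so
# formatting differences don't mask matches, then do a substring test
# against the training corpus. GitHub's real methodology is not public.

def normalize(code: str) -> str:
    """Collapse all runs of whitespace into single spaces."""
    return " ".join(code.split())

def verbatim_rate(generated_snippets, training_corpus) -> float:
    """Fraction of generated snippets found verbatim in the corpus."""
    corpus = normalize(training_corpus)
    hits = sum(1 for s in generated_snippets if normalize(s) in corpus)
    return hits / len(generated_snippets)

corpus = "def add(a, b):\n    return a + b\n"
snippets = ["def add(a, b): return a + b",   # present in corpus
            "def mul(a, b): return a * b"]   # original
print(verbatim_rate(snippets, corpus))  # 0.5
```

A production check would need to handle much larger corpora (e.g. with hashing or suffix indexes), but the substring test captures the essence of measuring verbatim overlap.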

This isn’t the first project to try to automatically generate code to help toiling programmers. The startup Kite offers similar functionality and is available in more than 16 code editors.

Right now, Copilot is in a restricted technical preview, but you can sign up on the project’s website for a chance to access it.



This AI generates music from silent piano performances

Scientists have developed an AI that can generate music from silent piano performances, just by watching the movements of the player’s hands.

The system, called Audeo, analyzes top-down videos of someone tickling the ivories to predict which keys are being pressed in each frame. It then produces a transcript of the music, which a synthesizer translates into sound.
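The pipeline above, from per-frame key predictions to a playable transcript, can be sketched in simplified form: given the set of keys predicted pressed in each frame, merge consecutive frames into note events (key, onset, duration) that a synthesizer could render. This is only an assumption-laden illustration of the frame-to-transcript idea; the real Audeo model is far more involved.

```python
# Hedged sketch of the transcription step: per-frame key-press
# predictions -> note events (key, start_frame, n_frames).
# Key numbers follow MIDI convention (60 = middle C) for illustration.

def frames_to_events(frames):
    """frames: list of sets of keys predicted pressed in each frame."""
    events = []
    active = {}  # key -> frame where the note started
    for i, pressed in enumerate(frames):
        for key in pressed:
            active.setdefault(key, i)          # note onset
        for key in list(active):
            if key not in pressed:             # note released
                start = active.pop(key)
                events.append((key, start, i - start))
    for key, start in active.items():          # close held notes at the end
        events.append((key, start, len(frames) - start))
    return sorted(events)

# Middle C held for two frames, then E for one frame
print(frames_to_events([{60}, {60}, {64}]))  # [(60, 0, 2), (64, 2, 1)]
```

In the actual system, a neural network produces the per-frame predictions from video, and the resulting transcript is handed to a synthesizer to produce audio.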

The researchers trained and tested the AI on footage of pianist Paul Barton playing tunes by famous composers.

They then evaluated the accuracy of Audeo’s compositions by playing them to music-recognition apps, such as Shazam and SoundHound.

The apps identified the tune 86% of the time, just 7% less often than they recognized the audio from the original source videos.


Senior study author Eli Shlizerman, an assistant professor at the University of Washington, said he was surprised by the quality of the AI’s output:

To create music that sounds like it could be played in a musical performance was previously believed to be impossible. An algorithm needs to figure out the cues, or ‘features’ in the video frames that are related to generating music, and it needs to ‘imagine’ the sound that’s happening in between the video frames. It requires a system that is both precise and imaginative. 


The researchers have also explored using Audeo to change the style of music. Shlizerman said the system could show how music produced by a piano sounds when played through a trumpet.

He hopes the research will enable new ways for people to interact with music:

For example, one future application is that Audeo can be extended to a virtual piano with a camera recording just a person’s hands. Also, by placing a camera on top of a real piano, Audeo could potentially assist in new ways of teaching students how to play.

You can read the full study paper here.

Published February 5, 2021 — 17:37 UTC



Experimental AI framework Vx2Text generates video captions using inferences from audio and text

A grand challenge in AI is developing a conversational system that can reliably understand the world and respond using natural language. Ultimately, solving it will require a model capable of extracting salient information from images, text, audio, and video and answering questions in a way that humans can understand. In a step toward this, researchers at Facebook, Columbia University, Georgia Tech, and Dartmouth developed Vx2Text, a framework for generating text from videos, speech, or audio. They claim that Vx2Text can create captions and answer questions better than previous state-of-the-art approaches.

Unlike most AI systems, humans understand the meaning of text, videos, audio, and images together in context. For example, given text and an image that seem innocuous when considered apart (e.g., “Look how many people love you” and a picture of a barren desert), people recognize that these elements take on potentially hurtful connotations when they’re paired or juxtaposed. Different modalities can carry complementary information that often only becomes evident when they’re combined in the learning process. And this holds promise for applications from transcription to translating comic books into different languages.

In the case of Vx2Text, “modality-specific” classifiers convert semantic signals from videos, text, or audio into a common semantic language space. This enables language models to directly interpret multimodal data, opening up the possibility of carrying out multimodal fusion (i.e., combining signals to bolster classification) by means of powerful language models like Google’s T5. A generative text decoder within Vx2Text then transforms the multimodal features computed by an encoder into text, making the framework suitable for generating natural language responses.
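The “common semantic language space” idea can be sketched very simply: each modality-specific classifier emits text labels, which are serialized into one token sequence that a text-to-text model such as T5 could consume. The classifier outputs and prompt format below are hard-coded stand-ins, not the paper’s actual representation.

```python
# Hedged sketch of mapping every modality into a shared language space:
# classifier outputs become text labels, fused into a single prompt a
# text-to-text model could process. Labels here are hypothetical.

def fuse_modalities(video_labels, audio_labels, dialog_history):
    """Serialize all modalities into one text prompt."""
    parts = [
        "video: " + ", ".join(video_labels),
        "audio: " + ", ".join(audio_labels),
        "dialog: " + " ".join(dialog_history),
    ]
    return " | ".join(parts)

prompt = fuse_modalities(
    ["person standing up", "telephone"],
    ["ringing", "speech"],
    ["What is happening?"],
)
print(prompt)
```

Because everything ends up as text, a single pretrained language model can attend over all modalities at once, which is what lets the whole pipeline be trained end-to-end.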


“Not only is such a design much simpler but it also leads to better performance compared to prior approaches,” the researchers wrote in a paper describing their work. Helpfully, it also does away with the need to design specialized algorithms or resort to alternative approaches to combine the signals, they added.

In experiments, the researchers showed that Vx2Text generates “realistic” natural text for both audio-visual “scene-aware” dialog and video captioning. Although the researchers provided the model with context in the form of dialog histories and speech transcripts, they note that the generated text includes information from non-text modalities, such as references to actions like helping someone get up or answering a telephone.


Vx2Text has applications in the enterprise, where it could be used to caption recorded or streamed videos for accessibility purposes. Alternatively, the framework (or something like it) could find its way into video sharing platforms like YouTube and Vimeo, which rely on captioning, among other signals, to improve the relevancy of search results.

“Our approach hinges on the idea of mapping all modalities into a semantic language space in order to enable the direct application of transformer networks, which have been shown to be highly effective at modeling language problems,” the researchers wrote. “This renders our entire model trainable end-to-end.”


