Categories
AI

OpenAI’s DALL-E AI image generator can now edit pictures, too

Artificial intelligence research group OpenAI has created a new version of DALL-E, its text-to-image generation program. DALL-E 2 features a higher-resolution and lower-latency version of the original system, which produces pictures depicting descriptions written by users. It also includes new capabilities, like editing an existing image. As with previous OpenAI work, the tool isn’t being directly released to the public. But researchers can sign up online to preview the system, and OpenAI hopes to later make it available for use in third-party apps.

The original DALL-E, a portmanteau of the artist “Salvador Dalí” and the robot “WALL-E,” debuted in January 2021. It was a limited but fascinating test of AI’s ability to visually represent concepts, from mundane depictions of a mannequin in a flannel shirt to “a giraffe made of turtle” or an illustration of a radish walking a dog. At the time, OpenAI said it would continue to build on the system while examining potential dangers like bias in image generation or the production of misinformation. With DALL-E 2, it’s attempting to address those issues using technical safeguards and a new content policy while also reducing its computing load and pushing forward the basic capabilities of the model.

A DALL-E 2 result for “Shiba Inu dog wearing a beret and black turtleneck.”

One of the new DALL-E 2 features, inpainting, applies DALL-E’s text-to-image capabilities at a more granular level. Users can start with an existing picture, select an area, and tell the model to edit it. You can block out a painting on a living room wall and replace it with a different picture, for instance, or add a vase of flowers on a coffee table. The model can fill in (or remove) objects while accounting for details like the direction of shadows in a room. Another feature, variations, is sort of like an image search tool for pictures that don’t exist. Users can upload a starting image and then create a range of variations similar to it. They can also blend two images, generating pictures that have elements of both. The generated images are 1,024 x 1,024 pixels, a leap over the 256 x 256 pixels the original model delivered.
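
In practice, an inpainting request boils down to three inputs: the original image, a mask marking the region to change, and a text prompt describing what should go there. Since OpenAI hasn’t released a public DALL-E 2 API, the sketch below is purely illustrative; the endpoint, field names, and file paths are hypothetical placeholders.

```python
import requests

# Hypothetical mask-based edit request: the original image, a mask whose marked
# region is the area to regenerate, and a text prompt for the new content.
# The endpoint and field names are illustrative, not a real OpenAI API.
with open("living_room.png", "rb") as image, open("wall_mask.png", "rb") as mask:
    response = requests.post(
        "https://example.com/v1/images/edit",  # placeholder URL
        files={"image": image, "mask": mask},
        data={"prompt": "a vase of flowers on the coffee table", "size": "1024x1024"},
    )
print(response.json())
```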

DALL-E 2 builds on CLIP, a computer vision system that OpenAI also announced last year. “DALL-E 1 just took our GPT-3 approach from language and applied it to produce an image: we compressed images into a series of words and we just learned to predict what comes next,” says OpenAI research scientist Prafulla Dhariwal, referring to the GPT model used by many text AI apps. But the word-matching didn’t necessarily capture the qualities humans found most important, and the predictive process limited the realism of the images. CLIP was designed to look at images and summarize their contents the way a human would, and OpenAI iterated on this process to create “unCLIP” — an inverted version that starts with the description and works its way toward an image. DALL-E 2 generates the image using a process called diffusion, which Dhariwal describes as starting with a “bag of dots” and then filling in a pattern with greater and greater detail.
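
To make the “bag of dots” description a bit more concrete, here is a rough, toy-scale sketch of how reverse diffusion sampling works in general: start from pure noise and repeatedly subtract a model’s noise estimate. The denoiser below is a placeholder stub, and none of this reflects OpenAI’s actual architecture or noise schedule.

```python
import numpy as np

def fake_denoiser(noisy_image, step, text_embedding):
    # Placeholder: a real diffusion model would predict the noise present in
    # `noisy_image`, conditioned on the step and the text/CLIP embedding.
    return np.zeros_like(noisy_image)

def sample(text_embedding, steps=50, size=(64, 64, 3)):
    image = np.random.randn(*size)  # start from pure noise (the "bag of dots")
    for step in reversed(range(steps)):
        predicted_noise = fake_denoiser(image, step, text_embedding)
        image = image - predicted_noise / steps  # peel away a little noise each step
        if step > 0:
            image = image + 0.01 * np.random.randn(*size)  # keep sampling stochastic
    return image  # progressively refined toward an image matching the text
```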

An existing image of a room with a flamingo added in one corner.

Interestingly, a draft paper on unCLIP says it’s partly resistant to a very funny weakness of CLIP: the fact that people can fool the model’s identification capabilities by labeling one object (like a Granny Smith apple) with a word indicating something else (like an iPod). The variations tool, the authors say, “still generates pictures of apples with high probability” even when using a mislabeled picture that CLIP can’t identify as a Granny Smith. Conversely, “the model never produces pictures of iPods, despite the very high relative predicted probability of this caption.”
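
For context, CLIP’s identification step works by scoring an image embedding against embeddings of candidate text labels, which is exactly the mechanism the apple-labeled-“iPod” trick exploits. The sketch below uses OpenAI’s publicly released CLIP library to run that zero-shot comparison; the image path and label set are placeholders.

```python
import torch
import clip  # OpenAI's released CLIP package (github.com/openai/CLIP)
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image path: e.g. a Granny Smith apple with a handwritten "iPod" label.
image = preprocess(Image.open("labeled_apple.png")).unsqueeze(0).to(device)
labels = ["a photo of a Granny Smith apple", "a photo of an iPod"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

# A pasted-on label can shift these scores toward "iPod" even though the
# underlying object is still an apple.
for label, prob in zip(labels, probs):
    print(f"{label}: {prob:.2%}")
```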

DALL-E’s full model was never released publicly, but other developers have honed their own tools that imitate some of its functions over the past year. One of the most popular mainstream applications is Wombo’s Dream mobile app, which generates pictures of whatever users describe in a variety of art styles. OpenAI isn’t releasing any new models today, but developers could use its technical findings to update their own work.

A DALL-E 2 result for “a bowl of soup that looks like a monster, knitted out of wool.”

OpenAI has implemented some built-in safeguards. The model was trained on data that had some objectionable material weeded out, which should limit its ability to produce that kind of content. There’s a watermark indicating the AI-generated nature of the work, although it could theoretically be cropped out. As a preemptive anti-abuse feature, the model also can’t generate any recognizable faces based on a name; even asking for something like the Mona Lisa would apparently return a variant of the face from the actual painting.

DALL-E 2 will be testable by vetted partners with some caveats. Users are banned from uploading or generating images that are “not G-rated” and “could cause harm,” including anything involving hate symbols, nudity, obscene gestures, or “major conspiracies or events related to major ongoing geopolitical events.” They must also disclose the role of AI in generating the images, and they can’t serve generated images to other people through an app or website, so you won’t initially see a DALL-E-powered version of something like Dream. But OpenAI hopes to add DALL-E 2 to its API toolset later, allowing it to power third-party apps. “Our hope is to keep doing a staged process here, so we can keep evaluating from the feedback we get how to release this technology safely,” says Dhariwal.

Additional reporting from James Vincent.

Categories
Game

‘World of Warcraft: Dragonflight’ won’t use gendered language in its character generator

World of Warcraft: Dragonflight is joining the ranks of games with more inclusive character generators. Both Wowhead and Polygon note that the expansion’s new alpha release has dropped gendered language from its character creator. Instead of the male and female options you frequently see in these tools, the choices are now divided into “Body 1” and “Body 2” sections. While they effectively offer the same characteristics as before, you can now build a gender non-conforming adventurer without any awkward wording.

Wowhead also found code suggesting that you may get to choose he/him, she/her and they/them pronouns in a future release, which could help other players address your character accordingly. Game director Ion Hazzikostas also suggested in an interview that there might be a way to choose your character’s voice at some point, although the most recent alpha version pulled references to that potential feature.
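
Blizzard hasn’t published how these options are stored, but the change can be pictured as replacing a gendered label with neutral body options and, per the datamined strings, a separate pronoun field. The structure below is purely illustrative and not Blizzard’s actual data format.

```python
# Purely illustrative: neutral body options plus a separate pronoun choice,
# standing in for the gendered labels the creator used previously.
character_creator_options = {
    "body_type": ["Body 1", "Body 2"],               # replaces "Male" / "Female"
    "pronouns": ["he/him", "she/her", "they/them"],   # hinted at in datamined code
}

new_character = {
    "name": "Aeluin",        # hypothetical character
    "body_type": "Body 2",
    "pronouns": "they/them",
}
```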

The changes might not be as substantial as you’d like. You can’t have facial hair and breasts on the same character in the alpha, for instance. Still, this could make World of Warcraft more appealing if you’re non-binary, transgender or otherwise don’t fit neatly into conventional gender representations.

Categories
Tech News

Sony Motion Sonic gesture-based music effect generator lands on Indiegogo

Crowdfunding continues to be a significant source of new product launches, not only for the small independent developers the platforms were initially envisioned to serve but also for massive corporations such as Sony. Sony has revealed a new product called Motion Sonic aimed directly at musicians, DJs, and other performers. The product has landed on Indiegogo seeking a little over $79,000, with 31 days to go on the campaign.

So far, ten people have backed the project at around $218 each, for a total of $2,220 raised. The device is a bit difficult to explain; it will be available in the US and Japan for iOS devices. It’s designed to help creators expand their creative reach with electronic musical instruments. Motion Sonic has a sensor that can tell when a performer gestures with their hand and can change the sound’s pitch, vibrato, or modulation depending on the movement.

Motion Sonic is a system featuring a wearable motion sensor and a paired smartphone application. The sensor in the wearable device detects movement and transfers the data to the smartphone app over Bluetooth. The app then connects to the instrument via an audio interface to apply the desired sound effects. If a DJ uses Motion Sonic, for example, raising a hand can add a delay to the music output.
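
Sony hasn’t detailed the app’s internals, but the pipeline it describes, a wearable sensor streaming motion data over Bluetooth to an app that drives effects through an audio interface, can be sketched with a toy mapping from sensor readings to effect parameters. Every name and threshold below is an assumption for illustration only.

```python
# Toy sketch of the described pipeline: motion readings arrive from the wearable
# sensor over Bluetooth, and the app maps them to audio-effect parameters that
# are applied to the instrument signal. All names and thresholds are assumptions.

def gesture_to_effect(accel_z, wrist_angle_deg):
    """Map a raised-hand gesture to a delay time and a vibrato depth."""
    effect = {"delay_ms": 0, "vibrato_depth": 0.0}
    if wrist_angle_deg > 60:          # hand raised: add delay, as in the DJ example
        effect["delay_ms"] = min(500, int(wrist_angle_deg * 5))
    if abs(accel_z) > 2.0:            # sharp vertical motion: add vibrato
        effect["vibrato_depth"] = min(1.0, abs(accel_z) / 10)
    return effect

# Example reading: hand raised to roughly 75 degrees with a quick upward flick.
print(gesture_to_effect(accel_z=3.2, wrist_angle_deg=75))
```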

Sony says the effects generator was developed to enable sound manipulation with body movements and to give artists a new creative dimension. Sony is offering the first 400 buyers the opportunity to purchase at $219. After the first 400 units are sold, the price for the Motion Sonic increases to $249. The funding campaign ends on June 28, and Sony expects to ship the Motion Sonic in March 2022. Sony is also working on new microphones under the Motion Sonic program that will amplify wind noise. The goal is to enable additional ways to harness body motion for sound.
