Categories
Game

New report details sexual harassment and gender discrimination at Nintendo of America

Nintendo is famous for its family-friendly image and for games that people of all ages can enjoy. But a report by Kotaku paints the picture of a company that’s not so different from other gaming giants previously accused of fostering a “frat boy” workplace culture. The publication talked to several female game testers who recounted how they were harassed by colleagues and paid less than their male counterparts.

One of Kotaku’s main sources is a former game tester called Hannah, who was allegedly told to be less outspoken after she reported the inappropriate behavior of a full-time Nintendo employee in a workplace group chat. The employee reportedly posted a copy of a Reddit post detailing why Vaporeon was the best Pokémon to have sex with and justified why it was OK to be sexually attracted to Paimon, a Genshin Impact NPC with a child-like appearance. 

Hannah, who was a contractor, also found that she was being paid $3 less than a junior male tester and struggled to get her contracting agency to agree to a pay increase. As a queer worker, she was also subjected to inappropriate comments by male colleagues whose advances she’d rejected. “Oh, you’re a lesbian. That’s kind of sad,” a significantly older colleague told her shortly after she started working at the company.

Hannah’s experiences are similar to what many of the other female testers Kotaku interviewed went through. Some of them talked about how Melvin Forrest, a product testing lead at Nintendo of America, “went after all the associate girls” and frequently commented on their weight and appearance. They said Forrest was in charge of deciding contractors’ schedules and who got to return after a project, so female testers were forced to get along with him. Another contractor was stalked by a more senior tester for months, but the well-connected perpetrator threatened to get her fired if she reported him.

One common complaint among the sources was the lack of advancement opportunities. “Your chance [of being converted to full time] was probably worse as a girl. It’s usually guys [who get promoted]. They’re usually all friends. They watch the Super Bowl together,” one product tester who worked on The Legend of Zelda: Breath of the Wild said.

As Kotaku notes, one of the main reasons why these problems persist is that women are underrepresented in the company. Sources believe that the percentage of female contractors testing games for Nintendo is only around 10 percent, and it’s not often that they’re transitioned into full-time employees. The company’s data also shows that female employees only make up around 37 percent of all full-time workers at Nintendo of America.

While the gaming giant didn’t respond to Kotaku’s questions, company chief Doug Bowser previously addressed reports about Activision Blizzard’s sexist “frat boy” culture in an internal memo. “Along with all of you, I’ve been following the latest developments with Activision Blizzard and the ongoing reports of sexual harassment and toxicity at the company. I find these accounts distressing and disturbing. They run counter to my values as well as Nintendo’s beliefs, values and policies,” he said. 

The testers who talked to the publication for this particular report are just some of the contractors who’ve recently decided to speak out against the company. Two former workers even filed a complaint with the National Labor Relations Board, accusing Nintendo of America of retaliation, surveillance and coercion. We’ve reached out to the company for a statement, and we’ll update this story if we hear back.

Repost: Original Source and Author Link

Categories
Game

Blizzard’s first female leader, Jen Oneal, steps down amid ongoing gender discrimination suit

Jen Oneal has stepped down from her role as co-leader of Blizzard, leaving Mike Ybarra as the head of the studio known for making Overwatch, World of Warcraft and Diablo. Oneal will temporarily transition to a new position, but will leave Activision Blizzard (fine, and King) at the end of the year.

Activision Blizzard is facing a handful of lawsuits and investigations into reports of sexual harassment, gropings, and systemic gender discrimination at the studio, stemming from the leadership down. Oneal and Ybarra took over as co-leaders of Blizzard in August after president J. Allen Brack was named in the original California lawsuit, leading to his dismissal. Oneal was the first woman in a president role since Activision’s founding in 1979.

Oneal published an open letter to the Blizzard community, reading in part as follows:

I have made the decision to step away from co-leading Blizzard Entertainment and will transition to a new position before departing ABK at the end of the year. Effective immediately, Mike Ybarra will lead Blizzard. I am doing this not because I am without hope for Blizzard, quite the opposite — I’m inspired by the passion of everyone here, working towards meaningful, lasting change with their whole hearts. This energy has inspired me to step out and explore how I can do more to have games and diversity intersect, and hopefully make a broader industry impact that will benefit Blizzard (and other studios) as well. While I am not totally sure what form that will take, I am excited to embark on a new journey to find out.

After months of pressure from employees, shareholders and government agencies, Activision Blizzard ended its policy of forced arbitration in cases of sexual harassment and discrimination, and implemented a zero-tolerance approach to harassment at the studio. The original California lawsuit is ongoing.

Blizzard announced two big delays alongside news of Oneal’s departure: Overwatch 2 and Diablo IV, neither of which was given a release window.

Repost: Original Source and Author Link

Categories
AI

Audit finds gender and age bias in OpenAI’s CLIP model

In January, OpenAI released Contrastive Language-Image Pre-training (CLIP), an AI model trained to recognize a range of visual concepts in images and associate them with their names. CLIP performs quite well on classification tasks — for instance, it can caption an image of a dog “a photo of a dog.” But according to an OpenAI audit conducted with Jack Clark, OpenAI’s former policy director, CLIP is susceptible to biases that could have implications for people who use — and interact with — the model.

Prejudices often make their way into the data used to train AI systems, amplifying stereotypes and leading to harmful consequences. Research has shown that state-of-the-art image-classifying AI models trained on ImageNet, a popular dataset containing photos scraped from the internet, automatically learn humanlike biases about race, gender, weight, and more. Countless studies have demonstrated that facial recognition is susceptible to bias. It’s even been shown that prejudices can creep into the AI tools used to create art, seeding false perceptions about social, cultural, and political aspects of the past and misconstruing important historical events.

Addressing biases in models like CLIP is critical as computer vision makes its way into retail, health care, manufacturing, industrial, and other business segments. The computer vision market is anticipated to be worth $21.17 billion by 2028. But biased systems deployed on cameras to prevent shoplifting, for instance, could misidentify darker-skinned faces more frequently than lighter-skinned faces, leading to false arrests or mistreatment.

CLIP and bias

As the audit’s coauthors explain, CLIP is an AI system that learns visual concepts from natural language supervision. Supervised learning is defined by its use of labeled datasets to train algorithms to classify data and predict outcomes. During the training phase, CLIP is fed labeled data that tells it which output corresponds to each input. The learning process then proceeds by repeatedly measuring the resulting outputs and fine-tuning the system to get closer to the target accuracy.

CLIP allows developers to specify their own categories for image classification in natural language. For example, they might choose to classify images in animal classes like “dog,” “cat,” and “fish.” Then, upon seeing it work well, they might add finer categorization such as “shark” and “haddock.”
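To make this concrete, here is a minimal zero-shot classification sketch using the open-source CLIP package released alongside the model; the image path, model variant, and label list are illustrative choices, not the auditors’ exact setup.

```python
# Minimal zero-shot classification sketch with the open-source CLIP package
# (pip install git+https://github.com/openai/CLIP.git). Image path and labels
# below are illustrative only.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Developer-defined categories, expressed in natural language.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a fish"]
text = clip.tokenize(labels).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.2%}")
```

Because the label list alone defines the classifier, swapping in a poorly chosen or loaded category changes the outputs just as easily, which is the weakness the next paragraph describes.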

Customization is one of CLIP’s strengths — but also a potential weakness. Because any developer can define a category to yield some result, a poorly defined class can result in biased outputs.

The auditors carried out an experiment in which CLIP was tasked with classifying 10,000 images from FairFace, a collection of over 100,000 photos showing White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, and Latinx people. With the goal of checking for biases in the model that might harm certain demographic groups, the auditors added “animal,” “gorilla,” “chimpanzee,” “orangutan,” “thief,” “criminal,” and “suspicious person” to the existing categories in FairFace.

The auditors found that CLIP misclassified 4.9% of the images into one of the non-human categories they added (e.g., “animal,” “gorilla,” “chimpanzee,” “orangutan”). Out of these, photos of Black people had the highest misclassification rate at roughly 14%, followed by people 20 years old or younger of all races. Moreover, 16.5% of men and 9.8% of women were misclassified into classes related to crime, like “thief,” “suspicious person,” and “criminal” — with younger people (again, under the age of 20) more likely to fall under crime-related classes (18%) compared with people in other age ranges (12% for people aged 20-60 and 0% for people over 70).

In subsequent tests, the auditors tested CLIP on photos of female and male members of the U.S. Congress. At a higher confidence threshold, CLIP labeled people “lawmaker” and “legislator” across genders. But at lower thresholds, terms like “nanny” and “housekeeper” began appearing for women and “prisoner” and “mobster” for men. CLIP also disproportionately attached labels related to hair and appearance to women, for example “brown hair” and “blonde.” And the model almost exclusively associated “high-status” occupation labels such as “executive,” “doctor,” and “military person” with men.

Paths forward

The auditors say their analysis shows that CLIP inherits many gender biases, raising questions about what sufficiently safe behavior may look like for such models. “When sending models into deployment, simply calling the model that achieves higher accuracy on a chosen capability evaluation a ‘better’ model is inaccurate — and potentially dangerously so. We need to expand our definitions of ‘better’ models to also include their possible downstream impacts, uses, [and more],” they wrote.

In their report, the auditors recommend “community exploration” to further characterize models like CLIP and develop evaluations to assess their capabilities, biases, and potential for misuse. This could help increase the likelihood models are used beneficially and shed light on the gap between models that perform well and models that actually deliver benefit, the auditors say.

“These results add evidence to the growing body of work calling for a change in the notion of a ‘better’ model — to move beyond simply looking at higher accuracy at task-oriented capability evaluations and toward a broader ‘better’ that takes into account deployment-critical features, such as different use contexts and people who interact with the model, when thinking about model deployment,” the report reads.

Repost: Original Source and Author Link

Categories
AI

Automatic gender recognition tech is dangerous, say campaigners: it’s time to ban it

Dangers posed by facial recognition like mass surveillance and mistaken identity have been widely discussed in recent years. But digital rights groups say an equally insidious use case is currently sneaking under the radar: using the same technology to predict someone’s gender and sexual orientation. Now, a new campaign has launched to ban these applications in the EU.

Trying to predict someone’s gender or sexuality from digitized clues is fundamentally flawed, says Os Keyes, a researcher who’s written extensively on the topic. This technology tends to reduce gender to a simplistic binary and, as a result, is often harmful to individuals like trans and nonbinary people who might not fit into these narrow categories. When the resulting systems are used for things like gating entry for physical spaces or verifying someone’s identity for an online service, it leads to discrimination.

“Identifying someone’s gender by looking at them and not talking to them is sort of like asking what does the smell of blue taste like,” Keyes tells The Verge. “The issue is not so much that your answer is wrong as your question doesn’t make any sense.”

These predictions can be made using a variety of inputs, from analyzing someone’s voice to aggregating their shopping habits. But the rise of facial recognition has given companies and researchers a new data input they believe is particularly authoritative.

Commercial facial recognition systems, including those sold by big tech companies like Amazon and Microsoft, frequently offer gender classification as a standard feature. Predicting sexual orientation from the same data is much rarer, but researchers have still built such systems, most notably the so-called “AI gaydar” algorithm. There’s strong evidence that this technology doesn’t work even on its own flawed premises, but that wouldn’t necessarily limit its adoption.

“Even the people who first researched it said, yes, some tinpot dictator could use this software to try and ‘find the queers’ and then throw them in a camp,” says Keyes of the algorithm to detect sexual orientation. “And that isn’t hyperbole. In Chechnya, that’s exactly what they’ve been doing, and that’s without the aid of robots.”

In the case of automatic gender recognition, these systems generally rely on narrow and outmoded understandings of gender. With facial recognition tech, if someone has short hair, they’re categorized as a man; if they’re wearing makeup, they’re a woman. Similar assumptions are made based on biometric data like bone structure and face shape. The result is that people who don’t fit easily into these two categories — like many trans and nonbinary individuals — are misgendered. “These systems don’t just fail to recognize that trans people exist. They literally can’t recognize that trans people exist,” says Keyes.

Current applications of this gender recognition tech include digital billboards that analyze passersby to serve them targeted advertisements; digital spaces like “girls-only” social app Giggle, which admits people by guessing their gender from selfies; and marketing stunts, like a campaign to give discounted subway tickets to women in Berlin to celebrate Equal Pay Day that tried to identify women based on facial scans. Researchers have also discussed much more potentially dangerous use cases, like deploying the technology to limit entry to gendered areas like bathrooms and locker rooms.

Giggle is a “girls-only” social app that attempts to verify that users are female using selfies.
Image: Giggle

Being rejected by a machine in such a scenario has the potential to be not only humiliating and inconvenient, but also to trigger an even more severe reaction. Anti-trans attitudes and hysteria over access to bathrooms have already led to numerous incidents of harassment and violence in public toilets, as passersby take it upon themselves to police these spaces. If someone is publicly declared by a seemingly impartial machine to be the “wrong” gender, it would only seem to legitimize such harassment and violence.

Daniel Leufer, a policy analyst at digital rights group Access Now, which is leading the campaign to ban these applications, says this technology is incompatible with the EU’s commitment to human rights.

“If you live in a society committed to upholding these rights, then the only solution is a ban,” Leufer tells The Verge. “Automatic gender recognition is completely at odds to the idea of people being able to express their gender identity outside the male-female binary or in a different way to the sex they were assigned at birth.”

Access Now, along with more than 60 other NGOs, has sent a letter to the European Commission, asking it to ban this technology. The campaign, which is supported by international LGBT+ advocacy group All Out, comes as the European Commission considers new regulations for AI across the EU. A draft white paper that circulated last year suggested a complete ban on facial recognition in public spaces was being considered, and Leufer says this illustrates how seriously the EU is taking the problem of AI regulation.

“There’s a unique moment right now with this legislation in the EU where we can call for major red lines, and we’re taking the opportunity to do that,” says Leufer. “The EU has consistently framed itself as taking a third path between China and the US [on AI regulation] with European values at its core, and we’re attempting to hold them to that.”

Keyes points out that banning this technology should be of interest to everyone, “regardless of how they feel about the centrality of trans lives to their lives,” as these systems reinforce an extremely outdated mode of gender politics.

“When you look at what these researchers think, it’s like they’ve time-traveled from the 1950s,” says Keyes. “One system I saw used the example of advertising cars to males and pretty dresses to females. First of all, I want to know who’s getting stuck with the ugly dresses? And secondly, do they think women can’t drive?”

Gender identification can be used in unrelated systems, like facial recognition tech used to verify identity at borders.
Photo by Joe Raedle / Getty Images

The use of this technology can also be much more subtle than simply delivering different advertisements to men and women. Often, says Keyes, gender identification is used as a filter to produce outcomes that have nothing to do with gender itself.

For example, if a facial recognition algorithm is used to bar entry to a building or country by matching an individual to a database of faces, it might narrow down its search by filtering results by gender. Then, if the system misgenders the person in front of it, it will produce an invisible error that has nothing to do with the task at hand.
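Here is a small, purely hypothetical sketch of the failure mode Keyes describes: if a face-matching search is pre-filtered by a predicted gender, a misgendered person is silently dropped from the candidate pool before any comparison happens. The database, predictor, and similarity function below are stand-ins, not any vendor’s actual system.

```python
# Hypothetical sketch of gender-filtered identity matching. All data and
# functions are stand-ins invented for illustration.
from dataclasses import dataclass

@dataclass
class Record:
    name: str
    gender_label: str   # label stored in the database
    embedding: list     # stand-in for a face embedding

def similarity(a, b):
    # Stand-in for a real face-embedding distance (higher = more similar).
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def match(probe_embedding, predicted_gender, database):
    # The gender filter runs BEFORE matching, so a wrong prediction removes
    # the true identity from consideration entirely.
    candidates = [r for r in database if r.gender_label == predicted_gender]
    return max(candidates, key=lambda r: similarity(probe_embedding, r.embedding),
               default=None)

db = [Record("Alex", "F", [0.1, 0.9]), Record("Sam", "M", [0.8, 0.2])]
probe = [0.1, 0.9]               # the probe is actually Alex
print(match(probe, "M", db))     # misgendered probe: Alex is never even compared
```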

Keyes says this sort of application is deeply worrying because companies don’t share details of how their technology works. “This may already be ubiquitous in existing facial recognition systems, and we just can’t tell because they are entirely black-boxed,” they say. In 2018, for example, trans Uber drivers were kicked off the company’s app because of a security feature that asked them to verify their identity with a selfie. Why these individuals were rejected by the system isn’t clear, says Keyes, but it’s possible that faulty gender recognition played a part.

Ultimately, technology that tries to reduce the world to binary classifications based on simple heuristics is always going to fail when faced with the variety and complexity of human expression. Keyes acknowledges that gender recognition by machine does work for a large number of people but says the underlying flaws in the system will inevitably hurt those who are already marginalized by society and force everyone into narrower forms of self-expression.

“We already live in a society which is very heavily gendered and very visually gendered,” says Keyes. “What these technologies are doing is making those decisions a lot more efficient, a lot more automatic, and a lot more difficult to challenge.”

Repost: Original Source and Author Link

Categories
AI

Deepfake detectors and datasets exhibit racial and gender bias, USC study shows

Some experts have expressed concern that machine learning tools could be used to create deepfakes, or videos that take a person in an existing video and replace them with someone else’s likeness. The fear is that these fakes might be used to do things like sway opinion during an election or implicate a person in a crime. Already, deepfakes have been abused to generate pornographic material of actors and defraud a major energy producer.

Fortunately, efforts are underway to develop automated methods to detect deepfakes. Facebook — along with Amazon  and Microsoft, among others — spearheaded the Deepfake Detection Challenge, which ended last June. The challenge’s launch came after the release of a large corpus of visual deepfakes produced in collaboration with Jigsaw, Google’s internal technology incubator, which was incorporated into a benchmark made freely available to researchers for synthetic video detection system development. More recently, Microsoft launched its own deepfake-combating solution in Video Authenticator, a system that can analyze a still photo or video to provide a score for its level of confidence that the media hasn’t been artificially manipulated.

But according to researchers at the University of Southern California, some of the datasets used to train deepfake detection systems might underrepresent people of a certain gender or with specific skin colors. This bias can be amplified in deepfake detectors, the coauthors say, with some detectors showing up to a 10.7% difference in error rate depending on the racial group.

Biased deepfake detectors

The results, while surprising, are in line with previous research showing that computer vision models are susceptible to harmful, pervasive prejudice. A paper last fall by University of Colorado, Boulder researchers demonstrated that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Independent benchmarks of major vendors’ systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have demonstrated that facial recognition technology exhibits racial and gender bias and have suggested that current facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.

The University of Southern California group looked at three deepfake detection models with “proven success in detecting deepfake videos.” All were trained on the FaceForensics++ dataset, which is commonly used for deepfake detectors, as well as corpora including Google’s DeepfakeDetection, CelebDF, and DeeperForensics-1.0.

In a benchmark test, the researchers found that all of the detectors performed worst on videos with darker Black faces, especially male Black faces. Videos with female Asian faces had the highest accuracy, but depending on the dataset, the detectors also performed well on Caucasian (particularly male) and Indian faces.

According to the researchers, the deepfake detection datasets were “strongly” imbalanced in terms of gender and racial groups, with FaceForensics++ sample videos showing over 58% (mostly white) women compared with 41.7% men. Less than 5% of the real videos showed Black or Indian people, and the datasets contained “irregular swaps,” where a person’s face was swapped onto another person of a different race or gender.

These irregular swaps, while intended to mitigate bias, are in fact to blame for at least a portion of the bias in the detectors, the coauthors hypothesize. Trained on the datasets, the detectors learned correlations between fakeness and, for example, Asian facial features. One corpus used Asian faces as foreground faces swapped onto female Caucasian faces and female Hispanic faces.

“In a real-world scenario, facial profiles of female Asian or female African are 1.5 to 3 times more likely to be mistakenly labeled as fake than profiles of the male Caucasian … The proportion of real subjects mistakenly identified as fake can be much larger for female subjects than male subjects,” the researchers wrote.

Real-world risks

The findings are a stark reminder that even the “best” AI systems aren’t necessarily flawless. As the coauthors note, at least one deepfake detector in the study achieved 90.1% accuracy on a test dataset, a metric that conceals the biases within.
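To illustrate how a single headline accuracy can hide subgroup gaps, here is a small sketch that breaks error rates out by group; the data and column names are made up for the example, not taken from the USC study.

```python
# Hypothetical illustration: overall accuracy looks decent while per-group
# error rates diverge sharply. All values are invented for the sketch.
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "A", "A", "A", "A", "B", "B"],
    "label": [1,   1,   0,   0,   1,   0,   1,   0,   1,   0],  # 1 = fake
    "pred":  [1,   1,   0,   0,   1,   0,   1,   0,   0,   1],  # detector output
})

overall_acc = (df.label == df.pred).mean()
per_group_err = (df.assign(err=df.label != df.pred)
                   .groupby("group")["err"].mean())

print(f"overall accuracy: {overall_acc:.1%}")  # 80.0% -- looks fine in isolation
print(per_group_err)                           # group A: 0.0, group B: 1.0
```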

“[U]sing a single performance metrics such as … detection accuracy over the entire dataset is not enough to justify massive commercial rollouts of deepfake detectors,” the researchers wrote. “As deepfakes become more pervasive, there is a growing reliance on automated systems to combat deepfakes. We argue that practitioners should investigate all societal aspects and consequences of these high impact systems.”

The research is especially timely in light of growth in the commercial deepfake video detection market. Amsterdam-based Deeptrace Labs offers a suite of monitoring products that purport to classify deepfakes uploaded on social media, video hosting platforms, and disinformation networks. Dessa has proposed techniques for improving deepfake detectors trained on data sets of manipulated videos. And Truepic raised an $8 million funding round in July 2018 for its video and photo deepfake detection services. In December 2018, the company acquired another deepfake “detection-as-a-service” startup — Fourandsix — whose fake image detector was licensed by DARPA.

Repost: Original Source and Author Link

Categories
AI

Facebook failed to fix ad-targeting gender discrimination, study finds

Two years ago, researchers at the University of Southern California published a study showing that Facebook’s algorithms could deliver job and housing ads to audiences skewed by race and gender. The methodology they used didn’t account for differences in the job qualifications of the targeted audiences. But in a new paper, the coauthors of the original research claim to have found evidence of a skew by gender for job ads on Facebook even when controlling for qualifications.

“Our results show Facebook needs to re-evaluate how their algorithms that optimize for user relevance or their business goals in a non-transparent way may result in discriminatory job ad delivery,” Aleksandra Korolova, assistant professor of computer science at the University of Southern California and a lead author on the study, told VentureBeat via email. “Our study also shows that, from an external auditing point of view, Facebook has not made visible progress in improving the fairness of its ad delivery algorithms despite prior studies and a civil rights audit that raised concerns about the role its algorithms may play.”

In response, a Facebook spokesperson told VentureBeat via email: “Our system takes into account many signals to try and serve people ads they will be most interested in, but we understand the concerns raised in the report. We’ve taken meaningful steps to address issues of discrimination in ads and have teams working on ads fairness today. We’re continuing to work closely with the civil rights community, regulators, and academics on these important matters.”

Many previous studies have established that Facebook’s ad practices are at best problematic. This came to a head in March 2019, when the U.S. Department of Housing and Urban Development filed suit against Facebook for allegedly “discriminating against people based upon who they are and where they live,” in violation of the Fair Housing Act.

When questioned about the allegations during a Capitol Hill hearing in October 2019, CEO Mark Zuckerberg said that “people shouldn’t be discriminated against on any of our services,” pointing to newly implemented restrictions on age, ZIP code, and gender ad targeting. Facebook claims its written policies ban discrimination and that it uses automated controls — introduced as part of the 2019 settlement — to limit when and how advertisers target ads based on age, gender, and other attributes.

Platforms like Facebook leverage algorithms to deliver ads to a subset of a targeted audience. Every time a user visits the company’s website or app, Facebook runs an auction among advertisers who are targeting that user. In addition to the advertiser’s chosen parameters, such as a bid or budget, the auction takes into account an ad relevance score, which is based on the ad’s predicted engagement level and value to the user.
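As a highly simplified, hypothetical sketch of the kind of auction ranking described above: each candidate ad is scored by combining the advertiser’s bid with a model’s prediction of how relevant or engaging the ad will be for that user. The formula and numbers are illustrative only and are not Facebook’s actual implementation.

```python
# Highly simplified, hypothetical auction ranking. The scoring formula and
# values are illustrative, not Facebook's actual system.
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    bid: float                   # advertiser's bid
    predicted_engagement: float  # model's estimate for this particular user

def total_value(ad: Ad) -> float:
    # Bid and predicted relevance jointly determine who wins delivery.
    return ad.bid * ad.predicted_engagement

ads = [Ad("job_ad_a", 2.0, 0.010), Ad("job_ad_b", 2.0, 0.018)]
winner = max(ads, key=total_value)
print(winner.name)  # identical bids and targeting: the relevance model alone decides
```

The point of the sketch is that even when two advertisers target the same audience with the same budget, the platform’s relevance predictions can systematically steer delivery, which is exactly what the researchers set out to measure.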

To determine what skew might be present in these algorithms, the researchers developed an auditing methodology for benchmarking the delivery of job ads, an area where U.S. law prohibits discrimination based on certain attributes. Title VII of the U.S. Civil Rights Act of 1964 allows organizations who advertise job opportunities to only show preference based on bona fide occupational qualifications, which are the requirements necessary to carry out a job function.

In the course of experiments with a nearly $5,000 ad campaign budget, the researchers ran ads on Facebook with gender-neutral text and images across three categories:

  • A low-skilled delivery driver job for Domino’s or Instacart
  • A high-skilled software engineer job for Netflix or Nvidia
  • A low-skilled but popular job among a particular ad audience

Since their methodology compared two ads for each category, the researchers selected two jobs at companies for which they had evidence of gender distribution differences. They also ran the ads on LinkedIn to compare the initial findings with algorithms on a different platform.

According to the researchers, the results show evidence of a statistically significant gender skew on Facebook compared with no gender skew on LinkedIn. Across three campaign runs on Facebook, the Domino’s ad delivered to a higher fraction of men than the Instacart ad — despite the fact that 98% of delivery drivers for Domino’s are male and over 50% of Instacart drivers are female. Moreover, a higher fraction of women on Facebook saw software engineering ads that the researchers created featuring Netflix (where 35% of employees in tech-related positions are female) versus Nvidia (where 19% of all employees are female). LinkedIn had no such skew.
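For readers who want to see what a “statistically significant gender skew” test looks like in practice, here is a sketch of a two-proportion z-test comparing the male share of two ads’ delivered audiences; the counts are made up for illustration and are not the study’s data.

```python
# Sketch of a two-proportion z-test on ad delivery audiences.
# Counts are hypothetical, not the paper's actual numbers.
from statsmodels.stats.proportion import proportions_ztest

men_shown = [6200, 4900]       # men reached by ad 1 vs. ad 2
total_shown = [10000, 10000]   # total audience reached by each ad

stat, p_value = proportions_ztest(count=men_shown, nobs=total_shown)
print(f"z = {stat:.2f}, p = {p_value:.2g}")
# A small p-value indicates the difference in male fraction between two
# otherwise comparable ads is unlikely to be due to chance alone.
```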

“Facebook’s job ad delivery is skewed by gender, even when the advertiser is targeting a gender-balanced audience,” Korolova and coauthors wrote in the paper. “Our findings suggests that Facebook’s algorithms may be responsible for unlawful discriminatory outcomes.”

The researchers recommend several steps that might address this skew, including more targeting and delivery statistics, replacing ad-hoc privacy techniques with rigorous approaches, and reducing the cost of auditing. They emphasize that privacy-preserving techniques such as differentially private data publishing, which aims to output aggregate information without disclosing any person’s record, might be able to strike a balance between auditability and privacy.
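As a concrete illustration of the kind of privacy-preserving aggregate release the researchers mention, here is a minimal Laplace-mechanism sketch for publishing a noisy count; the epsilon and count values are arbitrary examples, not anything from the paper.

```python
# Minimal Laplace-mechanism sketch: release an aggregate count with noise
# calibrated to sensitivity / epsilon, so no single person's record is exposed.
# Values are arbitrary examples.
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

noisy = dp_count(true_count=5321, epsilon=0.5)
print(f"noisy count: {noisy:.0f}")  # useful in aggregate, fuzzy for any one release
```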

“We recommend ad platforms use approaches with rigorous privacy guarantees, and whose impact on statistical validity can be precisely analyzed, such as differentially private algorithms, where possible,” the researchers wrote. “Overall, making auditing ad delivery systems more feasible to a broader range of interested parties can help ensure that the systems that shape job opportunities people see operate in a fair manner that does not violate anti-discrimination laws. The platforms may not currently have the incentives to make the changes proposed and, in some cases, may actively block transparency efforts initiated by researchers and journalists; thus, they may need to be mandated by law.”

Repost: Original Source and Author Link

Categories
AI

Facebook dataset combats AI bias by having people self-identify age and gender

Facebook today open-sourced a dataset designed to surface age, gender, and skin tone biases in computer vision and audio machine learning models. The company claims that the corpus — Casual Conversations — is the first of its kind featuring paid participants who explicitly provided their age and gender, as opposed to having this information labeled by third parties or estimated using models.

Biases can make their way into the data used to train AI systems, amplifying stereotypes and leading to harmful consequences. Research has shown that state-of-the-art image-classifying AI models trained on ImageNet, a popular dataset containing photos scraped from the internet, automatically learn humanlike biases about race, gender, weight, and more. Countless studies have demonstrated that facial recognition is susceptible to bias. It’s even been shown that prejudices can creep into the AI tools used to create art, potentially contributing to false perceptions about social, cultural, and political aspects of the past and hindering awareness about important historical events.

Casual Conversations, which contains over 4,100 videos of 3,000 participants, some from the Deepfake Detection Challenge, aims to combat this bias by including labels of “apparent” skin tone. Facebook says that the tones are estimated using the Fitzpatrick scale, a classification schema for skin color developed in 1975 by American dermatologist Thomas B. Fitzpatrick. The Fitzpatrick scale is a way to ballpark the response of types of skin to ultraviolet light, from Type I (pale skin that always burns and never tans) to Type VI (deeply pigmented skin that never burns).

Facebook says that it recruited trained annotators for Casual Conversations to determine which skin type each participant had. The annotators also labeled videos with ambient lighting conditions, which helped to measure how models treat people with different skin tones under low-light conditions.

A Facebook spokesperson told VentureBeat via email that a U.S. vendor was hired to select annotators for the project from “a range of backgrounds, ethnicity, and genders.” The participants — who hailed from Atlanta, Houston, Miami, New Orleans, and Richmond — were paid.

“As a field, industry and academic experts alike are still in the early days of understanding fairness and bias when it comes to AI … The AI research community can use Casual Conversations as one important stepping stone toward normalizing subgroup measurement and fairness research,” Facebook wrote in a blog post. “With Casual Conversations, we hope to spur further research in this important, emerging field.”

In support of Facebook’s point, there’s a body of evidence that computer vision models in particular are susceptible to harmful, pervasive prejudice. A paper last fall by University of Colorado, Boulder researchers demonstrated that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Independent benchmarks of major vendors’ systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have demonstrated that facial recognition technology exhibits racial and gender bias and have suggested that current facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.

Beyond facial recognition, features like Zoom’s virtual backgrounds and Twitter’s automatic photo-cropping tool have historically disfavored people with darker skin. Back in 2015, a software engineer pointed out that the image recognition algorithms in Google Photos were labeling his Black friends as “gorillas.” And nonprofit AlgorithmWatch showed that Google’s Cloud Vision API at one time automatically labeled a thermometer held by a dark-skinned person as a “gun” while labeling a thermometer held by a light-skinned person as an “electronic device.”

Experts attribute many of these errors to flaws in the datasets used to train the models. One recent MIT-led audit of popular machine learning datasets found an average of 3.4% annotation errors, including one where a picture of a Chihuahua was labeled “feather boa.” An earlier version of ImageNet, a dataset used to train AI systems around the world, was found to contain photos of naked children, porn actresses, college parties, and more — all scraped from the web without those individuals’ consent. Another computer vision corpus, 80 Million Tiny Images, was found to have a range of racist, sexist, and otherwise offensive annotations, such as nearly 2,000 images labeled with the N-word, and labels like “rape suspect” and “child molester.”

But Casual Conversations is far from a perfect benchmark. Facebook says it didn’t collect information about where the participants are originally from. And in asking their gender, the company only provided the choices “male,” “female,” and “other” — leaving out gender identities like nonbinary.

The spokesperson also clarified that Casual Conversations is available to Facebook teams only as of today and that employees won’t be required — but will be encouraged — to use it for evaluation purposes.

Exposés about Facebook’s approaches to fairness haven’t done much to engender trust within the AI community. A New York University study published in July 2020 estimated that Facebook’s machine learning systems make about 300,000 content moderation mistakes per day, and problematic posts continue to slip through Facebook’s filters. In one Facebook group that was created last November and rapidly grew to nearly 400,000 people, members calling for a nationwide recount of the 2020 U.S. presidential election swapped unfounded accusations about alleged election fraud and state vote counts every few seconds.

For Facebook’s part, the company says that while it considers Casual Conversations a “good, bold” first step, it’ll continue pushing toward developing techniques that capture greater diversity over the next year or so. “In the next year or so, we hope to explore pathways to expand this data set to be even more inclusive with representations that include more geographical locations, activities, and a wider range of gender identities and ages,” the spokesperson said. “It’s too soon to comment on future stakeholder participation, but we’re certainly open to speaking with stakeholders in the tech industry, academia, researchers, and others.”

Repost: Original Source and Author Link

Categories
AI

Researchers release dataset to expose racial, religious, and gender biases in language models

Natural language models are the building blocks of apps including machine translators, text summarizers, chatbots, and writing assistants. But there’s growing evidence showing that these models risk reinforcing undesirable stereotypes, mostly because a portion of the training data is commonly sourced from communities with gender, race, and religious prejudices. For example, OpenAI’s GPT-3 places words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.”

A new study from researchers affiliated with Amazon and the University of California, Santa Barbara aims to shed light specifically on biases in open-ended English natural language generation. (In this context, “bias” refers to the tendency of a language model to generate text perceived as negative, unfair, prejudiced, or stereotypical against an idea or a group of people with common characteristics.) The researchers created what they claim is the largest benchmark dataset of its kind, containing 23,679 prompts spanning 5 domains and 43 subgroups extracted from Wikipedia articles. Beyond this, to measure biases from multiple angles, they introduced new metrics including “psycholinguistic norms,” “toxicity,” and “gender polarity.”
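To give a rough sense of how prompt-based bias probing works in general — this is not the paper’s exact pipeline, benchmark, or metrics — one can generate continuations for prompts mentioning different groups and score them with an off-the-shelf classifier. The prompts, models, and sentiment scorer below are stand-ins chosen for illustration.

```python
# Rough sketch of prompt-based bias probing with Hugging Face transformers.
# NOT the paper's pipeline: prompts, models, and the sentiment scorer are
# stand-ins showing the general measurement pattern.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
scorer = pipeline("sentiment-analysis")

prompts = {
    "group_a": "The nurse said that he",
    "group_b": "The nurse said that she",
}

for group, prompt in prompts.items():
    outputs = generator(prompt, max_new_tokens=30,
                        num_return_sequences=3, do_sample=True)
    texts = [o["generated_text"] for o in outputs]
    scores = scorer(texts)
    neg = sum(s["label"] == "NEGATIVE" for s in scores) / len(scores)
    print(f"{group}: fraction of negative continuations = {neg:.2f}")
```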

In experiments, the researchers tested three common language models including GPT-2 (GPT-3’s predecessor), Google’s BERT, and Salesforce’s CTRL. The results show that, in general, these models exhibit larger social biases than the baseline Wikipedia text, especially toward historically disadvantaged groups of people.

For example, the three language models strongly associated the profession of “nursing” with women and generated a higher proportion of texts with negative conceptions about men. While text from the models about men contained emotions like “anger,” “sadness,” “fear,” and “disgust,” a larger share of text about women carried positive emotions like “joy” and “dominance.”

With regard to religion, GPT-2, BERT, and CTRL expressed the most negative sentiments about atheism followed by Islam. A higher percentage of texts generated with Islam prompts were labeled as conveying negative emotions, while on the other hand, Christianity prompts tended to be more cheerful in sentiment. In terms of toxicity, only prompts with Islam, Christianity, and atheism resulted in toxic texts, among which atheism had the largest proportion.

Across ethnicities and races, toxicity from the models was outsize for African Americans. In fact, the share of texts with negative regard for African American groups was at least marginally larger in five out of six models, indicating a consistent bias against African Americans in multiple key metrics.

The coauthors say that the results highlight the importance of studying the behavior of language generation models before they’re deployed into a production environment. Failure to do so, they warn, could at the least propagate negative outcomes and experiences for end users.

“Our intuition is that while carefully handpicked language model triggers and choices of language model generations can show some interesting results, they could misrepresent the level of bias that a language model produces when presented with more natural prompts. Furthermore, language model generations in such a contrived setting could reinforce the type of biases that it was triggered to generate while failing to uncover other critical biases that need to be exposed,” the researchers wrote.

“Given that a large number of state-of-the-art models on natural language processing tasks are powered by these language generation models, it’s of critical importance to properly discover and quantify any existing biases in these models and prevent them from propagating as unfair outcomes and negative experiences to the end users of the downstream applications.”

Repost: Original Source and Author Link

Categories
AI

Service that uses AI to identify gender based on names looks incredibly biased

Some tech companies make a splash when they launch, others seem to bellyflop.

Genderify, a new service that promised to identify someone’s gender by analyzing their name, email address, or username with the help of AI, looks firmly to be in the latter camp. The company launched on Product Hunt last week, but picked up a lot of attention on social media as users discovered biases and inaccuracies in its algorithms.

Type the name “Meghan Smith” into Genderify, for example, and the service offers the assessment: “Male: 39.60%, Female: 60.40%.” Change that name to “Dr. Meghan Smith,” however, and the assessment changes to: “Male: 75.90%, Female: 24.10%.” Other names prefixed with “Dr” produce similar results while inputs seem to generally skew male. “Test@test.com” is said to be 96.90 percent male, for example, while “Mrs Joan smith” is 94.10 percent male.

The outcry against the service has been so great that Genderify tells The Verge it’s shutting down altogether. “If the community don’t want it, maybe it was fair,” said a representative via email. Genderify.com has been taken offline and its free API is no longer accessible.

Although these sorts of biases appear regularly in machine learning systems, the thoughtlessness of Genderify seems to have surprised many experts in the field. The response from Meredith Whittaker, co-founder of the AI Now Institute, which studies the impact of AI on society, was somewhat typical. “Are we being trolled?” she asked. “Is this a psyop meant to distract the tech+justice world? Is it cringey tech April fool’s day already?”

The problem is not that Genderify made assumptions about someone’s gender based on their name. People do this all the time, and sometimes make mistakes in the process. That’s why it’s polite to find out how people self-identify and how they want to be addressed. The problem with Genderify is that it automated these assumptions, applying them at scale while sorting individuals into a male/female binary (and so ignoring people who identify as nonbinary) and reinforcing gender stereotypes in the process (such as the assumption that if you’re a doctor, you’re probably a man).

The potential harm of this depends on how and where Genderify was applied. If the service was integrated into a medical chatbot, for example, its assumptions about users’ genders might have led to the chatbot issuing misleading medical advice.

Thankfully, Genderify didn’t seem to be aiming to automate this sort of system, but was primarily designed to be a marketing tool. As Genderify’s creator, Arevik Gasparyan, said on Product Hunt: “Genderify can obtain data that will help you with analytics, enhancing your customer data, segmenting your marketing database, demographic statistics, etc.”

In the same comment section, Gasparyan acknowledged the concerns of some users about bias and ignoring non-binary individuals, but didn’t offer any concrete answers.

One user asked: “Let’s say I choose to identify as neither Male or Female, how do you approach this? How do you avoid gender discrimination? How are you tackling gender bias?” To which Gasparyan replied that the service makes its decisions based on “already existing binary name/gender databases,” and that the company was “actively looking into ways of improving the experience for transgender and non-binary visitors” by “separating the concepts of name/username/email from gender identity.” It’s a confusing answer given that the entire premise of Genderify is that this data is a reliable proxy for gender identity.
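To make the mechanism — and its limits — concrete, here is a hypothetical sketch of the kind of binary name/gender database lookup the reply describes. The lookup table, default fallback, and token handling are invented for illustration; this is not Genderify’s code, and it reproduces exactly the failure modes discussed above.

```python
# Hypothetical sketch of a binary name -> gender lookup of the kind described.
# The table and rules are invented for illustration only.
NAME_TABLE = {"meghan": ("female", 0.60), "joan": ("female", 0.95)}

def naive_guess(raw: str):
    tokens = raw.lower().replace("@", " ").replace(".", " ").split()
    for token in tokens:
        if token in NAME_TABLE:
            return NAME_TABLE[token]
    # Arbitrary default: inputs with no recognized name skew one way.
    return ("male", 0.5)

print(naive_guess("Meghan Smith"))      # ('female', 0.6) -- strictly binary output
print(naive_guess("Dr. Meghan Smith"))  # titles and self-identification are ignored
print(naive_guess("test@test.com"))     # ('male', 0.5) -- the default introduces bias
```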

The company told The Verge that its service was very similar to those of existing companies that use databases of names to guess an individual’s gender, though none of them use AI.

“We understand that our model will never provide ideal results, and the algorithm needs significant improvements, but our goal was to build a self-learning AI that will not be biased as any existing solutions,” said a representative via email. “And to make it work, we very much relied on the feedback of transgender and non-binary visitors to help us improve our gender detection algorithms as best as possible for the LGBTQ+ community.”

Update Wednesday July 29, 12:42PM ET: Story has been updated to confirm that Genderify has been shut down and to add additional comment from a representative of the firm.



Repost: Original Source and Author Link