
LinkedIn says it reduced bias in its connection suggestion algorithm



In a blog post today, LinkedIn revealed that it recently completed internal audits aimed at improving People You May Know (PYMK), an AI-powered feature on the platform that suggests other members for users to connect with. LinkedIn claims the changes “level the playing field” for those who have fewer connections and spend less time building their online networks, making PYMK ostensibly useful for more people.

PYMK was the first AI-powered recommender feature at LinkedIn. Appearing on the My Network page, it provides connection suggestions based on commonalities between users and other LinkedIn members, as well as contacts users have imported from email and smartphone address books. Specifically, PYMK draws on shared connections and profile information and experiences, as well as things like employment at a company or in an industry and educational background.

PYMK worked well enough for most users, according to LinkedIn, but it gave some members a “very large” number of connection requests, creating a feedback loop that decreased the likelihood other, less-well-connected members would be ranked highly in PYMK suggestions. Frequently active members on LinkedIn tended to have greater representation in the data used to train the algorithms powering PYMK, leading it to become increasingly biased toward optimizing for frequent users at the expense of infrequent users.

“A common problem when optimizing an AI model for connections is that it often creates a strong ‘rich getting richer’ effect, where the most active members on the platform build a great network, but less active members lose out,” Albert Cui, senior product manager of AI and machine learning at LinkedIn, told VentureBeat via email. “It’s important for us to make PYMK as equitable as possible because we have seen that members’ networks, and their strength, can have a direct impact on professional opportunities. In order to positively impact members’ professional networks, we must acknowledge and remove any barriers to equity.”

Biased algorithms

This isn’t the first time LinkedIn has discovered bias in the recommendation algorithms powering its platform’s features. Years ago, the company found that the AI it used to match job candidates with opportunities was ranking candidates partly on the basis of how likely they were to apply for a position or respond to a recruiter. The system wound up referring more men than women for open roles simply because men are often more aggressive at seeking out new opportunities. To counter this, LinkedIn built an adversarial algorithm designed to ensure that the recommendation system includes a representative distribution of users across gender before referring the matches curated by the original system.

In 2016, a report in the Seattle Times suggested LinkedIn’s search algorithm might be giving biased results along gender lines, too. According to the publication, searches for the 100 most common male names in the U.S. triggered no prompts asking if users meant predominantly female names, but similar searches for popular female first names paired with placeholder last names brought up LinkedIn’s suggestion to change “Andrea Jones” to “Andrew Jones,” “Danielle” to “Daniel,” “Michaela” to “Michael,” and “Alexa” to “Alex,” for example. LinkedIn denied at the time that its search algorithm was biased but later rolled out an update so that users who search for a full name are no longer prompted with suggestions for a different name.

Recent history has shown that social media recommendation algorithms are particularly prone to bias, intentional or not. A May 2020 Wall Street Journal article brought to light an internal Facebook study that found the majority of people who join extremist groups do so because of the company’s recommendation algorithms. In April 2019, Bloomberg reported that videos made by far-right creators were among YouTube’s most-watched content. And in a recent report by Media Matters for America, the media monitoring group presents evidence that TikTok’s recommendation algorithm is pushing users toward accounts with far-right views supposedly prohibited on the platform.

Correcting for imbalance

To address the problems with PYMK, LinkedIn researchers used a post-processing technique that reranked PYMK candidates, decrementing the scores of recipients who already had many unanswered invitations. These were mostly “ubiquitously popular” members or celebrities, who often received more invites than they could respond to due to their prominence or large networks. LinkedIn expected the change to decrease the number of invitations sent to candidates suggested by PYMK and therefore overall activity. While connection requests sent by LinkedIn members did decrease 1%, sessions from the people receiving invitations increased by 1%, because members with fewer invitations were now receiving more and invitations were less likely to be lost in influencers’ inboxes.
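The post-processing step described above can be sketched as a simple rerank that penalizes candidates who already have many unanswered invitations. This is an illustrative toy, not LinkedIn's implementation; the function name, the linear penalty, and the data shapes are all assumptions.

```python
def rerank_pymk(candidates, pending_invites, penalty=0.1):
    """Demote overloaded recipients in a PYMK-style candidate list.

    candidates: list of (member_id, relevance_score) pairs
    pending_invites: dict mapping member_id -> unanswered invitation count
    """
    adjusted = [
        (member_id, score - penalty * pending_invites.get(member_id, 0))
        for member_id, score in candidates
    ]
    # Highest adjusted score first; heavily overloaded members sink.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

# A celebrity with 40 unanswered invites drops below less-connected members.
ranked = rerank_pymk(
    [("celebrity", 0.9), ("peer", 0.7), ("new_member", 0.6)],
    {"celebrity": 40, "peer": 2},
)
print([member for member, _ in ranked])  # ['new_member', 'peer', 'celebrity']
```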

As a part of its ongoing Fairness Toolkit work, LinkedIn also developed and tested methods to rerank members according to theories of equality of opportunity and equalized odds. In PYMK, qualified infrequent members (IMs) and frequent members (FMs) are now given equal representation in recommendations, resulting in more invites sent to infrequent members (a 5.44% increase) and more connections made with them (a 4.8% increase) without majorly impacting frequent members.
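LinkedIn has not published this reranker, but equal representation between two groups can be illustrated with a simple interleaving of qualified candidates from each group. Everything here is a hypothetical sketch of the idea, not the production system.

```python
from itertools import zip_longest

def interleave_fair(frequent, infrequent):
    """Alternate qualified candidates from each group so that frequent and
    infrequent members get equal representation at the top of the list."""
    merged = []
    for fm, im in zip_longest(frequent, infrequent):
        if fm is not None:
            merged.append(fm)
        if im is not None:
            merged.append(im)
    return merged

# Without interleaving, the three FM candidates would crowd out both IMs.
print(interleave_fair(["fm1", "fm2", "fm3"], ["im1", "im2"]))
# ['fm1', 'im1', 'fm2', 'im2', 'fm3']
```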

“One thing that interested us about this work was that some of the results were counterintuitive to what we expected. We anticipated a decrease in some engagement metrics for PYMK as a result of these changes. However, we actually saw net engagement increases after making these adjustments,” Cui continued. “Interestingly, this was similar to what we saw a few years ago when we changed our Feed ranking system to also optimize for creators, and not just for viewers. In both of these instances, we found that prioritizing metrics other than those typically associated with ‘virality’ actually led to longer-term engagement wins and a better overall experience.”

All told, LinkedIn says it reduced the number of overloaded recipients — i.e., members who received too many invitations in the past week — on the platform by 50%. The company also introduced other product changes, such as a Follow button to ensure members could still hear from popular accounts. “We’ve been encouraged by the positive results of the changes we’ve made to the PYMK algorithms so far and are looking forward to continuing to use [our internal tools] to measure fairness to groups along the lines of other attributes beyond frequency of platform visits, such as age, race, and gender,” Cui said.

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021
  • networking features, and more

Become a member



A programmer reduced GTA Online load times by 70 percent

Rockstar is one of the biggest game developers on the planet, and GTA Online is among its most popular and profitable titles. Though the game has been available for seven years, it remains wildly popular. One of the things many gamers complain about, however, is that it loads very slowly no matter how powerful the hardware it’s played on.

Recently, a programmer going by t0st used some creative programming techniques to reduce load times by 70 percent. T0st was sitting through a six-minute load for GTA Online on a mid-range gaming PC and opened Task Manager, where he discovered something interesting: after the one-minute mark, the CPU usage of his computer spiked dramatically while storage and network usage were virtually nonexistent.

T0st realized that meant the long load times weren’t caused by Rockstar’s servers or by data being read from a drive. Something was running directly on the processor that needed a lot of processing power to complete while using only a single thread. T0st then turned to a series of programming and debugging tools, which uncovered two significant issues.

The first significant issue was that the game was reading a 10-megabyte text file listing all 63,000 purchasable items in the game, and it recounted every character in that file once for each of the 63,000 items, wasting a great deal of processor time. The other issue was that as each item was read, the game stored both the data associated with that item, such as its name, price, category, and stats, and a hash that uniquely identified it, checking everything already stored for duplicates along the way. That process happened 63,000 times.

Load time grows each time more items are added to the game. T0st estimated that the game was performing nearly 2 billion duplicate checks, eating up massive amounts of processor time. To alleviate the issues, the programmer wrote code that overrides some of the game’s functions, solving the item-reading issue.

He created a basic cache that calculates the length of the item list once and returns the stored value whenever game code asks for the length again, without redoing the calculation. That trick slashed the number of times the count needed to be performed from 63,000 to one. The custom code also skips the redundant check for duplicate items, chopping out the nearly 2 billion checks that didn’t need to occur and cutting load time from around six minutes to less than two minutes.
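T0st's actual patch was native code hooked into the game's functions, but the two fixes can be mimicked generically. The sketch below is a Python analogy under stated assumptions: Python's `len` is already O(1), so the naive version only imitates the repeated character-counting pattern, and newline-separated chunks stand in for the game's item entries.

```python
def parse_items_naive(text):
    """Naive pattern: recount the whole text for every item parsed, and
    linearly scan everything parsed so far to detect duplicates."""
    items = []
    for chunk in text.split("\n"):
        _ = sum(1 for _ in text)    # O(len(text)) recount per item, like strlen in a loop
        if chunk in items:          # O(len(items)) list scan per item
            continue
        items.append(chunk)
    return items

def parse_items_fixed(text):
    """Fixed pattern: compute the length once and cache it, and use a hash
    set for O(1) duplicate checks instead of a full-list scan."""
    cached_length = len(text)       # counted once, reused thereafter
    seen = set()
    items = []
    for chunk in text.split("\n"):
        if chunk in seen:           # O(1) hash lookup
            continue
        seen.add(chunk)
        items.append(chunk)
    return items

# Both produce the same result; only the asymptotic cost differs.
print(parse_items_fixed("ammo\narmor\nammo"))  # ['ammo', 'armor']
```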



Study shows that federated learning can lead to reduced carbon emissions

Carbon dioxide, methane, and nitrous oxide levels are at the highest they’ve been in the last 800,000 years. Together with other drivers, greenhouse gases likely catalyzed the global warming that’s been observed since the mid-20th century. Machine learning models, too, have contributed indirectly to this adverse environmental trend, because they require a substantial amount of computational resources and energy: models are routinely trained for thousands of hours on specialized hardware accelerators in datacenters estimated to use 200 terawatt-hours per year. (For comparison, the average U.S. home consumes about 10,000 kilowatt-hours per year; 200 terawatt-hours would power roughly 20 million such homes.)

This state of affairs motivated researchers at the University of Cambridge, the University of Oxford, University College London, Avignon Universite, and Samsung to investigate more energy-efficient approaches to training AI models. In a newly published paper, they explore whether federated learning, which involves training models across a number of machines, can lead to lowered carbon emissions compared with traditional learning. Their findings suggest that federated learning has a quantitatively greener impact despite being slower in some cases.

The effects of AI and machine learning model training on the environment are increasingly coming to light. Ex-Google AI ethicist Timnit Gebru recently coauthored a paper on large language models that discussed urgent risks, including carbon footprint. And in June 2020, researchers at the University of Massachusetts at Amherst released a report estimating that the power required to train and search a certain model entails emissions of roughly 626,000 pounds of carbon dioxide, equivalent to nearly 5 times the lifetime emissions of the average U.S. car.

In machine learning, federated learning entails training algorithms across different devices holding data samples without exchanging those samples. A centralized server might be used to orchestrate the steps of the algorithm and act as a reference clock, or the arrangement might be peer-to-peer. Regardless, local algorithms are trained on local data samples, and the weights (the learnable parameters of the algorithms) are exchanged between the algorithms at some frequency to generate a global model.
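The weight-exchange step described above is often implemented as a FedAvg-style weighted average. The toy aggregator below is an illustrative sketch, not code from the paper: it assumes each client's "model" is a list of layers, each layer a flat list of floats, and averages layers across clients in proportion to their local sample counts.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client weights into a global model, FedAvg-style.

    client_weights: one list of layers per client; each layer is a list of floats
    client_sizes: number of local training samples per client (used as weights)
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        [
            sum(w[layer][j] * n / total
                for w, n in zip(client_weights, client_sizes))
            for j in range(len(client_weights[0][layer]))
        ]
        for layer in range(n_layers)
    ]

# Two clients with one-layer "models"; the client with more data counts more.
global_model = federated_average(
    [[[0.0, 0.0]], [[4.0, 8.0]]],  # client A's weights, client B's weights
    client_sizes=[1, 3],           # B trained on 3x as many samples
)
print(global_model)  # [[3.0, 6.0]]
```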


Above: Carbon dioxide emissions expressed in grams (lower is better) for both centralized learning and federated learning when they reach the target accuracies, with different setups.

To measure the carbon footprint of a federated learning setup, the coauthors of the new paper trained two models — an image classification model and a speech recognition model — using a server with a single GPU and CPU and two chipsets, Nvidia Tegra X2 and Jetson Xavier NX. They recorded the power consumption of the server and chipsets during training, taking into account how energy usage might vary depending on the country where the chipsets and server are located.
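A back-of-the-envelope version of this accounting multiplies measured energy use by the local grid's carbon intensity. The helper below is a sketch of that arithmetic; the intensity figures in the example are illustrative round numbers, not values from the paper.

```python
def training_emissions_grams(avg_power_watts, hours, grams_co2_per_kwh):
    """Estimate CO2 emitted by a training run.

    energy (kWh) = average power draw (W) * duration (h) / 1000
    emissions (g) = energy (kWh) * grid carbon intensity (gCO2/kWh)
    """
    energy_kwh = avg_power_watts * hours / 1000.0
    return energy_kwh * grams_co2_per_kwh

# A 15 W embedded device training for 10 hours emits very different amounts
# depending on where it is plugged in: a low-carbon grid at ~50 g/kWh versus
# a coal-heavy grid at ~600 g/kWh (both figures illustrative).
print(training_emissions_grams(15, 10, 50))   # 7.5
print(training_emissions_grams(15, 10, 600))  # 90.0
```

The same formula applies per device, which is why a federated setup's total footprint depends on where each participating device sits.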

The researchers found that while there’s a difference between carbon dioxide emission factors among countries, federated learning is reliably “cleaner” than centralized training. For example, training on the open source image dataset CIFAR10 in France using federated learning saves from 1.8 grams to 4.4 grams of carbon dioxide compared with centralized training in China. For larger datasets such as ImageNet, any federated learning setup in France emits less than any centralized setup in China and the U.S. And with the speech dataset the researchers used, federated learning is more efficient than centralized training in any country.

Federated learning has an environmental advantage partly due to the cooling needs of datacenters, the researchers explain. According to a recent paper in the journal Science, while strides in datacenter efficiency have mostly kept pace with growing demand for data, the total amount of energy consumed by datacenters made up about 1% of global energy use over the past decade. That’s roughly equivalent to 18 million U.S. homes.

The researchers caution that federated learning isn’t a silver bullet, because a number of factors could make it less efficient than it otherwise might be. Highly distributed databases can prolong training times, translating to a higher level of carbon dioxide emissions. The workload, model architecture, and hardware efficiency also play a role. Even data transfer via Wi-Fi can contribute significantly to carbon emissions depending on the size of model, the size of the dataset, and the energy consumed by devices during training.

Still, the researchers assert that considering the carbon dioxide emissions rate while optimizing AI models could lead to a decrease in pollution while maintaining good performance. Toward this, they call on data scientists to design algorithms that minimize emissions and device manufacturers to increase transparency with respect to energy consumption.

“Federated learning … is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and civil society for privacy protection,” the researchers wrote. “By quantifying carbon emissions for federated [learning] and demonstrating that a proper design of the federated setup leads to a decrease of these emissions, we encourage the integration of the released carbon dioxide as a crucial metric to the federated learning deployment.”

