Sama aims to bring greater equality to crowd-labeling of datasets with new $70M

Sama, a company providing data to train machine learning systems, has raised $70 million in a series B round led by CDPQ with participation from First Ascent Ventures, Salesforce Ventures, Vistara Capital Partners, and existing investors. CEO Wendy Gonzalez says that the company will use the funding to grow its platform with new products that “enable teams to manage the complete AI lifecycle.”

Data scientists spend about 45% of their time on data preparation tasks including loading and cleaning data, according to Anaconda. A separate report from Alation found that 97% of data leaders have suffered the consequences of ignoring data, either missing out on new revenue opportunities, poorly forecasting performance, or making bad investments. Yet another study — this by MIT Technology Review Insights and commissioned by Databricks — reveals that machine learning’s business impact is limited largely by challenges in managing its end-to-end lifecycle.

Founded by Leila Janah, San Francisco, California-based Sama (formerly Samasource) developed its first relationships with partner delivery centers in 2008, focusing on data entry, sentiment analysis, and data transcription. In 2009, the company launched the initial version of its technology platform, SamaHub, and embarked on a slew of commercial projects, including providing images and annotations that Microsoft used to build out its Xbox Kinect.

“Janah believed that giving meaningful, living-wage work was the best way to permanently lift people out of poverty,” Gonzalez told VentureBeat via email. “To date, we’re the only AI training data provider with a responsible training and employment program that provides actionable career skills for underserved communities to bring us closer to a more equitable future of AI.”

Data platform

Today, Sama hosts a crowd-powered platform through which companies can obtain labeled data for training AI models, including videos, images, computer-generated shapes, radar, and natural language. Customers in industries such as transportation and navigation, retail and ecommerce, and robotics and manufacturing pay for datasets while “crowdworkers” supply annotations in exchange for payment from Sama.

Sama competes with a host of data labeling and annotation platforms in the market, including DefinedCrowd, CrowdFlower, Labelbox, and Superb AI, as well as incumbents like Amazon Mechanical Turk. But the company asserts that it delivers a superior product by tracking 160 million events per month to improve its platform and processes, like machine learning-assisted annotation tools for crowdworkers.


Above: Objects labeled with Sama’s backend tools.

Image Credit: Sama

“Our labelers have three-year average tenure and are subject-matter experts who work with our customers to identify edge cases and recommend annotation best practices,” Sama explains on its website. “Sampling provides feedback to quality managers to ensure teams are working efficiently and effectively, while ‘hold’ tasks and advanced scripting detect errors early in the pipeline.”

When a company contracts with Sama, Sama’s platform creates “micromodels” that are used to generate prelabeled data to assist labelers with annotation. Annotators validate the machine learning-generated labels while Sama works with the company to identify edge cases and recommend annotation best practices.
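The prelabel-then-validate workflow described above can be illustrated with a minimal Python sketch. It is not Sama's implementation: the `Item` fields, the `review` function, the confidence threshold, and the demo annotator are all hypothetical names and values chosen for illustration. The sketch assumes a model has already proposed a label with a confidence score, a human confirms or corrects it, and items where the model was unsure or the human disagreed are flagged as likely edge cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    item_id: str
    prelabel: str                      # label proposed by a (hypothetical) micromodel
    confidence: float                  # model confidence in the prelabel, 0.0 to 1.0
    final_label: Optional[str] = None  # label after human validation
    is_edge_case: bool = False


def review(items: List[Item],
           annotate: Callable[[Item], str],
           edge_threshold: float = 0.5) -> List[Item]:
    """Have a human confirm or correct each prelabel and flag likely edge cases."""
    for item in items:
        item.final_label = annotate(item)
        # Flag the item when the model was unsure or the human disagreed with it.
        item.is_edge_case = (
            item.confidence < edge_threshold
            or item.final_label != item.prelabel
        )
    return items


def demo_annotator(item: Item) -> str:
    """Stand-in for a human annotator: corrects one label, accepts the rest."""
    corrections = {"img-2": "cyclist"}
    return corrections.get(item.item_id, item.prelabel)


batch = [
    Item("img-1", "car", 0.97),
    Item("img-2", "pedestrian", 0.91),
    Item("img-3", "truck", 0.40),
]
reviewed = review(batch, demo_annotator)
edge_cases = [item.item_id for item in reviewed if item.is_edge_case]
print(edge_cases)
```

In this toy batch, `img-2` is flagged because the annotator overrode the prelabel and `img-3` because the model's confidence fell below the assumed threshold. A real pipeline would route such items to the kind of edge-case review the article describes.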

Post-annotation and deployment, Sama can provide ongoing feedback and monitor models in production. Beyond this, the platform can generate data on “frame-level” annotation and edge cases, producing reports designed to help get models to market faster.


Supervised learning, in which models require labeled data to train, is the most common form of machine learning used in the enterprise. In a recent O’Reilly report, 82% of respondents said that their organization opted to adopt supervised learning versus unsupervised learning (which doesn’t require labels) or semi-supervised learning (which requires only a small number of labels). And according to Gartner, supervised learning will remain the type of machine learning that organizations leverage most through 2022.

Labels can bear the hallmarks of inequality, however. For example, it is estimated that fewer than 2% of Mechanical Turk workers come from Global South countries, with the vast majority originating from the U.S. and India. ImageNet, a dataset that’s been essential to recent progress in computer vision, wouldn’t have been possible without the work of data labelers. But the ImageNet workers themselves made a median wage of $2 per hour, with only 4% making more than the U.S. federal minimum wage of $7.25 per hour, itself a far cry from a living wage.

Sama claims that it pays a higher annotator rate than its competitors — about $8 a day — with the mission of providing opportunities to communities in underserved regions. In a three-year randomized trial conducted by MIT and Innovations for Poverty Action, crowdworkers in Nairobi, Kenya who received both training and inclusion in Sama’s hiring pool had lower unemployment rates and higher average monthly earnings in comparison to crowdworkers who only received training.


The study didn’t compare the outcomes of Sama’s crowdworkers with those employed with other data labeling startups. But Gonzalez says that the results “point to the indisputable facts” and “demonstrate the value of [Sama’s] impact-model on communities globally.”

Sama, which employs 120 full-time workers and 3,500 annotators, counts Google, Nvidia, GM, Walmart, Getty, and over 25% of the Fortune 50 among its customers. Its crowdworkers annotated 1.5 billion data points in 2020 alone, and with the latest funding round, Sama’s total capital raised stands at nearly $85 million.

“Our customers include Fortune 2000 companies,” Gonzalez said. “Notably, Sama’s … training data was recently tapped by Google to power its AI algorithm for Project Guideline, which helps those with visual impairments run independently. With our high-quality, accurate training data, the application is able to accurately approximate the runner’s position and provide audio feedback so the runner can self-correct. Now, we’re working to scale Project Guideline with a goal of making the solution an accessible option for the blind [and] visually impaired community.”



Repost: Original Source and Author Link


The Deeper Connect Nano could replace your VPN and offer even greater peace of mind

TLDR: The Deeper Connect Nano serves as your own private VPN device, securing web traffic across your entire network under your own control for a one-time charge.

Over 31 percent of all internet users worldwide use a VPN for either professional or personal reasons. But how many of those millions of users actually know and trust their VPN provider? 

Do you know your VPN’s country of origin? Do you know how and where their servers are maintained? Do you know their logging policies? And most importantly, do you know that they won’t sell your information to the highest bidder? While most VPN services are fair and reputable, it’s often hard to distinguish between those and their less savory fly-by-night brothers.

Of course, one way to enjoy the benefits of a VPN without actually enlisting and relying on a VPN service is with a device like the new Deeper Connect Nano Decentralized VPN Cybersecurity Hardware. Right now, it’s $299 from TNW Deals.

Rather than falling back blindly on a VPN provider to facilitate your access to the web and protect your connection, the Deeper Connect Nano puts those matters in your own hands.

As a decentralized private network, the Nano uses the same blockchain technology that drives cryptocurrency creation to help users essentially create their own personal access tunnel to the web, a fully secure connection that cloaks a user’s IP like a VPN while also serving as its own client and server. 

You just connect the Nano to your internet router, run some quick configurations, and you’re immediately as well protected as a VPN user, scouring the web while a 7-layer firewall watches your back, blocking ads and trackers while monitoring web traffic across your entire network. 

Unlike a VPN, which usually protects only a handful of devices, you can set up the Nano to protect virtually every web-enabled device on your entire network, ranging from computers, laptops, and mobile devices all the way to your smart thermostat, smart appliances, and all the other web-connected items in your home.

Also unlike a VPN, you won’t be paying a monthly service charge to use the Nano either. Your one-time purchase price grants Nano owners complete protection forever. Right now, you can start securing your home network on your own with a Deeper Connect Nano Decentralized VPN for just $299.

Prices are subject to change.
