Designing the optimal iteration loop for AI data (VB Live)

Presented by Labelbox

Looking for practical insights on improving your training data pipeline and getting machine learning models to production-level performance fast? Join industry leaders for an in-depth discussion on how to best structure your training data pipeline and create the optimal iteration loop for production AI in this VB Live event.

Register here for free.

Companies with the best training data produce the best performing models. AI industry leaders like Andrew Ng have recently emerged as major proponents of data-centric machine learning for enterprises, which requires creating and maintaining high-quality training data. Unfortunately, the tremendous effort it takes to gather, label, and prep that training data often overwhelms teams (when the task is not outsourced) and can compromise both the quality and quantity of training data.

Just as importantly, model performance can only improve as fast as your training data improves, so fast iteration cycles for training data are crucial. Iteration helps ML teams find new edge cases and improve performance. It also helps teams refine and course-correct data throughout the AI development lifecycle so that it continues to reflect real-world conditions. Shrinking that iteration cycle lets you hone your data and run a greater number of experiments, accelerating the path to production AI systems.

It’s clear that iterating on training data is vital to building performant models quickly — so how can ML teams create the optimal workflow for this data-first approach?

Overcoming the challenges of a data-first approach

A data-first approach to machine learning involves some unique challenges, including management, analysis, and labeling.

Because machine learning requires a great deal of iteration and experimentation, companies often find themselves with a management system that’s a patchwork of models and results, stored haphazardly. Without a centralized spot for data storage and standard, reliable tools for exploration, results become difficult to track and reproduce, and finding patterns in the data becomes a challenge.

That means teams are often overwhelmed when digging the insights they need out of their data. Of course, solving business problems does take large quantities of data. But unless teams can streamline the data labeling process by labeling only the data that has true value, the process will quickly become unmanageable.

Using data to build a competitive advantage

Building an AI data engine is a series of iteration loops, each one making the model better. The engine continuously imports model outputs as pre-labeled data, so each labeling cycle is shorter than the last. That data is used to improve the next iteration of training and deployment, again and again. This ongoing loop keeps your models up to date, boosts their efficiency, and strengthens your AI. And because companies with the best training data generally produce the most performant models, those companies attract more customers, who in turn generate even more data.

Building this kind of engine has often required a great deal of hands-on labeling from subject matter experts: medical doctors identifying images of tumors, office workers labeling receipts, and so on. Automation dramatically speeds up the process by sending pre-labeled data to humans to check and correct, eliminating the need to start from scratch.

A robust data engine needs only a small set of labeled data to improve model performance: it automatically labels a sample of data for the model to work with, and requires human verification only in some instances.
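One common way to decide which labels need human verification is to route on model confidence. The sketch below is illustrative only: the `predict` function, threshold value, and asset names are all hypothetical stand-ins, not part of any particular training data platform's API.

```python
# Minimal sketch of model-assisted labeling with confidence routing.
# All names here (predict, CONFIDENCE_THRESHOLD, asset ids) are hypothetical.

CONFIDENCE_THRESHOLD = 0.9  # below this, a human must verify the label

def predict(asset):
    """Stand-in for a real model: returns (label, confidence)."""
    scores = {"img_001": ("cat", 0.97), "img_002": ("dog", 0.62)}
    return scores[asset]

def route_for_labeling(assets):
    """Pre-label every asset; queue low-confidence ones for human review."""
    auto_accepted, needs_review = [], []
    for asset in assets:
        label, confidence = predict(asset)
        if confidence >= CONFIDENCE_THRESHOLD:
            auto_accepted.append((asset, label))
        else:
            # Humans correct an existing label rather than start from scratch.
            needs_review.append((asset, label))
    return auto_accepted, needs_review

accepted, review = route_for_labeling(["img_001", "img_002"])
```

In practice the threshold is tuned per class and task; the point is that only the uncertain slice of the data ever reaches a human labeler.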

Putting it all together to improve model performance

Speeding up your data-centric iteration process takes just a few steps.

The first is to bring all your data to a single place, enabling your teams to access the training data, metadata, previous annotations, and model predictions quickly at any time, and iterate faster. Once your data is accessible within your training data platform, you can annotate a small dataset to get your model going.

Then, evaluate your baseline model. Measure performance early, and measure it often. One or more baseline models can speed up your ability to pivot as their performance develops. To create a solid foundation, your team should focus on identifying errors early and iterating, rather than on optimizing.

Next, curate your data set according to your model diagnosis. Rather than bulk-labeling a massive amount of data, which takes time, energy, and money, create a small, carefully selected set of data to build on the baseline version of your model. Choose the assets that will best improve model performance, taking into account any edge cases and trends you found during model evaluation and diagnosis.

Finally, annotate your small dataset, and keep the iterative process going by assessing your progress and correcting for errors in data distribution, concept clarity, class frequency, and outliers.
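The steps above can be sketched as a loop. This is a toy simulation, not a real training pipeline: `curate` and `evaluate` are deliberately simplistic placeholders for the corresponding steps (targeted batch selection and model evaluation).

```python
# Toy sketch of the data-centric loop: curate a small batch, annotate it,
# evaluate, and repeat. The curation and evaluation logic is stand-in code.

def curate(pool, labeled, batch_size=5):
    """Pick a small batch of still-unlabeled assets to annotate next."""
    unlabeled = [a for a in pool if a not in labeled]
    # A real system would rank assets by expected impact on model
    # performance (edge cases, model errors); here we just take the next few.
    return set(unlabeled[:batch_size])

def evaluate(labeled, pool):
    """Stand-in metric: coverage of the pool by labeled data."""
    return len(labeled) / len(pool)

def iteration_loop(pool, n_rounds=3):
    labeled, history = set(), []
    for _ in range(n_rounds):
        labeled |= curate(pool, labeled)         # annotate a small, curated set
        history.append(evaluate(labeled, pool))  # measure early, measure often
    return history

pool = [f"asset_{i:02d}" for i in range(30)]
history = iteration_loop(pool)
```

The design point the loop illustrates: each round labels a small, deliberately chosen batch rather than bulk-labeling everything, so the measured metric improves steadily while labeling cost stays bounded.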

Training data platforms (TDPs) are purpose-built for exactly this advantage, combining data, people, and processes into one seamless experience and enabling ML teams to produce performant models more quickly and efficiently.

To learn more about boosting the performance of your model, reducing labeling costs, eliminating errors, solving for outliers and more, don’t miss this VB Live event!

Register here for free.

Attendees will learn how to:

  • Visualize model errors and better understand where performance is weak so you can more effectively guide training data efforts
  • Identify trends in model performance and quickly find edge cases in your data
  • Reduce costs by prioritizing data labeling efforts that will most dramatically improve model performance
  • Improve collaboration between domain experts, data scientists, and labelers

Speakers:

  • Matthew McAuley, Senior Data Scientist, Allstate
  • Manu Sharma, CEO & Cofounder, Labelbox
  • Kyle Wiggers (moderator), AI Staff Writer, VentureBeat



Optimal Dynamics nabs $18.4M for AI-powered freight logistics

Join Transform 2021 this July 12-16. Register for the AI event of the year.

Optimal Dynamics, a New York-based startup applying AI to shipping logistics, today announced it has closed an $18.4 million round led by Bessemer Venture Partners. Optimal Dynamics says the funds will be used to more than triple its 25-person team, support engineering efforts, and bolster its sales and marketing departments.

Last-mile delivery logistics tend to be the most expensive and time-consuming part of the shipping process. According to one estimate, last-mile costs account for 53% of total shipping costs and 41% of total supply chain costs. With the rise of ecommerce in the U.S., retail providers are increasingly focusing on fulfillment and distribution at the lowest cost. Particularly in the construction industry, the pandemic continues to disrupt wholesalers — a 2020 Statista survey found that 73% of buyers and users of freight transportation and logistics services experienced an impact on their operations.

Founded in 2016, Optimal Dynamics offers a platform that taps AI to generate shipment plans likely to be profitable — and on time. The fruit of nearly 40 years of R&D at Princeton University, the company’s product generates simulations for freight transportation, enabling logistics companies to answer questions about what equipment they should buy, how many drivers they need, daily dispatching, load acceptance, and more.

Simulating logistics

Roughly 80% of all cargo in the U.S. is transported by the 7.1 million people who drive flatbed trailers, dry vans, and other heavy lifters for the country’s 1.3 million trucking companies. The trucking industry generates $726 billion in revenue annually and is forecast to grow 75% by 2026. Even before the pandemic, last-mile delivery was fast becoming the most profitable part of the supply chain, with research firm Capgemini pegging its share of the pie at 41%.

Optimal Dynamics’ platform can perform strategic, tactical, and real-time freight planning, forecasting shipment events as far as two weeks in advance. CEO Daniel Powell — who cofounded the company with his father, Warren Powell, a professor of operations research and financial engineering — says the underlying technology was deployed, tested, and iterated with trucking companies, railroads, and energy companies, along with projects in health, ecommerce, finance, and materials science.

“Use of something called ‘high-dimensional AI’ allows us to take in exponentially greater detail while planning under uncertainty. We also leverage clever methods that allow us to deploy robust AI systems even when we have very little training data, a common issue in the logistics industry,” Daniel Powell told VentureBeat via email. “The results are … a dramatic increase in companies’ abilities to plan into the future.”

The global logistics market was worth $10.32 trillion in 2017 and is estimated to grow to $12.68 trillion by 2023, according to Research and Markets. Optimal Dynamics competes with Uber, which offers a logistics service called Uber Freight. San Francisco-based startup KeepTruckin recently secured $149 million to further develop its shipment marketplace, while Next Trucking closed a $97 million investment. And Convoy raised $400 million at a $2.75 billion valuation to make freight trucking more efficient.

But Optimal Dynamics investor Mike Droesch, a partner at BVP, says demand for the company’s products remains strong. “Logistics operators need to consider a staggering number of variables, making this an ideal application for a software-as-a-service product that can help operators make more informed decisions by leveraging Optimal Dynamics’ industry-leading technology. We were really impressed with the combination of their deep technology and the commercial impact that Optimal Dynamics is already delivering to their customers,” he said in a statement.

Including this latest series A, Optimal Dynamics has raised over $22.4 million. Fusion Fund, the Westly Group, TenOneTen Ventures, Embark Ventures, FitzGate Ventures, John Larkin, and John Hess also contributed to the round.

Updated on May 14 at 11:02 a.m. Pacific: This article has been updated to reflect that the funding round totaled $18.4 million, not ~$22 million as originally reported. We regret the error.


