Deep Dive: How synthetic data can enhance AR/VR and the metaverse

Were you unable to attend Transform 2022? Check out all of the summit sessions in our on-demand library now! Watch here.

The metaverse has captivated our collective imagination. The exponential development in internet-connected devices and virtual content is preparing the metaverse for general acceptance, requiring businesses to go beyond traditional approaches to create metaverse content. However, next-generation technologies such as the metaverse, which employs artificial intelligence (AI) and machine learning (ML), rely on enormous datasets to function effectively. 

This reliance on large datasets brings new challenges. Technology users have become more conscious of how their sensitive personal data is acquired, stored and used, resulting in regulations designed to prevent organizations from using personal data without explicit permission.

Without large amounts of accurate data, it’s impossible to train or develop AI/ML models, which severely limits metaverse development. As this quandary becomes more pressing, synthetic data is gaining traction as a solution.

In fact, according to Gartner, by 2024, 60% of the data required for AI and analytics projects will be generated synthetically. 

Machine learning algorithms generate synthetic data by training on real data to learn its behavioral patterns, then producing simulated data that retains the statistical properties of the original dataset. Such data can replicate real-world circumstances and, unlike standard anonymized datasets, it isn’t vulnerable to the same privacy flaws as real data.
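At its simplest, "retaining the statistical properties" can be illustrated by fitting a distribution to real records and sampling new ones from it. This is a deliberately minimal sketch: production generators use far richer models such as GANs, and the numbers here are made up.

```python
import random
import statistics

def fit_and_sample(real_data, n_synthetic, seed=0):
    """Fit a simple Gaussian to the real data, then sample
    synthetic records that preserve its mean and spread."""
    mu = statistics.mean(real_data)
    sigma = statistics.stdev(real_data)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n_synthetic)]

real = [52.1, 49.8, 50.5, 51.2, 48.9, 50.0, 49.5, 50.8]
synthetic = fit_and_sample(real, n_synthetic=10_000)

# The synthetic sample tracks the statistics of the original
# without containing any of its actual records.
print(round(statistics.mean(real), 2), round(statistics.mean(synthetic), 2))
```

None of the synthetic values is a real record, yet a model trained on them sees the same distribution, which is the property that makes synthetic data useful under privacy constraints.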

Reimagining digital worlds with synthetic data 

As AR/VR and metaverse developments progress toward more accurate digital environments, they now require new capabilities for humans to interact seamlessly with the digital world. These include the ability to interact with virtual objects, on-device rendering optimization using accurate eye-gaze estimation, realistic user avatar representation and the creation of a solid 3D digital overlay on top of the actual environment. ML models learn 3D representations such as meshes, morphable models and surface normals from photographs, and obtaining such visual data to train these AI models is challenging.

Training a 3D model requires a large quantity of face and full-body data, including precise 3D annotation. The model also must be taught to perform tasks such as hand pose and mesh estimation, body pose estimation, gaze analysis, 3D environment reconstruction and codec avatar synthesis. 

“The metaverse will be powered by new and powerful computer vision machine learning models that can understand the 3D space around a user, capture motion accurately, understand gestures and interactions, and translate emotion, speech, and facial details to photorealistic avatars,” Yashar Behzadi, CEO and founder of Synthesis AI, told VentureBeat.  

 “To build these, foundational models will require large amounts of data with rich 3D labels,” Behzadi said.  

An example of rendering gesture estimation for digital avatars. Source: Synthesis AI

For  these reasons, the metaverse is experiencing a paradigm shift — moving away from modeling and toward a data-centric approach to development. Rather than making incremental improvements to an algorithm or model, researchers can optimize a metaverse’s AI model performance much more effectively by improving the quality of the training data.

“Conventional approaches to building computer vision rely on human annotators who cannot provide the required labels. However, synthetic data, or computer-generated data that mimics reality, has proven a promising new approach,” said Behzadi. 

Using synthetic data, companies can generate customizable data that can make projects run more efficiently as it can be easily distributed between creative teams without worrying about complying with privacy laws. This provides greater autonomy, enabling developers to be more efficient and focus on revenue-driving tasks. 

Behzadi says he believes coupling cinematic visual effects technologies with generative AI models will allow synthetic data technologies to provide vast amounts of diverse and perfectly labeled data to power the metaverse.

To enhance user experience, hardware devices used to step into the metaverse play an equally important role. However, hardware has to be supported by software that makes the transition between the real and virtual worlds seamless, and this would be impossible without computer vision. 

To function properly, AR/VR hardware needs to understand its position in the real world to augment users with a detailed and accurate 3D map of the virtual environment. Therefore, gaze estimation (i.e., determining where a person is looking from an image of their face and eyes) is a crucial problem for current AR and VR devices. In particular, VR depends heavily on foveated rendering, a technique in which the image in the center of a field of view is produced in high resolution and excellent detail, but the image on the periphery deteriorates progressively.
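The foveated schedule can be sketched as a simple falloff in rendered detail with angular distance from the estimated gaze point. The exponential shape and the falloff constant below are illustrative assumptions, not any headset's actual schedule.

```python
import math

def foveated_detail(pixel, gaze, falloff_deg=10.0):
    """Toy foveated-rendering schedule: full detail at the gaze
    point, decaying with angular eccentricity. `pixel` and `gaze`
    are (x, y) angles in degrees within the field of view."""
    eccentricity = math.hypot(pixel[0] - gaze[0], pixel[1] - gaze[1])
    return math.exp(-eccentricity / falloff_deg)  # 1.0 = full resolution

gaze = (0.0, 0.0)  # estimated by the eye-tracking model
print(round(foveated_detail((0, 0), gaze), 2))   # center: 1.0
print(round(foveated_detail((30, 0), gaze), 2))  # periphery: 0.05
```

The quality of the gaze estimate directly bounds how aggressively the periphery can be degraded, which is why accurate eye-tracking models, and hence large labeled gaze datasets, matter so much here.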

Eye-gaze estimation and tracking architecture for VR devices deploys foveated rendering: the image at the center of the field of view is produced in high resolution, while the image on the periphery deteriorates progressively for more efficient performance. Source: Synthesis AI

According to Richard Kerris, vice president of the Omniverse development platform at NVIDIA, synthetic data generation can act as a remedy for such cases, as it can provide visually accurate examples of use cases when interacting with objects or constructing environments for training. 

“Synthetic data generated with simulation expedites AR/VR application development by providing continuous development integration and testing workflows,” Kerris told VentureBeat. “Furthermore, when created from the digital twin of the actual world, such data can help train AIs for various near-field sensors that are invisible to human eyes, in addition to improving the tracking accuracies of location sensors.”

When entering virtual reality, one needs to be represented by an avatar for an immersive virtual social experience. Future metaverse environments would need photorealistic virtual avatars that represent real people and can capture their poses. However, constructing such an avatar is a tricky computer vision problem, which is now being addressed by the use of synthetic data. 

Kerris explained that the biggest challenge for virtual avatars is how highly personalized they are. This generation of users wants a diverse variety of avatars with high fidelity, along with accessories like clothes and hairstyles, and related emotions, without compromising privacy. 

“Procedural generation of diverse digital human characters at a large scale can create endlessly different human poses and animate characters for specific use cases. Procedural generation by using synthetic data helps address these many styles of avatars,” Kerris said. 

Identifying objects with computer vision

For estimating the position of 3D objects and their material properties in digital worlds such as the metaverse, light must interact with the object and its environment to generate an effect similar to the real world. Therefore, AI-based computer vision models for the metaverse require understanding the object’s surfaces to render them accurately within the 3D environment.

According to Swapnil Srivastava, global head of data and analytics at Evalueserve, synthetic data lets AI models make more realistic predictions and tracking across body types, lighting and illumination, backgrounds and environments, among other variables.

“Metaverse/omniverse or similar ecosystems will depend highly on photorealistic expressive and behavioral humans, now achievable with synthetic data. It is humanly impossible to annotate 2D and 3D images at a pixel-perfect scale. With synthetic data, this technological and physical barrier is bridged, allowing for accurate annotation, diversity, and customization while ensuring realism,” Srivastava told VentureBeat. 
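The pixel-perfect annotation point is easiest to see in code: a synthetic renderer already knows which object produced every pixel, so exact segmentation masks fall out of its ID buffer for free. The toy renderer below (flat boxes instead of 3D meshes; all names hypothetical) illustrates the principle.

```python
def render_id_buffer(width, height, objects):
    """Return a 2D buffer of object IDs (0 = background).
    Objects are axis-aligned boxes; later boxes win overlaps."""
    buf = [[0] * width for _ in range(height)]
    for obj_id, (x0, y0, x1, y1) in objects.items():
        for y in range(y0, y1):
            for x in range(x0, x1):
                buf[y][x] = obj_id
    return buf

def mask_for(buf, obj_id):
    """Exact binary segmentation mask for one object,
    derived from the ID buffer with no human annotation."""
    return [[1 if v == obj_id else 0 for v in row] for row in buf]

scene = {1: (1, 1, 4, 3), 2: (5, 0, 8, 2)}  # object id -> bounding box
ids = render_id_buffer(8, 4, scene)
mask = mask_for(ids, 1)
print(sum(map(sum, mask)))  # 6 labeled pixels, exact by construction
```

A human annotator tracing the same object would approximate its boundary; the generator's label is correct to the pixel because it produced the pixels in the first place.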

Gesture recognition is another primary mechanism for interacting with virtual worlds. However, building models for accurate hand tracking is intricate, given the complexity of the hands and the need for 3D positional tracking. Further complicating the task is the need to capture data that accurately represents the diversity of users, from skin tone to the presence of rings, watches, shirt sleeves and more. 

Behzadi says that the industry is now using synthetic data to train hand-tracking systems to overcome such challenges.

“By leveraging 3D parametric hand models, companies can create vast amounts of accurately 3D labeled data across demographics, confounds, camera viewpoints and environments,” Behzadi said. 

“Data can then be produced across environments and camera positions/types for unprecedented diversity since the data generated has no underlying privacy concerns. This level of detail is orders of magnitude greater than what can be provided by humans and is enabling a greater level of realism to power the metaverse,” he added.
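A sketch of what sampling such a parametric hand model might look like. The parameter names, ranges and counts below are illustrative assumptions, not any vendor's actual model; the point is that every 3D label is exact because the generating parameters are the ground truth.

```python
import random

# Every axis of variation mentioned above (demographics, accessories,
# camera viewpoint) becomes a parameter that can be randomized at will.
SKIN_TONES = ["I", "II", "III", "IV", "V", "VI"]  # Fitzpatrick scale
ACCESSORIES = [None, "ring", "watch", "sleeve"]

def sample_hand(rng):
    """Draw one fully labeled synthetic hand specification."""
    return {
        "shape_betas": [rng.gauss(0, 1) for _ in range(10)],      # identity
        "pose_thetas": [rng.uniform(-0.5, 0.5) for _ in range(15)],  # joints
        "skin_tone": rng.choice(SKIN_TONES),
        "accessory": rng.choice(ACCESSORIES),
        "camera_azimuth_deg": rng.uniform(0, 360),
    }

rng = random.Random(42)
dataset = [sample_hand(s) for s in [rng] * 1000]
print(len({s["skin_tone"] for s in dataset}))  # 6 -- all tones covered
```

Because demographic coverage is a sampling choice rather than a recruiting problem, gaps in the data (an underrepresented skin tone, a rare camera angle) can be closed by simply drawing more samples from that region of parameter space.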

Srivastava said that compared to the current process, the metaverse will collect more personal data like facial features, body gestures, health, financial, social preference, and biometrics, among many others. 

“Protecting these personal data points should be the highest priority. Organizations need effective data governance and security policies, as well as a consent governance process. Ensuring ethics in AI would be very important to scaling effectiveness in the metaverse while creating responsible data for training, storing, and deploying models in production,” he said. 

Similarly, Behzadi said that synthetic data technologies will allow building more inclusive models in privacy-compliant and ethical ways. However, because the concept is new, broad adoption will require education. 

“The metaverse is a broad and evolving term, but I think we can expect new and deeply immersive experiences — whether it’s for social interactions, reimagining consumer and shopping experiences, new types of media, or applications we have yet to imagine. New initiatives are a step in the right direction to help build a community of researchers and industrial partners to advance the technology,” said Behzadi. 

Creating simulation-ready data sets is challenging for companies wanting to use synthetic data generation to build and operate virtual worlds in the metaverse. Kerris says that off-the-shelf 3D assets aren’t enough to implement accurate training paradigms. 

“These data sets must have the information and characteristics that make them useful. For example, weight, friction and other factors must be included in the asset for them to be useful in training,” Kerris said. “We can expect an increased set of sim-ready libraries from companies, which will help accelerate the use cases for synthetic data generation in metaverse applications, for industrial use cases like robotics and digital twins.”

GamesBeat’s creed when covering the game industry is “where passion meets business.” What does this mean? We want to tell you how the news matters to you — not just as a decision-maker at a game studio, but also as a fan of games. Whether you read our articles, listen to our podcasts, or watch our videos, GamesBeat will help you learn about the industry and enjoy engaging with it. Discover our Briefings.

Repost: Original Source and Author Link


Intel’s confidential computing solution for protecting cloud data is tested in healthcare


Ensuring the integrity of software isn’t easy. At one level or another, you have to place trust that a third party implements the necessary security controls to protect your data. Or do you?

Today at Intel Innovation, Intel announced that health provider Leidos and professional services company Accenture are beginning to implement Project Amber, the company’s verification service for cloud-to-edge and on-premises trust assurance. 

Project Amber provides enterprises with a solution to independently verify the trustworthiness of computing assets throughout their environment.

Essentially, it provides enterprises with a solution they can use to help verify the integrity of the software supply chain to ensure that they aren’t using any computing assets or services that leave data exposed.


MetaBeat 2022

MetaBeat will bring together thought leaders to give guidance on how metaverse technology will transform the way all industries communicate and do business on October 4 in San Francisco, CA.

Register Here

Restoring faith in the software supply chain

The release of Project Amber comes as more and more organizations are struggling to place trust in the security of third-party software vendors. Currently, only 37% of IT professionals feel very confident in the security of the supply chain. 

While there are many reasons for this lack of confidence, a spate of supply chain attacks, starting with the SolarWinds breach in 2020, has highlighted that organizations can face serious exposure to risk if third-party vendors fail to secure their environments against threat actors. 

One of the key technologies that has the potential to address supply chain security is confidential computing. Confidential computing has the potential to mitigate supply chain risks by encrypting data-in-use so that it’s not accessible to unauthorized third parties processing or transmitting the data. 

“With the introduction of Project Amber at Intel Vision in May ’22, Intel is taking confidential computing to the next level in our commitment to a zero-trust approach to attestation and the verification of computing assets at the network, edge and in the cloud,” said Intel senior vice president, chief technology officer, and general manager of the software and advanced technology group (SATG), Greg Lavender. 

Intel essentially combines zero-trust attestation with confidential computing to help enterprises verify the security of third-party cloud services and software.
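The article doesn't describe Project Amber's API, but the verification flow it outlines can be sketched in miniature. Everything below (the key, the allow-list, the HMAC standing in for a hardware-rooted signature) is a hypothetical illustration of third-party attestation, not Intel's actual implementation.

```python
import hashlib
import hmac

# An independent verifier (the role Project Amber plays) checks that a
# TEE's "quote" is genuinely signed and that the measured workload is
# on an allow-list, instead of the cloud provider vouching for itself.
TRUSTED_KEY = b"hardware-root-of-trust"  # assumption: shared trust anchor
ALLOWED_MEASUREMENTS = {hashlib.sha256(b"approved-workload-v1").hexdigest()}

def make_quote(workload_bytes):
    """The TEE measures its workload and signs the measurement."""
    measurement = hashlib.sha256(workload_bytes).hexdigest()
    sig = hmac.new(TRUSTED_KEY, measurement.encode(), hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": sig}

def verify_quote(quote):
    """Independent verification: genuine signature AND approved code."""
    expected = hmac.new(TRUSTED_KEY, quote["measurement"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, quote["signature"]):
        return False  # forged or tampered quote
    return quote["measurement"] in ALLOWED_MEASUREMENTS

print(verify_quote(make_quote(b"approved-workload-v1")))  # True
print(verify_quote(make_quote(b"tampered-workload")))     # False
```

The structural point survives the simplification: the party running the workload cannot approve itself, because the verification logic and the allow-list live with a third party.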

How Leidos and Accenture are using Project Amber 

At this stage, Leidos has a new Project Amber proof of concept that offers the potential to support its QTC Mobile Medical Clinics, where vans perform in-field medical exams and health information processing for U.S. veterans in rural areas.

In this instance, Intel’s solution provides additional security protections for internet of things (IoT) and medical internet of things (MIoT) devices that sit beyond the network’s edge. 

In another part of healthcare, Accenture is integrating Project Amber into an artificial intelligence (AI)-based framework for protecting data. As part of this proof of concept, healthcare institutions can share data securely to build a central AI model trained to detect and prevent diseases.

With the AI models needing to be trained on data taken from multiple hospitals and then aggregated in a single location, Project Amber enables Accenture to run machine learning (ML) workloads across multiple cloud service providers within a secure trusted execution environment (TEE).

This TEE prevents sensitive information from exposure to unauthorized third parties and verifies the trustworthiness of computing assets including TEEs, devices, policies and roots of trust. 

An overview of confidential computing approaches 

Confidential computing services are picking up momentum due to their ability to prevent unauthorized users from viewing or interacting with the underlying code at rest and in use. According to Everest Group, the confidential computing market has the potential to grow to $54 billion by 2026, as organizations’ need for data privacy grows. 

Of course, Intel isn’t the only provider experimenting with confidential computing. 

Fortanix helped to pioneer this technology and offers a Confidential Computing Manager that can run applications in TEEs, while offering other security controls such as identity verification, data access control and code attestation. Fortanix also announced raising $90 million in series C funding earlier this year. 

Other providers like Google Cloud are also experimenting with confidential computing to encrypt data-in-use for confidential VMs and confidential GKE nodes to bolster the security of a wider cloud environment. Earlier this year, Google Cloud surpassed $6 billion in revenue during the second quarter of 2022. 

However, what makes Intel’s approach unique is that most TEEs are self-attested by individual cloud service providers and software vendors. In effect, a provider verifies that its own infrastructure is secure, which means enterprises have to trust that a vendor accurately verifies the security of its own systems. Instead, Intel acts as an impartial third party that can attest that another vendor’s or cloud service provider’s workload or TEE is secure for an organization to use. 

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.



LinkedIn’s AI leader shares 3 traits of top data science talent


In a new interview with VentureBeat, Ya Xu, VP of engineering and head of data and artificial intelligence (AI) at LinkedIn, is more than happy to share her thoughts on everything from her passion for bringing science and engineering together to the top traits she looks for when interviewing data science talent.

She has far less to say about a New York Times article from last weekend that focused on a study published in Science that “analyzed data from multiple large-scale randomized experiments on LinkedIn’s People You May Know algorithm, which recommends new connections to LinkedIn members, to test the extent to which weak ties increased job mobility in the world’s largest professional social network.” The Times said that LinkedIn ran “experiments” on more than 20 million users over five years that, “while intended to improve how the platform worked for members, could have affected some people’s livelihoods.”

According to Xu, who leads LinkedIn’s centralized data team that includes all AI, data science and privacy engineering teams, the study involved “no experimenting.” Instead, she told VentureBeat the research “was entirely based on observational causal study – this means we used cutting-edge social science methods (the same ones that won the 2021 Nobel Prize in Economics) to analyze historical data and discover causal patterns.”

A bridge between research and product

Xu said she thinks a great deal about the ethical implications of LinkedIn research, especially when it comes to using new algorithms and machine learning architecture like GPT and Transformers. At the same time, AI is core to LinkedIn products, as it is for so many of today’s businesses — so she explained that her philosophy is that research and product groups have to work hand-in-hand to meet the needs of the company’s three different customer ecosystems — job seekers and hiring companies; B2B buyers and sellers; and knowledge seekers/producers.


“True magic really comes when we can create a very tight connection and bridge between the research and the practical applications,” she said.

That starts with the organizational structure, with researchers and engineers working together.

“The problem itself should inform the research agenda, but at the same time the production constraints should actually inspire the research itself,” she explained. “For example, if you don’t have any scalability constraints, you can come up with the most complicated algorithm. But if you have to fit everything within this memory, if you have to use this kind of computational constraint, if you have these latency constraints, all of a sudden you actually inspire and motivate the research to be done in a different way.”

3 top traits of LinkedIn data science talent

That collaborative culture requires the right data science talent — Xu said there are three important things that she looks for in candidates. First, is the individual mission-driven and impact-driven?

“They want to achieve something in the end,” she explained. “They may have different approaches to achieving it…but ultimately they want to do right by members and customers.”

Next, Xu wants to hire people who are — not surprisingly — collaborative. They should be those “who really care for each other, who really respect people who are coming with different skill sets,” she said. “You don’t want to hire individuals who are like, ‘hey, I’m the smartest and the best and the brightest and no one else is right.’”

Finally, Xu said she wants people who are willing to learn, adapt and stay curious. “Nobody can come into this field and be like, ‘I know everything,’” she said. “I mean, I had my Ph.D. in machine learning statistics 10 years ago, and if I compare what I did to what is [going on] today, oh my gosh, it’s night and day.”

LinkedIn’s AI and data challenges

LinkedIn’s three ecosystems create AI and data challenges, said Xu, because their heterogeneity makes it hard to define a “true north” value. “AI works the best if you can say ‘This is the objective function’ and optimize towards that,” she said.

That means there needs to be a multi-objective optimization framework for AI, complicated further by the fact that there are so many different personas involved. “It’s another challenging thing to understand what their needs are and how to balance those different needs,” she said.
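The multi-objective framing Xu describes can be sketched as a weighted-sum scalarization, one of the simplest ways to combine several objectives into a single optimizable score. The metric names and weights below are hypothetical stand-ins, not LinkedIn's actual objectives.

```python
def combined_score(metrics, weights):
    """Weighted-sum scalarization: collapse several per-ecosystem
    objectives into one score the optimizer can rank candidates by."""
    return sum(weights[k] * metrics[k] for k in weights)

# Hypothetical objectives for the three ecosystems described above.
weights = {"job_seeker_value": 0.4, "b2b_value": 0.3, "learning_value": 0.3}
candidate_a = {"job_seeker_value": 0.9, "b2b_value": 0.2, "learning_value": 0.5}
candidate_b = {"job_seeker_value": 0.6, "b2b_value": 0.7, "learning_value": 0.6}

print(round(combined_score(candidate_a, weights), 2))  # 0.57
print(round(combined_score(candidate_b, weights), 2))  # 0.63
```

The hard part, as the article notes, is not the arithmetic but choosing the weights: each weight vector encodes a trade-off between personas, and there is no single "true north" setting that serves all of them at once.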

Finally, from a technical standpoint, each of those personas comes with various problems at different scales: “We have a lot more posts on LinkedIn than we have on learning courses, for example,” she said. “And they come with different latency requirements — you have to return ads within milliseconds, but you have a lot more flexibility when it comes to, maybe, a search returned from our Sales Navigator, or recommendations by email.”

AI opportunities and responsible AI

The latest AI advancements, such as large language models including GPT-3, offer opportunities for LinkedIn to tie its marketplaces together with common technology that can be used across the board, said Xu.

“Whether it’s a feed post, a job description, or a member’s profile, we can understand that text a lot better, and we can then map to topics that a post is about or maybe job skills and then connect that back to what this member is looking for,” she said, adding that advances in algorithms, hardware and software will be a key focus overall in advancing LinkedIn’s AI and data ambitions.

She added that better technology methods also now exist to better measure AI fairness in LinkedIn’s feed recommendations or connection recommendations.

However, fairness is just one area LinkedIn is investing in when it comes to responsible AI using Microsoft’s Responsible AI framework.

“In the fairness area, we are continuously pushing for both measurement and mitigation — how can we understand how our algorithm is doing relative to what it’s intended to do?” she said. “And then mitigation is, if we identify areas that there are gaps, what are the approaches that we can do in order to mitigate it?”

Transparency, another focus area, is about explaining what algorithms are doing, she said: “Can the modelers who are building these algorithms explain them to the developers? Can we explain it then to the users who are interacting with algorithms?”

It’s a “very challenging” space, she admits: “But it’s really, really exciting from a technology standpoint.”



How open-source data labeling technology can mitigate bias


Data labeling is one of the most fundamental aspects of machine learning. It is also often an area where organizations struggle – both to accurately categorize data and reduce potential bias.

With data labeling technology, a dataset used to train a machine learning model is first analyzed and given a label that provides a category and a definition of what the data is actually about. While data labeling is a critical component of the machine learning process, recently it has also proven to be highly inconsistent, according to multiple studies. The need for accurate data labeling has fuelled a bustling marketplace of data labeling vendors.

Among the most popular data labeling technologies is the open-source Label Studio, which is backed by San Francisco-based startup Heartex. The new Label Studio 1.6 update being released today will provide users with new features to help better analyze and label data inside of videos.

According to Michael Malyuk, cofounder and CEO of Heartex, the challenge for most companies with artificial intelligence (AI) is having good data to work with.


“We think about labeling as a broader category of dataset development, and Label Studio is a solution that ultimately enables you to do any sort of dataset development,” Malyuk said.

Defining data labeling categories is a challenge

While the 1.6 release of Label Studio has a video player capability as the primary new feature, Malyuk emphasized that the technology is useful for any type of data including text, audio, time series and video.

Among the biggest issues with any labeling approach for all types of data is actually defining the categories used for data labels.

“Some people can name things one way, some people can name things a different way, but they essentially mean the same thing,” Malyuk said.

He explained that Label Studio provides taxonomies for labels that users can choose from to describe a piece of data, be it a text, audio or image file. If two or more people in the same organization label the same data differently, the Label Studio system will identify the conflict so that it can be analyzed and remediated. Label Studio provides both a manual conflict resolution system and an automated approach.
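The conflict check described here can be sketched in a few lines: group each annotator's label by item and surface the items where labels disagree. This is an illustrative reimplementation of the idea, not Label Studio's actual code.

```python
from collections import defaultdict

def find_conflicts(annotations):
    """annotations: list of (item_id, annotator, label) tuples.
    Returns only the items whose annotators disagree."""
    by_item = defaultdict(dict)
    for item, annotator, label in annotations:
        by_item[item][annotator] = label
    return {item: labels for item, labels in by_item.items()
            if len(set(labels.values())) > 1}

annotations = [
    ("img_1", "alice", "car"),
    ("img_1", "bob", "truck"),  # disagreement -> surfaced for review
    ("img_2", "alice", "cat"),
    ("img_2", "bob", "cat"),
]
print(find_conflicts(annotations))  # {'img_1': {'alice': 'car', 'bob': 'truck'}}
```

From here, a manual workflow routes the conflicting item to a reviewer, while an automated one might resolve it by majority vote or by deferring to a more trusted annotator.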

Vector database vs. data labeling?

The process of data labeling can often involve manual work, with humans assigning a label or validating that a label is accurate.

There are a number of approaches to automating the process: startup Lightly AI, for example, uses a self-supervised machine learning model that can integrate with Label Studio. Then there are vendors that use a vector database to convert data into math, rather than using data labeling to identify data and its relationships.

Malyuk said that vector databases do have their uses and can be effective for doing tasks such as similarity searches. The problem, in his view, is that the vector approach isn’t as effective with unstructured data types such as audio and video. He noted that a vector database can make use of identification types for common objects.

“As soon as you start deviating from that common knowledge to something that is a little bit different, it’s going to become very complicated without manual labeling,” Malyuk said.

How data labeling can identify and mitigate AI bias

Bias in AI is an ongoing challenge that many in the industry are trying to combat. At the root of machine learning is the actual data, and the way that data is labeled can potentially lead to bias as well. Bias can be intentional, and it can also be circumstantial.

“If you’re labeling a very subjective dataset in the morning before coffee and then again after coffee, you may get very different answers,” Malyuk said.

While it’s not always possible to make sure that data labeling processes are only executed by those that are fully caffeinated, there are processes that can help. Malyuk said what Label Studio does on the software side is it provides a way to build a process so that everyone contributes individually. The system identifies and builds all the matrices where it matches people with each other and how they label the same items. It’s an approach that Malyuk said can potentially identify bias for a specific label.
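The annotator-agreement matrices Malyuk describes can be sketched as pairwise agreement rates over co-labeled items. This is illustrative only, not Label Studio's implementation; an annotator with low agreement against everyone else may be applying a biased or idiosyncratic reading of the labels.

```python
from itertools import combinations

def agreement_matrix(labels_by_annotator):
    """labels_by_annotator: {annotator: {item_id: label}}.
    Returns pairwise agreement rates over shared items."""
    matrix = {}
    for a, b in combinations(sorted(labels_by_annotator), 2):
        shared = labels_by_annotator[a].keys() & labels_by_annotator[b].keys()
        if shared:
            agree = sum(labels_by_annotator[a][i] == labels_by_annotator[b][i]
                        for i in shared)
            matrix[(a, b)] = agree / len(shared)
    return matrix

labels = {
    "alice": {"t1": "pos", "t2": "neg", "t3": "pos"},
    "bob":   {"t1": "pos", "t2": "neg", "t3": "pos"},
    "carol": {"t1": "neg", "t2": "pos", "t3": "pos"},  # low-agreement outlier
}
print(agreement_matrix(labels))
```

Here carol agrees with the others only a third of the time; whether that reflects bias, fatigue or a genuinely ambiguous dataset is exactly the question such a matrix lets a team investigate.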

The open-source Label Studio technology is intended to be used by individuals and small groups, while the commercial project provides enterprise features for larger teams around security, collaboration and scalability.

“With open source, we focus on the user and we are trying to make the individual user’s life as easy as possible from a labeling perspective,” Malyuk said. “With the enterprise, we focus on the organization and whatever the business needs are.”



How enterprises can realize the full potential of their data for AI (VB On Demand)

Presented by Wizeline

Many enterprises are facing barriers to leveraging their data, and making AI a company-wide reality. In this VB On-Demand event, industry experts dig into how enterprises can unlock all the potential of data to tackle complex business problems and more.

Watch on demand now!

Across industries and regions, realizing the promise of AI can mean very different things for every enterprise — but for every business, it starts with unlocking the potential of the wealth of data they’re sitting on. But according to Hayde Martinez, data technology program lead at Wizeline, the obstacles to unlocking data have less to do with actually implementing AI, and more with the AI culture inside a company. That means companies are stalled at step zero — defining objectives and goals.

For a company just beginning to realize the benefits of data, AI efforts are usually an isolated undertaking, managed by an isolated team, with goals that aren’t aligned with the overall company vision. Larger companies further down the data and AI road also have to break down silos, so that all departments and teams are aligned and efforts aren’t duplicated or at cross purposes.

“In order to be aligned, you need to define that strategy, define priorities, define the needs of the business,” Martinez says. “Some of the biggest obstacles right now are just being sure of what you’re going to do and how you’re going to do it, rather than the implementation itself, as well as bringing everyone on board with AI efforts.”

The steps in the data process

Data has to go through a number of steps in order to be leveraged: data extraction, cleansing, data processing, creating predictive models, creating new experiments and then finally, creating data visualization. But step zero is still always defining the goals and objectives, which is what drives the whole process.
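The steps above can be sketched as a pipeline of plain functions. The stage bodies below are placeholder assumptions (a toy average stands in for the predictive model), but the ordering follows the article, with objective-setting happening before any stage runs.

```python
def extract():
    """Pull raw records from a source (hard-coded stand-in)."""
    return [" 42 ", "7", None, "19 "]

def cleanse(raw):
    """Drop missing records and normalize formatting."""
    return [r.strip() for r in raw if r is not None]

def process(clean):
    """Convert cleansed records into model-ready features."""
    return [int(v) for v in clean]

def model(features):
    """Toy 'predictive model': the mean of the features."""
    return sum(features) / len(features)

def run_pipeline():
    # Step zero -- defining objectives -- happens before any code runs.
    raw = extract()
    clean = cleanse(raw)
    features = process(clean)
    return model(features)

print(round(run_pipeline(), 2))  # 22.67
```

Keeping each stage as a separate function is what makes the iteration Martinez recommends cheap: a failed hypothesis usually means swapping one stage, not rebuilding the pipeline.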

One of the first considerations is to start with a discovery workshop — soliciting input from all stakeholders that will use this information or are asking for predictive models, or anyone that has a weighted opinion on the business. To ensure that the project goes smoothly, don’t prioritize hard skills over soft skills. Stakeholders are often not data scientists or machine learning engineers; they might not even have a technical background.

“You have to be able, as a team or as an individual, to make others trust your data and your predictions,” she explains. “Even though your model was amazing and you used a state-of-the-art algorithm, if you’re not able to demonstrate that, your stakeholders will not see the benefit of the data, and that work can be thrown in the trash.”

Making sure that you clearly understand the objectives and goals is key here, as well as ongoing communication. Keep stakeholders in the loop and go back to them to reaffirm your direction, and ask questions to continue to adjust and refine. That helps ensure that when you deliver your predictive model or your AI promise, it will be strongly aligned to what they’re expecting.

Another consideration in the data process is iteration, trying new things and building from there, or taking a new tack if something doesn’t work, but never taking too long to decide what you’ll do next.

“It’s called data science because it’s a science, and follows the scientific method,” Martinez says. “The scientific method is building hypotheses and proving them. If your hypothesis was not proven, then try another way to prove it. If then that’s not possible, then create another hypothesis. Just iterate.”

Common step zero mistakes

Often companies stepping into AI waters look first at similar companies to mimic their efforts, but that can actually slow down or even stop an AI project. Business problems are as unique as fingerprints, and there are myriad ways to tackle any one issue with machine learning.

Another common issue is going immediately to hiring a data scientist with the expectation that it’s one and done — that they’ll be able to not only handle the entire process from extracting data and cleaning data to defining objectives, graphic visualization, predictive models, and so on, but can immediately jump into making AI happen. That’s just not realistic.

First, a centralized data repository needs to be created, not only to make it easier to build predictive models, but also to break down silos so that any team can access the data it needs.

Data scientists and data engineers also cannot work alone, separately from the rest of the company — the best way to take advantage of data is to be familiar with its business context, and the business itself.

“If you understand the business, then every decision, every change, every process, every modification of your data will be aligned,” she says. “This is a multidisciplinary work. You need to involve strong business understanding along with UI/UX, legal, ethics and other disciplines. The more diverse, the more multidisciplinary the team is, the better the predictive model can be.”

To learn more about how enterprises can fully leverage their data to launch AI with real ROI, how to choose the right tools for every step of the data process and more, don’t miss this VB On Demand event.

Start streaming now!

You'll learn:

  • How enterprises are leveraging AI and machine learning, NLP, RPA and more
  • Defining and implementing an enterprise data strategy
  • Breaking down silos, assembling the right teams and increasing collaboration
  • Identifying data and AI efforts across the company
  • The implications of relying on legacy stacks and how to get buy-in for change

Speakers:

  • Paula Martinez, CEO and Co-Founder, Marvik
  • Hayde Martinez, Data Technology Program Lead, Wizeline
  • Victor Dey, Tech Editor, VentureBeat (moderator)

Repost: Original Source and Author Link


Why you need a data champion to score AI wins

Were you unable to attend Transform 2022? Check out all of the summit sessions in our on-demand library now! Watch here.

Making artificial intelligence (AI) work with humans requires having an internal data champion to help overcome fears and create a safe environment. That was the advice offered at Digital Procurement World in Amsterdam today in a session focused on combining humans and AI to help procurement become a top value driver in modern business. The biggest pain points for panelists? Working through overwhelming amounts of data and dealing with employee concerns that AI will take their jobs.

“I need to be a champion for the digital journey,’’ said Ralf Peters, vice president of procurement at Coca-Cola Europacific Partners. “Once you have that established, everything else will fall into place, because then you can figure out how many resources to have dedicated.” Having a digital champion allows you to build an AI strategy, Peters added.

It’s also important for leadership to show how people and AI can align. Then they can help employees overcome the fear of new technologies that will change the way they work, the panelists said.

How to take advantage of a data-challenged world

In a procurement context, the trick to dealing effectively with data is creating an environment that allows every procurement category manager and team to move from “I need to use that” to “I want to use that” and see the benefits of that approach, Peters said.


MetaBeat 2022

MetaBeat will bring together thought leaders to give guidance on how metaverse technology will transform the way all industries communicate and do business on October 4 in San Francisco, CA.

Register Here

Creating a safe, consistent environment will help category managers feel confident about using AI-driven data insights from their systems and providing feedback. “Every insight created by a category manager matters,’’ Peters said. “There’s a value for me and I can choose over time to use it and trust the tool because it helps me fulfill an overarching task.”

A safe environment means you don’t have to present feedback as the answer to a problem but, rather, as a potential solution, added moderator Omer Abdullah, cofounder of The Smart Cube, and author of “Risk & Your Supply Chain: Preparing For The Next Global Crisis.”

This gives people the opportunity to “play with tools and present something, and when a situation happens, you’re much more confident in how to solve it,’’ Abdullah said.  

Jurriaan Lombaers, senior vice president and chief procurement officer of Air France-KLM said that leaders must make changing their organization and growing their employees’ skills a priority. 

By default, Lombaers said, every buyer needs to be data-driven and focused on sustainability, since that is a key change occurring. This takes time, but he advised the audience to “find the people who love it” and “nurture them,’’ because part of having a safe environment is focusing on the champions and then the vast majority will follow.

Organizations also need to have data scientists, analysts and people who understand digital procurement build new services into the organization — something Air France-KLM is currently doing “as we speak,’’ Lombaers said.

How to transform people

Transforming people to be more data- and analytically driven also needs to be top of mind. 

A safe environment plays a key role here, too, but there’s also the challenge of transitioning from creating reports to creating reports with insights, said panelist Paula Martinez, chief procurement officer at Novartis. This requires figuring out the right questions to ask the data and knowing the business insights you want to achieve. She advised the audience to interact with stakeholders and find out what they want to solve. 

“Data can be cleaned. It’s coming up with questions that is the biggest struggle for organizations,’’ Martinez said. When analytic capabilities are built, the procurement organization should “continue using that new muscle,’’ she said, adding that this is a learning curve everyone has to go through. It’s good to do a lot of experimentation around what is working and what isn’t, she noted.

Abdullah called this an “interesting foundational point,’’ saying that it‘s one thing to understand the potential of what technology can unleash, “but it’s a little scary because it exposes how people need to change their thinking and be more strategic and less tactical.” Internal stakeholders need to know how to translate that, he said.

In response to a question about how organizations should inventory people’s skill sets to determine who is ready for transformation, Lombaers said, “you learn by doing. Get on with it and allow for failures.”

Peters recalled years ago being part of the team responsible for implementing handheld devices for drivers. “I had to put technology into the hands of guys who carry pallets of beverages from forklifts to trucks,” he said. But the process proved quite simple: Peters asked the drivers whether they were able to use a mobile phone in their private lives. Demystifying the technology took the bias out, he said.

Evolving your organization with data champions

Becoming attuned to the digital world doesn’t require anyone to learn to code, Martinez stressed. What is required is developing a digital mindset and understanding and owning your data, she said. That means partnering with a data analytics team to clean it and make it work for you by thinking of value cases and questions. 

“Data is an asset, that’s the fuel you have to thrive in your job,’’ Martinez said. “Understand the basics and own it and understand … what is the data subset being used” as well as the timeframe. “All those things are enabling you to have a digital mindset, which will enable you to change your attitude and behaviors.”

She added that people must be open to change. “Every industry, every function, every role is affected by digital. If you don’t have a basic understanding around what is an algorithm and new solutions and the intelligence embedded there … then you won’t be able to thrive.”

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.



Your data may be in danger if you use a spellchecker

If you like to be thorough and use an advanced spellchecker, we have some bad news — your personal information could be in danger.

Using the extended spellcheck in Google Chrome and Microsoft Edge transmits everything you input in order for it to be checked. Unfortunately, this includes information that should be strictly encrypted, such as passwords.

Chrome & Edge Enhanced Spellcheck Features Expose PII, Even Your Passwords

This issue, first reported by JavaScript security firm otto-js, was discovered accidentally while the company was testing its script-behavior detection. Josh Summitt, co-founder and CTO of otto-js, explains that pretty much everything you enter in form fields with the advanced spellchecker enabled is later transmitted to Google and Microsoft.

“If you click on ‘show password,’ the enhanced spellcheck even sends your password, essentially spell-jacking your data,” said otto-js in its report. “Some of the largest websites in the world have exposure to sending Google and Microsoft sensitive user PII [personally identifiable information], including username, email, and passwords, when users are logging in or filling out forms. An even more significant concern for companies is the exposure this presents to the company’s enterprise credentials to internal assets like databases and cloud infrastructure.”

Many people use “show password” to make sure they haven’t made a typo, so potentially, a lot of passwords could be at risk here. Bleeping Computer tested this further and found that entering your username and password on CNN and Facebook sent the data to Google, while Bank of America and Verizon sent only the usernames.

Both Microsoft Edge and Google Chrome come with built-in spellcheckers that are pretty basic. These tools don’t require any further verification — what you input stays within your browser. However, if you’re using Chrome’s Enhanced Spellcheck or Microsoft’s Editor Spelling & Grammar Checker, everything you type in the browser is then sent to Google and Microsoft respectively.

That, in itself, is not unexpected. When you enable the enhanced spellchecker in Chrome, the browser tells you that the “text that you type in the browser is sent to Google.” However, many people would expect that this excludes PII that is often submitted in forms.

The severity of this depends on the websites you visit. Some form data may include Social Security numbers and Social Insurance numbers, your full name, address, and payment information. Login credentials also fall under this category.

It’s understandable that your inputs are sent outside of the browser in order to utilize the improved spellchecker, but it’s hard not to question how secure this is when personal data also receives that same treatment.

How to stay safe


If you’d rather not have your personal data transmitted to Microsoft and Google, you should stop using the advanced spellchecker for the time being. This means disabling the feature in your Chrome settings. Simply copy and paste this into your browser’s address bar: chrome://settings/?search=Enhanced+Spell+Check.

For Microsoft Edge, the advanced spellchecker comes in the form of a browser add-on, so simply right-click the icon of that extension in your browser and then tap on Remove from Microsoft Edge.

Google has given assurances that it doesn’t attach any user identity to the data it processes for the spellchecker. However, it will work on excluding passwords from this entirely. Microsoft said it will investigate the problem, but hasn’t followed up with Bleeping Computer beyond that yet. Microsoft currently has another problem with Edge: hackers are using it to run a malvertising campaign.




How to sort your data in Google Sheets

Google Sheets is a remarkably powerful and convenient tool for collecting and analyzing data, but sometimes it can be hard to understand what that raw data means. One of the best ways to see the big picture is to sort the data, bringing the most important information to the top and showing the largest or smallest values relative to the rest.

It’s not surprising, then, that Google Sheets has powerful sorting features. Sorting data in Google Sheets is easy, but a few concepts need to be clear to achieve the best result and glean the most valuable insights. With a few tips, you’ll quickly master sorting by one or more columns and be able to bring up different views for a better understanding of what the data means.

How to quickly sort a simple sheet

Sorting a Google sheet by a single column is quick and easy. For example, with a table of foods that are good sources of protein, you might want to sort by name, serving size, or amount of protein. Here’s how to do that.

Step 1: Move the mouse pointer over the column that you want to sort by and select the Downward arrow that appears to open a menu of options.

Step 2: A little over halfway down the menu are the sort options. Select Sort sheet A to Z to sort the entire sheet so the chosen text column is in alphabetical order. Sort sheet Z to A places that column in reverse alphabetical order. 

There are two text sort options.

Step 3: To sort a numerical column, follow the same procedure. Choose Sort sheet A to Z for a low-to-high order of numerical values or Sort sheet Z to A to see the highest values at the top.

The same sort options are available for numbers.

Step 4: No matter which column is sorted, the information in all other columns moves along with it, so each row stays intact. In our example, a cup of walnuts continues to show as 30 grams, and a 2.5-ounce steak has 22 grams of protein.

Sorting a sheet by a column keeps row data aligned.

Step 5: If a numerical column shows values in more than one unit, the sort will fail to give the expected result. In our example sheet of protein sources, the serving-size column includes cups, ounces, tablespoons, and slices. The built-in sort feature ignores units, so 1 cup will incorrectly be treated as if it’s smaller than 3 ounces. The best solution is to convert the data to use only one unit per column.

Google Sheets doesn't consider units when sorting.
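The conversion fix described in step 5 can be sketched outside of Sheets; the items, amounts and conversion factors below are rough, illustrative assumptions, not values from the example sheet.

```python
# Approximate gram equivalents for the mixed units; factors are illustrative.
TO_GRAMS = {"cup": 240, "ounce": 28, "tablespoon": 15}

# Hypothetical serving sizes: (food, amount, unit).
servings = [
    ("walnuts", 1, "cup"),
    ("steak", 2.5, "ounce"),
    ("peanut butter", 2, "tablespoon"),
]

def grams(item):
    """Convert a (food, amount, unit) row to a single comparable unit."""
    _, amount, unit = item
    return amount * TO_GRAMS[unit]

# Sorting on the raw amount would wrongly rank 1 cup below 2.5 ounces;
# converting everything to grams first gives the true size order.
by_size = sorted(servings, key=grams)
print(by_size[0][0])  # 'peanut butter' (30 g is the smallest serving)
```

The same principle applies in the sheet itself: add a helper column holding a single unit, then sort on that column.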

Sorting a sheet with headers

The quick-sort method described above is convenient; however, it doesn’t work as expected when the sheet has headers. When you’re sorting the entire sheet, all rows are included by default, which can mix headings and units with the data. Google Sheets can lock cells so they can’t be changed, but it’s also possible to lock one or more rows in place at the top so headers don’t get confused with your information.

Step 1: To freeze the position of header rows, open the View menu at the top of the Google Sheets page (not your browser’s view menu). Then, select the Freeze option and choose 1 row or 2 rows, depending on the number of header rows in your sheet.

Freeze one or two rows to lock the header rows.

Step 2: If there are more headers, it’s possible to freeze more rows. Simply select a cell in the lowest header row, then choose Freeze, then Up to row X, in the View menu, where X is the row number.

Google Sheets can freeze more header rows.

Step 3: After freezing headers, a thick gray line appears to show where the split occurs. A sheet can then be sorted using the column menu’s Sort sheet A to Z or Sort sheet Z to A option. This leaves the headers in place at the top of your Google sheet while rearranging your data into a more useful table.

The column menu can sort while keeping frozen headers in place.

Sort by more than one column

A single-column sort is quick and handy, but often, there’s more than one variable to consider when comparing figures. In our example, you might be most interested in which has the least fat but also want to get more protein. That’s easy to do with a sort range.

Step 1: Select the cells that you want to sort, including one header row. This can be done quickly by choosing the top-left cell, then holding down Control and Shift (Command and Shift on a Mac) and pressing the Right arrow to select the full width of the table.

Use Control+Shift+Right arrow to select a horizontal range.

Step 2: The same can be done for the full height by holding Control and Shift and pressing the Down arrow. Now the entire table of data will be selected.

Use Control+Shift+Right arrow, then Control+Shift+Down arrow, to select a range.

Step 3: From the Google Sheets Data menu, choose Sort range > Advanced range sorting options.

Google Sheets advanced sorting is in the Data menu

Step 4: A window will open that lets you pick multiple sort columns. If your sort range includes a header row, check the box beside Data has header row.

You can see the names of header rows if that box is checked

Step 5: The Sort by field will now show your header column names instead of making you choose columns by using their letter designation. Pick the primary sort column — for example, Fat (g) — and make sure A-Z is selected for sort order.

With header row checked, the names show up in the Sort by field.

Step 6: Choose the Add another sort column button to have a secondary sort, such as protein. It’s possible to add as many sort columns as you want before selecting the Sort button to see the results.

You can add more sort columns before sorting your data.
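As a rough illustration of what the two-key sort in steps 5 and 6 produces, here is the same logic in Python; the rows and nutrition values are made up, not taken from the example sheet.

```python
# Illustrative nutrition rows: (name, fat_g, protein_g); values are invented.
rows = [
    ("walnuts", 18, 4),
    ("steak", 10, 22),
    ("chicken breast", 3, 26),
    ("tofu", 5, 10),
    ("egg", 5, 6),
]

# Primary key: fat ascending (the A-Z sort on the Fat column);
# secondary key: protein descending (the Z-A sort on the Protein column).
# Negating the protein value reverses its order within each fat tie.
ranked = sorted(rows, key=lambda r: (r[1], -r[2]))

for name, fat, protein in ranked:
    print(f"{name}: {fat} g fat, {protein} g protein")
```

Ties on the primary column (tofu and egg at 5 g of fat) are broken by the secondary column, exactly as in the advanced range sort.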

Save a data range

Instead of selecting a range every time you want to re-sort it, you can save the table as a named range.

Step 1: Select the range of data that you want to save for easier access, then choose Named ranges from the Google Sheets Data menu.

Named ranges can be selected faster.

Step 2: A panel will open on the right, and you can type a name for this range.

Type a name for the selected range.

Step 3: The Named ranges panel will remain open, and you can select the entire range again by choosing it from this panel.

Choose a named range to select all of its data.

Saving a sort view

It’s also possible to save more than one advanced sort operation for easy access in the future. This lets you switch between different views when making a presentation or when you need to analyze information from different angles. This is possible by creating a Google Sheets filter view.

Step 1: Select the range of data you want to sort, then choose Create new filter view from the Filter views submenu in the Google Sheets Data menu.

Create a filter view to save advanced sorts.

Step 2: A new bar will appear at the top of the sheet with filter view options. Choose the name field and type a name for this filter view, such as “Low Fat/High Protein.”

Filter views can be named so they are easy to remember

Step 3: The name of each header column will now include a sort menu on the right side. Using our example data, choose the Protein sort menu and then select Sort Z-A to place the highest values at the top.

Use the header row's sort menu

Step 4: Repeat this process on the Fat sort menu but choose Sort A-Z to show the lowest values first. This new filter view will show a sort with low fat being the top priority and high protein as a secondary consideration.

Multiple columns can be sorted with a filter view.

Step 5: You can create more filter views in the same way and use as many sort columns as needed. To load a filter view, open the Data menu, select Filter views, then select the view that you’d like to see.

Filter views can be loaded in the Data menu.

How to share a sorted Google Sheet

Google Sheets is readily available to anyone with an internet connection, making this a great tool for sharing information with others.

Step 1: Sorted spreadsheets can be shared by selecting the big green Share button in the upper-right corner.

The Share button makes it easy to collaborate with others.

Step 2: It’s also possible to download a Google Sheet to share over email or print it to send via regular mail.

You can also share a Google sheet via email.

Google Sheets provides several ways to sort data, and this can make a big difference when trying to analyze a complicated data set. For more information, check out our complete beginner’s guide that shows how to use Google Sheets.

We also have a guide that explains how to make graphs and charts in Google Sheets, a great way to see data in an easy-to-digest visual form.




Transforming the supply chain with unified data management


Many organizations lack the technology and architecture required to automate decision-making and create intelligent responses across the supply chain, as the past few years’ supply chain disruptions have shown. However, these critical breakdowns can no longer be blamed solely on the COVID-19 pandemic. Rather, they can be blamed on businesses’ slow adoption of automated supply chain decision-making, which has resulted in inventory backlogs, price inflation, shortages and more. Further contributing to backlogs is continued single sourcing from one region rather than leveraging distributed regional capabilities. These factors have added to the complexity of supply chain systems, and the pandemic brought the existing critical breakdowns, and the costs of lacking automation, into stark relief.

This brings us to today and how this inability to effectively manage data streams is proving debilitating to many companies. In a Gartner study of more than 400 organizations, 84% of chief supply chain officers reported that they could service their customers better with data-driven insights. An equal number of respondents stated that they needed more accurate data in order to predict future conditions and make better decisions.

The challenge here is that companies are managing their supply chains with a series of disparate and disconnected tools and datasets. Instead of residing in a centralized location, critical information may be scattered across the supply chain, kept in functional siloes and tied to individual technology solutions and operating teams, limiting transparency and optimization. 

Ultimately, this impacts the overall results of supply chain digitalization. Human analysts, as well as advanced technology engines, may have trouble accessing data that is relevant, current and reliable. Data may be segregated across functions, resulting in a lack of end-to-end transparency. Lag times can significantly impact an organization’s ability to sense and respond immediately to disruptions or new information.



End-to-end connectivity across the supply chain

The supply and demand disruptions in 2020 and 2021 clearly demonstrated the need for digital transformation and end-to-end visibility and orchestration. And the availability of new digital capabilities like artificial intelligence (AI), machine learning (ML), data science and advanced analytics has been nothing short of a game-changer for connecting the world’s supply chains. To keep pace with manufacturers’ and retailers’ demand surges, supply chains must evolve to become real-time, adaptive ecosystems.

Whenever an exception or a disruptive event occurs anywhere in the ecosystem, it can be recognized and addressed autonomously in a synchronized and collaborative manner. No matter how geographically distributed the value network is and how many suppliers it includes, today even the most complex global supply chain can be digitally connected via intelligent solutions in near real-time. 

The advanced technology that enables near real-time monitoring and communication depends on data for its success. Across the value chain, each supplier is digitally contributing information regarding costs, timing, inventory levels, availability and other key metrics — offering the opportunity for key partners to gain and offer feedback in real time, thus gaining key insights into the evolution of demand. 

But that is just the beginning. Today’s forecasting, business planning and execution optimization engines also depend on enormous volumes of third-party data — including news, weather and even social media — that impact end-to-end supply chain performance. Enabled by new, advanced capabilities such as AI, ML and predictive analytics, these new cognitive engines are incredibly powerful and accurate at translating huge amounts of raw data into strategic, actionable recommendations, often autonomously, allowing supply chain teams to shift focus from firefighting to strategic improvements.  

Leveraging partners to build a supply chain ecosystem 

Digital platforms can bring together these disparate data sources and functions to enable faster decisions and greater collaboration. Unified data management makes companies more agile and flexible in responding to changes. Through a best-of-breed network of partners and internal developers, companies can share data and ideas across teams, enabling real-time response and cognitive planning across stakeholders. However, to deliver a synchronized response across the global supply network, traditional walls will have to be overcome with advanced technology that supports real-time, end-to-end orchestration. 

Breaking down these traditional walls requires a partner- and developer-friendly platform, fully integrated across the network, to help democratize data access, streamline data management and encourage self-learning and continuous improvement. Through a digital command center, information can be shared across the supply chain to generate cognitive insights, identify disruptions and opportunities, and recommend strategic actions. These partnerships can transform data into a competitive edge by unifying the entire supply chain around a holistic, truly integrated technology ecosystem. 

And as data is aggregated and made accessible to every stakeholder, companies can make intelligent, strategic decisions based on a single set of real-time insights. The supply chain is a robust ecosystem fed by data, and it requires scalability, security, data integrity, real-time responsiveness and exceptional processing speeds. Think about the massive amounts of data from customers, partners and suppliers consumed by companies. Millions of bits of information inundate every network touchpoint. Without collaboration, users will find themselves siloed by their disparate data-driven workflows, making decisions based on slow, incomplete and disconnected data. 

To truly harness this vast amount of data, companies should be looking to solutions that support self-learning. Democratized supply chains are not created overnight. They require every partner and function to have equal access to data and optimization engines that take into consideration every outcome and priority — ingesting data and making decisions more rapidly than ever before. Such ecosystems result in supply chains that are strategic, functional and built to withstand today’s fluctuations and obstacles. 

Jim Beveridge is Senior Director of Product Marketing at Blue Yonder.


Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.

If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.

You might even consider contributing an article of your own!

Read More From DataDecisionMakers



What are data scientists’ biggest concerns? The 2022 State of Data Science report has the answers


Data science is a fast-growing field, as organizations of all sizes embrace artificial intelligence (AI) and machine learning (ML), and along with that growth has come no shortage of concerns.

The 2022 State of Data Science report, released today by data science platform vendor Anaconda, identifies key trends and concerns for data scientists and the organizations that employ them. Among the trends identified by Anaconda is the fact that the open-source Python programming language continues to dominate the data science landscape. 

Among the key concerns identified in the report were the barriers to adoption of data science overall.

“One area that did surprise me was that 2/3 of respondents felt that the biggest barrier to successful enterprise adoption of data science is insufficient investment in data engineering and tooling to enable production of good models,” Peter Wang, Anaconda CEO and cofounder, told VentureBeat. “We’ve always known that data science and machine learning can suffer from poor models and inputs, but it was interesting to see our respondents rank this even higher than the talent/headcount gap.”



AI bias in data science is far from a solved issue

The issue of AI bias is one that is well known for data science. What isn’t as well known is exactly what organizations are actually doing to combat the issue.

Last year, Anaconda’s 2021 State of Data Science report found that 40% of organizations were planning or already taking steps to address the issue of bias. Anaconda didn’t ask the same question this year, opting instead to take a different approach.

“Instead of asking if organizations were planning to address bias, we wanted to look at the specific steps organizations are now taking to ensure fairness and mitigate bias,” Wang said. “We realized from our findings last year that organizations had plans in the works to address this, so for 2022, we wanted to look into what actions they took, if any, and where their priorities are.”

As part of AI bias prevention efforts, 31% of respondents noted that they evaluate data collection methods according to internally set standards for fairness. In contrast, 24% noted that they do not have standards for fairness and bias mitigation in datasets and models.

AI explainability is a foundational element for helping to identify and prevent bias. When asked what tools are used for AI explainability, 35% of respondents noted that their organizations perform a series of controlled tests to assess model interpretability, while 24% do not have any measures or tools to ensure model explainability.
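The report doesn't specify which controlled tests respondents run, but a common interpretability check of this kind is permutation importance: shuffle one feature at a time and measure how much the model's test score degrades. A minimal sketch with scikit-learn, using synthetic data purely for illustration:

```python
# Illustrative interpretability test: permutation importance measures how
# much shuffling each feature degrades a trained model's test accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset; a real audit would use production data.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and record the mean drop in test accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, mean_drop in enumerate(result.importances_mean):
    print(f"feature {i}: accuracy drop {mean_drop:.3f}")
```

Features whose shuffling barely moves the score contribute little to the model's decisions, which gives auditors a concrete, model-agnostic starting point for explaining predictions.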

“While each response measure has less than 50% of these efforts in place, the results here tell us that organizations are taking a varied approach to mitigating bias,” Wang said. “Ultimately, organizations are taking action, they’re just early in their journey of addressing bias.”

How data scientists spend their time

Data scientists have a number of different tasks they need to do as part of their jobs.

While actually deploying models is the desired end goal, that’s not where data scientists actually spend most of their time. In fact, the study found that data scientists only spend 9% of their time on deploying models. Similarly, respondents reported they only spend 9% of their time on model selection.

The biggest time sink is data preparation and cleansing, which accounts for 38% of the time.
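The report doesn't break down what that preparation work looks like, but in the Python ecosystem it typically means tasks like the following pandas sketch (column names and values are invented for illustration): deduplication, type coercion and missing-value handling.

```python
# Illustrative data-preparation chores that dominate data scientists' time:
# deduplication, type coercion, and missing-value handling with pandas.
import pandas as pd

raw = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "age": ["34", "34", None, "29", "29"],      # ages arrive as strings
    "country": ["US", "US", "DE", None, None],  # some countries missing
})

clean = (
    raw.drop_duplicates()  # remove exact duplicate rows
       # coerce age strings to numbers; unparseable values become NaN
       .assign(age=lambda d: pd.to_numeric(d["age"], errors="coerce"))
       # flag missing countries explicitly rather than dropping rows
       .assign(country=lambda d: d["country"].fillna("unknown"))
)
print(clean)
```

Each of these steps is quick to write but slow to get right on real data, which is why cleansing dwarfs model selection and deployment in the time budget.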

The love and fear relationship with open source

The report also asked data scientists about how they use and view open-source software.

Eighty-seven percent of respondents said their organizations allow the use of open-source software. Yet despite that widespread use, 54% of respondents said they are worried about open-source security.

“Today, open source is embedded across nearly every piece of software and technology, and it’s not just because it’s cheaper in the long run,” Wang said. “The innovation occurring around AI, machine learning and data science is all happening within the open-source ecosystem at a speed that can’t be matched by a closed system.”

That said, Wang acknowledged that it is prudent for organizations to be aware of the risks involved with open source and to develop a plan for mitigating any potential vulnerabilities.

“One of the benefits of open source is that patches and solutions are built out in the open instead of behind closed doors,” he said.
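The report doesn't prescribe a mitigation workflow, but one common first step is simply knowing what is installed: an inventory of packages and versions that can then be checked against vulnerability databases with tools such as pip-audit. A minimal sketch using only the standard library:

```python
# Assumed first step of a vulnerability-mitigation plan (not from the
# report): inventory installed Python packages and their versions so they
# can be matched against known-CVE databases by an audit tool.
from importlib.metadata import distributions

inventory = sorted(
    (dist.metadata["Name"], dist.version) for dist in distributions()
)
for name, version in inventory:
    # Requirements-style output, suitable for feeding to an audit tool.
    print(f"{name}=={version}")
```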

The Anaconda report was based on a survey of 3,493 respondents from 133 countries.
