Microsoft open-sources SynapseML for developing AI pipelines

Microsoft today announced the release of SynapseML (previously MMLSpark), an open source library designed to simplify the creation of machine learning pipelines. With SynapseML, developers can build “scalable and intelligent” systems for solving challenges across domains, including text analytics, translation, and speech processing, Microsoft says.

“Over the past five years, we have worked to improve and stabilize the SynapseML library for production workloads. Developers who use Azure Synapse Analytics will be pleased to learn that SynapseML is now generally available on this service with enterprise support,” Microsoft software engineer Mark Hamilton wrote in a blog post.

Scaling up AI

Building machine learning pipelines can be difficult even for the most seasoned developer. For starters, composing tools from different ecosystems requires considerable code, and many frameworks aren’t designed with server clusters in mind.

Despite this, there’s increasing pressure on data science teams to get more machine learning models into use. While AI adoption and analytics continue to rise, an estimated 87% of data science projects never make it to production. According to Algorithmia’s recent survey, 22% of companies take between one and three months to deploy a model so it can deliver business value, while 18% take over three months.

SynapseML aims to address the challenge by unifying existing machine learning frameworks and Microsoft-developed algorithms behind a single API usable from Python, R, Scala, and Java. SynapseML lets developers combine frameworks for use cases that require more than one, such as search engine creation, while training and evaluating models on resizable clusters of computers.

As Microsoft explains on the project’s website, SynapseML expands Apache Spark, the open source engine for large-scale data processing, in several new directions: “[The tools in SynapseML] allow users to craft powerful and highly-scalable models that span multiple [machine learning] ecosystems. SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models and use their Spark clusters for massive networking workflows.”
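To make this concrete, here is a minimal sketch of what a SynapseML pipeline can look like from PySpark, training a distributed LightGBM classifier as a standard SparkML stage. The Spark package coordinates and module paths shown are illustrative and may differ by SynapseML version.

```python
# Minimal SynapseML sketch: a LightGBM classifier used as an ordinary SparkML
# stage. Package coordinates below are hypothetical; check the SynapseML docs
# for the version matching your Spark cluster.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMClassifier  # SynapseML's LightGBM wrapper

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.4")
    .getOrCreate()
)

# Toy training data: two numeric features and a binary label.
df = spark.createDataFrame(
    [(0.1, 1.2, 0), (0.4, 0.8, 0), (2.3, 3.1, 1), (2.9, 2.7, 1)],
    ["f1", "f2", "label"],
)
features = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

# fit/transform follow the usual SparkML pattern, so the stage scales out
# across the cluster like any other pipeline component.
model = LightGBMClassifier(labelCol="label", featuresCol="features").fit(features)
model.transform(features).select("label", "prediction").show()
```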

SynapseML also enables developers to use models from different machine learning ecosystems through the Open Neural Network Exchange (ONNX), a model format and runtime co-developed by Microsoft and Facebook. With the integration, developers can execute a variety of classical machine learning and deep learning models with only a few lines of code.
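For readers unfamiliar with ONNX, the snippet below shows what scoring an exported model looks like with the generic onnxruntime package, rather than SynapseML's Spark-side integration; the model path and input shape are placeholders.

```python
# Generic ONNX scoring with onnxruntime (not SynapseML's Spark integration).
# "model.onnx" and the (4, 10) input shape are placeholders for illustration.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")       # load an exported model
input_name = session.get_inputs()[0].name          # name of the first graph input
batch = np.random.rand(4, 10).astype(np.float32)   # toy batch of 4 examples
outputs = session.run(None, {input_name: batch})   # run inference
print(outputs[0].shape)
```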

Beyond this, SynapseML introduces new algorithms for personalized recommendation and contextual-bandit reinforcement learning using the Vowpal Wabbit framework, an open source machine learning library originally developed at Yahoo Research. The API also includes capabilities for “unsupervised responsible AI”: tools for understanding dataset imbalance (e.g., whether “sensitive” dataset features like race or gender are over- or under-represented) without the need for labeled training data, plus explainability dashboards that show why models make certain predictions and how to improve the training datasets.

Where labeled datasets don’t exist, unsupervised learning — and the closely related self-supervised learning — can help fill gaps in domain knowledge. For example, Facebook’s recently announced SEER, a self-supervised model, trained on a billion images to achieve state-of-the-art results on a range of computer vision benchmarks. Unfortunately, unsupervised learning doesn’t eliminate the potential for bias or flaws in a system’s predictions. Some experts theorize that removing these biases might require training unsupervised models on additional, smaller datasets curated to “unteach” specific biases.

“Our goal is to free developers from the hassle of worrying about distributed implementation details and enable them to deploy [models] into a variety of databases, clusters, and languages without needing to change their code,” Hamilton said.

Intel open-sources AI-powered tool to spot bugs in code

Intel today open-sourced ControlFlag, a tool that uses machine learning to detect problems in computer code — ideally to reduce the time required to debug apps and software. In tests, the company’s machine programming research team says that ControlFlag has found hundreds of defects in proprietary, “production-quality” software, demonstrating its usefulness.

“Last year, ControlFlag identified a code anomaly in Client URL (cURL), a computer software project transferring data using various network protocols over one billion times a day,” Intel principal AI scientist Justin Gottschlich wrote in a blog post on LinkedIn. “Most recently, ControlFlag achieved state-of-the-art results by identifying hundreds of latent defects related to memory and potential system crash bugs in proprietary production-level software. In addition, ControlFlag found dozens of novel anomalies on several high-quality open-source software repositories.”

The demand for quality code draws an ever-growing number of aspiring programmers to the profession. After years of study, they learn to translate abstract concepts into concrete, executable programs — but most spend the majority of their working hours not programming. A recent study estimated that debugging code cost the IT industry $2 trillion in software development costs in 2020, with roughly 50% of IT budgets going to debugging.

ControlFlag, which works with any programming language containing control structures (i.e., blocks of code that specify the flow of control in a program), aims to cut down on debugging work by leveraging unsupervised learning. In unsupervised learning, an algorithm is presented with data for which no predefined categories or labels exist; the system — ControlFlag, in this case — must learn from the data’s inherent structure in order to classify it.

ControlFlag continually learns from unlabeled source code, “evolving” to make itself better as new data is introduced. While it can’t yet automatically mitigate the programming defects it finds, the tool provides suggestions for potential corrections to developers, according to Gottschlich.
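To illustrate the underlying idea rather than ControlFlag's actual implementation, the toy sketch below learns which normalized `if` patterns are common in an unlabeled corpus and flags rare variants, such as an assignment written where a comparison was probably intended.

```python
# Toy illustration of pattern-anomaly detection in code: count normalized
# `if (...)` patterns across an unlabeled corpus and flag rare ones.
# This is NOT ControlFlag's implementation, only the general concept.
import re
from collections import Counter

def normalize(condition: str) -> str:
    """Replace identifiers and literals with placeholders, keep operators."""
    cond = re.sub(r"\b[A-Za-z_]\w*\b", "VAR", condition)  # identifiers -> VAR
    cond = re.sub(r"\b\d+\b", "NUM", cond)                # integer literals -> NUM
    return re.sub(r"\s+", "", cond)

def learn_patterns(snippets):
    """Count normalized `if` conditions across the corpus (no labels needed)."""
    counts = Counter()
    for code in snippets:
        for cond in re.findall(r"if\s*\((.*?)\)", code):
            counts[normalize(cond)] += 1
    return counts

def flag_anomalies(counts, min_support=2):
    """Patterns seen fewer than `min_support` times are reported as suspicious."""
    return [pattern for pattern, count in counts.items() if count < min_support]

corpus = [
    "if (x == 7) { y = 1; }",
    "if (n == 0) { return; }",
    "if (a == b) { run(); }",
    "if (p == q) { stop(); }",
    "if (x = 7) { y = 1; }",   # likely a typo: assignment instead of comparison
]
print(flag_anomalies(learn_patterns(corpus)))  # -> ['VAR=NUM']
```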

“Intel is committed to making software more robust and less cumbersome to maintain while retaining excellent performance without introducing security vulnerabilities. We hope that projects like ControlFlag can substantially reduce the time it takes to develop software globally,” Gottschlich wrote. “Due to the overwhelming amount of time spent on debugging, even a small savings of time in this space could result in time and monetary savings and thereby allow us — as a community — to accelerate the advancement of technology.”

AI-powered coding tools like ControlFlag, as well as platforms like Tabnine, Ponicode, Snyk, and DeepCode, have the potential to reduce costly interactions between developers, such as Q&A sessions and repetitive code review feedback. IBM and OpenAI are among the many companies investigating the potential of machine learning in the software development space. But studies have shown that AI has a ways to go before it can replace many of the manual tasks that human programmers perform on a regular basis.

DeepMind acquires and open-sources robotics simulator MuJoCo

DeepMind, the AI lab owned by Google parent company Alphabet, today announced that it has acquired and released the MuJoCo simulator, making it freely available to researchers as a precompiled library. In a blog post, the lab says that it’ll work to prepare the codebase for a release in 2022 and “continue to improve” MuJoCo as open-source software under the Apache 2.0 license.

A recent article in the Proceedings of the National Academy of Sciences exploring the state of simulation in robotics identifies open source tools as critical for advancing research. The authors recommend developing open source simulation platforms and establishing community-curated libraries of models, a step that DeepMind claims it has now taken.

“Our robotics team has been using MuJoCo as a simulation platform for various projects … Ultimately, MuJoCo closely adheres to the equations that govern our world,” DeepMind wrote. “We’re committed to developing and maintaining MuJoCo as a free, open-source, community-driven project with best-in-class capabilities. We’re currently hard at work preparing MuJoCo for full open sourcing.”

Simulating physics

MuJoCo, which stands for Multi-Joint Dynamics with Contact, is widely used within the robotics community alongside simulators like Facebook’s Habitat, OpenAI’s Gym, and DARPA-backed Gazebo. Initially developed by Emo Todorov, a neuroscientist and director of the Movement Control Laboratory at the University of Washington, MuJoCo was made available through startup Roboti LLC as a commercial product in 2015.

Unlike many simulators designed for gaming and film applications, MuJoCo takes few shortcuts that prioritize stability over accuracy. For example, the library accounts for gyroscopic forces, implementing full equations of motion — the equations that describe the behavior of a physical system in terms of its motion as a function of time. MuJoCo also supports musculoskeletal models of humans and animals, meaning that applied forces can be distributed correctly to the joints.

MuJoCo’s core engine is written in the programming language C, which makes it easy to port to other architectures. Moreover, the library’s scene description and simulation state are stored in just two data structures, which together contain all the information needed to recreate a simulation, including results from intermediate stages.

“MuJoCo’s scene description format uses cascading defaults — avoiding multiple repeated values — and contains elements for real-world robotic components like equality constraints, motion-capture markers, tendons, actuators, and sensors. Our long-term roadmap includes standardising [it] as an open format, to extend its usefulness beyond the MuJoCo ecosystem,” DeepMind wrote.
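As a rough illustration, the sketch below defines a minimal MJCF scene and steps it forward in time from Python. The `mujoco` bindings shown here ship with later open source releases (dm_control exposes similar APIs), so treat the exact import as version-dependent.

```python
# Minimal MJCF scene (MuJoCo's XML scene description) stepped from Python.
# The `mujoco` package used here accompanies later open source releases;
# dm_control offers a comparable interface.
import mujoco

MJCF = """
<mujoco>
  <option gravity="0 0 -9.81"/>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.05" mass="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(MJCF)  # compiled scene description (mjModel)
data = mujoco.MjData(model)                   # simulation state (mjData)

for _ in range(1000):                         # integrate the full equations of motion
    mujoco.mj_step(model, data)

print(data.qpos)                              # pose of the falling sphere's free joint
```

The two objects mirror the split described above: mjModel holds the compiled scene description, while mjData holds the evolving simulation state.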

Of course, no simulator is perfect. A paper published by researchers at Carnegie Mellon outlines the issues with them, including:

  • The reality gap: No matter how accurate, simulated environments don’t always adequately represent physical reality.
  • Resource costs: The computational overhead of simulation requires specialized hardware like graphics cards, which drives high cloud costs.
  • Reproducibility: Even the best simulators can contain “non-deterministic” elements that make reproducing tests impossible.

Overcoming these is a grand challenge in simulation research. In fact, some experts believe that developing a simulation with 100% accuracy and complexity might require as much problem-solving and resources as developing robots themselves, which is why simulators are likely to be used in tandem with real-world testing for the foreseeable future.

MuJoCo 2.1 has been released as unlocked binaries, available at the project’s original website and on GitHub along with updated documentation. DeepMind is also providing an unlocked activation key for legacy versions of MuJoCo (2.0 and earlier); the key expires on October 18, 2031.

DeepMind’s acquisition of MuJoCo comes after the company’s first profitable year. According to a filing last week, the company raked in £826 million ($1.13 billion USD) in revenue in 2020, more than three times the £265 million ($361 million USD) it reported in 2019.

Facebook open-sources robotics development platform Droidlet

Facebook today open-sourced Droidlet, a platform for building robots that leverage natural language processing and computer vision to understand the world around them. Droidlet simplifies the integration of machine learning algorithms in robots, according to Facebook, facilitating rapid software prototyping.

Robots today can be choreographed to vacuum the floor or perform a dance, but they struggle to accomplish much more than that. This is because they fail to process information at a deep level. Robots can’t recognize what a chair is or know that bumping into a spilled soda can will make a bigger mess, for example.

Droidlet isn’t a be-all and end-all solution to the problem, but rather a way to test out different computer vision and natural language processing models. It allows researchers to build systems that can accomplish tasks in the real world or in simulated environments like Minecraft or Facebook’s Habitat, and it supports reusing the same system on different robots by swapping out components as needed. The platform provides a dashboard to which researchers can add debugging and visualization widgets and tools, as well as an interface for error correction and annotation. And Droidlet ships with wrappers for connecting machine learning models to robots, in addition to environments for testing vision models fine-tuned for the robot setting.

Modular design

Droidlet is made up of a collection of components — some heuristic, some learned — that can be trained with static data when convenient or dynamic data where appropriate. The design consists of several module-to-module interfaces:

  • A memory system that acts as a store for information across the various modules
  • A set of perceptual modules that process information from the outside world and store it in memory
  • A set of lower-level tasks, such as “Move three feet forward” and “Place item in hand at given coordinates,” that can effect changes in a robot’s environment
  • A controller that decides which tasks to execute based on the state of the memory system

Each of these modules can be further broken down into trainable or heuristic components, Facebook says, and the modules and dashboards can be used outside of the Droidlet ecosystem. For researchers and hobbyists, Droidlet also offers “battery-included” systems that can perceive their environment via pretrained object detection and pose estimation models and store their observations in the robot’s memory. Using this representation, the systems can respond to language commands like “Go to the red chair,” tapping a pretrained neural semantic parser that converts natural language into programs.
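The hypothetical sketch below illustrates that decomposition (shared memory, perception, low-level tasks, and a controller) in a few dozen lines. The class and method names are invented for illustration and are not Droidlet's actual API.

```python
# Hypothetical sketch of the memory/perception/tasks/controller decomposition
# described above. Names are illustrative only, not Droidlet's real API.
class Memory:
    """Shared store that every module reads from and writes to."""
    def __init__(self):
        self.facts = []

    def add(self, fact):
        self.facts.append(fact)

class PerceptionModule:
    """Processes raw observations and writes structured facts into memory."""
    def perceive(self, observation, memory):
        for obj in observation.get("objects", []):
            memory.add(("detected", obj))

class MoveTask:
    """A low-level task that would command the robot base toward a target."""
    def __init__(self, target):
        self.target = target

    def step(self):
        print(f"moving toward {self.target}")

class Controller:
    """Decides which task to run based on a command and the memory contents."""
    def decide(self, command, memory):
        for kind, obj in memory.facts:
            if kind == "detected" and obj in command:
                return MoveTask(obj)
        return None

memory, perception, controller = Memory(), PerceptionModule(), Controller()
perception.perceive({"objects": ["red chair", "table"]}, memory)
task = controller.decide("go to the red chair", memory)
if task:
    task.step()  # -> "moving toward red chair"
```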

“The Droidlet platform supports researchers building embodied agents more generally by reducing friction in integrating machine learning models and new capabilities, whether scripted or learned, into their systems, and by providing user experiences for human-agent interaction and data annotation,” Facebook wrote in a blog post. “As more researchers build with Droidlet, they will improve its existing components and add new ones, which others in turn can then add to their own robotics projects … With Droidlet, robotics researchers can now take advantage of the significant recent progress across the field of AI and build machines that can effectively respond to complex spoken commands like ‘Pick up the blue tube next to the fuzzy chair that Bob is sitting in.’”

DeepMind open-sources AlphaFold 2 for protein structure predictions

DeepMind this week open-sourced AlphaFold 2, its AI system that predicts the shape of proteins, to accompany the publication of a paper in the journal Nature. With the codebase now available, DeepMind says it hopes to broaden access for researchers and organizations in the health care and life science fields.

The recipe for proteins — large molecules consisting of amino acids that are the fundamental building blocks of tissues, muscles, hair, enzymes, antibodies, and other essential parts of living organisms — is encoded in DNA. It’s these genetic definitions that circumscribe their three-dimensional structures, which in turn determine their capabilities. But protein “folding,” as it’s called, is notoriously difficult to predict from a genetic sequence alone: DNA contains only information about chains of amino acid residues, not those chains’ final form.

In December 2018, DeepMind attempted to tackle the challenge of protein folding with AlphaFold, the product of two years of work. The Alphabet subsidiary said at the time that AlphaFold could predict structures more precisely than prior solutions. Its successor, AlphaFold 2, announced in December 2020, outperformed competing protein structure prediction methods for a second time. In the results from the 14th Critical Assessment of Structure Prediction (CASP14), AlphaFold 2 had average errors comparable to the width of an atom (about 0.1 nanometer), competitive with results from experimental methods.

AlphaFold draws inspiration from the fields of biology, physics, and machine learning. It takes advantage of the fact that a folded protein can be thought of as a “spatial graph,” where amino acid residues (amino acids contained within a peptide or protein) are nodes and edges connect residues in close proximity. AlphaFold leverages an AI algorithm that interprets the structure of this graph while reasoning over the implicit graph it is building, using evolutionarily related sequences, multiple sequence alignment, and a representation of amino acid residue pairs.
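As a toy illustration of the spatial-graph framing only, not AlphaFold's model, the snippet below builds a residue contact graph by connecting residues whose 3D positions fall within a distance cutoff.

```python
# Toy "spatial graph" of a protein: residues are nodes, and an edge connects
# any two residues closer than a cutoff. This illustrates the framing only,
# not AlphaFold's network.
import numpy as np

def contact_graph(coords, cutoff=8.0):
    """coords: (n_residues, 3) array of residue positions in angstroms."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return [(i, j)
            for i in range(len(coords))
            for j in range(i + 1, len(coords))
            if dists[i, j] < cutoff]

rng = np.random.default_rng(0)
toy_coords = rng.uniform(0, 20, size=(10, 3))  # 10 made-up residue positions
print(contact_graph(toy_coords))               # pairs of residues "in contact"
```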

In the open source release, DeepMind says it significantly streamlined AlphaFold 2. Whereas the system took days of computing time to generate structures for some entries to CASP, the open source version is about 16 times faster. It can generate structures in minutes to hours, depending on the size of the protein.

Real-world applications

DeepMind makes the case that AlphaFold, if further refined, could be applied to previously intractable problems in the field of protein folding, including those related to epidemiological efforts. Last year, the company predicted several protein structures of SARS-CoV-2, including ORF3a, whose makeup was formerly a mystery. At CASP14, DeepMind predicted the structure of another coronavirus protein, ORF8, that has since been confirmed by experimentalists.

Beyond aiding the pandemic response, DeepMind expects AlphaFold will be used to explore the hundreds of millions of proteins for which science currently lacks models. Since DNA specifies the amino acid sequences that make up protein structures, advances in genomics have made it possible to read protein sequences from the natural world, with 180 million protein sequences and counting in the publicly available Universal Protein database. In contrast, given the experimental work needed to translate from sequence to structure, only around 170,000 protein structures are in the Protein Data Bank.

DeepMind says it’s committed to making AlphaFold available “at scale” and collaborating with partners to explore new frontiers, like how multiple proteins form complexes and interact with DNA, RNA, and small molecules. Earlier this year, the company announced a new partnership with the Geneva-based Drugs for Neglected Diseases initiative, a nonprofit pharmaceutical organization that used AlphaFold to identify fexinidazole as a replacement for the toxic compound melarsoprol in the treatment of sleeping sickness.

Amazon open-sources its in-house game engine

Amazon made its Lumberyard game engine free to use from the outset, but it’s now opening development of the technology to everyone, too. GamesBeat reports that Amazon has made Lumberyard an open source project, rebranding it as the Open 3D Engine. The Linux Foundation will manage the project and form an Open 3D Foundation to foster development. Amazon is a founding member alongside tech heavyweights like Adobe, Huawei, Niantic and Red Hat.

While the original Lumberyard was based on the Crytek engine, the version you’ll get as Open 3D Engine was rewritten and is free of possible patent headaches, according to Amazon. It also boasts a new, more photorealistic renderer as well as many of the other tools you’d need to build a game or simulation, including an animation system, a content editor and visual scripting.

This is relatively untouched ground for developers. They don’t always have to pay for engines, but they rarely have full freedom to modify the code for their own ends — and those that do often keep the modifications for themselves. Open 3D Engine not only allows extensive customization, but will encourage creators to contribute to the wider community. That theoretically strengthens the technology and helps it move faster than commercial tech like Unreal Engine and Unity.

There is a financial incentive for Amazon, though — open source may be its best chance at fostering growth. Amazon hasn’t had much success with in-house games built on Lumberyard, having cancelled Crucible and delayed New World. The shift to Open 3D Engine could spur adoption and encourage studios to use AWS, Twitch and other services that hook into the platform. Amazon could reap the rewards of Open 3D Engine even if its dreams of becoming a AAA game creator never come to pass.

LinkedIn open-sources Greykite, a library for time series forecasting

LinkedIn today open-sourced Greykite, a Python library for long- and short-term predictive analytics. Greykite’s main algorithm, Silverkite, delivers automated forecasting, which LinkedIn says it uses for resource planning, performance management, optimization, and ecosystem insight generation.

For enterprises using predictive models to forecast consumer behavior, data drift was a major challenge in 2020 due to never-before-seen circumstances related to the pandemic. Even so, accurate knowledge about the future remains valuable to any business, and automated forecasting helps deliver it: automation enables reproducibility, can improve accuracy, and produces forecasts that downstream algorithms can consume to make decisions.

For example, LinkedIn says Silverkite improved its 1-day-ahead and 7-day-ahead revenue forecasts, as well as its 2-week-ahead Weekly Active User forecasts, with median absolute percent error improving by more than 50% and 30%, respectively.

Greykite library

Greykite provides time series tools for trends, seasonality, holidays, and more, so users can fit the AI models of their choice. The library offers exploratory plots and tuning templates that define regressors based on data characteristics and forecast requirements, such as hourly short-term forecasts or daily long-term forecasts. The templates’ tuning knobs narrow the search for a satisfactory forecast, and model templates can be customized per algorithm, letting users label known anomalies and specify whether to ignore or adjust for them.

Greykite, which provides outlier detection, can also select the optimal model from multiple candidates using past performance data. Instead of tuning each forecast separately, users can define a set of candidate forecast configurations that capture different types of patterns. Lastly, the library provides a model summary that can be used to assess the effect of individual model components: for example, checking the magnitude of a holiday effect, seeing how much a changepoint affected the trend, or showing how a certain feature benefits the model.

With Greykite, a “next 7-day” forecast trained on over 8 years of daily data takes only a few seconds to produce. LinkedIn says that its whole pipeline, including automatic changepoint detection, cross-validation, backtest, and evaluation, completes in under 45 seconds.
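A forecast like that is configured in a few lines of Python. The sketch below loosely follows Greykite's documented quickstart; the exact module paths and configuration fields may differ by library version, so treat them as a guide rather than a reference.

```python
# Sketch of a "next 7-day" Silverkite forecast, loosely following Greykite's
# quickstart. Module paths and field names may differ by version.
import pandas as pd
from greykite.framework.templates.autogen.forecast_config import (
    ForecastConfig, MetadataParam)
from greykite.framework.templates.forecaster import Forecaster

# Toy daily series; in practice this would be years of history.
df = pd.DataFrame({
    "ts": pd.date_range("2020-01-01", periods=365, freq="D"),
    "y": range(365),
})

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template="SILVERKITE",      # Greykite's main algorithm
        forecast_horizon=7,               # predict the next 7 days
        metadata_param=MetadataParam(time_col="ts", value_col="y"),
    ),
)
print(result.forecast.df.tail(7))         # the forecasted values
```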

“The Greykite library provides a fast, accurate, and highly customizable algorithm — Silverkite — for forecasting. Greykite also provides intuitive tuning options and diagnostics for model interpretation. It is extensible to multiple algorithms, and facilitates benchmarking them through a single interface,” the LinkedIn research team wrote in a blog post. “We have successfully applied Greykite at LinkedIn for multiple business and infrastructure metrics use cases.”

The Greykite library is available on GitHub and PyPI, and it joins the many other tools LinkedIn has open-sourced to date. They include Iris, for managing website outages; PalDB, a key-value store for handling side data; Ambry, an object store for media files; GDMix, a framework for training AI personalization models; LiFT, a toolkit to measure AI model fairness; and Dagli, a machine learning library for Java.

Microsoft open-sources Counterfit, an AI security risk assessment tool

Microsoft today open-sourced Counterfit, a tool designed to help developers test the security of AI and machine learning systems. The company says that Counterfit can enable organizations to conduct assessments to ensure that the algorithms used in their businesses are robust, reliable, and trustworthy.

AI is being increasingly deployed in regulated industries like health care, finance, and defense. But organizations are lagging behind in their adoption of risk mitigation strategies. A Microsoft survey found that 25 out of 28 businesses indicated they don’t have the right resources in place to secure their AI systems, and that security professionals are looking for specific guidance in this space.

Microsoft says that Counterfit was born out of the company’s need to assess AI systems for vulnerabilities, with the goal of proactively securing AI services. The tool started as a corpus of attack scripts written specifically to target AI models and then morphed into an automation product for benchmarking multiple systems at scale.

Under the hood, Counterfit is a command-line utility that provides a layer for adversarial frameworks, preloaded with algorithms that can be used to evade and steal models. Counterfit seeks to make published attacks accessible to the security community while offering an interface from which to build, manage, and launch those attacks on models.

When conducting penetration testing on an AI system with Counterfit, security teams can opt for the default settings, set random parameters, or customize each for broad vulnerability coverage. Organizations with multiple models can use Counterfit’s built-in automation to scan — optionally multiple times in order to create operational baselines.
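To give a sense of the attack class such frameworks automate, the sketch below runs a minimal FGSM-style evasion against a toy logistic model: nudge each input feature one signed step against the model's weights until the prediction flips. It is a generic illustration, not Counterfit's own interface.

```python
# Minimal FGSM-style evasion against a toy logistic "model" -- the kind of
# attack that adversarial frameworks automate. Generic illustration only,
# not Counterfit's API.
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # toy model weights
b = 0.1

def predict_proba(x):
    """Probability of the positive class under the toy logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

x = np.array([0.2, -0.4, 1.0])   # input originally classified as positive
eps = 0.6                        # attacker's per-feature perturbation budget

# The gradient of the score w.r.t. the input is just `w` here, so one signed
# step of size eps pushes the score toward the negative class.
x_adv = x - eps * np.sign(w)

print(predict_proba(x))          # ~0.85: confidently positive
print(predict_proba(x_adv))      # ~0.33: the prediction flips
```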

Counterfit also provides logging to record the attacks against a target model. As Microsoft notes, telemetry might drive engineering teams to improve their understanding of a failure mode in a system.

The business value of responsible AI

Internally, Microsoft says that it uses Counterfit as a part of its AI red team operations and in the AI development phase to catch vulnerabilities before they hit production. And the company says it’s tested Counterfit with several customers, including aerospace giant Airbus, which is developing an AI platform on Azure AI services. “AI is increasingly used in industry; it is vital to look ahead to securing this technology particularly to understand where feature space attacks can be realized in the problem space,” Matilda Rhode, a senior cybersecurity researcher at Airbus, said in a statement.

The value of tools like Counterfit is quickly becoming apparent. A study by Capgemini found that customers and employees will reward organizations that practice ethical AI with greater loyalty, more business, and even a willingness to advocate for them — and in turn, punish those that don’t. The study suggests that there’s both reputational risk and a direct impact on the bottom line for companies that don’t approach the issue thoughtfully.

Basically, consumers want confidence that AI is secure from manipulation. One of the recommendations in Gartner’s Top 5 Priorities for Managing AI Risk framework, published in January, is that organizations “[a]dopt specific AI security measures against adversarial attacks to ensure resistance and resilience.” The research firm estimates that by 2024, organizations that implement dedicated AI risk management controls will avoid negative AI outcomes twice as often as those that don’t.

According to a Gartner report, through 2022, 30% of all AI cyberattacks will leverage training-data poisoning, model theft, or adversarial samples to attack machine learning-powered systems.

Counterfit is a part of Microsoft’s broader push toward explainable, secure, and “fair” AI systems. The company’s attempts at solutions to those and other challenges include AI bias-detecting tools, an open adversarial AI framework, internal efforts to reduce prejudicial errors, AI ethics checklists, and a committee (Aether) that advises on AI pursuits. Recently, Microsoft debuted WhiteNoise, a toolkit for differential privacy, as well as Fairlearn, which aims to assess AI systems’ fairness and mitigate any observed unfairness issues with algorithms.

Red Hat open-sources TrustyAI, an auditing tool for AI decision systems

The ability to automate decisions is becoming essential for enterprises that deal in industries where mission-critical processes involve many variables. For example, in the financial sector, assessing the risk of even a single transaction can become infinitely complex. But while the utility of AI-powered, automated decision-making systems is undeniable, that utility often comes at the expense of transparency: automated decision-making systems can be hard to interpret in practice, particularly when they integrate with other AI systems.

In search of a solution, researchers at Red Hat developed the TrustyAI Explainability Toolkit, a library leveraging techniques for explaining automated decision-making systems. Part of Kogito, Red Hat’s cloud-native business automation framework, TrustyAI enriches AI model execution information through algorithms while extracting, collecting, and publishing metadata for auditing and compliance.

TrustyAI arrived in Kogito last summer but was released as a standalone open source package this week.

Transparency with TrustyAI

As the development team behind TrustyAI explains in a whitepaper, the toolkit can introspect black-box AI decision-making models to describe predictions and outcomes by looking at a “feature importance” chart. The chart orders a model’s inputs by the most important ones for the decision-making process, which can help determine whether a model is biased, the team says.
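As a rough illustration of the feature-importance idea (TrustyAI itself is a Java library, so this is not its API), the sketch below computes permutation importance for a black-box predictor: shuffle one input column at a time and measure how much accuracy drops.

```python
# Permutation-importance sketch of the "feature importance" idea: shuffle one
# input column at a time and measure the accuracy drop of a black-box model.
# TrustyAI is a Java toolkit; this Python snippet only illustrates the concept.
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = []
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, col])                # destroy this feature's signal
            drops.append(baseline - np.mean(predict(Xp) == y))
        importances.append(float(np.mean(drops)))  # bigger drop = more important
    return importances

# Toy black-box "decision model": approve when income exceeds debt.
X = np.array([[5.0, 1.0], [2.0, 3.0], [4.0, 0.5], [1.0, 2.0]])
y = np.array([1, 0, 1, 0])
predict = lambda data: (data[:, 0] - data[:, 1] > 0).astype(int)

print(permutation_importance(predict, X, y))       # importance score per feature
```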

TrustyAI offers a dashboard, called Audit UI, that targets business users or auditors, where each automated decision-making workload is recorded and can be analyzed at a later date. For individual workloads, the toolkit makes it possible to access the inputs, the outcomes the model produced, and a detailed explanation of every one of them. Monitoring dashboards are generated based on model information so users can keep track of business aspects and have an aggregated view of decision behaviors.

TrustyAI’s runtime monitoring also allows for business and operational metrics to be displayed in a Grafana dashboard. Moreover, the toolkit can monitor operational aspects to keep track of the health of the automated decision-making system.

Above: The TrustyAI monitoring dashboard. (Image credit: TrustyAI)

“Within TrustyAI, [we combine] machine learning models and decision logic to enrich automated decisions by including predictive analytics. By monitoring the outcome of decision making, we can audit systems to ensure they … meet regulations,” Rebecca Whitworth, part of the TrustyAI initiative at Red Hat, wrote in a blog post. “We can also trace these results through the system to help with a global overview of the decisions and predictions made. TrustyAI [relies] on the combination of these two standards to ensure trusted automated decision making.”

Transparency is an aspect of so-called responsible AI, which also benefits enterprises. A study by Capgemini found that customers and employees will reward organizations that practice ethical AI with greater loyalty, more business, and even a willingness to advocate for them — and punish those that don’t. The study suggests companies that don’t approach the issue thoughtfully can incur both reputational risk and a direct hit to their bottom line.

Pinterest open-sources big data analytics tool Querybook

Pinterest today open-sourced Querybook, a data management solution for enterprise-scale remote engineering collaboration. The company says the tool, which it uses internally, can help engineers compose queries, create analyses, and collaborate with one another via a notebook interface.

Querybook started in 2017 as an intern project at Pinterest. The development team early on decided on a document-like interface where users could write queries and analyses in one place, with collocated metadata and the simplicity of a note-taking app. Released internally in March 2018, Querybook became the go-to solution for big data analytics at Pinterest. It now averages 500 daily active users and 7,000 daily query runs.

“With Querybook, Pinterest engineers have brought together the power of metadata with the simplicity of a note-taking app for a better querying interface, where teams can compose queries and write analyses all in one place,” a spokesperson told VentureBeat. “Querybook can be set up and deployed in minutes.”

Every query executed on Querybook gets analyzed to extract metadata like referenced tables and query runners. Querybook uses this information to automatically update its data schema and search ranking, as well as to show a table’s frequent users and query examples. The more queries in Querybook, the better documented the tables become.
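As a toy illustration of that kind of metadata extraction (Querybook's real analyzer is far more robust), the sketch below pulls referenced table names out of a query with a simple regular expression.

```python
# Toy sketch of query-metadata extraction: pull the tables referenced after
# FROM/JOIN. Querybook's real analyzer is much more robust; this is only an
# illustration of the idea.
import re

def referenced_tables(query):
    pattern = r"\b(?:FROM|JOIN)\s+([\w.]+)"
    return sorted(set(re.findall(pattern, query, flags=re.IGNORECASE)))

sql = """
SELECT u.id, COUNT(*) AS pins
FROM analytics.users u
JOIN analytics.pins p ON p.user_id = u.id
GROUP BY u.id
"""
print(referenced_tables(sql))   # -> ['analytics.pins', 'analytics.users']
```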

Querybook also features an admin interface that lets companies configure query engines, table metadata ingestion, and access permissions. From this interface, admins can make live Querybook changes without going through code or config files. The tool additionally supports visualizations, including line, bar, stacked area, pie, donut, scatter, and table charts.

“The common starting point for any analysis at Pinterest is an ad-hoc query that gets executed on the internal Hadoop or Presto cluster. To continuously make these improvements, especially in an increasingly remote environment, it’s more important than ever for teams to be able to compose queries, create analyses, and collaborate with one another,” Pinterest wrote in a blog post. “We built Querybook to provide a responsive and simple web user interface for such analysis so data scientists, product managers, and engineers can discover the right data, compose their queries, and share their findings.”

Pinterest previously open-sourced Teletraan, a tool that can deploy code onto virtual machines, such as those available from public cloud Amazon Web Services. Prior to this, the company released Terrapin, software designed to more efficiently push data out of the Hadoop open source big data software and make it available for other systems to use.
