Microsoft’s Tutel optimizes AI model training


Microsoft this week announced Tutel, a library to support the development of mixture of experts (MoE) models — a particular type of large-scale AI model. Tutel, which is open source and has been integrated into fairseq, Facebook’s sequence modeling toolkit for PyTorch, is designed to enable developers across AI disciplines to “execute MoE more easily and efficiently,” a statement from Microsoft explained.

MoE models are made up of small clusters of “neurons” that are only active under special, specific circumstances. Lower “layers” of the MoE model extract features, and experts are called upon to evaluate those features. For example, MoEs can be used to create a translation system, with each expert cluster learning to handle a separate part of speech or special grammatical rule.

Compared with other model architectures, MoEs have distinct advantages. They can respond to circumstances with specialization, allowing the model to display a greater range of behaviors. The experts can receive a mix of data, and when the model is in operation, only a few experts are active; even a huge model therefore needs only a small amount of processing power.
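To make the sparse-activation idea concrete, here is a minimal sketch of a top-k gated MoE layer in PyTorch. It is illustrative only, not Tutel’s implementation, and all names and sizes are invented for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Toy mixture-of-experts layer: a learned gate routes each token to its top-k experts."""

    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # scores every expert for each token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x):                           # x: (tokens, dim)
        scores = self.gate(x)                       # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

moe = SimpleMoE(dim=64)
print(moe(torch.randn(16, 64)).shape)               # torch.Size([16, 64])
```

Because each token passes through only k of the experts, adding experts grows the model’s parameter count without growing the per-token compute, which is the property that lets MoE models scale so far.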

In fact, MoE is one of the few approaches demonstrated to scale to more than a trillion parameters, paving the way for models capable of powering computer vision, speech recognition, natural language processing, and machine translation systems, among others. In machine learning, parameters are the part of the model that’s learned from historical training data. Generally speaking, especially in the language domain, the correlation between the number of parameters and sophistication has held up well.

Tutel mainly focuses on the optimizations of MoE-specific computation. In particular, the library is optimized for Microsoft’s new Azure NDm A100 v4 series instances, which provide a sliding scale of Nvidia A100 GPUs. Tutel has a “concise” interface intended to make it easy to integrate into other MoE solutions, Microsoft says. Alternatively, developers can use the Tutel interface to incorporate standalone MoE layers into their own DNN models from scratch.

[Figure: A line graph comparing the end-to-end performance of Meta’s MoE language model on Azure NDm A100 v4 nodes with and without Tutel. X-axis: number of A100 (80GB) GPUs, from 8 to 512; y-axis: throughput in K tokens/s, from 0 to 1,000. Tutel consistently achieves higher throughput than fairseq alone.]

Above: For a single MoE layer, Tutel achieves an 8.49 times speedup on an NDm A100 v4 node with 8 GPUs and a 2.75 times speedup on 64 NDm A100 v4 nodes with 512 A100 GPUs, Microsoft claims.

“Because of the lack of efficient implementations, MoE-based models rely on a naive combination of multiple off-the-shelf operators provided by deep learning frameworks such as PyTorch and TensorFlow to compose the MoE computation. Such a practice incurs significant performance overheads thanks to redundant computation,” Microsoft wrote in a blog post. (Operators are the individual computational building blocks, such as matrix multiplications, that a deep learning framework executes.) “Tutel designs and implements multiple highly optimized GPU kernels to provide operators for MoE-specific calculation.”
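The redundancy Microsoft describes is easy to see in code. Built only from generic framework operators, dispatching tokens to experts typically becomes a dense one-hot matrix multiply that does work proportional to every expert, even though each token is routed to just one. A rough sketch of that naive pattern (illustrative names only, not Tutel code):

```python
import torch
import torch.nn.functional as F

tokens, dim, num_experts = 16, 64, 4
x = torch.randn(tokens, dim)
gate_idx = torch.randint(num_experts, (tokens,))    # the expert chosen for each token

# Dense one-hot dispatch mask built from off-the-shelf ops: (tokens, experts)
dispatch = F.one_hot(gate_idx, num_experts).float()

# Every token is multiplied against every expert slot, mostly by zeros;
# a fused, MoE-aware GPU kernel can skip this redundant work entirely.
per_expert_inputs = torch.einsum("te,td->etd", dispatch, x)
print(per_expert_inputs.shape)                      # torch.Size([4, 16, 64])
```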

Tutel is available in open source on GitHub. Microsoft says that the Tutel development team will “be actively integrating” various emerging MoE algorithms from the community into future releases.

“MoE is a promising technology. It enables holistic training based on techniques from many areas, such as systematic routing and network balancing with massive nodes, and can even benefit from GPU-based acceleration. We demonstrate an efficient MoE implementation, Tutel, that resulted in significant gain over the fairseq framework. Tutel has been integrated [with our] DeepSpeed framework, as well, and we believe that Tutel and related integrations will benefit Azure services, especially for those who want to scale their large models efficiently,” Microsoft added.


Unity moves robotics design and training to the metaverse


Unity, the San Francisco-based company behind a platform for creating and operating games and other 3D content, on November 10 announced the launch of Unity Simulation Pro and Unity SystemGraph to improve the modeling, testing, and training of complex systems through AI.

With robotics usage in supply chains and manufacturing increasing, such software is critical to ensuring efficient and safe operations.

Danny Lange, senior vice president of artificial intelligence for Unity, told VentureBeat via email that the Unity SystemGraph uses a node-based approach to model the complex logic typically found in electrical and mechanical systems. “This makes it easier for roboticists and engineers to model small systems, and allows grouping those into larger, more complex ones — enabling them to prototype systems, test and analyze their behavior, and make optimal design decisions without requiring access to the actual hardware,” said Lange.

Unity’s execution engine, Unity Simulation Pro, offers headless rendering — eliminating the need to project each image to a screen and thus increasing simulation efficiency by up to 50% and lowering costs, the company said.

Use cases for robotics

“The Unity Simulation Pro is the only product built from the ground up to deliver distributed rendering, enabling multiple graphics processing units (GPUs) to render the same Unity project or simulation environment simultaneously, either locally or in the private cloud,” the company said. This means multiple robots with tens, hundreds, or even thousands of sensors can be simulated faster than real time on Unity today.

According to Lange, users in markets like robotics, autonomous driving, drones, agriculture technology, and more are building simulations containing environments, sensors, and models with million-square-foot warehouses, dozens of robots, and hundreds of sensors. With these simulations, they can test software against realistic virtual worlds, teach and train robot operators, or try physical integrations before real-world implementation. Because it all takes place in the metaverse, this is faster, more cost-effective, and safer.

“A more specific use case would be using Unity Simulation Pro to investigate collaborative mapping and mission planning for robotic systems in indoor and outdoor environments,” Lange said. He added that some users have built a simulated 4,000-square-foot building sitting within a larger forested area and are attempting to identify ways to map the environment using a combination of drones, off-road mobile robots, and walking robots. The company reports it has been working to enable creators to build and model the sensors and subsystems of mechatronic systems so they can run in simulations.

A major application of Unity SystemGraph is in building simulations with physically accurate camera and lidar models: through SensorSDK, users can take advantage of SystemGraph’s library of ready-to-use sensor models and easily configure them for their specific cases.

Customers can now simulate at scale, iterate quickly, and test more to drive insights at a fraction of current simulation costs, Unity says. The company adds that customers like Volvo Cars, the Allen Institute for AI, and Carnegie Mellon University are already seeing results.

While there are several companies that have built simulators targeted especially at AI applications like robotics or synthetic data generation, Unity claims that the ease of use of its authoring tools makes it stand out above its rivals, including top competitors like Roblox, Aarki, Chartboost, MathWorks, and Mobvista. Lange says this is evident in the size of Unity’s existing user base of over 1.5 million creators using its editor tools.

Unity says its technology is aimed at impacting the industrial metaverse, where organizations continue to push the envelope on cutting-edge simulations.

“As these simulations grow in complexity in terms of the size of the environment, the number of sensors used in that environment, or the number of avatars operating in that environment, the need for our product increases. Our distributed rendering feature, which is unique to Unity Simulation Pro, enables you to leverage the increasing amount of GPU compute resources available to customers, in the cloud or on-premise networks, to render this simulation faster than real time. This is not possible with many open source rendering technologies or even the base Unity product — all of which will render at less than 50% real time for these scenarios,” Lange said.

The future of AI-powered technologies

Moving into 2022, Unity says it expects to see a steep increase in the adoption of AI-powered technologies, with two key adoption motivators. “On one side, companies like Unity will continue to deliver products that help lower the barrier to entry and help increase adoption by wider ranges of customers. This is combined with the decreasing cost of compute, sensors, and other hardware components,” Lange said. “Then on the customer adoption side, the key trends that will drive adoption are broader labor shortages and the demand for more operational efficiencies — all of which have the effect of accelerating the economics that drive the adoption of these technologies on both fronts.”

Unity is doubling down on building purpose-built products for its simulation users, enabling them to mimic the real world by simulating environments with various sensors, multiple avatars, and agents for significant performance gains with lower costs. The company says this will help its customers to take the first step into the industrial metaverse.

Unity will showcase Unity Simulation Pro and Unity SystemGraph through in-depth sessions at the forthcoming Unity AI Summit on November 18, 2021.


Data Society launches AI-driven meldR platform for data science training

Data Society, a Washington-headquartered organization providing data science training programs and AI/ML solutions to corporations and government agencies, has announced the launch of meldR, a learning experience and communication platform (LXCP).

Targeted at the health care and life sciences industries, the offering allows learning and development teams of businesses to deliver AI/ML-generated data science learning pathways to their employees. It curates courses according to the organization’s goals and the learner’s needs, and even works with proprietary datasets, allowing teams to offer courses using their own data.

“The healthcare and life science industry today faces the challenge of delivering centralized training programs effectively, which is a big roadblock to building an internal data culture,” Merav Yuravlivker, CEO of Data Society, said in a statement. “meldR supports an organization’s desire to prepare its employees with the skills needed to solve complex challenges and unlock the new potential that further their organization’s goals.”

According to a recent survey commissioned by Domino Data Lab, 97% of U.S. data executives say data science is crucial to maintaining profitability and boosting the bottom line. However, nearly as many said that flawed approaches to staffing, processes, and tooling are causing failure in scaling data science projects, making achieving that goal difficult. This is where meldR comes in.

Community of practice with meldR

In addition to providing industry-tailored, domain-specific data science academies, the platform also creates an internal community of practice that fosters innovation, empowers communication between employees and their L&D teams, and streamlines the process of finding the right talent for the right team.

This, as Data Society explains, is done through a series of tools on the platform such as messaging, email platform integration, notifications, discussion boards, calendars, online events, and one-on-one TA and instructor meetings.

Beyond this, L&D team leaders can even use their meldR dashboard to take a quick look at learner badges, pathways, and certifications to gather metrics and quickly identify up-skilled internal resources, matching internal talent and data science department requirements.

Data Society is offering the solution as a freemium product available on a rapid deployment model. It remains restricted to the healthcare and life sciences industry but should expand to other segments at a later stage.


AI Weekly: AI model training costs on the rise, highlighting need for new solutions

This week, Microsoft and Nvidia announced that they trained what they claim is one of the largest and most capable AI language models to date: Megatron-Turing Natural Language Generation (MT-NLG). MT-NLG contains 530 billion parameters — the parts of the model learned from historical data — and achieves leading accuracy in a broad set of tasks, including reading comprehension and natural language inference.

But building it didn’t come cheap. Training took place across 560 Nvidia DGX A100 servers, each containing 8 Nvidia A100 80GB GPUs. Experts peg the cost in the millions of dollars.

Like other large AI systems, MT-NLG raises questions about the accessibility of cutting-edge research approaches in machine learning. AI training costs dropped 100-fold between 2017 and 2019, but the totals still exceed the compute budgets of most startups, governments, nonprofits, and colleges. The inequity favors corporations and world superpowers with extraordinary access to resources at the expense of smaller players, cementing incumbent advantages.

For example, in early October, researchers at Alibaba detailed M6-10T, a language model containing 10 trillion parameters (roughly 57 times the size of OpenAI’s GPT-3) trained across 512 Nvidia V100 GPUs for 10 days. At $2.28 per hour, the cheapest V100 rate available through Google Cloud Platform, that works out to roughly $280,000 (512 GPUs × $2.28 per hour × 24 hours × 10 days), further than most research teams can stretch.
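The arithmetic behind that estimate is simply GPU count times hourly rate times duration:

```python
gpus = 512
usd_per_gpu_hour = 2.28  # the cheapest GCP V100 rate cited above
hours = 24 * 10          # 10 days of training

print(f"${gpus * usd_per_gpu_hour * hours:,.0f}")  # $280,166
```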

Google subsidiary DeepMind is estimated to have spent $35 million training a system to learn the Chinese board game Go. And when the company’s researchers designed a model to play StarCraft II, they purposefully didn’t try multiple ways of architecting a key component because the training cost would have been too high. Similarly, OpenAI didn’t fix a mistake when it implemented GPT-3 because the cost of training made retraining the model infeasible.

Paths forward

It’s important to keep in mind that training costs can be inflated by factors other than an algorithm’s technical aspects. As Yoav Shoham, Stanford University professor emeritus and cofounder of AI startup AI21 Labs, recently told Synced, personal and organizational considerations often contribute to a model’s final price tag.

“[A] researcher might be impatient to wait three weeks to do a thorough analysis and their organization may not be able or wish to pay for it,” he said. “So for the same task, one could spend $100,000 or $1 million.”

Still, the increasing cost of training — and storing — algorithms like Huawei’s PanGu-Alpha, Naver’s HyperCLOVA, and the Beijing Academy of Artificial Intelligence’s Wu Dao 2.0 is giving rise to a cottage industry of startups aiming to “optimize” models without degrading accuracy. This week, former Intel exec Naveen Rao launched a new company, MosaicML, to offer tools, services, and training methods that improve AI system accuracy while lowering costs and saving time. MosaicML — which has raised $37 million in venture capital — competes with Codeplay Software, OctoML, Neural Magic, Deci, CoCoPie, and NeuReality in a market that’s expected to grow exponentially in the coming years.

In a sliver of good news, the cost of basic machine learning operations has been falling over the past few years. A 2020 OpenAI survey found that since 2012, the amount of compute needed to train a model to the same performance on classifying images in a popular benchmark — ImageNet — has been decreasing by a factor of two every 16 months.
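Compounded over the eight years the survey covers, that halving rate implies a roughly 64-fold drop in the compute needed for the same result:

```python
months = (2020 - 2012) * 12  # the span covered by the OpenAI survey
halvings = months / 16       # compute requirement halves every 16 months
print(2 ** halvings)         # 64.0x less compute for the same ImageNet accuracy
```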

Approaches like network pruning prior to training could lead to further gains. Research has shown that parameters pruned after training, a process that decreases the model size, could have been pruned before training without any effect on the network’s ability to learn. Called the “lottery ticket hypothesis,” the idea is that the initial values that parameters in a model receive are crucial for determining whether they’re important. Parameters kept after pruning receive “lucky” initial values; the network can train successfully with only those parameters present.
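Here is a minimal sketch of the train-prune-rewind loop behind that idea, in PyTorch. The training function is a placeholder, the pruning fraction is arbitrary, and real implementations iterate this round several times:

```python
import copy
import torch
import torch.nn as nn

def lottery_ticket_round(model: nn.Module, train_fn, prune_frac=0.2):
    """One round of magnitude pruning with weight rewinding."""
    init_state = copy.deepcopy(model.state_dict())     # remember the "lucky" initial values
    train_fn(model)                                    # train to convergence (placeholder)
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                                # prune weight matrices, not biases
            k = max(1, int(prune_frac * p.numel()))
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
    model.load_state_dict(init_state)                  # rewind the survivors to initialization
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])                    # zero out the pruned parameters
    return model, masks
```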

Network pruning is far from a solved science, however. New ways of pruning that work before or in early training will have to be developed, as most current methods apply only retroactively. And when parameters are pruned, the resulting structures aren’t always a fit for the training hardware (e.g., GPUs), meaning that pruning 90% of parameters won’t necessarily reduce the cost of training a model by 90%.

Whether through pruning, novel AI accelerator hardware, or techniques like meta-learning and neural architecture search, the need for alternatives to unattainably large models is quickly becoming clear. A University of Massachusetts Amherst study showed that using 2019-era approaches, training an image recognition model with a 5% error rate would cost $100 billion and produce as much carbon emissions as New York City does in a month. As IEEE Spectrum’s editorial team wrote in a recent piece, “we must either adapt how we do deep learning or face a future of much slower progress.”

For AI coverage, send news tips to Kyle Wiggers — and be sure to subscribe to the AI Weekly newsletter and bookmark our AI channel, The Machine.

Thanks for reading,

Kyle Wiggers

AI Staff Writer


Mindtickle raises $100M to gamify sales training


Mindtickle, which provides a “sales readiness” platform for enterprises, today announced it has closed a $100 million series E funding round led by SoftBank, with participation from Chimera Capital, Norwest Venture Partners, Canaan, NewView Capital, and Qualcomm Ventures. The funds, which bring the company’s total raised to $281 million at a $1.2 billion post-money valuation, will be used to expand Mindtickle’s sales enablement, revenue operations, and training teams, according to cofounder and CEO Krishna Depura.

There’s a real and present need for sales readiness solutions. According to Forbes, 58% of buyers report that sales reps are unable to answer their questions accurately or effectively. Moreover, an estimated 84% of all sales training is lost after 90 days due to the lack of information retention among sales personnel. Perhaps unsurprisingly, high-performing sales teams use nearly 3 times the amount of sales technology as underperforming teams, one source found.

Mindtickle’s platform offers continuous learning modules, like simulated scenarios, structured coaching programs, and quizzes and polls. It gamifies lessons and skill-building activities with points, badges, certifications, and leaderboards, which it funnels to a dashboard to expose potential knowledge gaps. On the admin side, Mindtickle creates competency maps that identify problem areas and automatically assigns training based on results, tracking real-time engagement and readiness while delivering personalized feedback to reps as they progress through course materials.

“Throughout our professional lives, I and my cofounders, Nishant Mungali and Mohit Garg, experienced the challenges of unengaging, ineffective training and coaching, firsthand. To solve that problem, we first built a gamification platform that could be used by HR leaders to engage and inform their teams,” Depura told VentureBeat via email. “Over time, through innumerable discussions with customers and prospects, we discovered that the customer-facing teams, particularly the sales teams, could leverage our platform to become more effective and achieve better business results. From that point on, we engaged deeply with revenue leaders across the globe to solve specific use cases in the sales organization and help them create high-performing sales teams.”

Supercharging sales

Mindtickle taps machine learning models to optimize administrative tasks like data entry, aiming to identify knowledge and skill gaps that could impact customer interactions. Conversational intelligence capabilities provide insight into what’s happening across all real-world deal interactions. By learning what’s important to both sellers and buyers, the models deliver opportunities for salespeople to be coached and to reinforce their knowledge, Depura says.

“Perhaps the most impactful AI for sales is focused on seller preparation and call execution to reduce or eliminate the need for human review, pattern detection, and decision-making … [Our AI can] give sellers more time to sell and extends to helping them become more effective,” he added. “AI can help sellers get ready for every customer interaction, [preparing] sellers so they’re ready with the right knowledge, skills, and execution at every stage of the sales process.”

In 2020, Mindtickle claims to have doubled the number of Fortune 500 and Forbes Global 2000 companies it counts as customers, which span health and life sciences organizations, insurance carriers, and tech brands. In total, it has more than 1 million users and 220 brands on the platform, 9 out of 10 of which expanded the scope of their workforce readiness programs after adopting Mindtickle.


“The pandemic accelerated the digitization trends of business-to-business buying and selling, with fewer on-site sales meetings, convergence of inside and field sales, and increased adoption of digital tools. [R]evenue leaders must ensure their sales teams are agile enough to adapt to the shifting landscape by equipping them with the knowledge, skills, and behaviors needed to be successful,” Depura added. “Combined, these trends have resulted in an increased demand for remote-first approaches and technologies that enable and prepare customer-facing employees … [With Mindtickle], revenue leaders can partner with their enablement organizations to define a singular measurement that sets a baseline for what knowledge, skills, and capabilities each sales rep in your organization should possess … Mindtickle’s sales content management capabilities allow prescriptive guidance on not only what content to use, but how it should be deployed and when.”

San Francisco, California-based Mindtickle currently has around 480 employees and says it’s hiring “aggressively” across all areas of the business. By the end of the year, it expects to employ “well north” of 500.

Mindtickle competes in a sales enablement market that’s anticipated to be worth $2.6 billion by 2024, according to Markets and Markets. Rival startup Seismic has raised tens of millions of dollars to roll out its automated sales and marketing enablement suite, as has Showpad. There’s also Outreach, which is creating a semiautomated sales engagement software, along with AI-powered sales enablement toolset developer Highspot.


Spell unveils deep learning operations platform to cut AI training costs


Spell today unveiled an operations platform that provides the tooling needed to train AI models based on deep learning algorithms.

The platforms currently employed to train AI models are optimized for machine learning algorithms. AI models based on deep learning algorithms require their own deep learning operations (DLOps) platform, Spell head of marketing Tim Negris told VentureBeat.

The Spell platform automates the entire deep learning workflow using tools the company developed in the course of helping organizations build and train AI models for computer vision and speech recognition applications that require deep learning algorithms.

Deep roots

Deep learning traces its lineage back to neural networks, a field of machine learning that structures algorithms in layers so that the resulting network can learn and make intelligent decisions on its own. The artifacts and models that are created using deep learning algorithms, however, don’t lend themselves to the same platforms used to manage machine learning operations (MLOps), Negris said.

An AI model based on deep learning algorithms can require tracking and managing hundreds of experiments with thousands of parameters spanning large numbers of graphics processing units (GPUs), Negris noted. The Spell platform specifically addresses the need to manage, automate, orchestrate, document, optimize, deploy, and monitor deep learning models throughout their entire lifecycle, he said. “Data science teams need to be able to explain and reproduce deep learning results,” Negris added.

While most existing MLOps platforms are not well suited to managing deep learning algorithms, Negris said the Spell platform can also be employed to manage AI models based on machine learning algorithms. Spell does not provide any tools to manage the lifecycle of those models, but data science teams can add their own third-party framework for managing them to the Spell platform.

The Spell platform also reduces cost by automatically invoking spot instances that cloud service providers make available for a finite amount of time whenever feasible, Negris said. That capability can reduce the total cost of training an AI model by as much as 66%, he added. That’s significant because the cost of training AI models based on deep learning algorithms can in some cases reach millions of dollars.

A hybrid approach

In time, most AI applications will be constructed using a mix of machine and deep learning algorithms. In fact, as the building of AI models using machine learning algorithms becomes more automated, many data science teams will spend more of their time constructing increasingly complex AI models based on deep learning algorithms. The cost of building AI models based on deep learning algorithms should also steadily decline as GPUs deployed in an on-premises IT environment or accessed via a cloud service become more affordable.

In the meantime, Negris said that while the workflows for building AI models will converge, it’s unlikely traditional approaches to managing application development processes based on DevOps platforms will be extended to incorporate AI models. The continuous retraining of AI models that are subject to drift does not lend itself to the more linear processes that are employed today to build and deploy traditional applications, he said.

Nevertheless, all the AI models being trained eventually need to find their way into an application deployed in a production environment. The challenge many organizations face today is aligning the rate at which AI models are developed with the faster pace at which applications are now deployed and updated.

One way or another, it’s only a matter of time before every application — to varying degrees — incorporates one or more AI models. The issue going forward is finding a way to reduce the level of friction that occurs whenever an AI model needs to be deployed within an application.


Training AI: Reward is not enough


This post was written for TechTalks by Herbert Roitblat, the author of Algorithms Are Not Enough: How to Create Artificial General Intelligence.

In a recent paper, the DeepMind team (Silver et al., 2021) argues that reward is enough for all kinds of intelligence. Specifically, they argue that “maximizing reward is enough to drive behavior that exhibits most if not all attributes of intelligence.” They argue that simple rewards are all that is needed for agents in rich environments to develop multi-attribute intelligence of the sort needed to achieve artificial general intelligence. This sounds like a bold claim, but, in fact, it is so vague as to be almost meaningless. They support their thesis not by offering specific evidence, but by repeatedly asserting that reward is enough because the observed solutions to the problems are consistent with the problem having been solved.

The Silver et al. paper represents at least the third time that a serious proposal has been offered to demonstrate that generic learning mechanisms are sufficient to account for all learning. This one goes farther to also propose that it is sufficient to attain intelligence, and in particular, sufficient to explain artificial general intelligence.

The first significant project that I know of that attempted to show that a single learning mechanism is all that is needed is B.F. Skinner’s version of behaviorism, as represented by his book Verbal Behavior. This book was devastatingly critiqued by Noam Chomsky (1959), who called Skinner’s attempt to explain human language production an example of “play acting at science.” The second major proposal was focused on past-tense learning of English verbs by Rumelhart and McClelland (1986), which was soundly criticized by Lachter and Bever (1988). Lachter and Bever showed that the specific way that Rumelhart and McClelland chose to represent the phonemic properties of the words that their connectionist system was learning to transform contained the specific information that would allow the system to succeed.

Both of these previous attempts failed in that they succumbed to confirmation bias. As Silver et al. do, they reported data that were consistent with their hypothesis without consideration of possible alternative explanations, and they interpreted ambiguous data as supportive. All three projects failed to take account of the implicit assumptions that were built into their models. Without these implicit TRICS (Lachter and Bever’s name for “the representations it crucially supposes”), there would be no intelligence in these systems.

The Silver et al. argument can be summarized by three propositions:

  1. Maximizing reward is enough to produce intelligence: “The generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence.”
  2. Intelligence is the ability to achieve goals: “Intelligence may be understood as a flexible ability to achieve goals.”
  3. Success is measured by maximizing reward: “Thus, success, as measured by maximising reward.”

In short, they propose that the definition of intelligence is the ability to maximize reward, and at the same time they use the maximization of reward to explain the emergence of intelligence. Following the 17th-century author Molière, some philosophers would call this kind of argument virtus dormitiva (a sleep-inducing virtue). When asked to explain why opium causes sleep, Molière’s bachelor (in The Imaginary Invalid) responds that it has a dormitive property. That, of course, is just a naming of the property for which an explanation is being sought. Reward maximization plays a similar role in Silver et al.’s hypothesis, which is also entirely circular: achieving goals is both the process of being intelligent and the explanation of the process of being intelligent.

Above: American psychologist Burrhus Frederic Skinner, known for his work on behaviorism (Source: Wikipedia, with modifications).

Chomsky also criticized Skinner’s approach because it assumed that for any exhibited behavior there must have been some reward. If someone looks at a painting and says “Dutch,” Skinner’s analysis assumes that there must be some feature of the painting for which the utterance “Dutch” had been rewarded. But, Chomsky argues, the person could have said anything else, including “crooked,” “hideous,” or “let’s get some lunch.” Skinner cannot point to the specific feature of the painting that caused any of these utterances or provide any evidence that the utterance was previously rewarded in the presence of that feature. To quote the 18th-century French author Voltaire, whose Dr. Pangloss (in Candide) says: “Observe that the nose has been formed to bear spectacles — thus we have spectacles.” On this view, there must be a problem that is solved by any feature; in this case, Pangloss claims that the nose has been formed just so spectacles can be held up. Pangloss also says “It is demonstrable … that things cannot be otherwise than as they are; for all being created for an end, all is necessarily for the best end.” For Silver et al., that end is the solution to a problem, and intelligence has been learned just for that purpose, but we do not necessarily know what that purpose is or what environmental features induced it. There must have been something.

Gould and Lewontin (1979) famously exploit Dr. Pangloss to criticize what they call the “adaptationist” or “Panglossian” paradigm in evolutionary biology. The core adaptationist tenet is that there must be an adaptive explanation for any feature. They point out that the highly decorated spandrels (the approximately triangular spaces where two arches meet) of St. Mark’s Cathedral in Venice are an architectural feature that follows from the choice to design the Cathedral with four arches, rather than the driver of the architectural design. The spandrels followed the choice of arches, not the other way around. Once the architect chose the arches, the spandrels were necessary, and they could be decorated. Gould and Lewontin say: “Every fan-vaulted ceiling must have a series of open spaces along the midline of the vault, where the sides of the fans intersect between the pillars. Since the spaces must exist, they are often used for ingenious ornamental effect.”

Gould and Lewontin give another example — an adaptationist explanation of Aztec sacrificial cannibalism. Aztecs engaged in human sacrifice. An adaptationist explanation was that the system of sacrifice was a solution to the problem of a chronic shortage of meat. The limbs of victims were frequently eaten by certain high-status members of the community. This “explanation” argues that the system of myth, symbol, and tradition that constituted this elaborate ritualistic murder were the result of a need for meat, whereas the opposite was probably true. Each new king had to outdo his predecessor with increasingly elaborate sacrifices of larger numbers of individuals; the practice seems to have increasingly strained the economic resources of the Aztec empire. Other sources of protein were readily available, and only certain privileged people, who had enough food already, ate only certain parts of the sacrificial victims. If getting meat into the bellies of starving people were the goal, then one would expect that they would make more efficient use of the victims and spread the food source more broadly. The need for meat is unlikely to be a cause of human sacrifice; rather it would seem to be a consequence of other cultural practices that were actually maladaptive for the survival of the Aztec civilization.

To paraphrase Silver et al.’s argument so far, if the goal is to be wealthy, it is enough to accumulate a lot of money. Accumulating money is then explained by the goal of being wealthy. Being wealthy is defined by having accumulated a lot of money. Reinforcement learning provides no explanation for how one goes about accumulating money or why that should be a goal. Those are determined, they argue, by the environment.

Reward by itself, then, is not really enough; at a minimum, the environment also plays a role. But there is more to adaptation than even that. Adaptation requires a source of variability from which certain traits can be selected. The primary source of this variation in evolutionary biology is mutation and recombination. Reproduction in any organism involves a copying of genes from the parents into the children. The copying process is less than perfect, and errors are introduced. Many of those errors are fatal, but some of them are not and are then available for natural selection. In sexually reproducing species, each parent contributes a copy (along with any potential errors) of its genes, and the two copies allow for additional variability through recombination (some genes from one parent and some from the other are passed to the next generation).

Reward is the selection. Alone, it is not sufficient. As Dawkins pointed out, evolutionary reward is the passing of a specific gene to the next generation. The reward is at the gene level, not at the level of the organism or the species. Anything that increases the chances of a gene being passed from one generation to the next mediates that reward, but notice that the genes themselves are not capable of being intelligent.

In addition to reward and environment, other factors also play a role in evolution and reinforcement learning. Reward can only select from the raw material that is available. If we throw a mouse into a cave, it does not learn to fly and to use sonar like a bat. Many generations and perhaps millions of years would be required to accumulate enough mutations and even then, there is no guarantee that it would evolve the same solutions to the cave problem that bats have evolved. Reinforcement learning is a purely selective process. Reinforcement learning is the process of increasing the probabilities of actions that together form a policy for dealing with a certain environment. Those actions must already exist for them to be selected. At least for now, those actions are supplied by the genes in evolution and by the program designers in artificial intelligence.
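The selectional point is easy to demonstrate with a toy example. In this sketch (invented action names, arbitrary constants), reward makes lever-pressing dominate the repertoire, but no amount of reward can produce an action, such as flying, that was never in the action set to begin with:

```python
import math
import random

# The repertoire is fixed in advance: reward can only select among these actions.
actions = ["press_lever", "groom", "rear"]
prefs = {a: 0.0 for a in actions}

def sample_action():
    """Softmax over preferences: higher preference, higher probability."""
    weights = [math.exp(prefs[a]) for a in actions]
    return random.choices(actions, weights=weights)[0]

for _ in range(2000):
    a = sample_action()
    reward = 1.0 if a == "press_lever" else 0.0  # the environment pays off one action
    prefs[a] += 0.1 * reward                     # selection: strengthen whatever was rewarded

print(max(prefs, key=prefs.get))                 # "press_lever": selected, not invented
```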

Above: British biologist Richard Dawkins, author of The Selfish Gene (Source: Flickr, modified under Creative Commons license).

As Lachter and Bever pointed out, learning does not start with a tabula rasa, as claimed by Silver et al., but with a set of representational commitments. Skinner based most of his theory building on the reinforcement learning of animals, particularly pigeons and rats. He and many other investigators studied them in stark environments. For the rats, that was a chamber that contained a lever for the rat to press and a feeder to deliver the reward. There was not much else that the rat could do but to wander a short distance and contact the lever. Pigeons were similarly tested in an environment that contained a pecking key (usually a plexiglass circle on the wall that could be illuminated) and a grain feeder to deliver the reward. In both situations, the animal had a pre-existing bias to respond in the way that the behaviorist wanted. Rats would contact the lever and, it turned out, pigeons would peck an illuminated key in a dark box even without a reward. This proclivity to respond in a desirable way made it easy to train the animal and the investigator could study the effects of reward patterns without a lot of trouble, but it was not for many years that it was discovered that the choice of a lever or a pecking key was not simply an arbitrary convenience, but was an unrecognized “fortunate choice.”

The same unrecognized fortunate choices occurred when Rumelhart and McClelland built their past-tense learner. They chose a representation that just happened to reflect the very information that they wanted their neural network to learn. It was not a tabula rasa relying solely on a general learning mechanism. Silver et al. (in another paper with an overlapping set of authors) also got “lucky” in their development of AlphaZero, to which they refer in the present paper.

In the previous paper, they give a more detailed account of AlphaZero along with this claim:

Our results demonstrate that a general-purpose reinforcement learning algorithm can learn, tabula rasa — without domain-specific human knowledge or data, as evidenced by the same algorithm succeeding in multiple domains — superhuman performance across multiple challenging games.

They also note:

AlphaZero replaces the handcrafted knowledge and domain-specific augmentations used in traditional game-playing programs with deep neural networks, a general-purpose reinforcement learning algorithm, and a general-purpose tree search algorithm.

They do not include explicit game-specific computational instructions, but they do include a substantial human contribution to solving the problem. For example, their model includes a “neural network fθ(s) [which] takes the board position s as an input and outputs a vector of move probabilities.” In other words, they do not expect the computer to learn that it is playing a game, or that the game is played by taking turns, or that it cannot just stack the stones (the go game pieces) into piles or throw the game board on the floor. They provide many other constraints as well, for example, by having the machine play against itself. The tree representation they use was once a huge innovation for representing game playing. The branches of the tree correspond to the range of possible moves. No other action is possible. The computer is also provided with a way to search the tree using a Monte Carlo tree search algorithm and it is provided with the rules of the game.
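To make the shape of that built-in structure concrete, here is a minimal, hypothetical sketch of such an fθ(s) interface in PyTorch. The layer sizes and board encoding are invented, and this is not DeepMind’s network; the point is that the board must already be encoded as numeric planes and the outputs are fixed to a move-probability vector and a scalar evaluation:

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """f_theta(s): an encoded board position in, (move probabilities, value) out."""

    def __init__(self, board_size=19, channels=17, moves=19 * 19 + 1):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        flat = 64 * board_size * board_size
        self.policy_head = nn.Linear(flat, moves)  # one logit per slot in a fixed move vocabulary
        self.value_head = nn.Linear(flat, 1)       # scalar evaluation of the position

    def forward(self, s):
        h = self.trunk(s).flatten(1)
        return torch.softmax(self.policy_head(h), dim=-1), torch.tanh(self.value_head(h))

net = PolicyValueNet()
p, v = net(torch.randn(1, 17, 19, 19))  # one board position, pre-encoded as 17 planes
print(p.shape, v.shape)                 # torch.Size([1, 362]) torch.Size([1, 1])
```

Everything about this interface, from the encoding of s to the fixed move vocabulary to the tree search that consumes its outputs, is supplied by human designers, which is exactly the prior structure the author is pointing to.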

Far from being a tabula rasa, then, AlphaZero is given substantial prior knowledge, which greatly constrains the range of possible things it can learn. So it is not clear what “reward is enough” means even in the context of learning to play go. For reward to be enough, it would have to work without these constraints. Moreover, it is unclear whether even a general game-playing system would count as an example of general learning in less constrained environments. AlphaZero is a substantial contribution to computational intelligence, but its contribution is largely the human intelligence that went into designing it, to identifying the constraints that it would operate in, and to reducing the problem of playing a game to a directed tree search. Furthermore, its constraints do not even apply to all games, but only games of a limited type. It can only play certain kinds of board games that can be characterized as a tree search where the learner can take a board position as input and output a probability vector. There is no evidence that it could even learn another kind of board game, such as Monopoly or even parchisi.

Absent the constraints, reward does not explain anything. AlphaZero is not a model for all kinds of learning, and certainly not for general intelligence.

Silver et al. treat general intelligence as a quantitative problem.

“General intelligence, of the sort possessed by humans and perhaps also other animals, may be defined as the ability to flexibly achieve a variety of goals in different contexts.”

How much flexibility is required? How wide a variety of goals? If we had a computer that could play go, checkers, and chess interchangeably, that would still not constitute general intelligence. Even if we added another game, shogi, we still would have exactly the same computer that would still work by finding a model that “takes the board position s as an input and outputs a vector of move probabilities.” The computer is completely incapable of entertaining any other “thoughts” or solving any problem that cannot be represented in this specific way.

The “general” in artificial general intelligence is not characterized by the number of different problems it can solve, but by the ability to solve many types of problems. A general intelligence agent must be able to autonomously formulate its own representations. It has to invent its own approach to solving problems, selecting its own goals, representations, methods, and so on. So far, that is all the purview of human designers who reduce problems to forms that a computer can solve through the adjustment of model parameters. We cannot achieve general intelligence until we can remove the dependency on humans to structure problems. Reinforcement learning, as a selective process, cannot do it.

Conclusion: As with the confrontation between behaviorism and cognitivism, and the question of whether backpropagation was sufficient to learn linguistic past-tense transformations, these simple learning mechanisms only appear to be sufficient if we ignore the heavy burden carried by other, often unrecognized constraints. Rewards select among available alternatives but they cannot create those alternatives. Behaviorist rewards work so long as one does not look too closely at the phenomena and as long as one assumes that there must be some reward that reinforces some action. They are good after the fact to “explain” any observed actions, but they do not help outside the laboratory to predict which actions will be forthcoming. These phenomena are consistent with reward, but it would be a mistake to think that they are caused by reward.

Contrary to Silver et al.’s claims, reward is not enough.

Herbert Roitblat is the author of Algorithms Are Not Enough: How to Create Artificial General Intelligence (MIT Press, 2020).

This story originally appeared on Bdtechtalks.com. Copyright 2021


Charli.ai CEO on training AI-driven personal assistants


Charli.ai’s digital personal assistant that employs AI to organize a user’s life is expected to enter beta this summer. Known as Charli, this personal assistant promises to do everything from finding all related documents in email and cloud storage systems to organizing receipts and tax filings.

VentureBeat sat down with Charli.ai founder and CEO Kevin Collins to better understand how AI is about to transform the way we all work.

This interview has been edited for brevity and clarity.

VentureBeat: What exactly does Charli do?

Kevin Collins: It’s really about keeping yourself organized and finding what you need. If you go into email and try to find something you’re looking for, it’s a nightmare trying to search for it. If you’re going into your cloud storage to find documents that you filed a year ago, it’s a nightmare to find them. Charli is really about keeping your digital content organized, allowing you to find things instantly when you need to find them. That includes files that you might be keeping in email or cloud storage, but it also includes links to the internet. We get inundated with links to every piece of content that’s out there. [Keeping] track of it can just be handed off to Charli.

VentureBeat: Sounds like everybody can now have their own digital assistant?

Collins: We wanted Charli to be like that personal assistant for you. We’ve got a whole natural language processor in the front end of it so that Charli can understand you. For example, if I say “Charli, show my expenses for this month,” it will understand that. It’s also designed for speech, which we haven’t yet integrated. We haven’t integrated with Alexa or Google. We’re still at an early stage. We’re coming out of our beta program in the summer. The whole idea about Charli was providing a personal assistant for me. The name Charli is a bit of a play on chief of staff. We dropped the E because we wanted [the name] to be gender-neutral.

VentureBeat: What is the business strategy for Charli?

Collins: There will be a free version of Charli so that people are comfortable with it just doing the organization. If there are more sophisticated use cases, there’s going to be a subscription-based service. If you sign up for the premium package or professional package, you’ll get different access to different aspects of the AI. For example, the pro package will allow you to enable Charli to read your invoices and receipts, pull out the very specific information, and then send that off to QuickBooks for you.

VentureBeat: What role do you think AI has to play in helping people navigate the work-from-anywhere scenario?

Collins: Getting back to that pre-pandemic normal isn’t going to happen. There’s going to be a lot more remote work. That’s going to put a lot of pressure on organizations to be far more productive with the remote workers. There’s no longer the ability to walk down the hallway to talk to somebody. There’s no longer the ability to just put something on the internal mail and send it off to somebody. There needs to be a new way of getting employees productive, and that means more automation. AI has a massive role to play.

VentureBeat: Will employees embrace that idea, or are they fearful of AI?

Collins: When we talk to our customers and our users, they’re quite excited about the potential of AI. They see AI taking a big load off for them. There’s a glimmer of hope that AI can take some of the pain away, but there is an undercurrent of fear of what that means from a job perspective. Is it going to take jobs away? And the short answer to that is yes. You will see a shift in the labor market. A lot of these manual, tedious jobs that require a lot of people are now going to go away, but there’s going to be demand for other types of jobs, especially in skilled labor. There is a shift in the labor market that’s going to affect different people in different ways. Some are going to be negatively impacted, while others are going to be positively impacted.

VentureBeat: What else are people fearful of when it comes to AI?

Collins: The other fear, and I think it’s a real one, is that there is an inherent bias in AI. We’re starting to talk about that a lot more. There’s more emphasis on auditing the AI algorithms. There’s a lot more emphasis on making sure that they are behaving in a standardized way that is positive rather than negative. AI is mathematical models. They’re trained to behave a certain way, and they’re trained by people and organizations. There is an underlying fear that they’re going to be trained in a negative way.

VentureBeat: Do you think we will get to the point where we have instances of AI that are optimized for opposite outcomes that will eventually do battle with one another?

Collins: I would say yes. We want Charli to be biased [in your favor] because it’s your personal assistant to a certain degree. But we need to introduce diversity into that, so that means competing viewpoints. If you’ve got various decisions that have to be made, there has to be a decision criterion that comes from diverse AI models. You have to be able to consult those diverse models and make a decision on which one suits the case or the instance. AI has to be very contextually aware, which means different training, different algorithms, and they have to compete against each other for the right decision at the right time for the right reason.

VentureBeat: Don’t I just wind up with multiple personal secretaries making different recommendations based on their biases?

Collins: That actually gets into some of our intellectual property because we don’t want multiple personal assistants. You want that one personal assistant, but you want that personal assistant to have a diverse set of inputs in order to make the right decision for you. There are competing AI models that have to really fight each other in order to make a confidence-level decision for you. It’s your personal assistant inspecting these competing decisions and making the right decision for you. You don’t have to deal with multiple personal assistants, you deal with one, but you want the confidence in that one to make the right decision.

VentureBeat: How long does it take Charli to be trained?

Collins: It’s a loaded question. The short answer is this is really hard. To get Charli to be brilliant at doing all this takes a couple of years, but for Charli to learn some basics to understand how to organize your life is quite quick. I’ve been using Charli now for a little over a year. Charli rarely comes back and asks me questions anymore, whereas at the beginning Charli was asking me a lot of clarification questions. It’s just not a matter of going to the shelf, getting a machine learning algorithm, and thinking it’s going to work out of the box. That doesn’t happen anywhere. You can go and get all these algorithms out of the box, but you have to invest in the data, the training, the testing, and the automation of the continuous learning processes. That is a very heavy investment. For other companies considering this, they’re really going to be in for a bit of a shock of how much work it is to get this right.

VentureBeat: How will we ultimately know Charli is getting it right?

Collins: We’ve invested a lot in guardrails because we need the AI to behave a certain way. There are a lot of guardrails around laws, restrictions, rules, and policies for humans. We need those types of constraints in AI as well. That is another heavy investment area because we just don’t want AI to think outside the box for us. We want AI to simply take the pain and aggravation of automation off of our plate. There’s a massive investment that went into testing AI.


Categories
Tech News

SEO is tough. This Google SEO training can make you even tougher and help you earn all the web traffic you deserve

TLDR: The 2021 Google SEO and SERP Business Marketing Bundle is a one-stop deep dive into the steps for earning top results in Google searches and driving traffic to your content.

Keyword research, backlinking, and other tried-and-true tactics for optimizing search results still matter. But it’s grown increasingly tough for brand and website managers to keep up with the changing factors that determine how a search engine will evaluate, then rank, a particular page or piece of content in its results.

Search results also vary by the type of information being searched, page variables, and even the device being used, making it increasingly difficult to know whether the search intelligence you’re following is giving you an accurate SEO picture of today’s latest search changes.

The 2021 Google SEO and SERP Business Marketing Bundle ($34.98, over 90 percent off, from NTW Deals) can give interested web watchers insight into what’s driving today’s search engine results, and how to capture some of that attention for their content and products.

Over 11 courses covering more than 26 hours of material, students will examine all of the hottest issues in SEO, including social media marketing, link building, Google Citations, and more that can help boost a brand’s presence online and offer a better chance of nabbing that coveted top spot in Google search results.

Even if you’ve never considered the importance of search, the Perfect On-Page SEO in 1 Day That Users and Google Will Love course is a great place to start. The course explains the key terms to know and the key tools at your disposal to improve the clickthrough rate to your site or content from Google search results. By knowing how to build high-converting, targeted SEO, users are armed with the ability to not only increase traffic but also get a better idea of what visitors or customers really want.

Once you’ve established the basic knowledge, the remaining courses in this package dig ever deeper into proper SEO and page management etiquette, including proven SEO and social media marketing strategies for reaching over 1 million people.

From Google’s own link building tactics to optimizing for both voice and local searches to achieving a proper Google citation so your brand shows up in multiple locations to maximize your reach, students will find loads of helpful techniques for earning every possible drop of web traffic.

There’s even a pair of technical SEO courses covering how to get your brand into Google searches with enhanced rich snippet listings, how to ensure your images are properly tagged for SEO, and the most advanced training in the keyword and SERP techniques that expert users live by.

The 2021 Google SEO and SERP Business Marketing Bundle includes almost $2,200 worth of intensive coursework, but right now, the entire collection is available for just over $3 per course at $34.98.

Prices are subject to change.


Categories
AI

Nvidia benchmark tests show impressive gains in training AI models

Where does your enterprise stand on the AI adoption curve? Take our AI survey to find out.

Nvidia announced that systems based on its graphics processing units (GPUs) are delivering 3 to 5 times better performance when it comes to training AI models than they did a year ago, according to the latest MLPerf benchmarks published yesterday.

The MLPerf benchmark is maintained by the MLCommons Association, a consortium backed by Alibaba, Facebook AI, Google, Intel, Nvidia, and others that acts as an independent steward.

The latest set of benchmarks spans eight different workloads covering a range of use cases for AI model training, including speech recognition, natural language processing, object detection, and reinforcement learning. Nvidia claims its OEM partners were the only systems vendors to run all the workloads defined by the MLPerf benchmark across a total of 4,096 GPUs. Dell, Fujitsu, Gigabyte Technology, Inspur, Lenovo, Nettrix, and Supermicro all provided on-premises systems certified by Nvidia that were used to run the benchmark.

Nvidia claims that overall it improved more than any of its rivals, delivering as much as 2.1 times more performance than the last time the MLPerf benchmarks were run. Those benchmarks provide a reliable point of comparison that data scientists and IT organizations can use to make an apples-to-apples comparison between systems, said Paresh Kharya, senior director of product management at Nvidia. “MLPerf is an industry-standard benchmark,” he said.
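
For readers unfamiliar with how such comparisons are scored: MLPerf Training results are reported as time-to-train to a fixed quality target, so a round-over-round speedup is simply a ratio of those times. A minimal illustration in Python, using placeholder minutes rather than actual MLPerf submission figures:

```python
def speedup(previous_minutes: float, current_minutes: float) -> float:
    """Round-over-round speedup is the ratio of time-to-train results."""
    return previous_minutes / current_minutes

# Placeholder times, not actual MLPerf submissions: a workload that took
# 42 minutes last round and 20 minutes this round is 2.1x faster.
print(f"{speedup(42.0, 20.0):.1f}x")  # -> 2.1x
```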

Trying to quantify the unknown

It’s not clear to what degree IT organizations are relying on consortiums’ benchmarks to decide what class of system to acquire. Each workload deployed by an IT team is fairly unique, so benchmarks are no guarantee of actual performance. Arguably, the most compelling thing about the latest benchmark results is that they show systems acquired last year or even earlier continue to improve in overall performance as software updates are made. That increased level of performance could reduce the pace at which Nvidia-based systems need to be replaced.

Of course, the number of organizations investing in on-premises IT platforms to run AI workloads is unknown. Some certainly prefer to train AI models in on-premises IT environments for a variety of security, compliance, and cloud networking reasons. However, the cost of acquiring a GPU-based server tends to make consuming GPUs on demand via a cloud service a more attractive alternative for training AI models until the organization hits a certain threshold in the number of models being trained simultaneously.
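
That threshold is, at bottom, a break-even calculation. As a rough sketch only, with entirely hypothetical prices rather than quotes from any provider, the point where buying beats renting falls out of simple arithmetic:

```python
# Back-of-the-envelope break-even between renting cloud GPUs and buying a
# server. Every figure is a hypothetical placeholder, not a vendor quote.
def breakeven_gpu_hours(server_cost: float, cloud_rate_per_gpu_hour: float,
                        gpus_used: int) -> float:
    """Hours of steady use at which buying overtakes renting."""
    return server_cost / (cloud_rate_per_gpu_hour * gpus_used)

hours = breakeven_gpu_hours(server_cost=150_000.0,
                            cloud_rate_per_gpu_hour=3.0,
                            gpus_used=8)
print(f"Renting 8 GPUs stops being cheaper after ~{hours:,.0f} hours")  # ~6,250
```

At rates like these, bursty or occasional training favors renting, while a steady pipeline of models trained simultaneously tips the math toward owning the hardware.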

Alternatively, providers of on-premises platforms are increasingly offering pricing plans that enable organizations to consume on-premises IT infrastructure using the same model as a cloud service provider.

Other classes of processors might end up being employed to train an AI model. Right now, however, GPUs — thanks to their inherent parallelization capabilities — have proven themselves to be the most efficient option.
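
As a quick way to see that parallelization advantage firsthand, here is a minimal sketch that times the dense matrix multiplication at the heart of model training on CPU and GPU. It assumes PyTorch and a CUDA-capable machine; exact speedups will vary by hardware:

```python
import time
import torch  # assumes PyTorch is installed

x = torch.randn(4096, 4096)
w = torch.randn(4096, 4096)

start = time.perf_counter()
_ = x @ w  # dense matmul on the CPU
cpu_s = time.perf_counter() - start

if torch.cuda.is_available():
    xg, wg = x.cuda(), w.cuda()
    _ = xg @ wg                   # warm-up launch
    torch.cuda.synchronize()
    start = time.perf_counter()
    _ = xg @ wg
    torch.cuda.synchronize()      # GPU kernels run asynchronously; wait for them
    gpu_s = time.perf_counter() - start
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.4f}s  ({cpu_s / gpu_s:.0f}x faster)")
```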

Regardless of the platform employed, the number of AI models being trained continues to steadily increase. There is no shortage of use cases involving applications that could be augmented using AI. The challenge in many organizations now is prioritizing AI projects given the cost of GPU-based platforms. Of course, as consumption of GPUs increases, the cost of manufacturing them will eventually decline.

As organizations create their road maps for AI, they should be able to safely assume that both the amount of time required and the total cost of training an AI model will continue to decline in the years ahead — even allowing for the occasional processor shortage brought on by unpredictable “black swan” events such as the COVID-19 pandemic.
