Is DeepMind’s new reinforcement learning system a step toward general AI?

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

One of the key challenges of deep reinforcement learning models — the kind of AI systems that have mastered Go, StarCraft 2, and other games — is their inability to generalize their capabilities beyond their training domain. This limit makes it very hard to apply these systems to real-world settings, where situations are much more complicated and unpredictable than the environments where AI models are trained.

But scientists at AI research lab DeepMind claim to have taken the “first steps to train an agent capable of playing many different games without needing human interaction data,” according to a blog post about their new “open-ended learning” initiative. Their new project includes a 3D environment with realistic dynamics and deep reinforcement learning agents that can learn to solve a wide range of challenges.

The new system, according to DeepMind’s AI researchers, is an “important step toward creating more general agents with the flexibility to adapt rapidly within constantly changing environments.”

The paper’s findings show some impressive advances in applying reinforcement learning to complicated problems. But they are also a reminder of how far current systems are from achieving the kind of general intelligence capabilities that the AI community has been coveting for decades.

The brittleness of deep reinforcement learning

The key advantage of reinforcement learning is its ability to develop behavior by taking actions and getting feedback, similar to the way humans and animals learn by interacting with their environment. Some scientists describe reinforcement learning as “the first computational theory of intelligence.”

The combination of reinforcement learning and deep neural networks, known as deep reinforcement learning, has been at the heart of many advances in AI, including DeepMind’s famous AlphaGo and AlphaStar models. In both cases, the AI systems were able to outmatch human world champions at their respective games.

But reinforcement learning systems are also notorious for their lack of flexibility. For example, a reinforcement learning model that can play StarCraft 2 at an expert level won’t be able to play a game with similar mechanics (e.g., Warcraft 3) at any level of competency. Even slight changes to the original game will considerably degrade the AI model’s performance.

“These agents are often constrained to play only the games they were trained for — whilst the exact instantiation of the game may vary (e.g. the layout, initial conditions, opponents) the goals the agents must satisfy remain the same between training and testing. Deviation from this can lead to catastrophic failure of the agent,” DeepMind’s researchers write in a paper that provides the full details on their open-ended learning.

Humans, on the other hand, are very good at transferring knowledge across domains.

The XLand environment

The goal of DeepMind’s new project was to create “an artificial agent whose behaviour generalises beyond the set of games it was trained on.”

To this end, the team created XLand, an engine that can generate 3D environments composed of static topology and movable objects. The game engine simulates rigid-body physics and allows players to use the objects in various ways (e.g., create ramps, block paths, etc.).

XLand is a rich environment in which you can train agents on a virtually unlimited number of tasks. One of the main advantages of XLand is the capability to use programmatic rules to automatically generate a vast array of environments and challenges to train AI agents. This addresses one of the key challenges of machine learning systems, which often require vast amounts of manually curated training data.
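As a toy illustration of that idea, here is a sketch of how programmatic rules can churn out arbitrarily many world/goal pairs. The object names, rules, and structure below are entirely invented for illustration; this is not DeepMind's actual task generator.

```python
import random

# Hypothetical sketch of programmatic task generation in the spirit of XLand:
# each "task" pairs a randomly generated world layout with a goal predicate.

OBJECTS = ["cube", "pyramid", "slab"]
COLORS = ["purple", "yellow", "black"]

def generate_task(rng: random.Random) -> dict:
    """Sample one world/goal pair from simple programmatic rules."""
    world = {
        "terrain_seed": rng.randrange(10**6),        # fixes the static topology
        "objects": [
            (rng.choice(COLORS), rng.choice(OBJECTS))
            for _ in range(rng.randint(2, 5))        # movable props in the arena
        ],
    }
    color, shape = rng.choice(world["objects"])
    goal = f"be near the {color} {shape}"            # a simple goal predicate
    return {"world": world, "goal": goal}

def generate_tasks(n: int, seed: int = 0) -> list:
    """Deterministically generate n tasks from a seed."""
    rng = random.Random(seed)
    return [generate_task(rng) for _ in range(n)]

tasks = generate_tasks(1000)
print(len(tasks), tasks[0]["goal"])
```

Because generation is driven by a seeded random source, the same rules can reproduce a task exactly or produce billions of distinct ones, which is what makes this approach a substitute for manually curated training data.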

According to the blog post, the researchers created “billions of tasks in XLand, across varied games, worlds, and players.” The games range from very simple goals, such as finding objects, to more complex settings in which the AI agents must weigh the benefits and tradeoffs of different rewards. Some of the games include cooperation or competition elements involving multiple agents.

Deep reinforcement learning

DeepMind uses deep reinforcement learning and a few clever tricks to create AI agents that can thrive in the XLand environment.

The reinforcement learning model of each agent receives a first-person view of the world, the agent’s physical state (e.g., whether it is holding an object), and its current goal. Each agent fine-tunes the parameters of its policy neural network to maximize its rewards on the current task. The neural network architecture contains an attention mechanism to ensure the agent can balance optimization across the subgoals required to accomplish the main goal.
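To give a flavor of what such an attention step computes, here is a generic scaled dot-product attention sketch in NumPy: a state embedding queries a set of subgoal vectors and receives a weighted mixture of them. The shapes and values are made up, and this is a textbook attention mechanism, not DeepMind's actual architecture.

```python
import numpy as np

def attend(query, keys, values):
    """Return softmax attention weights and the weighted mixture of values."""
    scores = keys @ query / np.sqrt(query.shape[0])   # one score per subgoal
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over subgoals
    return weights, weights @ values

rng = np.random.default_rng(0)
state_query = rng.normal(size=4)        # embedding of the agent's current state
subgoal_keys = rng.normal(size=(3, 4))  # one key vector per subgoal
subgoal_vals = rng.normal(size=(3, 4))  # one value vector per subgoal

w, context = attend(state_query, subgoal_keys, subgoal_vals)
print(w)  # the weights form a distribution over subgoals (they sum to 1)
```

The point of the mechanism is that the weighting is state-dependent: as the agent's situation changes, so does how much each subgoal contributes to its decision.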

Once the agent masters its current challenge, the computational task generator creates a new challenge for the agent. Each new task is generated according to the agent’s training history and in a way to help distribute the agent’s skills across a vast range of challenges.
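A toy version of such an adaptive generator might simply propose the task the agent neither dominates nor hopelessly fails, say, the one whose recent success rate sits closest to 50%. This selection criterion is invented for illustration; DeepMind's actual procedure is more elaborate.

```python
def next_task(success_rates: dict) -> str:
    """Pick the task whose recent success rate is nearest the 'frontier' (0.5)."""
    return min(success_rates, key=lambda t: abs(success_rates[t] - 0.5))

# Hypothetical training history: one success rate per task.
history = {"find_cube": 0.95, "capture_flag": 0.10, "hide_and_seek": 0.45}
print(next_task(history))  # hide_and_seek
```

The intuition is the same as a curriculum: tasks that are already solved teach nothing, tasks far beyond the agent's ability give no learning signal, and tasks at the frontier distribute skill across the task space.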

DeepMind also used its vast computational resources (courtesy of its owner Alphabet Inc.) to train a large population of agents in parallel and transfer learned parameters across different agents to improve the general capabilities of the reinforcement learning systems.
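This is reminiscent of population-based training, which can be sketched as follows. The sketch is deliberately simplified: agents are just score/parameter pairs, and the cull/copy/perturb rule is a generic one, not DeepMind's exact scheme.

```python
import random

def pbt_step(population, rng, frac=0.25):
    """Copy parameters from top performers onto bottom performers, with noise."""
    ranked = sorted(population, key=lambda agent: agent["score"])
    k = max(1, int(len(ranked) * frac))
    for loser, winner in zip(ranked[:k], ranked[-k:]):
        # The weakest agents inherit a perturbed copy of the strongest agents' params.
        loser["params"] = [p + rng.gauss(0, 0.01) for p in winner["params"]]
    return population

rng = random.Random(0)
# Toy population: each agent's single "parameter" just mirrors its score.
pop = [{"score": s, "params": [float(s)]} for s in range(8)]
pbt_step(pop, rng)
print(pop[0]["params"])  # the worst agent now carries (perturbed) top parameters
```

Transferring parameters this way lets discoveries made by one agent propagate through the whole population instead of being relearned from scratch.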

DeepMind uses a multi-step, population-based mechanism to train many reinforcement learning agents.

The performance of the reinforcement learning agents was evaluated based on their general ability to accomplish a wide range of tasks they had not been trained on. Some of the test tasks include well-known challenges such as “capture the flag” and “hide and seek.”

According to DeepMind, each agent played around 700,000 unique games in 4,000 unique worlds within XLand and went through 200 billion training steps across 3.4 million unique tasks (in the paper, the researchers write that 100 million steps are equivalent to approximately 30 minutes of training).

“At this time, our agents have been able to participate in every procedurally generated evaluation task except for a handful that were impossible even for a human,” the AI researchers wrote. “And the results we’re seeing clearly exhibit general, zero-shot behaviour across the task space.”

Zero-shot machine learning models can solve problems that were not present in their training dataset. In a complicated space such as XLand, zero-shot learning might imply that the agents have obtained fundamental knowledge about their environment as opposed to memorizing sequences of image frames in specific tasks and environments.

The reinforcement learning agents further manifested signs of generalized learning when the researchers tried to adjust them for new tasks. According to their findings, 30 minutes of fine-tuning on new tasks was enough to create an impressive improvement in a reinforcement learning agent trained with the new method. In contrast, an agent trained from scratch for the same amount of time would have near-zero performance on most tasks.

High-level behavior

According to DeepMind, the reinforcement learning agents exhibit the emergence of “heuristic behavior” such as tool use, teamwork, and multi-step planning. If proven, this can be an important milestone. Deep learning systems are often criticized for learning statistical correlations instead of causal relations. If neural networks could develop high-level notions such as using objects to create ramps or cause occlusions, it could have a great impact on fields such as robotics and self-driving cars, where deep learning is currently struggling.

But those are big ifs, and DeepMind’s researchers are cautious about jumping to conclusions on their findings. “Given the nature of the environment, it is difficult to pinpoint intentionality — the behaviours we see often appear to be accidental, but still we see them occur consistently,” they wrote in their blog post.

But they are confident that their reinforcement learning agents “are aware of the basics of their bodies and the passage of time and that they understand the high-level structure of the games they encounter.”

Such fundamental self-learned skills are another one of the highly sought goals of the artificial intelligence community.

Theories of intelligence

Some of DeepMind’s top scientists published a paper recently in which they hypothesize that a single reward and reinforcement learning are enough to eventually reach artificial general intelligence (AGI). An intelligent agent with the right incentives can develop all kinds of capabilities such as perception and natural language understanding, the scientists believe.

Although DeepMind’s new approach still requires the training of reinforcement learning agents on multiple engineered rewards, it is in line with their general perspective of achieving AGI through reinforcement learning.

“What DeepMind shows with this paper is that a single RL agent can develop the intelligence to reach many goals, rather than just one,” Chris Nicholson, CEO of Pathmind, told TechTalks. “And the skills it learns in accomplishing one thing can generalize to other goals. That is very similar to how human intelligence is applied. For example, we learn to grab and manipulate objects, and that is the foundation of accomplishing goals that range from pounding a hammer to making your bed.”

Nicholson also believes that other aspects of the paper’s findings hint at progress toward general intelligence. “Parents will recognize that open-ended exploration is precisely how their toddlers learn to move through the world. They take something out of a cupboard, and put it back in. They invent their own small goals—which may seem meaningless to adults — and they master them,” he said. “DeepMind is programmatically setting goals for its agents within this world, and those agents are learning how to master them one by one.”

The reinforcement learning agents have also shown signs of developing embodied intelligence in their own virtual world, Nicholson said, like the kind humans have. “This is one more indication that the rich and malleable environment that people learn to move through and manipulate is conducive to the emergence of general intelligence, and that the biological and physical analogies of intelligence can guide further work in AI,” he said.

Sathyanaraya Raghavachary, an associate professor of computer science at the University of Southern California, is a bit more skeptical of the claims made in DeepMind’s paper, especially the conclusions on proprioception, awareness of time, and high-level understanding of goals and environments.

“Even we humans are not fully aware of our bodies, let alone those VR agents,” Raghavachary said in comments to TechTalks, adding that perception of the body requires an integrated brain that is co-designed for suitable body awareness and situatedness in space. “Same with the passage of time — that too would require a brain that has memory of the past, and a sense for time in relation to that past. What they (paper authors) might mean relates to the agents’ tracking progressive changes in the environment resulting from their actions (e.g., as a result of moving a purple pyramid), state changes which the underlying physics simulator would generate.”

Raghavachary also points out that if the agents could understand the high-level structure of their tasks, they would not need 200 billion steps of simulated training to reach optimal results.

“The underlying architecture lacks what it takes, to achieve these three things (body awareness, time passage, understanding high-level task structure) they point out in conclusion,” he said. “Overall, XLand is simply ‘more of the same.’”

The gap between simulation and the real world

In a nutshell, the paper shows that if you can create a complex enough environment, design the right reinforcement learning architecture, and expose your models to enough experience (and have a lot of money to spend on compute resources), your models will be able to generalize to various kinds of tasks in the same environment. And this is basically how natural evolution has delivered human and animal intelligence.

In fact, DeepMind has already done something similar with AlphaZero, a reinforcement learning model that managed to master multiple two-player turn-based games. The XLand experiment has extended the same notion to a much greater level by adding the zero-shot learning element.

But while I think that the experience from the XLand-trained agents will ultimately be transferable to real-world applications such as robotics and self-driving cars, I don’t think it will be a breakthrough. You’ll still need to make compromises (such as creating artificial limits to reduce the complexity of the real world) or create artificial enhancements (such as imbuing the machine learning models with prior knowledge or extra sensors).

DeepMind’s reinforcement learning agents might have become the masters of the virtual XLand. But their simulated world doesn’t even have a fraction of the intricacies of the real world. That gap will continue to remain a challenge for a long time.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member

Repost: Original Source and Author Link

Amazon launches HealthLake in general availability

Amazon today announced the general availability of HealthLake, a HIPAA-eligible (though not compliant by default) service for health care and life sciences organizations to ingest, store, query, and analyze health data. First launched in preview last December at Amazon’s re:Invent conference, HealthLake leverages machine learning to extract medical information from unstructured data and organize, index, and store that information in chronological order.

Health care organizations are increasingly embracing emerging technologies, including the cloud and big data analytics. In a recent survey, 95% of executives in the industry said that trends including automation, cybersecurity, and hybrid cloud will impact how they provide patient care in the future. Cost savings is a motivator — particularly when it comes to the cloud. One report found that 88% of health care organizations using cloud computing have reduced IT costs by an average of 20% annually.

Analyzing health data

HealthLake, a new service in the Amazon Web Services (AWS) for Health portfolio, moves health data from on-premises systems to a data lake in the AWS cloud. Amazon says that HealthLake uses “specially tuned” machine learning models to understand medical terminology and identify and tag clinical information. The service then enriches data with standardized labels for medications, conditions, diagnoses, and more.

HealthLake also indexes events like patient visits into a single timeline, allowing customers to apply analytics and machine learning on top. And it supports interoperability standards, including Fast Healthcare Interoperability Resources (FHIR), a standard for exchanging health data between systems in a consistent format.
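For reference, a minimal FHIR R4 resource of the kind such a service ingests looks like the example below. The patient details are invented sample data; only the field structure (`resourceType`, `name`, `birthDate`, and so on) follows the FHIR standard.

```python
import json

# A minimal FHIR R4 "Patient" resource. FHIR resources are JSON documents
# whose structure is fixed by the standard, which is what lets different
# health systems exchange them consistently.
patient = {
    "resourceType": "Patient",
    "id": "example-001",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "gender": "female",
    "birthDate": "1984-07-12",
}

fhir_json = json.dumps(patient)
print(json.loads(fhir_json)["resourceType"])  # Patient
```

Because every conforming system agrees on this shape, a record exported from one hospital's system can be indexed and queried by another without custom translation code.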

“More and more of our customers in the health care and life sciences space are looking to organize and make sense of their reams of data, but are finding this process challenging and cumbersome,” Swami Sivasubramanian, VP of machine learning at AWS, said in a press release. “We built Amazon HealthLake to remove this heavy lifting for healthcare organizations so they can transform health data in the cloud in minutes and begin analyzing that information securely at scale. Alongside AWS for Health, we’re excited about how Amazon HealthLake can help medical providers, health insurers, and pharmaceutical companies provide patients and populations with data-driven, personalized, and predictive care.”

Amazon HealthLake is available today in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions, with additional availability coming soon.

Amazon views AI in health care as a frontier worth exploring — and perhaps its next major revenue driver. The AI in health care market is anticipated to reach $19.25 billion by 2026, driven in part by a demand for telemedical and remote monitoring services. The launch of HealthLake comes a year after Amazon debuted Transcribe Medical, a service that’s designed to transcribe medical speech for clinical staff in primary care settings. And in 2018, Amazon made three AWS offerings HIPAA eligible — Transcribe, Translate, and Comprehend — following on the heels of rival Google Cloud and Microsoft Azure.


Reinforcement learning can deliver general AI, says DeepMind

In their decades-long chase to create artificial intelligence, computer scientists have designed and developed all kinds of complicated mechanisms and technologies to replicate vision, language, reasoning, motor skills, and other abilities associated with intelligent life. While these efforts have resulted in AI systems that can efficiently solve specific problems in limited environments, they fall short of developing the kind of general intelligence seen in humans and animals.

In a new paper submitted to the peer-reviewed Artificial Intelligence journal, scientists at UK-based AI lab DeepMind argue that intelligence and its associated abilities will emerge not from formulating and solving complicated problems but by sticking to a simple but powerful principle: reward maximization.

Titled “Reward is Enough,” the paper, which is still in pre-proof as of this writing, draws inspiration from studying the evolution of natural intelligence as well as drawing lessons from recent achievements in artificial intelligence. The authors suggest that reward maximization and trial-and-error experience are enough to develop behavior that exhibits the kind of abilities associated with intelligence. And from this, they conclude that reinforcement learning, a branch of AI that is based on reward maximization, can lead to the development of artificial general intelligence.

Two paths for AI

One common method for creating AI is to try to replicate elements of intelligent behavior in computers. For instance, our understanding of the mammal vision system has given rise to all kinds of AI systems that can categorize images, locate objects in photos, define the boundaries between objects, and more. Likewise, our understanding of language has helped in the development of various natural language processing systems, such as question answering, text generation, and machine translation.

These are all instances of narrow artificial intelligence, systems that have been designed to perform specific tasks instead of having general problem-solving abilities. Some scientists believe that assembling multiple narrow AI modules will produce more broadly intelligent systems. For example, you can have a software system that coordinates between separate computer vision, voice processing, NLP, and motor control modules to solve complicated problems that require a multitude of skills.

A different approach to creating AI, proposed by the DeepMind researchers, is to recreate the simple yet effective rule that has given rise to natural intelligence. “[We] consider an alternative hypothesis: that the generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence,” the researchers write.

This is basically how nature works. As far as science is concerned, there has been no top-down intelligent design in the complex organisms that we see around us. Billions of years of natural selection and random variation have filtered lifeforms for their fitness to survive and reproduce. Living beings that were better equipped to handle the challenges and situations in their environments managed to survive and reproduce. The rest were eliminated.

This simple yet efficient mechanism has led to the evolution of living beings with all kinds of skills and abilities to perceive, navigate, modify their environments, and communicate among themselves.

“The natural world faced by animals and humans, and presumably also the environments faced in the future by artificial agents, are inherently so complex that they require sophisticated abilities in order to succeed (for example, to survive) within those environments,” the researchers write. “Thus, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximization contains within it many or possibly even all the goals of intelligence.”

For example, consider a squirrel that seeks the reward of minimizing hunger. On the one hand, its sensory and motor skills help it locate and collect nuts when food is available. But a squirrel that can only find food is bound to die of hunger when food becomes scarce. This is why it also has planning skills and memory to cache the nuts and retrieve them in winter. And the squirrel has social skills and knowledge to ensure other animals don’t steal its nuts. If you zoom out, hunger minimization can be a subgoal of “staying alive,” which also requires skills such as detecting and hiding from dangerous animals, protecting oneself from environmental threats, and seeking better habitats with seasonal changes.

“When abilities associated with intelligence arise as solutions to a singular goal of reward maximisation, this may in fact provide a deeper understanding since it explains why such an ability arises,” the researchers write. “In contrast, when each ability is understood as the solution to its own specialised goal, the why question is side-stepped in order to focus upon what that ability does.”

Finally, the researchers argue that the “most general and scalable” way to maximize reward is through agents that learn through interaction with the environment.

Developing abilities through reward maximization

In the paper, the AI researchers provide some high-level examples of how “intelligence and associated abilities will implicitly arise in the service of maximising one of many possible reward signals, corresponding to the many pragmatic goals towards which natural or artificial intelligence may be directed.”

For example, sensory skills serve the need to survive in complicated environments. Object recognition enables animals to detect food, prey, friends, and threats, or find paths, shelters, and perches. Image segmentation enables them to tell the difference between different objects and avoid fatal mistakes such as running off a cliff or falling off a branch. Meanwhile, hearing helps detect threats where the animal can’t see or find prey when they’re camouflaged. Touch, taste, and smell also give the animal the advantage of having a richer sensory experience of the habitat and a greater chance of survival in dangerous environments.

Rewards and environments also shape innate and learned knowledge in animals. For instance, hostile habitats ruled by predator animals such as lions and cheetahs reward ruminant species that have the innate knowledge to run away from threats since birth. Meanwhile, animals are also rewarded for their power to learn specific knowledge of their habitats, such as where to find food and shelter.

The researchers also discuss the reward-powered basis of language, social intelligence, imitation, and finally, general intelligence, which they describe as “maximising a singular reward in a single, complex environment.”

Here, they draw an analogy between natural intelligence and AGI: “An animal’s stream of experience is sufficiently rich and varied that it may demand a flexible ability to achieve a vast variety of subgoals (such as foraging, fighting, or fleeing), in order to succeed in maximising its overall reward (such as hunger or reproduction). Similarly, if an artificial agent’s stream of experience is sufficiently rich, then many goals (such as battery-life or survival) may implicitly require the ability to achieve an equally wide variety of subgoals, and the maximisation of reward should therefore be enough to yield an artificial general intelligence.”

Reinforcement learning for reward maximization

Reinforcement learning is a special branch of AI algorithms that is composed of three key elements: an environment, agents, and rewards.

By performing actions, the agent changes its own state and that of the environment. Based on how much those actions affect the goal the agent must achieve, it is rewarded or penalized. In many reinforcement learning problems, the agent has no initial knowledge of the environment and starts by taking random actions. Based on the feedback it receives, the agent learns to tune its actions and develop policies that maximize its reward.
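That loop can be made concrete with a minimal tabular Q-learning example, a generic textbook algorithm applied to a toy five-cell corridor, not anything from DeepMind's work. The agent starts knowing nothing, takes actions, receives a reward only at the goal cell, and gradually tunes its action values until moving toward the goal dominates.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left / step right

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    # The Q-table starts at zero: no initial knowledge of the environment.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, steps = 0, 0
        while s != GOAL and steps < 1000:
            steps += 1
            if rng.random() < eps:  # occasionally explore at random...
                a = rng.choice(ACTIONS)
            else:                   # ...otherwise act greedily (random tie-break)
                a = max(ACTIONS, key=lambda b: (q[(s, b)], rng.random()))
            s_next = min(max(s + a, 0), N_STATES - 1)
            reward = 1.0 if s_next == GOAL else 0.0       # feedback from env
            best_next = max(q[(s_next, b)] for b in ACTIONS)
            # Standard Q-learning update toward reward plus discounted future value.
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = s_next
    return q

q = q_learning()
# After training, "step right" should dominate in every non-goal state.
print(all(q[(s, +1)] > q[(s, -1)] for s in range(GOAL)))
```

The same cycle of act, observe reward, and update the policy is what scales up, with neural networks in place of the table, to the systems discussed in the paper.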

In their paper, the researchers at DeepMind suggest reinforcement learning as the main algorithm that can replicate reward maximization as seen in nature and can eventually lead to artificial general intelligence.

“If an agent can continually adjust its behaviour so as to improve its cumulative reward, then any abilities that are repeatedly demanded by its environment must ultimately be produced in the agent’s behaviour,” the researchers write, adding that, in the course of maximizing for its reward, a good reinforcement learning agent could eventually learn perception, language, social intelligence and so forth.

In the paper, the researchers provide several examples that show how reinforcement learning agents were able to learn general skills in games and robotic environments.

However, the researchers stress that some fundamental challenges remain unsolved. For instance, they say, “We do not offer any theoretical guarantee on the sample efficiency of reinforcement learning agents.” Reinforcement learning is notorious for requiring huge amounts of data. For instance, a reinforcement learning agent might need centuries worth of gameplay to master a computer game. And AI researchers still haven’t figured out how to create reinforcement learning systems that can generalize their learnings across several domains. Therefore, slight changes to the environment often require the full retraining of the model.

The researchers also acknowledge that the learning mechanisms behind reward maximization remain an unsolved problem and a central question for further study in reinforcement learning.

Strengths and weaknesses of reward maximization

Patricia Churchland, neuroscientist, philosopher, and professor emerita at the University of California, San Diego, described the ideas in the paper as “very carefully and insightfully worked out.”

However, Churchland pointed to possible flaws in the paper’s discussion of social decision-making. The DeepMind researchers focus on personal gains in social interactions. Churchland, who has recently written a book on the biological origins of moral intuitions, argues that attachment and bonding are a powerful factor in the social decision-making of mammals and birds, which is why animals put themselves in great danger to protect their children.

“I have tended to see bonding, and hence other-care, as an extension of the ambit of what counts as oneself—‘me-and-mine,’” Churchland said. “In that case, a small modification to the [paper’s] hypothesis to allow for reward maximization to me-and-mine would work quite nicely, I think. Of course, we social animals have degrees of attachment—super strong to offspring, very strong to mates and kin, strong to friends and acquaintances etc., and the strength of types of attachments can vary depending on environment, and also on developmental stage.”

This is not a major criticism, Churchland said, and could likely be worked into the hypothesis quite gracefully.

“I am very impressed with the degree of detail in the paper, and how carefully they consider possible weaknesses,” Churchland said. “I may be wrong, but I tend to see this as a milestone.”

Data scientist Herbert Roitblat challenged the paper’s position that simple learning mechanisms and trial-and-error experience are enough to develop the abilities associated with intelligence. Roitblat argued that the theories presented in the paper face several challenges when it comes to implementing them in real life.

“If there are no time constraints, then trial and error learning might be enough, but otherwise we have the problem of an infinite number of monkeys typing for an infinite amount of time,” Roitblat said.

The infinite monkey theorem states that a monkey hitting random keys on a typewriter for an infinite amount of time may eventually type any given text.

Roitblat is the author of Algorithms are Not Enough, in which he explains why all current AI algorithms, including reinforcement learning, require careful formulation of the problem and representations created by humans.

“Once the model and its intrinsic representation are set up, optimization or reinforcement could guide its evolution, but that does not mean that reinforcement is enough,” Roitblat said.

In the same vein, Roitblat added that the paper does not make any suggestions on how the reward, actions, and other elements of reinforcement learning are defined.

“Reinforcement learning assumes that the agent has a finite set of potential actions. A reward signal and value function have been specified. In other words, the problem of general intelligence is precisely to contribute those things that reinforcement learning requires as a pre-requisite,” Roitblat said. “So, if machine learning can all be reduced to some form of optimization to maximize some evaluative measure, then it must be true that reinforcement learning is relevant, but it is not very explanatory.”
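Roitblat's point can be made concrete: even a trivial problem requires a human to write down all of the pieces below before any learning starts. The names and values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Everything in this specification is supplied by a human designer, not
# learned: the state space, the finite action set, the reward function,
# and the discount factor. Reinforcement learning only begins after this.
@dataclass
class MDPSpec:
    states: List[str]
    actions: List[str]                   # finite, human-chosen action set
    reward: Callable[[str, str], float]  # human-designed reward signal
    gamma: float = 0.9                   # discount factor, also a design choice

corridor = MDPSpec(
    states=["start", "middle", "goal"],
    actions=["left", "right"],
    reward=lambda state, action: 1.0 if (state, action) == ("middle", "right") else 0.0,
)
print(corridor.reward("middle", "right"))  # 1.0
```

In Roitblat's framing, filling in this specification for a rich, open-ended problem is itself the hard part of general intelligence.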

This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.


Categories
AI

DeepMind says reinforcement learning is ‘enough’ to reach general AI

In their decades-long chase to create artificial intelligence, computer scientists have designed and developed all kinds of complicated mechanisms and technologies to replicate vision, language, reasoning, motor skills, and other abilities associated with intelligent life. While these efforts have resulted in AI systems that can efficiently solve specific problems in limited environments, they fall short of developing the kind of general intelligence seen in humans and animals.

In a new paper submitted to the peer-reviewed Artificial Intelligence journal, scientists at U.K.-based AI lab DeepMind argue that intelligence and its associated abilities will emerge not from formulating and solving complicated problems but by sticking to a simple but powerful principle: reward maximization.

Titled “Reward is Enough,” the paper, which is still in pre-proof as of this writing, draws inspiration from studying the evolution of natural intelligence as well as drawing lessons from recent achievements in artificial intelligence. The authors suggest that reward maximization and trial-and-error experience are enough to develop behavior that exhibits the kind of abilities associated with intelligence. And from this, they conclude that reinforcement learning, a branch of AI that is based on reward maximization, can lead to the development of artificial general intelligence.

Two paths for AI

One common method for creating AI is to try to replicate elements of intelligent behavior in computers. For instance, our understanding of the mammal vision system has given rise to all kinds of AI systems that can categorize images, locate objects in photos, define the boundaries between objects, and more. Likewise, our understanding of language has helped in the development of various natural language processing systems, such as question answering, text generation, and machine translation.

These are all instances of narrow artificial intelligence, systems that have been designed to perform specific tasks instead of having general problem-solving abilities. Some scientists believe that assembling multiple narrow AI modules will produce higher intelligent systems. For example, you can have a software system that coordinates between separate computer vision, voice processing, NLP, and motor control modules to solve complicated problems that require a multitude of skills.

A different approach to creating AI, proposed by the DeepMind researchers, is to recreate the simple yet effective rule that has given rise to natural intelligence. “[We] consider an alternative hypothesis: that the generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence,” the researchers write.

This is basically how nature works. As far as science is concerned, there has been no top-down intelligent design in the complex organisms that we see around us. Billions of years of natural selection and random variation have filtered lifeforms for their fitness to survive and reproduce. Living beings that were better equipped to handle the challenges and situations in their environments managed to survive and reproduce. The rest were eliminated.

This simple yet efficient mechanism has led to the evolution of living beings with all kinds of skills and abilities to perceive, navigate, modify their environments, and communicate among themselves.

“The natural world faced by animals and humans, and presumably also the environments faced in the future by artificial agents, are inherently so complex that they require sophisticated abilities in order to succeed (for example, to survive) within those environments,” the researchers write. “Thus, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximization contains within it many or possibly even all the goals of intelligence.”

For example, consider a squirrel that seeks the reward of minimizing hunger. On the one hand, its sensory and motor skills help it locate and collect nuts when food is available. But a squirrel that can only find food is bound to die of hunger when food becomes scarce. This is why it also has planning skills and memory to cache nuts and retrieve them in winter. And the squirrel has social skills and knowledge to ensure other animals don’t steal its nuts. If you zoom out, hunger minimization can be a subgoal of “staying alive,” which also requires skills such as detecting and hiding from dangerous animals, protecting oneself from environmental threats, and seeking better habitats with seasonal changes.

“When abilities associated with intelligence arise as solutions to a singular goal of reward maximisation, this may in fact provide a deeper understanding since it explains why such an ability arises,” the researchers write. “In contrast, when each ability is understood as the solution to its own specialised goal, the why question is side-stepped in order to focus upon what that ability does.”

Finally, the researchers argue that the “most general and scalable” way to maximize reward is through agents that learn through interaction with the environment.

Developing abilities through reward maximization

In the paper, the AI researchers provide some high-level examples of how “intelligence and associated abilities will implicitly arise in the service of maximising one of many possible reward signals, corresponding to the many pragmatic goals towards which natural or artificial intelligence may be directed.”

For example, sensory skills serve the need to survive in complicated environments. Object recognition enables animals to detect food, prey, friends, and threats, or find paths, shelters, and perches. Image segmentation enables them to tell the difference between different objects and avoid fatal mistakes such as running off a cliff or falling off a branch. Meanwhile, hearing helps detect threats where the animal can’t see or find prey when they’re camouflaged. Touch, taste, and smell also give the animal the advantage of having a richer sensory experience of the habitat and a greater chance of survival in dangerous environments.

Rewards and environments also shape innate and learned knowledge in animals. For instance, hostile habitats ruled by predators such as lions and cheetahs reward ruminant species born with the innate knowledge to run away from threats. Meanwhile, animals are also rewarded for their ability to learn specific knowledge of their habitats, such as where to find food and shelter.

The researchers also discuss the reward-powered basis of language, social intelligence, imitation, and finally, general intelligence, which they describe as “maximising a singular reward in a single, complex environment.”

Here, they draw an analogy between natural intelligence and AGI: “An animal’s stream of experience is sufficiently rich and varied that it may demand a flexible ability to achieve a vast variety of subgoals (such as foraging, fighting, or fleeing), in order to succeed in maximising its overall reward (such as hunger or reproduction). Similarly, if an artificial agent’s stream of experience is sufficiently rich, then many goals (such as battery-life or survival) may implicitly require the ability to achieve an equally wide variety of subgoals, and the maximisation of reward should therefore be enough to yield an artificial general intelligence.”

Reinforcement learning for reward maximization

Reinforcement learning is a special branch of AI algorithms built around three key elements: an environment, agents, and rewards.

By performing actions, the agent changes its own state and that of the environment. Based on how much those actions affect the goal the agent must achieve, it is rewarded or penalized. In many reinforcement learning problems, the agent has no initial knowledge of the environment and starts by taking random actions. Based on the feedback it receives, the agent learns to tune its actions and develop policies that maximize its reward.
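The loop described above can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. The corridor environment and all hyperparameters below are illustrative, not drawn from DeepMind’s paper:

```python
import random

# Toy corridor: states 0..4, and the only reward sits at the right end (state 4).
N_STATES = 5
ACTIONS = [-1, +1]  # step left or step right

def step(state, action):
    """Apply an action; return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table by trial and error, starting with no knowledge."""
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Early on, actions are mostly random; feedback gradually tunes them.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# The learned greedy policy steps right (toward the reward) from every state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

The table itself shows how the reward signal propagates backward: state-action pairs closer to the goal carry higher values, which is what turns initially random behavior into a purposeful policy.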

In their paper, the researchers at DeepMind suggest reinforcement learning as the main algorithm that can replicate reward maximization as seen in nature and can eventually lead to artificial general intelligence.

“If an agent can continually adjust its behaviour so as to improve its cumulative reward, then any abilities that are repeatedly demanded by its environment must ultimately be produced in the agent’s behaviour,” the researchers write, adding that, in the course of maximizing its reward, a good reinforcement learning agent could eventually learn perception, language, social intelligence, and so forth.

In the paper, the researchers provide several examples that show how reinforcement learning agents were able to learn general skills in games and robotic environments.

However, the researchers stress that some fundamental challenges remain unsolved. For instance, they say, “We do not offer any theoretical guarantee on the sample efficiency of reinforcement learning agents.” Reinforcement learning is notorious for requiring huge amounts of data; a reinforcement learning agent might need centuries’ worth of gameplay to master a computer game. And AI researchers still haven’t figured out how to create reinforcement learning systems that can generalize their learnings across several domains, so slight changes to the environment often require full retraining of the model.

The researchers also acknowledge that the learning mechanism for reward maximization remains an unsolved problem and a central question for further study in reinforcement learning.

Strengths and weaknesses of reward maximization

Patricia Churchland, neuroscientist, philosopher, and professor emerita at the University of California, San Diego, described the ideas in the paper as “very carefully and insightfully worked out.”

However, Churchland pointed out possible flaws in the paper’s discussion of social decision-making. The DeepMind researchers focus on personal gains in social interactions. Churchland, who has recently written a book on the biological origins of moral intuitions, argues that attachment and bonding are powerful factors in the social decision-making of mammals and birds, which is why animals put themselves in great danger to protect their children.

“I have tended to see bonding, and hence other-care, as an extension of the ambit of what counts as oneself—‘me-and-mine,’” Churchland said. “In that case, a small modification to the [paper’s] hypothesis to allow for reward maximization to me-and-mine would work quite nicely, I think. Of course, we social animals have degrees of attachment—super strong to offspring, very strong to mates and kin, strong to friends and acquaintances etc., and the strength of types of attachments can vary depending on environment, and also on developmental stage.”

This is not a major criticism, Churchland said, and could likely be worked into the hypothesis quite gracefully.

“I am very impressed with the degree of detail in the paper, and how carefully they consider possible weaknesses,” Churchland said. “I may be wrong, but I tend to see this as a milestone.”

Data scientist Herbert Roitblat challenged the paper’s position that simple learning mechanisms and trial-and-error experience are enough to develop the abilities associated with intelligence. Roitblat argued that the theories presented in the paper face several challenges when it comes to implementing them in real life.

“If there are no time constraints, then trial and error learning might be enough, but otherwise we have the problem of an infinite number of monkeys typing for an infinite amount of time,” Roitblat said. The infinite monkey theorem states that a monkey hitting random keys on a typewriter for an infinite amount of time may eventually type any given text.
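Roitblat’s time-constraint objection is easy to put a number on. The figures below are an illustrative calculation, not from the article, assuming a 27-key typewriter (26 letters plus a space) and independent fixed-length attempts:

```python
# Expected random-typing effort for one short phrase. With uniformly random,
# independent fixed-length attempts, success probability per attempt is
# keys ** -len(phrase), so the expected number of attempts is its reciprocal.
phrase = "to be or not to be"
keys = 27
attempts = keys ** len(phrase)           # expected attempts: roughly 5.8e25
seconds_per_year = 60 * 60 * 24 * 365
years = attempts / seconds_per_year      # ~1.8e18 years at one attempt per second
```

Even for an 18-character phrase, the expected wait dwarfs the age of the universe, which is the point of the objection: without structure to guide the search, trial and error alone does not scale.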

Roitblat is the author of Algorithms are Not Enough, in which he explains why all current AI algorithms, including reinforcement learning, require careful formulation of the problem and representations created by humans.

“Once the model and its intrinsic representation are set up, optimization or reinforcement could guide its evolution, but that does not mean that reinforcement is enough,” Roitblat said.

In the same vein, Roitblat added that the paper does not make any suggestions on how the reward, actions, and other elements of reinforcement learning are defined.

“Reinforcement learning assumes that the agent has a finite set of potential actions. A reward signal and value function have been specified. In other words, the problem of general intelligence is precisely to contribute those things that reinforcement learning requires as a pre-requisite,” Roitblat said. “So, if machine learning can all be reduced to some form of optimization to maximize some evaluative measure, then it must be true that reinforcement learning is relevant, but it is not very explanatory.”

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics. 

This story originally appeared on Bdtechtalks.com. Copyright 2021

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member

Clubhouse will be available to the general public soon

Partly thanks to taking on the rather unconventional form of audio-only communication, Clubhouse quickly became one of the hottest names in social media over the past year. Given its popularity, it might surprise some to discover that the social network is still an exclusive club that requires an invitation from an existing member to get in. It was also ironically slow to expand beyond iOS. Opening its doors to Android users seemingly opened the floodgates as well, and Clubhouse will no longer be an exclusive club starting sometime this summer.

Clubhouse recently explained that its slow expansion was intentional, even when faced with new rivals in that audio-only social networking space. It wanted to focus on more important aspects of growing the community and stabilizing features first rather than getting bogged down with server maintenance or rushed bug fixing from dozens of reports that would be coming in. It seems, however, that it has reached a point where it’s ready to actually open the doors to everyone.

In one of its tweets highlighting its recent Town Hall, Clubhouse revealed that it already has more than 2 million Android users. That’s rather impressive considering the Android app only went live earlier in May and hasn’t even reached feature parity with iOS. It does prove that many people were just waiting for Clubhouse to arrive on their favorite mobile platform before jumping in.

More importantly, it also revealed its plans to make the network available to the general public this summer, presuming all goes well. Until then, it will be focusing on tasks that will make the network feel more complete to prepare Clubhouse for a sudden flood of new users without any invitations.

Of course, Clubhouse’s public opening comes at a time when Twitter and Facebook, among others, are already stepping up their game in that same space. Twitter is even starting to test its Ticketed Spaces as a monetization system for both creators and Twitter itself, something that Clubhouse has also launched recently in a very minimal way.



Amazon launches ECS Anywhere in general availability

Amazon today announced the launch in general availability of Amazon ECS Anywhere, an extension of the company’s Elastic Container Service (ECS) that allows Amazon Web Services (AWS) customers to deploy native Amazon ECS tasks in any computing environment. Amazon says the service spans traditional AWS managed infrastructure as well as customer-managed infrastructure, with a fully managed control plane running in the cloud.

ECS Anywhere is designed for AWS customers who’ve made significant capital investments in their datacenters or who operate in highly regulated industries, Amazon says. While these customers may be all-in on cloud, they also have to consider practical constraints like financial resources or specialized workloads that inhibit them. Their deployment requirements may go beyond AWS-owned infrastructure, and they might not be able to afford to use different container management technologies for different deployment targets.

Amazon describes ECS Anywhere as an infrastructure-agnostic product that works with virtual machines, bare-metal hardware, and other infrastructure types running supported operating systems. In disconnected scenarios, ECS Anywhere tasks continue running on customer-managed infrastructure, while cloud connectivity is required to update or scale the tasks or to connect to other in-region AWS services at runtime.

With ECS Anywhere, customers can run and manage container-based apps on-premises using the same APIs, cluster management, workload scheduling, monitoring, and deployment pipelines they use with ECS in AWS. ECS Anywhere provides a container orchestration service that allows customers to run, scale, and secure container apps on local infrastructure in addition to all AWS regions, AWS local zones, edge locations, and hybrid infrastructure deployments. There aren’t any upfront fees or commitments to use Amazon ECS Anywhere, and customers pay only for the container instances they run.

ECS Anywhere users define ECS cluster objects in the ECS control plane, and the AWS Systems Manager agent installs on customer-managed operating systems, turning them into “managed instances.” A converged version of the open source Amazon ECS agent then installs on these managed instances and registers them into an ECS cluster previously defined in the control plane. A new launch type and compatibility requirement allows the Amazon ECS control plane to run tasks on non-AWS-managed infrastructure.
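In practice, the new launch type shows up in the task definition: `requiresCompatibilities: ["EXTERNAL"]` is what targets customer-managed capacity. The sketch below is illustrative, with made-up names, image, and account numbers:

```json
{
  "family": "on-prem-web",
  "requiresCompatibilities": ["EXTERNAL"],
  "containerDefinitions": [
    {
      "name": "web",
      "image": "public.ecr.aws/nginx/nginx:latest",
      "memory": 256,
      "portMappings": [{ "containerPort": 80, "hostPort": 8080 }]
    }
  ],
  "taskRoleArn": "arn:aws:iam::123456789012:role/app-task-role",
  "executionRoleArn": "arn:aws:iam::123456789012:role/task-execution-role"
}
```

The task role and execution role are the same fields Re Ferre refers to below, which is how externally deployed tasks keep interacting with cloud services.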

“ECS tasks will have, for example, a ‘task role’ and a ‘task execution role’ assigned,” Massimo Re Ferre, principal technologist at AWS, explained in a blog post. “This means they will be able to interact, if need be, with cloud services … as if they were deployed. However, there will also be effectively local resources running on customer managed operating system and with local network connectivity. This will allow Amazon ECS applications deployed ‘externally’ to appreciate low latency and high bandwidth when connecting to services running in proximity in the same data center.”

Customers and partners currently using ECS Anywhere include Siemens, CyberAgent, Getir, and Infosys among others, according to VP of compute services at AWS Deepak Singh. Canonical is leveraging it to offer Ubuntu for container workloads, while Aqua Security is using ECS Anywhere to help clients build cloud-native apps that meet certain compliance requirements.

“Customers have told us that while they need to run containers on their own infrastructure, they don’t want the hassle of operating their own cluster management software,” Singh said in a press release. “With Amazon ECS Anywhere, we are proud to provide our customers exactly what they’ve asked for — a single service and control plane to manage their container deployments.”

Google launches AI-powered document processing services in general availability

Google today announced that several of its cloud-based, AI-powered document processing products have become generally available after launching in preview last year. DocAI platform, Lending DocAI, and Procurement DocAI, which have been piloted by thousands of businesses to date, are now open to all customers and include new features and resources.

Companies spend an average of $20 to file and store a single document, by some estimates, and only 18% of companies consider themselves paperless. An IDC report revealed that document-related challenges account for a 21.3% productivity loss, and U.S. companies waste a collective $8 billion annually managing paperwork.

How it works

With the launch in general availability, Lending DocAI, which processes loan applicants’ asset documents, now offers a set of specialized AI models for paystubs and bank statements. The service also now benefits from DocAI platform’s Human-in-the-Loop AI capability, which provides a workflow to manage human data review tasks.

As Google explains, Human-in-the-Loop AI enables human reviewers to verify data captured by Lending DocAI, Procurement DocAI, and other offerings in DocAI platform. The system shows a percentage score of how “sure” it is that the AI ingested the document correctly, and it’s customizable, with the flexibility to set different thresholds and assign groups of reviewers to stages of a workflow. Developers can choose reviewers to assign to tasks either from within their own company or from partner organizations.
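The thresholding mechanics can be sketched generically. The function, field names, and confidence scores below are hypothetical illustrations, not the DocAI API:

```python
def route_extraction(fields, threshold=0.85):
    """Split extracted fields into auto-accepted values and values routed
    to a human reviewer, based on the model's per-field confidence score.
    `fields` maps field name -> (value, confidence in [0, 1])."""
    accepted, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            needs_review[name] = value
    return accepted, needs_review

# Example: a parsed paystub where one field falls below the review threshold.
parsed = {
    "employer": ("Acme Corp", 0.98),
    "net_pay": ("$2,301.50", 0.91),
    "pay_period": ("2021-05-01 to 2021-05-15", 0.62),
}
accepted, needs_review = route_extraction(parsed)
# Only "pay_period" is routed to a human reviewer; the rest are auto-accepted.
```

Raising or lowering `threshold` is the customization knob Google describes: stricter thresholds send more fields to reviewers, looser ones automate more of the pipeline.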

Lending institutions like banks and brokers process hundreds of pages of paperwork for every loan. It’s a heavily manual process that adds thousands of dollars to the cost of issuing a loan. While hardly flawless, automated processing can give customers a degree of confidence they can afford the property they’re interested in, and some lenders are able to complete the ordeal within minutes, as opposed to the weeks it once took.

Procurement DocAI

Procurement DocAI, which performs document processing for invoices, receipts, and more, has gained an AI parser for electric, water, and other utility bills. The latest release taps Google’s Knowledge Graph, a system that understands over 500 billion facts about 5 billion entities drawn from the web and from open and licensed databases, to validate information. Google claims that Knowledge Graph can help increase document parsing accuracy by identifying, for example, that “Angelina” correlates to “Angelina Paris,” a bakery identified using geodata.

Google also today announced a partnership with mortgage servicing firm Mr. Cooper, following a collaboration with home loan company Roostify last October. Google says Mr. Cooper will offer its customers greater automation and workflow tools by connecting them with the DocAI platform.

“Over the last few years, we have made substantial investments in our proprietary servicing technology and core mortgage platform that have revolutionized the customer experience while providing dramatic efficiencies in operating cost. By joining forces with Google … we are able to build on those advances and help make these technologies available for the mortgage industry to deploy through Google Cloud,” Mr. Cooper CEO Jay Bray said in a press release.

The general release of the DocAI platform comes after Google launched PPP Lending AI, an effort to help lenders expedite the processing of applications for the since-exhausted U.S. Small Business Administration’s (SBA) Paycheck Protection Program. As Google explained at the time in a whitepaper, AI can automate the handling of volumes of loan applications by identifying patterns that would take a human worker longer to spot.

Amazon launches ML-powered maintenance tool Lookout for Equipment in general availability

Amazon today announced the general availability of Lookout for Equipment, a service that uses machine learning to help customers perform maintenance on equipment in their facilities. Launched in preview last year during Amazon Web Services (AWS) re:Invent 2020, Lookout for Equipment ingests sensor data from a customer’s industrial equipment and then trains a model to predict early warning signs of machine failure or suboptimal performance.

Predictive maintenance technologies have been used for decades in jet engines and gas turbines, and companies like GE Digital (with Predix) and Petasense offer Wi-Fi-enabled, cloud- and AI-driven sensors. According to a recent report by analysts at Markets and Markets, predictive factory maintenance could be worth $12.3 billion by 2025. Beyond Amazon, startups like Augury are vying for a slice of the segment.

With Lookout for Equipment, industrial customers can build a predictive maintenance solution for a single facility or multiple facilities. To get started, companies upload their sensor data — like pressure, flow rate, RPMs, temperature, and power — to Amazon Simple Storage Service (S3) and provide the relevant S3 bucket location to Lookout for Equipment. The service will automatically sift through the data, look for patterns, and build a model that’s tailored to the customer’s operating environment. Lookout for Equipment will then use the model to analyze incoming sensor data and identify early warning signs of machine failure or malfunction.

For each alert, Lookout for Equipment will specify which sensors are indicating an issue and measure the magnitude of its impact on the detected event. For example, if Lookout for Equipment spotted a problem on a pump with 50 sensors, the service could show which five sensors indicate an issue on a specific motor and relate that issue to the motor’s power current and temperature.
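Attributing an alert to specific sensors can be illustrated with a simple z-score ranking: score each sensor by how far its latest reading sits from its own historical baseline. This is a generic sketch of the idea, not Amazon’s model; sensor names and values are made up:

```python
from statistics import mean, stdev

def top_contributors(history, reading, k=5):
    """Rank sensors by how many standard deviations the current reading
    sits from that sensor's historical mean (its z-score).
    `history` maps sensor name -> list of past values;
    `reading` maps sensor name -> current value."""
    scores = {}
    for sensor, values in history.items():
        spread = stdev(values)
        if spread == 0:
            continue  # a perfectly flat sensor carries no signal here
        scores[sensor] = abs(reading[sensor] - mean(values)) / spread
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Example: the motor-temperature sensor deviates most from its baseline.
history = {
    "motor_temp_c": [60, 61, 59, 60, 62],
    "pressure_kpa": [101, 100, 102, 101, 100],
    "flow_lpm": [40, 41, 39, 40, 41],
}
reading = {"motor_temp_c": 75, "pressure_kpa": 101, "flow_lpm": 40}
suspects = top_contributors(history, reading, k=2)
# "motor_temp_c" ranks first.
```

A production system would use learned models rather than raw z-scores, but the output shape is the same: a ranked list of sensors that lets a technician start at the likeliest component.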

“Many industrial and manufacturing companies have heavily invested in physical sensors and other technology with the aim of improving the maintenance of their equipment. But even with this gear in place, companies are not in a position to deploy machine learning models on top of the reams of data due to a lack of resources and the scarcity of data scientists,” VP of machine learning at AWS Swami Sivasubramanian said in a press release. “Today, we’re excited to announce the general availability of Amazon Lookout for Equipment, a new service that enables customers to benefit from custom machine learning models that are built for their specific environment to quickly and easily identify abnormal machine behavior — so that they can take action to avoid the impact and expense of equipment downtime.”

Lookout for Equipment is available via the AWS console as well through supporting partners in the AWS Partner Network. It launches today in US East (N. Virginia), EU (Ireland), and Asia Pacific (Seoul) server regions, with availability in additional regions in the coming months.

The launch of Lookout for Equipment follows the general availability of Lookout for Metrics, a fully managed service that uses machine learning to monitor key factors impacting the health of enterprises. Both products are complemented by Amazon Monitron, an end-to-end equipment monitoring system to enable predictive maintenance with sensors, a gateway, an AWS cloud instance, and a mobile app.
