
Instabase adds deep learning to make sense of unstructured data

Whether they realize it or not, most enterprises are sitting on a mountain of priceless yet untapped data. Buried deep within PDFs, customer emails, and scanned documents is a trove of business intelligence and insights with the potential to inform critical business decisions – if only it can be extracted and harnessed.

Instabase hopes to help businesses benefit from unstructured data with the help of some good, old-fashioned AI. Today, the business automation platform provider is announcing a set of new deep learning-based tools designed to help enterprises more easily extract and make sense of this unstructured data and build applications that will help them put it all to use.

“Unlocking unstructured data, which is 80% of all enterprise data, is an extremely difficult problem due to the variability of the data,” says Instabase founder and CEO Anant Bhardwaj. “Deep learning algorithms provide greater accuracy as the algorithm learns from the entirety of each training document and identifies many different attributes to make its decision, much like a human does.”

Instabase’s new deep learning features offer low-code and no-code functionality that’s designed to let Instabase customers tap into sophisticated deep learning models and train, run, and make use of these models for their business’s needs. Using drag-and-drop visual development interfaces, Instabase customers can build customized workflows and business applications powered by best-in-class deep learning models.

“These deep learning models have already been trained on very large sets of data and as a result, fewer samples are needed to fine-tune the model for a specific use case,” Bhardwaj explains. “That means enterprises can tackle use cases never before possible, build solutions faster and at unprecedented accuracies for their unstructured data use cases.”
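
Bhardwaj’s point about needing fewer samples is the standard transfer-learning recipe: start from a model pretrained on large document corpora and fine-tune it on a handful of labeled examples. The sketch below is illustrative only and is not Instabase’s tooling; it fine-tunes an off-the-shelf Hugging Face token-classification model on hypothetical labeled OCR tokens so it can tag fields such as an invoice number and total.

```python
# Illustrative sketch only (not Instabase's API): fine-tune a pretrained
# token-classification model on a few labeled examples to extract fields from
# OCR'd document text. Field tags and the sample data are hypothetical.
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["O", "B-INVOICE_NO", "B-TOTAL"]   # hypothetical field labels
samples = [{"tokens": ["Invoice", "No.", "12345", "Total:", "$98.10"],
            "ner_tags": [0, 0, 1, 0, 2]}]   # a few annotations stand in for many

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels))

def encode(example):
    enc = tokenizer(example["tokens"], is_split_into_words=True,
                    truncation=True, padding="max_length", max_length=32)
    # Align word-level tags to sub-word tokens; -100 tells the loss to ignore
    # padding and special tokens.
    enc["labels"] = [-100 if w is None else example["ner_tags"][w]
                     for w in enc.word_ids()]
    return enc

train_set = Dataset.from_list(samples).map(
    encode, remove_columns=["tokens", "ner_tags"])

Trainer(model=model,
        args=TrainingArguments(output_dir="field-extractor",
                               num_train_epochs=5,
                               per_device_train_batch_size=2),
        train_dataset=train_set).train()
```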

Founded in 2015, Instabase uses technology like optical character recognition and natural language processing to extract and decipher data that is far too often buried in formats that can be difficult for machines to understand. The platform provider, whose customers include companies in the financial services, medical, and insurance industries, hopes that by tapping into this unstructured data, it can help companies automate more of their business processes, inform key decisions, and further their own digital transformation.

With the addition of its new deep learning infrastructure, Instabase hopes to make this unstructured data analysis even faster and more impactful. Using the platform’s Machine Learning Studio, Instabase customers can annotate data points within documents and spin up and train a custom model, which can then be used by others within the organization. The new features also include a Model Catalog, which offers plug-and-play access to a library of deep learning models built by Instabase and other providers.

The platform’s new deep learning features will be publicly available in early 2022.


Deep tech, no-code tools will help future artists make better visual content

This article was contributed by Abigail Hunter-Syed, Partner at LDV Capital.

Despite the hype, the “creator economy” is not new. It has existed for generations, primarily dealing with physical goods (pottery, jewelry, paintings, books, photos, videos, etc.). Over the past two decades, it has become predominantly digital. The digitization of creation has sparked a massive shift in content creation, in which everyone and their mother is now creating, sharing, and participating online.

The vast majority of the content created and consumed on the internet is visual. In our recent Insights report at LDV Capital, we found that by 2027 there will be at least 100 times more visual content in the world. The future creator economy will be powered by visual tech tools that automate various aspects of content creation and remove the technical skill from digital creation. This article discusses the findings of that report.

Above: Group of superheroes on a dark background (Image Credit: ©LDV CAPITAL INSIGHTS 2021)

We now live as much online as we do in person and as such, we are participating in and generating more content than ever before. Whether it is text, photos, videos, stories, movies, livestreams, video games, or anything else that is viewed on our screens, it is visual content.

Currently, it takes time, often years, of prior training to produce a single piece of quality and contextually-relevant visual content. Typically, it has also required deep technical expertise in order to produce content at the speed and quantities required today. But new platforms and tools powered by visual technologies are changing the paradigm.

Computer vision will aid livestreaming

Livestreaming is video recorded and broadcast in real time over the internet, and it is one of the fastest-growing segments in online video, projected to be a $150 billion industry by 2027. Over 60% of individuals aged 18 to 34 watch livestreaming content daily, making it one of the most popular forms of online content.

Gaming is the most prominent livestreaming content today but shopping, cooking, and events are growing quickly and will continue on that trajectory.

The most successful streamers today spend 50 to 60 hours a week livestreaming, and many more hours on production. Visual tech tools that leverage computer vision, sentiment analysis, overlay technology, and more will aid livestream automation. They will enable streamers’ feeds to be analyzed in real time so production elements can be added automatically, improving quality and cutting back the time and technical skill required of streamers today.

Synthetic visual content will be ubiquitous

A lot of the visual content we view today is already computer-generated imagery (CGI), visual effects (VFX), or altered by software (e.g., Photoshop). Whether it’s the army of the dead in Game of Thrones or a resized image of Kim Kardashian in a magazine, we see content everywhere that has been digitally designed and altered by human artists. Now, computers and artificial intelligence can generate images and videos of people, things, and places that never physically existed.

By 2027, we will view more photorealistic synthetic images and videos than ones that document a real person or place. Some experts in our report even project synthetic visual content will be nearly 95% of the content we view. Synthetic media uses generative adversarial networks (GANs) to write text, make photos, create game scenarios, and more using simple prompts from humans such as “write me 100 words about a penguin on top of a volcano.” GANs are the next Photoshop.
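
To make the generator-versus-discriminator mechanic behind GANs concrete, here is a minimal PyTorch training step; production synthetic-media systems such as GauGAN are vastly larger and condition on text or sketches, which is omitted here.

```python
# A minimal GAN training loop in PyTorch, for illustration only. Both networks
# are tiny MLPs over flattened images; real systems use far larger,
# conditioned architectures.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):              # real_images: (batch, img_dim) in [-1, 1]
    batch = real_images.size(0)
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator: push real samples toward 1 and generated samples toward 0.
    d_loss = bce(D(real_images), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into scoring fakes as real.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```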

Above: Left, a rudimentary drawing; right, the landscape image built by NVIDIA’s GauGAN from the drawing (Image Credit: ©LDV CAPITAL INSIGHTS 2021)

In some circumstances, it will be faster, cheaper, and more inclusive to synthesize objects and people than to hire models, find locations and do a full photo or video shoot. Moreover, it will enable video to be programmable – as simple as making a slide deck.

Synthetic media that leverage GANs can also personalize content nearly instantly, enabling any video to address the viewer by name or a video game to be written in real time as a person plays. The gaming, marketing, and advertising industries are already experimenting with the first commercial applications of GANs and synthetic media.

Artificial intelligence will deliver motion capture to the masses

Animated video requires expertise, as well as even more time and budget, than content starring physical people. It typically refers to 2D and 3D cartoons, motion graphics, computer-generated imagery (CGI), and visual effects (VFX). Animated content will be an increasingly essential part of the content strategy for brands and businesses, deployed across image, video, and livestream channels as a mechanism for diversifying content.

Above: Graph displaying the motion capture landscape (Image Credit: ©LDV CAPITAL INSIGHTS 2021)

The greatest hurdle to generating animated content today is the skill – and the resulting time and budget – needed to create it. A traditional animator typically creates 4 seconds of content per workday. Motion capture (MoCap) is a tool often used by professional animators in film, TV, and gaming to digitally record the pattern of an individual’s movements for the purpose of animating them – recording Steph Curry’s jump shot for NBA 2K, for example.

Advances in photogrammetry, deep learning, and artificial intelligence (AI) are enabling camera-based MoCap – with little to no suits, sensors, or hardware. Facial motion capture has already come a long way, as evidenced in some of the incredible photo and video filters out there. As capabilities advance to full body capture, it will make MoCap easier, faster, budget-friendly, and more widely accessible for animated visual content creation for video production, virtual character live streaming, gaming, and more.
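
To illustrate what camera-based, hardware-free capture looks like in practice, the sketch below uses Google’s open source MediaPipe Pose library to pull body keypoints from a plain webcam feed; it is a generic pose-estimation example, not any vendor’s MoCap product.

```python
# Generic camera-based pose capture with Google's MediaPipe Pose (assumes the
# mediapipe and opencv-python packages); not any specific vendor's product.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture(0)                      # plain webcam, no suits or sensors

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        # Each landmark is a normalized (x, y, z, visibility) body keypoint
        # that an animation rig could consume as a MoCap signal.
        nose = results.pose_landmarks.landmark[mp.solutions.pose.PoseLandmark.NOSE]
        print(f"nose at ({nose.x:.2f}, {nose.y:.2f})")

cap.release()
```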

Nearly all content will be gamified

Gaming is a massive industry set to hit nearly $236 billion globally by 2027. That market will expand further as more and more content introduces gamification to encourage interactivity. Gamification applies typical elements of game playing, such as point scoring, interactivity, and competition, to encourage engagement.

Games with non-gamelike objectives and more diverse storylines are enabling gaming to appeal to wider audiences. Growth in the number of players, their diversity, and the hours spent playing online games will drive high demand for unique content.

AI and cloud infrastructure capabilities play a major role in aiding game developers to build tons of new content. GANs will gamify and personalize content, engaging more players and expanding interactions and community. Games as a Service (GaaS) will become a major business model for gaming. Game platforms are leading the growth of immersive online interactive spaces.

People will interact with many digital beings

We will have digital identities to produce, consume, and interact with content. In our physical lives, people have many aspects of their personality and represent themselves differently in different circumstances: the boardroom vs. the bar, in groups vs. alone, etc. Online, the old-school AOL screen names have already evolved into profile photos, memojis, avatars, gamertags, and more. Over the next five years, the average person will have at least three digital versions of themselves, both photorealistic and fantastical, to participate online.

Above: Five examples of digital identities (Image Credit: ©LDV CAPITAL INSIGHTS 2021)

Digital identities (or avatars) require visual tech. Some will enable public anonymity for the individual, some will be pseudonymous, and others will be directly tied to physical identity. A growing number of them will be powered by AI.

These autonomous virtual beings will have personalities, feelings, problem-solving capabilities, and more. Some of them will be programmed to look, sound, act and move like an actual physical person. They will be our assistants, co-workers, doctors, dates and so much more.

Interacting with both people-driven avatars and autonomous virtual beings in virtual worlds and with gamified content sets the stage for the rise of the Metaverse. The Metaverse could not exist without visual tech and visual content and I will elaborate on that in a future article.

Machine learning will curate, authenticate, and moderate content

For creators to continuously produce the volumes of content necessary to compete in the digital world, a variety of tools will be developed to automate the repackaging of content: long-form into short-form, videos into blog posts or vice versa, social posts, and more. These systems will select content and formats on their own, based on the performance of past publications, using automated analytics from computer vision, image recognition, sentiment analysis, and machine learning. They will also inform the next generation of content to be created.

In order to then filter through the massive amount of content most effectively, autonomous curation bots powered by smart algorithms will sift through and present to us content personalized to our interests and aspirations. Eventually, we’ll see personalized synthetic video content replacing text-heavy newsletters, media, and emails.

Additionally, the plethora of new content, including visual content, will require ways to authenticate it and attribute it to the creator both for rights management and management of deep fakes, fake news, and more. By 2027, most consumer phones will be able to authenticate content via applications.

Detecting disturbing and dangerous content is just as important, and it is increasingly hard to do given the vast quantities of content published. AI and computer vision algorithms are needed to automate the detection of hate speech, graphic pornography, and violent attacks because doing so manually in real time is too difficult and not cost-effective. Multi-modal moderation that includes image recognition, as well as voice and text recognition and more, will be required.
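
As a toy illustration of multi-modal moderation, the sketch below runs zero-shot text and image classifiers from the Hugging Face transformers library and flags a post when either modality looks risky; real moderation stacks use purpose-built, audited models plus human review, and the labels and threshold here are illustrative.

```python
# Toy multi-modal moderation sketch using zero-shot classifiers from the
# Hugging Face transformers library (default models are downloaded on first
# use). Labels and threshold are illustrative.
from transformers import pipeline

text_clf = pipeline("zero-shot-classification")          # default NLI model
image_clf = pipeline("zero-shot-image-classification")   # default CLIP model
LABELS = ["hate speech", "graphic violence", "benign"]

def flag_post(caption: str, image_path: str, threshold: float = 0.7) -> bool:
    """Return True if either the caption or the image needs human review."""
    text = text_clf(caption, candidate_labels=LABELS)
    image = image_clf(image_path, candidate_labels=LABELS)
    text_risk = max(score for label, score in zip(text["labels"], text["scores"])
                    if label != "benign")
    image_risk = max(r["score"] for r in image if r["label"] != "benign")
    return text_risk > threshold or image_risk > threshold

# flag_post("caption text", "post.jpg") -> True / False
```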

Visual content tools are the greatest opportunity in the creator economy

The next five years will see individual creators who leverage visual tech tools to create visual content rival professional production teams in the quality and quantity of the content they produce. The greatest business opportunities today in the Creator Economy are the visual tech platforms and tools that will enable those creators to focus on the content and not on the technical creation.

Abigail Hunter-Syed is a Partner at LDV Capital, investing in people building businesses powered by visual technology. She thrives on collaborating with deep, technical teams that leverage computer vision, machine learning, and AI to analyze visual data. She has a more than ten-year track record of leading strategy, operations, and investments in companies across four continents and rarely says no to soft-serve ice cream.


How Adobe uses deep learning to improve its products

As it does every year, Adobe’s Max 2021 event featured product reveals and other innovations from the world’s leading computer graphics software company.

Among the most interesting features of the event is Adobe’s continued integration of artificial intelligence into its products, an avenue the company has been exploring over the past few years.

Like many other companies, Adobe is leveraging deep learning to improve its applications and solidify its position in the video and image editing market. In turn, the use of AI is shaping Adobe’s product strategy.

AI-powered image and video editing

Sensei, Adobe’s AI platform, is now integrated into all the products of its Creative Cloud suite. Among the features revealed in this year’s conference is an auto-masking tool in Photoshop, which enables you to select an object simply by hovering your mouse over it. A similar feature automatically creates mask layers for all the objects it detects in a scene.

The auto-mask feature saves a lot of time, especially in images where objects have complex contours and colors and would be very difficult to select with classic tools.
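
Adobe has not published how auto-masking is implemented, but an off-the-shelf instance-segmentation model gives a feel for the underlying idea. The sketch below (assuming torchvision 0.13 or later and a hypothetical input file) runs a pretrained Mask R-CNN and keeps a binary mask per confidently detected object, roughly the kind of layer masks the feature exposes.

```python
# Illustrative only; Adobe has not disclosed its implementation. A pretrained
# Mask R-CNN from torchvision produces per-object masks similar in spirit to
# Photoshop's auto-generated mask layers. Assumes torchvision >= 0.13.
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = Image.open("photo.jpg").convert("RGB")   # hypothetical input file
with torch.no_grad():
    output = model([to_tensor(image)])[0]        # one result dict per input image

keep = output["scores"] > 0.7                    # confident detections only
masks = output["masks"][keep] > 0.5              # binarize soft masks into layers
print(f"{masks.shape[0]} object masks found")
```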

Adobe has also improved Neural Filters, a feature it added to Photoshop last year. Neural Filters use machine learning to add enhancements to images. Many of the filters are applicable to portraits and images of people. For example, you can apply skin smoothing, transfer makeup from a source image to a target image, or change the expression of a subject in a photo.

Other Neural Filters make more general changes, such as colorizing black-and-white images or changing the background landscape.

The Max conference also unveiled some preview and upcoming technologies. For example, a new feature for Adobe’s photo collection product called “in-between” takes two or more photos captured within a short interval of each other and creates a video by automatically generating the frames that fall in between them.

Another feature being developed is “on point,” which helps you search Adobe’s huge library of stock images by providing a reference pose. For example, if you provide it with a photo of a person sitting and reaching out their hand, the machine learning models will detect the pose of the person and find other photos where people are in similar positions.

AI features have been added to Lightroom, Premiere, and other Adobe products as well.

The challenges of delivering AI products

When you look at Adobe’s AI features individually, none of them are groundbreaking. While Adobe did not provide any architectural or implementation details in the event, anyone who has been following AI research can immediately relate each of the features presented at Max to one or more papers and presentations made at machine learning and computer vision conferences in the past few years. Auto-masking uses object detection and segmentation with deep learning, an area of research that has seen tremendous progress recently.

Style transfer with neural networks is a technique that is at least four years old. And generative adversarial networks (GAN), which power several of the image generation features, have been around for more than seven years. In fact, a lot of the technologies Adobe is using are open source and freely available.

The real genius behind Adobe’s AI is not the superior technology, but the company’s strategy for delivering the products to its customers.

A successful product needs to have a differentiating value that convinces users to start using it or switch from their old solutions to the new application.

The benefits of applying deep learning to different image processing applications are very clear: improved productivity and lower costs. The assistance provided by deep learning models can help lower the barrier to artistic creativity for people who don’t have the skills and experience of expert graphic designers. In the case of auto-masking and neural filters, the tools make it possible even for experienced users to solve their problems faster and better. Some of the new features, such as “in-between,” address problems that had not been solved by other applications.

But beyond superior features, a successful product needs to be delivered to its target audience in a way that is frictionless and cost-effective. For example, say you develop a state-of-the-art deep learning–powered neural filter application and want to sell it on the market. Your target users are graphic designers who are already using a photo-editing tool such as Photoshop. If they want to apply your neural filter, they’ll have to constantly port their images between Photoshop and your application, which causes too much friction and degrades the user experience.

You’ll also have to deal with the costs of deep learning. Many user devices don’t have the memory and processing capacity to run neural networks and require cloud-based processing. Therefore, you’ll have to set up servers and web APIs to serve the deep learning models, and you also have to make sure your service will remain online and available as the usage scales. You only recoup such costs when you reach a large number of paying users.
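
For a sense of what “serving the model behind a web API” means in practice, here is a minimal FastAPI sketch in which an off-the-shelf classifier stands in for a neural filter; a production service would add batching, authentication, GPU scheduling, and autoscaling.

```python
# Minimal model-serving sketch (assumes fastapi, uvicorn, torch, torchvision,
# and Pillow). An off-the-shelf classifier stands in for a real neural filter.
import io

import torch
import torchvision
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from torchvision.transforms.functional import to_tensor

app = FastAPI()
model = torchvision.models.resnet18(weights="DEFAULT").eval()

@app.post("/filter")
async def apply_filter(file: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    with torch.no_grad():
        logits = model(to_tensor(image).unsqueeze(0))
    # A real filter would return a transformed image; a class id keeps it short.
    return {"top_class": int(logits.argmax())}

# Run with: uvicorn serve:app --host 0.0.0.0 --port 8000
```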

You’ll also have to figure out how to monetize your product in a way that covers your costs while also keeping users interested in using it. Will your product be an ads-based free product, a freemium model, a one-time payment, or a subscription service? Most clients prefer to avoid working with several software vendors that have different payment models.

And you’ll need an outreach strategy to make your product visible to its intended market. Will you run ads on social media, make direct sales and reach out to design companies, or use content marketing? Many products fail not because they don’t solve a core problem but because they can’t reach out to the right market and deliver their product in a cost-efficient manner.

And finally, you’ll need a roadmap to continuously iterate and improve your product. For example, if you’re using machine learning to enhance images, you’ll need a workflow to constantly gather new data, find out where your models are failing, and finetune them to improve their performance.

Adobe’s AI strategy

Adobe already has a very large share of the graphics software market. Millions of people use Adobe’s applications every day, so the company has no problem reaching its intended market. Whenever it has a new deep learning tool, it can immediately use the vast reach of Photoshop, Premiere, and the other applications in its Creative Cloud suite to make it visible and available to users. Users don’t need to pay for or install any new applications; they just need to download the new plugins into their applications.

The company’s gradual transition to the cloud in the past few years has also paved the way for a seamless integration of deep learning into its applications. Most of Adobe’s AI features run in the cloud. To its users, the experience of the cloud-based features is no different than using filters and tools that are directly running on their own devices. Meanwhile, the scale of Adobe’s cloud makes it possible for the company to run deep learning inference in a very cost-effective way, which is why most new AI features are made available for free to users who already have a Creative Cloud subscription.

Finally, the cloud-based deep learning model provides Adobe with the opportunity to run a very efficient AI factory. As Adobe’s cloud serves deep learning models to its users, it will also gather data to improve the performance of its AI features in the future. For example, the company acknowledged at the Max conference that the auto-masking feature does not work for all objects yet but will improve over time. The continued iteration will in turn enable Adobe to enhance its AI capabilities and strengthen its position in the market. The AI in turn will shape the products Adobe will roll out in the future.

Running applied machine learning projects is very difficult, which is mostly why companies fail in bringing them to fruition. Adobe is an interesting case study of how bringing together the right elements can turn advances in AI into profitable business applications.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021


New deep reinforcement learning technique helps AI to evolve

Hundreds of millions of years of evolution have produced a variety of life-forms, each intelligent in its own fashion. Each species has evolved to develop innate skills, learning capacities, and a physical form that ensures survival in its environment.

But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and fusing them together after the development process. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found in even the simplest life-forms.

In a new paper published in the scientific journal Nature, AI researchers at Stanford University present a new technique that can help take steps toward overcoming some of these limits. Called “deep evolutionary reinforcement learning,” or DERL, the new technique uses a complex virtual environment and reinforcement learning to create virtual agents that can evolve both in their physical structure and learning capacities. The findings can have important implications for the future of AI and robotics research.

Evolution is hard to simulate

In nature, the body and brain evolve together. Across many generations, every animal species has gone through countless cycles of mutation to grow limbs, organs, and a nervous system to support the functions it needs in its environment. Mosquitoes are equipped with thermal vision to spot body heat. Bats have wings to fly and an echolocation apparatus to navigate dark spaces. Sea turtles have flippers to swim with and a magnetic field detector system to travel very long distances. Humans have an upright posture that frees their arms and lets them see the far horizon, hands and nimble fingers that can manipulate objects, and a brain that makes them the best social creatures and problem solvers on the planet.

Interestingly, all these species descended from the first life-form that appeared on Earth several billion years ago. Based on the selection pressures caused by the environment, the descendants of those first living beings evolved in many directions.

Studying the evolution of life and intelligence is interesting, but replicating it is extremely difficult. An AI system that wanted to recreate intelligent life in the same way evolution did would have to search a very large space of possible morphologies, which is extremely expensive computationally. It would need a lot of parallel and sequential trial-and-error cycles.

AI researchers use several shortcuts and predesigned features to overcome some of these challenges. For example, they fix the architecture or physical design of an AI or robotic system and focus on optimizing the learnable parameters. Another shortcut is the use of Lamarckian rather than Darwinian evolution, in which AI agents pass on their learned parameters to their descendants. Yet another approach is to train different AI subsystems separately (vision, locomotion, language, etc.) and then stitch them together in a final AI or robotic system. While these approaches speed up the process and reduce the costs of training and evolving AI agents, they also limit the flexibility and variety of results that can be achieved.

Deep evolutionary reinforcement learning

In their new work, the researchers at Stanford aim to bring AI research a step closer to the real evolutionary process while keeping the costs as low as possible. “Our goal is to elucidate some principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control,” they wrote in their paper.

Within the DERL framework, each agent uses deep reinforcement learning to acquire the skills required to maximize its goals during its lifetime. DERL uses Darwinian evolution to search the morphological space for optimal solutions, which means that when a new generation of AI agents is spawned, they inherit only the physical and architectural traits of their parents (along with slight mutations). None of the learned parameters are passed on across generations.

“DERL opens the door to performing large-scale in silico experiments to yield scientific insights into how learning and evolution cooperatively create sophisticated relationships between environmental complexity, morphological intelligence, and the learnability of control tasks,” the researchers wrote.

Simulating evolution

For their framework, the researchers used MuJoCo, a virtual environment that provides highly accurate rigid-body physics simulation. Their design space is called Universal Animal (Unimal), in which the goal is to create morphologies that learn locomotion and object-manipulation tasks in a variety of terrains.

Each agent in the environment is composed of a genotype that defines its limbs and joints. The direct descendant of each agent inherits the parent’s genotype and goes through mutations that can create new limbs, remove existing limbs, or make small modifications to characteristics, such as the degrees of freedom or the size of limbs.

Each agent is trained with reinforcement learning to maximize rewards in various environments. The most basic task is locomotion, in which the agent is rewarded for the distance it travels during an episode. Agents whose physical structures are better suited for traversing terrain learn faster to use their limbs for moving around.
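
The evolutionary loop described above can be summarized schematically: morphologies are mutated and selected Darwinian-style, while every agent’s controller is trained from scratch and its learned weights are never inherited. The Python sketch below is a schematic of that idea with placeholder mutation and training stubs, not the authors’ code.

```python
# A schematic of the DERL idea described above (not the authors' code):
# body genotypes evolve Darwinian-style, while each agent's controller is
# trained from scratch with RL and its learned weights are never inherited.
import copy
import random

def mutate(genotype):
    """Placeholder mutation: add/remove a limb or tweak joint parameters."""
    child = copy.deepcopy(genotype)
    child["history"].append(random.choice(["add_limb", "remove_limb", "resize"]))
    return child

def train_with_rl(genotype):
    """Train a fresh controller for this body and return its fitness.

    A real implementation would build the MuJoCo body from the genotype and
    run an RL algorithm such as PPO; a random stub stands in for that here,
    with fitness meaning something like distance traveled per episode.
    """
    return random.random()

population = [{"limbs": 4, "history": []} for _ in range(16)]
for generation in range(10):
    # Evaluate: fitness comes only from lifetime learning, never from
    # inherited weights.
    scored = sorted(((train_with_rl(g), g) for g in population),
                    key=lambda pair: pair[0], reverse=True)
    parents = [g for _, g in scored[: len(scored) // 2]]
    # Reproduce: children inherit morphology (with mutations) and nothing else.
    population = parents + [mutate(random.choice(parents)) for _ in parents]
```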

To test the system’s results, the researchers generated agents in three types of terrains: flat (FT), variable (VT), and variable terrains with modifiable objects (MVT). The flat terrain puts the least selection pressure on the agents’ morphology. The variable terrains, on the other hand, force the agents to develop a more versatile physical structure that can climb slopes and move around obstacles. The MVT variant has the added challenge of requiring the agents to manipulate objects to achieve their goals.

The benefits of DERL

Above: Deep evolutionary reinforcement learning generates a variety of successful morphologies across different environments (Image Credit: TechTalks)

One of the interesting findings of DERL is the diversity of the results. Other approaches to evolutionary AI tend to converge on one solution because new agents directly inherit the physique and learnings of their parents. But in DERL, only morphological data is passed on to descendants; the system ends up creating a diverse set of successful morphologies, including bipeds, tripeds, and quadrupeds with and without arms.

At the same time, the system shows traits of the Baldwin effect, which suggests that agents that learn faster are more likely to reproduce and pass on their genes to the next generation. DERL shows that evolution “selects for faster learners without any direct selection pressure for doing so,” according to the Stanford paper.

“Intriguingly, the existence of this morphological Baldwin effect could be exploited in future studies to create embodied agents with lower sample complexity and higher generalization capacity,” the researchers wrote.

Finally, the DERL framework also validates the hypothesis that more complex environments will give rise to more intelligent agents. The researchers tested the evolved agents across eight different tasks, including patrolling, escaping, manipulating objects, and exploration. Their findings show that in general, agents that have evolved in variable terrains learn faster and perform better than AI agents that have only experienced flat terrain.

Their findings seem to be in line with another hypothesis by DeepMind researchers that a complex environment, a suitable reward structure, and reinforcement learning can eventually lead to the emergence of all kinds of intelligent behaviors.

AI and robotics research

The DERL environment only has a fraction of the complexities of the real world. “Although DERL enables us to take a significant step forward in scaling the complexity of evolutionary environments, an important line of future work will involve designing more open-ended, physically realistic, and multiagent evolutionary environments,” the researchers wrote.

In the future, the researchers plan to expand the range of evaluation tasks to better assess how the agents can enhance their ability to learn human-relevant behaviors.

The work could have important implications for the future of AI and robotics and push researchers to use exploration methods that are much more similar to natural evolution.

“We hope our work encourages further large-scale explorations of learning and evolution in other contexts to yield new scientific insights into the emergence of rapidly learnable intelligent behaviors, as well as new engineering advances in our ability to instantiate them in machines,” the researchers wrote.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021


Deep North, which uses AI to track people from camera footage, raises $16.7M

Deep North, a Foster City, California-based startup applying computer vision to security camera footage, today announced that it raised $16.7 million in a series A-1 round. Led by Celesta Capital and Yobi Partners, with participation from Conviction Investment Partners, Deep North plans to use the funds to make hires and expand its services “at scale,” according to CEO Rohan Sanil.

Deep North, previously known as Vmaxx, claims its platform can help brick-and-mortar retailers “embrace digital” and protect against COVID-19 by retrofitting security systems to track purchases and ensure compliance with masking rules. But the company’s system, which relies on algorithms with potential flaws, raises concerns about both privacy and bias.

“Even before a global pandemic forced retailers to close their doors … businesses were struggling to compete with a rapidly growing online consumer base,” Sanil said in a statement. “As stores open again, retailers must embrace creative digital solutions with data driven, outcome-based computer vision and AI solutions, to better compete with online retailers and, at the same time, accommodate COVID-safe practices.”

AI-powered monitoring

Deep North was founded in 2016 by Sanil and Jinjun Wang, an expert in multimedia signal processing, pattern recognition, computer vision, and analytics. Wang — now a professor at Xi’an Jiaotong University in Xi’an, China — was previously a research scientist at NEC before joining Epson’s R&D division as a member of the senior technical staff. Sanil founded a number of companies prior to Deep North, including Akirra Media Systems, where Wang was once employed as a research scientist.

“In 2016, I pioneered object detection technology to help drive targeted advertising from online videos. When a major brand saw this, they challenged me to create a means of identifying, analyzing, and sorting objects captured on their security video cameras in their theme parks,” Sanil told VentureBeat via email. “My exploration inspired development that would unlock the potential of installed CCTV and security video cameras within the customer’s physical environment and apply object detection and analysis in any form of video.”

After opening offices in China and Sweden and rebranding in 2018, Deep North expanded the availability of its computer vision and video analytics products, which offer object and people detection capabilities. The company says its real-time, AI-powered and hardware-agnostic software can understand customers’ preferences, actions, interactions, and reactions “in virtually any physical setting” across “a variety of markets,” including retailers, grocers, airports, drive-thrus, shopping malls, restaurants, and events.

Deep North says that retailers, malls, and restaurants in particular can use its solution to analyze customer “hotspots,” seating, occupancy, dwell times, gaze direction, and wait times, leveraging these insights to figure out where to assign store associates or kitchen staff. Stores can predict conversion by correlating tracking data with the time of day, location, marketing events, weather, and more, while shopping centers can draw on tenant statistics to understand trends and identify “synergies” between tenants, optimizing for store placement and cross-tenant promotions.
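
As a hedged illustration of how metrics like dwell time and occupancy can be computed once a detector and tracker emit per-person records, here is a small sketch; it is not Deep North’s code, and the (track_id, timestamp, zone) record format is an assumption.

```python
# Hedged sketch (not Deep North's code) of deriving dwell time and occupancy
# from tracked detections. Assumes an upstream detector + tracker emits
# (track_id, timestamp_seconds, zone) records.
from collections import defaultdict

def dwell_times(detections):
    """Total seconds spent in each zone, summed over all tracked people."""
    first_seen, last_seen = {}, {}
    for track_id, ts, zone in detections:
        key = (track_id, zone)
        first_seen.setdefault(key, ts)
        last_seen[key] = ts
    totals = defaultdict(float)
    for (track_id, zone), start in first_seen.items():
        totals[zone] += last_seen[(track_id, zone)] - start
    return dict(totals)

def occupancy(detections, now, window=5.0):
    """Distinct tracks seen in each zone within the last `window` seconds."""
    recent = defaultdict(set)
    for track_id, ts, zone in detections:
        if now - ts <= window:
            recent[zone].add(track_id)
    return {zone: len(ids) for zone, ids in recent.items()}

# Example: dwell_times([(1, 0.0, "entrance"), (1, 12.5, "entrance")])
# -> {"entrance": 12.5}
```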


“Our algorithms are trained to detect objects in motion and generate rich metadata about physical environments such as engagement, pathing, and dwelling. Our inference pipeline brings together camera feeds and algorithms for real-time processing,” Deep North explains on its website. “[We] can deploy both via cloud and on-premise and go live within a matter of hours. Our scalable GPU edge appliance enables businesses to bring data processing directly to their environments and convert their property into a digital AI property. Video assets never leave the premise, ensuring the highest level of security and privacy.”

Beyond these solutions, Deep North developed products for particular use cases like social distancing and sanitation. The company offers products that monitor for hand-washing and estimate wait times at airport check-in counters, for example, as well as detect the presence of masks and track the status of maintenance workers on tarmacs.

“With Deep North’s mask detection capability, retailers can easily monitor large crowds and receive real-time alerts,” Deep North explains about its social distancing products. “In addition, Deep North … monitors schedules and coverage of sanitization measures as well as the total time taken for each cleaning activity … Using Deep North’s extensive data, [malls can] create tenant compliance scorecards to benchmark efforts, track overall progress, course-correct as necessary. [They] can also ensure occupancy limits are adhered to across several properties, both locally and region-wide, by monitoring real-time occupancy on our dashboard and mobile apps.”

Bias concerns

Like most computer vision systems, Deep North’s were trained on datasets of images and videos showing examples of people, places, and things. Poor representation within these datasets can result in harm — particularly given that the AI field generally lacks clear descriptions of bias.

Previous research has found that ImageNet and Open Images — two large, publicly available image datasets — are U.S.- and Euro-centric, encoding humanlike biases about race, ethnicity, gender, weight, and more. Models trained on these datasets perform worse on images from Global South countries. For example, images of grooms are classified with lower accuracy when they come from Ethiopia and Pakistan, compared to images of grooms from the United States. And because of how images of words like “wedding” or “spices” are presented in distinctly different cultures, object recognition systems can fail to classify many of these objects when they come from the Global South.

Bias can arise from other sources, like differences in the sun path between the northern and southern hemispheres and variations in background scenery. Studies show that even differences between camera models — e.g., resolution and aspect ratio — can cause an algorithm to be less effective in classifying the objects it was trained to detect.

Tech companies have historically deployed flawed models into production. ST Technologies’ facial recognition and weapon-detecting platform was found to misidentify black children at a higher rate and frequently mistook broom handles for guns. Meanwhile, Walmart’s AI- and camera-based anti-shoplifting technology, which is provided by Everseen, came under scrutiny last May over its reportedly poor detection rates.

Deep North doesn’t disclose on its website how it trained its computer vision algorithms, including whether it used synthetic data (which has its own flaws) to supplement real-world datasets. The company also declines to say to what extent it takes into account accessibility and users with major mobility issues.

In an email, Sanil claimed that Deep North “has one of the largest training datasets in the world,” derived from real-world deployments and scenarios. “Our human object detection and analysis algorithms have been trained with more than 130 million detections, thousands of camera feeds, and various environmental conditions while providing accurate insights for our customers,” he said. “Our automated and semi-supervised training methodology helps us build new machine learning models rapidly, with the least amount of training data and human intervention.”

In a follow-up email, Sanil added: “Our platform detects humans, including those with unique gaits, and those that use mobility aids and assistive devices. We don’t do any biometric analysis, and therefore there is no resulting bias in our system … In the simplest terms, the platform interprets everything as an object whether it’s a human or a shopping cart or a vehicle. We provide object counts entering or exiting a location. Our object counting and reporting is not influenced by specific characteristics.” He continued: “We have a large set of labeled data. For new data to be labeled, we need to classify some of the unlabeled data using the labeled information set. With the semi-supervised process we can now expedite the labeling process for new datasets. This saves time and cost for us. We don’t need annotators, or expensive and slow processes.”
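
The semi-supervised labeling Sanil describes is, generically, pseudo-labeling: train on the labeled set, then adopt the model’s confident predictions on unlabeled data as new labels. The scikit-learn sketch below illustrates only that generic idea; Deep North has not published its pipeline.

```python
# Generic pseudo-labeling sketch (illustrative; not Deep North's pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label(X_labeled, y_labeled, X_unlabeled, threshold=0.95):
    """Adopt confident model predictions on unlabeled data as new labels."""
    clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    probs = clf.predict_proba(X_unlabeled)
    confident = probs.max(axis=1) >= threshold
    new_y = clf.classes_[probs[confident].argmax(axis=1)]
    # Fold the confident pseudo-labels back into the training set and iterate.
    return (np.vstack([X_labeled, X_unlabeled[confident]]),
            np.concatenate([y_labeled, new_y]))
```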

Privacy and controversy

While the purported goals of products like Deep North’s are health, safety, and analytics, the technology could be co-opted for other, less humanitarian intents. Many privacy experts worry that such systems will normalize greater levels of surveillance, capturing data about workers’ movements and allowing managers to chastise employees in the name of productivity.

Deep North is no stranger to controversy, having reportedly worked with school districts and universities in Texas, Florida, Massachusetts, and California to pilot a security system that uses AI and cameras to detect threats. Deep North claims that the system, which it has since discontinued, worked with cameras with resolutions as low as 320p and could interpret people’s behavior while identifying objects like unattended bags and potential weapons.

Deep North is also testing systems in partnership with the U.S. Transportation Security Administration, which furnished it with a grant last March. The company received close to $200,000 in funding to provide metrics like passenger throughput, social distancing compliance, agent interactions, and bottleneck zones as well as reporting of unattended baggage, movement in the wrong direction, or occupying restricted areas.

“We are humbled and excited to be able to apply our innovations to help TSA realize its vision of improving passenger experience and safety throughout the airport,” Sanil said in a statement. “We are committed to providing the U.S. Department of Homeland Security and other government entities with the best AI technologies to build a safer and better homeland through continued investment and innovation.”

Deep North admitted in an interview with Swedish publication Breakit that it offers facial characterization services to some customers to estimate age range. And on its website, the startup touts its technologies’ ability to personalize marketing materials depending on a person’s demographics, like gender. But Deep North is adamant that its internal protections prevent it from ascertaining the identity of any person captured via on-camera footage.

“We have no capability to link the metadata to any single individual. Further, Deep North does not capture personally identifiable information (PII) and was developed to govern and preserve the integrity of each and every individual by the highest possible standards of anonymization,” Sanil told TechCrunch in March 2020. “Deep North does not retain any PII whatsoever, and only stores derived metadata that produces metrics such as number of entries, number of exits, etc. Deep North strives to stay compliant with all existing privacy policies including GDPR and the California Consumer Privacy Act.”

To date, 47-employee Deep North has raised $42.3 million in venture capital.


Spell unveils deep learning operations platform to cut AI training costs

Spell today unveiled an operations platform that provides the tooling needed to train AI models based on deep learning algorithms.

The platforms currently employed to train AI models are optimized for machine learning algorithms. AI models based on deep learning algorithms require their own deep learning operations (DLOps) platform, Spell head of marketing Tim Negris told VentureBeat.

The Spell platform automates the entire deep learning workflow using tools the company developed in the course of helping organizations build and train AI models for computer vision and speech recognition applications that require deep learning algorithms.

Deep roots

Deep learning algorithms trace their lineage back to neural networks, a branch of machine learning that structures algorithms in layers to create networks that can learn and make intelligent decisions on their own. The artifacts and models created using deep learning algorithms, however, don’t lend themselves to the same platforms used to manage machine learning operations (MLOps), Negris said.

An AI model based on deep learning algorithms can require tracking and managing hundreds of experiments with thousands of parameters spanning large numbers of graphical processor units (GPUs), Negris noted. The Spell platform specifically addresses the need to manage, automate, orchestrate, document, optimize, deploy, and monitor deep learning models throughout their entire lifecycle, he said. “Data science teams need to be able to explain and reproduce deep learning results,” Negris added.

While most existing MLOps platforms are not well suited to managing deep learning algorithms, Negris said the Spell platform can also be employed to manage AI models based on machine learning algorithms. Spell does not provide any tools to manage the lifecycle of those models, but data science teams can add their own third-party framework for managing them to the Spell platform.

The Spell platform also reduces cost by automatically invoking spot instances that cloud service providers make available for a finite amount of time whenever feasible, Negris said. That capability can reduce the total cost of training an AI model by as much as 66%, he added. That’s significant because the cost of training AI models based on deep learning algorithms can in some cases reach millions of dollars.

A hybrid approach

In time, most AI applications will be constructed using a mix of machine and deep learning algorithms. In fact, as the building of AI models using machine learning algorithms becomes more automated, many data science teams will spend more of their time constructing increasingly complex AI models based on deep learning algorithms. The cost of building AI models based on deep learning algorithms should also steadily decline as GPUs deployed in an on-premises IT environment or accessed via a cloud service become more affordable.

In the meantime, Negris said that while the workflows for building AI models will converge, it’s unlikely traditional approaches to managing application development processes based on DevOps platforms will be extended to incorporate AI models. The continuous retraining of AI models that are subject to drift does not lend itself to the more linear processes that are employed today to build and deploy traditional applications, he said.

Nevertheless, all the AI models being trained eventually need to find their way into an application deployed in a production environment. The challenge many organizations face today is aligning the rate at which AI models are developed with the faster pace at which applications are now deployed and updated.

One way or another, it’s only a matter of time before every application — to varying degrees — incorporates one or more AI models. The issue going forward is finding a way to reduce the level of friction that occurs whenever an AI model needs to be deployed within an application.


‘Skin Deep’ is a stinky sci-fi shooter from indie icon Brendon Chung

Brendon Chung knows what people expect out of a first-person shooter. Guns? Check. Strafing? Yep. Ammo drops in strategic yet predictable locations? You betcha.

A sneezing system? Uh, sure. Noxious green clouds that follow you when you’re smelly, giving away your location? Um. Actually, yes.

https://www.youtube.com/watch?v=CeyXnAn-7wY

Skin Deep is the latest project out of Chung’s studio, Blendo Games, and it’s his first-ever FPS title. He’s known for developing clever first-person action and puzzle games including Gravity Bone, Thirty Flights of Loving and Quadrilateral Cowboy, and visually, Skin Deep fits perfectly into his repertoire. The only difference is the gun.

“I’d never done one where you just have a gun and you straight-up shoot people,” Chung said. “I thought, you know what? This is something that I love. This is a game genre that has been so important to me for a long time… This is kind of my attempt at making a bunch of little things that I like in first-person shooter games, and putting them into a game that I think will be funny.”


Chung started coding back in elementary school, when he would spend hours between classes customizing levels in FPS classics Doom and Quake, and he continued modding as titles like Half-Life, Quake 2 and Doom 3 hit the scene. He got a job at a mainstream studio in Los Angeles, but continued working on his own projects and eventually went fully independent, picking up a handful of accolades in the process.

Despite a deep personal connection to the FPS genre, Chung hasn’t released a shooter of his own — but that’s going to change when Skin Deep hits Steam. The actual release date is still up in the air, a fact that may be concerning for anyone who remembers waiting for Quadrilateral Cowboy, a game that was “six months away” for well over three years. (On the Skin Deep FAQ page, one of the Qs reads, “Is Skin Deep going to take 4+ years of development time like your previous game Quadrilateral Cowboy?” and the accompanying answer is, “I hope not.”)

Regardless of a release date, today publisher Annapurna Interactive showed off a new trailer for Skin Deep. A new, extra-smelly trailer.

Skin Deep is a non-linear espionage shooter set on a spaceship and played from the perspective of an armed, cryogenically frozen insurance agent whose job is to protect the vessel from invading space pirates. The game looks lighthearted yet sophisticated, in classic Blendo fashion; it involves shooting, sneaking and solving puzzles, and all of it is animated in Chung’s signature cubist style. This ties back to FPS history, too — Skin Deep and most of Blendo Games’ titles are built on a modified port of the Doom 3 engine, idTech4.


AI-powered deep neural nets increase accuracy for credit score predictions

Credit Karma has more than 110 million users and a customer approval rate of 90%, but that wasn’t always the case. When the company launched 14 years ago, its approval percentages were in the single digits, chief technical officer Ryan Graciano said during VentureBeat’s virtual Transform 2021 conference last week.

The reason for this turnaround? Big data and machine learning.

When Credit Karma launched in 2007, the company relied on traditional datacenters because the cloud wasn’t yet part of the conversation. There would have been trouble with banking partners and credit bureaus, and “compliance people wouldn’t even let you in the door,” Graciano said.

The company got very proficient at hardware procurement and systems management but realized the physical hardware was limiting.

“The thing about big data and cloud is that big data moves really quickly, [and] the technologies change very rapidly,” Graciano said. “If you’re needing to do a six-to-nine-month hardware procurement cycle [and] a significant platform change, you’re going to be pretty far behind the curve.”

That was the first issue Credit Karma sought to resolve — the company needed more elasticity. It wasn’t just the time required to set up the hardware, but the fact that the hardware requirements were changing rapidly to keep up with new capabilities and the technology stack couldn’t keep up.

Credit Karma wound up picking Google Cloud and its machine learning offerings because BigQuery and TensorFlow made it easier to handle big data.

The machine learning evolution

The machine learning attempts were initially very straightforward. The company applied simple linear regression models to the anonymized data from its databases. Later, Credit Karma moved on to using gradient boosted trees. Nowadays, the company relies on wide and deep neural nets to predict which banks will approve customers, and at what rates. This technique powers about 80% of Credit Karma’s modeling and helps facilitate Darwin, an internal system for experimentation and problem-solving.
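
The article names TensorFlow and “wide and deep” networks; the Keras sketch below shows what that architecture generally looks like, with a linear “wide” path for memorizing feature crosses and a “deep” path for generalization. The feature shapes and names are illustrative, not Credit Karma’s actual model.

```python
# A generic wide & deep model in Keras (TensorFlow 2.x). Feature names and
# dimensions are illustrative; this is not Credit Karma's production model.
import tensorflow as tf

wide_in = tf.keras.Input(shape=(20,), name="crossed_features")  # memorization path
deep_in = tf.keras.Input(shape=(50,), name="dense_features")    # generalization path

deep = tf.keras.layers.Dense(128, activation="relu")(deep_in)
deep = tf.keras.layers.Dense(64, activation="relu")(deep)

merged = tf.keras.layers.concatenate([wide_in, deep])
approval_prob = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=[wide_in, deep_in], outputs=approval_prob)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
# model.fit([wide_features, deep_features], approved_labels, epochs=5)
```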

The platform Credit Karma built is reusable, Graciano said. A recommendation engine sits on top of the machine learning platform, and everything else connects to it. Anything that happens with Credit Karma comes from this system, whether it’s an email from Credit Karma, a push notification, or badges on the site.

“All of those things are powered by this one single system. And so that gave us the ability to spend a lot of time on the nuts and bolts of how our data scientists would work in the system,” Graciano said.

It is far easier to add new data sources and clean up the data than it is to define new algorithms. One way to improve the system is to add orthogonal data, rather than innovating on the algorithm, Graciano said. The company’s prediction capabilities expanded as more data sources were added.

“Getting those additional elements is actually a lot more powerful than the 32nd iteration on our algorithm can ever be,” Graciano said.

Graciano acknowledged it took a while to figure out what Credit Karma needed, such as a platform that let data scientists automate model retraining.

“I would say we stumbled through many, many issues,” Graciano said.

Cloud was the way forward

Graciano recommends businesses move toward the cloud because it increases interoperability with the external ecosystem.

“If you’re looking for uplift, you’ll usually get more uplift by adding orthogonal data than you will by innovating on your algorithm,” he said. For Credit Karma, this was a strategic decision that paid off over the life of the platform, allowing the company to amass useful data and put it to work.

“Nothing is more strategic to us than data, and having a lot of power over our data,” Graciano said. Many businesses are likely going to make this move for the very same reasons, shifting from a deterministic way of developing software to a more experimental framework.


Repost: Original Source and Author Link

Categories
AI

The future of deep learning, according to its pioneers

Deep neural networks will move past their shortcomings without help from symbolic artificial intelligence, three pioneers of deep learning argue in a paper published in the July issue of the Communications of the ACM journal.

In their paper, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun, recipients of the 2018 Turing Award, explain the current challenges of deep learning and how it differs from learning in humans and animals. They also explore recent advances in the field that might provide blueprints for future research directions in deep learning.

Titled “Deep Learning for AI,” the paper envisions a future in which deep learning models can learn with little or no help from humans, are flexible to changes in their environment, and can solve a wide range of reflexive and cognitive problems.

The challenges of deep learning

Above: Deep learning pioneers Yoshua Bengio (left), Geoffrey Hinton (center), and Yann LeCun (right).

Deep learning is often compared to the brains of humans and animals. However, recent years have shown that artificial neural networks, the main component of deep learning models, lack the efficiency, flexibility, and versatility of their biological counterparts.

In their paper, Bengio, Hinton, and LeCun acknowledge these shortcomings. “Supervised learning, while successful in a wide variety of tasks, typically requires a large amount of human-labeled data. Similarly, when reinforcement learning is based only on rewards, it requires a very large number of interactions,” they write.

Supervised learning is a popular subset of machine learning algorithms, in which a model is presented with labeled examples, such as a list of images and their corresponding content. The model is trained to find recurring patterns in examples that have similar labels. It then uses the learned patterns to associate new examples with the right labels. Supervised learning is especially useful for problems where labeled examples are abundantly available.
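
As a deliberately simple illustration of that setup (my own, not the paper’s), the following snippet trains a basic classifier on labeled digit images and then scores it on unseen examples:

    # Supervised learning in miniature: learn patterns from labeled examples,
    # then apply them to unseen data. A linear classifier stands in for a deep model.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)                 # digit images and their labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=2000)
    clf.fit(X_train, y_train)                           # find recurring patterns per label
    print("accuracy on unseen examples:", clf.score(X_test, y_test))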

Reinforcement learning is another branch of machine learning, in which an “agent” learns to maximize “rewards” in an environment. An environment can be as simple as a tic-tac-toe board in which an AI player is rewarded for lining up three Xs or Os, or as complex as an urban setting in which a self-driving car is rewarded for avoiding collisions, obeying traffic rules, and reaching its destination. The agent starts by taking random actions. As it receives feedback from its environment, it finds sequences of actions that provide better rewards.
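
To make the reward-driven loop concrete, here is a toy Q-learning agent, again my own illustration rather than anything from the paper. Starting from random moves, it learns purely from rewards to walk right along a five-cell corridor to reach the goal:

    # Toy reinforcement learning: the agent explores, receives a reward only at the
    # goal cell, and gradually learns that "move right" is the best action everywhere.
    import random

    N_STATES, ACTIONS = 5, [-1, +1]          # move left or right
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, epsilon = 0.5, 0.9, 0.1

    for episode in range(500):
        s = 0
        while s != N_STATES - 1:             # episode ends at the goal cell
            # epsilon-greedy: mostly exploit learned values, sometimes explore
            a = (random.choice(ACTIONS) if random.random() < epsilon
                 else max(ACTIONS, key=lambda act: Q[(s, act)]))
            s_next = min(max(s + a, 0), N_STATES - 1)
            reward = 1.0 if s_next == N_STATES - 1 else 0.0
            # Q-learning update: nudge the estimate toward reward + discounted future value
            best_next = max(Q[(s_next, act)] for act in ACTIONS)
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
            s = s_next

    # The learned policy: the preferred action (+1, i.e., "right") in each non-goal cell.
    print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)})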

In both cases, as the scientists acknowledge, machine learning models demand an enormous amount of human labor and compute. Labeled datasets are hard to come by, especially in specialized fields without public, open-source datasets, which means they require the expensive work of human annotators. And complicated reinforcement learning models need massive computational resources to run a vast number of training episodes, which puts them out of reach of all but a handful of very wealthy AI labs and tech companies.

Bengio, Hinton, and LeCun also acknowledge that current deep learning systems are still limited in the scope of problems they can solve. They perform well on specialized tasks but “are often brittle outside of the narrow domain they have been trained on.” Often, small changes, such as a few modified pixels in an image or a minor alteration of the rules of the environment, can cause deep learning systems to go astray.

The brittleness of deep learning systems is largely due to machine learning models being based on the “independent and identically distributed” (i.i.d.) assumption, which supposes that real-world data has the same distribution as the training data. The i.i.d. assumption also supposes that observations do not affect each other (e.g., successive coin or die tosses are independent of one another).

“From the early days, theoreticians of machine learning have focused on the iid assumption… Unfortunately, this is not a realistic assumption in the real world,” the scientists write.

Real-world settings are constantly changing due to different factors, many of which are virtually impossible to represent without causal models. Intelligent agents must constantly observe and learn from their environment and other agents, and they must adapt their behavior to changes.

“[T]he performance of today’s best AI systems tends to take a hit when they go from the lab to the field,” the scientists write.

The i.i.d. assumption becomes even more fragile when applied to fields such as computer vision and natural language processing, where the agent must deal with high-entropy environments. Currently, many researchers and companies try to overcome the limits of deep learning by training neural networks on more data, hoping that larger datasets will cover a wider distribution and reduce the chances of failure in the real world.
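
A small, self-contained example of that fragility (mine, not the paper’s): a model fit on one range of inputs can score well on held-out data from the same distribution yet fall apart on data drawn from a shifted one.

    # Train on x in [0, 1.5], where a linear fit to the true (nonlinear) relationship
    # works well; then evaluate on a shifted range, where the i.i.d. assumption breaks.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    def make_data(low, high, n=500):
        x = rng.uniform(low, high, size=(n, 1))
        y = np.sin(x).ravel() + rng.normal(0, 0.05, n)   # true relationship is sin(x)
        return x, y

    X_train, y_train = make_data(0.0, 1.5)   # training distribution
    X_iid, y_iid = make_data(0.0, 1.5)       # test data from the same distribution
    X_shift, y_shift = make_data(1.5, 3.0)   # test data from a shifted distribution

    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on i.i.d. test data:  ", round(model.score(X_iid, y_iid), 2))
    print("R^2 on shifted test data: ", round(model.score(X_shift, y_shift), 2))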

Deep learning vs hybrid AI

The ultimate goal of AI scientists is to replicate the kind of general intelligence humans have. And we know that humans don’t suffer from the problems of current deep learning systems.

“Humans and animals seem to be able to learn massive amounts of background knowledge about the world, largely by observation, in a task-independent manner,” Bengio, Hinton, and LeCun write in their paper. “This knowledge underpins common sense and allows humans to learn complex tasks, such as driving, with just a few hours of practice.”

Elsewhere in the paper, the scientists note, “[H]umans can generalize in a way that is different and more powerful than ordinary iid generalization: we can correctly interpret novel combinations of existing concepts, even if those combinations are extremely unlikely under our training distribution, so long as they respect high-level syntactic and semantic patterns we have already learned.”

Scientists have proposed various solutions for closing the gap between AI and human intelligence. One approach that has been widely discussed in the past few years is hybrid artificial intelligence, which combines neural networks with classical symbolic systems. Symbol manipulation is a very important part of humans’ ability to reason about the world. It is also one of the great challenges of deep learning systems.

Bengio, Hinton, and LeCun do not believe in mixing neural networks and symbolic AI. In a video that accompanies the ACM paper, Bengio says, “There are some who believe that there are problems that neural networks just cannot resolve and that we have to resort to the classical AI, symbolic approach. But our work suggests otherwise.”

The deep learning pioneers believe that better neural network architectures will eventually lead to all aspects of human and animal intelligence, including symbol manipulation, reasoning, causal inference, and common sense.

Promising advances in deep learning

In their paper, Bengio, Hinton, and LeCun highlight recent advances in deep learning that have helped make progress in some of the fields where deep learning struggles. One example is the Transformer, a neural network architecture that has been at the heart of language models such as OpenAI’s GPT-3 and Google’s Meena. One of the benefits of Transformers is their capability to learn without the need for labeled data. Transformers can develop representations through unsupervised learning, and then they can apply those representations to fill in the blanks on incomplete sentences or generate coherent text after receiving a prompt.
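
That fill-in-the-blanks behavior is easy to see with an off-the-shelf masked language model. The snippet below is my own illustration; it uses the Hugging Face transformers library and a pretrained BERT checkpoint, which the pipeline downloads on first run:

    # Ask a pretrained masked language model to fill in a blank.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for guess in fill("Deep learning models can [MASK] patterns from unlabeled text."):
        print(round(guess["score"], 3), guess["token_str"])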

More recently, researchers have shown that Transformers can be applied to computer vision tasks as well. When combined with convolutional neural networks, Transformers can predict the content of masked regions.

A more promising technique is contrastive learning, which tries to find vector representations of missing regions instead of predicting exact pixel values. This is an intriguing approach and seems to be much closer to what the human mind does. When we see an image such as the one below, we might not be able to visualize a photo-realistic depiction of the missing parts, but our mind can come up with a high-level representation of what might go in those masked regions (e.g., doors, windows, etc.). (My own observation: This can tie in well with other research in the field aiming to align vector representations in neural networks with real-world concepts.)
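
For the curious, here is a bare-bones sketch of a contrastive (InfoNCE-style) objective in NumPy. It is a generic illustration of contrastive learning, not the specific representation-prediction method the paper describes: embeddings of two views of the same item should score higher against each other than against embeddings of other items.

    # Contrastive objective in miniature: the matching pair (the diagonal of the
    # similarity matrix) should dominate its row; the loss is cross-entropy on it.
    import numpy as np

    def info_nce(anchors, positives, temperature=0.1):
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
        logits = a @ p.T / temperature                      # pairwise similarities
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        idx = np.arange(len(a))
        return -log_probs[idx, idx].mean()                  # pull matched pairs together

    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 32))
    print("matched views:", round(info_nce(x, x + 0.01 * rng.normal(size=x.shape)), 3))
    print("random views: ", round(info_nce(x, rng.normal(size=(8, 32))), 3))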

The push to make neural networks less reliant on human-labeled data fits into the broader discussion of self-supervised learning, a concept LeCun is working on.

Above: Can you guess what is behind the grey boxes in the above image?

The paper also touches upon “system 2 deep learning,” a term borrowed from Nobel laureate psychologist Daniel Kahneman. System 2 accounts for the functions of the brain that require conscious thinking, including symbol manipulation, reasoning, multi-step planning, and solving complex mathematical problems. System 2 deep learning is still in its early stages, but if it becomes a reality, it could solve some of the key problems of neural networks, including out-of-distribution generalization, causal inference, robust transfer learning, and symbol manipulation.

The scientists also support work on “Neural networks that assign intrinsic frames of reference to objects and their parts and recognize objects by using the geometric relationships.” This is a reference to “capsule networks,” an area of research Hinton has focused on in the past few years. Capsule networks aim to upgrade neural networks from detecting features in images to detecting objects, their physical properties, and their hierarchical relations with each other. Capsule networks can provide deep learning with “intuitive physics,” a capability that allows humans and animals to understand three-dimensional environments.

“There’s still a long way to go in terms of our understanding of how to make neural networks really effective. And we expect there to be radically new ideas,” Hinton told ACM.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021


Repost: Original Source and Author Link
