Hackers can now take over your computer through Microsoft Word

A new zero-day vulnerability in Microsoft Office could potentially allow hackers to take control of your computer. The vulnerability can be exploited even if you don’t actually open an infected file.

Although we’re still waiting for an official fix, Microsoft has released a workaround for this exploit, so if you frequently use MS Office, be sure to check it out.

Interesting maldoc was submitted from Belarus. It uses Word's external link to load the HTML and then uses the "ms-msdt" scheme to execute PowerShell code.

— nao_sec (@nao_sec) May 27, 2022

The vulnerability has been dubbed Follina by Kevin Beaumont, one of the researchers who first looked into it and who also wrote a lengthy post about it. It first came to light on May 27 through a tweet by nao_sec, although Microsoft allegedly first heard of it as early as April. No patch has been released yet, but Microsoft’s workaround involves disabling the Microsoft Support Diagnostic Tool (MSDT), which is how the exploit gains entry into the attacked computer.

This exploit affects primarily .rtf files, but other MS Word files can also be affected. A feature in MS Word called Templates allows the program to load and execute code from external sources. Follina relies on this in order to enter the computer and then runs a series of commands that opens up MSDT. Under regular circumstances, MSDT is a safe tool that Microsoft uses to debug various issues for Windows users. Unfortunately, in this case, it also grants remote access to your computer, which helps the exploit take control of it.

In the case of .rtf files, the exploit can run even if you don’t open the file. As long as you view it in File Explorer, Follina can be executed. Once the attacker gains control of your computer via MSDT, what they do next is up to them. They might download malicious software, leak files, or do pretty much anything else.

Beaumont has shared plenty of examples of the way Follina has already been exploited and found in various files. The exploit is being used for financial extortion, among other things. Needless to say — you don’t want this on your computer.

What do you do until Microsoft releases a patch?

There are a few steps you can take to stay safe from the Follina exploit until Microsoft itself releases a patch that will fix this problem. As things stand now, the workaround is the official fix, and we don’t know for a fact that anything else is sure to follow.

First and foremost, check whether your version of Microsoft Office could potentially be affected. So far, the vulnerability has been found in Office 2013, 2016, 2019, 2021, Office ProPlus, and Office 365. There is no telling whether older versions of Microsoft Office are safe, though, so it’s better to take additional steps to protect yourself.

If you’re able to avoid using .doc, .docx, and .rtf files for the time being, it’s not a bad idea. Consider switching to cloud-based alternatives like Google Docs. Only accept and download files from 100%-proven sources — which is a good guideline to live by, in general.

Last but not least, follow Microsoft’s guidance on disabling MSDT. You will need to open the Command Prompt as administrator and enter a couple of commands. If everything goes through as planned, you should be safe from Follina. Nevertheless, remember to always be cautious.
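
For readers who want to see what those commands look like, Microsoft’s published workaround for this vulnerability (tracked as CVE-2022-30190) comes down to two `reg` commands: export the `ms-msdt` protocol handler key as a backup, then delete it. The following is a minimal Python sketch that only builds and prints those commands by default. The backup file name and the function names are assumptions made for illustration, and actually running the commands requires an elevated prompt on Windows.

```python
import subprocess

# Registry key for the "ms-msdt" URL protocol handler.
MSDT_KEY = r"HKEY_CLASSES_ROOT\ms-msdt"


def msdt_workaround_commands(backup_file="ms-msdt-backup.reg"):
    """Return the two reg.exe commands: back up the key, then delete it.

    The backup file name is a placeholder chosen for this sketch.
    """
    return [
        ["reg", "export", MSDT_KEY, backup_file],
        ["reg", "delete", MSDT_KEY, "/f"],
    ]


def apply_workaround(dry_run=True):
    """Print the commands, or run them when dry_run is False (Windows, admin)."""
    for cmd in msdt_workaround_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_workaround(dry_run=True)
```

Keeping the exported backup matters: the change can later be undone with `reg import` and the same file once a patch is installed.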



The future of AI is a conversation with a computer

How would an AI writing program start an article on the future of AI writing? Well, there’s one easy way to find out: I used the best known of these tools, OpenAI’s GPT-3, to do the job for me.

Using GPT-3 is disarmingly simple. You have a text box to type into and a menu on the side to adjust parameters, like the “temperature” of the response (which essentially equates to randomness). You type, hit enter, and GPT-3 completes what you’ve written, be it poetry, fiction, or code. I tried inputting a simple headline and a few sentences about the topic, and GPT-3 began to fill in the details. It told me that AI uses “a series of autocomplete-like programs to learn language” and that these programs analyze “the statistical properties of the language” to “make educated guesses based on the words you’ve typed previously.”
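
To make the “temperature equals randomness” idea concrete, here is a small sketch of how temperature is commonly applied to a language model’s scores before a word is sampled: the scores are divided by the temperature and then normalized into probabilities. The candidate words and their scores are invented for illustration, and this is not a description of OpenAI’s actual implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
candidates = ["poetry", "fiction", "code"]
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, dict(zip(candidates, (round(p, 3) for p in probs))))
```

At a low temperature nearly all the probability lands on the top-scoring word, so the output is predictable; at a high temperature the options even out and the output becomes more varied.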

So far, so good, I thought. I hit enter again, and the program added a quote from Google’s head of AI, Jeff Dean, then referenced an experimental piece of software from the 1960s before promising that an “AI Revolution” was coming that would reap immense rewards across the fields of science, technology, and medicine.

Fine, I thought. Then I thought a little more and did some googling. I soon discovered that the quote from Dean was made up, that the experimental software never existed, and while the promise of an “AI Revolution” was all well and good, it wasn’t any different from the vague nonsense found in hype-filled press releases. Really, what was most revealing about the future of AI was not what GPT-3 said but how it said it. The medium is the message, as Marshall McLuhan pointed out many years ago. And here, the medium included plausible fabrications; endless output; and, crucially, an opportunity to respond to the robot writer.

If we’re looking ahead at the next 10 years of AI development, trying to predict how we will interact with increasingly intelligent software, it helps to consider those tools that can talk back. AI writing models may only be digital parrots, able to copy form without understanding meaning, but they still create a dialogue with the user. This is something that often seems missing from the introduction of AI systems like facial recognition algorithms (which are imposed upon us) or self-driving cars (where the public becomes the test subject in a dangerous experiment). With AI writing tools, there is the possibility for a conversation.

If you use Gmail or Google Docs, then you’ve probably already encountered this technology. In Google’s products, AI editors lurk in the blank space in front of your cursor, manifesting textual specters that suggest how to finish a sentence or reply to an email. Often, their prompts are just simple platitudes (“Thanks!”, “Great idea!”, “Let’s talk next week!”), but sometimes these tools seem to be taking a stronger editorial line, pushing your response in a certain direction. Such suggestions are intended to be helpful, of course, but they seem to provoke annoyance as frequently as gratitude.

To understand how AI systems learn to generate such suggestions, imagine being given two lists of words. One starts off “eggs, flour, spatula,” and the other goes “paint, crayons, scissors.” If you had to add the items “milk” and “glitter” to these lists, which would you choose and with how much confidence? And what if that word was “brush” instead? Does that belong in the kitchen, where it might apply an egg wash, or is it more firmly located in the world of arts-and-crafts? Quantifying this sort of context is how AI writing tools learn to make their suggestions. They mine vast amounts of text data to create statistical maps of the relationships between words, and use this information to complete what you write. When you start typing, they start predicting which words should come next.
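
The “statistical map” described above can be illustrated with the simplest possible version: count which word follows which in some text, then predict the most frequent follower. This is a toy sketch with a made-up three-sentence corpus; real large language models use neural networks trained on vastly more data, not raw word-pair counts.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A tiny invented corpus mixing kitchen and craft vocabulary.
corpus = (
    "whisk the eggs and the flour . "
    "whisk the eggs with milk . "
    "cut the paper with scissors ."
)
model = train_bigrams(corpus)
print(predict_next(model, "whisk"))
print(predict_next(model, "the"))
print(predict_next(model, "glitter"))
```

A word the model has never seen, like “glitter” here, gets no prediction at all, which hints at why these systems need so much training text.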

Features like Gmail’s Smart Reply are only the most obvious example of how these systems — often known as large language models — are working their way into the written world. AI chatbots designed for companionship have become increasingly popular, with some, like Microsoft’s Chinese Xiaoice, attracting tens of millions of users. Choose-your-own-adventure-style text games with AI dungeon masters are attracting users by letting people tell stories collaboratively with computers. And a host of startups offer multipurpose AI text tools that summarize, rephrase, expand, and alter users’ input with varying degrees of competence. They can help you to write fiction or school essays, say their creators, or they might just fill the web with endless spam.

The ability of the underlying software to actually understand language is a topic of hot debate (one that tends to arrive, time and time again, at the same question: what do we mean by “understand” anyway?). But these models’ fluency across genres is undeniable. For those enamored with this technology, scale is key to its success. It’s by making these models and their training data bigger and bigger that they’ve been able to improve so quickly. Take, for example, the training data used to create GPT-3. The exact size of the input is difficult to calculate, but one estimate suggests that the entirety of Wikipedia in English (3.9 billion words and more than 6 million articles) makes up only 0.6 percent of the total.

Relying on scale to build these systems has benefits and drawbacks. From an engineering perspective, it allows for fast improvements in quality: just add more data and compute to reap fast rewards. The size of large language models is generally measured in their number of connections, or parameters, and by this metric, these systems have increased in complexity extremely quickly. GPT-2, released in 2019, had 1.5 billion parameters, while its 2020 successor, GPT-3, had more than 100 times that — some 175 billion parameters. Earlier this year, Google announced it had trained a language model with 1.6 trillion parameters.
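
As a quick check on those figures, the growth factors can be worked out directly from the parameter counts quoted above. The variable and function names are just labels chosen for this sketch.

```python
# Parameter counts as quoted in the article.
gpt2_params = 1.5e9      # GPT-2 (2019)
gpt3_params = 175e9      # GPT-3 (2020)
google_params = 1.6e12   # Google's 1.6-trillion-parameter model


def growth_factor(new, old):
    """How many times larger the newer model is than the older one."""
    return new / old


print(round(growth_factor(gpt3_params, gpt2_params), 1))    # GPT-2 -> GPT-3
print(round(growth_factor(google_params, gpt3_params), 1))  # GPT-3 -> Google
```

By these numbers GPT-3 is roughly 117 times the size of GPT-2, and Google’s model is roughly nine times the size of GPT-3.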

The difference in quality as systems get larger is notable, but it’s unclear how much longer these scaling efforts will reap rewards in quality. Boosters think the sky’s the limit: that these systems will keep on getting smarter and smarter, and that they may even be the first step toward creating a general-purpose artificial intelligence, or AGI. But skeptics suggest that the AI field in general is starting to see diminishing returns as it scales ever up.

A reliance on scale, though, is inextricably linked to the statistical approach that creates uncertainty in these models’ output. These systems have no centralized store of accepted “truths”; no embodied understanding of “what the world is like for humans” and, hence, no way to distinguish fact from fiction or to exercise common sense.

Quiz them on simple trivia, like capital cities or the birthdays of US presidents, and they are right most of the time. But to these systems, truth is simply a statistical feature of their training data. They answer questions correctly because the text they’ve been fed has presented them with the correct information with enough frequency. This means that if you push them on any given topic or stray from the most obvious fields, they will lie thoughtlessly, making up quotes, dates, biographical details, and anything else you want to hear. The same probabilistic approach also means they can stumble over common sense questions. Start quizzing them with slightly fantastical queries, and they will confidently assert, for example, that a pencil is heavier than a toaster or that a blade of grass only has one eye. Such answers reveal the gulf between statistical and embodied intelligence.

To get a better understanding of these AI language models, I’ve been playing with a variety of them for the past few weeks, from tools sold to copywriters to flexible, multipurpose systems like GPT-3. The experience has been dizzying. Often, I’m amazed by the fluency, insight, and creativity of these systems. As part of a project for The Verge’s 10-year anniversary, for example, I used GPT-3 to write technopagan spells for a zine, feeding it a prompt (below in bold) which it completed with a four-step ritual (of which I’m showing only the first step):

Technopaganism is a new sort of magic that blends traditional witchcraft with the power of being online all the time. What follows is a technopagan spell designed to protect your phone from spam and improve the quality of messages you receive. Step 1) Take your cell phone and go outside. Close your eyes, hold the device in front of you, and intone: “I offer you, O Spirit of Technopagan Magic, my eternal and unending electronic connection to the net. And in return I ask you for magick to combine my will with the will of the system — make me impervious to the garbage that flows through my inbox. Make all messages flow together, and I will choose among them what is right for me to see. Grant me the Finger of the Internet, and I will strike at spam, and cast out the spammers back to the dark places where they come from.”

But other times, I’m stunned by how limited these programs are. Something that’s often overlooked is just how much human curation is needed to shape their output. The text above was not the first response I received from GPT-3, and I had to go through several iterations to generate a response that was both cogent and funny. It helped, of course, that the task I’d set GPT-3 was an imaginative and open-ended one: it played into the program’s strengths (and I think GPT-3’s success in such tasks has led some users and observers to exaggerate the intelligence of these systems). Other times, though, the software produced nonsensical content even within the fanciful framing I’d given it. Another “spell” it generated in response to the same prompt was much less focused, adding fictitious social media handles, tech headlines, and non-existent URLs to the spell’s instructions:

Perform a ritual to seal the crown circle into the device. May the gods help us all, the Apple iPhone X reference for is broken! Destroy the holy symbol! —tianong xiong (@tianongxiong) September 13, 2018 To confirm the rumor: Facebook Messenger’s search bar does have a search bar hidden right next to it — Warl Darkh (@dkpt) September 12, 2018

You can argue that this is just creativity of a different sort, and that of course a proper technopagan spell would include URLs. But it’s also obvious the machine has gone off-piste.

Despite such weaknesses, there’s already talk of AI systems taking over writers’ jobs. Naturally, I wondered if a computer could write articles for The Verge (and not just this one). I played around with different models, inputting opening paragraphs into these systems and asking for story ideas. Here is some more from GPT-3 on large language models:

“The human brain is just a really fancy computer,” said Jeff Dean, a Google hardware engineer and AI expert, at a company event in 2016. “It’s a computer made of meat.”

To understand GPT-3, you first have to know a little about how the brain learns. The brain isn’t born knowing how to speak or understand speech. It must first acquire this ability through exposure to the outside world, which is why children tend to learn language so quickly, and why being raised in a household where a language isn’t spoken presents a major obstacle.

As neuroscientist Mark Changizi has shown, the process of learning a language is a massive project. “Language is one of the most complex things that brains can do,” he writes, “and it emerges from the brain’s more basic skills, like vision, hearing, and motor control.”

But how does the brain acquire all this knowledge? The short answer is: via autocomplete.

All these points make sense if you’re not concentrating too hard, but they don’t flow from sentence to sentence. They never follow an argument or build to a conclusion. And again, fabrication is a problem. Both Jeff Dean and Mark Changizi are real people who have been more or less correctly identified (though Dean is now head of AI at Google, and Changizi is a cognitive scientist rather than a neuroscientist). But neither man ever uttered the words that GPT-3 attributed to them, as far as I can tell. Yet despite these problems, there’s also a lot to be impressed by. For example, using “autocomplete” as a metaphor to describe AI language models is both accurate and easy to understand. I’ve done it myself! But is this because it’s simply a common metaphor that others have deployed before? Is it right then to say GPT-3 is “intelligent” to use this phrase or is it just subtly plagiarizing others? (Hell, I ask the same questions about my own writing.)

AI language models seem best suited to creating text that is rote, not bespoke, as with Gmail’s suggested replies. In the case of journalism, automated systems have already been integrated into newsrooms to write “fill in the blanks” stories about earthquakes, sporting events, and the like. And with the rise of large AI language models, the span of content that can be addressed in this way is expanding.

Samanyou Garg is the founder of an AI writing startup named Writesonic, and says his service is used mostly by e-commerce firms. “It really helps [with] product descriptions at scale,” says Garg. “Some of the companies who approach us have like 10 million products on their website, and it’s not possible for a human to write that many.” Fabian Langer, founder of a similar firm named AI Writer, tells The Verge that his tools are often used to pad out “SEO farms” — sites that exist purely to catch Google searches and that create revenue by redirecting visitors to ads or affiliates. “Mostly, it’s people in the content marketing industry who have company blogs to fill, who need to create content,” said Langer. “And to be honest, for these [SEO] farms, I do not expect that people really read it. As soon as you get the click, you can show your advertisement, and that’s good enough.”

It’s this sort of writing that AI will take over first, and which I’ve started to think of as “low-attention” text — a description that applies to both the effort needed to create and read it. Low-attention text is not writing that makes huge demands on our intelligence, but is mostly functional, conveying information quickly or simply filling space. It also constitutes a greater portion of the written world than you might think, including not only marketing blogs but work interactions and idle chit-chat. That’s why Gmail and Google Docs are incorporating AI language models’ suggestions: they’re picking low-hanging fruit.

A big question, though, is what effect will these AI writing systems have on human writing and, by extension, our culture? The more I’ve thought about the output of large language models, the more it reminds me of geofoam. This is a building material made from expanded polystyrene that is cheap to produce, easy to handle, and packed into the voids left over by construction projects. It is incredibly useful but somewhat controversial, due to its uncanny appearance as giant polystyrene blocks. To some, geofoam is an environmentally-sound material that fulfills a specific purpose. To others, it’s a horrific symbol of our exploitative relationship with the Earth. Geofoam is made by pumping oil out of the ground, refining it into cheap matter, and stuffing it back into the empty spaces progress leaves behind. Large language models work in a similar way: processing the archaeological strata of digital text into synthetic speech to fill our low-attention voids.

For those who worry that much of the internet is already “fake” — sustained by botnets, traffic farms, and automatically generated content — this will simply mark the continuation of an existing trend. But just as with geofoam, the choice to use this filler on a wide scale will have structural effects. There is ample evidence, for example, that large language models encode and amplify social biases, producing text that is racist and sexist, or that repeats harmful stereotypes. The corporations in control of these models pay lip service to these problems but don’t think they present serious problems. (Google famously fired two of its AI researchers after they published a detailed paper describing these issues.) And as we offload more of the cognitive burden of writing onto machines, making our low-attention text no-attention text, it seems plausible that we, in turn, will be shaped by the output of these models. Google already uses its AI autocomplete tools to suggest gender-neutral language (replacing “chairman” with “chair,” for example), and regardless of your opinion on the politics of this sort of nudge, it’s worth discussing what the end-point of these systems might be.

In other words: what happens when AI systems trained on our writing start training us?

Despite the problems and limitations of large language models, they’re already being embraced for many tasks. Google is making language models central to its various search products; Microsoft is using them to build automated coding software, and the popularity of apps like Xiaoice and AI Dungeon suggests that the free-flowing nature of AI writing programs is no hindrance to their adoption.

Like many other AI systems, large language models have serious limitations when compared with their hype-filled presentations. And some predict this widespread gap between promise and performance means we’re heading into another period of AI disillusionment. As the roboticist Rodney Brooks put it: “just about every successful deployment [of AI] has either one of two expedients: It has a person somewhere in the loop, or the cost of failure, should the system blunder, is very low.” But AI writing tools can, to an extent, avoid these problems: if they make a mistake, no one gets hurt, and their collaborative nature means human curation is often baked in.

What’s interesting is considering how the particular characteristics of these tools can be used to our advantage, showing how we might interact with machine learning systems, not in a purely functional fashion but as something exploratory and collaborative. Perhaps the most interesting single use of large language models to date is a book named Pharmako-AI: a text written by artist and coder K Allado-McDowell as an extended dialogue with GPT-3.

To create Pharmako-AI, Allado-McDowell wrote and GPT-3 responded. “I would write into a text field, I would write a prompt, sometimes that would be several paragraphs, sometimes it would be very short, and then I would generate some text from the prompt,” Allado-McDowell told The Verge. “I would edit the output as it was coming out, and if I wasn’t interested in what it was saying, I would cut that part and regenerate, so I compared it to pruning a plant.”

The resulting text is esoteric and obscure, discussing everything from the roots of language itself to the concept of “hyper-dimensionality.” It is also brilliant and illuminating, showing how writing alongside machines can shape thought and expression. At different points, Allado-McDowell compares the experience of writing using GPT-3 to taking mushrooms and communing with gods. They write: “A deity that rules communication is an incorporeal linguistic power. A modern conception of such might read: a force of language from outside of materiality.” That force, Allado-McDowell suggests, might well be a useful way to think about artificial intelligence. The result of communing with it is a sort of “emergence,” they told me, an experience of “being part of a larger ecosystem than just the individual human or the machine.”

This, I think, is why AI writing is so much more exciting than many other applications of artificial intelligence: because it offers the chance for communication and collaboration. The urge to speak to something greater than ourselves is evident in how these programs are being embraced by early adopters. A number of individuals have used GPT-3 to talk to dead loved ones, for example, turning its statistical intelligence into an algorithmic ouija board. Though such experiments also reveal the limitations. In one of these cases, OpenAI shut down a chatbot shaped to resemble a developer’s dead fiancée because the program didn’t conform to the company’s terms of service. That’s another, less promising reality of these systems: the vast majority are owned and operated by corporations with their own interests, and they will shape their programs (and, in turn, their users) as they see fit.

Despite this, I’m hopeful, or at least curious, about the future of AI writing. It will be a conversation with our machines; one that is diffuse and subtle, taking place across multiple platforms, where AI programs linger on the fringes of language. These programs will be unseen editors to news stories and blog posts, they will suggest comments in emails and documents, and they will be interlocutors that we even talk to directly. It’s impossible that this exchange will only be good for us, or that the deployment of these systems will come without problems and challenges. But it will, at least, be a dialogue.


Video-level computer vision advances business insights

This article was contributed by Can Kocagil, data scientist at OREDATA.

From spatial to spatiotemporal visual processing

Instance-level classification, segmentation, and object detection in images are fundamental problems in computer vision. Unlike image-level information retrieval, video-level problems aim to detect, segment, and track object instances in the spatiotemporal domain, which has both space and time dimensions.

Video domain learning is a crucial task for spatiotemporal understanding in camera and drone-based systems, with applications in video editing, autonomous driving, pedestrian tracking, augmented reality, robot vision, and a lot more. Furthermore, it helps us turn raw spatiotemporal data into actionable insights, because video has richer content than purely spatial image data. With the addition of the temporal dimension to our decoding process, we get further information about

  • Motion
  • Viewpoint variations
  • Illuminations
  • Occlusions
  • Deformations
  • Local ambiguities

from the video frames. Because of this, video-level information retrieval has gained popularity as a research area and attracts the research community working on video understanding.

Conceptually speaking, video-level information retrieval algorithms are mostly adapted from image-level processes by adding heads that capture temporal information. Aside from simpler video-level classification and regression tasks, video object detection, video object tracking, video captioning, and video instance segmentation are the most common tasks.

To start with, let’s recall the image-level instance segmentation problem.

Image-level instance segmentation

Instance segmentation not only groups pixels into different semantic classes, but also groups them into different object instances. A two-stage paradigm is usually adopted, which first generates object proposals using a Region Proposal Network (RPN), and then predicts object bounding boxes and masks using aggregated RoI features. Different from semantic segmentation, which segments different semantic classes only, instance segmentation also segments the different instances of each class.

Above: Left figure: Semantic segmentation. Right figure: Instance segmentation.
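
The distinction between the two kinds of segmentation can be sketched with a tiny grid. In the sketch below, a semantic mask marks every “person” pixel with the same class ID, and a connected-components pass then gives each separate blob its own instance ID. The class ID, the grid, and the function name are assumptions for illustration, and connected components are only a stand-in for a real instance segmentation model: they fail as soon as two instances of the same class touch.

```python
from collections import deque


def label_instances(semantic_mask, target_class):
    """Give each connected blob of `target_class` pixels its own instance ID."""
    height, width = len(semantic_mask), len(semantic_mask[0])
    instances = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic_mask[r][c] != target_class or instances[r][c]:
                continue
            # Found an unlabeled pixel: start a new instance and flood-fill it.
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic_mask[ny][nx] == target_class
                            and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id


# Semantic mask: 0 = background, 1 = "person" (hypothetical class ID).
semantic = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
instances, count = label_instances(semantic, target_class=1)
print(count)  # number of separate "person" instances found
for row in instances:
    print(row)
```

The semantic mask only says “these pixels are people”; the instance map additionally says which pixels belong to the first person and which to the second.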

Video classification

The video classification task is a direct adaptation of image classification to the video domain. Instead of single images, sequences of video frames are given to the model. Because these frames are temporally correlated, the learning algorithm combines spatial and temporal visual features to produce classification scores.

The core idea is that, given specific video frames, we want to identify the type of video from pre-defined classes.
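
One very simple way to turn per-frame information into a video-level label is to average per-frame class scores over time and pick the highest average. The sketch below does exactly that with invented scores and class names; it is a baseline for illustration only, not a full spatiotemporal model, and it assumes some image classifier has already produced the per-frame scores.

```python
def classify_video(frame_scores, class_names):
    """Average per-frame class scores over time and pick the best class."""
    num_frames = len(frame_scores)
    averaged = [
        sum(scores[i] for scores in frame_scores) / num_frames
        for i in range(len(class_names))
    ]
    best = max(range(len(class_names)), key=lambda i: averaged[i])
    return class_names[best], averaged


# Hypothetical pre-defined classes and per-frame scores for three frames.
class_names = ["cooking", "sports", "news"]
frame_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
label, averaged = classify_video(frame_scores, class_names)
print(label, [round(a, 2) for a in averaged])
```

Note that the first frame alone would have suggested a different class; looking at the whole sequence changes the answer.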

Video captioning

Video captioning is the task of generating captions for a video by understanding the actions and events in it, which makes it possible to retrieve videos efficiently through text. The idea here is that, given specific video frames, we want to generate natural language that describes the concept and context of the video.

Above: Video captioning example

Image Credit: Can Kocagil

Video captioning is a multidisciplinary problem that requires algorithms from both computer vision (to extract features) and natural language processing (to map extracted features to natural language).

Video object detection (VOD)

Video object detection aims to detect objects in videos, a task first proposed as part of the ImageNet visual challenge. Even though associating detections across frames and assigning identities improves detection quality, this challenge is limited to spatial, per-frame evaluation metrics and does not require joint object detection and tracking. In other words, unlike other video-level semantic tasks, it involves no joint detection, segmentation, and tracking.

Above: Video object detection

Image Credit: Can Kocagil

The difference between image-level object detection and video object detection is that the model receives a time series of images, which carries temporal information that a single image lacks.

Video object tracking (VOT)

Video object tracking is the process of both localizing objects and tracking them across the video. Given an initial set of detections in the first frame, the algorithm assigns a unique ID to each object and tries to match it at every timestamp across the video. For instance, if an object has the ID “P1” in the first frame, the model tries to predict the ID “P1” for that same object in the remaining frames.

Video object tracking tasks are generally categorized as detection-based and detection-free tracking approaches. In detection-based tracking algorithms, objects are jointly detected and tracked such that the tracking part improves the detection quality, whereas in detection-free approaches we’re given an initial bounding box and try to track that object across video frames.

Above: Video object tracking
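
The ID-matching step of detection-based tracking can be sketched as follows: boxes in the new frame inherit the ID of the previous-frame box they overlap most, measured by intersection over union (IoU). This greedy IoU association is one simple baseline, not the method of any particular tracker, and the box coordinates, threshold, and function names are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_ids(tracks, detections, threshold=0.3):
    """Greedily carry track IDs from the previous frame to new detections.

    tracks: dict of ID -> box from the previous frame.
    detections: list of boxes in the current frame.
    Returns a dict of ID -> box; unmatched detections get fresh IDs.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), track_id, j)
         for track_id, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    assigned, used = {}, set()
    for score, track_id, j in pairs:
        if score < threshold:
            break
        if track_id in assigned or j in used:
            continue
        assigned[track_id] = detections[j]
        used.add(j)
    # Any detection left over is treated as a new object.
    next_num = len(tracks) + 1
    for j, det in enumerate(detections):
        if j not in used:
            while f"P{next_num}" in tracks or f"P{next_num}" in assigned:
                next_num += 1
            assigned[f"P{next_num}"] = det
    return assigned


# Previous frame: two tracked objects. Current frame: both moved slightly,
# and a third object appeared.
previous = {"P1": (10, 10, 50, 50), "P2": (100, 100, 140, 140)}
current = [(102, 101, 142, 141), (12, 11, 52, 51), (200, 200, 240, 240)]
for track_id, box in sorted(match_ids(previous, current).items()):
    print(track_id, box)
```

The order of detections in the new frame does not matter; “P1” follows the box that overlaps the old “P1” box, and the newcomer receives an ID that was not in use.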

Video instance segmentation (VIS)

Video instance segmentation is a recently introduced computer vision research topic that aims at joint detection, segmentation, and tracking of instances in the video domain. Because the task is supervised, it requires high-quality human annotations for bounding boxes and binary segmentation masks with predefined categories. It requires both segmentation and tracking, which makes it more challenging than image-level instance segmentation. Hence, unlike the more fundamental computer vision tasks above, video instance segmentation requires multidisciplinary and aggregated approaches. VIS is effectively a contemporary all-in-one computer vision task, a composition of general vision problems.

Above: Video instance segmentation prediction

Image Credit: Can Kocagil

Knowledge brings value: Video-level information retrieval in action

Understanding the technical boundaries of video-level information retrieval tasks makes it easier to translate business concerns and customer needs into practice. For example, when a client says, “we have videos and want to extract only the locations of pedestrians from the videos,” you’ll recognize that your task is video object detection. What if they want to both localize and track them in videos? Then your problem becomes video object tracking. Let’s say that they also want to segment them across videos. Your task is now video instance segmentation. However, if a client says that they want to generate automatic captions for videos, from a technical point of view, your problem can be formulated as video captioning. Understanding the scope of the project and drawing up technical requirements depends on the kind of insights clients want to derive, and it is crucial for technical teams to formulate the issue as an optimization problem.
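
The mapping from client request to task described above can be written down as a small decision function. The function name, its flags, and the fallback to video classification when nothing else is requested are assumptions made for this sketch, not part of any library.

```python
def pick_video_task(localize=False, track=False, segment=False, caption=False):
    """Map what a client asks for onto the task names used in this article."""
    if caption:
        return "video captioning"
    if segment:
        return "video instance segmentation"
    if track:
        return "video object tracking"
    if localize:
        return "video object detection"
    # Assumed fallback: the client only wants to know what kind of video it is.
    return "video classification"


# "We want to extract only the locations of pedestrians."
print(pick_video_task(localize=True))
# "We want to both localize and track them."
print(pick_video_task(localize=True, track=True))
# "We also want to segment them across videos."
print(pick_video_task(localize=True, track=True, segment=True))
```

The checks run from the most demanding request to the least, mirroring how each added requirement in the paragraph above moves the project to a harder task.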

This article was contributed by Can Kocagil, data scientist at OREDATA.




MLOps platform Landing AI raises $57M to help manufacturers adopt computer vision


Palo Alto, California-based Landing AI, the AI startup led by Andrew Ng — the cofounder of Google Brain, one of Google’s AI research divisions — today announced that it raised $57 million in a series A funding round led by McRock Capital. In addition, Insight Partners, Taiwania Capital, Canadian Pension Plan Investment Board, Intel Capital, Samsung Catalyst Fund, Far Eastern Group’s DRIVE Catalyst, Walsin Lihwa, and AI Fund participated, bringing Landing AI’s total raised to around $100 million.

The increased use of AI in manufacturing is dovetailing with the broader corporate sector’s embrace of digitization. According to Google Cloud, 76% of manufacturing companies turned to data and analytics, cloud, and AI technologies due to the pandemic. As pandemic-induced challenges snarl the supply chain, including skilled labor shortages and transportation disruptions, the adoption of AI is likely to accelerate. Deloitte reports that 93% of companies believe that AI will be a pivotal component in driving growth and innovation in manufacturing.

Landing AI was founded in 2017 by Ng, an adjunct professor at Stanford, formerly an associate professor and director of the university’s Stanford AI Lab. Landing AI’s flagship product is LandingLens, a platform that allows companies to build, iterate, and deploy AI-powered visual inspection solutions for manufacturing.

“AI will transform industries, but that means it needs to work with all kinds of companies, not just those with millions of data points to feed into AI engines. Manufacturing problems often have dozens or hundreds of data points. LandingLens is designed to work even on these small data problems,” Ng told VentureBeat via email. “In consumer internet, a single, monolithic AI system can serve billions of users. But in manufacturing, each manufacturing plant might need its own AI model. By enabling domain experts, rather than only AI experts, to build these AI systems, LandingLens is democratizing access to cutting-edge AI.”

Deep background in AI

Ng, who previously served as chief scientist at Baidu, is an active entrepreneur in the AI industry. After leaving Baidu, he launched an online curriculum of classes centered around machine learning, and soon after incorporated the company Landing AI.

While at Stanford, Ng started Stanford Engineering Everywhere, a compendium of freely available online courses, which served as the foundation for Coursera. Ng is currently the chairman of AI cognitive behavioral therapy startup Woebot, sat on the board of an Apple-owned driverless car company, and has written several guides and online training courses that aim to demystify AI for business executives.

Three years ago, Ng unveiled the AI Fund, a $175 million incubator that backs small teams of experts looking to solve key problems using AI. In a Medium post announcing the fund, which was an early investor in Landing AI, Ng wrote that he wants to “develop systematic and repeatable processes to initiate and pursue new AI opportunities.”


Landing AI focuses on MLOps, the discipline involving collaboration between data scientists and IT professionals with the aim of productizing AI systems. The term is a compound of “machine learning” and “information technology operations.” The market for such solutions could grow from a nascent $350 million to $4 billion by 2025, according to Cognilytica.

LandingLens provides low-code and no-code visual inspection tools that enable computer vision engineers to train, test, and deploy AI systems to edge devices like laptops. Users create a “defect book” and upload their media. After labeling the data, they can divide it into “training” and “validation” subsets to create and evaluate a model before deploying it into production.

Landing AI

Above: Landing AI’s development dashboard.

Labeled datasets, such as pictures annotated with captions, expose patterns to AI systems, in effect telling machines what to look for in future datasets. Training datasets are the samples used to create the model, while test datasets are used to measure its performance and accuracy.
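As a rough illustration of dividing labeled data into “training” and “validation” subsets, here is a minimal Python sketch. It is not LandingLens code: the function name `split_labeled_data`, the 20% default, and the sample file names are assumptions made for the example.

```python
import random


def split_labeled_data(samples, validation_fraction=0.2, seed=0):
    """Shuffle labeled samples and split them into (training, validation).

    Hypothetical helper for illustration; the fraction and seed are
    arbitrary defaults, not values taken from any product.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten labeled images: (file name, label). The names are made up.
labeled = [(f"part_{i}.png", "defect" if i % 3 == 0 else "ok") for i in range(10)]
training, validation = split_labeled_data(labeled)
print(len(training), len(validation))
```

Holding the validation samples out of training is what makes the later accuracy measurement meaningful, since the model is scored on images it has not seen.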

“For instance … [Landing AI] can help manufacturers more readily identify defects by working with the small data sets the companies have … or spot patterns in a smattering of health care diagnoses,” a spokesperson from Landing AI explained to VentureBeat via email. “Overcoming the ‘big data’ bias to instead concentrate on ‘good data’ — the food for AI — will be critical to unlocking the power of AI in ever more industries.”

On its website, Landing AI touts LandingLens as a tailored solution for OEMs, system integrators, and distributors to evaluate model efficacy for a single app or as part of a hybrid solution, combined with traditional systems. In manufacturing, Landing AI supports use cases like assembly inspection, processing monitoring, and root cause analysis. But the platform can also be used to develop models in industries like automotive, electronics, agriculture, and retail, particularly for tasks involving glass and weld inspection, wafer and die inspection, automated picking and weeding, and identifying patterns and trends to generate customer insights.

“A data-centric AI approach [like Landing AI’s] involves building AI systems with quality data — with a focus on ensuring that the data clearly conveys what the AI must learn,” Landing AI writes on its website. “Quality managers, subject-matter experts, and developers can work together during the development process to reach a consensus on defects and labels build a model to analyze results to make further optimizations … Additional benefits of data-centric AI include the ability  for teams to develop consistent methods for collecting and labeling images and for training, optimizing, and updating the models … Landing AI’s AI deep learning workflow simplifies the development of automated machine solutions that identify, classify, and categorize defects while improving production yield.”

With upwards of 82% of firms saying that custom app development outside of IT is important, Gartner predicts that 65% of all apps — including AI-powered apps — will be created using low-code platforms by 2024. Another study reports that 85% of 500 engineering leads think that low-code will be commonplace within their organizations as soon as the end of this year, while one-third anticipates that the market for low- and no-code will climb to between $58.8 billion and $125.4 billion in 2027.

Landing AI competes with Comet, Domino Data Lab, and others in the burgeoning MLOps and machine learning lifecycle management segment. But investors like Insight Partners’ George Mathew believe that the startup’s platform offers enough to differentiate it from the rest of the pack. Landing AI’s customers include battery developer QuantumScape and life sciences company Ligand Pharmaceuticals, which says it’s using LandingLens to improve its cell screening technologies. Manufacturing giant Foxconn is another client; Ng says that Landing AI has been working with Foxconn since June 2017 to “develop AI technologies, talent, and systems that build on the core competencies of the two companies.”

“Digital modernization of manufacturing is rapidly growing and is expected to reach $300 billion by 2023,” Mathew explained in a press release. “The opportunity and need for Landing AI is only exploding. It will unlock the untapped segment of targeted machine vision projects addressing quality, efficiency, and output. We’re looking forward to playing a role in the next phase of Landing AI’s exciting journey.”


VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member



Windows 11 Runs on a 15-Year Old Intel Pentium Computer

Amid Microsoft’s statements that Windows 11 was made for newer machines, creative users continue to prove that you can run Microsoft’s latest operating system on most computers. This time, a Twitter user managed to successfully install and run Windows 11 on an Intel Pentium 4-based system.

The news emerged when Twitter user Carlos S.M. posted screenshots, and later a video, of his computer running Windows 11. The video includes benchmarks that prove just how old all the components are, starting with the 15-year-old processor.

To clarify: This is not just a budget PC — it almost belongs in a museum. The specifications clearly prove just how ancient it is. It has an Intel Pentium 4 661 processor that runs at 3.6GHz and was first released in 2006. The chip has just one core, which is below the minimum that Microsoft tries to enforce.

It’s not just the CPU that is past its retirement in this build. The computer has 4GB of DDR2 RAM that runs at 800MHz. That’s right — DDR2, first launched in 2003. This is combined with an Asus P5Q motherboard, released in 2008. There are, however, two much more modern components, as the PC includes an Nvidia GeForce GT 710 graphics card (2016) and a 120GB SSD.

Against all odds, Carlos S.M. managed to install Windows 11 successfully using the Windows 10 PE installer. The video shows that booting the system, as well as navigating the settings menu, takes a long time — but that should come as no surprise given that some components are nearing their 16th birthday. The system still managed to run several programs, including CPU-Z, a modern benchmarking tool.

Technically, installing Windows 11 on Intel Pentium 4 661 shouldn’t have been possible. Microsoft’s support documentation states that the new OS requires at least a dual-core CPU with a clock of 1GHz or higher. Intel Pentium 4 661 satisfies only the clock requirement, but the processor was still accepted by the Windows 11 PC Health tool.

One of Microsoft’s main requirements for running Windows 11 is that the PC must possess TPM 2.0, a security feature only present on newer machines. The company shared a full list of supported processors, and it only goes as far back as Intel Coffee Lake (launched in 2017) and AMD Ryzen 2000 (released in 2018). Understandably, this caused many people to believe that their PC cannot run Windows 11 at all, but it now seems that this won’t be the case.

Although Microsoft has posted workarounds to install Windows 11 on older computers, the company warned that the OS may not be eligible to install updates. Carlos S.M. was able to install updates as scheduled, further demonstrating that Windows 11 doesn’t require a top-notch PC to be able to run.



How To Record Your Screen on an Apple Mac Computer

Recording your computer screen is a handy way to capture what you’re doing on your device. On a Mac, you can use it to record MacOS gameplay or show a friend how to perform a task … because sometimes showing someone is much easier than trying to explain it.

Fortunately, MacOS includes two built-in ways to help you record your screen with ease: Using the Screenshot toolbar or using Apple’s QuickTime Player app. We’ll take you through both methods in this guide so you’ll know exactly how to capture videos when you need to.

Use the Screenshot toolbar

Step 1: Press Command + Shift + 5 on your keyboard. This opens your Mac’s Screenshot toolbar.

Step 2: In the center of the toolbar are two options for video recording: Record Entire Screen or Record Selected Portion. As the names suggest, you can click the left-most of the two buttons to record the whole screen or the one on the right to record just a portion.

Step 3: Click Options to tweak how it records. From here, you can screen-record on your Mac with audio. Under the Microphone heading, just choose a plugged-in mic, and when you record, your voice will be included.

The MacOS Screenshot toolbar showing its options menu.

Step 4: If you chose to record the whole screen, just click anywhere on your display to begin the recording. Alternatively, click the Record button.

Step 5: If you selected Record Selected Portion, you will see a box on-screen showing what will be captured. Click and drag the handles at the edges of this box to adjust what is recorded. You can move this selection to wherever you want. When you’re ready, click Record.

A portion of a MacOS screen that will be recorded.

Step 6: When you’re finished, click the Stop button in the menu bar, or press Control + Command + Esc. A thumbnail will appear in the bottom-right corner of your desktop. Ignore it or swipe it right to save the video. Click the thumbnail to open the recording, then click the Trim button (to the left of the Done button) to cut it if necessary. Control-click the thumbnail to get some options, like opening it in an app or changing where recordings are saved.

A thumbnail of a saved screen recording on the MacOS desktop.

Use QuickTime Player

Step 1: Open QuickTime Player from the Applications folder, from Launchpad, or by pressing Command + Space and typing the app’s name.

A Spotlight window in MacOS showing results for QuickTime Player.

Step 2: Click File > New Screen Recording or press Control + Command + N.

QuickTime Player on a Mac showing a menu with an option to start a new screen recording.

Step 3: This opens the same Screenshot toolbar as in the section above. Click Record Entire Screen or Record Selected Portion, then click Record.

The MacOS Screenshot toolbar showing two buttons used to record a Mac's screen.

Step 4: When you’re finished, click the Stop button in the menu bar or press Control + Command + Esc.

Step 5: Unlike recording using the Command + Shift + 5 shortcut, QuickTime will automatically open the recording and save it to your chosen location (by default, this is the desktop). Here, you can view, edit, or share it.

A screen recording in QuickTime Player, with a menu open showing an option to trim the video.



Computer vision platform Cogniac nabs $20M to bolster its customer acquisition efforts

Cogniac, a San Jose, California-based startup developing computer vision tech for task automation, today announced that it raised $20 million in a series B1 financing round led by National Grid Partners with participation from National Grid, Autotech Ventures, Cisco Investments, Energy Innovation Capital, London Technology Club, Vanedge Capital, and Wing Venture Capital. CEO Chuck Myers says that the proceeds will be put toward the expansion of Cogniac’s workforce and the ramp-up of R&D efforts to support the company’s approach to computer vision, data storage, and “human-AI interactivity.”

Computer vision is a type of AI technology that allows machines to understand, categorize, and differentiate between images. Using photos from cameras and videos as well as deep learning components, computer vision can identify and classify objects and then react to what it “sees.”

Investments in computer vision startups are on the rise as businesses embrace automation during the pandemic, which continues to place a strain on the worldwide labor market. Despite not having passed the “awareness phase,” as per one survey, the computer vision market could grow from $10.9 billion in 2019 to $17.4 billion by 2024. External investments in computer vision startups have already far exceeded the $3.5 billion McKinsey estimated in 2016.


Above: Cogniac’s computer vision platform.

Image Credit: Cogniac

Cogniac’s AI platform has customers connect machine vision cameras, security cameras, drones, smartphones, and other sources and define objects and conditions of interest to them. They might specify surface damage and supply chain quality control inspections, for example, or accident prevention and real-time physical threat detection. Cogniac then monitors and improves classification, identification, counting, and measuring through a feedback system while integrating with third-party apps to deliver alerts and notifications.

Cogniac generates custom AI models for scenarios based on imagery and feedback. Once deployed, these models can learn new characteristics, adapting based either on archival imagery or data users enter. The platform monitors the confidence level of each new prediction, prioritizing predictions with the lowest level for review while a core learning engine searches for configuration variations, ostensibly lessening the need for manual intervention.
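The idea of sending the least confident predictions to human review first can be sketched as a simple sort. This is an assumption-laden illustration, not Cogniac’s implementation: the `Prediction` class, the `review_queue` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One model output; all field names are hypothetical."""
    image_id: str
    label: str
    confidence: float  # between 0.0 and 1.0


def review_queue(predictions, limit=None):
    """Return predictions ordered from least to most confident.

    The lowest-confidence items come first, so reviewers spend their
    time where the model is most likely to be wrong.
    """
    ordered = sorted(predictions, key=lambda p: p.confidence)
    return ordered if limit is None else ordered[:limit]


predictions = [
    Prediction("cam1_001", "surface damage", 0.97),
    Prediction("cam1_002", "no damage", 0.52),
    Prediction("cam2_001", "surface damage", 0.61),
]
for item in review_queue(predictions, limit=2):
    print(item.image_id, item.confidence)
```

Corrections made during review could then be fed back as new labeled data, which is the feedback loop the paragraph above describes.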

Cogniac claims that with deep convolutional neural networks — types of AI models often applied to analyzing visual imagery — its system can achieve accuracy over 90% prior to human corrections. Moreover, the startup says the technology enables its platform to support multiple deployment environments, including cloud, gateway, on-premises, and hybrid.

Promise and pitfalls

Tasks in manufacturing, which is one of Cogniac’s key markets, can be error-prone when humans are in the loop. A study from Vanson Bourne found that 23% of all unplanned downtime in manufacturing is the result of human error, compared with rates as low as 9% in other segments. The $327.6 million Mars Climate Orbiter spacecraft was destroyed because of a failure to properly convert between units of measurement. And one pharma company reported a misunderstanding that resulted in an alert ticket being overridden, which cost four days on the production line at £200,000 ($253,946) per day.

And broadly speaking, computer vision can be used for nefarious purposes, like monitoring the responses of ride-hailing customers to in-car advertisements. This summer, AnyVision, a controversial Israeli facial recognition startup, raised $235 million in venture capital from SoftBank and Eldridge Industries. Public records and a 2019 version of its user guide show how invasive AnyVision’s software can be — one school using it saw that a student’s face was captured more than 1,000 times during the week.

Cogniac — a member of Nvidia’s Inception accelerator program, with partners including SAP and Rockwell Automation — has controversially provided its software to the U.S. Army to analyze battlefield drone data. The company has also participated in trials with U.S. Customs and Border Protection and helped an Arizona sheriff’s department to identify when people cross the U.S.-Mexico border — and expressed an openness to larger deployments down the line.

Of course, Cogniac isn’t alone in this — machine learning, computer vision, and facial recognition vendors including TrueFace, Clearview AI, TwoSense, and AI.Reverie also have contracts with various U.S. military and law enforcement branches. But according to Cogniac cofounder Bill Kish, government contracts are a small portion of the company’s business, which is primarily focused on industrial applications.

One Cogniac client is Georgia Pacific, which is finalizing the deployment of a solution that simplifies processes around the company’s mill operations. Another is Bobcat, which says it’s implementing Cogniac’s platform within kitting inspection workflows in warehouses across facilities in Otsego, Minnesota. (Kitting refers to compiling products into a single “kit” that’s then shipped to a customer.) More recently, Cogniac announced a partnership with Trimac Transportation, a transportation service company based in North America, to deploy the startup’s technology throughout Trimac’s document identification and filing processes.

On the subject of bias that might arise in Cogniac’s models from imbalanced datasets, Kish says the company employs a process in which multiple people review uncertain data to establish a consensus. The company’s system acts as a source of record for managing assets, ensuring biases inherent in the visual data are spotlighted so they can be addressed through feedback.
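A consensus step of this kind is often implemented as a majority vote among reviewers. The sketch below assumes that approach; the article does not say how Cogniac computes consensus, and the function name `consensus_label` and the 50% threshold are hypothetical.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.5):
    """Return the label chosen by more than `min_agreement` of reviewers.

    Returns None when there are no votes or no label clears the
    threshold, signaling that the item needs further review.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


print(consensus_label(["scratch", "scratch", "dent"]))
print(consensus_label(["scratch", "dent"]))
```

Returning None instead of guessing on a tie keeps disputed items visible, so they can be escalated rather than silently absorbed into the training data.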

“We’re at a key inflection point for AI vision adoption in the industrial and manufacturing sectors,” Myers said in a statement. “Our product’s efficacy and ease of implementation offer our customers significant and material improvement to their workstreams and processes. This funding allows us to scale our operations to meet the needs of this currently nascent but massively important and growing space. AI vision will serve as the foundation of safety and efficiency for the future of logistics and manufacturing, and we’re leading the creation of that infrastructure and operation standard.”

To date, Cogniac has raised over $30 million in venture capital.




Computer vision startup Zebra Medical Vision sells for $200M


Zebra Medical Vision, a computer vision startup focused on health care, today announced it has entered into an agreement to be acquired by publicly traded health firm Nanox. Terms of the deal weren’t disclosed, but a Zebra spokesperson said it was expected to be valued at around $200 million — $100 million upfront and another $100 million tied to specific milestones, all in stock.

Computer vision, which deals with algorithms that can gain a high-level understanding of images and videos, is being applied across a range of medical domains. While some research has raised concerns about bias, startups and incumbents are chasing after a market that’s anticipated to be worth $5.31 billion by 2026, according to Verified Market Research. They argue that a shortage of professionals — the U.S. Bureau of Labor Statistics projects only a 9% increase in the number of radiologic and MRI technicians by 2028 — will necessitate scalable computer vision technologies. Moreover, the companies claim, computer vision has the potential to reduce labor costs, as well as medical imaging workloads.

Zebra Medical was founded in 2014 by Elad Benjamin, Eyal Gura, and Eyal Toledano to help patients, physicians, and health care providers use computer vision tools to diagnose bone, liver, lung, and cardiovascular diseases. The startup delivers what it calls one of the largest open clinical research platforms globally, enabling researchers to access millions of anonymized, indexed clinical records for scientific discovery. Zebra also developed an analytics solution that provides algorithms and clinical insight decision support tools to health care institutions via a software-as-a-service-based model.

Above: A screenshot of one of Zebra Medical Vision’s diagnostic imaging tools.

Beyond this, Zebra hosts a data repository with over 2 million medical images and has U.S. Food and Drug Administration (FDA)-cleared and CE-marked solutions, including seven FDA-cleared and 10 CE-marked AI solutions for medical imaging — the most recent being a 3D modeling product for x-ray images used for preoperative orthopedic surgery planning. In partnership with several radiological industry associations, Zebra in July lobbied the American Medical Association to allow insurers to reimburse clinicians using the company’s AI in vertebral compression fracture (VCF) detection screenings.

Zebra, which had raised $57.4 million in venture capital, counts over 1,100 hospitals, academic institutions, and care providers among its customers, including InterMountain Healthcare, Johnson & Johnson, Nuance, Nvidia, and the University of Oxford.

Pivoting focus

Ahead of the acquisition, Zebra pivoted from focusing on diagnosis and triage to leveraging its data to help health care systems evaluate large volumes of patients for chronic conditions. Should the acquisition be completed, CEO Zohar Elhanani says Zebra will combine its capabilities with the acquiring company’s strategy to “accelerate the population health vision” and make medical imaging “more efficient.”

“Zebra Medical Vision has always operated with the goal of expanding the use of AI in medical imaging to improve health outcomes for patients worldwide,” Elhanani said. “At this time, we understand that that vision is best served by joining forces with a trusted partner with the means to boost our capabilities and propel population health, powered by AI, to the next level. Screening populations to detect and treat chronic disease early has proven to improve outcomes, and we’re thrilled to be taking the helm of the population health transformation in health care.”

Nanox also announced today that it has entered into a binding letter of intent to acquire USARAD and its related company, Medical Diagnostics Web, for $30 million in cash and stock. USARAD operates a network of 300 radiologists across health centers, urgent care facilities, and other providers, which Nanox says will provide it access to trained radiologists — lowering the barrier to U.S. market entry and other countries around the globe.

Nanox, which was founded in 2016 by Japanese venture capital tycoon Hitoshi Masuya, hopes to reinvent the x-ray with hardware inspired by Star Trek’s biobed. Its product, called the Arc, is designed to promote the early detection of conditions discoverable by computed tomography (CT), mammography, fluoroscopy, angiogram, and other imaging modalities. A cloud-based software dubbed Nanox.Cloud complements the Arc with value-added services, including a scan repository, radiologist matching, online and offline diagnostic review and annotation, connectivity to diagnostic assistive AI systems, billing, and reporting.

“Expanding access to medical imaging via widespread deployment of the Nanox Arc solves one of the obstacles to achieving true population health management,” Nanox CEO Ran Poliakine said in a press release. “Yet the global shortage of trained radiologists represents a significant bottleneck in the imaging process. The Nanox ARC, together with the acquisitions of Zebra Medical Vision and USARAD, if consummated, would move us toward our vision of deploying our systems and have the support of a large network of radiologists empowered with highly advanced AI algorithms that will allow for the rapid interpretation of medical images into actionable medical interventions, which would represent an end-to-end, globally connected medical imaging solution.”

The pandemic spurred investments in AI across nearly every industry. According to CB Insights’ Q2 2021 report, AI startups attracted record funding — more than $20 billion — despite a drop in deal volume. Health care AI continued to have the largest AI deal share, accounting for 17% of all AI deals ($2.36 billion).




How computer vision works — and why it’s plagued by bias


It’s no secret that AI is everywhere, yet it’s not always clear when we’re interacting with it, let alone which specific techniques are at play. But one subset is easy to recognize: If the experience is intelligent and involves photos or videos, or is visual in any way, computer vision is likely working behind the scenes.

Computer vision is a subfield of AI, specifically of machine learning. If AI allows machines to “think,” then computer vision is what allows them to “see.” More technically, it enables machines to recognize, make sense of, and respond to visual information like photos, videos, and other visual inputs.

Over the last few years, computer vision has become a major driver of AI. The technique is used widely in industries like manufacturing, ecommerce, agriculture, automotive, and medicine, to name a few. It powers everything from interactive Snapchat lenses to sports broadcasts, AR-powered shopping, medical analysis, and autonomous driving capabilities. And by 2022, the global market for the subfield is projected to reach $48.6 billion annually, up from just $6.6 billion in 2015.

The computer vision story follows that of AI overall. A slow rise full of technical hurdles. A big boom enabled by massive amounts of data. Rapid proliferation. And then growing concern over bias and how the technology is being used. To understand computer vision, it’s important to understand how it works, how it’s being used, and both the challenges it overcame and the ones it still faces today.

How computer vision works

Computer vision allows computers to accomplish a variety of tasks. There’s image segmentation (divides an image into parts and examines them individually) and pattern recognition (recognizes the repetition of visual stimuli between images). There’s also object classification (classifies objects found in an image), object tracking (finds and tracks moving objects in a video), and object detection (looks for and identifies specific objects in an image). Additionally, there’s facial recognition, an advanced form of object detection that can detect and identify human faces.

As mentioned, computer vision is a subset of machine learning, and it similarly uses neural networks to sort through massive amounts of data until it understands what it’s looking at. In fact, the example in our machine learning explainer about how deep learning could be used to separate photos of ice cream and pepperoni pizza is more specifically a computer vision use case. You provide the AI system with a lot of photos depicting both foods. The computer then puts the photos through several layers of processing — which make up the neural network — to distinguish the ice cream from the pepperoni pizza one step at a time. Earlier layers look at basic properties like lines or edges between light and dark parts of the images, while subsequent layers identify more complex features like shapes or even faces.

This works because computer vision systems function by interpreting an image (or video) as a series of pixels, which are each tagged with a color value. These tags serve as the inputs the system processes as it moves the image through the neural network.
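To make the pixel representation concrete, here is a minimal Python sketch that treats a tiny grayscale image as a grid of brightness values and marks where neighboring pixels differ, a hand-written stand-in for the kind of edge between light and dark that early layers respond to. It is a simplification under stated assumptions: real networks learn their filters from data and work on color images, and the function name `horizontal_edges` is hypothetical.

```python
def horizontal_edges(image):
    """Return the absolute brightness change between side-by-side pixels.

    `image` is a list of rows, each a list of grayscale values (0-255).
    Large output values mark a boundary between dark and light regions.
    """
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in image
    ]


# A 2x4 image: dark on the left (0), bright on the right (255).
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
for row in horizontal_edges(image):
    print(row)
```

Each output row has one fewer value than the input row, because every value compares a pixel with its right-hand neighbor.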

Rise of computer vision

Like machine learning overall, computer vision dates back to the 1950s. Without our current computing power and data access, the technique was originally very manual and prone to error. But it did still resemble computer vision as we know it today; the effectiveness of first processing according to basic properties like lines or edges, for example, was discovered in 1959. That same year also saw the invention of a technology that made it possible to transform images into grids of numbers, which translated images into the binary language machines could understand.

Throughout the next few decades, more technical breakthroughs helped pave the way for computer vision. First, there was the development of computer scanning technology, which for the first time enabled computers to digitize images. Then came the ability to turn two-dimensional images into three-dimensional forms. Object recognition technology that could recognize text arrived in 1974, and by 1982, computer vision really started to take shape. In that same year, one researcher further developed the processing hierarchy, just as another developed an early neural network.

By the early 2000s, object recognition specifically was garnering a lot of interest. But it was the 2010 release of ImageNet, a dataset containing millions of tagged images, that helped propel computer vision’s rise. Suddenly, a vast amount of labeled, ready-to-go data was available for anyone who wanted it. ImageNet was used widely, and most of the computer vision systems built to date have relied on it. But while computer vision systems were popular at this point, they were still turning up a lot of errors. That changed in 2012 when a model called AlexNet, which used ImageNet, significantly reduced the error rate for image recognition, ushering in today’s field of computer vision.

Computer vision’s bias and challenges

The availability of ImageNet was transformative for the growth and adoption of computer vision. It quite literally became the basis for the industry. But it also scarred the technology in ways that are having a real impact today.

The story of ImageNet reflects a popular saying in data science and AI: “garbage in, garbage out.” In jumping to take advantage of the dataset, researchers and data scientists didn’t pause to consider where the images came from, who chose them, who labeled them, why they were labeled as they were, what images or labels may have been omitted, and the effect all of this might have on how their technology would function, let alone the impact it would have on society and people’s lives. Years later, in 2019, a study on ImageNet revealed the prevalence of bias and problematic labels throughout the dataset.

“Many truly offensive and harmful categories hid in the depth of ImageNet’s Person categories. Some classifications were misogynist, racist, ageist, and ableist. … Insults, racist slurs, and moral judgements abound,” wrote AI researcher Kate Crawford in her book Atlas of AI. And even besides these explicit harms (some of which have been removed; ImageNet is reportedly working to address various sources of bias), curious choices in terms of categories, hierarchy, and labeling have been found throughout the dataset. It’s now widely criticized for privacy violations as well, as people whose photos were used in the dataset didn’t consent to being included or labeled.

Data and algorithmic bias is one of the core issues of AI overall, but it’s especially easy to see the impact in some computer vision applications. Facial recognition technology, for example, is known to misidentify Black people, but its use is surging in retail stores. It’s also already common in policing, which has prompted protests and regulations in several U.S. cities and states.

Regulations overall are an emerging challenge for computer vision (and AI in general). It’s clear more regulation is coming (especially if more of the world follows in the European Union’s path), but it’s not yet known exactly what such regulations will look like, making it difficult for researchers and companies to navigate in this moment. “There’s no standardization and it’s uncertain. For these types of things, having clarification would be helpful,” said Haniyeh Mahmoudian, DataRobot’s global AI ethicist and a winner of VentureBeat’s Women in AI responsibility and ethics award.

Computer vision has some technical challenges as well. It’s limited by hardware, including cameras and sensors. Additionally, computer vision systems are very complex to scale. And like all types of AI, they require massive amounts of computing power (which is expensive) and data. And as the entire history of computer vision makes clear, good data that is representative, unbiased, and ethically collected is hard to come by — and incredibly tedious to tag.



Repost: Original Source and Author Link


Last Chance for Back-to-School Computer Deals at Dell

Digital Trends may earn a commission when you buy through links on our site.

Before you head back to school, it’s the perfect time to take advantage of Dell laptop deals, or student laptop deals, ensuring you have the best equipment for the year. If you’re looking for a more permanent station, whether for gaming, your dorm, or elsewhere, there are also some excellent desktop computer deals and desktop monitor deals going on.

Or, you can head on over to Dell where the back-to-school sale is still live, at least until the end of today! It’s a great opportunity to get deals on student laptops, desktops, monitors, and everything in between. You can shop the sale yourself, or check out some of the better deals we’ve found below.

New Inspiron 15 Touch Laptop (2021) – $920, was $989

The Inspiron 15 Touch Laptop is the latest release of the system, with a touchscreen display offering a 2-in-1 tablet and laptop design. It’s powered by an AMD Ryzen 7 5700U 8-core mobile processor with AMD Radeon graphics and 16GB of DDR4 RAM. The display is a 15.6-inch FHD LED-backlit touch panel with a native resolution of 1920 x 1080. You also get a 512GB M.2 solid-state drive, Intel WiFi 6 AX200, Bluetooth 5.0, and Windows 10 Home with a free upgrade to Windows 11 when it drops. Right now, Dell is offering the New Inspiron 15 Touch for $69 off (normally $989), which brings the price to $920 with free express delivery. Act soon, because the deal won’t last much longer!

New XPS 13 Touch Laptop (2021) – $1,670, was $1,970


Offering one of its best and latest releases for a great price, Dell’s New XPS 13 Touch Laptop is packed with power. The 13.4-inch 3.5K OLED display has an InfinityEdge design (narrow bezels) with a native resolution of 3456 x 2160. Inside is an 11th Gen Intel Core i7 processor with a 12MB cache and clock speeds up to 4.8GHz. That’s paired with Intel Iris Xe graphics and shared memory, 16GB of LPDDR4 RAM, and a 512GB M.2 solid-state drive. Finally, you’ll get Killer WiFi 6 AX1650, Bluetooth 5.1, and Windows 10 Home with a free upgrade to Windows 11. All of that is available for $300 off (normally $1,970), which brings the total price down to $1,670 with free express delivery. This deal won’t last long either, so hurry!

More Back-to-School Laptop Deals Available Now

Want to see what other laptop and computer sales are going on? We rounded up all of the best student laptop deals! You can check those out below.

We strive to help our readers find the best deals on quality products and services, and we choose what we cover carefully and independently. The prices, details, and availability of the products and deals in this post may be subject to change at any time. Be sure to check that they are still in effect before making a purchase.

