Crisis Text Line has decided to stop sharing conversation data with spun-off AI company Loris.ai after facing scrutiny from data privacy experts. “During these past days, we have listened closely to our community’s concerns,” the 24/7 hotline service writes in a statement on its website. “We hear you. Crisis Text Line has had an open and public relationship with Loris AI. We understand that you don’t want Crisis Text Line to share any data with Loris, even though the data is handled securely, anonymized and scrubbed of personally identifiable information.” Loris.ai will delete any data it has received from Crisis Text Line.
Politico recently reported how Crisis Text Line (which is not affiliated with the National Suicide Prevention Lifeline) is sharing data from conversations with Loris.ai, which builds AI systems designed to make conversations by customer service reps more empathetic. Crisis Text Line is a not-for-profit service that, according to Shawn Rodriguez, VP and General Counsel of Crisis Text Line, provides “mental health crisis intervention services.” It is also a shareholder in Loris.ai and, according to Politico, at one point shared a CEO with the company.
Before hotline users seeking assistance speak with volunteer counselors, they consent to data collection and can read the company’s data-sharing practices. Those volunteer counselors, whom CTL calls “Empathy MVPs,” are expected to make a commitment of “volunteering 4 hours per week until 200 hours are reached.” Politico quoted one volunteer who claimed that the people who contact the line “have an expectation that the conversation is between just the two people that are talking” and said he was terminated in August after raising concerns about CTL’s handling of data. That same volunteer, Tim Reierson, has started a Change.org petition pushing CTL “to reform its data ethics.”
Politico noted how Crisis Text Line says data use and AI play a role in how it operates:
“Data science and AI are at the heart of the organization — ensuring, it says, that those in the highest-stakes situations wait no more than 30 seconds before they start messaging with one of its thousands of volunteer counselors. It says it combs the data it collects for insights that can help identify the neediest cases or zero in on people’s troubles, in much the same way that Amazon, Facebook and Google mine trends from likes and searches.”
In its statement, Crisis Text Line pushed back against the report: “A recent story cherry-picked and omitted information about our data privacy policies. We want to clarify and fill in facts that were missing so people understand why ethical data privacy is foundational to our work.”
Following the report, Crisis Text Line released a statement on its website and via a Twitter thread, saying it does not “sell or share personally identifiable data with any organization or company.” It went on to claim that “[t]he only for-profit partner that we have shared fully scrubbed and anonymized data with is Loris.ai. We founded Loris.ai to leverage the lessons learned from operating our service to make customer support more human and empathetic. Loris.ai is a for-profit company that helps other for-profit companies employ de-escalation techniques in some of their most notoriously stressful and painful moments between customer service representatives and customers.”
In its defense, Crisis Text Line said over the weekend that its “data scrubbing process has been substantiated by independent privacy watchdogs such as the Electronic Privacy Information Center,” which it said had called Crisis Text Line “a model steward of personal data.” It was citing a 2018 letter to the FCC. That defense is shakier now, however, because the Electronic Privacy Information Center (EPIC) has responded with its own statement saying the quote was used outside of its original context:
“Our statements in that letter were based on a discussion with CTL about their data anonymization and scrubbing policies for academic research sharing, not a technical review of their data practices. Our review was not related to, and we did not discuss with CTL, the commercial data transfer arrangement between CTL and Loris.ai. If we had, we could have raised the ethical concerns with the commercial use of intimate message data directly with the organization and their advisors. But we were not, and the reference to our letter now, out of context, is wrong.”
Loris.ai’s website claims that “safeguarding personal data is at the heart of everything we do” and that “we draw our insights from anonymized, aggregated data that have been scrubbed of Personally Identifiable Information (PII).” That’s not enough for EPIC, which makes the point that Loris and CTL are seeking to “extract commercial value out of the most sensitive, intimate, and vulnerable moments in the lives (of) those individuals seeking mental health assistance and of the hard-working volunteer responders… No data scrubbing technique or statement in a terms of service can resolve that ethical violation.”
Update, 10:15PM ET: This story has been updated to reflect Crisis Text Line’s decision to stop sharing data with Loris.ai.
Correction, February 1st, 10:54AM ET: An earlier version of this story identified Tim Reierson as both a volunteer and an employee who was fired. He was a volunteer on the hotline who was terminated. We regret the error.
How would an AI writing program start an article on the future of AI writing? Well, there’s one easy way to find out: I used the best known of these tools, OpenAI’s GPT-3, to do the job for me.
Using GPT-3 is disarmingly simple. You have a text box to type into and a menu on the side to adjust parameters, like the “temperature” of the response (which essentially equates to randomness). You type, hit enter, and GPT-3 completes what you’ve written, be it poetry, fiction, or code. I tried inputting a simple headline and a few sentences about the topic, and GPT-3 began to fill in the details. It told me that AI uses “a series of autocomplete-like programs to learn language” and that these programs analyze “the statistical properties of the language” to “make educated guesses based on the words you’ve typed previously.”
So far, so good, I thought. I hit enter again, and the program added a quote from Google’s head of AI, Jeff Dean, then referenced an experimental piece of software from the 1960s before promising that an “AI Revolution” was coming that would reap immense rewards across the fields of science, technology, and medicine.
Fine, I thought. Then I thought a little more and did some googling. I soon discovered that the quote from Dean was made up, that the experimental software never existed, and while the promise of an “AI Revolution” was all well and good, it wasn’t any different from the vague nonsense found in hype-filled press releases. Really, what was most revealing about the future of AI was not what GPT-3 said but how it said it. The medium is the message, as Marshall McLuhan pointed out many years ago. And here, the medium included plausible fabrications; endless output; and, crucially, an opportunity to respond to the robot writer.
If we’re looking ahead at the next 10 years of AI development, trying to predict how we will interact with increasingly intelligent software, it helps to consider those tools that can talk back. AI writing models may only be digital parrots, able to copy form without understanding meaning, but they still create a dialogue with the user. This is something that often seems missing from the introduction of AI systems like facial recognition algorithms (which are imposed upon us) or self-driving cars (where the public becomes the test subject in a dangerous experiment). With AI writing tools, there is the possibility for a conversation.
If you use Gmail or Google Docs, then you’ve probably already encountered this technology. In Google’s products, AI editors lurk in the blank space in front of your cursor, manifesting textual specters that suggest how to finish a sentence or reply to an email. Often, their prompts are just simple platitudes — “Thanks!”, “Great idea!”, “Let’s talk next week!” — but sometimes these tools seem to be taking a stronger editorial line, pushing your response in a certain direction. Such suggestions are intended to be helpful, of course, but they seem to provoke annoyance as frequently as gratitude.
To understand how AI systems learn to generate such suggestions, imagine being given two lists of words. One starts off “eggs, flour, spatula,” and the other goes “paint, crayons, scissors.” If you had to add the items “milk” and “glitter” to these lists, which would you choose and with how much confidence? And what if that word was “brush” instead? Does that belong in the kitchen, where it might apply an egg wash, or is it more firmly located in the world of arts-and-crafts? Quantifying this sort of context is how AI writing tools learn to make their suggestions. They mine vast amounts of text data to create statistical maps of the relationships between words, and use this information to complete what you write. When you start typing, they start predicting which words should come next.
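The mechanism described above can be sketched as a toy next-word predictor. This is only an illustration: the miniature corpus below is invented, standing in for the vast text data a real model would mine, and real systems use far richer context than a single preceding word.

```python
from collections import Counter, defaultdict

# A made-up miniature corpus standing in for real training text.
corpus = (
    "whisk the eggs with milk . sift the flour . "
    "grease the pan with milk . mix paint with glitter . "
    "cut with scissors . spread glitter over the paint ."
).split()

# Build a statistical map: for each word, count which words follow it.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    followers = transitions.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("with"))  # "milk": it follows "with" more often than
                             # "glitter" or "scissors" do in this corpus
```

Scaled up over billions of words, and with longer stretches of context than one word, counts like these become the “statistical maps” that let a model complete what you write.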
Features like Gmail’s Smart Reply are only the most obvious example of how these systems — often known as large language models — are working their way into the written world. AI chatbots designed for companionship have become increasingly popular, with some, like Microsoft’s Chinese Xiaoice, attracting tens of millions of users. Choose-your-own-adventure-style text games with AI dungeon masters are attracting users by letting people tell stories collaboratively with computers. And a host of startups offer multipurpose AI text tools that summarize, rephrase, expand, and alter users’ input with varying degrees of competence. They can help you to write fiction or school essays, say their creators, or they might just fill the web with endless spam.
The ability of the underlying software to actually understand language is a topic of hot debate, one that tends to arrive, time and time again, at the same question: what do we mean by “understand” anyway? But these systems’ fluency across genres is undeniable. For those enamored with this technology, scale is key to their success. It’s by making these models and their training data bigger and bigger that they’ve been able to improve so quickly. Take, for example, the training data used to create GPT-3. The exact size of the input is difficult to calculate, but one estimate suggests that the entirety of Wikipedia in English (3.9 billion words and more than 6 million articles) makes up only 0.6 percent of the total.
Relying on scale to build these systems has benefits and drawbacks. From an engineering perspective, it allows for fast improvements in quality: just add more data and compute to reap fast rewards. The size of large language models is generally measured in their number of connections, or parameters, and by this metric, these systems have increased in complexity extremely quickly. GPT-2, released in 2019, had 1.5 billion parameters, while its 2020 successor, GPT-3, had more than 100 times that — some 175 billion parameters. Earlier this year, Google announced it had trained a language model with 1.6 trillion parameters.
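The growth figures quoted above can be sanity-checked with a few lines of arithmetic (parameter counts as reported in the text):

```python
gpt2_params = 1.5e9    # GPT-2 (2019)
gpt3_params = 175e9    # GPT-3 (2020)
google_2021 = 1.6e12   # Google's 1.6 trillion parameter model

print(gpt3_params / gpt2_params)  # ~116.7: "more than 100 times" GPT-2
print(google_2021 / gpt3_params)  # ~9.1: nearly another order of magnitude
```

In other words, parameter counts grew by roughly three orders of magnitude in about two years.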
The difference in quality as systems get larger is notable, but it’s unclear how much longer these scaling efforts will reap rewards. Boosters think the sky’s the limit — that these systems will keep on getting smarter and smarter, and that they may even be the first step toward creating a general-purpose artificial intelligence, or AGI. But skeptics suggest that the AI field in general is starting to reap diminishing returns as it scales ever up.
A reliance on scale, though, is inextricably linked to the statistical approach that creates uncertainty in these models’ output. These systems have no centralized store of accepted “truths”; no embodied understanding of “what the world is like for humans” and, hence, no way to distinguish fact from fiction or to exercise common sense.
Quiz them on simple trivia, like capital cities or the birthdays of US presidents, and they are right most of the time. But to these systems, truth is simply a statistical feature of their training data. They answer questions correctly because the text they’ve been fed has presented them with the correct information with enough frequency. This means that if you push them on any given topic or stray from the most obvious fields, they will lie thoughtlessly, making up quotes, dates, biographical details, and anything else you want to hear. The same probabilistic approach also means they can stumble over common sense questions. Start quizzing them with slightly fantastical queries, and they will confidently assert, for example, that a pencil is heavier than a toaster or that a blade of grass only has one eye. Such answers reveal the gulf between statistical and embodied intelligence.
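That “truth as a statistical feature” can be made concrete with a toy model that answers by majority vote over its training text. Everything here is invented for illustration; real models predict word-by-word rather than sentence-by-sentence, but the failure mode is the same.

```python
from collections import Counter

# Toy "training data": the correct fact appears often,
# and one error appears once.
training_sentences = [
    "the capital of france is paris",
] * 50 + [
    "the capital of france is lyon",  # a rare error in the data
]

def answer(prompt):
    """Answer with the most frequent continuation of the prompt in the
    training data. Frequency, not truth, decides the output."""
    completions = Counter(
        s[len(prompt):].strip()
        for s in training_sentences
        if s.startswith(prompt)
    )
    if not completions:
        return None  # off-distribution: a real model would fabricate here
    return completions.most_common(1)[0][0]

print(answer("the capital of france is"))    # "paris": right, because frequent
print(answer("the capital of atlantis is"))  # None: no statistics to lean on
```

The toy returns `None` when the prompt is unseen; a real language model has no such escape hatch and must emit *something*, which is where the confident fabrication comes from.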
To get a better understanding of these AI language models, I’ve been playing with a variety for the past few weeks, from tools sold to copywriters to flexible, multipurpose systems like GPT-3. The experience has been dizzying. Often, I’m amazed by the fluency, insight, and creativity of these systems. As part of a project for The Verge’s 10-year anniversary, for example, I used GPT-3 to write technopagan spells for a zine, feeding it a prompt which it completed with a four-step ritual (of which I’m showing only the first step):
Technopaganism is a new sort of magic that blends traditional witchcraft with the power of being online all the time. What follows is a technopagan spell designed to protect your phone from spam and improve the quality of messages you receive. Step 1) Take your cell phone and go outside. Close your eyes, hold the device in front of you, and intone: “I offer you, O Spirit of Technopagan Magic, my eternal and unending electronic connection to the net. And in return I ask you for magick to combine my will with the will of the system — make me impervious to the garbage that flows through my inbox. Make all messages flow together, and I will choose among them what is right for me to see. Grant me the Finger of the Internet, and I will strike at spam, and cast out the spammers back to the dark places where they come from.”
But other times, I’m stunned by how limited these programs are. Something that’s often overlooked is just how much human curation is needed to shape their output. The text above was not the first response I received from GPT-3, and I had to go through several iterations to generate a response that was both cogent and funny. It helped, of course, that the task I’d set GPT-3 was an imaginative and open-ended one: it played into the program’s strengths (and I think GPT-3’s success in such tasks has led some users and observers to exaggerate the intelligence of these systems). Other times, though, the software produced nonsensical content even within the fanciful framing I’d given it. Another “spell” it generated in response to the same prompt was much less focused, adding fictitious social media handles, tech headlines, and non-existent URLs to the spell’s instructions:
Perform a ritual to seal the crown circle into the device. May the gods help us all, the Apple iPhone X reference for is broken! Destroy the holy symbol! —tianong xiong (@tianongxiong) September 13, 2018 To confirm the rumor: Facebook Messenger’s search bar does have a search bar hidden right next to it pic.twitter.com/DghBL5DmhU — Warl Darkh (@dkpt) September 12, 2018
You can argue that this is just creativity of a different sort, and that of course a proper technopagan spell would include URLs. But it’s also obvious the machine has gone off-piste.
Despite such weaknesses, there’s already talk of AI systems taking over writers’ jobs. Naturally, I wondered if a computer could write articles for The Verge (and not just this one). I played around with different models, inputting opening paragraphs into these systems and asking for story ideas. Here is some more from GPT-3 on large language models:
“The human brain is just a really fancy computer,” said Jeff Dean, a Google hardware engineer and AI expert, at a company event in 2016. “It’s a computer made of meat.”
To understand GPT-3, you first have to know a little about how the brain learns. The brain isn’t born knowing how to speak or understand speech. It must first acquire this ability through exposure to the outside world, which is why children tend to learn language so quickly, and why being raised in a household where a language isn’t spoken presents a major obstacle.
As neuroscientist Mark Changizi has shown, the process of learning a language is a massive project. “Language is one of the most complex things that brains can do,” he writes, “and it emerges from the brain’s more basic skills, like vision, hearing, and motor control.”
But how does the brain acquire all this knowledge? The short answer is: via autocomplete.
All these points make sense if you’re not concentrating too hard, but they don’t flow from sentence to sentence. They never follow an argument or build to a conclusion. And again, fabrication is a problem. Both Jeff Dean and Mark Changizi are real people who have been more or less correctly identified (though Dean is now head of AI at Google, and Changizi is a cognitive scientist rather than a neuroscientist). But neither man ever uttered the words that GPT-3 attributed to them, as far as I can tell. Yet despite these problems, there’s also a lot to be impressed by. For example, using “autocomplete” as a metaphor to describe AI language models is both accurate and easy to understand. I’ve done it myself! But is this because it’s simply a common metaphor that others have deployed before? Is it right, then, to say GPT-3 is being “intelligent” when it uses this phrase, or is it just subtly plagiarizing others? (Hell, I ask the same questions about my own writing.)
Where AI language models seem best suited is in creating text that is rote, not bespoke, as with Gmail’s suggested replies. In the case of journalism, automated systems have already been integrated into newsrooms to write “fill in the blanks” stories about earthquakes, sporting events, and the like. And with the rise of large AI language models, the span of content that can be addressed in this way is expanding.
Samanyou Garg, founder of an AI writing startup named Writesonic, says his service is used mostly by e-commerce firms. “It really helps [with] product descriptions at scale,” says Garg. “Some of the companies who approach us have like 10 million products on their website, and it’s not possible for a human to write that many.” Fabian Langer, founder of a similar firm named AI Writer, tells The Verge that his tools are often used to pad out “SEO farms” — sites that exist purely to catch Google searches and that create revenue by redirecting visitors to ads or affiliates. “Mostly, it’s people in the content marketing industry who have company blogs to fill, who need to create content,” said Langer. “And to be honest, for these [SEO] farms, I do not expect that people really read it. As soon as you get the click, you can show your advertisement, and that’s good enough.”
It’s this sort of writing that AI will take over first, and which I’ve started to think of as “low-attention” text — a description that applies to both the effort needed to create and read it. Low-attention text is not writing that makes huge demands on our intelligence, but is mostly functional, conveying information quickly or simply filling space. It also constitutes a greater portion of the written world than you might think, including not only marketing blogs but work interactions and idle chit-chat. That’s why Gmail and Google Docs are incorporating AI language models’ suggestions: they’re picking low-hanging fruit.
A big question, though, is what effect these AI writing systems will have on human writing and, by extension, our culture. The more I’ve thought about the output of large language models, the more it reminds me of geofoam. This is a building material made from expanded polystyrene that is cheap to produce, easy to handle, and packed into the voids left over by construction projects. It is incredibly useful but somewhat controversial, due to its uncanny appearance as giant polystyrene blocks. To some, geofoam is an environmentally sound material that fulfills a specific purpose. To others, it’s a horrific symbol of our exploitative relationship with the Earth. Geofoam is made by pumping oil out of the ground, refining it into cheap matter, and stuffing it back into the empty spaces progress leaves behind. Large language models work in a similar way: processing the archaeological strata of digital text into synthetic speech to fill our low-attention voids.
For those who worry that much of the internet is already “fake” — sustained by botnets, traffic farms, and automatically generated content — this will simply mark the continuation of an existing trend. But just as with geofoam, the choice to use this filler on a wide scale will have structural effects. There is ample evidence, for example, that large language models encode and amplify social biases, producing text that is racist and sexist, or that repeats harmful stereotypes. The corporations in control of these models pay lip service to such criticism but don’t seem to treat the problems as serious. (Google famously fired two of its AI researchers after they published a detailed paper describing these issues.) And as we offload more of the cognitive burden of writing onto machines, making our low-attention text no-attention text, it seems plausible that we, in turn, will be shaped by the output of these models. Google already uses its AI autocomplete tools to suggest gender-neutral language (replacing “chairman” with “chair,” for example), and regardless of your opinion on the politics of this sort of nudge, it’s worth discussing what the end-point of these systems might be.
In other words: what happens when AI systems trained on our writing start training us?
Despite the problems and limitations of large language models, they’re already being embraced for many tasks. Google is making language models central to its various search products; Microsoft is using them to build automated coding software; and the popularity of apps like Xiaoice and AI Dungeon suggests that the free-flowing nature of AI writing programs is no hindrance to their adoption.
Like many other AI systems, large language models have serious limitations when compared with their hype-filled presentations. And some predict this widespread gap between promise and performance means we’re heading into another period of AI disillusionment. As the roboticist Rodney Brooks put it: “just about every successful deployment [of AI] has either one of two expedients: It has a person somewhere in the loop, or the cost of failure, should the system blunder, is very low.” But AI writing tools can, to an extent, avoid these problems: if they make a mistake, no one gets hurt, and their collaborative nature means human curation is often baked in.
What’s interesting is considering how the particular characteristics of these tools can be used to our advantage, showing how we might interact with machine learning systems, not in a purely functional fashion but as something exploratory and collaborative. Perhaps the most interesting single use of large language models to date is a book named Pharmako-AI: a text written by artist and coder K Allado-McDowell as an extended dialogue with GPT-3.
To create Pharmako-AI, Allado-McDowell wrote and GPT-3 responded. “I would write into a text field, I would write a prompt, sometimes that would be several paragraphs, sometimes it would be very short, and then I would generate some text from the prompt,” Allado-McDowell told The Verge. “I would edit the output as it was coming out, and if I wasn’t interested in what it was saying, I would cut that part and regenerate, so I compared it to pruning a plant.”
The resulting text is esoteric and obscure, discussing everything from the roots of language itself to the concept of “hyper-dimensionality.” It is also brilliant and illuminating, showing how writing alongside machines can shape thought and expression. At different points, Allado-McDowell compares the experience of writing using GPT-3 to taking mushrooms and communing with gods. They write: “A deity that rules communication is an incorporeal linguistic power. A modern conception of such might read: a force of language from outside of materiality.” That force, Allado-McDowell suggests, might well be a useful way to think about artificial intelligence. The result of communing with it is a sort of “emergence,” they told me, an experience of “being part of a larger ecosystem than just the individual human or the machine.”
This, I think, is why AI writing is so much more exciting than many other applications of artificial intelligence: because it offers the chance for communication and collaboration. The urge to speak to something greater than ourselves is evident in how these programs are being embraced by early adopters. A number of individuals have used GPT-3 to talk to dead loved ones, for example, turning its statistical intelligence into an algorithmic ouija board. Such experiments also reveal the limitations, though. In one of these cases, OpenAI shut down a chatbot shaped to resemble a developer’s dead fiancée because the program didn’t conform to the company’s terms of service. That’s another, less promising reality of these systems: the vast majority are owned and operated by corporations with their own interests, and they will shape their programs (and, in turn, their users) as they see fit.
Despite this, I’m hopeful, or at least curious, about the future of AI writing. It will be a conversation with our machines; one that is diffuse and subtle, taking place across multiple platforms, where AI programs linger on the fringes of language. These programs will be unseen editors to news stories and blog posts, they will suggest comments in emails and documents, and they will be interlocutors that we even talk to directly. This exchange won’t only be good for us, and the deployment of these systems will surely bring problems and challenges. But it will, at least, be a dialogue.
The future of search is a conversation — at least, according to Google.
It’s a pitch the company has been making for years, and it was the centerpiece of last week’s I/O developer conference. There, the company demoed two “groundbreaking” AI systems — LaMDA and MUM — that it hopes, one day, to integrate into all its products. To show off its potential, Google had LaMDA speak as the dwarf planet Pluto, answering questions about the celestial body’s environment and its flyby from the New Horizons probe.
As this tech is adopted, users will be able to “talk to Google”: using natural language to retrieve information from the web or their personal archives of messages, calendar appointments, photos, and more.
This is more than just marketing for Google. The company has evidently been contemplating what would be a major shift to its core product for years. A recent research paper from a quartet of Google engineers titled “Rethinking Search” asks exactly this: is it time to replace “classical” search engines, which provide information by ranking webpages, with AI language models that deliver these answers directly instead?
There are two questions to ask here. The first: can it be done? After years of slow but definite progress, are computers really ready to understand all the nuances of human speech? And the second: should it be done? What happens to Google if the company leaves classical search behind? Appropriately enough, neither question has a simple answer.
There’s no doubt that Google has been pushing a vision of speech-driven search for a long time now. It debuted Google Voice Search in 2011, then upgraded it to Google Now in 2012; launched Assistant in 2016; and in numerous I/Os since, has foregrounded speech-driven, ambient computing, often with demos of seamless home life orchestrated by Google.
Despite clear advances, I’d argue that the actual utility of this technology falls far short of the demos. Check out the introduction of Google Home in 2016, for example, where Google promises that the device will soon let users “control things beyond the home, like booking a car, ordering dinner, or sending flowers to mom, and much, much more.” Some of these things are now technically feasible, but I don’t think they’re common: speech has not proven to be the flexible and faultless interface of our dreams.
Everyone will have different experiences, of course, but I find that I only use my voice for very limited tasks. I dictate emails on my computer, set timers on my phone, and play music on my smart speaker. None of these constitute a conversation. They are simple commands, and experience has taught me that if I try anything more complicated, words will fail. Sometimes this is due to not being heard correctly (Siri is atrocious on that score), but often it just makes more sense to tap or type my query into a screen.
Watching this year’s I/O demos, I was reminded of the hype surrounding self-driving cars, a technology that has so far failed to deliver on its biggest claims (remember Elon Musk promising that a self-driving car would take a cross-country trip in 2018? It hasn’t happened yet). There are striking parallels between the fields of autonomous driving and speech tech. Both have seen major improvements in recent years thanks to the arrival of new machine learning techniques coupled with abundant data and cheap computation. But both also struggle with the complexity of the real world.
In the case of self-driving cars, we’ve created vehicles that don’t perform reliably outside of controlled settings. In good weather, with clear road markings, and on wide streets, self-driving cars work well. But steer them into the real world, with its missing signs, sleet and snow, and unpredictable drivers, and they are clearly far from fully autonomous.
It’s not hard to see the similarity with speech. The technology can handle simple, direct commands that require the recognition of only a small number of verbs and nouns (think “play music,” “check the weather,” and so on) as well as a few basic follow-ups, but throw these systems into the deep waters of conversation and they flounder. As Google’s CEO Sundar Pichai commented at I/O last week: “Language is endlessly complex. We use it to tell stories, crack jokes, and share ideas. […] The richness and flexibility of language make it one of humanity’s greatest tools and one of computer science’s greatest challenges.”
However, there are reasons to think things are different now (for speech anyway). As Google noted at I/O, it’s had tremendous success with a new machine learning architecture known as Transformers, a model that now underpins the world’s most powerful natural language processing (NLP) systems, including OpenAI’s GPT-3 and Google’s BERT. (If you’re looking for an accessible explanation of the underlying tech and why it’s so good at parsing language, I highly recommend this blog post from Google engineer Dale Markowitz.)
The arrival of Transformers has created a genuinely impressive flowering of AI language capabilities. As has been demonstrated with GPT-3, AI can now generate a seemingly endless variety of text, from poetry to plays, creative fiction to code, and much more, often with surprising ingenuity and verve. These systems also deliver state-of-the-art results in various speech and linguistic tests and, what’s more, they scale incredibly well: pump in more computational power and you get reliable improvements. The supremacy of this paradigm is sometimes known in AI as the “bitter lesson,” and it’s very good news for companies like Google. After all, they’ve got plenty of compute, and that means there’s lots of road ahead to improve these systems.
Google channeled this excitement at I/O. During a demo of LaMDA, which has been trained specifically on conversational dialogue, the AI model pretended first to be Pluto, then a paper airplane, answering questions with imagination, fluency, and (mostly) factual accuracy. “Have you ever had any visitors?” a user asked LaMDA-as-Pluto. The AI responded: “Yes I have had some. The most notable was New Horizons, the spacecraft that visited me.”
A demo of MUM, a multi-modal model that understands not only text but also image and video, had a similar focus on conversation. When the model was asked: “I’ve hiked Mt. Adams and now want to hike Mt. Fuji next fall, what should I do differently to prepare?” it was smart enough to know that the questioner is not only looking to compare mountains, but that “preparation” means finding weather-appropriate gear and relevant terrain training. If this sort of subtlety can transfer into a commercial product — and that’s obviously a huge, skyscraper-sized if — then it would be a genuine step forward for speech computing.
That, though, brings us to the next big question: even if Google can turn speech into a conversation, should it? I won’t pretend to have a definitive answer to this, but it’s not hard to see big problems ahead if Google goes down this route.
First are the technical problems. The biggest is that it’s impossible for Google (or any company) to reliably validate the answers produced by the sort of language AI the company is currently demoing. There’s no way of knowing exactly what these models have learned or where any given answer comes from. Their training data usually consists of sizable chunks of the internet and, as you’d expect, this includes both reliable information and garbage misinformation; any response they give could be pulled from anywhere online. This can also lead them to produce output that reflects the sexist, racist, and otherwise biased notions embedded in parts of their training data. And these are criticisms that Google itself has seemingly been unwilling to reckon with.
Similarly, although these systems have broad capabilities, and are able to speak on a wide array of topics, their knowledge is ultimately shallow. As Google’s researchers put it in their paper “Rethinking Search,” these systems learn assertions like “the sky is blue,” but not associations or causal relationships. That means that they can easily produce bad information based on their own misunderstanding of how the world works.
Kevin Lacker, a programmer and former Google search quality engineer, illustrated these sorts of errors in GPT-3 in this informative blog post, noting how you can stump the program with common sense questions like “Which is heavier, a toaster or a pencil?” (GPT-3 says: “A pencil”) and “How many eyes does my foot have?” (A: “Your foot has two eyes”).
To quote Google’s engineers again from “Rethinking Search”: these systems “do not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances by referring to supporting documents in the corpus they were trained over.”
These issues are amplified by the sort of interface Google is envisioning. Although it’s possible to overcome difficulties with things like sourcing (you can train a model to provide citations, for example, noting the source of each fact it gives), Google imagines every answer being delivered ex cathedra, as if spoken by Google itself. This potentially creates a burden of trust that doesn’t exist with current search engines, where it’s up to the user to assess the credibility of each source and the context of the information they’re shown.
The pitfalls of removing this context are obvious when we look at Google’s “featured snippets” and “knowledge panels” — cards that Google shows at the top of the Google.com search results page in response to specific queries. These panels highlight answers as if they’re authoritative, but the problem is they’re often not, an issue that former search engine blogger (and now Google employee) Danny Sullivan dubbed the “one true answer” problem.
These snippets have made headlines when users discover particularly egregious errors. One example from 2017 involved asking Google “Is Obama planning martial law?” and receiving the answer (cited from a conspiracy news site) that, yes, of course he is (if he was, it didn’t happen).
In the demos Google showed at I/O this year of LaMDA and MUM, it seems the company is still leaning toward this “one true answer” format. You ask and the machine answers. In the MUM demo, Google noted that users will also be “given pointers to go deeper on topics,” but it’s clear that the interface the company dreams of is a direct back and forth with Google itself.
This will work for some queries, certainly: simple demands that are the search equivalent of asking Siri to set a timer on my phone (e.g. asking when Madonna was born, or who sang “Lucky Star”). But for complex problems, like those Google demoed at I/O with MUM, I think these systems will fall short. Tasks like planning holidays, researching medical problems, shopping for big-ticket items, looking for DIY advice, or digging into a favorite hobby all require personal judgment rather than a computed summary.
The question, then, is whether Google will be able to resist the lure of offering one true answer. Tech watchers have noted for a while that the company’s search products have become more Google-centric over time. The company increasingly buries results under ads that are both external (pointing to third-party companies) and internal (directing users to Google services). I think the “talk to Google” paradigm fits this trend. The underlying motivation is the same: it’s about removing intermediaries and serving users directly, presumably because Google believes it’s best positioned to do so.
In a way, this is the fulfillment of Google’s corporate mission “to organize the world’s information and make it universally accessible and useful.” But this approach could also undermine what makes the company’s product such a success in the first place. Google isn’t useful because it tells you what you need to know; it’s useful because it helps you find this information for yourself. Google is the index, not the encyclopedia, and it shouldn’t sacrifice search for results.
Boris is the wise ol’ CEO of TNW who writes a weekly column on everything about being an entrepreneur in tech — from managing stress to embracing awkwardness. You can get his musings straight to your inbox by signing up for his newsletter!
I recently learned a technique for enabling connections and conversations during dinner, and it’s called… the conversation piece.
Apparently, in previous centuries, it was common practice for the wealthier families of Amsterdam (where I live) to place an object in the middle of the table, so your guests could discuss it. A literal conversation piece.
They’d host prestigious dinner parties where the pieces were crafted by commissioned artists. Some were abstract, some more classical and poetic, and some were downright pornographic. The more audacious and controversial the piece, the more there was to talk about, so conversations never stalled — especially once the alcohol started flowing.
I greatly enjoy hosting dinner parties, as they revolve around good conversations and exchanging ideas, so I was really inspired. And not just for entertaining: I could see how the benefits can reach far beyond dinner parties.
I want to have great discussions in my management team, good exchange of ideas with other teams, and engaging one-on-one conversations with my employees.
So here are my three tips to improve your communication, based on the age-old conversation piece.
Tip 1: Choose your piece
Let me be clear: don’t bring an erotic centerpiece to your next team meeting. But what you can do is think about the general concept of having a conversation piece to enable great conversations.
I love the story of Palm founder Jeff Hawkins when the company was first starting out. He chose what sparked conversations by actually fashioning a PalmPilot prototype out of wood, drawing a screen and buttons on it with a pencil — and then he carried it around the office and pretended it worked.
He’d come into meetings, place a block of wood on the table, and act as if it was a fully functioning machine. Now that’s a conversation starter.
But a conversation piece doesn’t have to be a physical object or a thing. At our conferences — next one in 84 days, see you there in person? — I like to use our speakers as the conversation pieces.
No matter who you run into or what field they’re in, it’s so easy to encourage a discussion with a simple “have you seen this speaker?” And the beautiful thing is, it doesn’t even really matter whether the answer is yes or no because you now have something to kickstart the conversation.
Tip 2: Shut up
As great as it is to come prepared for conversations, there’s one undeniable fact we have to acknowledge: you talk too much.
If you want to have a good conversation, learn to shut the fuck up. Or as more eloquent people would say: you’ll never learn from just hearing yourself speak.
I know you agree with me when I say the best conversations are when you’re genuinely interested and ask lots of questions. It’s amazing when you have chats like these with friends and family, but they’re even more precious in company meetings.
That’s why I find the most powerful thing you can do in a meeting is to start by saying, “I’m here to listen.” Then hold your tongue and absorb what others have to say — you’ll learn a ton.
So not only can a ‘conversation piece’ be abstract rather than physical, but the intentional absence of one can also start a deeper discussion.
Tip 3: Walkie-talkie
The physical conversation pieces of old also remind me of a conversation technique I picked up a while back. When you want to discuss complex topics, go for a walk together.
Why? Because it works so much better than a rigid chat sitting across from each other.
When you’re walking, you can stare into the distance and speak your mind more freely. When you’re sitting opposite each other, you can’t avoid eye contact, and that can make it harder to think out loud: you feel more scrutinized, more worried about how the other person will react to a half-formed thought.
This is just one reason video chats can be so annoying: the constant and unavoidable eye contact.
A walk can therefore make sure the wrong thing doesn’t dictate which direction the conversation goes — for example, perceived judgment by the person sitting across from you.
So to sum up…
Want to ask someone a tricky question or discuss a complex topic? Don’t schedule lunch but go for a walk, or meet in a museum, or find a sunset or fireplace to stare into. And then shut up and listen.
Don’t try to impress others with your stories or insights, but dive deep into their background and reasoning. And lastly, define your conversation piece — whether it’s physical, abstract, or non-existent — then let the conversation develop on its own from there.
Following months of internal conflict and opposition from Congress and thousands of Google employees, Google today announced that it will reorganize its AI ethics operations and place them in the hands of VP Marian Croak, who will lead a new responsible AI research and engineering center of expertise.
A blog post and a six-minute video interview with Croak that Google released today to announce the news make no mention of former Ethical AI team co-lead Timnit Gebru, whom Google fired abruptly in late 2020, or Ethical AI lead Margaret “Meg” Mitchell, who a Google spokesperson told VentureBeat was placed under internal investigation last month.
The release also makes no mention of steps taken to address a need to “rebuild trust” called for by members of the Ethical AI team at Google. Multiple members of the Ethical AI team said they found out about the change in leadership from a report published late Wednesday evening by Bloomberg.
“Marian is a highly accomplished trailblazing scientist that I had admired and even confided in. It’s incredibly hurtful to see her legitimizing what Jeff Dean and his subordinates have done to me and my team,” Gebru told VentureBeat.
“Meg Mitchell is still suspended from her corporate account. The last email that the Ethical AI team got from research leadership was over two weeks ago,” she continued. “We’re in the lurch and left out to dry. This should tell you a lot about what Google thinks about ethics research.”
In the video, Croak discusses self-driving cars and techniques for diagnosing diseases as potential areas of focus in the future, but makes no mention of large language models. A recent piece of AI research citing a broad spectrum of experts concluded that companies like Google and OpenAI have only a matter of months to set standards for addressing the negative societal impact of large language models.
In December, Gebru was fired after she sent an email to colleagues advising them to no longer participate in diversity data collection efforts. A paper she was working on at the time criticized large language models, like the kind Google is known for producing, for harming marginalized communities and for tricking people into believing that models trained on massive corpora of text data represent genuine progress in language understanding.
In the weeks following her firing, members of the Ethical AI team also called for the reinstatement of Gebru in her previous role. More than 2,000 Googlers and thousands of other supporters signed a letter in support of Gebru and in opposition to what the letter calls “unprecedented research censorship.” Members of Congress who have proposed legislation to regulate algorithms also raised a number of questions about the Gebru episode in a letter to Google CEO Sundar Pichai. Earlier this month, news emerged that two software engineers resigned in protest over Google’s treatment of Black women like Gebru and former recruiter April Curley.
In today’s video and blog post about the change at Google, Croak said that people need to understand that the fields of responsible AI and ethics are new, and called for a more conciliatory tone of conversation about the ways AI can harm people. Google created its AI ethics principles in 2018, shortly after thousands of employees opposed the company’s participation in the U.S. military’s Project Maven.
“So there’s a lot of dissension, there’s a lot of conflict in terms of trying to standardize a normative definition of these principles and whose definition of fairness and safety are we going to use, and so there’s quite a lot of conflict right now in the field, and it can be polarizing at times, and what I’d like to do is just have people have a conversation in a more diplomatic way perhaps so we can truly advance this field,” Croak said.
Croak said the new center will work internally to assess AI systems that are being deployed or designed, then “partner with our colleagues and PAs [product areas] and mitigate potential harms.”
The Gebru episode at Google led some AI researchers to pledge that they wouldn’t review papers from Google Research until change was made. Shortly after Google fired Gebru, Reuters reported that the company asked its researchers to strike a positive tone when addressing issues referred to as sensitive topics.
Croak’s appointment marks the latest controversial development at the top of the AI ethics ranks at Google Research and DeepMind, which Google acquired in 2014. Last month, a Wall Street Journal report found that DeepMind cofounder Mustafa Suleyman was stripped of management duties over allegations that he bullied coworkers, before he left the company in 2019. Suleyman also served as a head of ethics at DeepMind, where he worked on issues like climate change and health care. Months later, Google hired him in an advisory role on matters of policy and regulation.
How Google conducts itself when it comes to using AI responsibly and defending against forms of algorithmic oppression is immensely important because AI adoption is growing in business and society, but also because Google is a world leader in producing published AI research. A study published last fall found that Big Tech companies treat AI ethics funding in a way that’s analogous to the way Big Tobacco companies funded health research decades ago.
VentureBeat has reached out to Google to inquire about steps to reform internal practices, issues raised by Google employees, and a number of other questions. This story will be updated if we hear back.