The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future

Orly Lobel

368 pages, PublicAffairs, 2022

Speech is uniquely intimate and human, yet digital personal assistants, chatbots, language translation, text-to-speech, and speech-to-text are increasingly integrated into every aspect of our lives. These technologies are often developed in a skewed way that fails to represent the varied speech patterns, language usage, and accents of most people. In this excerpt from The Equality Machine, I describe a path forward in creating more inclusive digital speech.

Social activism, consumer choice, crowdsourcing, and open-sourced projects are crucial for creating a digital platform representative of its users. Crowdsourcing voice samples from all over the world can correct the reality that most voice recognition technologies are developed exclusively for English speakers while languages spoken by smaller or poorer populations are often left behind. Making speech technology that “sounds like me”—no matter where I live—requires public grants and collective efforts to gather voice data from many sources.   

Similarly, next-generation digital language translators can correct how AI translation too frequently replicates gendered biases found in human-crafted communication. Translation platforms tend to render gendered words in their masculine forms because the translator learns from massive amounts of historical (and some current) digital texts. A sentence describing a politician or a business executive will default to the masculine version in the translation. But deliberate choices by developers can direct translation software to provide both masculine and feminine translations or to glean a word's correct gender from context.

Relatedly, as consumers and developers, we should question the automatic assignment of feminine voices to digital assistants such as Siri and Alexa. As we will see below, the trajectory is promising: More companies are letting customers pick feminine or masculine assistant voices.

If that binary choice already seems dated, social innovation and activism are moving us a step further by developing genderless digital voices. Together, these efforts point to a larger vision. Despite initial limitations and biases in voice technology, it is possible to create an equality machine through conscientious and deliberate social efforts.—Orly Lobel

* * *

Alexa, Siri, and other voice-activated chatbots not only speak to us but listen too. As it turns out, however, they do not always listen to everyone equally. Speech recognition exemplifies how partial training data has led machines to learn more about white men's speech patterns and less about those of women and people of color. Case in point: Google's speech recognition is 13 percent more accurate for men than it is for women. Testing a variety of speech activation technologies has shown that virtual assistants are more likely to understand male users than female users. If the user is a woman of color, the rate of accurately understanding her speech drops further. In one study testing speech recognition of different accents, English spoken with an Indian accent had an accuracy rate of only 78 percent; recognition of English spoken with a Scottish accent was only 53 percent accurate. One telling story is that of an Irish woman who failed an automated spoken English proficiency test while trying to immigrate to Australia. The company that administered her test used a voice recognition technology trained to identify acceptable and unacceptable answers to questions; although she was a highly educated native English speaker, the algorithm deemed her answers unacceptable.
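
Disparities like these are typically measured as word error rate (WER), computed separately for each demographic group. Here is a minimal sketch of such an audit using the jiwer library; the sample transcripts are hypothetical stand-ins for a real evaluation set.

```python
# Compute word error rate per demographic group to surface accuracy gaps.
from jiwer import wer

# Hypothetical evaluation data: (speaker group, reference, ASR output).
samples = [
    ("men",   "turn on the kitchen lights",  "turn on the kitchen lights"),
    ("women", "turn on the kitchen lights",  "turn on the kitten lights"),
    ("women", "set a timer for ten minutes", "set a time for ten minutes"),
    ("men",   "set a timer for ten minutes", "set a timer for ten minutes"),
]

groups = {}
for group, reference, hypothesis in samples:
    refs, hyps = groups.setdefault(group, ([], []))
    refs.append(reference)
    hyps.append(hypothesis)

for group, (refs, hyps) in groups.items():
    print(f"{group}: WER = {wer(refs, hyps):.2f}")
```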

On the other hand, the specificity required by voice recognition can be helpful to those trying to improve the clarity of their speech. For example, Judith Newman's son Gus "speaks as if he has marbles in his mouth, but if he wants to get the right response from Siri, he must enunciate clearly." For Newman, as a mother of a child with developmental challenges, the fact that Siri requires precise articulation has been a benefit, not a bug. Still, the better understanding of English-speaking males is undoubtedly something of a "Big Five" effect: most voice recognition platforms are made by the so-called Big Five (Amazon, Apple, Facebook, Google, and Microsoft), which themselves are disproportionately staffed and led by white men. This kind of deficiency in speech recognition is relatively easy to remedy. The fix involves increasing the range and diversity of the data that we feed the technology. A more diverse range of voices in the video and sound fed to algorithms will improve those algorithms' ability to interpret a broader range of speech patterns. Diversity in, diversity out.
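
One concrete form "diversity in" can take is rebalancing a training set so that every accent group carries equal weight. The sketch below is a toy illustration of that idea; the field names and clip records are hypothetical.

```python
# Oversample under-represented accent groups to the size of the largest group.
import random
from collections import defaultdict

def rebalance(clips, key="accent", seed=0):
    """Resample with replacement so every group under `key` is equally sized."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for clip in clips:
        groups[clip[key]].append(clip)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# Example: one Scottish-accented clip is oversampled to match four US clips.
clips = [{"accent": "us"}] * 4 + [{"accent": "scottish"}]
print(len(rebalance(clips)))  # 8 clips, four per accent
```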

In 2020, the BBC launched a voice assistant called Beeb that is trained to understand a much wider set of accents than the Big Five–created AI. Even earlier, Mozilla began a project to accelerate the collection of language data for artificial intelligence purposes from all over the world, with a focus on including more accents and languages and increasing accuracy, regardless of gender or age. Mozilla created the Common Voice data set as part of this effort, which by 2021 had recorded over 9,000 hours of voice data in sixty languages. Much like Wikipedia, the project is crowdsourced and open-source. People are free to use the program, and contributors around the world can add their voices, enabling the open-source data set to grow through collective effort. I contributed my voice, reading out five sentences prompted on the site, the first one being, "Shakhter Karagandy will also play in the Kazakhstan Cup and the Europa Conference League." The data set is in turn freely available to anyone developing voice-enabled technology. Voice contributors are also invited to give the system information about their gender, age, and accent to help the machine learn about the speech prevalent in different countries and regions. People from all over the world have contributed samples of their speech. It is easy to do, and you should consider it too. The languages represented range from Kabyle to Kinyarwanda, Votic to Esperanto.
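
For developers, the data set really is a download away. One way to sample it is through the Hugging Face datasets hub; the snapshot name below is one published version, and access may require authenticating with the hub and accepting Mozilla's terms first.

```python
# Stream a few Kinyarwanda Common Voice clips without downloading the archive.
from datasets import load_dataset

cv = load_dataset(
    "mozilla-foundation/common_voice_11_0", "rw",
    split="train", streaming=True,
)

for clip in cv.take(3):
    # Each record carries audio plus self-reported gender, age, and accent.
    print(clip["sentence"], clip["gender"], clip["age"])
```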

In 2019, in partnership with the German Ministry for Economic Cooperation and Development, Mozilla increased its efforts to collect local language data in Africa through an initiative called Common Voice and Deep Speech. The data set is already being used in voice assistant technologies such as Mycroft, an open-source voice assistant named after Sherlock Holmes's elder brother, and the Brazilian Portuguese medical transcription tool Iara Health. Kelly Davis, head of machine learning at Mozilla, describes the profound significance of focusing on under-resourced languages and language preservation in correcting the imbalance of languages in mainstream speech recognition technology. He says that we should look at speech recognition as a public resource. This theme of conceptualizing advances in technology, vastly aided through data collection, as a public good must become a recurring one as we strive to build equality machines. Voice and speech—like many other types of information that are making our machines smarter—are intimately tied to our autonomous selves, from our genetic makeup and our health information to our behavioral and emotional responses to different decision-making environments. Crowdsourcing and open-source projects are important avenues for building a fuller, more representative picture of our humanity. The lens of open data is critical not only when we build our machines but also later, when we benefit from them: it helps ensure access to the information extracted from us and lets us demand that the value of the more complete, more advanced systems that have gobbled up our information is shared.

In 2020, 4.2 billion digital assistants were in use around the world, and that number is predicted to double by 2024. The value of the voice AI industry is estimated to grow to $80 billion by 2023. Some surveys already show that nearly half of all general web searches are now done using voice. Crowdsourced projects and open-source products may be the single best way to achieve the level of diversity and inclusion that society needs and deserves.

The Feminist Translator

Machine translation is an extraordinary engine for development. It has also been a powerful case study in gendered language and in how we can improve as a society. In a global market, trade is enabled by communication and trust. Language barriers have burdened developing countries striving to compete in global markets. Machine translators are now easily and freely available on the web, facilitating untold numbers of exchanges of knowledge, information, ideas, goods, and services. Nevertheless, machine translators have defaulted to a masculine gender for years. Initially, Google Translate automatically presented translations with more male pronouns. The self-taught algorithm learned this by browsing the web, where male pronouns are twice as prevalent as female ones. The algorithm thereby magnified biases through feedback loops: each translation defaulting to a masculine pronoun in turn increases the male pronoun's comparative frequency on the web. The bias amplification is most pronounced when an original language that is more gender-neutral (English, for example) is translated into languages that are more gendered (Spanish or Hebrew or French, for example).
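
To see how such a feedback loop compounds, consider a toy simulation: a translator that always picks the majority pronoun, whose output is then published back into the corpus it learns from. The starting 2:1 ratio mirrors the web statistic above; the other numbers are purely illustrative.

```python
# Toy simulation of bias amplification through a translation feedback loop.
male, female = 2_000_000, 1_000_000  # male pronouns twice as prevalent

for generation in range(5):
    new_translations = 500_000
    # Default every ambiguous sentence to the majority pronoun...
    if male >= female:
        male += new_translations
    else:
        female += new_translations
    # ...and the published translations rejoin the corpus, skewing it further.
    print(f"generation {generation}: male share = {male / (male + female):.1%}")
```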

This problem is solvable. Google Translate and other translation technologies—again, as with all AI—learn from the training data they are fed, and that training data amounts to the hundreds of millions of already translated texts that exist on the web. Until now, translation algorithms have been programmed to translate to the most likely form, studying hundreds of years of publishing. Historically, men have been vastly more represented both as publishers and as the subjects of published works. So it makes perfect sense that machine translation has developed a male bias: the algorithms have learned from the data available to them. The quality of the output depends on the quality of the input, but when the input is biased, there are other ways to reach more equal outcomes. Instead of defaulting to the most pervasive (male) pronouns, machine translators need to be programmed—taught—to identify more social cues and context. They could also default at equal rates to male and female when no context is provided. Yet another way to reverse this ongoing bias in our texts is to program the algorithm to produce the less numerous pronoun (female)—that is, to intentionally adopt something like what we call in legal theory a penalty default rule, where the less popular option is chosen to achieve certain policy goals.
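
A sketch in Python of the three strategies just described (context first, equal-rate defaults, and a penalty default); the function and strategy names are my own hypothetical labels, not any translator's actual API.

```python
# Three ways to choose a pronoun when the source language carries no gender.
import random

def pick_pronoun(context_gender=None, strategy="both"):
    """Choose a pronoun for a sentence with no gender cue in the source."""
    if context_gender is not None:   # a cue was found in the surrounding text
        return context_gender
    if strategy == "both":           # present masculine and feminine side by side
        return ("he", "she")
    if strategy == "coin_flip":      # default at equal rates
        return random.choice(["he", "she"])
    if strategy == "penalty":        # penalty default: always the less common form
        return "she"
    raise ValueError(f"unknown strategy: {strategy}")
```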

We have been undergoing an inclusive language revolution over the past decade. Pronoun use is becoming more inclusive, with "he" increasingly rephrased as "she" and "he/she" as "they." Gendered speech can almost always be rewritten. Algorithms can also be taught to examine common names to identify gender. My name, for example, is unfamiliar to most Americans. I am often addressed as Mr. Orly Lobel in reply emails. When my research is quoted around the world, I am often assumed to be male. But an algorithm can quite easily sort through existing databases of common names to discover that Orly is a common Hebrew female name meaning "my light." When a machine translator is tasked with weighing the full context of a text, its accuracy in identifying gender will increase.
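
A minimal sketch of that kind of name lookup, with a toy table standing in for the large name databases such systems actually draw on:

```python
# Toy stand-in for a database of common names and their typical gender.
NAME_GENDER = {
    "orly": "female",   # a common Hebrew female name meaning "my light"
    "john": "male",
    "maria": "female",
}

def infer_gender(name):
    """Return the gender most associated with a given first name."""
    return NAME_GENDER.get(name.lower(), "unknown")

print(infer_gender("Orly"))  # -> female
```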

Google Translate has already made some strides in this direction. In 2018, a product manager on the Google Translate team published an article explaining this new focus: “There’s been an effort across Google to promote fairness and reduce bias in machine learning. Our latest development in this effort addresses gender bias by providing feminine and masculine translations for some gender-neutral words on the Google Translate website.” Initially, when a gender-neutral word could be translated in either a masculine or feminine form, only one translation was provided—often a biased one. Words like “strong” or “doctor” would lead to masculine translations, while words like “nurse” or “beautiful” would produce feminine translations. With the changes introduced, Google Translate now gives both feminine and masculine translations for a single word.

There's more to be done. Google plans to extend these gender-specific translations to more languages and to tackle bias in features like auto-complete. The company is also pondering how to address non-binary gender in translations in the future. In 2021, I tested Google Translate's English-to-Hebrew translations of the following terms: "doctor," "nurse," "caretaker," "foreign worker," "president," "CEO," "teacher," "police officer," "nursery teacher," and "student." Take a guess how many of the ten occupations I fed into the algorithm came out female on the other end. The answer is three out of ten: "nurse," "caretaker," and "nursery teacher." The rest were translated as male.
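
The probe itself is easy to reproduce as a small audit script. In this sketch, translate() and is_masculine() are hypothetical stand-ins for whichever translation client and Hebrew morphology check you use.

```python
# Audit how many occupation terms come out masculine in a target language.
OCCUPATIONS = [
    "doctor", "nurse", "caretaker", "foreign worker", "president",
    "CEO", "teacher", "police officer", "nursery teacher", "student",
]

def audit(translate, is_masculine):
    """Tally masculine versus feminine translations across the term list."""
    tally = {"male": 0, "female": 0}
    for term in OCCUPATIONS:
        hebrew = translate(term, source="en", target="he")
        tally["male" if is_masculine(hebrew) else "female"] += 1
    return tally  # the 2021 run described above came out 7 male, 3 female
```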

In trying to reduce gender bias in machine translation, Google's engineers discovered that many languages default to the masculine, and that oftentimes there simply is no feminine version of a word. Google now collaborates with a Belgian company, ElaN Languages, which is actively working to overcome this problem. ElaN partners with big-name companies such as Bosch, Coca-Cola, and Randstad to offer translation services through its MyTranslation platform (along with some 1,800 freelance human translators). The platform offers an "unbias button" plug-in that analyzes translated texts, highlights gendered language, and suggests gender-neutral alternatives. For example, "midwife" might become "birth assistant," "fireman" might become "firefighter," and so on. When I used ElaN's free online translator, however, and typed in "physician," it gave me only the male version, médico, in Spanish. As we move forward with equality machine translation, we must make the unbiased setting the default, not the add-on.
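
A toy version of such an "unbias" pass is easy to sketch. The substitution table below echoes ElaN's published examples, but the code is my illustration, not ElaN's implementation.

```python
# Flag gendered job titles and substitute gender-neutral alternatives.
import re

NEUTRAL = {
    "midwife": "birth assistant",
    "fireman": "firefighter",
    "policeman": "police officer",
    "chairman": "chairperson",
}

def unbias(text):
    """Replace gendered terms with their neutral counterparts."""
    for gendered, neutral in NEUTRAL.items():
        text = re.sub(rf"\b{gendered}\b", neutral, text, flags=re.IGNORECASE)
    return text

print(unbias("The fireman called the midwife."))
# -> The firefighter called the birth assistant.
```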

Changing the Tune

How can technology help us move away from antiquated notions of a woman's place in society? In 2018, Google introduced a menu of new voices for its Google Home assistants consisting of both male and female voices. One of the artificial voices was that of the famous singer John Legend. (His wife, Chrissy Teigen, tweeted at the time, "I don't even need human John anymore," to which Legend flirtatiously tweeted back, "Well. The Google Assistant doesn't do EVERYTHING." In Chapter 9, we'll consider whether Legend is correct about what a robot can and cannot do on the romantic front.) Google has since instituted other measures to move away from the dominant female voice assistant paradigm. In 2019, the company introduced several alternative, more neutral voices for its virtual assistant, programmed using the same WaveNet technology that makes the Google Assistant's default female voice sound so natural. Users now have thirteen different English voices to choose from, including English spoken with a British or Indian accent, as well as new voices in seven other languages that previously had only female voices: Dutch, French, German, Italian, Japanese, Korean, and Norwegian. In another move away from gendered representations, the voices are now labeled by color rather than by male and female names. Google stated that it recognizes that people enjoy choosing among voices to find the one that sounds right to them. And, as part of the continuing effort to encourage the use of voices beyond the traditional female voice, Google Assistant's new default voices will be randomly assigned.

New technology may take us even further beyond the binary in voice assistants. An exciting frontier is the rejection of binary assignments in favor of something more imaginative. Q was the first gender-neutral voice developed for voice assistants. Its pitch ranges between 145 and 175 Hz, a level that researchers have found we tend to identify as neither male nor female, since it falls right in the middle of the male and female ranges. Project Q was conceived with the belief that a genderless voice would better reflect today's non-binary world. By examining a wide range of voices, both male and female, the team's sound engineer created a genderless voice. Q was created in a collaboration among non-profit organizations seeking equality and representation, including Copenhagen Pride, Denmark's leading LGBTQ+ organization. Other chatbots have been designed as genderless as well; KAI, a banking bot designed by a woman programmer, replies when asked about its gender, "As a bot, I'm not human." The EU has been leading projects that sample recordings of men and women in equal numbers to create synthetic voices with a range of qualities and accents. The EU's project REBUILD uses virtual assistants that are personalized for immigrants according to their cultural and linguistic background, with the goal of helping them integrate into their new communities. I predict that the technology will someday move to mimic the exact voice of each individual. We'll think more about a mini-me robot that walks the earth with us from cradle to grave in Chapter 10. In the meantime, we need choices. We need challenges. We need subversion. We need creativity. Other design features that would challenge the stereotypical female assignment to voice assistants could include using the pronouns "we, us, ours" rather than "I, me, my."
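
As a rough sketch of how one might check whether a synthetic voice sits in that perceptually gender-neutral 145–175 Hz band, here is a pitch estimate using librosa's pyin tracker; the audio file name is a placeholder.

```python
# Estimate a voice's median fundamental frequency and check the neutral band.
import numpy as np
import librosa

# Load a recording of the voice (the file name here is a placeholder).
y, sr = librosa.load("assistant_voice.wav")

# Track the fundamental frequency frame by frame with the pyin estimator.
f0, voiced_flag, _ = librosa.pyin(y, fmin=80, fmax=300, sr=sr)
median_f0 = np.nanmedian(f0[voiced_flag])

print(f"median pitch: {median_f0:.0f} Hz")
print("in the perceptually gender-neutral band:", 145 <= median_f0 <= 175)
```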

Naming, voice, and physical design are the human characteristics that we assign to machines, and each of these, alone or all together, can convey gender. Even the smallest signal of human-like behavior or personality makes us willing to engage in the illusion that we are connecting with a human-like entity rather than a mere machine. This illusion can even lead us to explain the machine’s reactions and responses with reasons that would only make sense if it were actually human.