Your ‘Ethnicity Estimate’ Doesn’t Mean What You Think It Does

DNA testing companies are rolling out algorithm updates, spotlighting the fickleness of ethnicity results, and perhaps reinforcing some troubling beliefs.

When the genealogy company Ancestry released the latest update to its “ethnicity estimates” last month, a lot of people suddenly became more Scottish. Since 2012, more than 18 million people have mailed their spit-filled vials to the company, which analyzes the genetic material for markers that indicate regional heritage and sends back reports on the geographic areas from which each customer’s ancestors hail, broken down into percentages. The revision to these estimates elicited elation, panic—and no small amount of confusion.

“Greater understanding of why I like whiskey,” tweeted one customer.

“Yes!” crowed another. “There’s nary an animal alive that can outrun a greased Scotsman,” the tweet continued, punctuated by a meme of a shirtless Groundskeeper Willie, the cartoon embodiment of Scottish stereotypes from The Simpsons.

An American woman blamed her Scottish ancestry for her loud mouth, while a man seemed more distressed by the news: “Help before I go out and buy a kilt!,” he tweeted @Ancestry.

Jokes, jokes, jokes. And yet, they reveal some popular misunderstandings about what these tests really mean. Of course, nobody actually became more Scottish. Ancestry’s scientists simply refined the company’s algorithm, expanded its reference pool of DNA samples, and upped the number of so-called “ethnicities,” or regions, represented in the company’s database from 61 to 70. A week later, FamilyTreeDNA followed suit. MyHeritage executives have also intimated that an update is nigh. But the social media reactions suggest an erroneous belief, one that these companies’ marketing sometimes helps advance: that race and ethnicity are genetically determined, and that they tell you something about your essential nature.

It’s a sticky notion with a grim history, deployed to justify everything from slavery to contemporary discrimination. And it’s been making a comeback, resurrected by the far right and white supremacists who rally around the idea of “pure blood,” by which they mean all-white European ancestry. At a rally just last week, the president invoked the “racehorse theory,” a Nazi-reminiscent slice of pseudoscience that posits a genetically superior class of humans, analogous to thoroughbred horses. While consumer DNA companies are far from pushing eugenics, they nevertheless traffic in stories about identity, and these stories can involve some decidedly unscientific assumptions.

In January, researchers at the University of Pennsylvania and the University of British Columbia published a study showing how taking a DNA test could influence a person’s belief in race essentialism, or the idea that racial differences are innate. First, they gave all their subjects a quiz to determine how much they understood about genetics. Then they asked everyone to test their DNA. Their results showed that, for non-Hispanic white-identifying people who were highly knowledgeable about genetics, their post-spit-test scores on an assessment of essentialist beliefs went down nearly 10 percent. Good, right? Except that among testers who claimed no knowledge of genetics, race essentialism increased nearly 12 percent.

“Educational materials or online genetics modules for future test-takers could help prevent these tests from advancing historically destructive views,” the researchers wrote, noting that while genealogical geeks are often highly motivated to make sense of their results, many others are not. For example: Your aunt who just wanted some medical information and wound up with bonus ancestry data. “Essentialist views of race have significant negative consequences for intergroup behavior, including less willingness to interact with other races, greater endorsement of racial stereotypes, and association with traditional and modern racism,” the researchers continued. “These beliefs have historically led to eugenicist movements, ethnic cleansing, apartheid, and genocide.”

So before you run out to go kilt shopping, it’s important to understand what these tests actually mean, as well as their limitations. First up: the term “ethnicity.” It generally connotes shared nationality, language, and culture, but even people who study race and ethnicity acknowledge disagreement about the precise definition of the term. Suffice to say, geneticists concur that it’s a social construct. Altogether, 94 percent of genetic variation occurs within so-called racial groups, meaning there’s much more biological difference within races than between them. This holds even when researchers study regional ethnic groups; they find a lot more genetic diversity inside each group than between different groups.

“I don't like to use the words ethnicity or race in science, because these things are not really determined by science,” says Janina Jeff, a population geneticist who hosts the podcast In Those Genes, where she uncovers lost African American identities through genetics. “Particularly when we talk about the African genome, if we use the word ethnicity we are completely erasing hundreds and sometimes thousands of cultures.” Throughout history, Africa has been home to thousands of tribes with distinct cultures, religions, and languages that aren’t captured in genetics companies’ larger ethnic regions and don’t map onto genetic categories.

“When I think of ethnicity,” Jeff continues, “I think of a cultural identification or expression, which may or may not align with your genetic ancestry.”

For what Ancestry, FamilyTreeDNA, and MyHeritage label “ethnicity,” Jeff instead uses a term preferred by geneticists: “most recent common ancestor,” or the ancestor from whom everyone in a particular set of organisms descends. Since our number of ancestors doubles with every generation as we move back up our family tree, the further back we go, the more our branches start to intersect. Eventually, all humans will reach a theoretical common ancestor (known as “mitochondrial Eve” on the matrilineal side). Two people with ancestry from the same geographical area will usually share a common ancestor more recently than two people with heritage from different parts of the globe.
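The doubling described above is easy to check with a few lines of arithmetic. The sketch below is only an illustration (the function name is made up for this example): it counts the slots in a family tree at each generation, and the numbers grow so fast that the same individuals must eventually fill many slots, which is why the branches intersect.

```python
def ancestor_slots(generations_back: int) -> int:
    """Number of positions in a family tree at a given generation.

    One generation back is your 2 parents, two generations back is your
    4 grandparents, and so on. These are slots, not distinct people.
    """
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 2 ** generations_back


for generations in (1, 2, 10, 20, 30):
    print(f"{generations:>2} generations back: {ancestor_slots(generations):,} slots")
```

Thirty generations back, the tree has more than a billion slots, so distant ancestors necessarily appear in it many times over.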

So how does Ancestry determine those so-called ethnicity estimates? Since the company doesn't possess the DNA of our long-deceased relatives, it uses the next best thing: living proxies. These are called “reference panel groups,” and they are composed mostly of Ancestry customers with long family histories in a single region.

People with shared ancestry typically have some genetic markers in common, short DNA sequences at a particular location on the chromosome. Known as “ancestry informative markers,” they show up as “single nucleotide polymorphisms,” or SNPs. All that means is that there’s a genetic variation at a certain location on your genome—for example, a cytosine base instead of thymine at position 42. Human genomes are around 99.9 percent identical, so Ancestry zeroes in on 700,000 of the spots where they vary.

People tend to inherit groups of SNPs together, called a haplotype. When Ancestry analyzes your DNA, they’re dividing it up into smaller chunks and assigning each chunk an “ethnicity” by comparing the haplotype to those of people in the company’s reference panel groups. Their recent update includes 70 regions—up from 61—each represented by a reference panel.
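To make the chunk-by-chunk idea concrete, here is a toy sketch. It is not Ancestry's algorithm: the sequences, the region names, the window size, and the rule of picking the panel with the fewest mismatches are all assumptions chosen for illustration. It splits a sample into windows, labels each window with the closest reference panel, and turns the labels into percentages.

```python
from collections import Counter

# Hypothetical reference panels: region name -> list of reference sequences.
REFERENCE_PANELS = {
    "region_a": ["AAAAAAAA", "AAAATAAA"],
    "region_b": ["CCCCCCCC", "CCCCGCCC"],
}


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length chunks differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_windows(sample: str, panels: dict, window_size: int) -> list:
    """Label each window of the sample with the closest-matching region."""
    labels = []
    for start in range(0, len(sample), window_size):
        window = sample[start:start + window_size]
        end = start + len(window)
        # Ties go to whichever region is listed first in the dictionary.
        best_region = min(
            panels,
            key=lambda region: min(
                mismatches(window, ref[start:end]) for ref in panels[region]
            ),
        )
        labels.append(best_region)
    return labels


def estimate_percentages(labels: list) -> dict:
    """Convert per-window labels into a percentage breakdown."""
    counts = Counter(labels)
    return {region: 100 * n / len(labels) for region, n in counts.items()}


labels = assign_windows("AAAACCCG", REFERENCE_PANELS, window_size=4)
print(labels)
print(estimate_percentages(labels))
```

In this toy case the first window matches region_a exactly and the second is closest to region_b, so the sample comes out as an even split. Change the reference panels and the same sample gets a different breakdown.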

But making this match is not an exact science. Two people from different regions can still have a genetic marker in common, and not everyone from a given region shares the exact same ones; they simply tend to have a significant number in common. And each country in the world doesn’t have its own specific marker. “There is no Korean SNP or French SNP,” says Barry Starr, Ancestry’s director of scientific communications. “So it really comes down to probability: This particular SNP at this particular spot is a bit more common in France than it is in Korea. It’s the building up of all those small probabilities that gives you the strength to make a prediction.”
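Starr's point about small probabilities adding up can be sketched with a simple calculation. Everything here is a simplifying assumption made for illustration: the allele frequencies are invented, each SNP is treated as independent, the person carries at most one copy of each tracked variant, and both regions start out equally likely. It is not the company's model.

```python
import math


def log_likelihood(carried: list, freqs: list) -> float:
    """Sum the log-probability of each observed SNP under one panel."""
    total = 0.0
    for has_allele, p in zip(carried, freqs):
        total += math.log(p if has_allele else 1.0 - p)
    return total


def region_probabilities(carried: list, panels: dict) -> dict:
    """Turn per-panel log-likelihoods into probabilities (equal priors)."""
    scores = {region: log_likelihood(carried, f) for region, f in panels.items()}
    top = max(scores.values())
    weights = {region: math.exp(s - top) for region, s in scores.items()}
    norm = sum(weights.values())
    return {region: w / norm for region, w in weights.items()}


def make_panels(n_snps: int) -> dict:
    """Hypothetical panels: each variant is a bit more common in France."""
    return {"France": [0.55] * n_snps, "Korea": [0.45] * n_snps}


# One SNP carrying the variant: barely better than a coin flip.
one_snp = region_probabilities([True], make_panels(1))

# 200 SNPs, with the variant present at 110 of them.
carried_200 = [True] * 110 + [False] * 90
many_snps = region_probabilities(carried_200, make_panels(200))

print(f"1 SNP:    France {one_snp['France']:.2f}")
print(f"200 SNPs: France {many_snps['France']:.2f}")
```

A single marker leaves the call at about 55 percent for France, while 200 equally weak markers push it above 95 percent. The strength comes from the accumulation, not from any one SNP.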

A genealogy company’s accuracy is only as good as its reference panels, however. That’s why different companies can give you different results. Ancestry added 4,687 people to its reference pool during its most recent update, upping the total to 44,703. The number of people within each regional group varies widely though, from 23 (the Burusho people, who live in northeastern Pakistan) to 4,791 (indigenous people from Puerto Rico).

Since a given region still contains some degree of genetic variation, it’s possible for a reference group to miss some of that diversity. To use an analogy, if you selected 23 New Yorkers that all happened to live in Little Guyana and made them a reference group for all New Yorkers, you might not get a representative sample of the city. Haplotypes common amongst Guyanese people would probably be overrepresented.
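The Little Guyana analogy can be put into numbers with a small simulation. The populations and haplotype frequencies below are invented purely for illustration, and the fixed random seed is only there to make the run repeatable.

```python
import random

# Hypothetical figures, not real demographic or genetic data.
NEIGHBORHOODS = {
    "Little Guyana": {"population": 30_000, "haplotype_freq": 0.60},
    "Rest of city": {"population": 8_000_000, "haplotype_freq": 0.02},
}


def citywide_frequency(neighborhoods: dict) -> float:
    """Population-weighted share of residents carrying the haplotype."""
    total = sum(n["population"] for n in neighborhoods.values())
    carriers = sum(
        n["population"] * n["haplotype_freq"] for n in neighborhoods.values()
    )
    return carriers / total


def panel_frequency(neighborhood: dict, panel_size: int, rng: random.Random) -> float:
    """Share of carriers in a panel drawn entirely from one neighborhood."""
    freq = neighborhood["haplotype_freq"]
    carriers = sum(rng.random() < freq for _ in range(panel_size))
    return carriers / panel_size


rng = random.Random(0)
biased = panel_frequency(NEIGHBORHOODS["Little Guyana"], 23, rng)
print(f"citywide frequency:        {citywide_frequency(NEIGHBORHOODS):.3f}")
print(f"panel of 23 (one district): {biased:.3f}")
```

Under these made-up numbers, the haplotype is carried by roughly 2 percent of the city, but a 23-person panel drawn from a single neighborhood reports it as far more common.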

Jeff says that anything more granular than continent-level estimates involves some big-time guesswork. “We're making a huge assumption that this variant is the only variant, and that these populations are somewhat of a monolith,” she says. “We really do need more information to dig down to more detailed population differences within these continents.”

If Ancestry doesn’t have a reference population that matches your specific ancestry, the algorithm will assign you the next closest region. There’s no reference group for Denmark, for instance, so people with Danish ancestry “tend to get somewhere around a quarter Germany, Norway, Sweden, and England,” says Starr. Lacking specificity, the algorithm is searching for haplotypes most similar to those found amongst Danes—but the result can be misleading. “You wouldn't want them to think, ‘Oh, I have one grandparent from [each country],” says Starr.

Countries like Denmark, and all countries to some degree, pose a challenge because of what’s called admixing, which is basically a jargony word for mixing. Human history is one of migration, of invasion, of populations intermingling. That makes it tough to distinguish certain regions from one another, especially neighboring ones. Germanic tribes and Scandinavian Vikings both settled in the British Isles, for instance, meaning a person from modern-day England might have DNA from all of those regions.

And of course, nations are human inventions, their borders cropping up and shifting over time. What we call France has ballooned and shrunk over the centuries, overlapping at times with modern-day northern Italy. “In our previous update, a lot of people in Northern Italy were getting France,” says Starr. “If you look at history, it makes sense because that part of the world was not very distinct. But in this update, we were able to split Italy into North and South. People from Northern Italy got Italy back, so there’s lots more Northern Italy than France now.”

That’s also the reason all those people suddenly became more Scottish. The update separated what had previously been two regions in the Ancestry database—England/Wales/Northwestern Europe and Ireland/Scotland—into four: England, Ireland, Scotland, and Wales. Before the change, “Scottish people typically got a lot of both Ireland & Scotland and England, Wales & Northwestern Europe in their results—often almost a 50/50 split,” a post on the company’s website explained. “Since Scotland appeared in only one of the names, some people wondered what had happened to their Scottish ancestry. It was there all the time, but ‘hidden’ under another name.”

In a white paper posted to the company’s website in September, Ancestry scientists issued a self-report on their accuracy: They gave themselves a B. Using a sampling of reference panel members, whose ancestries they already knew, they ran their DNA through their algorithm to see if it would assign each person to the correct region. They found their algorithm to be correct 84.2 percent of the time on average, but for identifying certain groups, such as indigenous Cuban people, their accuracy rate sank as low as 32 percent.

Access to indigenous people’s DNA is ethically fraught, making it tricky to come by—for reasons such as difficulty obtaining informed consent, concerns about exploiting indigenous people for profit, perceptions that scientists are more interested in preserving endangered tribes’ DNA than their members, and worries that the test results could be used as tools of continuing oppression—for example, to deny people land rights. As a result, the DNA of indigenous people is often underrepresented in genetic databases, leading to results that can be misinterpreted. “For example, when Elizabeth Warren said that she had Native ancestry, what she was actually referring to was Latinx and South American reference populations and calling that indigenous American,” says Jeff. Ancestry gets around this by using DNA from admixed populations and identifying the segments that correspond to indigenous groups. They use only that portion in their reference panel, meaning they don’t need people with long family histories in a single region.

Ethnicity estimates also contain statistical noise, which is particularly relevant for those results in the low single-digit percentages. “When they're that small, they can come and go, because they could be noise or a misreading,” says Starr, meaning a result saying that you are 2 percent Melanesian might appear, disappear, and reappear every time there’s a new update.

Still, some users treat these smidgens as meaningful. After the recent Ancestry update, one user tweeted: “I’m a little excited that I’ve consistently been getting traces of Korean ancestry (however tiny). I’ve always been drawn to Korea.” Those traces might be evidence of a distant Korean ancestor, sure. Or they might be a statistical mishap.

On the flip side, just because your DNA doesn’t contain a certain ancestry-informative marker doesn’t mean it’s not part of your heritage. Since you inherit roughly half of your DNA from each parent, half gets left behind. The sections you inherit are fairly random; you get a different alphabet soup than your (non-identical twin) siblings, for instance. So it’s possible that none (zero!) of your Swedish great-great-great-great-grandmother’s DNA made it into the mix you happened to inherit. That doesn’t mean you don’t have Swedish ancestry.
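The halving at each generation gives a rough sense of how little of a distant ancestor's DNA is expected to survive. The sketch below computes only that average, under the simplifying assumption that the share halves cleanly with every generation; the function names are made up for this example. Real inheritance happens in chunks, so an individual's actual share varies around this figure and can be lower.

```python
def generations_back(greats: int) -> int:
    """A grandparent is 2 generations back; each 'great' adds one more."""
    if greats < 0:
        raise ValueError("greats must be non-negative")
    return 2 + greats


def expected_share(generations: int) -> float:
    """Average fraction of your DNA traced to one ancestor that far back."""
    return 0.5 ** generations


# Great-great-great-great-grandmother: four "greats".
gens = generations_back(4)
print(f"{gens} generations back, expected share: {expected_share(gens):.2%}")
```

For a great-great-great-great-grandmother, that works out to six generations and an expected share of 1/64, or about 1.6 percent, which is already in the range where results can flicker in and out.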

Perhaps most crucially, geographic ancestry isn’t a predictor of behavior, psychology, or personal preferences. The area of the genome that Ancestry examines for place-based markers is separate from the genes affected by natural selection, like the ones that code for the taste receptors on your tongue. So they can’t tell you why you like whiskey, or why you’re loud, or why you’re suddenly beset by the urge to go kilt shopping.

But certain marketing suggests otherwise. One 2018 Ancestry ad urged viewers to discover their “greatness.” Over footage of a pirouetting figure skater, the narrator intoned, “You can find out where you get your precision.” A pie chart flashed on the screen: Scandinavia 48 percent. “Your grace”—27 percent Asia Central. “Your drive”—21 percent Great Britain.

“They're tying these traits to your DNA and to a particular ethnicity,” says Katie Hasson, program director on genetic justice at the Center for Genetics and Society. “There's a real danger that it reinforces the mistaken, outdated, and dangerous idea that race and ethnicity are biological, and all of the ills that have come along with that.”

Here’s another example: One 2016 Ancestry commercial featured Kyle, a 50-year-old man from Queens who was raised culturally German, donning lederhosen, eating schnitzel, performing in a German dance ensemble. He takes a DNA test and discovers that (lo!) he’s got zero German DNA, and half of his ancestry comes from Scotland, Ireland, and/or Wales. “So I traded my lederhosen for a kilt,” he says with a smile, casually tossing aside five decades’ worth of culture and community.

Underpinning Kyle’s cultural cosplay is the idea that there’s something innate about Scottishness; that DNA alone grants instant membership in Club Scotland and all the associated cultural trappings. (Ancestry is not alone in pushing the message that DNA can unlock your hidden true self. One 23andMe TV spot invited viewers to “know more about you” and featured a woman carousing with locals and performing vaguely stereotypical activities in the countries that matched her DNA results. In another commercial, MyHeritage promised to help customers “find amazing stories hidden within.”)

Not so fast, say sociologists. Although many people look to genetic ancestry for answers about where they belong, this line of inquiry risks infringing upon communities for whom those identities are central and important, says Alondra Nelson, president of the Social Science Research Council and author of The Social Life of DNA: Race, Reparations, and Reconciliation. “When a community has rules about what it means to participate, a genetic inference does not allow you entrée.”

“Culture is not something you inherit through your genes; it’s something you live through your experience,” echoes science journalist Angela Saini, author of the 2019 book Superior: The Return of Race Science. “If, say, you were not raised in an Italian family and didn’t have any exposure to Italian culture, but found out you had ancestry in Italy, what does that tell you? You’re not then suddenly more likely to like eating pasta or have a different personality. What I find interesting is what people imagine these results tell them, and what people often do is resort to racial stereotypes.” In other words, for someone with no real-world experience of what it means to be Italian, a sudden ethnicity estimate can’t provide that.

In her book, Saini argued that scientific racism has persisted in part because it’s baked into the foundation of modern scientific study. The “father of scientific racism,” a 19th-century Philadelphian named Samuel Morton, tried to prove racial differences in intelligence by measuring skull size. Carl Linnaeus and Charles Darwin both racially classified people; Linnaeus erroneously assigned race-based personality traits. While mainstream scientists rejected eugenics and race science after World War II, those ideas persisted underground, in segregationist-funded, pseudoscientific journals like Mankind Quarterly, an anthropology journal first published in 1961. White nationalists subsequently invoked this research to support claims of genetic superiority. Although legitimate studies of genetics consistently reaffirm that race is a social construct—albeit one with very real influence—the idea that we can biologically categorize people this way endures, even seeping into mainstream science in the form of studies and books advancing racist hypotheses.

This has played out during the pandemic through news of Black and brown people contracting and dying from Covid-19 at disproportionately high rates. Race is often a proxy for other factors like economic class and social discrimination, which can have a huge bearing on health outcomes. But absent that context, these statistics can produce the impression that the difference is behavioral or biological. So in September, a group of more than 70 scientists from the realms of genetics, sociology, public health, and medicine published a letter in Science calling on the National Institutes of Health to address the misuse of race as a category for measuring human biological differences, citing concerns that Covid-19’s effect on communities of color would be misattributed to innate differences. “In 2016, we called for the elimination of the use of race as a means to classify biological diversity in both laboratory and clinical research,” they wrote. “Since that time, little has changed.” They called on the NIH to lead education efforts for both scientists and the public and to develop best practices for characterizing human genetic diversity in scientific research.

For those curious about their roots, genetics experts consulted for this story say, genealogy tools and family tree builders are better suited to those inquiries. DNA testing can even help fill out family trees, particularly for communities dispossessed of their histories. “I do accept that for many people, especially children of immigrants and children of people who have slavery in their ancestry, who have been wrenched from a culture or who have lost touch with their geographical roots, sometimes DNA testing can feel like the only way to reclaim those roots,” says Saini.

As for those ethnicity pie charts, “I’m not sure these tests can tell people what they want to know,” says Hasson. “If you want to know about your family history, your family is a good place to learn about that.”

And as the sudden surge in Scottishness illustrates, if you rely on a consumer genetics company to estimate your heritage, it can change on a dime. That portrait will always depend on who’s in the pool of proxies and how each company’s algorithm is programmed to sort DNA markers into location-based groups.

In an email from a company spokesperson, Ancestry acknowledged these criticisms. “Through ongoing innovation we help customers create a more complete family portrait that continually yields new discoveries as science and technology advances,” they wrote. “While DNA does not change, the science we use to analyze it does.”

These results have always been a moving target. Nelson recalls attending a conference back in the early days of consumer DNA testing, where the founder of a now-defunct company revealed that he was 25 percent sub-Saharan African. “He would say, ‘You would never know by looking at me,’ and it was supposed to be a kind of, ‘Gee whiz. Look at the things you don't know about yourself that ancestry testing can tell you,’” she says. When she ran into him a few years later, he’d gotten an update: his sub-Saharan ancestry had fallen to 8 percent. “For him, it was like, ‘Our assumptions change. The reference database changed,’” she recalls. But to Nelson, a social scientist who understands how important identity is, such shifts are no small matter. “This is a core thing to human society,” she continues.

After all, she points out, the United Nations recognizes personal identity as a human right. It’s serious business. And absent the genetic counseling available in a medical setting, consumers are left to their own devices to interpret their results, guided by mixed messages that remind us that we’re all one human family but also play up ethnic differences that may not even exist. “I think there could have been a greater sense of responsibility and a greater appreciation of the gravity of overturning people's conception about their families and their lives and their identities,” says Nelson. “It's not a folly to tell somebody they're one thing and then tell them that they're something else.”

