Genetic advances threaten privacy of the 100,000 Genomes Project

Advances in genetics are mostly wonderful (and completely amazing) but each discovery chips away at privacy. In September a paper showed that sequencing a person’s genome makes it possible to reconstruct their face.

So far, the quality of reconstruction is not that impressive. It may allow picking out a match from among 10 photographs, but not from among a million. In other words, so far the technology mostly identifies a person’s race, creating an average of faces from that race, but nothing much more specific than that.

But ultimately, an accurate reconstruction is likely to be possible, because identical twins separated at birth, who share the same genetics, end up with very similar faces. And machine learning moves quickly. In 2014 Professor Levinovitz wrote that “another ten years until a computer Go champion may prove too optimistic”, as Go was much more difficult than chess for computers. In December 2015 Professor Chabris wrote “Why Go Still Foils Computers”. The following month, DeepMind published a paper showing a Go-playing computer beating a professional human player. It went on to beat the world champion in March 2016.

The obvious advantage of facial reconstruction is in solving crimes: it would be great to produce a picture of a criminal from the genetic evidence they leave at a crime scene.

But it also threatens privacy. Anyone providing their genetic sequence expecting anonymity should probably not expect such anonymity to last. This is particularly the case as Facebook and other social media platforms improve their facial recognition algorithms and make them available via APIs. A genome will reveal a photo and a photo will reveal an identity.

While you consider this, there is some ironic trivia about the company behind the paper: Human Longevity Inc was founded by Craig Venter, a giant of genetic science. In 2002 the world of genetic science was rocked by his disclosure that the first privately sequenced genome was his own, rather than that of an anonymous donor as planned. No one knew Venter was doing this until he confessed. If his facial reconstruction algorithm had been around back then, it would have been immediately apparent what he had done.

At PKB we think about medical privacy a lot, and genetics is the part of the medical record that causes the most difficulty.

Even simple genetics, like your blood group, causes common problems. Daisy Lowe was taught about blood groups at school and came home to ask her mother, Pearl, why neither parent had O blood group, while Daisy did. Her mother confessed to an affair with Gavin Rossdale, and a paternity test confirmed that Daisy’s biological father was Rossdale. Daisy’s parents divorced. What makes this story particularly sad is that Daisy and Pearl’s interpretation of the blood groups was wrong: the lack of O blood group in both parents did not mean a different biological father. But given that at least 1 in every 50 children does not have the biological father they think they do, the simple act of putting blood groups on frequently seen identity cards is a privacy problem.
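To see why a child with O blood group does not by itself rule out parents who are not O, here is a minimal sketch of standard ABO inheritance (A and B codominant, O recessive). The function names are made up for illustration, the example parents are hypothetical rather than the blood groups of anyone in the story, and rare exceptions to the simple model are ignored.

```python
from itertools import product

# Allele pairs (genotypes) that can underlie each ABO blood group.
GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def phenotype(allele1, allele2):
    """Blood group shown by two alleles: A and B are codominant, O is recessive."""
    expressed = {allele1, allele2} - {"O"}
    if not expressed:
        return "O"
    return "".join(sorted(expressed))


def possible_child_groups(parent1, parent2):
    """All blood groups a child could have, given only the parents' blood groups."""
    groups = set()
    # Try every genotype each parent might carry, then every allele each might pass on.
    for genotype1, genotype2 in product(GENOTYPES[parent1], GENOTYPES[parent2]):
        for allele1, allele2 in product(genotype1, genotype2):
            groups.add(phenotype(allele1, allele2))
    return groups


# Hypothetical example: two group A parents can both silently carry an O allele.
print(sorted(possible_child_groups("A", "A")))   # ['A', 'O']
print(sorted(possible_child_groups("AB", "O")))  # ['A', 'B']
```

Under this simple model, only a parent with group AB makes an O child unexpected; two A parents, two B parents, or one of each can all have an O child.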

Even listing gender is a problem. The point of identity cards is to state the identity (“my name is Mohammad Al-Ubaydli”) and allow identification (“this is what my face looks like”). Gender is not necessary for identification and, as we are increasingly aware, it can be separate from identity. To protect transgender people, without affecting the usefulness of identity documents, campaigners in the UK and other countries are calling for gender to be removed from driving licences and passports.

This has implications for the usability of privacy labels. Genetic sequences and their interpretation are hard to deal with. Every data point in PKB has one of four privacy labels (general, mental, sexual, social care) and the patient decides which combination of privacy labels they share with each team (e.g. just the general label with a physiotherapy team, or all four labels with a psychiatrist).
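The labelling scheme described above can be sketched roughly as follows. This is only an illustration of the idea, not PKB’s actual implementation: the class names, function name, and sample records are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PrivacyLabel(Enum):
    GENERAL = "general"
    MENTAL = "mental"
    SEXUAL = "sexual"
    SOCIAL_CARE = "social care"


@dataclass(frozen=True)
class DataPoint:
    description: str
    label: PrivacyLabel  # each data point carries exactly one label


def visible_to_team(record, shared_labels):
    """Return the data points whose label the patient has shared with a team."""
    return [point for point in record if point.label in shared_labels]


# Hypothetical patient record.
record = [
    DataPoint("Knee X-ray report", PrivacyLabel.GENERAL),
    DataPoint("Depression assessment", PrivacyLabel.MENTAL),
    DataPoint("Sexual health screening", PrivacyLabel.SEXUAL),
    DataPoint("Home care plan", PrivacyLabel.SOCIAL_CARE),
]

# The patient chooses which labels each team may see.
physiotherapy_team = {PrivacyLabel.GENERAL}
psychiatrist = set(PrivacyLabel)  # all four labels

print([p.description for p in visible_to_team(record, physiotherapy_team)])
# ['Knee X-ray report']
print(len(visible_to_team(record, psychiatrist)))  # 4
```

Giving each data point a single label keeps the sharing decision to a simple membership check, which works well as long as a data point belongs clearly to one category.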

A genome indicates paternity, and in the future it may well indicate sexuality and gender identity. Individual genes will indicate all kinds of other aspects of a person’s body, health and future (their phenotype). It is probably not possible to compartmentalise the interpretations that can be drawn from a genome sequence (genotype). So if you care about your privacy, the only safe way to proceed is to hide your genotype and use privacy labels for your phenotype.

This has implications for the 100,000 Genomes Project. These wonderful volunteers are greatly advancing science and society by making their genomes available for research. But as advances accumulate, their privacy erodes. This is a sacrifice they have made for all of us, and for that I am grateful.
