In today’s forensic science theory lectures we got taught that not only is DNA not unique, but there is an actual chance of two people having the same DNA profile. The lecturer first explained the birthday paradox, and then tried to explain it with DNA and got me terribly confused with what numbers go where in what equations. So I’ve read up on it and will now try and explain the birthday paradox and why there are potentially thousands of ‘DNA doppelgangers’ in the world.
If you ask someone their birthday, the chance that it matches yours is 1 in 365. (I am ignoring leap years and assuming birthdays are uniformly distributed.) The birthday paradox, however, is that in a room of just 23 people there is about a 50% chance that at least two of them share a birthday. With 57 people the chance is already about 99%. I remember this at school: in a class of 30 we had two with the same birthday, and lo and behold we had two with the same birthday in the forensics class today. It works because you’re not comparing your one birthday with everyone’s; you’re comparing everyone’s with everyone’s, and 23 people make 253 different pairs, which improves the odds of finding a match dramatically. You can read more about it on Wikipedia.
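For a group this small, the probability can also be worked out exactly with a running product, with no giant factorial needed. The sketch below is only an illustration: the function name exactBirthday is made up for this example, and it makes the same assumptions as above (365 equally likely birthdays, no leap years).

```python
def exactBirthday(num, total=365):
    """Exact chance (0..1) that at least two of `num` people share a birthday."""
    pAllDifferent = 1.0
    for i in range(num):
        # Person number i must avoid the i birthdays already taken.
        pAllDifferent *= (total - i) / float(total)
    return 1.0 - pAllDifferent

for people in (23, 30, 57):
    print('%d people: %.1f%%' % (people, exactBirthday(people) * 100))
```

For 23 people this gives roughly 50.7%, and for 57 people it comes out a little above 99%.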
The exact formula involves calculating 365!, which is just enormous, so approximations are used instead, such as one based on the Taylor series of e^x, which I’ve implemented here in Python.
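    import math

    def computeBirthday(num, total):
        """
        Computes a rough estimate of the probability for the birthday problem,
        based on the Taylor series approximation:
        p(n) =~ 1 - e^((-num^2)/(2*total))
        """
        denom = total * 2.0
        nom = -(math.pow(num, 2))
        p = nom / denom              # exponent: -num^2 / (2 * total)
        e = math.pow(math.e, p)      # approximate chance that nobody matches
        ans = (1 - e) * 100          # chance of at least one match, as a percentage
        print('%s%%' % ans)
        return ans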
If you call computeBirthday(23, 365) you get the answer 51.55% (which is close to the exact answer of 50.7%). I’ve also made the reverse, which computes the approximate number of people needed to get a 50% chance of a match. Calling howManyFor50Percent(365) gives the answer 22.9999, which is pretty much 23.
    import math

    def howManyFor50Percent(n):
        """
        Computes approx number of people needed to get a 50% chance of a match
        N = 1/2 + squareroot(1/4 - (2*n) * ln(0.5))
        """
        # ln(0.5) is negative, so the value under the square root is positive
        radicand = 0.25 - (2 * n) * math.log(0.5)
        ans = 0.5 + math.sqrt(radicand)
        print('%s' % ans)
        return ans
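The formula in that docstring can be recovered by setting the approximate match probability to one half and solving for the number of people. The short derivation below uses the docstring's symbols (N people, n possible values). It assumes the N(N-1) form of the exponent, which is my reading of where the formula comes from rather than something stated above.

```latex
\begin{align*}
1 - e^{-N(N-1)/(2n)} &= \tfrac{1}{2} \\
\frac{N(N-1)}{2n} &= \ln 2 \\
N^2 - N - 2n\ln 2 &= 0 \\
N &= \frac{1}{2} + \sqrt{\frac{1}{4} + 2n\ln 2}
   = \frac{1}{2} + \sqrt{\frac{1}{4} - 2n\ln(0.5)}
\end{align*}
```

Note that this exponent uses N(N-1) where computeBirthday uses num^2, so the two functions are not exact inverses of each other. For large numbers the difference is negligible.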
I then applied this to DNA. This article explains that the birthday problem works in just the same way for DNA profiles. I’ve often been told that the odds of a random person having the same DNA profile as you are 1 in a billion. The UK has a national DNA database, which currently holds 3.4 million profiles according to the Home Office. So swapping 1/365 for 1/1,000,000,000 and 23 people for 3,400,000 people, what will the result be? 100%! (Strictly, the approximation gives a number so close to 100% that it rounds up.) So it is all but certain that there is at least one pair of matching DNA profiles in the database. That in itself is not so amazing. What is amazing is when you do howManyFor50Percent(1000000000) and get the answer 37,233. You only need 37,233 people before you get a 50% chance of a matching DNA profile!
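The DNA figures can be reproduced with the same approximation. The sketch below is an illustration only: the function name approxMatchChance is hypothetical, and the 1-in-a-billion odds and database size are just the figures quoted above. It uses math.expm1 instead of computing 1 - e^x directly, purely because that keeps more precision when the probability is very small.

```python
import math

def approxMatchChance(num, total):
    """Approximate chance (0..1) of at least one match: 1 - e^(-num^2 / (2*total))."""
    exponent = -(num ** 2) / (2.0 * total)
    return -math.expm1(exponent)   # same value as 1 - e^exponent

ODDS = 1000000000   # assumed 1-in-a-billion chance that two profiles match

for people in (1000, 37233, 3400000):
    print('%7d profiles: %.6f%%' % (people, approxMatchChance(people, ODDS) * 100))
```

With 37,233 profiles the result lands very close to 50%, matching what howManyFor50Percent reports.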
I tried it with bigger numbers, and 274,000 seems to be the smallest number of people for which the calculation reports a 100% chance of finding a match. (At this size the true probability is still very slightly below 100%; this is just the point where the computed value rounds up to it.) Assuming 6,796,000,000 people in the world, that means 24,803 people (6,796,000,000/274,000) in the world with doppelgangers! And 222 people in the UK (assuming a population of 60,943,912). This of course is crude and approximate, and doesn’t consider twins, ethnicity, and family members. But still, I think it’s awesome maths! If I have any calculations wrong, please comment 🙂
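One way to see how close to certainty these numbers really are is to look at the chance of no match at all, e^(-num^2/(2*total)), instead of subtracting it from 1. This is a small sketch under the same assumptions as before (1-in-a-billion odds), with a function name that is hypothetical:

```python
import math

def approxNoMatchChance(num, total):
    """Approximate chance (0..1) that no two of `num` profiles match."""
    return math.exp(-(num ** 2) / (2.0 * total))

ODDS = 1000000000   # assumed 1-in-a-billion chance that two profiles match

for people in (100000, 200000, 274000):
    miss = approxNoMatchChance(people, ODDS)
    # `miss` is still greater than zero even if 1 - miss gets rounded to 1.0
    print('%d people: chance of no match = %.3g, 1 - that = %r'
          % (people, miss, 1 - miss))
```

The chance of no match shrinks very quickly but never actually reaches zero in this approximation, so a reported 100% is a rounding effect rather than a hard threshold.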