April 19, 2008

The Wikipedia, Knowledge Preservation and DNA

I had an interesting thought today about the long-term preservation and transmission of human knowledge.

The Wikipedia may be on its way to becoming one of the best places in which to preserve knowledge for future generations. But this is just the beginning. What if we could encode the Wikipedia into the Junk DNA portion of our own genome? It appears that something like this may actually be possible -- at least according to some recent studies of the non-coding regions of the human genome.

If we could actually encode knowledge, like the Wikipedia for example, into our genome, the next logical step would be to find a way to access it directly.

At first we might only be able to access and read the knowledge stored in our DNA through a computationally intensive genetic analysis of an individual's DNA. In order to correct any errors in the data from mutation, we would also need to cross-reference this individual data with similar analyses from the DNA of other people who also carry this data in their DNA. There are, however, ways to store data such that there is enough redundancy to protect against degradation. Assuming we could do this, we might be able to eliminate the need for cross-referencing as a form of error correction -- the data itself would be self-correcting, so to speak. If we could accomplish this, then the next step would be to find a way for an individual to access the knowledge stored in their DNA in real-time, directly. That's a long way off, but there may be a way to do this using some future nano-scale genomic-brain interface. This opens up some fascinating areas of speculation, to say the least.
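To make the idea of self-correcting data a little more concrete, here is a minimal sketch in Python of about the simplest redundancy scheme there is: a repetition code, in which every base is written three times in a row and read back by majority vote. The function names and the choice of three copies are assumptions made purely for illustration. A real scheme would presumably use a far more efficient error-correcting code, and would have to cope with insertions and deletions, which this sketch ignores.

```python
from collections import Counter


def add_redundancy(strand: str, copies: int = 3) -> str:
    """Write every base `copies` times in a row (a simple repetition code)."""
    return "".join(base * copies for base in strand)


def self_correct(stored: str, copies: int = 3) -> str:
    """Recover the original strand by majority vote within each group of copies.

    Assumes only substitutions occurred, so groups stay aligned.
    """
    recovered = []
    for start in range(0, len(stored), copies):
        group = stored[start:start + copies]
        # The most common base in the group is taken as the intended one.
        recovered.append(Counter(group).most_common(1)[0][0])
    return "".join(recovered)


if __name__ == "__main__":
    stored = add_redundancy("ACG")   # "AAACCCGGG"
    damaged = "AATCCCGGG"            # one substitution in the first group
    print(self_correct(damaged))     # prints "ACG"
```

With three copies, the data survives any single substitution within a group, at the cost of tripling the storage needed.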

 

 

Why The Wikipedia?

The Wikipedia has certain qualities that make it better than other forms of knowledge preservation and transmission:

  • The Wikipedia exists primarily in electronic form. It is not subject to age or decay like a physical encyclopedia or document. This means it can persist forever, and will not be lost to time, if it continues to be maintained electronically in the future.
  • The Wikipedia is replicated in multiple locations around the world. The fact that it is so easy to replicate, and is so widely replicated means that it is less at risk of being lost due to a local disaster at any given storage location. It also means it is more likely to continue, somewhere, as a living document that goes on to reflect majority consensus reality into the distant future. It is highly improbable that it will ever suffer the same fate as certain ancient documents which only existed in one place and were subsequently lost in floods, fires, or wars, etc. At this point only a planet-wide extinction level event could erase the Wikipedia and/or prevent future generations from finding it.
  • The Wikipedia is highly viral: its content is increasingly cited, and it is far ahead of any competing system in terms of coverage and brand-recognition. Because so many other pieces of content on the Web and in other media refer to the Wikipedia as the world's global authority for knowledge, it is considered increasingly authoritative and is increasingly visible and increasingly cited. The Law of Increasing Returns indicates that this will continue to self-amplify, making the Wikipedia the best candidate for an authoritative global repository of knowledge.

What this means is that if you have any knowledge that you want to preserve for future generations, a good place to put it is in the Wikipedia. Putting it there almost guarantees that it will propagate around the world and throughout the human-explored universe (in the future, if we become a spacefaring civilization), and into the distant future of human civilizations.

The Potential For Storing Knowledge in DNA

Is it possible to store knowledge -- such as the Wikipedia -- in human DNA? It would certainly be useful if we could do this. By storing knowledge in human DNA of living humans, or of common bacteria for that matter, it could then potentially be passed down and spread through generations into the far future. However the mutability of DNA over time might gradually introduce errors that would degrade the information within particular lines of DNA over long periods of time.
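As a rough illustration of what "storing knowledge in DNA" could mean at the data level, here is a minimal Python sketch that maps bytes of text onto the four bases at two bits per base. The particular mapping (A=00, C=01, G=10, T=11) and the function names are assumptions made for this example, not an established standard, and the sketch says nothing about how such a sequence would actually be written into a living genome.

```python
# Hypothetical 2-bits-per-base mapping: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def encode_to_dna(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode_from_dna(strand: str) -> bytes:
    """Inverse of encode_to_dna; expects a length that is a multiple of 4."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(strand), 4):
        byte = 0
        for base in strand[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode_to_dna("Wikipedia".encode("utf-8"))
    print(strand)
    print(decode_from_dna(strand).decode("utf-8"))  # prints "Wikipedia"
```

At this density every byte of text costs four bases, so capacity is a simple function of how many bases are actually free to be written.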

Perhaps this could however be mitigated by comparing DNA samples from a large cross-section of individuals within the population of descendants of original holders of DNA-knowledge-archives in the future -- this would effectively enable statistical error cancellation. The farther in the future from the date at which the knowledge is "written" to the DNA of some number of humans, the more people's DNA would be needed to eliminate the errors statistically. This would however in principle counteract mutations and enable the reliable recovery of messages in DNA even very far in the future.
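The statistical error cancellation described above can be sketched as a per-position majority vote across many people's copies. This is only a toy model in Python, under loudly stated assumptions: mutations are simple substitutions (no insertions or deletions), and each copy mutates independently. Real descendants inherit shared mutations from common ancestors, so their errors would be correlated and harder to cancel. The function names and mutation rate are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(strand: str, rate: float, rng: random.Random) -> str:
    """Return a copy in which each base is substituted with probability `rate`."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in strand
    )


def consensus(copies: list[str]) -> str:
    """Majority vote at each position across equal-length copies."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    original = "ACGT" * 50
    # Simulate 25 carriers whose copies have each drifted independently.
    carriers = [mutate(original, rate=0.05, rng=rng) for _ in range(25)]
    print(consensus(carriers) == original)
```

The more the individual copies have drifted, the more carriers are needed before the majority at every position is reliably the original base, which matches the intuition that recovery far in the future requires sampling more people.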

The fact that it is in principle possible to encode knowledge into human (or other) DNA raises the question of whether there is already knowledge stored there. It's certainly worth a look! Maybe there is already a message there for us? One can only wonder whether an ancient "Wikipedia" of sorts is already written there.

Interestingly enough, when certain statistical tests are run against human DNA, it does seem to have properties that are indicative of written language, but only in the "junk" regions of the genome. Maybe it's not "junk" after all. Below is an article that discusses a recent discovery related to this:

Language in junk DNA

You've probably heard of a molecule called DNA, otherwise known as "The Blueprint Of Life". Molecular biologists have been examining and mapping the DNA for a few decades now. But as they've looked more closely at the DNA, they've been getting increasingly bothered by one inconvenient little fact - the fact that 97% of the DNA is junk, and it has no known use or function! But an unusual collaboration between molecular biologists, cryptanalysts (people who break secret codes), linguists (people who study languages) and physicists has found strange hints of a hidden language in this so-called "junk DNA".

Only about 3% of the DNA actually codes for amino acids, which in turn make proteins, and eventually, little babies. The remaining 97% of the DNA is, according to conventional wisdom, not gems, but junk.

The molecular biologists call this junk DNA, introns. Introns are like enormous commercial breaks or advertisements that interrupt the real program - except in the DNA, they take up 97% of the broadcast time. Introns are so important, that Richard Roberts and Phillip Sharp, who did much of the early work on introns back in 1977, won a Nobel Prize for their work in 1993. But even today, we still don't know what introns are really for.

Simon Shepherd, who lectures in cryptography and computer security at the University of Bradford in the United Kingdom, took an approach, that was based on his line of work. He looked on the junk DNA, as just another secret code to be broken. He analysed it, and he now reckons that one probable function of introns, is that they are some sort of error correction code - to fix up the occasional mistakes that happen as the DNA replicates itself. But even if he's right, introns could have lots of other uses.

The next big breakthrough came from a really unusual collaboration between medical doctors, physicists and linguists. They found even more evidence that there was a sort-of language buried in the introns.

According to the linguists, all human languages obey Zipf's Law. It's a really weird law, but it's not that hard to understand. Start off by getting a big fat book. Then, count the number of times each word appears in that book. You might find that the number one most popular word is "the" (which appears 2,000 times), followed by the second most popular word "a" (which appears 1,800 times), and so on. Right down at the bottom of the list, you have the least popular word, which might be "elephant", and which appears just once.

Set up two columns of numbers. One column is the order of popularity of the words, running from "1" for "the", and "2" for "a", right down to "1,000" for "elephant". The other column counts how many times each word appeared, starting off with 2,000 appearances of "the", then 1,800 appearances of "a", down to one appearance of "elephant".

If you then plot on the right kind of graph paper, the order of popularity of the words, against the number of times each word appears you get a straight line! Even more amazingly, this straight line appears for every human language - whether it's English or Egyptian, Eskimo or Chinese! Now the DNA is just one continuous ladder of squillions of rungs, and is not neatly broken up into individual words (like a book).

So the scientists looked at a very long bit of DNA, and made artificial words by breaking up the DNA into "words" each 3 rungs long. And then they tried it again for "words" 4 rungs long, 5 rungs long, and so on up to 8 rungs long. They then analysed all these words, and to their surprise, they got the same sort of Zipf Law/straight-line-graph for the human DNA (which is mostly introns), as they did for the human languages!
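The procedure described above can be sketched in a few lines of Python: cut a sequence into fixed-length "words", rank them by frequency, and measure how close the rank-versus-frequency plot is to a straight line on log-log axes. This is only a sketch under assumptions: the article does not say whether the researchers used non-overlapping words or a sliding window (non-overlapping is assumed here), and the function names and the least-squares slope are illustrative choices, not the study's actual method.

```python
import math
from collections import Counter


def split_into_words(sequence: str, k: int) -> list[str]:
    """Cut the sequence into consecutive non-overlapping words of length k.

    Any leftover tail shorter than k is dropped.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]


def rank_frequency(words: list[str]) -> list[int]:
    """Word counts sorted from most to least common (rank 1 first)."""
    return [count for _, count in Counter(words).most_common()]


def log_log_slope(frequencies: list[int]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 with a good straight-line fit is the classic Zipf pattern.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two distinct words to fit a line")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    sequence = "ACATATAGACAT" * 20  # placeholder; substitute a real sequence
    for k in range(3, 9):
        freqs = rank_frequency(split_into_words(sequence, k))
        if len(freqs) >= 2:
            print(k, log_log_slope(freqs))
```

A straight line on such a plot is a statistical regularity of the sequence; on its own it does not show that the sequence carries a message.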

There seems to be some sort of language buried in the so-called junk DNA! Certainly, the next few years will be a very good time to make a career change into the field of genetics.

So now, around the edge of the new millennium, we have a reasonable understanding of the 3% of the DNA that makes amino acids, proteins and babies. And the remaining 97% - well, we're pretty sure that there is some language buried there, even if we don't yet know what it says. It might say "It's all a joke", or it might say "Don't worry, be happy", or it might say "Have a nice day, lots of love, from your friendly local DNA".   (source)

Now to complete this thought: what if the information-carrying capacity of the so-called Junk DNA of the human genome is sufficient to hold the content of the Wikipedia? Then all we would need is some way of writing to it -- perhaps through gene therapy, using infection by a virus that carries a copy of the Wikipedia.

This would enable volunteers to accept copies of the Wikipedia into their DNA and become vectors for the Wikipedia. They and their descendants would become walking encyclopedias and would preserve human knowledge for future generations. If only some people had this done then they and their lineages would be a sort of priesthood with particular importance for the future of humanity. It sounds like the basis for a really great science-fiction thriller!

By copying the Wikipedia into our own DNA we might be able to ensure that wherever human beings end up in the universe, the Wikipedia will go with them. Even if in some distant world humans destroy their civilization in a nuclear holocaust or are almost wiped out by an asteroid and have to rebuild from the stone-age again, they will eventually rediscover genomics and soon after that they will find the Wikipedia in their genome.

This is a kind of "backup strategy" for our civilization and all the knowledge we consider to be most important. Of course it is not clear yet whether the Junk DNA could carry enough information to encode the entire Wikipedia, nor is it clear that the Junk DNA is actually "junk" -- perhaps there is already something there that should not be overwritten? Or perhaps it serves some other purpose in human development and evolution that we shouldn't mess around with. It remains to be seen.

Comments

I couldn't help but think of the film, Johnny Mnemonic, when reading your last blog post regarding the transfer/preservation of data through DNA; I question the implications that it would have on the human psyche given that internal, all-encompassing information storage would inevitably develop into the ability to access and process it. As we live today, my generation's unbridled access to knowledge, specifically the depth of human suffering and disaster, has crippled the ability for emotion.

Good comments. But would the non-coding portion of bacterial DNA be any more unused in those organisms than the non-coding areas of our human DNA are by us (if that is the case)?

It would be very cool to encode the Wikipedia, or a good portion of it, into bacterial DNA -- perhaps the DNA of a common human stomach bacterium, for example -- and set it loose to preserve this information for the long-term future. Of course this would only be worthwhile if introducing this data into the bacterial genome did not harm the bacterium or negatively impact its environmental fitness or ability to replicate without errors.

At first blush, the most obvious way a sequence can survive being broken into words of 3, 4 and 5 and have the same distribution of word frequency is if the most common subsequences are the same letter repeated. If you have AAAAAAAAAAA twice as often as CCCCCCCCC then it's stable; if you have ACATATAGACAT then the 3 groups are all 1, but the 4 groups are 2,1 and the 5 groups are all 1.

I'm not good enough at math to prove that after a Sunday dinner and a couple of glasses of wine, but it would be interesting to see what other sequences have a stable Zipf distribution under different subdivisions. You'd expect the most common case in the length 3 split to cause there to be 3 equally common cases in the length 4 split.
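The counts in the comment above are easy to check by machine. Here is a small Python sketch, assuming the sequence is cut into consecutive non-overlapping groups with any short tail dropped (the helper name `word_counts` is made up for this example):

```python
from collections import Counter


def word_counts(sequence: str, k: int) -> list[int]:
    """Counts of non-overlapping length-k words, most common first."""
    words = [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]
    return sorted(Counter(words).values(), reverse=True)


if __name__ == "__main__":
    for k in (3, 4, 5):
        print(k, word_counts("ACATATAGACAT", k))
    # prints:
    # 3 [1, 1, 1, 1]
    # 4 [2, 1]
    # 5 [1, 1]

    # A run of A's twice as long as a run of C's keeps its 2:1 ratio
    # when the run lengths divide evenly by every word length tried.
    repeated = "A" * 120 + "C" * 60
    for k in (3, 4, 5):
        print(k, word_counts(repeated, k))
```

The repeated-letter case keeps the same 2:1 shape at every word length, while the mixed sequence changes shape from one split to the next, which is the instability the comment describes.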

I think there is a big problem with this idea conceptually. First off, the entire genome is only about 750 MB, so not a whole lot of storage to begin with. Second, we already know that the 98% non-coding regions are not at all junk. Recent papers have demonstrated that important transcription factor binding sites and other regulatory elements lie in the non-coding region. Furthermore, non-coding microRNA is now known to play a large role in gene regulation. Non-coding regions of the genome also become methylated in important epigenetic events. I don't think there is much, if any, "free space" in the human genome. You would get much more mileage with this idea if you apply it to bacteria.
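The 750 MB figure in the comment above can be reproduced with a back-of-the-envelope calculation. The sketch below assumes a round figure of 3 billion base pairs for one copy of the human genome and 2 bits per base (four possible bases); both numbers are approximations supplied for illustration, not values taken from the comment.

```python
# Assumptions for illustration: ~3 billion base pairs, 2 bits per base.
GENOME_BASE_PAIRS = 3_000_000_000
BITS_PER_BASE = 2


def capacity_megabytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Raw storage capacity in megabytes (1 MB = 1,000,000 bytes)."""
    return base_pairs * bits_per_base / 8 / 1_000_000


if __name__ == "__main__":
    print(capacity_megabytes(GENOME_BASE_PAIRS))       # prints 750.0

    # If 98% of the genome is non-coding, as the comment says:
    non_coding = GENOME_BASE_PAIRS * 98 // 100
    print(capacity_megabytes(non_coding))              # prints 735.0
```

This is an upper bound on raw capacity only. Any redundancy added for error correction, and any regions that turn out to have a function, would reduce the usable space.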
