Information Theory of DNA Sequencing

Schematic for shotgun sequencing

Abolfazl Motahari, Guy Bresler, David Tse
DNA sequencing is the basic workhorse of modern-day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. A basic question is: given a sequencing technology and the statistics of the DNA sequence, what is the minimum number of reads required for reliable reconstruction? This number provides a fundamental limit to the performance of any assembly algorithm. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we formulate this question in terms of an information-theoretic notion of sequencing capacity. This is the asymptotic ratio of the length of the DNA sequence to the minimum number of reads required to reconstruct it reliably. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity…
Read more: arxiv.org/pdf/1203.6233v2.pdf
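To make the ratio "sequence length / number of reads" a little more concrete, here is a minimal Python sketch of the coverage side of the problem only. It is not the authors' model or method. It assumes fixed-length reads that start at uniformly random positions on a circular genome (to avoid edge effects), and the names `reads_to_cover` and `coverage_ratio` are hypothetical. Reconstruction is impossible unless every base lies in at least one read, so the empirical ratio G/N it reports can only be optimistic. It also ignores the statistics of the sequence itself, which the capacity in the paper depends on. Since a read of length L covers at most L bases, at least G/L reads are always needed, so the ratio never exceeds the read length.

```python
import random
import statistics


def reads_to_cover(genome_len: int, read_len: int, rng: random.Random) -> int:
    """Draw fixed-length reads at uniformly random start positions on a
    circular genome until every base lies in at least one read.

    Returns the number of reads drawn. The circular genome is an assumption
    made here to avoid edge effects at the ends of the sequence.
    """
    if genome_len < 1 or read_len < 1:
        raise ValueError("genome_len and read_len must be positive")
    covered = [False] * genome_len
    uncovered = genome_len
    n_reads = 0
    while uncovered > 0:
        start = rng.randrange(genome_len)
        n_reads += 1
        # Mark the bases spanned by this read, wrapping around the circle.
        for offset in range(read_len):
            pos = (start + offset) % genome_len
            if not covered[pos]:
                covered[pos] = True
                uncovered -= 1
    return n_reads


def coverage_ratio(genome_len: int, read_len: int, trials: int = 20, seed: int = 0) -> float:
    """Empirical G / N, where N is the mean number of reads needed for full coverage.

    Coverage is necessary but not sufficient for reconstruction, so this ratio
    is optimistic; the content of the sequence (e.g. repeats) is ignored.
    """
    rng = random.Random(seed)
    counts = [reads_to_cover(genome_len, read_len, rng) for _ in range(trials)]
    return genome_len / statistics.mean(counts)
```

For example, `coverage_ratio(10_000, 100)` returns a value well below the read length of 100, because many extra reads are needed before random placement closes the last uncovered gaps.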

Written by physicsgg

April 2, 2012 at 2:14 pm

Posted in BIOLOGY
