genes in human genome


mfa5

The question was "to use a stretch of the human DNA genome (25k) as a unique identification tag, what length of DNA sequence would you use?"

The answer is below.

I am struggling to figure out where the exponents and bases came from?

"For calculations such as these, it is useful for purposes of estimation to remember that $4^5 \approx 10^3$ ($4^n$ produces the series: 4, 16, 64, 256, 1024; thus, $4^5 = 1024 \approx 10^3$) and that $(1/4)^5 \approx (1/10)^3$. Hence, 4 different nucleotides can generate 1024 different DNA sequences, each 5 nucleotides long. Similarly, an 8-nucleotide DNA sequence can provide enough diversity to tag 25,000 genes, there being $4^8$ or 65,536 possible 8-nucleotide sequences."

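The counting argument in the quoted answer can be checked with a short sketch. It assumes only that each position holds one of four nucleotides, so a tag of length n has 4^n possible values; the function name `min_tag_length` is hypothetical and used purely for illustration. Note that this only shows there are enough distinct labels to go around; it does not show that any particular 8-nucleotide stretch found in a real gene is unique.

```python
def min_tag_length(n_items: int, alphabet_size: int = 4) -> int:
    """Return the smallest length n with alphabet_size**n >= n_items."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    length = 0
    capacity = 1  # number of distinct sequences of the current length
    while capacity < n_items:
        capacity *= alphabet_size
        length += 1
    return length


# The series from the quoted answer: 4, 16, 64, 256, 1024
print([4**n for n in range(1, 6)])  # [4, 16, 64, 256, 1024]

# 4**7 = 16,384 is too few for 25,000 genes; 4**8 = 65,536 is enough.
print(min_tag_length(25_000))  # 8
```

The estimation trick in the quote is the same idea done in your head: every 5 nucleotides multiplies the number of possible sequences by roughly 1000.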

The question is completely unrelated to genes (not all sequences in the DNA code for genes). Instead, it asks how long a sequence has to be in order to be unique within a 25 kb region. The exponents come from the fact that each position can hold one of four bases (A, C, G, T), so a sequence of n positions has $4^n$ possible variants. Now, if your sequence has only one base, you will find that at each position of the 25 kb stretch you have a 1/4 chance of seeing that particular base. Obviously, this is not unique at all. So how long does it have to be?
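One way to put numbers on this reasoning is the rough sketch below. It assumes every base is independent and equally likely (real genomes are not like this), looks at one strand only, and treats "fewer than one expected chance match" as the cutoff, which is an arbitrary choice. The function names `expected_chance_matches` and `shortest_probe_length` are hypothetical.

```python
def expected_chance_matches(k: int, region_length: int, alphabet_size: int = 4) -> float:
    """Expected number of positions in the region where a given k-mer
    occurs by chance, assuming independent, equally likely bases."""
    positions = max(region_length - k + 1, 0)  # possible start positions
    return positions / alphabet_size**k


def shortest_probe_length(region_length: int, threshold: float = 1.0) -> int:
    """Smallest k whose expected number of chance matches is below threshold."""
    k = 1
    while expected_chance_matches(k, region_length) >= threshold:
        k += 1
    return k


# A single base is expected to match at a quarter of all positions.
print(expected_chance_matches(1, 25_000))  # 6250.0

# Under these assumptions, 8 bases is the first length expected
# to occur less than once by chance in a 25 kb region.
print(shortest_probe_length(25_000))  # 8
```

With a stricter cutoff (for example, fewer than 0.01 expected chance matches) the same function returns a longer length, so the answer depends on how much risk of a chance match is acceptable.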

 

Also moved to homework.

Edited by CharonY
Link to comment
Share on other sites

Thanks for this, and sorry my question was not clearer.

 

Given the starting point of the question (25,000 genes, 3.20E+9 nucleotides, and the need to use a stretch of DNA in each gene as a unique identification tag), I don't understand the maths. Where do 4.0E+5 and 10.0E+3 come from? In particular, why is it important to remember 4.0E+5 and 10.0E+3?

 

 

 

Thanks

 

 

 

 

 


Actually, re-reading the question, I have to say that I am not sure what it is really asking. It is not clear to me, for instance, what precisely should be unique: unique for an individual (i.e. a specific genome), unique for a genomic region, or unique for a gene. My initial assumption was that the question was about a stretch that would uniquely hybridize to a given 25 kb region, but that may very well not be the case.
