amino acid or nucleotide

huda · November 16, 2011

Hi,

I have sequences of integers (the ordering is meaningful), and I want to make a pairwise alignment between each pair of sequences.

What is the best way to encode these sequences?

Should I convert each sequence of integers into a nucleotide sequence?

Or convert each sequence of integers into an amino acid sequence?

Or convert each sequence of integers into a nucleotide sequence and then translate that into an amino acid sequence?

Does the length of the sequences affect which conversion method to use?

Many thanks

**CharonY** · November 16, 2011

This does not make much sense to me. Why would you want to convert integers into DNA/AA sequences? Conversion from DNA to AA creates a specific context (governed by the genetic code, as well as likelihood of certain point mutations). If the integer sequence was obtained from another source, the whole context does not apply (i.e. it does not matter what you do, it would be arbitrary in any case).

Arete · November 16, 2011

Noncoding DNA will generally not be translatable into AA.

What does your numerical data represent and what are you attempting to do with it?

There's probably a more appropriate method for analyzing it than translation into nucleotides...

huda · November 18, 2011

Sequence alignment is not related only to DNA sequences; the technique is also used to detect computer viruses and malicious packets in networks.

So some researchers have done that.

For example, the code of a computer virus is converted into a sequence of commands, and these commands are then translated into a nucleotide or amino acid sequence.

I mean that some researchers simulate DNA sequences and convert their data into biological sequences, provided of course that the data meet some conditions.

This article:

http://ultrastudio.org/en/Sequence_alignment

confirms what I am saying. Of course, it is not the only one.

There are papers and theses that did sequence alignment with non-biological data; I can provide them if you want.

My question was: which is the best way to encode non-biological data?

**CharonY** · November 18, 2011

You were asking about algorithms that are used for DNA or AA matching. From your own link you will see that the type of information is relevant there, specifically for the scoring matrix. Of course, the problem can be generalized to general pattern matching; however, as Arete mentioned, other algorithms, or different implementations of the algorithms used for DNA and protein alignment, are more useful for that.

Maybe to illustrate:

Compare the amino acid sequence SPT to SPT: apparently there is no difference. However, let us say that on the DNA level it is TCT CCT ACT versus TCC CCC ACC. You will note that, despite the identical AA sequence, at least three substitutions happened. The scoring matrix assigns a specific distance to that.
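The SPT example can be checked with a few lines of Python. This is only a minimal sketch: the codon table is deliberately partial (just the six codons used above), and the helper names `translate` and `hamming` are made up for illustration.

```python
# Partial codon table: only the six codons used in the example above.
CODON_TO_AA = {
    "TCT": "S", "TCC": "S",
    "CCT": "P", "CCC": "P",
    "ACT": "T", "ACC": "T",
}


def translate(dna):
    """Translate a DNA string codon by codon using the partial table."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


dna1 = "TCTCCTACT"
dna2 = "TCCCCCACC"

aa1, aa2 = translate(dna1), translate(dna2)
dna_diff = hamming(dna1, dna2)  # differences on the nucleotide level
aa_diff = hamming(aa1, aa2)     # differences on the amino acid level

print(aa1, aa2)           # SPT SPT
print(dna_diff, aa_diff)  # 3 0
```

The same pair of sequences is three substitutions apart on one level and identical on the other, so the measured distance depends on the alphabet in which the data are written.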

Or another example. Assume you have a substitution of S to I on the AA level. If you just measure letter exchange, it would be the same as, say, C to W. However, in the first case the exchange may have been anything from TCT/C/A/G to ATT/C/A (so at least two substitutions from these serine codons). From C to W, however, it would be from TGT/C to TGG (so only one substitution is needed). The substitution matrices take these possibilities into account (funnily enough, BLOSUM62 had an error in it, but BLAST performed better with the error).
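The codon argument can be made concrete by computing the smallest number of nucleotide changes between two sets of codons. The sketch below assumes the standard genetic code; the function names are illustrative. Serine also has two further codons (AGT and AGC) that the example above does not list, so they are kept in a separate list.

```python
from itertools import product

# Codons from the standard genetic code (DNA alphabet).
SER_TCN = ["TCT", "TCC", "TCA", "TCG"]  # serine codons named in the example
SER_AGY = ["AGT", "AGC"]                # the two remaining serine codons
ILE = ["ATT", "ATC", "ATA"]
CYS = ["TGT", "TGC"]
TRP = ["TGG"]


def codon_distance(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(codons_from, codons_to):
    """Smallest codon distance over all pairs drawn from the two codon sets."""
    return min(codon_distance(a, b) for a, b in product(codons_from, codons_to))


ser_to_ile = min_substitutions(SER_TCN, ILE)      # 2
cys_to_trp = min_substitutions(CYS, TRP)          # 1
ser_agy_to_ile = min_substitutions(SER_AGY, ILE)  # 1

print(ser_to_ile, cys_to_trp, ser_agy_to_ile)
```

Starting from a TCN codon, S to I needs two changes while C to W needs one, as described above. Starting from AGT or AGC, a single change is enough, so the result depends on which serine codon is assumed.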

However, for an arbitrary string in which substitutions do not follow a specific rule (such as, in this case, that of DNA mutations), the distance would be based on a totally different measure, or could be binary.
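With a binary measure, an alignment can be run directly on the integer sequences, with no conversion to nucleotides or amino acids. The following is a minimal sketch of Needleman-Wunsch global alignment under assumed scores (match = 1, mismatch = -1, gap = -1); the function name and the sample data are illustrative and not taken from any particular library or paper.

```python
def align_global(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two sequences of comparable items.

    Returns (score, aligned_a, aligned_b); a gap is represented by None.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


x = [10, 250, 7, 11000, 42]
y = [10, 7, 11000, 42]
best, ax, ay = align_global(x, y)
print(best)  # 3
print(ax)    # [10, 250, 7, 11000, 42]
print(ay)    # [10, None, 7, 11000, 42]
```

Because items are only compared for equality, the alphabet can be as large as needed. If some integers are more similar to each other than others, the fixed match and mismatch values could be replaced by a scoring function designed for that data.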

Edited November 18, 2011 by CharonY

huda · November 25, 2011

Thank you very much for the information.

I'm a beginner in this field, so maybe I do not need to understand it deeply.

I'm still in doubt whether I need that.

Your comment really drew my attention to important things. I understood your examples; they are good examples.

But there is still something that is not clear.

Please be patient with me.

Regarding my data (sequences of integers): do you mean that if I have non-biological data, I should not use AA or DNA sequences?

Or can I do that?

My data is integers. The language that I use provides a tool to convert integers into AA or DNA sequences, but I understood from you that in this case I will get arbitrary sequences.

Right? The problem is that the authors who used sequence alignment did not point that out, so I'm confused.

Please, if you have any links related to sequence alignment in computer science, share them with me.

Thanks

huda · November 26, 2011

Please, let me focus my question.

Suppose the range of my data is larger than 4 (a, c, g, t) and larger than 20 (in the case of amino acids).

Can I do it this way?

Let us say the maximum value in my data is 11000. That is less than 4^7 with base 4 (g, t, c, a), or less than 20^4 with base 20 in the case of amino acids, so each integer could be written as a group of 7 nucleotides or 4 amino acids.

Is such a composition of symbols allowed in an alignment?

Thanks
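The arithmetic in this question can be checked with a short sketch. It assumes a fixed-width positional encoding, in which each integer is written as a base-4 or base-20 number with one letter per digit. The helper names and the letter-to-digit assignment are arbitrary choices made for illustration.

```python
NUCLEOTIDES = "ACGT"                  # base 4; the digit order is an arbitrary choice
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # base 20; the digit order is an arbitrary choice


def digits_needed(max_value, base):
    """Smallest k with base**k > max_value, so 0..max_value fit in k symbols."""
    k = 1
    while base ** k <= max_value:
        k += 1
    return k


def encode(value, alphabet, width):
    """Write value as a fixed-width string of digits over the given alphabet."""
    base = len(alphabet)
    if not 0 <= value < base ** width:
        raise ValueError("value does not fit in the requested width")
    symbols = []
    for _ in range(width):
        value, remainder = divmod(value, base)
        symbols.append(alphabet[remainder])
    return "".join(reversed(symbols))


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


width_nt = digits_needed(11000, 4)   # 7, since 4**6 = 4096 and 4**7 = 16384
width_aa = digits_needed(11000, 20)  # 4, since 20**3 = 8000 and 20**4 = 160000

print(encode(11000, NUCLEOTIDES, width_nt))  # GGGTTGA

# One differing integer can cost anywhere from 1 to 7 symbol mismatches.
far = mismatches(encode(0, NUCLEOTIDES, 7), encode(4096, NUCLEOTIDES, 7))      # 1
near = mismatches(encode(4095, NUCLEOTIDES, 7), encode(4096, NUCLEOTIDES, 7))  # 7
print(far, near)
```

So 7 nucleotides or 4 amino acids per integer are enough for values up to 11000. The last two lines show the side effect of composing symbols: a single differing integer is scored as one to seven symbol mismatches, depending only on how its digits happen to fall. A symbol-level aligner may also place a gap inside a group, splitting one integer across two positions.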
