
amino acid or nucleotide



Hi,

I have sequences of integers (the order of the values is meaningful), and I want to do a pairwise alignment between every two sequences.

What is the best way to encode these sequences?

Should I convert each sequence of integers into a nucleotide sequence?

Or convert each sequence of integers into an amino acid sequence?

Or convert each sequence of integers into a nucleotide sequence, and then convert that into an amino acid sequence?

Does the length of the sequences have any bearing on the conversion method?

Many thanks


This does not make much sense to me. Why would you want to convert integers into DNA/AA sequences? Conversion from DNA to AA creates a specific context (governed by the genetic code, as well as likelihood of certain point mutations). If the integer sequence was obtained from another source, the whole context does not apply (i.e. it does not matter what you do, it would be arbitrary in any case).


Noncoding DNA will generally not be translatable into AA.

 

What does your numerical data represent and what are you attempting to do with it?

 

There's probably a more appropriate method for analyzing it than translation into nucleotides...


Sequence alignment is not only related to DNA sequences. The technique is also used to detect computer viruses and malicious packets in networks.

So there is some research that has done that.

For example: the code of a computer virus is converted into a sequence of commands, and then these commands are translated into a nucleotide or amino acid sequence.

What I want to say is that some researchers mimic DNA sequences and convert their data into a biological sequence, but of course only if the data meet some conditions.

This article:

http://ultrastudio.org/en/Sequence_alignment

confirms what I am saying. Of course, it is not the only one.

There are papers and theses that did sequence alignment with non-biological data; I can provide them if you want.

My question was: which is the best way to encode non-biological data?


You were asking for algorithms that are used for DNA or AA matches. From your own link you will realize that the type of information is relevant there, specifically for the scoring matrix. Of course, the problem can be generalized to general pattern matching; however, as Arete mentioned, other algorithms, or different implementations of the algorithms used for DNA and protein alignment, are more useful for that.

 

Maybe to illustrate:

Compare the amino acid sequence SPT to SPT. Apparently no difference. However, let us say that on the DNA level it is TCT CCT ACT versus TCC CCC ACC. You will note that despite the same AA sequence, at least three substitutions happened. The scoring matrix assigns that a specific distance.

Or another example. Assume you have a substitution of S to I on the AA level. If you just measure letter exchange, it would be the same as, say, C to W. However, in the first case the exchange may have been anything from TCT/C/A/G to ATT/C/A (so at least two substitutions when S is encoded by one of these TCN codons). From C to W, however, it would be from TGT/C to TGG (so only one substitution needed). The substitution matrices take these possibilities into account (funnily, BLOSUM62 had an error in it, but BLAST performed better with the error).
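To make the codon arithmetic in these examples concrete, here is a minimal Python sketch that counts the nucleotide substitutions separating two DNA strings or two amino acids. The `CODONS` table lists only the standard-genetic-code codons for the four amino acids mentioned here, and the names `hamming` and `min_substitutions` are made up for illustration. Note that serine also has the codons AGT and AGC, and from those a single substitution is enough to reach isoleucine, so the two-substitution figure applies to the TCN codons only.

```python
from itertools import product

# Standard genetic code (DNA letters), restricted to the amino acids
# used in the examples: serine, isoleucine, cysteine, tryptophan.
CODONS = {
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "I": ["ATT", "ATC", "ATA"],
    "C": ["TGT", "TGC"],
    "W": ["TGG"],
}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(aa_from, aa_to, codons_from=None):
    """Fewest nucleotide changes that turn a codon of aa_from into one of aa_to."""
    sources = codons_from if codons_from is not None else CODONS[aa_from]
    return min(hamming(s, t) for s, t in product(sources, CODONS[aa_to]))


# SPT vs SPT: identical protein, three differences at the DNA level.
print(hamming("TCTCCTACT", "TCCCCCACC"))      # 3

# C -> W needs one change; S -> I needs two from the TCN codons.
tcn = [c for c in CODONS["S"] if c.startswith("TC")]
print(min_substitutions("C", "W"))            # 1
print(min_substitutions("S", "I", tcn))       # 2
print(min_substitutions("S", "I"))            # 1 (AGT -> ATT)
```

None of this structure exists for integers that come from some other source, which is why a genetic-code-based distance has no meaning for them.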

 

However, for an arbitrary string in which substitutions do not follow a specific rule (as in this case that of DNA mutations), the distance would be based on a totally different measure, or could be binary.
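To illustrate the binary option, here is a minimal sketch of a global (Needleman-Wunsch style) alignment that works directly on sequences of integers, with no conversion to nucleotide or amino acid letters. The function name `align` and the scores (match +1, mismatch -1, gap -1) are arbitrary assumptions for the example, not recommended values; a real application would choose scores that reflect what the integers mean.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences of arbitrary hashable symbols.

    Returns (score, aligned_a, aligned_b), where None marks a gap.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Binary scoring: symbols are either equal or not.
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back one optimal path from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None); i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


total, row_a, row_b = align([7, 11000, 42, 5], [7, 42, 5])
print(total)   # 2
print(row_a)   # [7, 11000, 42, 5]
print(row_b)   # [7, None, 42, 5]
```

Because the comparison is plain equality, the alphabet can be as large as the data require, and nothing about the scoring pretends that the values behave like mutating DNA.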

Edited by CharonY

Thank you very much for the information.

I'm a beginner in this field, so maybe I do not need to know it in depth.

I still doubt whether I need that.

Really, your comment drew my attention to important things. I understood your examples; they are good examples.

But there is still something that is not clear.

Please be patient with me.

Regarding my data (sequences of integers): do you mean that if I have non-biological data, I should not use AA or DNA sequences?

Or can I do that?

My data are integers, and the language I use provides a tool to convert integers into AA or DNA sequences, but I understood from you that in this case I will get arbitrary sequences.

Right? The problem is that the authors who used sequence alignment did not point that out. So I'm confused.

Please, if you have any links related to sequence alignment in computer science, share them with me.

Thanks


Please, let me narrow down my question.

The range of my data is larger than 4 (a, c, g, t) and larger than 20 (in the case of amino acids).

Can I do it this way?

Let's say the maximum value in my data is 11000. Then each integer needs 7 symbols if the base is 4 (g, t, c, a), since 4^7 = 16384, or 4 symbols if the base is 20 in the case of amino acids, since 20^4 = 160000.

Is combining several symbols to represent one value allowed in alignment?

Thanks
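For reference, the multi-symbol encoding described in this question can be sketched as a fixed-width base conversion. This is only an illustration of the arithmetic, assuming non-negative integers; the helper names `width_needed` and `encode` and the letter orderings in `DNA` and `AMINO` are hypothetical choices, and any other ordering would be equally arbitrary.

```python
DNA = "ACGT"                      # base 4
AMINO = "ACDEFGHIKLMNPQRSTVWY"    # base 20, one-letter amino acid codes


def width_needed(max_value, base):
    """Smallest number of digits such that base**width > max_value."""
    width = 1
    while base ** width <= max_value:
        width += 1
    return width


def encode(value, alphabet, width):
    """Write a non-negative integer as a fixed-width word over `alphabet`."""
    base = len(alphabet)
    if not 0 <= value < base ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


print(width_needed(11000, 4))     # 7
print(width_needed(11000, 20))    # 4
print(encode(11000, DNA, 7))      # GGGTTGA
print(encode(11000, AMINO, 4))    # CIMA
```

One consequence to keep in mind with such an encoding: a letter-based aligner sees seven (or four) independent symbols per integer, so it may insert gaps in the middle of a word, and two different integers can score as partly similar purely because some of their digits happen to coincide.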
