
DNA SEQUENCE ALIGNMENT


huda


Hi,

I'm a PhD student working in the DNA sequence field, but with non-biological data.

My problem is that I don't know which alignment to use with my data: global or local.

What should I base the decision between global and local alignment on?

If anyone needs more details about my problem, let me know. If you need a sample of my data, I can send it.

I appreciate any advice.

Thanks


It depends on what you want to see. Local alignments are better at detecting local similarities and are generally better suited to sequences that are dissimilar overall but contain smaller conserved regions. Global alignments, however, are better at aligning long stretches of sequence end to end. Again, it depends a lot on the type of sequences you have (e.g. their overall similarity) and what you want to see.
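To make the difference concrete, here is a minimal pure-Python sketch that scores the same pair of sequences both ways: Needleman-Wunsch for a global alignment and Smith-Waterman for a local one. The scoring (match +2, mismatch -1, linear gap -2) is an arbitrary assumption for illustration; real tools use substitution matrices and usually affine gap penalties.

```python
# Toy scoring scheme (an assumption for illustration, not a standard matrix).
MATCH, MISMATCH, GAP = 2, -1, -2


def sub_score(a, b):
    return MATCH if a == b else MISMATCH


def global_score(s, t):
    """Needleman-Wunsch: best score for aligning s and t end to end."""
    prev = [j * GAP for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        cur = [i * GAP]
        for j, b in enumerate(t, start=1):
            cur.append(max(prev[j - 1] + sub_score(a, b),  # (mis)match
                           prev[j] + GAP,                  # gap in t
                           cur[j - 1] + GAP))              # gap in s
        prev = cur
    return prev[-1]


def local_score(s, t):
    """Smith-Waterman: best score over all pairs of substrings of s and t."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            val = max(0,  # 0 means "start a fresh local alignment here"
                      prev[j - 1] + sub_score(a, b),
                      prev[j] + GAP,
                      cur[j - 1] + GAP)
            cur.append(val)
            best = max(best, val)
        prev = cur
    return best


# Dissimilar overall, but sharing the conserved core ACGTACGT.
s = "TTTTTTACGTACGTGGGGGG"
t = "CCCCACGTACGTAAAA"
print("global:", global_score(s, t))
print("local: ", local_score(s, t))
```

Here the local score is 16 (the eight shared letters at +2 each), while the global score is pulled down by the unrelated flanks that a global alignment must still account for.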


Well, most of the time you would have some idea of how similar they are. If you really have no clue, then I would just try two common algorithms, one for each case.

I'm very new to this field, so I have no idea.

I appreciate your help.

Thanks


OK, do you have two or more sequences? And, more importantly, what do you want to see?

Thanks,

I have two sequences, so I have to use pairwise alignment, right?

I'm not sure, but I think my problem can be solved with pairwise alignment rather than multiple sequence alignment.

Briefly: I am trying to find the similarity (homology) within a set of data in order to form clusters.

My dataset comes from an online forum (a social network).

If you need more details or a sample of the data, let me know.
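A minimal sketch of how pairwise alignment scores can feed a clustering step, assuming each item (for instance, one user's action history) has already been turned into a string of symbols. The alphabet P/R/V/L, the user data, the scoring values and the 0.5 threshold are all hypothetical. Each pair is scored with a Smith-Waterman local alignment, the score is scaled to [0, 1] by the smaller self-alignment score, and items whose similarity reaches the threshold are linked (single-linkage clustering via union-find).

```python
from itertools import combinations


def sw_score(s, t, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (score only, linear gaps)."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            val = max(0, prev[j - 1] + (match if a == b else mismatch),
                      prev[j] + gap, cur[j - 1] + gap)
            cur.append(val)
            best = max(best, val)
        prev = cur
    return best


def similarity(s, t):
    """Scale the pair score into [0, 1] by the smaller self-alignment score."""
    denom = min(sw_score(s, s), sw_score(t, t))
    return sw_score(s, t) / denom if denom else 0.0


def cluster(users, threshold=0.5):
    """Single-linkage clustering: link users whose similarity >= threshold."""
    parent = {u: u for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in combinations(users, 2):
        if similarity(users[u], users[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in users:
        groups.setdefault(find(u), []).append(u)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical activity strings: P = post, R = reply, V = view, L = login.
users = {
    "u1": "LVVPRRV",
    "u2": "LVVPRRVV",
    "u3": "LPPPPPL",
    "u4": "VPPPPPV",
}
print(cluster(users))  # [['u1', 'u2'], ['u3', 'u4']]
```

Scoring every pair costs on the order of N² alignments, so for many users a library aligner and a proper hierarchical-clustering routine would be the practical route; the sketch only shows how the pieces fit together.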


Yes, pairwise is appropriate (though not strictly necessary). I am still not sure what the sequences are supposed to look like and what kind of clusters you want (or what you mean by clusters, for that matter). For starters, I would just pick a tool that uses, for example, an implementation of Smith-Waterman and take it from there. Again, since I am not quite sure what you really want, I would just hunt down some tools and play around with them.
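If you want to see what a Smith-Waterman implementation does before picking a tool, here is a minimal pure-Python sketch with traceback, so the aligned region is returned along with the score. It uses a simple match/mismatch/linear-gap scheme (the values are assumptions) and O(nm) time and memory, so it is only meant for short sequences.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_s, aligned_t) for the best local alignment."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    a, b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:      # came from the diagonal
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:        # gap in t
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:                                     # gap in s
            a.append("-"); b.append(t[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))
```

For example, `smith_waterman("TTTTACGTACGTGGG", "CCACGTACGTAA")` returns `(16, 'ACGTACGT', 'ACGTACGT')`: only the shared core is reported, and the unrelated flanks are ignored.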


Thanks,

I was waiting for your reply; I badly need it.

OK, you need more details about my work.

As I said earlier, my data is not biological: it is a dataset from an online forum, about users who subscribe to that forum.

These data represent the users' actions over time. I would like to find clusters of users who are alike in terms of their activities (behaviour) over time.

You said that the choice of alignment type depends on the data.

So my remaining question is: how can I know whether my data contains local or global similarity?

Have I been clear?

Many, many thanks
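One rough way to check, offered as a rule of thumb rather than a standard test: compute the best local (Smith-Waterman) alignment and see how much of each sequence it spans. If it covers nearly all of both sequences, a global alignment will tell much the same story; if it covers only a small part, the similarity is local. The scoring values and the 0.8 cut-off below are arbitrary assumptions, and the sequences must be non-empty.

```python
def local_coverage(s, t, match=2, mismatch=-1, gap=-2):
    """Fraction of s and of t spanned by the best Smith-Waterman alignment."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    # Walk back to where the local alignment starts (the first zero cell).
    i, j = end
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return (end[0] - i) / n, (end[1] - j) / m


def suggest_mode(s, t, min_coverage=0.8):
    """Heuristic: 'global' if the local alignment spans most of both inputs."""
    cov_s, cov_t = local_coverage(s, t)
    return "global" if min(cov_s, cov_t) >= min_coverage else "local"


print(suggest_mode("ACGTACGTAC", "ACGTACGTAC"))                      # global
print(suggest_mode("TTTTTTACGTACGTGGGGGG", "CCCCACGTACGTAAAA"))      # local
```

Running this over a sample of pairs from the dataset gives a feel for which case dominates; the threshold itself has to be chosen from the data.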

 


Sorry, I forgot to tell you that I converted my data into protein sequences.

I started a separate thread on this topic.

My data values range from 0 to 1600, so each value needs 11 bits in binary.

Then I convert the binary string to a protein sequence.

Since there are 20 amino acids, I took every five bits and converted them to one amino acid.

Is that representation appropriate?
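A quick check on that scheme: 1600 does need 11 bits (2^10 = 1024 < 1600 < 2048 = 2^11), but 11 bits do not split evenly into 5-bit chunks, and a 5-bit chunk can take 32 values (0 to 31) while there are only 20 amino-acid letters, so 12 chunk values have no letter. One possible alternative, which is not part of the original scheme, is to write each value directly in base 20 using the 20 standard one-letter codes; since 20^3 = 8000 > 1600, three letters per value are enough. A minimal sketch:

```python
# The 20 standard one-letter amino-acid codes, used here only as an alphabet.
# (A 5-bit chunk has 2**5 = 32 possible values, more than these 20 letters.)
AMINO = "ACDEFGHIKLMNPQRSTVWY"


def bits_needed(max_value):
    """Number of binary digits needed to write max_value (max_value >= 1)."""
    return max_value.bit_length()


def encode_base20(value, width=3):
    """Write value as `width` base-20 digits, one amino-acid letter per digit."""
    if not 0 <= value < 20 ** width:
        raise ValueError(f"{value} does not fit in {width} base-20 digits")
    letters = []
    for _ in range(width):
        value, digit = divmod(value, 20)
        letters.append(AMINO[digit])
    return "".join(reversed(letters))  # most significant digit first


def decode_base20(code):
    """Inverse of encode_base20."""
    value = 0
    for ch in code:
        value = value * 20 + AMINO.index(ch)
    return value


print(bits_needed(1600))                        # 11
print(encode_base20(1600))                      # FAA
print(encode_base20(399), encode_base20(400))   # AYY CAA
```

Note that neither encoding makes similar letters mean similar numbers: 399 becomes AYY and 400 becomes CAA, which differ at every position, so standard amino-acid substitution scores would not treat them as close.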


OK, I still do not understand the purpose; however, the problem I see is with the substitution (distance) matrices. Substitution scores between amino acids are related to changes at the base level (i.e. they are connected to the genetic code). So an amino acid exchange that requires only one base change is treated differently from one that takes two, for instance.

Since your amino acid string is based on a completely different system, the distance estimates will be off. In fact, I think what you have is a plain computational problem that I am not qualified to solve. Since the string has no biological basis, you cannot apply the same theoretical framework. As far as I understand, the only reason to call it an amino acid sequence is that you use the same 20-letter code. I am going to move this to the computational science section; maybe someone else can look it over.
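Following that point, one option is to skip the amino-acid letters and align the original numeric values with a scoring function that reflects what the numbers mean. A minimal sketch, assuming the raw integer sequences are available: a Needleman-Wunsch global alignment that takes the substitution score as a parameter. `tolerant_sub` (values within a hypothetical tolerance of 50 count as a match) is only an illustration; the right scoring depends on what the values encode.

```python
from typing import Callable, Sequence


def global_align_score(s: Sequence[int], t: Sequence[int],
                       sub: Callable[[int, int], float],
                       gap: float = -2.0) -> float:
    """Needleman-Wunsch score with a caller-supplied substitution function."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, x in enumerate(s, start=1):
        cur = [i * gap]
        for j, y in enumerate(t, start=1):
            cur.append(max(prev[j - 1] + sub(x, y),
                           prev[j] + gap,
                           cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def exact_sub(x: int, y: int) -> float:
    """Identity scoring: only equal values count as a match."""
    return 1.0 if x == y else -1.0


def tolerant_sub(x: int, y: int, tol: int = 50) -> float:
    """Hypothetical scoring: values within `tol` of each other count as a match."""
    return 1.0 if abs(x - y) <= tol else -1.0


a = [100, 800, 1500]
b = [120, 790, 1490]
print(global_align_score(a, b, exact_sub))     # -3.0
print(global_align_score(a, b, tolerant_sub))  # 3.0
```

With identity scoring the two sample sequences look completely different, while the tolerant scoring treats them as three matches; the choice of scoring function carries the domain knowledge that a biological matrix would otherwise supply.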


Hi,

Sir, please give me a chance to explain my work in more detail; it is not what you think.

I can't give the details in an online forum. Could I have your email? Mine is halmamory@yahoo.com

Please, I badly need your advice.

Thanks in advance


Mr. Huda, I've not worked on the DNA alignment problem, but I have experience with algorithms, so let me know if you need any help designing your algorithms.

Anyway, you should know that overall similarity depends on local similarities, so you have to plan how your local similarities can lead you to a global optimum.
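This idea of building the overall optimum from smaller pieces is what dynamic-programming alignment exploits. In the standard Needleman-Wunsch recurrence, written here with a substitution score σ and a linear gap penalty g < 0 (notation chosen for this sketch), the best score F(i, j) for the prefixes s_1..s_i and t_1..t_j is computed from the optimal scores of shorter prefixes:

```latex
\[
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,g, \qquad F(0,j) = j\,g,\\
F(i,j) &= \max\bigl\{\, F(i-1,j-1) + \sigma(s_i, t_j),\; F(i-1,j) + g,\; F(i,j-1) + g \,\bigr\}.
\end{aligned}
\]
```

The global score is F(n, m). The Smith-Waterman (local) variant adds 0 as a fourth option inside the max, starts from F(i, 0) = F(0, j) = 0, and reports the largest F(i, j) anywhere in the table.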


  • 3 weeks later...


I left a private message for you.


Problem: DNA Alignment

 

Type: Sequence Alignment -- see Wikipedia

 

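Complexity: pairwise alignment is polynomial, O(nm) time with dynamic programming for sequences of lengths n and m; finding an optimal multiple sequence alignment under the sum-of-pairs score is NP-hard.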

 

Algorithms:

- Heuristic Search

- Linear Optimization

- Genetic Programming

- Probabilistic Methods

- Dynamic Programming

- Global Optimization

 

You have to specify your needs: do you prefer speed over solution quality, or would you rather use a slow method that gives good results? The size of the DNA database matters too!

 

Based on those answers, you will be able to choose the algorithm that fits.
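As one illustration of trading quality for time, a banded alignment fills only the dynamic-programming cells within a fixed distance `band` of the main diagonal. A minimal sketch of a banded Needleman-Wunsch score (the scoring values and names are assumptions): it scores O(n · band) cells instead of O(n · m), but if the optimal alignment path strays further than `band` from the diagonal, the result is only a lower bound on the true global score.

```python
NEG = float("-inf")  # marks cells outside the band


def banded_global_score(s, t, band, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score restricted to cells with |i - j| <= band.

    Only paths that stay inside the band are considered, so the result is a
    lower bound on the full global score (equal to it when band is wide enough).
    """
    n, m = len(s), len(t)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    # Full rows are kept for simplicity; only the inner loop is banded.
    prev = [j * gap if j <= band else NEG for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [NEG] * (m + 1)
        if i <= band:
            cur[0] = i * gap
        for j in range(max(1, i - band), min(m, i + band) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


print(banded_global_score("ACGT", "AGT", band=1))  # 4
```

A narrow band is fast and works well when the sequences are already similar; a wide band (at least the longer length) reproduces the full algorithm at full cost.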

Edited by khaled