
similarity matrix of sequence alignment


huda


Hi,

I applied sequence alignment to a social-network dataset and obtained a similarity matrix.

For example, z is a 3×3 similarity matrix:

z = [0 2 3;
     2 0 4;
     3 4 0];

When I ran the sequence alignment, I skipped comparing each element with itself, so the diagonal is all zeros. Is this processing correct?

What is the best method for clustering, given a similarity matrix from sequence alignment?

And if the similarity matrix has zeros on the diagonal, does that affect the clustering results?

Thanks in advance.


If it's a similarity matrix, the diagonal should be 1s (0s would indicate a distance matrix).

As for clustering, it really depends on your task and goals. Hierarchical clustering is a fairly standard approach, however.
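A minimal sketch of hierarchical (agglomerative) clustering with average linkage in plain Python, assuming the scores have already been put on a 0-to-1 similarity scale with 1s on the diagonal and then converted to distances as 1 - similarity. The function name `average_linkage` and the 4×4 matrix are hypothetical, chosen only for illustration; in practice a library routine such as SciPy's hierarchical clustering or R's `hclust` would normally be used.

```python
from itertools import combinations


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage on a square distance matrix.

    Repeatedly merges the two closest clusters until only k remain and
    returns them as sorted lists of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # Mean pairwise distance between members of two disjoint clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Hypothetical similarity matrix on a 0-1 scale (1s on the diagonal).
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
dist = [[1.0 - s for s in row] for row in sim]  # distance = 1 - similarity
print(average_linkage(dist, 2))  # [[0, 1], [2, 3]]
```

Because merged clusters are always disjoint, this sketch never reads the diagonal of `dist`, so for this particular algorithm the value stored there does not change the result. What does matter is the direction of the scale: the code assumes smaller means closer, so raw similarity scores have to be converted before clustering.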


Why must the diagonal be 1? If I run the sequence alignment and compare an object with itself, it gets the maximum score because it matches completely.

Please clarify.

Thanks


For a similarity score on a 0-to-1 scale, 0 means not at all similar and 1 means identical. The diagonal of a square similarity matrix should therefore be 1s. Distance is 1 - similarity, so if your maximum score is 0, you are measuring distance, not similarity.
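The conversion described above can be sketched as follows, assuming the similarities are already on a 0-to-1 scale. `similarity_to_distance` is a hypothetical helper; it also rejects matrices whose diagonal is not 1 or whose entries fall outside [0, 1], since those do not fit the 0-to-1 convention.

```python
def similarity_to_distance(sim, tol=1e-9):
    """Convert a 0-1 similarity matrix into a distance matrix via d = 1 - s.

    Raises ValueError if the diagonal is not 1 or any entry lies outside
    [0, 1], i.e. if the input does not look like a normalized similarity.
    """
    n = len(sim)
    for i in range(n):
        if abs(sim[i][i] - 1.0) > tol:
            raise ValueError(f"diagonal entry {i} is {sim[i][i]}, expected 1")
        for j in range(n):
            if not -tol <= sim[i][j] <= 1.0 + tol:
                raise ValueError(f"entry ({i}, {j}) = {sim[i][j]} is outside [0, 1]")
    return [[1.0 - s for s in row] for row in sim]


sim = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.7],
    [0.5, 0.7, 1.0],
]
dist = similarity_to_distance(sim)
print([dist[i][i] for i in range(3)])  # [0.0, 0.0, 0.0]: identical items are at distance 0

z = [[0, 2, 3], [2, 0, 4], [3, 4, 0]]  # the matrix from the original post
try:
    similarity_to_distance(z)
except ValueError as err:
    print("rejected:", err)
```

Passing the original `z` (zeros on the diagonal, raw scores up to 4) raises `ValueError`, which is the point made in the reply: that matrix is not on a 0-to-1 similarity scale.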

No, I don't mean the maximum score is zero. I mean that if I have to fill the diagonal, each entry should be the maximum score, which results from comparing the element with itself.

So you mean similarity is 1 - distance, and that is why the diagonal should be 1? If you have a reference for this, please point me to it.

Thanks
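To illustrate the point that aligning an element with itself gives the maximum score, here is a sketch of a standard global alignment score (Needleman-Wunsch). The scoring values (match +1, mismatch -1, gap -1) and the letter sequences are assumptions for illustration only; the original post does not say which alignment method or scores were used.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b.

    The default scores are illustrative assumptions, not values from the thread.
    """
    # prev is the previous DP row: prev[j] is the best score for aligning
    # the prefix of a processed so far with the first j symbols of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                          prev[j] + gap,       # a[i-1] against a gap
                          curr[j - 1] + gap)   # b[j-1] against a gap
        prev = curr
    return prev[-1]


seqs = ["ABCA", "ABDA", "ABCAB"]  # hypothetical event sequences
scores = [[global_alignment_score(x, y) for y in seqs] for x in seqs]
print(scores)  # [[4, 2, 3], [2, 4, 1], [3, 1, 5]]
```

With these scores the diagonal holds each sequence's self-score, which is the largest entry in its row, but the self-scores differ with sequence length (4 versus 5). That is why raw alignment scores are not directly comparable to a 0-to-1 similarity and usually need some normalization.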

Think about distance in Euclidean space. What is the distance from a point to itself? Obviously zero, since it's the same point. So if you're looking at a distance matrix, there should be zeros on the diagonal.

Now, the rest depends on what metric you're using, but if you have zeros on the diagonal, I suspect you're using a distance metric rather than a similarity metric. That is perfectly fine for clustering applications, but it helps to be clear about what you're doing.
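A quick check of the Euclidean case, using hypothetical 2-D points: the distance matrix has zeros on the diagonal, while a similarity derived from it has 1s there. The conversion 1 / (1 + d) is only one common choice, assumed here for illustration; `math.dist` requires Python 3.8 or later.

```python
import math


def euclidean_distance_matrix(points):
    """Pairwise Euclidean distances between points given as coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def distance_to_similarity(dist):
    """Map distances to similarities with 1 / (1 + d), so d = 0 becomes 1."""
    return [[1.0 / (1.0 + d) for d in row] for row in dist]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # hypothetical 2-D points
dist = euclidean_distance_matrix(points)
sim = distance_to_similarity(dist)
print(dist)                           # [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
print([sim[i][i] for i in range(3)])  # [1.0, 1.0, 1.0]
```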

Thanks.

I already know all of that.

What I want to know is this. Suppose I have the following matrix:

10 2 4
2  5 3
4  3 6

If the diagonal of a similarity matrix must be 1, would the matrix become:

1    2/10  4/10
2/5  1     3/5
4/6  3/6   1

Is that what you mean?

Thanks
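The row-by-row division proposed above can be sketched next to one common symmetric alternative, dividing s[i][j] by sqrt(s[i][i] * s[j][j]), which is not something suggested in the thread and is shown here only for comparison. Both give 1s on the diagonal, but dividing each row by its own diagonal entry makes the matrix asymmetric (2/10 versus 2/5), whereas hierarchical clustering assumes d(i, j) = d(j, i). The function names are hypothetical.

```python
import math


def normalize_by_row_diagonal(s):
    """Divide each row by its own diagonal entry (the scheme proposed above)."""
    return [[v / row[i] for v in row] for i, row in enumerate(s)]


def normalize_symmetric(s):
    """Divide s[i][j] by sqrt(s[i][i] * s[j][j]); the result stays symmetric."""
    n = len(s)
    return [[s[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(n)]
            for i in range(n)]


s = [
    [10, 2, 4],
    [2, 5, 3],
    [4, 3, 6],
]
row_norm = normalize_by_row_diagonal(s)
sym_norm = normalize_symmetric(s)
print(row_norm[0][1], row_norm[1][0])  # 0.2 0.4: no longer symmetric
print(round(sym_norm[0][1], 4), round(sym_norm[1][0], 4))  # 0.2828 0.2828
```

Neither scheme is guaranteed to land in [0, 1] for every scoring system (global alignment scores can be negative, for instance), so it is worth checking the range after normalizing.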


There's a multitude of potentially applicable statistical analyses: k-means clustering, hierarchical bootstrapped clustering, Bayesian information criterion (BIC) model-based clustering, discriminant function of principal components... Knowing which one is most appropriate for your data means knowing which assumptions best fit it. For example, bootstrapped clustering would assume that you could apply Euclidean distances to your data.

Can you use R?

http://www.statmethods.net/advstats/cluster.html
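Whether Euclidean-distance assumptions are reasonable can be partly checked from the matrix itself. A minimal sketch with a hypothetical helper `triangle_violations`: any Euclidean distance matrix satisfies the triangle inequality d(i, k) <= d(i, j) + d(j, k), so a violation rules out Euclidean coordinates. Passing the check is necessary but not sufficient, and a matrix built as 1 - similarity is not guaranteed to pass it. Both example matrices are made up for illustration.

```python
from itertools import permutations


def triangle_violations(dist, tol=1e-9):
    """Return (i, j, k) triples where dist[i][k] > dist[i][j] + dist[j][k].

    Any hit means the matrix cannot come from Euclidean coordinates; an empty
    result is necessary but not sufficient for Euclidean distances.
    """
    n = len(dist)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        if dist[i][k] > dist[i][j] + dist[j][k] + tol
    ]


# Hypothetical 1 - similarity matrix: 0 and 2 are each close to 1 but far apart.
bad = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.1],
    [0.9, 0.1, 0.0],
]
# Distances between collinear points (0, 0), (3, 4), (6, 8).
euclid = [
    [0.0, 5.0, 10.0],
    [5.0, 0.0, 5.0],
    [10.0, 5.0, 0.0],
]
print(triangle_violations(bad))     # [(0, 1, 2), (2, 1, 0)]
print(triangle_violations(euclid))  # []
```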
