similarity matrix of sequence alignment

Hi,

I ran sequence alignment on a social network dataset and got a similarity matrix.

For example, z is a 3x3 similarity matrix:

z = [0 2 3;
     2 0 4;
     3 4 0];

When I did the alignment, I skipped comparing each element with itself, so I get zeros on the diagonal. Is that the right way to do it?

What is the best method for clustering, given a similarity matrix from sequence alignment?

And if the similarity matrix has zeros on the diagonal, does that affect the clustering results?

Thanks in advance

If it's a similarity matrix, the diagonal should be 1s (0s on the diagonal would make it a distance matrix).

As for clustering, it really depends on your task and goals. Hierarchical clustering is a fairly standard approach, however.

  • Author

Why must the diagonal be 1? If I align an object with itself, it gets the maximum score, because it matches fully.

Please clarify.

Thanks

For a similarity score, 0 means not at all similar and 1 means identical. The diagonal of a square similarity matrix should, therefore, be 1s. Distance is 1 - similarity, so if your max score is 0, you are measuring distance, not similarity.
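To make the "1 - similarity" relationship concrete, here is a minimal sketch in plain Python. It assumes the similarity matrix is already scaled to the range [0, 1] with 1s on the diagonal; the function name `similarity_to_distance` and the example values are made up for illustration.

```python
def similarity_to_distance(sim):
    """Convert a similarity matrix scaled to [0, 1] into distances via d = 1 - s."""
    return [[1.0 - s for s in row] for row in sim]


# Hypothetical 3x3 similarity matrix, already normalized (1s on the diagonal).
sim = [
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]

dist = similarity_to_distance(sim)

# Each element is at distance 0 from itself, and the matrix stays symmetric.
for row in dist:
    print(row)
```

Raw alignment scores are usually not in [0, 1], so they would need some normalization before this conversion makes sense.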

  • Author

No, I don't mean that the max score is zero. I mean that if I have to fill in the diagonal, it must hold the max score that results from comparing each element with itself.

So you mean similarity is 1 - distance, and that is why the diagonal is 1. If you have a reference for this, please point me to it.

Thanks

Think about distance in Euclidean space. What is the distance from a point to itself? Obviously zero, since it's the same point. So, if you're looking at a distance matrix, there should be zeros on the diagonal.

Now the rest depends on what metric you're using, but if you have zeros on the diagonal I suspect you have a distance matrix and not a similarity matrix - which is perfectly fine for clustering applications, but it helps to be clear about what you're doing.
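As an illustration of clustering straight from a distance matrix, here is a rough sketch of agglomerative (hierarchical) clustering with average linkage in plain Python. The function name `average_linkage`, the choice of average linkage, and the 4x4 example distances are all assumptions for the sake of the example, not something taken from the original poster's data.

```python
def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering with average linkage.

    dist is a symmetric distance matrix (list of lists) with zeros on the
    diagonal; n_clusters (>= 1) is where merging stops.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical distances: items 0 and 1 are close, items 2 and 3 are close.
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]

groups = average_linkage(dist, n_clusters=2)
print(groups)
```

This loop is only meant to show the idea and gets slow quickly as the matrix grows. For real data a library routine is the usual choice, for example `hclust` in R or `scipy.cluster.hierarchy.linkage` in Python.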

  • Author

Thanks, I know all of that.

What I want to know is this. Suppose I have this matrix:

10 2 4
2 5 3
4 3 6

If the diagonal of a similarity matrix must be 1, will the matrix become:

1 2/10 4/10
2/5 1 3/5
4/6 3/6 1

Is that what you mean?

Thanks
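One thing worth noticing about dividing each row by its own diagonal entry is that the result is no longer symmetric (2/10 in one corner, 2/5 in the other). A possible alternative, sketched below, divides each score by the geometric mean of the two self-scores, which keeps the matrix symmetric and puts 1s on the diagonal. This is only one option and not necessarily what the replies above had in mind; it assumes all self-scores are positive and that no off-diagonal score exceeds the self-scores. The function name `normalize_by_self_scores` is made up for illustration.

```python
import math


def normalize_by_self_scores(scores):
    """Scale s[i][j] by sqrt(s[i][i] * s[j][j]) so the diagonal becomes 1."""
    n = len(scores)
    return [
        [scores[i][j] / math.sqrt(scores[i][i] * scores[j][j]) for j in range(n)]
        for i in range(n)
    ]


# The raw score matrix from the post, with self-alignment scores on the diagonal.
raw = [
    [10, 2, 4],
    [2, 5, 3],
    [4, 3, 6],
]

norm = normalize_by_self_scores(raw)

for row in norm:
    print([round(v, 3) for v in row])
```

Whether this scaling is appropriate depends on the alignment scoring scheme, so treat it as a starting point rather than a rule.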

huda - without telling me what distance metric you're using, I can't really answer your question or find an answer to it. It's beyond me why you won't be explicit about it.

  • 4 months later...

Dear ,

 

I have a large similarity matrix (8000 x 8000) and I want to cluster the data.

Please suggest some directions. Which approach would be best?

I am new to the area.

Please help, and thanks in advance

There's a multitude of potentially applicable statistical analyses: k-means clustering, bootstrapped hierarchical clustering, model-based clustering using the Bayesian information criterion, discriminant analysis of principal components... Knowing which one is most appropriate for your data would mean knowing which assumptions best fit it, e.g. bootstrapped clustering would assume that you could apply Euclidean distances to your data.

 

Can you use R?

 

http://www.statmethods.net/advstats/cluster.html

Archived

This topic is now archived and is closed to further replies.
