similarity matrix of sequence alignment

August 16, 2012

hi,

I applied sequence alignment to a social network dataset and got a similarity matrix.

For example, z is a 3x3 similarity matrix:

z=[0 2 3;
   2 0 4;
   3 4 0];

When I did the sequence alignment, I skipped comparing each element with itself, so I get zeros on the diagonal. Is this processing right?

Which is the best method for clustering, given a similarity matrix from sequence alignment?

And if the similarity matrix has zeros on the diagonal, does that affect the clustering results?

thanks in advance

August 16, 2012

If it's a similarity matrix, the diagonal should be 1s (0s on the diagonal would indicate a distance matrix).

As for clustering, it really depends on your task/goals. Hierarchical clustering is a fairly standard approach, however.
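As a rough illustration of the hierarchical approach mentioned above, here is a minimal sketch of naive average-linkage agglomerative clustering in plain Python. It assumes the matrix holds distances (zeros on the diagonal, smaller means closer), which is how the example z from the first post would have to be read. The function name average_linkage and the choice of average linkage are assumptions made for illustration, not something specified in the thread.

```python
def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a square distance matrix.

    Repeatedly merges the two clusters with the smallest average
    pairwise distance until n_clusters clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # b > a, so deleting clusters[b] does not shift index a
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# The 3x3 matrix from the first post, read as distances (assumption)
z = [[0, 2, 3],
     [2, 0, 4],
     [3, 4, 0]]

result = average_linkage(z, 2)
print(result)
```

This loop is far too slow for large matrices; in practice a library routine such as scipy.cluster.hierarchy.linkage would be the more usual choice.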

August 19, 2012

Author

Why must the diagonal be 1? If I align a sequence with itself, it gets the maximum score because it matches fully.

Please clarify that.

thanks

August 20, 2012

For a similarity score, 0 is not at all similar and 1 is identical. The diagonal of a square similarity matrix should, therefore, be 1s. Distance is 1 - similarity; if your max score is 0, you are measuring distance, not similarity.
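The "1 - similarity" relationship can be sketched in a few lines of Python. This assumes the similarities are already scaled to the range 0 to 1 with 1s on the diagonal; the matrix values and the function name similarity_to_distance are made up for illustration.

```python
def similarity_to_distance(sim):
    """Convert a similarity matrix scaled to [0, 1] into a distance matrix."""
    n = len(sim)
    return [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]


# Illustrative values only (assumption): identical items score 1.0
sim = [[1.0, 0.2, 0.4],
       [0.2, 1.0, 0.5],
       [0.4, 0.5, 1.0]]

dist = similarity_to_distance(sim)
for row in dist:
    print(row)
```

The 1s on the similarity diagonal become 0s on the distance diagonal, which is the pattern described above.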

August 20, 2012

Author

No, I do not mean that the max score is zero. I mean that if I have to fill the diagonal, it must hold the max score that results from comparing each element with itself.

Now, you mean that similarity is 1 - distance, and that is why the diagonal is 1. Please, if you have any reference for what you said, point me to it.

thanks

August 21, 2012

Think about distance in Euclidean space. What is the distance from a point to itself? Obviously zero, since it's the same point. So, if you're looking at a distance matrix, there should be zeros on the diagonal.

Now the rest depends on what metric you're using, but if you have zeros on the diagonal I suspect you have a distance matrix and not a similarity matrix - which is perfectly fine for clustering applications, but it helps to be clear about what you're doing.
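To make the Euclidean point concrete, here is a small Python sketch that builds a pairwise distance matrix for three points. The coordinates are arbitrary values chosen for illustration.

```python
import math

# Three arbitrary 2-D points (assumption, for illustration only)
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# Pairwise Euclidean distances; math.dist needs Python 3.8 or later
dist = [[math.dist(p, q) for q in points] for p in points]

for row in dist:
    print(row)
```

Every point is at distance zero from itself, so the diagonal is all zeros, and the matrix is symmetric.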

August 22, 2012

Author

Thanks.

I already know everything you said.

What I want to know is this. Suppose I have this matrix:

10 2 4
2 5 3
4 3 6

If the diagonal of a similarity matrix must be 1, then the matrix becomes:

1 2/10 4/10
2/5 1 3/5
4/6 3/6 1

Is that what you mean?

thanks
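For comparison, here is a Python sketch of two ways to rescale the raw score matrix from this post so that the diagonal becomes 1. The first divides each row by its own diagonal entry, as written out above; note that the result is no longer symmetric (entry (1,2) is 2/10 but entry (2,1) is 2/5). The second divides each entry by the square root of the product of the two self-scores, which is one common way to keep the matrix symmetric. It is offered here as an assumption, not as the answer given in the thread, and both function names are hypothetical.

```python
import math


def normalize_by_row(scores):
    """Divide each row by its diagonal entry (the scheme written in the post)."""
    n = len(scores)
    return [[scores[i][j] / scores[i][i] for j in range(n)] for i in range(n)]


def normalize_symmetric(scores):
    """Divide s_ij by sqrt(s_ii * s_jj); keeps the matrix symmetric (assumed alternative)."""
    n = len(scores)
    return [[scores[i][j] / math.sqrt(scores[i][i] * scores[j][j])
             for j in range(n)] for i in range(n)]


# Raw score matrix from the post
scores = [[10, 2, 4],
          [2, 5, 3],
          [4, 3, 6]]

row_norm = normalize_by_row(scores)
sym_norm = normalize_symmetric(scores)

print(row_norm)
print(sym_norm)
```

Many clustering routines expect a symmetric matrix, so the asymmetry of the row-wise version is worth checking before clustering.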

August 23, 2012

huda - unless you tell me what distance metric you're using, I can't really answer, or find an answer to, your question. It's beyond me why you won't be explicit about it.

December 24, 2012

Dear ,

I have a large similarity matrix (8000 x 8000) and I want to cluster the data.

Please suggest some directions. Which approach would be best?

I am new to the area.

Please help, and thanks in advance.

December 24, 2012

There's a multitude of potentially applicable statistical analyses: k-means clustering, hierarchical bootstrapped clustering, Bayesian information criterion model-based clustering, discriminant function of principal components... Knowing which one is most appropriate for your data would mean knowing which assumptions best fit it - e.g. bootstrapped clustering would assume that you could apply Euclidean distances to your data.

Can you use R?

http://www.statmethods.net/advstats/cluster.html
