Everything posted by huda

  1. Thanks, all. I already know what you said. What I want to know is this: if I have the matrix [10 2 4; 2 5 3; 4 3 6] and the diagonal of a similarity matrix must be 1, will the matrix become [1 2/10 4/10; 2/5 1 3/5; 4/6 3/6 1]? Is that what you mean? Thanks.
  2. No, I do not mean that the max score is zero. I mean that if I have to fill the diagonal with values, each must be the max score that results from comparing the element with itself. Now, you mean that similarity is 1 - distance, and this is why the diagonal is 1. Please, if you have any reference for what you said, point me to it. Thanks.
  3. Why must the diagonal be 1? If I do sequence alignment and compare an object with itself, it gets the max score because the match is complete. Please clarify that. Thanks.
  4. Hi, what numbers have to be in a similarity matrix? Should it be the large real value that results from comparing the element with itself? Thanks.
  5. Hi, can I find MATLAB code for an agglomerative clustering method? Thanks.
  6. Hi, what is the distance metric that equals ∑ ABS(ai-bi) called? Thanks.
  7. huda

    distance metric

    What do I have to do now?
  8. Hi, I have a similarity matrix that I got from sequence alignment, where each element represents the degree of similarity, i.e. the max score represents the max similarity. Say it is an n*n symmetric matrix that shows the degree of similarity among n objects, e.g. x=[10 3 2; 3 6 4; 2 4 5]; Here obj. 1 is most similar to obj. 2, because the max score except the diagonal (10) is 3, and so on. The diagonal represents the comparison of each object with itself, so it has the max value. If I use a distance metric such as Euclidean distance to cluster these objects, is it best to set the diagonal to zero or to leave it? By the way, what is the best way to cluster a similarity matrix that results from sequence alignment?
  9. Regarding the Smith–Waterman algorithm (local alignment), the main difference to the Needleman–Wunsch algorithm is that negative scoring matrix cells are set to zero, which renders the (thus positively scoring) local alignments visible.
  10. Thanks. How can I assign values for match, mismatch, and gap? Sometimes the gap is given -1, and other times 0 or -2. The same goes for match: sometimes it is given 2 and other times 1. How can I know which is best? When I did the sequence alignment, I skipped comparing each element with itself, so I get zeros on the diagonal. Is this processing right? Which is the best method for clustering given a similarity matrix from sequence alignment? And if the similarity matrix has zeros on the diagonal, does that affect the clustering results?
  11. Hi, I applied sequence alignment to a dataset of social networks and got a similarity matrix. For example, z is a similarity matrix of size 3*3: z=[0 2 3; 2 0 4; 3 4 0]; When I did the sequence alignment, I skipped comparing each element with itself, so I get zeros on the diagonal. Is this processing right? Which is the best method for clustering given a similarity matrix from sequence alignment? And if the similarity matrix has zeros on the diagonal, does that affect the clustering results? Thanks in advance.
  12. Hi, can the score of a sequence alignment be negative when aligning non-biological sequences? Note: this applies whether the alignment is global or local. Thanks in advance.
  13. Hi, I have to use the Newman algorithm in my PhD thesis work. I have no idea about it, so I would like to know: is the Newman algorithm a clustering method, or is it used after a clustering method to remove edges? Thanks.
  14. Does anybody know of any paper that uses particle swarm optimization with local sequence alignment rather than global sequence alignment? I did not find any paper that uses PSO with local alignment. Please, I badly need any information about it. Thanks, huda
  15. Hi, I'm working on sequence alignment with non-biological sequences. I have read a lot about gap penalties, and each author deals with gaps differently. How can I decide which policy is best? Thanks, huda
  16. Hi, I have passed the encoding stage, and the discussion here was very handy for me. Now, please, I would like to know: when the sequences are not biological, is a unitary scoring matrix suitable, or do I have to design a new scoring matrix? Thanks.
  17. Hi, I'm working on sequence alignment with non-biological sequences, and I want to know how to design a scoring matrix. The code I used has 20 symbols; to cover the range of my data I used 20^4 combinations, for example 'arxz', 'gacb', ... Thanks in advance.
  18. My sequence is discrete. Thanks.
  19. Hi, I'm working on sequence alignment. I would like to know how I can determine the scoring matrix. Thanks in advance.
  20. "You want to concatenate these data points into a string and then treat them as you would either a string of unlinked genetic loci or as a single linked gene to compare them, right?" Yes, exactly, but later I realized that it is not possible to treat my data as a single linked gene, because I cannot find a unique code for each value. Maybe treating it as unlinked genetic loci is more suitable, as in: agh, tre, zca, ... I'm not sure if my analysis is right or not. OK, I will give you more details. I intend to find out the relationships among a set of users. The data I got for this purpose represents the actions of users over time in an online forum. So I have an array in which each row represents the actions of one user over time. I'm trying to align their sequences of actions to determine which individuals are more similar to each other, then make clusters and find communities. I'm trying to find the similarity among users that results from their influence on each other through social relationships in the online community. If I used relastic techniques as you mentioned, I could make clusters but would not find relationships. I badly need to know your comments. Thanks.
  21. Thanks for the examples. You are wondering why I insist on encoding my data instead of leaving it as raw data. I was intending to use the existing algorithms designed for biological sequences instead of starting from scratch. But I think that is not possible because of the large domain of my input. In addition, I did not expect gttcaag, for example, to be an element; I was expecting it to be part of a sequence, like DNA or amino acids: gttcaaggaagtgcgttcaaagtagatc. Thanks for clarifying, but I would like to hear from you if you have more comments. I welcome any suggestions and advice, as long as the intent is to help me. I hope you read what I wrote to Schrödinger's hat. Thanks. I think you know what I was intending to do if you read what I wrote above. OK, if the coding cannot allow me to use the existing algorithms, I can leave the data raw, but in that case the domain of the input will be very large, and that makes designing the algorithm difficult. So you suggested using multiple codes to minimize the domain of the input. Thanks.
  22. Please, I will focus my query. Suppose the range of my data is more than 4 symbols (a, c, g, t) and more than 20 (in the case of amino acids). Can I do it this way: say the max value in my data is 11000, which fits within 4^7 codes if the base is 4 (g, t, c, a), or within 20^4 codes if the base is 20, as with amino acids. Is such composition of symbols allowed in alignment? Thanks.
  23. Thanks, Khaled. I know that, but I want to avoid the composition. I think, but am not sure, that it is possible in the case of alignment. Look: say 11000 values (covered by 4^7 codes) with (g, t, c, a) -> DNA, or 20^4 in the case of amino acids. For example, in the case of DNA: gttcaag -> code for 100, for example; gttcaaa -> code for 90, for example. The two codes are roughly alike (the algorithm considers them the same sequence), but in fact the values are quite different. This is my problem; I'm not sure if composition is possible or not. I think this could be discussed in the bioinformatics forum, but someone will move my questions to computer science if I post there. Many thanks again, Khaled.
  24. Anyway, in all cases I appreciate your suggestions. Look, you may see my queries as not logical for a PhD student, but I see coding as the most important stage in my work, since the design of the sequence alignment algorithm depends on the coding method. You need more information about my work. Briefly, I am trying to find the similarity among a group of users in terms of their actions over time in online forums. These actions are represented by integers, so each user has a sequence of integers. Now I need a way to represent these integers in order to apply a sequence alignment algorithm. You mentioned that it is difficult to use Unicode. But this paper used Unicode, because the authors have a large set of inputs, as in my situation, where their inputs are web pages. I hope to hear your suggestions. This paper: "GROUPING WEB ACCESS SEQUENCES USING SEQUENCE ALIGNMENT METHOD", HUPENDRA S CHORDIA, PG Student, Department of Computer Engineering, SSBT, COET, Bambhori Jalgaon, Maharashtra 4250001, India, chordiabs@yahoo.com
  25. I think that to apply sequence alignment operations, I must encode my data. Look at this paper in "Sociological Methods & Research", section 3: the data must first be encoded into a set of sequences using a finite alphabet of states. My data is integers with a big range and needs many characters to represent each value uniquely, i.e. my data is a sequence of integers. If you know other information, please tell me.
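
Several of the posts above ask how to get a diagonal of 1 in an alignment similarity matrix and how to cluster it afterwards. The following is a minimal sketch of one possible approach, not the method anyone in the thread prescribed. It assumes a symmetric normalization s[i][j] / sqrt(s[i][i] * s[j][j]) (instead of the row-wise division proposed in post 1, which would make the matrix asymmetric), the conversion distance = 1 - similarity mentioned in post 2, and a naive single-linkage agglomerative merge. The function names `normalize_similarity`, `to_distance`, and `single_linkage` are made up for illustration.

```python
import math

def normalize_similarity(s):
    """Scale a symmetric similarity matrix so every diagonal entry is 1.

    Assumption: all self-scores s[i][i] are positive.
    """
    n = len(s)
    return [[s[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(n)]
            for i in range(n)]

def to_distance(sim):
    """Turn normalized similarities into distances with d = 1 - s."""
    return [[1.0 - v for v in row] for row in sim]

def single_linkage(dist):
    """Naive agglomerative clustering; returns the merges in order.

    Each merge is (cluster_a, cluster_b, distance), where clusters are
    lists of 0-based object indices.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Example matrix taken from post 8.
x = [[10, 3, 2], [3, 6, 4], [2, 4, 5]]
sim = normalize_similarity(x)
dist = to_distance(sim)
merges = single_linkage(dist)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

With this normalization the self-comparison scores become 1 and the self-distances become 0, so there is no need to zero the diagonal by hand. Other linkage rules (average, complete) or other normalizations may suit alignment scores better; this sketch only shows the mechanics.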