Jump to content


  • Content Count

  • Joined

  • Last visited

Community Reputation

0 Neutral

About beaumontfletcher

  • Rank

Profile Information

  • Favorite Area of Science
  1. Success! The formula is confirmed. I wrote my program, which works out probabilities as follows: Given integers T, X, Y and N: 1. Initialise a counter S to 0. 2. Pick X different random integers in the range 1-T. 3. Pick Y different random integers in the range 1-T. 4. Iff the sets picked in steps 2 and 3 have exactly N integers in common, then add 1 to S. 5. Repeat steps 2-4 a million times. 6. The actual probability = (S / 1,000,000). Doing this for a few examples today, I found the actual probability equals the theoretical probability from your formula every time, though some
  2. Doh! Sorry, you're right. I wrote what I wrote, went out and realized while I was out that I had worked out the probability of at least N matches rather than exactly N matches. I expanded your new formula and it comes out the same as the previous one, but I find the new reasoning behind it easier to understand. I will work on my brute force program tomorrow so I can try to calculate these probabilities in an independent way and compare with the formula answers.
  3. Sorry but I think I have disproved the formula. I took a small set T = {a,b,c,d,e} and let X=Y=2 and N=1. The matrix below shows all possible combinations of sets X and Y, with a 1 indicating that they have an element in common and 0 indicating that they do not. There are exactly sevens 1's in each row and column, totalling 70. Since there are 100 cells, the probability is 0.7 whereas the formula gives 0.6
  4. You're right. I fixed T=10 and X=Y=5, and varied N instead. This yields: N p 1 0.099206 2 0.396825 3 0.396825 4 0.099206 5 0.003968 So maybe we're OK after all. The only way to check is for me to go ahead and write my brute force program which will calculate these probabilities by using a random number generator, without using the formula, and see if I get the same values as above. I'll be back.
  5. Hello again. I'm afraid the formula now looks wrong to me, after trying it out in Excel for small values. I set X=Y=5 and N=1 and varied only T. In other words, given a set T, pick two subsets of size 5 and ask what is the probability that they have exactly one element in common. If you use Excel and put T in cell A1, X in cell B1, Y in cell C1 and N in cell D1, then the Excel formula for the probability is: =FACT(B1)*FACT(C1)*FACT(A1-B1)*FACT(A1-C1)/(FACT(A1)*FACT(D1)*FACT(B1-D1)*FACT(C1-D1)*FACT(A1-B1-C1+D1)) The surprising thing is that, at the start, the probability goes up as
  6. I am going to write two software programs, one to calculate these probabilities according to your formula (for big numbers, even Excel can't do factorials). The other program will be to calculate the probability by brute force, i.e. given T, use an industrial-strength random number generator to pick sets X, Y and N, and see if N is common to X and Y. Do this a million times and see what the actual probability comes out as. If this equals the theoretical probability for a varied set of values of X, Y and N, then we can be fairly confident that your formula is correct. I will come back and u
  7. Hello, md65536 (if you're still round). I sat down this weekend and went through your new analysis (for which, thanks again). Firstly, I thought I'd reduce the number of symbols we need by using the same symbol to represent a set and the number of elements in it. So T is both the set of all elements and also the number of those elements. The context should make it obvious which meaning is being used at any point in the analysis. I drew two diagrams, one to show the general case when we have a match, i.e. N is contained in both X and Y; and the general case when we do not have a match,
  8. Sorry, I need a fresh mind to go through this, so I will have to wait till the weekend. Thanks again for the new ideas!
  9. Thanks again for having a go at this! I hope it's been some fun for you. I am not convinced by the reasoning of removing elements from one set but not the other. I think there must be an error because I think we can show that the formula is wrong by looking at the simple case X=Y=T, i.e. my two subsets are the whole set. In this case, it doesn't matter what N is, the probability has to be 1, because whatever element(s) you pick from one subset are bound to be in the other. But your formula for P2 gives 0 if Y=T, so the final probability also comes out to be 0. As I said, I will try the N=1
  10. I think I may try a less general case first, such as N=1 or X=Y, and see if I can work out the formula for that, and hope that it will help me see the general problem in the right way. The second part will have to be done in some such way as the one you suggest, but I haven't found a way to think about that yet.
  11. Thanks again for giving time to this. You are right that I had not taken into account that the remaining elements must not match. What I worked out was actually the probability of "at least N" rather than "exactly N" common members. For my purposes, this will do for now. If I find the formula for the probability of at least N matches, it should be straightforward to work out the formula for exactly N. The XCN formula you mention might be correct (I haven't got my head round it yet) but, if so, it means that my formula was wrong, because if I combine yours with mine, the result is demonstrably
  12. Thanks. I had a similar idea to yours. You're using subsets A and B, having X and Y members respectively. Assume you've picked your first subset and have picked N members from that subset. Now look in the second subset, the one that has Y members. The probability that all N of your members got picked for the second subset as well must be the product of (Y/T), (Y-1 / T-1), (Y-2 / T-2)...and so on, down to (Y-(N-1) / T-(N-1)). This evaluates to: Y!(T-N)! / T!(Y-N)! This formula looks right to me. E.g. if you simplify by setting N=Y, then the formula becomes Y!(T-Y)!/T! which is obvio
  13. I have a problem which I will first state formally and then give some real-life context. Any help appreciated! Suppose I have a set with T members. I randomly choose a subset with X members. Independently of that, someone else randomly chooses a subset with Y members. Without loss of generality, assume X <= Y. Given a number N (N <= X), what is the probability that the two subsets have exactly N members in common? Now for the real-life context. Historians often find multiple copies of very old manuscripts but don't know how they came into existence. One technique they use is to look
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.