Need help with code (to calculate mutual information)

April 12, 201114 yr

Hi all, I'm writing this code in R and it should be relative straight forward, but I keep running into negative numbers, which MI should NOT have! Not sure if the code is buggy or if I'm getting the math wrong. Am I correct in thinking there should be 1 value for MI between each number in an associated list?

First the pseudocode:

1 initialize parameters
2 Generate 2 lists of random numbers btw 0,1 called x and y
3 cut number list into bins
4 create frequency table for number of counts in each bin
5 create function to calculate marginal distribution of binned x
       for every x
            find the frequency of that value in x
            save as ki
Do the same for y and call it kj
6 create function to calculate joint distribution of x and y
  a. generate frequency table for every x and y combo
     for every x and associated y
          find the frequency of that combination in the data
          call it kij
7 calculate mutual information (mi) 
mi = 0
for each value taken by x
{  sum = 0
  for each value taken by y:
  {  kij = find freq where x and y are associated
     ki = find freq of x
     kj = find freq of y
     sum = sum + kij * log2(kij/(ki*kj))
  }
  mi = mi + sum
}

Here's the actual code in R

[font="Courier New"]#calculate MI of random variables. Notation is taken from Steuer, et al. 2002

#Set sample size & bin number
N <- 30
bins <- 2


#Generate N random "data" points for x and y in the space [0,1]
x <- runif(N, 0, 1)
y <- runif(N, 0, 1)

#Cut x and y into bins
xcut <- cut(x, bins)     #ai
ycut <- cut(y, bins)     #bj

#create frequency table for counts per bin
xfreq <- data.frame(table(xcut))  #table for kis
yfreq <- data.frame(table(ycut))  #table for kjs

#script for reading freq table into kis/kjs
#b/c cutting data into bins leads to weird character types
ki.find <- function(n) {        #### n <- xcut[i] 
   for (k in 1:bins) {
       if (as.character(n) == xfreq$xcut[k]) {
           ki <- xfreq$Freq[k]
           return(ki)
       }
    }
}            

kj.find <- function(m) {        #### n <- ycut[i] 
   for (l in 1:bins) {
       if (as.character(m) == yfreq$ycut[l]) {
           kj <- yfreq$Freq[l]
           return(kj)
       }
   }
}      

#kij is the joint prob distribution, 
#AKA freqency that 2 bins are associated at a (x,y) point 
cut.table <- as.data.frame(table(xcut,ycut))

kij.find <- function(p) {
   for (g in 1:(2*bins)) {
      if (as.character(xcut[p]) == cut.table$xcut[g] && as.character(ycut[p]) == cut.table$ycut[g]) {
           kij <- cut.table$Freq[g]
       }    
   }
   return(kij)
}        

#start with assumption of no MI
#then calculate MI

MI <- 0
for (i in 1:N) {
   sum.temp <- 0
   for (j in 1:N) {
       ki <- ki.find(xcut[j])
       kj <- kj.find(ycut[j])
       kij <- kij.find(j)
       sum.temp <-  sum.temp + (kij * log2((kij/(ki*kj))))
       sum.temp2 <- (sum.temp/N)
       }  
    MI <- c(MI, log2(N) + sum.temp2)   
}         
[/font]

sample output is:

> MI
[1]  82.288187  44.607130  36.576861  28.348042  20.119224  11.890405
[7]   3.661586  -4.567232 -12.796051 -21.024870 -29.253688

Please help me!!!

April 13, 201114 yr

I think if you adjust your last bit of code as follows:


MI <- 0
MI.run <- function() {
   for (i in 1:N) {
       sum.temp <- 0
       for (j in 1:N) {
           ki <- ki.find(xcut[j])
           kj <- kj.find(ycut[j])
           kij <- kij.find(j)
           sum.temp <-  sum.temp + (kij * log2((kij/(ki*kj))))
           sum.temp2 <- (sum.temp/N)
           print(ki)
           print(kj)
           print(kij)
           print(sum.temp)
           print(sum.temp2) }  
        MI <- c(MI, log2(N) + sum.temp2)   
   }
}

and maybe set N<-5 you will quickly see why kij * log2((kij/(ki*kj))) is giving you a negative number :/

April 13, 201114 yr

Author

I think if you adjust your last bit of code as follows:

and maybe set N<-5 you will quickly see why kij * log2((kij/(ki*kj))) is giving you a negative number :/

So it's because i'm taking a log of a fraction < 1, which I guess means i'm doing the calculation incorrectly.

April 13, 201114 yr

You're applying a variation of Mutual Information that involves discrete variables and you are using continuous. Your cut is being made with no break and as such your bins are not all formed the same so if you are looking for the members of two sets that intersect but you are using a top half bottom half index to decide membership there might be some inconsistency. I wasn't really sure what "#b/c cutting data into bins leads to weird character types" meant exactly. Do you have a link/reference to this "#calculate MI of random variables. Notation is taken from Steuer, et al. 2002"?

April 13, 201114 yr

Author

You're applying a variation of Mutual Information that involves discrete variables and you are using continuous. Your cut is being made with no break and as such your bins are not all formed the same so if you are looking for the members of two sets that intersect but you are using a top half bottom half index to decide membership there might be some inconsistency. I wasn't really sure what "#b/c cutting data into bins leads to weird character types" meant exactly. Do you have a link/reference to this "#calculate MI of random variables. Notation is taken from Steuer, et al. 2002"?

http://bioinformatics.oxfordjournals.org/content/18/suppl_2/S231.full.pdf

I'm not sure that it matters the bins are not formed the same. For example if your comparing two lists of different types the only thing that should matter is the probability distribution, not the exact numbers.

In R, the cut function creates a list that looks something that replaces an exact number with a range, like (0.43,0.616], which is not numeric type. I can create a frequency table that counts how many times each range is present in the list.

I have to do a character match to find the associated frequency. in the table, which I think is the marginal distribution (divided by N).

The probability of the intersection is the frequency that both ranges appear at the same time in both lists. Unless I've misunderstood.

Thanks a bunch for your help so far, btw.

April 14, 201114 yr

After reading the paper the idea has formed much clearer in my head. The entire purpose of binning, as you have been, is to discretize the continuous function(which I'm sure you already knew ) so that a discrete model may be applied. What I had said before becomes even more important in that you will only have discrete values if the bins form uniformly. In case you may have thought they were uniform, having directly observed these values myself they are in fact not(sometimes.) This requires a partitioning function as described in the three sections following 'NUMERICAL ESTIMATION' and this would be applied to the break value and correspondingly to the number of breaks produced.

Thanks for the paper, I think I might play with this for a little while!

break: either a vector of cut points or number giving the number of intervals which x is to be cut into.

Edited April 14, 201114 yr by Xittenn

April 14, 201114 yr

Author

After reading the paper the idea has formed much clearer in my head. The entire purpose of binning, as you have been, is to discretize the continuous function(which I'm sure you already knew ) so that a discrete model may be applied. What I had said before becomes even more important in that you will only have discrete values if the bins form uniformly. In case you may have thought they were uniform, having directly observed these values myself they are in fact not(sometimes.) This requires a partitioning function as described in the three sections following 'NUMERICAL ESTIMATION' and this would be applied to the break value and correspondingly to the number of breaks produced.

Thanks for the paper, I think I might play with this for a little while!

break: either a vector of cut points or number giving the number of intervals which x is to be cut into.

ah shit. I hope this is the major problem I've been having. I'm looking for an easy solution in R to cut a vector into equal bins. Histogram does it, but only to plot data. weird.

April 14, 201114 yr

[math] k_i \neq freq_x \; k_i = \frac{freq_x}{N} [/math]

Correct me if I'm wrong but this is in fact the issue, if you hadn't already deduced it. I had already, yesterday, but I wasn't sure what you were doing.

April 14, 201114 yr

Author

[math] k_i \neq freq_x \; k_i = \frac{freq_x}{N} [/math]

Correct me if I'm wrong but this is in fact the issue, if you hadn't already deduced it. I had already, yesterday, but I wasn't sure what you were doing.

Then is equation 17 in the paper incorrect?

April 14, 201114 yr

or better yet

[math] k_i = freq_x \land p(a_i) = \frac{k_i}{N} [/math]

which you weren't doing!

Edited April 14, 201114 yr by Xittenn

April 14, 201114 yr

Author

or better yet

[math] k_i = freq_x \land p(a) = \frac{k_i}{N} [/math]

which you weren't doing!

sorry i think we crossposted, but this just doesn't seem like what's going on in equation 17. He's taken N out of each k term, but adds it into the MI calculation separately.

April 14, 201114 yr

It isn't the same, I think multiple forms are being presented and you are equating them to being the same. Your code looks like it is of the form that is taken from wolfram which is not the form of 17 and where certain definitions seem to have taken different meaning on account of various mathematical assumptions.

My approach on this would be to start back at the beginning from scratch and build up again and to this point of knowledge. Sometimes it takes less time to figure out a problem if you throw away everything you think you know and start from square one. This is just my humble opinion however.

Sorry I can't give you a better explanation, I'm just trying this out for the first time myself. I took interest because it is something I was already working on, but in different areas.

Edited April 14, 201114 yr by Xittenn

April 14, 201114 yr

Author

It isn't the same, I think multiple forms are being presented and you are equating them to being the same. Your code looks like it is of the form that is taken from wolfram which is not the form of 17 and where certain definitions seem to have taken different meaning on account of various mathematical assumptions.

My approach on this would be to start back at the beginning from scratch and build up again and to this point of knowledge. Sometimes it takes less time to figure out a problem if you throw away everything you think you know and start from square one. This is just my humble opinion however.

Sorry I can't give you a better explanation, I'm just trying this out for the first time myself. I took interest because it is something I was already working on, but in different areas.

Ahh. damn I see what's going on a bit better now, and rebinning into equal portions definitely helped.

April 14, 201114 yr

>cross posting there<

Yeah sorry, I see where I went off track .... :/

Edited April 14, 201114 yr by Xittenn

April 14, 201114 yr

Author

>cross posting there<

Yeah sorry, I see where I went off track .... :/

Yeah but you are right that he's estimating the probability distribution differently because he's using a histogram (fixed that problem, btw, my data points are being split up into equal bins now). That assumption probably won't work with real data though.

April 14, 201114 yr

I have two code samples, one fulfills Kullback Entropy equivalent of Mutual Information and the other fulfills the histogram model. The results of each leave me asking the value of the information provided by each value for which each model results.

The problem I am seeing with your original post is again that you have mixed the two concepts together. You say 'fixed' I'm curious how and if we have come to the same conclusion. I have concluded that

for (i in 1:N) {   <------  ******
   sum.temp <- 0
   for (j in 1:N) {   <------  ******
       ki <- ki.find(xcut[j])   <------  ******
       kj <- kj.find(ycut[j])

was the cause of the problem and that in fact kij * log2((kij/(ki*kj))) should be negative and that the sum of ij kij * log2((kij/(ki*kj))) should never be greater than log(N). You should be stepping through the bin divisions not the samples .... :/

note the little note on S233

and modify this appropriately


if ( ( as.character( ycut[i] ) == cut.table$xcut[g] ) && ( as.character( ycut[j] ) == cut.table$ycut[g] ) ) {

Edited April 14, 201114 yr by Xittenn

April 14, 201114 yr

I hope you don't mind I posted my results in my blog .... w/ credits

April 14, 201114 yr

Author

I hope you don't mind I posted my results in my blog .... w/ credits

link?

edit: sorry. I'm an idiot. you meant your SFN blog.

April 14, 201114 yr

It's my SFN blog . ...

Sign In

Need help with code (to calculate mutual information)

Featured Replies

Archived

Important Information

Account

Navigation

Search

Configure browser push notifications

Chrome (Android)

Chrome (Desktop)

Safari (iOS 16.4+)

Safari (macOS)

Edge (Android)

Edge (Desktop)

Firefox (Android)

Firefox (Desktop)