# How much one's individuality cost? At least 2.77544 bits of information

## Recommended Posts

Imagine there is a population/database/dictionary and we would like to distinguish its elements.

So for each element, let us somehow encode its individual features (e.g. using a hash function) as a bit sequence - the most dense way is to use sequence of uncorrelated P(0)=P(1)=1/2 bits.

We can now create minimal prefix tree required to distinguish these sequences, like in the figure below.

For such ensemble of random trees of given number of leaves ($n$), we can calculate Shannon entropy ($H_n$) - the amount of information it contains.

It turns out that it asymptotically grows with at average 2.77544 bits per element $(\frac{1}{2}+(1+\gamma)\lg(e))$.

The calculations can be found here: http://arxiv.org/abs/1206.4555

Is it the minimal cost of distinguishability/individuality?

How to understand it?

ps. related question: can anyone find $D(n)=\sum_{i=0}^{\infty} 1-(1-2^{-i})^n$ ?

Clarification:

The first thought about this distinctiveness is probably n! combinatorial term while increasing the size of the system, but n! is about the order and its logarithm grows faster than linearly. Distinctiveness is something much more subtle.

It is well seen in equation (10) from the paper (which should like):

$H_n+\lg(n!)=nD_n$

where $D_n$ is the average depth of leaves of such n leaf tree - average amount of bits distinguishing given element.

So this equation says:

distinctiveness/individualism + order = total amount of information

distinctiveness grows linearly with n (2.77544 asymptotic linear coefficient)

information about their ordering grows faster: $\lg(n!)\approx n\lg(n)-n\lg(e)$.

Edited by Duda Jarek

##### Share on other sites

Even individuality is devaluating in our times - to about 2.3327464 bits per element/specimen (thanks to James Dow Allen): http://groups.google.com/forum/#!topic/comp.compression/j7i-eTXR14E

But this is the final minimal price tag - it cannot be reduced further.

Specifically, for the minimal prefix tree, a random sequence (representing individual features of a specimen) has about 0.721 probability of being identified as belonging to the population ... so if we are interested only in distinguishing inside the population, we can afford increasing this probability up to 1.

To reduce the amount of information in the minimal prefix tree, let us observe that if there appears degree 1 node inside the tree, all sequences from the population going through that node will certainly go in the corresponding direction - we can save 1 bit about the information which exactly is this direction.

In standard prefix tree these degree 1 nodes were the place where it could turn out that an outsider does not belong to the population - removing this information would raise false positive probability from 0.721 to 1.

So if for sequences (0101.., 1010.., 1001..),

the minimal prefix tree remembers (without ordering!): (0....., 101..., 100...),

such reduced one remembers only (0....., 1.1..., 1.0...)

What decreases its asymptotic cost from 2.77544 bits/specimen to about 2.332746.

##### Share on other sites

But, to decide if that individual is male or female: young or old and tall or short takes 3 bits, which is more than 2.78...

##### Share on other sites

But, to decide if that individual is male or female: young or old and tall or short takes 3 bits, which is more than 2.78...

This is only a lower boundary - in reality there is used much more.

But maybe there can be found a transition - for example ant's behavior doesn't rather depend from which specimen from its family it is interacting - the entropy/complexity of their society grows let say with log(n) ...

Maybe there are some species where the distingctiguishabily starts and so the complexity start growing linearly with e.g. 2.77544 coefficient.

If there would be e.g. essential ordering of individuals, entropy would grow faster: with log(n!)~nlog(n)

If there would be essential interactions for different pairs, it would grow with n^2

If interactions within larger groups, even with 2^n...

And remember that distinguishness is weaker than ordering: if you would just write these sequences of minimal distinguishing features, it would grow like

H_n + lg(n!) = nD_n

these 3 bits you are talking about is D_n - average number of distinguishing bits - the shortest unique prefix. You need to subtract the ordering information.

Bits carries the most information (are the most distinguishing) if they have probability 0.5 - so e.g. the first bit could say if someone is taller than median height...

##### Share on other sites

If there are two mechanisms to calculate a lower boundary for something and they give different answers then the higher one is a more accurate estimate of that boundary.

While there are many examples where evolution screws up a bit, the fact that our DNA has the equivalent of rather more than 3 bits suggests that we are worth more than 2.8 bits.

##### Share on other sites

Once again, it's the lower boundary - we can approach it in computer science, but in biology I totally agree there is usually used much larger amount.

From the other side ... DNA contains muuuch more more information, but the question is the perspective we look from - I'm talking about e.g. sociobiological level - of interactions ... entropy corresponds to complexity of the society.

Important are not genotypes, but phenotypes - precisely: how other specimens react on them - does the behavior depend on which individual it is?

For bacterias not really ... what are the simplest colonies which distinguish individuals in their population?

##### Share on other sites

3 is still a better estimate for the true value than 2.7 because they are both lower boundaries.

Both estimates are absurdly small, in my case this was deliberate.

##### Share on other sites

3 features of 0/1 type can distinguish maximum 8 entities.

The expected number of features required to distinguish individual (D_n bits) has to grow with approximately lg(n)

More precisely: D_n ~ lg(n) - lg(e)/2n + 1.332746

But while calculating entropy of the whole population, we need subtract information about their order:

H_n = nD_n - lg(n!) ~ n(lg(n) - lg(e)/2n + 1.332746) - nlg(n) + nlg(e) - lg(2pin)/2 ~ 2.77544 n

where we've used Stirling formula.

ps. I've just spotted that the amount of bits per element for the reduced minimal prefix tree seem to have the same decimal digits after coma as the 1.332746 above from the D_n formula.

It suggests that it is

3/2 + gamma*lg(e) ~ 2.3327461772768672 bits/element

Is it just a coincidence or maybe there is some nice and simple interpretation??

##### Share on other sites

i guess i don't quite understand the problem.

let's say we have 7 individual objects, boxes. each box has a distinguishing feature, a binary serial number. how long would the serial number need to be in order to determine which box is which? seems to me you would only need 3 bits.

with 128 boxes, you would need 7 bits. in general, the number of bits necessary is proportional to the number of individuals; lg2 n.

with over 7 billion people on the planet, is seems to me one's individuality costs 33 bits roughly.

##### Share on other sites

Phillip,

If for each box there is practically random sequence representing its individual features, to make it distinguishing it indeed has to be about lg(n) bit length in the perfect binary tree case.

The randomness makes it a bit larger - this average length becomes D_n ~ lg(n) + 1.3327

So the total amount of information of this population would be nD_n ~ nlg(n) ... but we we don't care about the order of these sequences - all their orderings are in the same equivalence class, so the number of possibilities decreases n! times - the total amount of information is decreased by lg(n!) ~ nlg(n):

H_n = nD_n - lg(n!) ~ 2.77544n

The amount of information can be further reduced for degree 1 nodes: if all specimens from the population reaching given node, make the same next step (like all tall blonds are slim in given population) - for these nodes we don't need to store this direction, reducing informational content by 1 bit per each degree 1 node.

While there are n leaves and n-1 degree 2 nodes, it turns out that the expected number of degree 1 nodes is about lg(e/2)~0.44

So after such reduction, the population requires 2.3327 bits/specimen (calculated in last version of the paper).

##### Share on other sites

okay, i think i understand what you are saying.

let's say there are 200 boxes; our population. each box has several features:

size, material, wieght, shape, packing, and wrapping.

each feature can have one of several values.

to accurately represent a box, requires a random string of data; lets say 30 bits long.

with 200 boxes, that's 200*30 bits; plus an error correction, requireing 200*1.3327 more bits.

however, becuase we dont care about the order of the boxes; only thier identification, we can reduce this by

200*lg2 200; and likely are able to reduce it further for any rare or unused features.

makes sense.

##### Share on other sites

Generally, yes.

It's important that these bit sequences are uncorrelated and with P(0)=P(1)=1/2 to make it contain maximum amount of information (it would be interesting to consider different cases...) - for example you ask if value is smaller or larger then median of given group.

The 30bit length can be sometimes not enough - for simplicity assume you start with infinite one, then cut to required lengths - the average length is expected to be about lg(200)+1.33.

So just encoding sequences would take about 200*(lg(200)+1.33) ... but you can subtract lg(200!) bits of information about their order.