Entropy and Information


Fred56


When I learned about Shannon's theories I was intrigued by the concept of information having entropy.

 

The explanation, as I recall, went something like this: a message contains information, or has information content. The message may describe an expected or an unexpected event, and an unexpected message has more information content than an expected one. This content is measured by the entropy of the information (Shannon entropy).

 

A message can be an actual piece of information (i.e. encoded in binary), or any event (say a supernova, the decay of a particle, a change in kinetic energy, ...) that can be encoded.

 

It is fairly intuitive that unexpected information has more content than expected information. For example, if someone woke you up in the morning and said "The sun came up" or "Breakfast will be ready soon", that carries a lot less information than if they said "The house is on fire" or "There's a spaceship parked on the front lawn".

 

The entropy (average information content) of a message is measured by the number of bits needed to encode it (and I know the math works, because I've "done" it).
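
For concreteness, here is a minimal Python sketch of that idea (my own illustration, not from the course I took; the probabilities are made up): the rarer the message, the more bits it carries, and the entropy is the average over the source.

[code]
import math

def surprisal_bits(p):
    """Information content, in bits, of a message that occurs with probability p."""
    return -math.log2(p)

def shannon_entropy_bits(probs):
    """Average information content (bits per message) of a source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up probabilities, just to show the trend:
print(surprisal_bits(0.99))    # "The sun came up"        -> ~0.014 bits
print(surprisal_bits(0.0001))  # "Spaceship on the lawn"  -> ~13.3 bits

# A source sending 'A' half the time, 'B' and 'C' a quarter of the time each:
print(shannon_entropy_bits([0.5, 0.25, 0.25]))  # 1.5 bits per message
[/code]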

 

Thermodynamic entropy is a statistical quantity, and its changes are tied to changes in heat content (heat transferred divided by temperature). It also connects to a much larger field of study called ergodic theory, which deals with how a system changes over time (often over long intervals), and with measuring group behaviour, among other things.

 

The units of thermodynamic (classical) entropy are units of energy per unit of temperature (joules per kelvin). The units of information entropy are bits per message.

 

So just how are thermodynamic and information entropy related to each other? Are they the same thing (one uses physical units, the other dimensionless bits)? I have never really been able to reconcile this.

 

Can anyone point out the blindingly obvious (which I must have missed)?


Great reflections. I think there is probably no end to how far this can be discussed. I guess many others will respond, but here are some first comments.

 

A simple answer might be to say that thermodynamic entropy and Shannon entropy are different things, which they are... but that is too simple and unsatisfactory an answer, because they have obvious and interesting relations that the curious mind can't dismiss as coincidence :)

 

One can also discuss the concepts of information and entropy on their own. Shannon entropy is in fact far from the only measure of "information". There are different versions; sometimes one can give desirable properties axiomatic status and prove from them that one particular entropy definition is singled out. But then again, those axioms can be chosen differently. Usually information is defined in terms of probability, or in terms of combinatorics based on microstructures, which are loosely related too. So the concept of information itself is, at least in physics, not as unambiguous as one might think.

 

Except for the factor of Boltzmann's constant, thermodynamic entropy is, at least IMO, pretty much "in the spirit of Shannon entropy": if one defines a probability space from the microstructure and takes N -> infinity, there is a relation between Shannon's formula and the number of distinguishable microstates of the chosen microstructure.
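
As a rough Python illustration of that remark (a sketch of my own, using the standard Gibbs/Boltzmann formula S = -k_B Σ p ln p, with equally likely microstates so that S = k_B ln W; the number of microstates is made up):

[code]
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def shannon_entropy_nats(probs):
    """Shannon entropy using the natural logarithm (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def gibbs_entropy(probs):
    """Gibbs entropy over microstates: S = -k_B * sum(p ln p), in J/K."""
    return K_B * shannon_entropy_nats(probs)

W = 10**6                  # number of equally likely microstates (made up)
probs = [1.0 / W] * W

print(gibbs_entropy(probs))   # equals K_B * ln(W), i.e. Boltzmann's S = k ln W
print(math.log2(W))           # the same distribution holds ~19.9 bits
print(K_B * math.log(2))      # conversion factor: J/K per bit of missing information
[/code]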

 

There are several philosophical problems with this that may suggest Shannon's measure isn't universally the best one, for example when you try to do relativistic thermodynamics. The large N is one issue, but a serious problem is also how to select a microstructure in which each state is a priori "equally likely". This is not trivial, especially if you try to understand it in a context where you want all constructs to be induced from real observations. Taking arbitrary prior distributions into account, one is led to various relative entropies (K-L divergence, or information divergence), which is a sort of updated version based on conditional probabilities. Getting rid of the "large N" thinking will, I think, have even more profound implications, connecting to change, time and possibly also energy and mass.

 

I think there is a lot around this that, to my knowledge, no one has satisfactory answers to, and people have different "effective methods" to handle the issue.

 

I personally think a coming revolution in physics will be related to information and information processing, and that sooner or later we will reveal the fundamental connection between the laws of physics and the laws of communicating and information-processing observers.

 

/Fredrik

 

> When I learned about Shannon's theories I was intrigued by the concept of information having entropy.

 

I think the next intriguing thing is that information probably has mass/energy too, but that only makes sense in a context where there are changes.

 

I do not have the answers but I think this is a very good focus.

 

An extremely simple first-hand association is inertial mass ~ confidence in your prior probability distribution. When new information arrives, you need to decide if, and how, to update your prior. Clearly one fundamental element is to assign relative confidences to your prior and to the new, possibly conflicting, information. This allows for a very simple yet profound possible link to physical inertia.
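
Just to make the analogy concrete, here is a toy Python sketch of my own (not a physical model): if the prior is held with a large "confidence weight", new conflicting data barely moves the estimate, a bit like a large inertial mass resisting a force.

[code]
def weighted_update(prior_mean, prior_weight, observation, obs_weight=1.0):
    """Confidence-weighted average of a prior estimate and a new observation."""
    return (prior_weight * prior_mean + obs_weight * observation) / (prior_weight + obs_weight)

# Same new observation, different confidence in the prior:
print(weighted_update(0.0, prior_weight=1.0, observation=10.0))    # 5.0  -> low confidence, large shift
print(weighted_update(0.0, prior_weight=100.0, observation=10.0))  # ~0.1 -> high confidence, barely moves
[/code]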

 

This is something I started thinking about again since this spring, and it's quite exciting. It's way too fascinating and suggestive to be mere coincidence.

 

Unfortunately, and also strangely, it is hard to find many papers that treat this from a fundamental viewpoint. There are many other approaches that touch on it, but from a completely different angle, coming from within other major programs. I think there is a need for a clean revision of this from first principles. I think ideas from different fields are healthy.

 

/Fredrik

 

> An extremely simple first-hand association is inertial mass ~ confidence in your prior probability distribution. When new information arrives, you need to decide if, and how, to update your prior.

 

One of the implications of my previous "issues" is that consistency requires that sometimes not only the prior is updated: the probability space itself that the prior "lives in" can also change! All this can be viewed from pure information-processing reasoning. This makes it very complicated and all the more fascinating.

 

/Fredrik

 

Another way of seeing the latter is that the microstructure itself, which is used to define information in the first place, is itself uncertain! In the Shannon case we really make an assumption... we select a microstructure (formally at will, although of course there are good grounds for it, but they are not perfect) and fix it. This does not make sense in the general case, IMO.

 

/Fredrik


OK, I didn't really mean to get into ergodic theory so much either. Rather, I want to look at the fundamentals of what information is and what entropy is. Shannon was apparently against calling it information (not sure why), but there is an argument that information reduces uncertainty, and so entropy. Where does that go? Information, which "has" entropy, can reduce entropy, or something. Negative entropy? I think there's something wrong with that argument.


I don't know the background of your questions, so I'm only guessing...

 

Here are some quite PERSONAL comments, strongly coloured by my own perspective, which may not be relevant to your questions... You decide for yourself.

 

> Rather, I want to look at the fundamentals of what information is and what entropy is. Shannon was apparently against calling it information (not sure why), but there is an argument that information reduces uncertainty, and so entropy.

 

In the usual notation, entropy and information are related in the sense that the entropy of something is a measure of the observer's/receiver's prior missing information about it. But one can also say that entropy is a measure of the quality of information relative to your prior knowledge.

 

IMO, the philosophy of information has strong similarities to the philosophy of probability theory.

 

Certain branches, to which I belong, argue that all probabilities are relative; in consistency with this I also think that all information is relative. When one tries to define an absolute (non-relative) measure of information (like the Shannon entropy), that in fact contains a hidden implicit condition: the selection of the microstructure, and the ergodic hypothesis that there exists a unique absolute prior or symmetry.

 

The Shannon entropy is in fact conditional on these assumptions; its absolute appearance is, IMO, deceptive.

 

IMO, it makes little sense to have a fundamental discussion or reflection on what information is and what entropy is without considering these things, because this is where the concepts are ultimately rooted.

 

So one can take entropy as a measure of information (or of missing information, depending on how you see it).

 

But still, what information is, is not unambiguous, and neither is entropy. There are different definitions of entropy, depending on what properties you want it to have: what are you going to do with the measure?

 

Instead of the Shannon entropy there is the relative entropy (also called the KL divergence), which is a measure of information relative to a prior. Three different entropies are related:

 

[math]S_{cross} = S_{KL} + S_{shannon}[/math]

[math]S_{KL} \geq 0[/math]
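
A quick numerical check of this decomposition in Python (a sketch of my own; p and q are made-up distributions, with p playing the role of the observed distribution and q the prior), using the usual definitions S_shannon = -Σ p ln p, S_KL = Σ p ln(p/q) and S_cross = -Σ p ln q:

[code]
import math

def shannon(p):
    """S_shannon = -sum p ln p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """S_cross = -sum p ln q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """S_KL = sum p ln(p/q), the relative entropy of p with respect to q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # observed distribution (made up)
q = [0.4, 0.4, 0.2]   # prior distribution (made up)

print(cross_entropy(p, q))               # S_cross
print(kl_divergence(p, q) + shannon(p))  # S_KL + S_shannon -- the same number
print(kl_divergence(p, q) >= 0)          # True: S_KL >= 0
[/code]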

 

By some simple toying with basic combinatorics I found that the KL divergence is related to the expected probability P of making a certain relative-frequency observation... it's something like this, from the top of my head, if I remember correctly.

 

[math]\langle P \rangle = w e^{-S_{KL}}[/math]

 

As the number of degrees of freedom of the microstructure goes to infinity, w -> 1.

 

> that information reduces uncertainty, and so entropy.

 

Entropy is a measure of missing information about a message relative to the receiver/observer (i.e. a measure of the NEW information IN the message).

 

Thus, the entropy of a message is generally reduced if the receiver's or observer's prior information is larger.

 

I'm not sure if I got your point though? Like I said originally I think one can make many different reflections on this, from different angles depending on your purpose.

 

[math]\langle P \rangle = w e^{-S_{KL}}[/math]

 

In this elaboration, P is the expected probability of observing an unlikely message. The problem is that the probability is not exact; it's only an expected probability. But this can be expanded into an algorithm. This is how the information divergence is intuitively related to the "probability" of seeing a particular message. Unexpected messages are less likely to be seen, and thus have larger relative entropy.

 

To elaborate, consider the following situation: the best estimate you have for the probability, given no other info, is the relative frequency as registered from experience. Consider this to be your prior. Then ask what the estimated probability is that you will see a given fixed-size message, assuming your prior is constant. One can apply the multinomial distribution to find this estimated probability, which gives the formula above: basically, an estimate of the probability of observing a certain "probability distribution" with a certain confidence level. To observe an unlikely "message" with low confidence is quite likely, but to observe an unlikely message with high confidence is simply unlikely. It sounds like a play on words, but I found that toying with the combinatorics of relative frequencies gives some intuitive understanding of these concepts.
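
A rough numerical Python sketch of that counting argument (my own illustration, not the exact derivation above; prior and counts are made up): the multinomial probability of seeing a particular relative-frequency "message" of N symbols, given a prior q, is dominated by a factor e^{-N S_KL}, with a slowly varying combinatorial prefactor playing the role of the w above.

[code]
import math

def multinomial_prob(counts, q):
    """Exact probability of observing these symbol counts under distribution q."""
    n = sum(counts)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_p += sum(c * math.log(qi) for c, qi in zip(counts, q) if c > 0)
    return math.exp(log_p)

def kl_divergence(p, q):
    """Relative entropy of the observed frequencies p with respect to the prior q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

q = [0.5, 0.3, 0.2]          # assumed prior (made up)
counts = [40, 40, 20]        # observed relative-frequency "message", N = 100
N = sum(counts)
freqs = [c / N for c in counts]

exact = multinomial_prob(counts, q)
dominant = math.exp(-N * kl_divergence(freqs, q))   # the exponential S_KL factor
print(exact, dominant, exact / dominant)            # the ratio is the slowly varying prefactor
[/code]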

 

But in reality the prior might be dynamically updated, if the receiver/observer responds (changes) in response to the new info, so the prior is, I think, dynamical. That's at least how I see it. But this is an area where there is seemingly debate, and there is probably no universal answer that everyone currently agrees upon; it gets complicated because the notion of change makes things complex, and then comes time.

 

I think most of the information concepts are abstractions one way or the other, and the question is what abstraction is most efficient for the quest in question.

 

I'm sorry if it's unstructured. I'm working on some papers but nothing is done yet. It's just meant as inspiration and hints.

 

/Fredrik


Hm. You might be at a bit higher level than me on all this theory. I have only a first degree and studied Information Theory as part of a communications course.

So this has already gone into territory that I'm not at home in, but I can deal with calculus etc. What I'm confused about is how information can have entropy and reduce entropy too. Doesn't this require negative entropy, because I learned that entropy can only be zero or positive?

But the point I suppose is that terminology can be deceptive in itself, mostly because of the different meanings of words like "information", "communication", and so on. I have done a bit more looking and Wikipedia says that Shannon wanted to call it information uncertainty (and this is still being suggested).


> What I'm confused about is how information can have entropy and reduce entropy too. Doesn't this require negative entropy, because I learned that entropy can only be zero or positive?

 

I'm not sure I exactly follow why you want negative entropy. In the Shannon definition he considers probabilities of distinguishable events and asks: if we know these probabilities (this is our information), how uncertain are we about the true microstate?

 

To answer this question one can define the Shannon entropy, which is devised to be a measure of our uncertainty about the underlying microstate.

 

[math]S = - \sum_i p_i \ln p_i[/math]

Shannon argues for this expression, up to a constant, in his original paper: http://plan9.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

 

The link to statistical physics is that the microstate is the motion of the billions of billions of individual molecules spinning, moving etc, and the macrostate (pressure, temperature etc) yields a probability distribution over the microstates.

 

This means that if the entropy of our information (probability distribution) is high, our _uncertainty_ about the true microstate is high.

 

So high entropy is a measure of missing information or uncertainty of the underlying state of the microstructure.

 

But Shannon entropy does not consider uncertainty in the microstructure itself, just in its state. I guess my original posts went too far in one go.

 

a lot of missing information about the microstructure

~=

a lot of uncertainty in the information

~=

high entropy

 

Low entropy means that the information gives very high certainty about the state of the microstructure.

 

> how information can have entropy and reduce entropy too

 

If we take information to mean the probability distribution, i.e. information about the microstructure (or about the distinguishable events, as Shannon called them).

 

Then this information has an entropy, as per

[math]S = - \sum_i p_i \ln p_i[/math]

 

To avoid confusion, perhaps in this case it is actually better to call entropy the uncertainty of the information, rather than the information.

 

Information has entropy, and if the uncertainty in the information is lower, the entropy is lower.

 

> I have done a bit more looking and Wikipedia says that Shannon wanted to call it information uncertainty (and this is still being suggested).

 

> Suppose we have a set of possible events whose probabilities of occurrence are p_1, p_2, ..., p_n. These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much "choice" is involved in the selection of the event or of how uncertain we are of the outcome?

 

 

This is Shannon's question, and his answer as to what this measure is, is the Shannon entropy.

 

By "information uncertainty" he means: how much "information" or "knowledge" about the microstructure is MISSING, if we only know the probabilities of the distinguishable microstates? This is the same as asking what the uncertainty in this information is.

 

If the entropy is high, there are many, many possible combinations of microstates that give the SAME probability distribution; this is why we are then "less certain" about the true microstate.
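
A small Python sketch of this counting (my own illustration; the distribution and the values of N are made up): the number of microstate sequences W compatible with given relative frequencies grows so that (ln W)/N approaches the Shannon entropy, so high entropy really does mean many microstates hiding behind the same distribution.

[code]
import math

def shannon_nats(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def log_multiplicity(counts):
    """ln W, where W = N! / (n_1! n_2! ...) counts the sequences with these symbol counts."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

p = [0.5, 0.25, 0.25]                 # a probability distribution (made up)
for N in (8, 80, 8000):
    counts = [int(pi * N) for pi in p]
    print(N, log_multiplicity(counts) / N)   # approaches shannon_nats(p) as N grows

print(shannon_nats(p))                       # ~1.04 nats
[/code]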

 

I'm not sure if that made it clearer?

 

/Fredrik

 

Mmm. I think I see your confusion... a suggestion

 

Perhaps a better wording is to call the probability distribution information about the microstructure.

 

If we are given a probability distribution over the distinguishable microstates, that clearly provides us with (in plain English) "information" about the microstructure, right? We can also consider this probability distribution to be a message sent to us.

 

Now we wonder: what is the quality of this information? I.e. how much uncertainty is in this information? I.e. how much information is still MISSING to have complete knowledge of all the microstates?

 

A measure of that is the entropy. So I usually think of entropy as a measure of missing information/knowledge about the microstructure.

 

Most of my early comments attempted to be elaborations and extensions to this, and I suspect I instead only messed it up because I didn't quite get your question.

 

/Fredrik


No, thanks for all the input; it has been interesting. I simply didn't get the depth needed, I suppose, from that course, because we only did simple examples, and the concept of information entropy wasn't explored much except on the encoding side of things. But it was a CS, rather than a math or physics, class.


Entropy will create disorder and will absorb energy, while enthalpy will create order and release energy. If we expand a gas into higher entropy it gets colder, because it is absorbing the energy. If we compress it into order or enthalpy, it gets hotter, as it releases energy.

 

As an example of these two opposite informational effects: in the middle ages, scientists would marvel at the rock and the feather falling down to the earth. Sometimes big things fell faster and sometimes small things. This is high entropy information, with total disorder. This entropy info absorbs mental energy leading to all types of theories and speculations. Newton came along and lowered entropy (disorder) with rational enthalpy, causing all this apparent chaos to crystallize out in a simple relationship. It lowered mental energy and so everyone stopped speculating.

 

High entropy brings new information into awareness that can improve our understanding of things we never thought of before. Eventually, this will trigger the enthalpy effect and an energy-lowering compression should occur. That is where people should be marvelling; the entropy is the high-energy precursor that eventually leads to a stable, compact, lower-enthalpy state. Entropy is the mental arousal, but enthalpy is the mental climax. One gets to smoke a cigarette and finally go to sleep.

 

Before Newton, they would have loved the modern theories of randomness, chaos, statistics, etc., because these would have allowed pigs to fly on certain occasions. This would be allowable if everyone believed in the power of disorder, with enthalpy and rational order a minor effect. Luckily Newton nipped this in the bud with rational enthalpy. We have this same effect today. But few realize the entropy is nothing but a precursor state that is supposed to lead to some type of enthalpy climax.

 

If you look at a diamond, the value is in its enthalpy perfection and not its entropy flaws. The flaws tell us little about the essence of the diamond. But because they still tell us something, and entropy will absorb energy, the endothermic increases our mental energy for mighty speculations. It could be the entropy buzz that makes knowledge entropy fun. We have philosophically turned entropy information into high-energy mental viagra, so we can go on and on, to avoid reaching a mental entropy climax.

 

It reminds me of a mental entropy joke. After making love, the hottie, all exhausted, turns to her partner and asks him, "What is your name?". He proudly says, "I am Thor!". She says, "You think you're thor, I'm the one who is thor (sore)." A little more enthalpy would have been considerate.


> Entropy will create disorder

 

No, entropy is disorder.

 

> This entropy info absorbs mental energy leading to all types of theories and speculations.

 

How? By what mechanism? And how does it lead to theories and speculations (which are mental processes, and physiological, so require energy).

 

Sorry, the rest of your post is quite difficult for me. Possibly even a bit impenetrable. Perhaps you could elaborate.

