# Need help understanding data and information abstractly


I was studying some computer science concepts when I realized that I am missing some key fundamental understanding. I'm not entirely sure what the other gaps are, but one concept I'm sure I'm having problems with is information. What exactly is information, conceptually? Is it a pattern of behavior? Is it the classification of some signal that makes it useful to some system?

Once I understand this, I believe I can understand data, and thus be better prepared for programming and computer science concepts.

ALine said:

> What exactly is information conceptually?

My very short answer: A value of a random variable.

Let's say you receive a symbol "1". If this is the only possible symbol, the fact that you received it gives you no information. But if this symbol is one of two possible symbols, "0" and "1", then receiving "1" may contain information. So having more than one symbol is a requirement, but it is not sufficient. Let's say you receive the pattern "111111...": the probability of the symbol "1" is 1, so again there is no information. But if random sequences are allowed, for example "00", "01", "10", "11", then we may use these sequences to represent information. So, conceptually, information can be seen as a value of a random variable.
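Shannon made this intuition quantitative with the self-information of an outcome, I(x) = -log2 p(x) bits: the less likely the outcome, the more information observing it carries, and a certain outcome carries none. A minimal Python sketch, assuming equally likely symbols where stated; the function name `self_information` is just an illustrative choice:

```python
import math

def self_information(p: float) -> float:
    """Bits of information gained by observing an outcome that had probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)  # equals -log2(p); a certain outcome (p = 1) gives 0 bits

# Only one possible symbol, "1" (or the pattern "111111..."): receiving it is certain.
print(self_information(1.0))    # 0.0

# Two equally likely symbols, "0" and "1": receiving one resolves 1 bit of uncertainty.
print(self_information(1 / 2))  # 1.0

# Four equally likely two-symbol sequences "00", "01", "10", "11": 2 bits each.
sequences = ["00", "01", "10", "11"]
print(self_information(1 / len(sequences)))  # 2.0
```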

The above is an attempt at an extremely short introduction to information theory, which is tied to discrete probability theory. The most important early contributor was Claude E. Shannon, whose paper “A Mathematical Theory of Communication”* dealt quantitatively with the concept of “information”. Shannon's concepts, and the mathematics he used to describe information and to measure information content, are a remarkable contribution. I believe it's hard to find any area of IT that his work has not influenced.

*) https://en.wikipedia.org/wiki/A_Mathematical_Theory_of_Communication
For an early predecessor of Shannon who worked on sinusoidal signals and frequencies, see Hartley: https://en.wikipedia.org/wiki/Ralph_Hartley
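Shannon's central measure is entropy, H(X) = -Σ p(x) log2 p(x), the average information per symbol produced by a source. For N equally likely symbols it reduces to log2 N, which is Hartley's earlier measure. A minimal Python sketch; the function names are illustrative, and `empirical_entropy` naively estimates probabilities from symbol counts, treating symbols as independent (real sources usually are not):

```python
import math
from collections import Counter

def entropy(probabilities) -> float:
    """Shannon entropy H = sum of p * log2(1/p), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

def empirical_entropy(message: str) -> float:
    """Entropy of the symbol frequencies observed in a message."""
    counts = Counter(message)
    n = len(message)
    return entropy(c / n for c in counts.values())

print(entropy([1.0]))                 # 0.0, a source that always sends "1"
print(entropy([0.5, 0.5]))            # 1.0, a fair binary source
print(entropy([0.25] * 4))            # 2.0, i.e. log2(4): Hartley's measure for 4 symbols
print(round(entropy([0.9, 0.1]), 3))  # 0.469, a heavily biased binary source
print(empirical_entropy("111111"))    # 0.0
print(empirical_entropy("0110"))      # 1.0
```

The biased source shows that a two-symbol alphabet is not automatically worth a full bit per symbol: the more predictable the symbols, the less information each one carries on average.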


So from a mathematical perspective, is information the probability function itself, or a value that the function takes "on", so to speak? I remember seeing Y and X in an intro to statistics course last semester, and that the probability of a given event space is a function from the event space to, I believe, the probability variables. I'm not entirely sure.

Also I do not believe that I fully understand what a random variable actually "is", so I'm gonna revisit that real quick. Also, thank you for answering my question, and for it leading to more questions.
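In the usual textbook definitions, the probability measure maps events (sets of outcomes) to numbers in [0, 1], while a random variable is a function from outcomes to values such as numbers; "a value of a random variable" is what you actually observe after the experiment. A minimal sketch using two fair coin flips, an example chosen here for illustration (the names `X`, `distribution` and `event_probability` are hypothetical):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: every outcome of two fair coin flips, each with probability 1/4.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]
P = {outcome: Fraction(1, 4) for outcome in sample_space}

def X(outcome: str) -> int:
    """A random variable: a function from outcomes to numbers (here, number of heads)."""
    return outcome.count("H")

def distribution(rv, prob):
    """Probability that rv takes each value: sum P over the outcomes mapped to it."""
    dist = defaultdict(Fraction)  # Fraction() == 0
    for outcome, p in prob.items():
        dist[rv(outcome)] += p
    return dict(dist)

def event_probability(event, prob):
    """The probability measure applied to an event, i.e. a set of outcomes."""
    return sum(prob[o] for o in event)

print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
print({k: str(v) for k, v in sorted(distribution(X, P).items())})  # {0: '1/4', 1: '1/2', 2: '1/4'}
print(event_probability({"HT", "TH"}, P))  # 1/2
```

Connecting this to information: by Shannon's measure, observing X = 1 (probability 1/2) would convey 1 bit, while observing X = 2 (probability 1/4) would convey 2 bits.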


Also how does this concept of information directly relate to that of data in terms of computer science?

ALine said:

> Also I do not believe that I fully understand what a random variable actually "is" so I'm gonna revisit that real quick

Here are some lecture notes, starting with:

- Review of discrete probability theory (5 pages). This may guide you towards the parts of math you wish or need to check.
- Shannon's measure of information (15 pages): roughly, the "mathematical relations between probabilities and information content".

I haven't checked the details of the PDF, but it looks decent; I found it by searching for university course material on information theory.

ALine said:

> Also how does this concept of information directly relate to that of data in terms of computer science?

Note: the following is based on a mix of formal studies and practical engineering experience. It may or may not match what an active scientist would say.

Maybe a table of concepts, ordered bottom up*, with examples of related computer science tasks or concepts will help:

Overview:

1: Information theory: entropy of information. The mathematical foundation.
Practical examples: the theoretical capacity of a network connection, lossy vs. lossless data compression, parity checking, error correction.

2: Information representation: what is used to represent data, and what the bits mean at a basic level.
Practical examples: low-level protocols, character sets such as Unicode, and concepts such as line endings and byte order.

3: Information structure: data structures, usually tied to programming and algorithms.
Practical examples: hash tables, linked lists, trees, graphs.

4: Higher-level information organisation: organisation such as databases.
Practical examples: SQL databases, data lakes.
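To make the four levels a little more concrete, here is a rough Python sketch that follows one small piece of text through them. It is an illustration rather than anything from the original thread: the string, the helper `parity_bit` and the table name `char_counts` are made up for the example, and a single even-parity bit stands in for the error-detection examples under level 1.

```python
import sqlite3

text = "h\u00e9llo"  # "héllo", with a precomposed é (U+00E9)

# Level 2, representation: the same characters as different byte sequences.
utf8 = text.encode("utf-8")          # 'é' needs two bytes in UTF-8
utf16_le = text.encode("utf-16-le")  # two bytes per character here, little-endian
utf16_be = text.encode("utf-16-be")  # same code units, big-endian byte order
print(len(utf8), len(utf16_le))                # 6 10
print(utf16_le[:2].hex(), utf16_be[:2].hex())  # 6800 0068

# Level 1, information/coding theory: an even-parity bit detects one flipped bit.
def parity_bit(data: bytes) -> int:
    """1 if the number of set bits in data is odd, else 0."""
    return sum(bin(b).count("1") for b in data) % 2

stored_parity = parity_bit(utf8)
corrupted = bytes([utf8[0] ^ 0b0000_0001]) + utf8[1:]  # flip the lowest bit of byte 0
print(parity_bit(corrupted) != stored_parity)  # True: the error is detected

# Level 3, structure: a hash table (Python's dict) from character to count.
counts = {}
for ch in text:
    counts[ch] = counts.get(ch, 0) + 1
print(counts["l"])  # 2

# Level 4, organisation: the same counts stored in an in-memory SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE char_counts (ch TEXT PRIMARY KEY, n INTEGER)")
conn.executemany("INSERT INTO char_counts VALUES (?, ?)", counts.items())
row = conn.execute("SELECT n FROM char_counts WHERE ch = ?", ("l",)).fetchone()
print(row[0])  # 2
conn.close()
```

The levels overlap even in this toy example: the parity check operates on the level-2 byte representation, and a single parity bit only detects an odd number of flipped bits, which is why real systems use stronger error-detecting and error-correcting codes.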

Study:

When I first studied computer science, the order was 3, 2, then 4 and 1 (4 and 1 ran in parallel, IIRC). Some programming came right before 3, and algorithms came immediately after 3. I guess you could be a good programmer with knowledge of only (3), but knowledge of the others helps.

*) Items 1-4 are my rough way of ordering things for this discussion. There are overlaps, there are examples where the order is different, and there are many cases where 1-4 would be nested.


Thanks Ghideon, this really helps!
