Data Compression and its implementation.

krusty · October 19, 2013

I'm rather stuck in my attempt to try to learn something.

I want to learn how to implement data encoding and decoding for compression purposes, primarily for decoding text streams in PDF files. I have read the PDF Reference, though it doesn't teach me what I need to know. I didn't for a minute think it would.

I'm stuck in my understanding of this. Text streams can be encoded with a number of different algorithms, and which one is used can depend on the age of the PDF document. I know that in a lot of cases now FlateDecode is common for text streams, so the open source library zlib can be used for this purpose. There are other encodings as well, such as LZW and ASCII base 85, and maybe others (this is just off the top of my head).
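A minimal sketch of what decoding a FlateDecode stream amounts to, written in Python for brevity (the same zlib library can be called from C or C++). It assumes the bytes between `stream` and `endstream` have already been extracted and that no predictor is set in the stream's decode parameters. The `content` bytes and the `flate_decode` helper are made up for illustration.

```python
import zlib

# Hypothetical content stream; a real one would be read out of the PDF file.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET"

# Stand-in for what a PDF writer stores under /Filter /FlateDecode:
# the content compressed into zlib-format deflate data.
encoded = zlib.compress(content)


def flate_decode(data: bytes) -> bytes:
    """Undo FlateDecode, assuming no predictor is in use."""
    return zlib.decompress(data)


decoded = flate_decode(encoded)
print(decoded.decode("ascii"))
```

Here the library does all the work, which is why calling it requires no knowledge of how deflate works internally.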

I'm told that in order to implement a way of decoding these text streams I will need to learn the maths behind the algorithms used in each case, but I'm not sure why that is so important. If I'm using an open source library for encoding/decoding streams, then surely it's just a matter of calling the correct functions to decode the stream. Why would learning the maths behind it be important?

The second part of my question is: are there open source libraries for decoding ASCII-encoded streams, such as ASCII base 85 encoded streams? I'm clearly having a lot of trouble starting out in decoding and encoding compressed streams, so any advice on what to start learning to get the ball rolling would be greatly appreciated. I do understand some of the different ways things can be encoded (Huffman etc.), but is there a book or something out there which helps tackle, or at least introduces, actually implementing encoding and decoding libraries in C++?
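As a rough illustration of how little maths ASCII base 85 needs, here is a hand-written decoder sketch in Python. It assumes PDF-style input: five characters in the range `!` to `u` encode four bytes as a base-85 number, `z` stands for four zero bytes, whitespace is ignored, and `~>` ends the data. The names `ascii85_decode` and `_group_to_bytes` are hypothetical. Python's standard library also ships `base64.a85encode` and `base64.a85decode`, and the sketch uses the former only to build a sample.

```python
import base64

PDF_WHITESPACE = b" \t\n\r\f\0"


def _group_to_bytes(digits, keep):
    """Turn five base-85 digits into a 32-bit value; return its first `keep` bytes."""
    value = 0
    for d in digits:
        value = value * 85 + d
    if value > 0xFFFFFFFF:
        raise ValueError("group value does not fit in 32 bits")
    return value.to_bytes(4, "big")[:keep]


def ascii85_decode(text: bytes) -> bytes:
    out = bytearray()
    group = []
    for byte in text:
        if byte in PDF_WHITESPACE:
            continue                      # whitespace carries no data
        if byte == ord("~"):
            break                         # "~>" marks the end of the data
        if byte == ord("z"):
            if group:
                raise ValueError("'z' inside a group")
            out += b"\x00\x00\x00\x00"    # shorthand for four zero bytes
            continue
        if not ord("!") <= byte <= ord("u"):
            raise ValueError("invalid ASCII85 character")
        group.append(byte - 33)           # '!' is digit 0, 'u' is digit 84
        if len(group) == 5:
            out += _group_to_bytes(group, 4)
            group = []
    if len(group) == 1:
        raise ValueError("a final group of one character is invalid")
    if group:
        keep = len(group) - 1             # n+1 characters encode n bytes
        group += [84] * (5 - len(group))  # pad with 'u', then cut the result
        out += _group_to_bytes(group, keep)
    return bytes(out)


sample = b"Hello, PDF"
encoded = base64.a85encode(sample) + b"~>"
decoded = ascii85_decode(encoded)
print(encoded, "->", decoded)
```

The whole decoder is base conversion plus a little bookkeeping for the last, shorter group.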

Thanks

Sensei · October 19, 2013

"Why would learning the maths behind it be important?"

If you just call a function from somebody else's linkable library, you won't learn anything, no?

If it's homework, and your teacher told you to implement compression/decompression without using a 3rd party library, you had better listen to him.

krusty · October 19, 2013

My school teachers are probably all dead now, so it's definitely not homework.

Sensei · October 19, 2013

If you use a 3rd party library, you will learn how to link against a linkable (or dynamic) library, not how the compression/decompression algorithm works.

If you want to learn something useful, try writing the compression/decompression algorithm yourself.

If it's a job ordered by somebody and you use a 3rd party library, you might break a license agreement they want to maintain. Some 3rd party library authors disallow selling software made with their libraries (allowing only sharing, or only sharing as open source, depending on the library). That's stated in their license.

Edited October 20, 2013 by Sensei

krusty · October 20, 2013

"If you will use 3rd party library, you will learn how to link against linkable (or dynamic) library, not compression/decompression algorithm.

"If you want to learn something useful, try making compression/decompression algorithm by yourself.

"If it's job ordered by somebody, and you will use 3rd party library, you might break license agreement they want to maintain. Some 3rd party library authors disallow selling software made with them (only share, or only share with open source depending on lib). That's mentioned in their license."

Hmm. Now I'm really lost.

I see what you mean about learning about the library though. It also answers why I would not need to completely understand the algorithm in question. This is no commercial job, it's my own little project. I'm just trying to understand more about using open source libraries for this sort of thing.

HalfWit · October 20, 2013

"My School teachers are probably all dead now"

Jeez, I'm not going to read about this in the newspapers, I hope!

krusty · October 21, 2013

haha.. Surely there's another way you could have interpreted that..

No, they were just old when they were teaching me and that was a long time ago.

Enthalpy · October 21, 2013

The math behind text compression (hence lossless) is usually fairly simple, like:

- Observe what symbols or strings of symbols appear more often, and define shorter codes for these.

- Check if a subsequence appears many times in a text, then spell it only once.
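The first idea, shorter codes for more frequent symbols, can be sketched with Huffman's classic construction. This is a minimal Python illustration, not how any particular library does it; `huffman_code` is a hypothetical helper name, and a real codec would also have to store or transmit the code table.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict:
    """Map each symbol in `text` to a bit string; frequent symbols get shorter ones."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


message = "aaaaaaaabbbc"
code = huffman_code(message)
bits = "".join(code[ch] for ch in message)
print(code)
print(len(bits), "bits instead of", 8 * len(message))
```

The second idea, spelling a repeated subsequence only once, is the basis of the LZ family, which includes LZW. Deflate, the format behind FlateDecode, combines both ideas.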

What's less obvious:

- If no information is available on the data (say, if you ignore that it's text), then no general method can exist.

- Some people have misused the opportunity to introduce "amounts of information", entropy and the like. You guessed it: these are not the people who invented or programmed the methods.

As long as you decide to ignore these not-very-useful additions, data compression uses reasonably simple and intuitive maths. Anyway, what you need in order to program the decompression algorithm is simple.
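The remark above that no general method exists without some knowledge of the data can be checked empirically. The Python sketch below, with made-up sample data and a fixed random seed, compresses about 4 KB of repetitive text and 4 KB of pseudo-random bytes with zlib. The text should shrink considerably, while the random bytes should not shrink at all, because they contain no structure to exploit.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(4096))
text = b"the quick brown fox jumps over the lazy dog. " * 91  # about 4 KB

for name, data in (("text", text), ("noise", noise)):
    packed = zlib.compress(data, 9)
    # Lossless: decompressing always returns the original bytes.
    assert zlib.decompress(packed) == data
    print(name, len(data), "->", len(packed))
```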
