
Data Compression and its implementation.


krusty


I'm rather stuck in my attempt to learn something.

I want to learn how to implement data encoding and decoding for compression, primarily to decode text streams in PDF files. I have read the PDF Reference, but it doesn't teach me what I need to know, and I never expected it to.

I'm stuck on understanding this. Text streams can be encoded with a number of different algorithms, and which ones are used can depend on the age of the PDF document. I know that these days FlateDecode is common for text streams, so the open source zlib library can be used to decode them. There are other encodings as well, such as LZW and ASCII base-85, and maybe others (this is just off the top of my head).
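For the LZW case, the decoder is short enough to write yourself. Below is a minimal C++ sketch, not taken from any library, of a PDF-style LZWDecode filter. It assumes the LZWDecode conventions from the PDF Reference: codes packed most-significant-bit first, starting at 9 bits and growing to at most 12, code 256 clearing the table, code 257 marking end of data, and the default EarlyChange = 1. The function name `lzwDecode` is made up for illustration, and predictors (`/Predictor` in `/DecodeParms`) are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: minimal PDF-style LZWDecode. Assumes MSB-first codes of
// 9..12 bits, 256 = clear table, 257 = end of data, EarlyChange = 1.
std::vector<uint8_t> lzwDecode(const std::vector<uint8_t>& in) {
    std::vector<std::string> table;
    auto resetTable = [&table] {
        table.clear();
        for (int i = 0; i < 256; ++i) table.emplace_back(1, static_cast<char>(i));
        table.emplace_back();  // 256: clear-table code (carries no data)
        table.emplace_back();  // 257: end-of-data code (carries no data)
    };
    resetTable();

    std::vector<uint8_t> out;
    uint32_t bitBuf = 0;   // input bits; the low bitCount bits are still unread
    int bitCount = 0;
    std::size_t pos = 0;
    int codeWidth = 9;
    int prev = -1;         // previous code; -1 right after a clear

    for (;;) {
        // Pull in bytes until a whole code is available.
        while (bitCount < codeWidth) {
            if (pos >= in.size()) return out;  // input ended without an EOD code
            bitBuf = (bitBuf << 8) | in[pos++];
            bitCount += 8;
        }
        int code = static_cast<int>((bitBuf >> (bitCount - codeWidth)) & ((1u << codeWidth) - 1));
        bitCount -= codeWidth;

        if (code == 256) { resetTable(); codeWidth = 9; prev = -1; continue; }
        if (code == 257) break;

        std::string entry;
        if (code < static_cast<int>(table.size())) {
            entry = table[code];
        } else if (code == static_cast<int>(table.size()) && prev >= 0) {
            // Code not in the table yet: it must be prev's string + its own first char.
            entry = table[prev] + table[prev][0];
        } else {
            throw std::runtime_error("invalid LZW code");
        }
        out.insert(out.end(), entry.begin(), entry.end());

        // Each step adds "previous string + first char of the current string".
        if (prev >= 0 && table.size() < 4096) table.push_back(table[prev] + entry[0]);
        prev = code;

        // EarlyChange = 1: widen the code one entry before it is strictly needed.
        std::size_t next = table.size() + 1;
        codeWidth = next >= 2048 ? 12 : next >= 1024 ? 11 : next >= 512 ? 10 : 9;
    }
    return out;
}
```

Fed the nine bytes `80 0B 60 50 22 0C 0C 85 01`, this sketch returns the text `-----A---B`. For FlateDecode, zlib's `uncompress()` or `inflate()` does the equivalent job.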

I'm told that to decode these text streams I will need to learn the maths behind the algorithm used in each case, but I'm not sure why that is so important. If I'm using an open source library to encode/decode streams, surely it's just a matter of calling the correct functions to decode the stream. Why would learning the maths behind it be important?

The second part of my question: are there open source libraries for decoding ASCII-encoded streams, such as ASCII base-85 streams? I'm clearly having a lot of trouble getting started with encoding and decoding compressed streams, so any advice on what to learn first would be greatly appreciated. I do understand some of the ways data can be encoded (Huffman coding, etc.), but is there a book or something similar that introduces actually implementing encoding and decoding libraries in C++?
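ASCII base-85 is simple enough that a library is hardly needed. Here is a minimal C++ sketch of an ASCII85Decode filter, assuming the PDF conventions: each group of five characters from `!` to `u` encodes four bytes as a base-85 number, `z` stands for four zero bytes, whitespace is ignored, `~>` ends the data, and a final partial group of n characters yields n-1 bytes. The name `ascii85Decode` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: minimal ASCII85Decode as used for PDF streams.
std::vector<uint8_t> ascii85Decode(const std::string& in) {
    std::vector<uint8_t> out;
    uint64_t group = 0;  // 64-bit so an out-of-range group is detected, not wrapped
    int count = 0;       // characters collected in the current group

    // Write the top nBytes bytes of the current 32-bit group, big-endian.
    auto emit = [&](int nBytes) {
        if (group > 0xFFFFFFFFu) throw std::runtime_error("ASCII85 group out of range");
        for (int i = 0; i < nBytes; ++i)
            out.push_back(static_cast<uint8_t>(group >> (24 - 8 * i)));
    };

    for (char c : in) {
        if (c == '~') break;  // first character of the "~>" end-of-data marker
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\0') continue;
        if (c == 'z' && count == 0) { out.insert(out.end(), 4, uint8_t{0}); continue; }
        if (c < '!' || c > 'u') throw std::runtime_error("invalid ASCII85 character");
        group = group * 85 + static_cast<uint64_t>(c - '!');
        if (++count == 5) { emit(4); group = 0; count = 0; }
    }
    if (count == 1) throw std::runtime_error("truncated ASCII85 group");
    if (count > 1) {
        // Pad a final partial group with 'u' (value 84), then keep count - 1 bytes.
        for (int i = count; i < 5; ++i) group = group * 85 + 84;
        emit(count - 1);
    }
    return out;
}
```

For example, `9jqo^~>` should decode to the four bytes of `Man ` (with a trailing space), and `9jqo~>` to the three bytes `Man`.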

Thanks


Why would learning the maths behind it be important?

If you just call a function from somebody else's library, you won't learn anything, will you?

If it's homework and your teacher told you to implement compression/decompression without a third-party library, you had better listen to them.


If you use a third-party library, you will learn how to link against a static (or dynamic) library, not how the compression/decompression algorithm works.

If you want to learn something useful, try writing a compression/decompression algorithm yourself.

If it's a job for somebody else and you use a third-party library, you might break the license agreement its authors want to enforce. Some third-party library authors forbid selling software built with their library (allowing it only to be shared, or only shared as open source, depending on the library). That is stated in their license.

Edited by Sensei


Hmm. Now I'm really lost.

I see what you mean about learning about the library, though. It also explains why I wouldn't need to completely understand the algorithm in question. This isn't a commercial job, just my own little project; I'm trying to understand more about using open source libraries for this sort of thing.


The math behind text compression (which has to be lossless) tends to be fairly simple, for example:

- Observe which symbols, or strings of symbols, appear most often, and assign shorter codes to them.

- Check whether a subsequence appears many times in a text; if so, spell it out only once and refer back to it.
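The first point is what Huffman coding does. Below is a minimal C++ sketch that only computes the code length each character would receive; building the actual bit codes and writing them out is left aside. The name `huffmanCodeLengths` is made up, and ties between equal weights are broken arbitrarily, so other equally valid Huffman trees exist.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: Huffman code length (in bits) for each character of `text`.
// Frequent characters end up closer to the root, i.e. with shorter codes.
std::map<char, int> huffmanCodeLengths(const std::string& text) {
    std::map<char, std::size_t> freq;
    for (char c : text) ++freq[c];

    struct Node { std::size_t weight; char symbol; int left, right; };  // leaves: left = right = -1
    std::vector<Node> nodes;
    using Item = std::pair<std::size_t, int>;  // (weight, index into nodes)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;  // min-heap
    for (const auto& [sym, w] : freq) {
        nodes.push_back({w, sym, -1, -1});
        heap.push({w, static_cast<int>(nodes.size()) - 1});
    }

    std::map<char, int> lengths;
    if (nodes.empty()) return lengths;
    if (nodes.size() == 1) { lengths[nodes[0].symbol] = 1; return lengths; }  // a lone symbol still needs 1 bit

    // Repeatedly merge the two lightest subtrees into one.
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, 0, a.second, b.second});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // A leaf's depth in the finished tree is its code length in bits.
    std::vector<std::pair<int, int>> stack{{heap.top().second, 0}};
    while (!stack.empty()) {
        auto [i, depth] = stack.back();
        stack.pop_back();
        if (nodes[i].left < 0) { lengths[nodes[i].symbol] = depth; continue; }
        stack.push_back({nodes[i].left, depth + 1});
        stack.push_back({nodes[i].right, depth + 1});
    }
    return lengths;
}
```

For `aaaabbc` this gives `a` a 1-bit code and `b` and `c` 2-bit codes: 10 bits in total instead of 56 at 8 bits per character, before counting the cost of storing the code table.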

What's less obvious:

- If nothing is known about the data (say, you don't know that it is text), then no general method can exist that compresses every possible input.

- Some people have misused the opportunity to introduce "amounts of information", entropy and the like. As you might guess, these are not the people who invented or programmed the methods.
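The first point is a counting (pigeonhole) argument. A lossless compressor must map different inputs to different outputs, yet there are fewer bit strings strictly shorter than n bits than there are strings of exactly n bits:

$$
\left|\{0,1\}^n\right| = 2^n \;>\; \sum_{k=0}^{n-1} \left|\{0,1\}^k\right| = \sum_{k=0}^{n-1} 2^k = 2^n - 1
$$

So at least one n-bit input cannot be made shorter; a compressor only wins by exploiting what it knows about its typical inputs (for example, that they are text).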

As long as you decide to ignore these not-very-useful additions, data compression uses reasonably simple and intuitive maths. In any case, what you need in order to program the decompression algorithm is simple.
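The idea of spelling a repeated subsequence only once is the basis of LZ77-style compression, which Deflate (and therefore FlateDecode) combines with Huffman coding. The minimal C++ sketch below uses a made-up token format rather than Deflate's actual bit layout, and shows how little the decompressor has to do compared with the encoder. `Token`, `lzEncode` and `lzDecode` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token format for illustration: either a literal character, or
// "copy `length` characters starting `distance` characters back".
struct Token {
    bool isCopy;
    char literal;
    std::size_t distance;
    std::size_t length;
};

// The decoder: copy one character at a time so that a reference may overlap
// the text it is producing (distance < length).
std::string lzDecode(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) {
        if (!t.isCopy) { out.push_back(t.literal); continue; }
        std::size_t start = out.size() - t.distance;
        for (std::size_t k = 0; k < t.length; ++k) out.push_back(out[start + k]);
    }
    return out;
}

// A naive greedy encoder: at each position, take the longest earlier match.
// O(n^2) search for clarity; real compressors use hash tables and a limited window.
std::vector<Token> lzEncode(const std::string& s, std::size_t minMatch = 3) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t bestLen = 0, bestDist = 0;
        for (std::size_t j = 0; j < i; ++j) {
            std::size_t len = 0;
            while (i + len < s.size() && s[j + len] == s[i + len]) ++len;
            if (len > bestLen) { bestLen = len; bestDist = i - j; }
        }
        if (bestLen >= minMatch) {
            out.push_back({true, '\0', bestDist, bestLen});
            i += bestLen;
        } else {
            out.push_back({false, s[i], 0, 0});
            ++i;
        }
    }
    return out;
}
```

For `abcabcabcabcX` the encoder emits five tokens: three literals, one copy of length 9 from distance 3 (which overlaps itself), and the literal `X`.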
