Compression on CUDA


9 replies to this topic

#1 karthika

Posted 13 September 2012 - 06:42 PM

Are compression algorithms on CUDA a good area for a project as part of an MTech course?
Please suggest some papers related to this topic.

#2 Enthalpy

Posted 16 September 2012 - 01:44 PM

Why not?

But if you want your project to be useful, it should fit normal users, not exotic research:
  • Work on BOTH nVidia AND Amd processors. I wouldn't adopt software that I lose if I change my card.
  • Work on normal graphics cards, not on specialized parallel processors that nobody has.
  • Use standard compression formats like zip, rar, gzip, 7zip...
  • As you won't write a full user interface (that's a different, big project), your software should interface with an existing archiver like 7zip, IZArc... 7zip is an open-source project, so the people there publish their interfaces and may welcome you.
  • Be faster than a four-core Cpu, on which 7zip uses the SSE instructions and hyperthreading on all cores...
Under these conditions, none of which I see as optional, your compressor would be useful.

Other pieces of software that would benefit from Gpgpu:
  • An antivirus file-parsing algorithm. At least one antivirus vendor (Kaspersky?) has begun doing it.
  • Cryptography. Less new.
These latter two are harder to interface with existing software, as it tends to be commercial and closed-source, so such a project would be rather academic, with little practical usefulness.

One other use, very specialized but badly needed, is a step in the design of lithography masks for integrated-circuit production. I believe it's the splitting of patterns into rectangles - sorry, it's an old story for me. It was a two-week task on the biggest computers 20 years ago, and it still is today, because computers evolve as slowly as chips do. Costly, and it wastes two weeks (often several times over) in the time to market, so if you find a good implementation on a many-Gpu machine, the few users will be very interested and eager to integrate your implementation into their programs.

Edited by Enthalpy, 16 September 2012 - 01:51 PM.


#3 karthika

Posted 17 September 2012 - 03:36 PM

I am satisfied with your reply... can you suggest more?

#4 Enthalpy

Posted 18 September 2012 - 10:59 PM

Not so easy, because programs benefit more from subtle algorithms than from raw computing speed, and subtle algorithms tend to run horribly on a Gpu...

But on such a parallel machine (a Gpu) you could try to implement
  • An inference engine
  • A Prolog interpreter (if this is still at all fashionable)
as both exhibit a regular, uniform parallelism accessible to the machine. It would be less banal than image processing or finite elements.
Though, I don't know Cuda well enough to tell whether it's the right layer for such a purpose. Maybe you first need your own layer, different from Cuda.

==================================

You might have a look at error-correcting codes (ECC). A Gpu helps a lot in decoding them, but... codes are full of maths. Not complicated for a mathematician :rolleyes:: Galois fields and statistics. Telecom engineers manage to understand them (...more or less :P); with a software background, you would need to have studied this particular math, or to have a real passion and ease for it.
http://en.wikipedia....correcting_code

You should first check what is already done on a Gpu, because ECC specialists program digital signal processors and are comfortable with software.

Some decoding tasks that might benefit from a Gpu:
  • Decoding bigger Reed-Solomon codes. This means just solving a big set of linear equations, where linear means over a Galois field instead of floating-point numbers. Please check whether the Gpu can efficiently access random table elements, since these computations rely on (Galois field) logarithm tables - a small sketch follows this list.
  • Decoding of concatenated codes. As they run many codes in parallel, they may fit a Gpu - but they often rely on Galois field computations.
  • Decoding of random codes. These codes are optimized for correction efficiency but have no fast formal decoding algorithm, so a Gpu may make them practical.
  • Soft-decoding of block codes. Soft means here that the received bits aren't hard-decided to 1 or 0 prior to error correction, but are given a continuous value or a probability - it improves the correction performance. Fast but imperfect algorithms exist and work for "convolutional codes", but algorithms correcting block codes properly would perform a lot better. Here near-brute-force algorithms may be best (please check), comparing the probabilities of all allowed and credible combinations of error positions; they'd need massive computation on float numbers, which a Gpu would enable. Easier maths! A sketch of this is given after the links below.
  • Soft-decoding of random codes. Of course. Even more so because they are high-performance block codes and no good decoding method exists. As soft-decoding looks no more difficult for them than hard-decoding, grasp the opportunity. Easier maths!
  • Soft-decoding of turbo codes. I suppose it exists already. The common description of turbo codes says bits are 0 or 1 (hard decision) and are flipped when decoding along alternate directions; instead, you could give each bit a probability and alter it softly according to the code in each direction, which would bring a big improvement to the correction capability. Since turbo codes let many codes run in parallel, they could fit a Gpu.
  • Soft-decoding of turbo codes that combine random codes. Of course.
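
For illustration, here is a rough sketch (my assumptions: GF(256) with the common generator polynomial 0x11D; real Reed-Solomon systems use various polynomials) of the log/antilog table arithmetic such decoders hammer on - exactly the random table accesses whose efficiency on a Gpu should be checked first:

#include <stdint.h>

static uint8_t gf_log[256], gf_exp[512];

/* Build log/antilog tables for GF(256) with field polynomial 0x11D (an assumption). */
void gf_init(void)
{
    int x = 1;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = gf_exp[i + 255] = (uint8_t)x;   /* duplicated to avoid a modulo later */
        gf_log[x] = (uint8_t)i;
        x <<= 1;
        if (x & 0x100) x ^= 0x11D;                  /* reduce by the field polynomial */
    }
}

/* Multiply in GF(256): one table lookup per operand, i.e. random memory accesses. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0) return 0;
    return gf_exp[gf_log[a] + gf_log[b]];
}
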
I suppose these nearly-brute-force methods will stay rather slow even on a Gpu, so the typical use wouldn't be fibre-optic transmissions, but rather deep-space craft. Improving the data communications of mythical probes like Voyager or Pioneer would of course be fascinating, knowing that the Square Kilometer Array is under construction... Or for planned probes, sure.
http://en.wikipedia....wiki/Pioneer_10 and http://en.wikipedia....wiki/Pioneer_11
http://en.wikipedia..../Voyager_1 and http://en.wikipedia.org/wiki/Voyager_2
http://en.wikipedia....ce_Network and http://en.wikipedia....Kilometer_Array
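
And a rough sketch of the near-brute-force soft decoding mentioned above (my own illustration, nothing standard: the function name, the probability representation and the 64-bit block-length limit are all arbitrary assumptions). Each candidate codeword is scored independently by its log-likelihood, so a Gpu could give one thread per codeword:

#include <math.h>
#include <stddef.h>

/* p1[i] = channel estimate that bit i is a 1 (0 < p1[i] < 1), i.e. the soft value.
   codewords[c][i] = bit i of valid codeword c; block length n_bits <= 64 assumed.
   Returns the index of the codeword with the highest log-likelihood. */
size_t ml_soft_decode(const double *p1, size_t n_bits,
                      const unsigned char (*codewords)[64], size_t n_codewords)
{
    size_t best = 0;
    double best_score = -INFINITY;
    for (size_t c = 0; c < n_codewords; c++) {   /* independent work: one Gpu thread each */
        double score = 0.0;
        for (size_t i = 0; i < n_bits; i++)
            score += log(codewords[c][i] ? p1[i] : 1.0 - p1[i]);
        if (score > best_score) { best_score = score; best = c; }
    }
    return best;
}

In practice one wouldn't score every codeword, but only the credible combinations of error positions around the hard decision, as said above.
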

Marc Schaefer, aka Enthalpy

==================================

Some big science projects may benefit from a Gpu and still not use them... Perhaps!
  • The LHC lets a huge Internet grid of remote computers analyze and store the data collected by the detectors. I suspect it runs on the Cpu of these computers. Use their Gpu, if it's fast and if the task fits it?
  • The LHC's planned successors?
  • The Square Kilometer Array needs processing power and is to rely on a grid of remote computers.
  • Neutrino detectors, dark-matter detectors, gravitational-wave detectors...
  • DNA sequencing
Smaller science or engineering:
  • Compute the shape of a molecule, either from force fields or from first principles
  • Compute the arrangement of (stiff...) molecules in a solid
  • Compute band diagrams in crystals
In all these applications, first check what is already done on a Gpu (protein folding, Seti and many more are) and what the people involved want. Also, as nearly all of these run on Linux, Cuda may not be the proper layer... A Linux equivalent, or maybe OpenCL.

#5 Enthalpy

Posted 19 September 2012 - 02:08 PM

For the inference engine and the Prolog interpreter, you could try to ask the opinion of Yann LeCun. He speaks English and was working at New York University the last time I heard of him. He was previously involved in artificial intelligence and special hardware.

He might well suggest programming a neural network on the Gpu (he worked on machine learning by gradient back-propagation). I just fear this is banal.

-----

For the error-correcting codes, if you can team up with a telecom engineer, then programming the soft-decoding of block codes is accessible to a software engineer.

#6 jimmyhelu

Posted 23 September 2012 - 01:40 PM

Typically, compression algorithms cannot make use of parallel tasks; it is not easy to make them highly parallelizable. In your examples, TAR is not a compression algorithm, and the only algorithm that might be highly parallelizable is BZIP2, because it is a block compression algorithm. Each block can be compressed separately, but this would require lots and lots of memory. LZMA does not work in parallel either: when you see 7zip using multiple threads, it is because 7zip splits the data stream into two different streams that are each compressed with LZMA in a separate thread, so the compression algorithm itself is not parallel. This splitting only works when the data permits it.
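
For illustration only (my sketch, not part of the original post): the block-wise parallelism described above looks like this on a Cpu with zlib - each fixed-size block is compressed independently, so the blocks can run in parallel, at the cost of one output buffer per block and a somewhat worse ratio, since the blocks share no dictionary. A Gpu port would need the same independence between work items.

#include <stdlib.h>
#include <zlib.h>

#define BLOCK_SIZE (1u << 20)   /* 1 MiB blocks, similar in spirit to bzip2's independent blocks */

typedef struct { unsigned char *data; uLong len; } Chunk;   /* one compressed block */

/* Compress n bytes into independently decodable chunks; out must hold
   (n + BLOCK_SIZE - 1) / BLOCK_SIZE entries. The blocks don't depend on each other,
   so the loop parallelizes (compile with -fopenmp; without it, it just runs serially). */
void compress_blocks(const unsigned char *src, size_t n, Chunk *out)
{
    long n_blocks = (long)((n + BLOCK_SIZE - 1) / BLOCK_SIZE);
    #pragma omp parallel for
    for (long b = 0; b < n_blocks; b++) {
        size_t off = (size_t)b * BLOCK_SIZE;
        uLong in_len = (n - off < BLOCK_SIZE) ? (uLong)(n - off) : BLOCK_SIZE;
        uLongf out_len = compressBound(in_len);
        out[b].data = malloc(out_len);
        if (out[b].data && compress(out[b].data, &out_len, src + off, in_len) == Z_OK)
            out[b].len = out_len;
        else { free(out[b].data); out[b].data = NULL; out[b].len = 0; }
    }
}
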


#7 Enthalpy

Posted 23 September 2012 - 03:04 PM

A much simpler project now. I gratefully use a file navigator called FreeCommander, which can compare two folders in order to synchronize their contents (and does much more). It tells you: these files are identical, those differ, these are missing - and then: which ones do you want to update or copy?

I frequently misuse it on huge folders, like collections of graphics drivers, and then it gets slow and the CPU is the bottleneck, not the SSD. Easy to improve with a GPU.

Even better: I often rename folders or reorganize them, which makes FreeCommander's answer of little use. I wish it answered "this file was just renamed" or "this subfolder was moved there". It would need to compute hashes of the files so that many of them can be compared quickly later, but FreeCommander is slow at computing hashes. If you define your hash algorithm properly, it can be parallel even within one file, and there are several files anyway, so a GPU can improve that. A small sketch of the rename detection follows the notes below.

Note: comparing just the very beginning of the files, even as a first step, won't suffice, since many of them begin with "I'm a Jpeg picture".
Note: the hash function must give a practically unique value but needs no cryptographic strength.
Note: it should run on any nVidia or Amd graphics card, not an exotic board - but you might restrict it to dX10 hardware and above.
Note: you can compute an independent hash value for each 512B or 4kiB block and let the CPU compare the hashes. More flexible for the CPU, which can cut unnecessary work, and more parallel for the GPU.
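
A rough sketch of that rename detection (entirely my illustration; the structure names are made up and nothing here is FreeCommander's actual code): once a whole-file content hash exists for every file on both sides, a hash that matches under a different path is the "just renamed / moved" case the comparison should report, instead of one missing file plus one new file.

#include <stdio.h>
#include <string.h>

/* Hypothetical entry: a whole-file content hash plus the file's path. */
typedef struct { float hash[4]; const char *path; } FileEntry;

/* Report files whose content matches across the two folders but whose path changed. */
void report_renames(const FileEntry *old_list, size_t n_old,
                    const FileEntry *new_list, size_t n_new)
{
    for (size_t i = 0; i < n_old; i++)
        for (size_t j = 0; j < n_new; j++)
            if (memcmp(old_list[i].hash, new_list[j].hash, sizeof old_list[i].hash) == 0
                && strcmp(old_list[i].path, new_list[j].path) != 0)
                printf("probably renamed: %s -> %s\n", old_list[i].path, new_list[j].path);
}

In a real tool one would sort or index the lists by hash instead of this quadratic scan, and confirm a candidate match by comparing the file contents.
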

Many similar navigators and synchronizers exist; FreeCommander is just the one I use. Its author is a nice guy and speaks many languages, so if you develop the file comparison or hashing on the GPU and get in touch with him, he might integrate it into FreeCommander.

Marc Schaefer, aka Enthalpy

Edited by Enthalpy, 23 September 2012 - 03:11 PM.


#8 Enthalpy

Posted 23 September 2012 - 09:41 PM

Alas! A hash value for such a purpose is too easy to compute, and already the CPU is faster than any disk. Like this:

#include <stdint.h>
/* Hash of one 4096B cluster: four decaying float32 accumulators over its 16-bit slices. */
void hash_cluster(const uint16_t *a, int n_slices, float s[4])
{
    float s1 = 0, s2 = 0, s3 = 0, s4 = 0;
    for (int i = 0; i < n_slices; i++) {   /* successive 16-bit slices in the cluster */
        float f = (float)a[i];
        s1 = f + 0.9990f*s1;
        s2 = f + 0.9991f*s2;
        s3 = f + 0.9992f*s3;
        s4 = f + 0.9994f*s4;
    }
    s[0] = s1; s[1] = s2; s[2] = s3; s[3] = s4;
}

The four float32 s1, s2, s3, s4 make the hash value of one 4096B cluster. Good enough to compare files.

A GPU with 128 MACs at 1500MHz would hash files at 96GB/s (192 G multiply-accumulates per second, 4 of them per 2-byte slice), but the video RAM limits it to some 50GB/s and a PCI-E 2.0 x16 link to 8GB/s...
One SSE unit crunches a slice per clock, so a 3GHz quad-core hashes 24GB/s, but the RAM limits it...
And a fast SSD limits it to 0.5GB/s, blistering barnacles!

So while I'm still interested in the added functions for file explorers like FreeCommander, the task is too easy for a GPU. Obviously, any O(n) computation will stumble on data throughput.

Back to the error-correcting codes then...
Marc Schaefer, aka Enthalpy

Edited by Enthalpy, 23 September 2012 - 09:46 PM.


#9 Enthalpy

Posted 26 September 2012 - 11:46 AM

I suggested four 32-bit floats to compute hash values quickly on any GPU, but with SSE, MMX or a wide FPU, two 64-bit floats are as good or better. In that case, convert 32-bit slices of the files to be hashed.
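
A minimal sketch of that variant (the exact constants are my assumption, chosen in the same spirit as the 0.999x values above; any distinct values just below 1 would do):

#include <stdint.h>

/* Same idea with two float64 accumulators over 32-bit slices of the cluster. */
void hash_cluster64(const uint32_t *a, int n_slices, double s[2])
{
    s[0] = s[1] = 0.0;
    for (int i = 0; i < n_slices; i++) {
        double f = (double)a[i];
        s[0] = f + 0.99990 * s[0];
        s[1] = f + 0.99991 * s[1];
    }
}
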

#10 karthika

Posted 1 February 2013 - 08:03 AM

Is comparing two algorithms using CUDA a better approach for a research project? Then how can I implement this? Should I implement all the algorithms?

Can I proceed with BZIP on CUDA? If so, please give me the details of how to do this as a thesis project. Buying a GPU is feasible.

Please give the procedure and the approximate cost of the processor.

Has the LZW compression algorithm already been implemented on CUDA? Is it feasible? Is there any improvement over the CPU?





