Jump to content

Compression slow but good


fredreload

Recommended Posts

And here it is, the long waited algorithm thanks to Staphen that provided the algorithm in Crunchyroll. You are free to test it out on bigger files, currently it only runs for Blank.gif without going too slow. You are free to criticize only if you found a way to make this faster. Starting with iteratedbitcount. Thanks in advance!

 

using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using System.Threading.Tasks;

namespace HelloWorld
{
class Hello
{
public static byte[] bytes = System.IO.File.ReadAllBytes(@"E:\data\Blank.gif");

public static byte[] bytes_new;

static void Main()
{
BigInteger number_compress = new BigInteger(bytes);

Console.WriteLine(number_compress);
Console.WriteLine();

int[] one = new int[50];
int[] zero = new int[50];
int[] position = new int[50];

for (int i = 0; i < 50; i++)
{
//Console.WriteLine(i);

one = IteratedBitcount(number_compress);
zero = IteratedZerocount(number_compress);
position = one + zero;

BigInteger fac = Factorial(one);
BigInteger n = FindN(number_compress, one, position);

number_compress = n;


}

Console.WriteLine(number_compress);
Console.WriteLine();

BigInteger number_decompress = 0;

for (int i = 50-1; i >= 0; i--)
{
//Console.WriteLine(i);

number_decompress = FindNumber(number_compress, one, position);

number_compress = number_decompress;
}

Console.WriteLine(number_decompress);
Console.WriteLine();

bytes_new = number_decompress.ToByteArray();

File.WriteAllBytes(@"E:\data\New_Blank.gif", bytes_new);

}

static BigInteger FindNumber(BigInteger n, int one, int position)
{
BigInteger number=0;
int k = one;

for (int j = position-1; j >=0 ; j--)
{
BigInteger mult = j;
BigInteger num = 0;

for (BigInteger i = 1; i < (BigInteger)k; i++)
{
mult *= (j - i);
}


num = (mult / Factorial(k));

if (num <= n)
{
number = number + BigInteger.Pow(2, j);
k--;
n = n - num;
}
if (k == 0)
break;
}
return number;
}

static BigInteger FindN(BigInteger number, int one, int position)
{
BigInteger findMSB = number;
BigInteger test = number;
var numBits = (int)Math.Ceiling(BigInteger.Log(2));
BigInteger MSB=1;
while (findMSB > 0)
{
MSB <<= 1;
findMSB >>= 1;
}
MSB >>= 1;

BigInteger n = 0;
int k = one;
int pos = position;

while (MSB != 0)
{
if ((test & MSB) > 0)
{
int a = pos-1;
BigInteger mult = a;

for (BigInteger i = 1; i < (BigInteger)k; i++)
{
mult *= (a - i);
}


n += (mult/Factorial(k));
k--;

}
MSB >>= 1;
pos--;

}
return n;
}

static int IteratedBitcount(BigInteger n)
{
BigInteger test = n;
int count = 0;

while (test != 0)
{
if ((test & 1) == 1)
{
count++;
}
test >>= 1;
}
return count;
}

static int IteratedZerocount(BigInteger n)
{
BigInteger test = n;
int count = 0;

while (test != 0)
{
if ((test & 1)==0)
{
count++;
}
test >>= 1;
}
return count;
}

static BigInteger Factorial(int arg)
{
BigInteger value = 1;
for (int index = 2; index <= arg; index++)
{
//Console.WriteLine(index);
value *= index;
}
return value;
}


}
}

Link to comment
Share on other sites

You are free to test it out on bigger files, currently it only runs for Blank.gif without going too slow.

 

 

What language is this written in?

 

How large is "blank.gif"?

 

Have you checked that the output of compressing and then decompressing the file is identical to the input?

 

Have you tested it with other files of a similar size (e.g. a sequence of random bytes)?

 

 

 

You are free to criticize only if you found a way to make this faster.

 

Not if we find it doesn't work? :)

Link to comment
Share on other sites

Again, and again, the same..

You have no idea whether it truly works, as you have no decompression function...

Writing off data to file, disallows you checking whether every single byte is equal to original (after compressing, then decompressing data).

 

Result from this line of code is nowhere used, as fac is nowhere else referenced:

BigInteger fac = Factorial(one);

 

What language is this written in?

C# .NET Framework Edited by Sensei
Link to comment
Share on other sites

 

 

What language is this written in?

 

How large is "blank.gif"?

 

Have you checked that the output of compressing and then decompressing the file is identical to the input?

 

Have you tested it with other files of a similar size (e.g. a sequence of random bytes)?

 

 

Not if we find it doesn't work? :)

Yes the output is identical to the input Strange :D. Gonna try optimizing this weekend, let me know if you can help lol

Again, and again, the same..

You have no idea whether it truly works, as you have no decompression function...

Writing off data to file, disallows you checking whether every single byte is equal to original (after compressing, then decompressing data).

 

Result from this line of code is nowhere used, as fac is nowhere else referenced:

BigInteger fac = Factorial(one);

 

C# .NET Framework

Cool, I'll take it out, thanks!

Link to comment
Share on other sites

This step is taking the most amount of my time. My best bet is a multiplication table stored in database, anyone else got any idea? The for loop here is taking too long, no for loop

for (int i = 1; i < k; i++)
{
    mult *= (a - (BigInteger)i);
}
Link to comment
Share on other sites

As far as I can see you are calculating mult = a*(a-1)*...*(a-k+1) in that loop. Special cases like k>a+1 and k<2 aside (consider if they can appear in your code), this is the same as mult=a!/k!. You already have a factorial function in your code: Use it, then optimize it with a lookup-table if needed. Note: You are dividing by another k! later on. You only need to calculate it once, of course.

Link to comment
Share on other sites

I would still like to see some evidence that the algorithm works, and how well it works. What is the size of the inout file? What is the size of the compressed file? How does that change for (a) random data; (b) ordered data; (c ) all zeroes; etc. Is the result of uncompressing the data identical in all cases?

Link to comment
Share on other sites

Let's see, I got a question first, I converted bytes to BigInteger like this

BigInteger number_compress = new BigInteger(bytes.Concat(new byte[] { 0 }).ToArray());

Now the byte array is [378] byte array, as you know byte array is 8 bits each, so that's a total of 3024 bits. The thing is after conversion to number_compress, this BigInteger has 911 base 10 digits, so I'm not sure if BigInteger actually made the value bigger since I can't convert BigInteger to bits in c#. Below is the number.

10511498515079005786001923156952989929760765099887642049490645075407147670074787123453081119964496075798572683908765969937737368112628851278719151548358246532408860460103033076417881911156352613863717173123230737730516444716700786319847260734876596336573683014182274170198577808390323212579797450822781696155303651128091350419064337358263354652969443506081373456328170981909504950042936463577106765595222710454155651057937886003720893239530662179029448821080213785811445808269660735189910921681751787499664128734166678222596180132877088540155530232472178185615057339811168768357276447018187314658464330483545764600882883706505887000156889348127462631364561412835589414227905626001556124920721381180916353004195357239703395912173839712380569892467769276028873585749529029704900277393477308046170568306899216653561856937648186383356499772886802103617644079045242679876437789706198682378675257965584060895470702729
Link to comment
Share on other sites

I used a StringBuilder to append the bits, the compression rate is much better. But I still need to deal with the time = =


As far as I can see you are calculating mult = a*(a-1)*...*(a-k+1) in that loop. Special cases like k>a+1 and k<2 aside (consider if they can appear in your code), this is the same as mult=a!/k!. You already have a factorial function in your code: Use it, then optimize it with a lookup-table if needed. Note: You are dividing by another k! later on. You only need to calculate it once, of course.

Hmm, the thing is it is (a - i), and it could be something like mult = 15*14*13/5!, I'm interested in using a look up table, possibly a database? But I am a newbie at that, how do you think I should start? Other than the look up table, is there other possible methods?


Btw I'm applying the recursive "a" computation I asked here.


As an example of the multiplication

 

BigInteger mult = 274341;

 

for (int j = 1; j < 157594; j++)
{
mult *= (BigInteger)(274341 - j);
}

 

P.S. Right I see it, mult=a!/k!, looks like I need a look up table

Edited by fredreload
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.