08-09-2004, 18:37   #43
ThomasTuttle
2004 Beantown Blitz Scorekeeper 1
#0125 (NU-TRONS)
Team Role: Student
 
Join Date: Jan 2003
Location: Boston, MA
Posts: 19
Re: New compression method

Quote:
Originally Posted by Aalfabob
See, most of these claims of great compression ratios rest on the assertion that you will always gain compression down to their so-called limit. I have my theorem set up so that it has an advantage of gaining compression. That doesn't mean it can't gain size either; it just means that in the long run the file will get smaller. And since my theorem includes using headers to describe what has happened to each piece of data, it has to have a limit, because the headers take up space and so does the data they describe.

The headers are 2 bytes each and just carry predefined info that the program already knows, so that part is 100% reversible. Next, the compressed bytes are either 7-bit, 8-bit, or 9-bit: a 7-bit code starts with 1 followed by 6 bits, an 8-bit code starts with 00 followed by 6 bits, and a 9-bit code starts with 01 followed by 7 bits. As you can see, this covers all 256 values of the ASCII set and is also easily reversible. That is a pretty big part of how my compression decompresses. That's really all I think I can show right now.

edit - I have the whole process written out, and I have written it the way a computer would actually read the files and recompress them; this isn't some theorem I just wrote down on paper. I went step by step through the process and made sure it was 100% compressible and 100% reversible.

The more I read that document on how this is impossible, the more I find out about people who claimed theirs worked when all they had was a mathematical problem they thought they could solve before even thinking about programming it or how a computer could use it.
So, here's how it works, if I understand correctly:

First, you have two constant header bytes, like the "BZ" magic bytes at the start of a bzip2 file (gzip likewise starts with a fixed two-byte magic number).

Then you have a bunch of 7-, 8-, or 9-bit codes. It works like this, I assume:
ASCII 00###### -> Compressed 1###### (8 bits to 7 bits, 25% of the time)
ASCII 01###### -> Compressed 00###### (8 bits to 8 bits, 25% of the time)
ASCII 1####### -> Compressed 01####### (8 bits to 9 bits, 50% of the time)
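
Just to make the bookkeeping concrete, here's a quick Python sketch of that mapping. This is my own reconstruction from your description, not your actual code, and the names (encode_byte, decode_bits) are made up:

# My reconstruction (an assumption) of the described 7/8/9-bit code:
#   bytes 0x00-0x3F (00######) -> '1'  + low 6 bits  (7-bit code)
#   bytes 0x40-0x7F (01######) -> '00' + low 6 bits  (8-bit code)
#   bytes 0x80-0xFF (1#######) -> '01' + low 7 bits  (9-bit code)

def encode_byte(b: int) -> str:
    """Return the code for one input byte as a string of '0'/'1' characters."""
    if b < 0x40:                        # 00###### -> 7-bit code
        return '1' + format(b, '06b')
    elif b < 0x80:                      # 01###### -> 8-bit code
        return '00' + format(b & 0x3F, '06b')
    else:                               # 1####### -> 9-bit code
        return '01' + format(b & 0x7F, '07b')

def decode_bits(bits: str) -> bytes:
    """Inverse mapping: read codes off the front of the bit string."""
    out, i = [], 0
    while i < len(bits):
        if bits[i] == '1':              # 7-bit code
            out.append(int(bits[i+1:i+7], 2))
            i += 7
        elif bits[i+1] == '0':          # '00' prefix -> 8-bit code
            out.append(0x40 | int(bits[i+2:i+8], 2))
            i += 8
        else:                           # '01' prefix -> 9-bit code
            out.append(0x80 | int(bits[i+2:i+9], 2))
            i += 9
    return bytes(out)

# Round-trip check: the mapping itself is reversible, as you say.
data = bytes(range(256))
assert decode_bits(''.join(encode_byte(b) for b in data)) == data

So reversibility isn't the issue--the issue is how long the output is.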

Given a random input file, each byte value from 0-255 will appear about the same number of times. Thus, the average size of a compressed byte is:

(7 * 25%) + (8 * 25%) + (9 * 50%) = 1.75 + 2 + 4.5 = 8.25 bits per input byte.

Thus, the file expands by 0.25 bits per byte, or 1/32 (about 3%).
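
A quick sanity check of that arithmetic (same 64/64/128 split of byte values as in the sketch above):

# 64 byte values get 7-bit codes, 64 get 8-bit codes, 128 get 9-bit codes.
avg_bits = (64 * 7 + 64 * 8 + 128 * 9) / 256
print(avg_bits)          # 8.25 bits per original 8-bit byte
print(avg_bits / 8 - 1)  # 0.03125 = 1/32 expansion on random input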

So, unless your input files contain mostly bytes that translate to your 7-bit codes, you should be seeing the file *increase* in size by about 1/32 each time, not *decrease* in size.

If your program makes the files smaller when they shouldn't be, chances are it's either cheating, using very compressible input files, or losing data--have you written and tested the decompressor yet? ;-)

If I'm wrong, please correct me--I'd be interested to see how it actually works, if this isn't it.

See ya,

Tom