07-09-2004, 23:03
rbayer
Re: New compression method

Quote:
Originally Posted by Aalfabob
Alright, first off, I'm not trying to pull anything. Why would I create an account and waste my time messing with people? A friend gave me this site so I could ask a few people whether they think it would be possible. Second, my compression theorem isn't like the others: it doesn't depend on how many times certain characters show up in a file as a whole. That makes it able to recompress the same file over and over, almost always gaining compression. It also means it can work on any type of file: .zip, .exe, .jpg, etc. It does reach a limit in file size, though; the program I have made can compress any file type and size down to 508 bytes, usually fluctuating between 508 and 515 bytes. Just because one file is larger than another doesn't mean it can't hit this limit; it just means more attempts must be made to reach it. I have some data charts if anyone wishes to see them.
Ah, now you've made a _provably_ false claim. And here I go with my proof (because I have nothing better to do and am extremely bored right now):

A few definitions first:

Let c be the compression function, so c(f) represents the compressed version of f.
Let d be the decompression function, so d(e) represents the decompressed version of e.
Also, since the compression is lossless, we have d(c(p)) = p for all p.

Lemma 1: In a lossless compression scheme, at most 2^k different files can be represented in k bits.
Proof: Assume, for the sake of contradiction, that 2^k + 1 distinct files can be represented in k bits. Since there are only 2^k possible bit strings of length k, the pigeonhole principle tells us that two distinct files (say, f1 and f2) must compress to the same output e. Formally:
c(f1) = c(f2) = e, and f1 <> f2 (I use <> for not equal)
Applying the function d to both sides:
d(c(f1)) = d(c(f2)) = d(e)
By our cancellation law d(c(p)) = p, this gives:
f1 = f2, which contradicts the assumption that f1 <> f2. Thus, the lemma holds.
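
If you'd rather watch the pigeonhole collision happen than take the algebra on faith, here's a little Python sketch (Python is just my choice here for the demo, and the multiply-and-mod function is an arbitrary stand-in for c, not anyone's actual compressor):

    # Toy illustration of Lemma 1: feed 2^k + 1 distinct "files" into
    # a k-bit output function and watch the inevitable collision appear.
    k = 8
    outputs = {}  # compressed output e -> original file f
    for f in range(2**k + 1):  # 2^k + 1 distinct inputs
        e = (f * 2654435761) % 2**k  # stand-in for any k-bit compressor c
        if e in outputs:
            print(f"c({outputs[e]}) = c({f}) = {e}")
            print("Two files share one output, so d(e) cannot recover both.")
            break
        outputs[e] = f

No matter what function you substitute for the stand-in, the collision must occur by the last iteration: there simply aren't enough k-bit outputs to go around.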

Applying Lemma 1 to each file size in the given range, we conclude that at most
2^(8*508) + 2^(8*509) + 2^(8*510) + 2^(8*511) + 2^(8*512) + 2^(8*513) + 2^(8*514) + 2^(8*515)
= 2^(8*508) (1 + 2^8 + 2^16 + 2^24 + 2^32 + 2^40 + 2^48 + 2^56)
< 2^(8*508) * 2^57 = 2^(8*508 + 57) = 2^4121

distinct files can be compressed to sizes between 508 and 515 bytes.
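
For the skeptical, Python's arbitrary-precision integers can verify that bound exactly (a quick sanity check I'm adding, not part of the original argument):

    # Exact check of the counting bound using Python's big integers
    total = sum(2**(8*n) for n in range(508, 516))  # files of 508..515 bytes
    print(total < 2**4121)  # True: the 2^(8*508) * 2^57 bound holds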

Now, there are 2^(8*1024*1024) = 2^8388608 files of size exactly 1 MB. With a little division, we see that at most one in every

2^8388608 / 2^4121 = 2^(8388608 - 4121) = 2^8384487

of them can be compressed to within that range. AT THE MOST. For reference, there are only about 2^265 atoms in the entire observable universe.
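
The division above is just exponent subtraction, which is easy to confirm exactly:

    # The ratio of 1 MB files to compressible files, as an exact power of two
    files_1mb = 2**(8 * 1024 * 1024)  # 2^8388608 distinct 1 MB files
    bound = 2**4121                   # generous count of compressible files
    print(files_1mb // bound == 2**(8388608 - 4121))  # True
    print(8388608 - 4121)                             # 8384487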



On the other hand, it is perfectly possible that your compression is amazing for certain types of files. It just isn't mathematically possible for it to be _that_ amazing for an arbitrary file.
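
To see that asymmetry concretely, here's a quick experiment using zlib (my choice of compressor for illustration; the claim holds for any lossless scheme): repetitive data shrinks dramatically, while random data actually grows.

    import os, zlib

    repetitive = b"FIRST robotics " * 1000     # low-entropy, highly compressible
    random_data = os.urandom(len(repetitive))  # high-entropy, same size

    print(len(repetitive), len(zlib.compress(repetitive)))    # 15000 -> tiny
    print(len(random_data), len(zlib.compress(random_data)))  # 15000 -> slightly larger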


EDIT:
Before someone starts talking about how the lemma breaks down because it doesn't take into account the fact that you're running the algorithm multiple times, consider the following:
Let c be the one-pass compressor, and d be the one-pass decompressor. Then we can write (in pseudocode):
function cmult(f) {
    while (size(c(f)) < size(f)) {
        f = c(f)
    }
    return f
}

Thus, cmult will continually compress using whatever algorithm you choose until the output stops getting smaller. As long as the number of passes is recorded somewhere (so that decompression knows how many times to apply d), cmult is still a lossless compressor, and you can just replace c with cmult in the proof of the lemma.
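
For completeness, here's cmult as runnable Python, with zlib standing in for the one-pass compressor c (my substitution; any lossless compressor behaves the same way):

    import os, zlib

    def cmult(data):
        # Keep compressing while each pass actually shrinks the input.
        while len(compressed := zlib.compress(data)) < len(data):
            data = compressed
        return data

    blob = os.urandom(1024)  # random data: essentially incompressible
    print(len(cmult(blob)))  # stays at 1024; no pass ever shrinks it

However clever c is, cmult bottoms out at the first pass that fails to shrink its input, which is exactly why repeated application can't beat the counting argument.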
__________________
New C-based RoboEmu2 (code simulator) available at: http://www.robbayer.com/software.php

Last edited by rbayer : 07-09-2004 at 23:37. Reason: repeated function application note