
View Full Version : New compression method


Aalfabob
06-09-2004, 23:04
I came up with a new way to compress files a couple of days ago; I've written the program and tested it. The only problem is that most people will not believe that what I have done is possible, and I can't actually send them how it works because I haven't patented it yet. But here's my question: if you are able to compress the same file over and over again, would it be possible to reach very small sizes? I've tried a 1 MB file so far and it reached 515 bytes, and it is fully reversible.

Max Lobovsky
06-09-2004, 23:20
It's possible that recompressing a file can make it smaller, depending on the compression scheme used, but I think most modern compression schemes compress as much as they can on their first pass (sometimes because their first pass really includes several passes at compressing the data).

Anyway, compressing a file from 1 MB to 515 bytes doesn't really say anything about your compression scheme. If you give me a file of any size, I can very simply write a compression scheme to compress it to 0 bytes. If you can take arbitrary files and consistently compress them to a small size, then you have a compression scheme of merit.
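To make the first point concrete, here is a quick Python sketch using the standard zlib module (illustrative only, no claim about any particular archiver): the first pass shrinks a repetitive file dramatically, and a second pass over the already-compressed output buys you nothing.

import zlib

# Highly repetitive input, roughly 1.1 MB
data = b"the quick brown fox jumps over the lazy dog " * 25000

once = zlib.compress(data, 9)    # first pass: large gain
twice = zlib.compress(once, 9)   # second pass: input is already near-random

print(len(data), len(once), len(twice))
# Typically the second pass comes out slightly *larger* than the first,
# because there is no pattern left to exploit and the format adds overhead.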

ahecht
07-09-2004, 00:34
I came up with a new way to compress files a couple of days ago; I've written the program and tested it. The only problem is that most people will not believe that what I have done is possible, and I can't actually send them how it works because I haven't patented it yet. But here's my question: if you are able to compress the same file over and over again, would it be possible to reach very small sizes? I've tried a 1 MB file so far and it reached 515 bytes, and it is fully reversible.
I don't know. When a brand new member with no team number comes in here and, in their very first post, posts the equivalent of digital snake oil, I become a bit suspicious. I'm not sure what this guy is trying to pull (such as whether he is an existing member who set up a dummy account for this or just a random troll), but I can assure people that something is up.

Besides, with most compression methods, compressing an already compressed file results in a slightly larger file size.

Ryan M.
07-09-2004, 07:08
I'm not sure what this guy is trying to pull (such as whether he is an existing member who set up a dummy account for this or just a random troll), but I can assure people that something is up.
I agree that it is a little unbelievable. Aalfabob, I'd trust almost anyone here with something that isn't protected yet. Just ask anyone with 5+ "rep(utation) points" to look over your algorithm. (Those are the green dots on the left of the darker-grey bar above their post.) I'm not saying that less rep'd people aren't trustworthy, but it's something to reassure you.
Besides, with most compression methods, compressing an already compressed file results in a slightly larger file size.
That's because almost all compression schemes work by creating a library at some point in the file and putting multi-byte strings into it. The rest of the file is then encoded by putting a one-byte key in place of the original string. If a string appears more than once, you save space. For instance, if a text file contains the string "happy" three times, you put "happy" in your library once and three one-byte markers in the text, for a total of around 8 bytes (there are probably also bits separating different items in the library, etc., which is why I say "around"). The original three happies took up 15 bytes.

When you recompress the file, it ends up compressing a file with no, or very few, redundancies, which are what make the library method work so well.

EDIT:
Why the heck did I choose happy? Why not something cool, like FIRST? :D
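To see the library idea in code, here is a toy Python sketch (purely illustrative; real compressors like zip work on repeated byte strings, not whole words):

# Toy "library" compression: replace repeated words with small integer markers.
def toy_compress(text):
    library = []     # distinct words, in order of first appearance
    markers = []     # one marker per word in the original text
    for word in text.split(" "):
        if word not in library:
            library.append(word)
        markers.append(library.index(word))
    return library, markers

def toy_decompress(library, markers):
    return " ".join(library[m] for m in markers)

text = "happy happy happy birthday"
lib, marks = toy_compress(text)
assert toy_decompress(lib, marks) == text
print(lib, marks)   # ['happy', 'birthday'] [0, 0, 0, 1]

Recompressing the output of a scheme like this gains nothing, because the library plus markers no longer contain the repeats that made the first pass pay off.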

Aalfabob
07-09-2004, 15:37
Alright, first off, I'm not trying to pull anything. Why would I create an account and waste my time messing around with people? A friend gave me this site to ask a few people if they think it would be possible. Second, my compression theorem isn't like the others; it doesn't run on how many times certain characters show up in the file as a whole. This makes it able to recompress the same file over and over again, almost always gaining compression. This also means that it can work on any type of file: .zip, .exe, .jpg, etc. But it does reach a limit in the file sizes it can reach; with the current program I have made, it can compress any file type and size down to 508 bytes, and it usually fluctuates between 508 and 515 bytes. Just because a file is larger than another doesn't mean it cannot hit this limit; it just means that more attempts must be made to reach it. I have some data charts if anyone wishes to see them.

Jeff_Rice
07-09-2004, 15:45
Is it lossy or lossless?

Aalfabob
07-09-2004, 15:48
Lossless. What would be the point of compressing a data file if it came out corrupted when decompressed?

I am going to see if any large businesses are interested in this, and if not, I will make it open source; this is one of the reasons why I am trusting no one. Even if someone is trustworthy, there is still always that very small chance of it getting out.

Jeff_Rice
07-09-2004, 16:06
Well, why did you post anything at all if you are trusting no one? Without your algorithm it is very difficult to help you.

Compression is like factoring. In factoring, you take a complex equation and can define it simply by its solutions. In compression you do a similar thing. However, you will eventually run into a floor no matter how good a compression system you use. This is due to the fact that the information is still there, just in a compressed format. I am guessing your algorithm has some sort of system for recording what it has done so that it can be undone. That record requires space. The more times you compress, the closer the file is to becoming "prime" toward the algorithm. Eventually you reach a point where the information needed to expand the file makes the whole set large enough that compression will not make it any smaller.

So basically what Ryan Morehart said, but in more general terms.

Aalfabob
07-09-2004, 16:36
I understand the problem of you not being able to help me because I'm not releasing how it works. Is there any way I could make it open source for a little while, until I am able to open it up for commercial use? I want to be able to keep the theorem if I ever decide to make money with it, and I want to make sure no one can steal it. Would making it open source save it from being stolen? I've looked into patents, but there is no way I can afford the $4000 to get one and the $1500 to keep it updated every couple of years. If anyone has a link or something to help me out here, please post it.

I'll be happy to post all the information needed about it as soon as it's safe. And I do understand that this seems impossible, but trust me, it's not :) .

Ryan M.
07-09-2004, 16:44
Well, here (http://www.opensource.org/licenses/)'s a site which lists the most commonly used open source licenses. Read through them and see what you like. Make sure you choose one which prevents the commercial reuse of the source code.

Edit:
Hm, actually, according to them, "open source" licenses do not prevent commercial use. Whatever... :rolleyes:

seanwitte
07-09-2004, 16:48
Go to www.maximumcompression.com and run your utility against their test files. You'll be able to compare your results against a fairly large set of benchmarks. Post your results. If you really beat those benchmarks then you'll need to have a few volunteers verify your results. For that you can distribute a binary without source and a non-disclosure.

Aalfabob
07-09-2004, 16:56
Alright, let me rebuild my program (7 days max), because I have a couple of new ideas I would like to try out with it. I will post the scores by then or earlier. Hopefully I will be able to get it done a lot sooner, but it depends on how much work I have.

Joe Ross
07-09-2004, 18:42
Go to www.maximumcompression.com and run your utility against their test files. You'll be able to compare your results against a fairly large set of benchmarks. Post your results. If you really beat those benchmarks then you'll need to have a few volunteers verify your results. For that you can distribute a binary without source and a non-disclosure.

I'm surprised that site doesn't have a file full of pseudo-random data. While very complete in testing different programs, the files it chooses seem rather arbitrary.

Max Lobovsky
07-09-2004, 19:08
I'm surprised that site doesn't have a file full of pseudo-random data. While very complete in testing different programs, the files it chooses seem rather arbitrary.
I'm not sure what you mean by "pseudo", but it's mathematically impossible for a compressor to consistently compress random data. I don't know much information theory at all, but I know that random data is essentially pure information, and there is no way to encode pure information into a smaller amount of information. (This is incidentally why already compressed files (including lossy media compression for images, sound, etc.) don't compress well. They have already converted the data to near pure information.)

The site's choice of files seems pretty logical to me. It has a selection of files that are commonly compressed. It might be interesting if they tried recompressing some very common compressed file(s), maybe like the Fedora Linux distribution, or Microsoft Office 2k3.

Aalfabob
07-09-2004, 19:28
I'm not sure what you mean by "pseudo", but it's mathematically impossible for a compressor to consistently compress random data. I don't know much information theory at all, but I know that random data is essentially pure information, and there is no way to encode pure information into a smaller amount of information. (This is incidentally why already compressed files (including lossy media compression for images, sound, etc.) don't compress well. They have already converted the data to near pure information.)

We'll see :) ... sometimes the impossible can be done.

FizMan
07-09-2004, 20:02
Turns out this entire time Aalfabob has been compressing files with the same sequence of information repeated over and over again ;)

j/k So are you consistently compressing files to around 508-515 bytes despite major differences in their uncompressed sizes? A 5 MB file would compress to 514 bytes, and a 500 KB file would also compress to, say, 510 bytes? I find that very fascinating...

Aalfabob
07-09-2004, 20:28

Yep. The only thing that changes with file size is how many times it needs to be run through the compressor to get the same results as, say, a file half its size. To keep track of this, 2 bytes are placed at the start of the file as the main header to record how many times it has been compressed (up to 65535 times).

The last time I ran it on a 1 MB file, it took around 818 runs to get down to 515 bytes, but most of those runs were spent between 2500 bytes and 515 bytes, gaining only 1-10 bytes per try. I made some graphs from the logs put out by the program. If I can get my friend to give me some space again on his server, I can post them.

Right now I'm reprogramming the compression portion of the program, because the last one was pretty slow due to the way I was reading from the file. The new one pretty much just has to run a couple of if-then statements and change a byte. Hopefully it will beat today's common compressors in speed as well, but I'll have to see how well I can get it programmed. This new approach should also be able to chop off around 2%-5% of the file every pass.

edit - I'll just put the graphs up here to give you an idea of what I'm talking about. Also, the second chart actually loses some compression sometimes, but the next pass usually gains it back.

Max Lobovsky
07-09-2004, 21:07
In case any of you are still in doubt about this "new compression scheme", I encourage you to read this discussion of exactly this matter (http://www.faqs.org/faqs/compression-faq/part1/section-8.html).

A quote from this document:
It is mathematically impossible to create a program compressing without loss *all* files by at least one bit (see below and also item 73 in part 2 of this FAQ). Yet from time to time some people claim to have invented a new algorithm for doing so. Such algorithms are claimed to compress random data and to be applicable recursively, that is, applying the compressor to the compressed output of the previous run, possibly multiple times. Fantastic compression ratios of over 100:1 on random data are claimed to be actually obtained.

Aalfabob
07-09-2004, 21:53

Wow, I had no clue anyone else was claiming they could do this. I've read the paper, and from what I understand, my method does not fit his argument for the most part. It uses no fancy math, no special numbers, which I'm guessing most of these people try to work with because they're *special*, with no real proof or reason to believe they would actually do something. But I have to admit, a lot of the flawed approaches he talks about I had thought of at one time, and I turned them down because it was easy to notice they were flawed. I check my ideas from start to finish before I go around claiming that I invented something that works.

Another argument he made was that some of these methods used groups of bits to show compression. Most of those methods are majorly flawed because there was no proof they could ever be reversed. But I do know for a fact that a computer can tell whether a group of bits starts with 1, 00, or 01, which makes them easily separated. There is also another method I made that is a way to separate the bytes, but I'll explain that in my post when my program is finished (it was thrown away because I found a better, faster way).

If this is truly just about whether I am lying about this, just give me 7 days as I posted earlier, and the new program will be finished. But just because some guy writes a paper on whether this is possible does not mean he knows 100% what he's talking about. I am positive that he has not tested every single way you can compress a file, which makes his argument that there is no possible way invalid. Every once in a while something *impossible* happens; I'm sure you can think of hundreds of examples, but please just give me the time to prove this. I think the short wait of only 7 days isn't going to kill you. And if you still really think it's so impossible that there is no chance at all, you can leave this thread and forget all about it.

edit - Btw, when I finish the program I will post a link to a video of the program running, and if that's not enough, because some people are going to say "Ahh, that can be easily faked," I'll post some other way that it can be proven. Unless I somehow get the money for a temporary patent; then I'll just put the source and the method up for everyone.

Rickertsen2
07-09-2004, 22:43

You have piqued my interest, but I won't believe anything till I see it. Honestly, just think about what you are claiming and give us a reason why we should believe you and not just think you're a nut. We have been given lots of empty promises but no hard evidence. One question... why did you choose these forums to post your *discovery*? Sure, we are all nerds here and many of us are interested in this sort of thing, but I am sure there are forums dedicated to compression. Why ChiefDelphi? I truly would like to believe that your claims are real, but until I see proof, you are a nut in my book. This reminds me of the human cloning people a while back who suddenly vanished into the ethersphere when asked to prove what they had done. I have seen too many things like this that all turn out to be nothing. If I were claiming what you are claiming, you would think I was nuts too, so you can't really blame us. I'll admit that many things that were once dismissed as utterly impossible are now integral parts of our everyday lives. Maybe this is one of them, but somehow I am doubtful. I sincerely hope you prove me wrong.

I'm not really sure what I'm talking about and I may be completely wrong, but could this possibly fall under a copyright, which is a much easier process? I think there is even some sort of implied copyright on everything that you don't actually have to file for, but it won't hold up in court as well as an official one. I looked into this at one point a long time ago, but I don't really remember much.

Aalfabob
07-09-2004, 22:56
I understand exactly what you are saying, and I would feel the same way. The reason I came to this site is that I was having trouble finding an actual forum on file compression. So I talked to one of my friends, and he said this would be a good site, so I made a post.

The only way to give everyone hard proof of this would be to give someone the decompressor, and have them provide me with the file that needs compressing. This is the only fair way I can think of that would prove that it works while keeping how it works a secret until I can actually make the idea *safe*. Stay patient and I will have a fast and working program ready for this.

rbayer
07-09-2004, 23:03
Alright, first off, I'm not trying to pull anything. Why would I create an account and waste my time messing around with people? A friend gave me this site to ask a few people if they think it would be possible. Second, my compression theorem isn't like the others; it doesn't run on how many times certain characters show up in the file as a whole. This makes it able to recompress the same file over and over again, almost always gaining compression. This also means that it can work on any type of file: .zip, .exe, .jpg, etc. But it does reach a limit in the file sizes it can reach; with the current program I have made, it can compress any file type and size down to 508 bytes, and it usually fluctuates between 508 and 515 bytes. Just because a file is larger than another doesn't mean it cannot hit this limit; it just means that more attempts must be made to reach it. I have some data charts if anyone wishes to see them.
Ah, now you've made a _provably_ false claim. And here I go with my proof (because I have nothing better to do and am extremely bored right now):

A few definitions first:

Let c be the compression function, so c(f) represents the compressed version of f.
Let d be the decompression function, so d(e) represents the decompressed version of e.
Also, since the compression is lossless, we have d(c(p)) = p for all p.

Lemma 1: In a lossless compression scheme, at most 2^k different files can be represented by k bits.
Proof: Assume for the sake of contradiction that 2^k + 1 distinct files can be represented in k bits. Since there are only 2^k different possible bit strings of length k, by the pigeonhole principle we can conclude that two distinct files (say, f1 and f2) compress to the same output e. Formally, we have:
c(f1) = c(f2) = e, and f1 <> f2 (I use <> for not equal)
applying the function d to both sides, we get:
d(c(f1)) = d(c(f2)) = d(e) by our cancellation law, we get:
f1 = f2, but this contradicts the assumption that f1 <> f2. Thus, the lemma holds.

Applying Lemma 1 to the range of sizes given, we trivially conclude that at most
2^(8*508) + 2^(8*509) + 2^(8*510) + 2^(8*511) + 2^(8*512) + 2^(8*513) + 2^(8*514) + 2^(8*515)
= 2^(8*508) * (1 + 2^8 + 2^16 + 2^24 + 2^32 + 2^40 + 2^48 + 2^56)
< 2^(8*508) * 2^57 = 2^4121

distinct files can be compressed to sizes between 508 and 515 bytes.

Now, there are 2^(8*1024*1024) files of size 1 MB. With a little division, we see that at most one in every
2^8388608 / 2^4121 = 2^8384487

such files can be compressed to be within that range. AT THE MOST. For reference, there are approximately 2^265 atoms in the entire universe.



On the other hand, it is perfectly possible that your compression is amazing for certain types of files. It just isn't mathematically possible to be _that_ amazing for an arbitrary file.


EDIT:
Before someone starts talking about how the lemma breaks down because it doesn't take into account the fact that you're running the algorithm multiple times, consider the following:
Let c be the one-pass compressor, and d be the one-pass decompressor. Then we can write (in pseudocode):
function cmult(f){
    prev_size = infinity
    while(size(f) < prev_size)
        prev_size = size(f)
        f = c(f)
    endwhile
    return f
}

Thus, cmult will continually compress using whatever algorithm you choose until the output stops getting smaller. Then just replace c with cmult in the proof of the lemma.
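If anyone wants to sanity-check that arithmetic rather than take it on faith, Python's arbitrary-precision integers make it a quick sketch (pure counting, nothing specific to any compressor):

# Count every possible 508..515-byte output versus every possible 1 MB input.
outputs = sum(2**(8 * n) for n in range(508, 516))
inputs_1mb = 2**(8 * 1024 * 1024)

print(outputs.bit_length())                  # 4121, so outputs < 2^4121
print(inputs_1mb.bit_length())               # 8388609, i.e. inputs = 2^8388608
print((inputs_1mb // outputs).bit_length())  # about 8384488: roughly 2^8384487 inputs per possible output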

FizMan
07-09-2004, 23:09
lolk d00d, I'd be more than happy to test a binary executable or something where I couldn't haxx0r the source code from... heck... the limit of my haxxing skills are doing a whois search on someone's IP ;)

Out of curiosity, you mentioned that the greater the file size, the more iterations it has to go through to reach a file size around 500+ bytes, and the time it takes will grow exponentially. So what's the difference in times between a 10 MB file and, say, a 10 GB file? You said it could do any file type/size... if that's the case, it could totally revolutionize data storage if you can store gigabytes of information as, say, 10 MB instead (or heck, if you've got the raw power, 500+ bytes).

Though to be honest, I'd really like to see this in action myself because, and no offense to you (I'm sure you'd be just as skeptical if someone else said it), it seems quite the feat to compress 1000000000 bytes into 515 bytes.

And do you know why the limit is 500+ bytes?

Can it currently compress directory information/batch files? I suppose that's a trivial matter to take care of, but I don't know what the foundation of your code is. Or would it end up turning a directory of 500 files into 250,000 bytes? (500 files x 500 bytes)

And what happens if you try to compress a file that's already LESS than 500 bytes? roffle, utterly trivial too... but like I said, I'm curious.


EDIT:

Don't listen to these non-believers!!!!!!!!11111 I HAVE FAITH IN YOU!!!!11 (actually... just taking the two minutes to show interest and support in the small possibility that I could end up getting a few bucks out of it when this program makes millions and billions of dollars...)

For the benefit of the doubt... who knows? Maybe there is a way around it all...

>.>
<.<
(remember me when you make your billions)

Ryan M.
08-09-2004, 06:14
I'm not really sure what I'm talking about and I may be completely wrong, but could this possibly fall under a copyright, which is a much easier process? I think there is even some sort of implied copyright on everything that you don't actually have to file for, but it won't hold up in court as well as an official one. I looked into this at one point a long time ago, but I don't really remember much.
He could copyright any source code released (an open source license is essentially a detailed copyright) or any document detailing the algorithm, but I don't think it would actually protect the algorithm itself. I may be wrong about that.

Putting a copyright notice on is easy, at least. Just stick "Copyright 2004 by Aalfabob (real name, obviously)" on it. And, the best part is, you can do that for free. No registration, nothing. :D

ErichKeane
08-09-2004, 11:45
A few things: newer compression methods are not judged purely on how much they can compress. Even if you COULD do what you claim, which I highly doubt, it would not be very worthwhile.

Right now, hard drive space is cheap. Time is not cheap. So if your algorithm took, say, 2 hours to turn my 1 GB file into a 500-byte file, and WinRAR turns it into 400 MB in 3 minutes, what is the advantage? Sure, I saved about 400 MB, but I lost 2 hours! Also, you can get a hard drive for a buck a gig, so what's the point?

Also, most people who care about large files have broadband. I personally downloaded the entire Mandrake Linux pack a few years ago on a 1 Mbit connection; I let it run overnight. Consider this: the ISO files totaled about 2 GB, and it took me overnight to download, so no big deal. Using WinRAR, I possibly could have gotten those 2 GB down to about 1.3 GB, but it would have taken 20 minutes or so.

Now, given that you say the algorithm's run time grows exponentially with file size, I would have no problem assuming this compression/decompression would take longer than it took to download the whole file on my DSL connection.

NOW, move that file across my 10 Mbit connection at college, and compression becomes useless. It used to be that the order of priorities for compression was: size, speed, packageability. Now the order is: packageability, size, speed.

So unless your "algorithm" can turn those 2 GB of files into a 500-byte file within 20 minutes, it would not be worthwhile.

FizMan
08-09-2004, 12:34
Sure it would be worthwhile... I mean, imagine being able to store entire hard drives on, say, your USB pen, or on a floppy. Set your computer to make weekly (or nightly) backups overnight to a 1 KB file. And as processors constantly evolve, so too will the compression times drop. And you probably wouldn't need to compress your files down to 500 bytes... bringing it down to, say, 10 MB would be just as ideal (or 1+ MB for your floppy), or to 700 MB to toss it on a CD.

It would naturally change many industries too, especially if under professional development the processing time can be reduced. Transfer full-length feature films to theaters for digital real-time screening... not to mention all those 56k'ers out there ;) mmmm piracy

Fulcrum2000
08-09-2004, 12:35
This claim is total BS. Iterative compression simply isn't possible, as it will *increase* file size instead of reducing it.

Regards,
Werner Bergmans
Webmaster http://www.maximumcompression.com/

ErichKeane
08-09-2004, 13:24
I'm calling BS too. As I said before, even if you could compress stuff really well, it wouldn't be worth the time.

Processing power is expensive--it takes a TON of money to get a processor that would be able to do an iterative compression method in any reasonable amount of time.

Storage, on the other hand, is relatively cheap. You said that making nightly backups would be the way to go; what happens if nightly backups take, say, 16 hours? That makes it not worth it, especially once you factor in transfer time. It is just as cheap to set up a RAID array and just live with an extra HD around.

Compression was big a few years ago, but it really isn't that important right now. Most corporations won't even trust compression, because compressed files lack some of the failsafes that maintain the integrity of the data. With a compression algorithm, it is possible to ruin the entire file with a relatively small amount of loss (slightly above 1 byte). In the "method" described by the thread starter, I would assume that a loss of 4 bits could wreck the file completely.

In a real file, there are enough failsafes that it may take as much as 30% loss to make the file unrecoverable. The same cannot be said about a compressed file.

Ryan M.
08-09-2004, 15:04
I have to argue from the other side. Compression is good. For instance, I have a 120 GB HD and a 40 GB HD in my computer right now. (For those lazy out there, that's 160 GB of total storage.) However, I still don't appreciate the 9 GB that 4 Fedora Core, 3 Mandrake, 3 Red Hat, 3 BSD, and a few different live CD ISOs are taking up.

Also, consider the people on dialup who pay by the minute. If you had the choice between a 100 MB download and a 0.5 KB download that are exactly the same thing, which would you go for? Sure, the decompression takes time, but you're not paying for the time online you would have spent downloading a larger file. Also, this would be great for those amateur developers/webmasters who use a cheap/free web server with bandwidth limits.

Aalfabob
08-09-2004, 15:08
OK, about these speed issues.
This is the information I have compiled about the compressor so far:
It is able to run the compression routine over 7.5 MB in 1 second (it is actually a lot faster than this, but I'm leaving headroom in case I have any more ideas for the test version). Now, if the file were highly in the compressor's favor, it could reach around a 10.9% gain in that one second over 7.5 MB. This means it would shorten the file from 7.5 MB to around 6.67 MB in close to 1 second, depending on your CPU and memory clock speed. This test was taken on a 1.47 GHz Athlon XP with 133 MHz memory.

Another thing about the time being exponentially larger:
Even though more passes are needed for each small gain, the data being compressed is much smaller, which gives a pretty constant change versus time. If you look at those graphs I posted earlier, you can see that they include no time, only passes versus file size.

Btw, a file being in the compressor's 'favor' does not have a very great chance with random files, but it still can occur. On files with similar characters near each other (text files, logs, etc.) it will be very close to its max gain per pass (10.9%). On average, my last build, which wasn't programmed very well, came to around 1.5% to 4% per pass on a .rar file, but the current build has been heavily modified to get higher compression per pass.

Oh, and here is another graph that shows that sometimes a file can actually get larger for a run, but in the end the compression method still trends toward getting smaller. (I have tested many files of different sizes and types; this is just one.)

edit - If you think these times are impossible: they come from the file being loaded into memory, and of course it takes larger files in chunks if memory is needed. Also, the speed comes from the compression process consisting of a couple of if-thens, the qsort routine, and then the bytes being put back into memory. I expect that a professional programmer would be able to chop these times down by a ton by using assembly or faster libraries.

Alan Anderson
08-09-2004, 15:32
In a real file, there are enough failsafes that it may take as much as 30% loss to make the file unrecoverable. The same cannot be said about a compressed file.
Assuming that the compressed files are under 1k in size, it's no big deal to add "failsafes" such as error-correcting codes or even brute force redundancy after the compression. That will of course increase the resulting file size, but if an original megabyte of raw data still ends up less than a few kilobytes of damage-tolerant compressed data, it's a major net win.

Given a sufficiently restricted set of possible input files and a sufficiently large shared database, I can achieve miraculous compression too. For example, I can "encode" any static data currently on the World Wide Web into a short string of characters: just reference it by URL. But arbitrary multimegabyte files compressed to 500-odd bytes? To say I am skeptical would be an understatement.

FizMan
08-09-2004, 15:46
Well, I guess we'll find out in less than a week's time, eh?

PERSONALLY, and I think I speak for everyone here, I'd hope that this is the real deal.


I wonder, though, if it would be possible for you to give us a quick rundown on the logic behind your algorithms... I mean, for example, how you manage to get around the counting problem (the 2^n - 1 thingamajiggy)? Are you somehow incorporating non-base-10 mathematics or something?

Ryan M.
08-09-2004, 16:06
I mean, for example, how you manage to get around the counting problem (the 2^n - 1 thingamajiggy)? Are you somehow incorporating non-base-10 mathematics or something?
Counting problem? What do you mean?

Aalfabob
08-09-2004, 16:10

See, most of these claims of great compression ratios are made by asserting that you will always gain compression down to their so-called limit. I have my theorem set up so that it has an advantage toward gaining compression. This does not mean it can't gain size either; it simply means that in the long run the file will get smaller. And since my theorem includes using headers to define what has happened to each piece of data, it has to have a limit, because the headers take up space and the data the headers describe takes up space.

These headers are 2 bytes each and just hold predefined info that the program already knows. This makes that part 100% reversible. Next, the compressed bytes are either 7-bit, 8-bit, or 9-bit: it defines a 7-bit code with a start of 1 then 6 bits, an 8-bit code with 00 then 6 bits, and a 9-bit code with 01 then 7 bits. As you can see, this makes up all 256 values of the ASCII set and is also easily reversible. That is a pretty big part of how my compression decompresses. That's really all I think I can show right now.

edit - I have the whole process written out, and I have written it in the way a computer would actually read the files and recompress them; this isn't some theorem that I just wrote down on paper. I went step by step through the process and made sure it was 100% compressible and 100% reversible.

The more I read that document on why this is impossible, the more I'm finding out about these people who claimed theirs worked when all they had was a mathematical problem they thought they could solve, before even thinking about programming it or how the computer could use it.
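Just to show what I mean by the groups being easily separated, here is a toy Python sketch of a decoder for that kind of 1 / 00 / 01 prefix code (illustrative only, not my actual program):

# Codes start with "1" (7 bits total), "00" (8 bits), or "01" (9 bits),
# so a reader can always tell where one code ends and the next begins.
def split_codes(bits):
    out = []
    i = 0
    while i < len(bits):
        if bits[i] == "1":            # 7-bit code: "1" + 6 payload bits
            out.append(bits[i:i + 7])
            i += 7
        elif bits[i:i + 2] == "00":   # 8-bit code: "00" + 6 payload bits
            out.append(bits[i:i + 8])
            i += 8
        else:                          # "01": 9-bit code, "01" + 7 payload bits
            out.append(bits[i:i + 9])
            i += 9
    return out

print(split_codes("1000000" + "00111111" + "010000001"))
# ['1000000', '00111111', '010000001'] -- each code is recovered unambiguously.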

Fulcrum2000
08-09-2004, 17:08
edit - If you think these times are impossible, they are due to the file being loaded into memory and of course it does take larger files in chunks if memory is needed.

No, I don't think these times are impossible, but I *know for sure* that your claims about achieved compression are utter <deleted>. Please stop spamming the forum with claims which are totally impossible.

I'll make you a deal. Compress one of the test files on my website (http://www.maximumcompression.com/) and send me the compressed file + compiled decoder. I will then decompress this file on my computer. If, after decompressing, the original file is restored (binary identical to the original) and the size of the decoder + compressed file is less than 95% of the best result mentioned for that test, you get $10,000 from me! Deal?

PS Before decompressing I'm allowed to change the name of the compressed file to anything I like.

Jack
08-09-2004, 17:09
One question... why did you choose these forums to post your *discovery*? Sure, we are all nerds here and many of us are interested in this sort of thing, but I am sure there are forums dedicated to compression. Why ChiefDelphi?

Just to answer this question...

He's a person who goes to my school whom I know.. and btw he's considering joining our robotics team too ;)

He was talking to me about this, and since I really have no idea, I sent him to ChiefDelphi because I knew there would be people here who understood better and could either prove or disprove his claim.

I have no idea if his method works, and honestly am as skeptical as the rest of you.. but eh.. if it does work.. it'd be cool :)

yea.. I know this is borderline chit-chat.. but it's the off season still.. and it can always be moved to another forum if Brandon deems necessary..

anywho.. I guess we'll just have to wait for the program to see how good this idea really is..

jack

Dave Flowerday
08-09-2004, 17:24
No, I don't think these times are impossible, but I *know for sure* that your claims about achieved compression are utter {bleep}. Please stop spamming the forum with claims which are totally impossible.
I am finding this discussion interesting, but vulgar language is not appreciated by most of us here. I find it a little interesting that someone who is completely new to this forum and obviously doesn't even understand our etiquette could accuse someone else of spamming.

Fulcrum2000
08-09-2004, 17:33
I am finding this discussion interesting, but vulgar language is not appreciated by most of us here. I find it a little interesting that someone who is completely new to this forum and obviously doesn't even understand our etiquette could accuse someone else of spamming.

Sorry for the strong words I used, but I hoped that kind of language would make clear that the claims being made are totally impossible. I have seen these kinds of super-compression claims many, many times. Most of them were made to get some quick cash out of people so they could 'finalize their invention and patent their discoveries'. After payment, the people suddenly disappear...

I agree I may not be fully aware of the forum etiquette, but I know something about what's possible and impossible in compression land. That's why I'm giving him an opportunity to earn 10,000 dollars real fast.

Regards,
Werner Bergmans
Eindhoven, Netherlands

Aalfabob
08-09-2004, 17:37
No, I don't think these times are impossible, but I *know for sure* that your claims about achieved compression are utter <deleted>. Please stop spamming the forum with claims which are totally impossible.

I'll make you a deal. Compress one of the test files on my website (http://www.maximumcompression.com/) and send me the compressed file + compiled decoder. I will then decompress this file on my computer. If, after decompressing, the original file is restored (binary identical to the original) and the size of the decoder + compressed file is less than 95% of the best result mentioned for that test, you get $10,000 from me! Deal?

PS Before decompressing I'm allowed to change the name of the compressed file to anything I like.

OK, let me show you how this bet is near impossible...

You said the decoder + compressed file has to be less than today's best compressor's result. Well, the .exe tested on that site is only 3,870,784 bytes. The top-scoring compressor achieved a final size of 953,785 bytes. You want me to get that down to a size of 47,689 bytes (47 KB @ 95%). OK, so even if my compression could knock it down to 515 bytes, that leaves 47,174 bytes for the program. Um, I don't really know how you want me to get my program that small without spending hundreds of hours in assembly just to shrink the decompressor. Right now, with a little over 100 lines, I'm at 626,747 bytes. Unless I'm reading your challenge wrong, it seems a little impossible...

Fulcrum2000
08-09-2004, 17:44
OK, let me show you how this bet is near impossible...
You said the decoder + compressed file has to be less than today's best compressor's result. Well, the .exe tested on that site is only 3,870,784 bytes. The top-scoring compressor achieved a final size of 953,785 bytes. You want me to get that down to a size of 47,689 bytes (47 KB @ 95%).

No, 95% of 953,785 bytes is about 906 KB.
So the size of your decompressor + compressed file should be less than 906 KB.


Right now, with a little over 100 lines, I'm at 626,747 bytes. Unless I'm reading your challenge wrong, it seems a little impossible...

I think you are OK; decompressor + compressed file comes in at almost under 609 KB! But if you have a problem meeting the 95%, I will increase it to 98%, so the combined size should be under 934,709 bytes.

ThomasTuttle
08-09-2004, 18:06
You said it can eventually compress any file to 508-515 bytes? Okay, let's assume you can do 512 for simplicity (it really doesn't matter). This can't hold true for every file, since with 512 bytes there are only so many files you can make (2^4096 of them, a 1234-digit number, in fact).
So if you create a directory with every 513-byte file in existence (that's the above number times 256), it cannot compress all of them to 512 bytes or less.
Furthermore, compressing random data is not feasible. Sure, you could try, and it might work sometimes, but overall any single string of bits will eventually appear, so you can't take advantage of a limited vocabulary in the file. For example, if you tried to compress 2 bytes into one, it wouldn't work, since you would eventually see all 65536 possible values, which puts you right back at the original 2 bytes.
In fact, I gave both gzip and bzip2 a chance on random data, and so far, up to files of 64 KB, neither has averaged even one byte less than the original--in fact, the outputs are all larger!
But if you want me to test it, I would be glad to (I won't give away your program, don't worry...) I would really find it interesting if you can prove me and others wrong...
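Here is a quick sketch of that experiment in Python, using the standard-library gzip and bz2 modules (the same idea as the command-line test, not the exact commands I ran):

import bz2
import gzip
import os

data = os.urandom(64 * 1024)   # 64 KB of random bytes

for name, compress in (("gzip", gzip.compress), ("bz2", bz2.compress)):
    out = compress(data)
    print(name, len(data), "->", len(out))
    # On random input both outputs come out a little *larger* than the original:
    # there are no patterns to exploit, and the container format adds overhead.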

Andy Baker
08-09-2004, 18:09
Sorry for the strong words I used, but I hoped that kind of language would make clear that the claims being made are totally impossible. I have seen these kinds of super-compression claims many, many times.

Regards,
Werner Bergmans
Eindhoven, Netherlands

While this is understood, please choose words that don't pollute this fine site. This discussion is interesting, and your words are valuable as long as profanity is omitted.

You will be more respected and listened to if you keep things clean and respectful.

Andy B.

ThomasTuttle
08-09-2004, 18:37
See, most of these claims of great compression ratios are made by asserting that you will always gain compression down to their so-called limit. I have my theorem set up so that it has an advantage toward gaining compression. This does not mean it can't gain size either; it simply means that in the long run the file will get smaller. And since my theorem includes using headers to define what has happened to each piece of data, it has to have a limit, because the headers take up space and the data the headers describe takes up space.

These headers are 2 bytes each and just hold predefined info that the program already knows. This makes that part 100% reversible. Next, the compressed bytes are either 7-bit, 8-bit, or 9-bit: it defines a 7-bit code with a start of 1 then 6 bits, an 8-bit code with 00 then 6 bits, and a 9-bit code with 01 then 7 bits. As you can see, this makes up all 256 values of the ASCII set and is also easily reversible. That is a pretty big part of how my compression decompresses. That's really all I think I can show right now.

edit - I have the whole process written out, and I have written it in the way a computer would actually read the files and recompress them; this isn't some theorem that I just wrote down on paper. I went step by step through the process and made sure it was 100% compressible and 100% reversible.

The more I read that document on why this is impossible, the more I'm finding out about these people who claimed theirs worked when all they had was a mathematical problem they thought they could solve, before even thinking about programming it or how the computer could use it.

So, here's how it works?

First, you have two constant bytes. Like the "GZ" from gzip, or the "BZ" from bzip2.

Then you have a bunch of 7-, 8-, or 9-bit codes. It works like this, I assume:
ASCII 00###### -> Compressed 1###### (8 bits to 7 bits, 25% of the time)
ASCII 01###### -> Compressed 00###### (8 bits to 8 bits, 25% of the time)
ASCII 1####### -> Compressed 01####### (8 bits to 9 bits, 50% of the time)

Given a random input file, each byte value from 0-255 will appear about the same number of times. Thus, the average size of a compressed byte, in bits, is:

(7 * 25%) + (8 * 25%) + (9 * 50%) = 1.75 + 2 + 4.5 = 8.25.

Thus, the file expands by 1/32.

So, unless your input files contain mostly bytes that translate to your 7-bit codes, you should be seeing the file *increase* in size by about 1/32 each time, not *decrease* in size.

If your program makes the files smaller when they shouldn't be, chances are it's either cheating, using very compressible input files, or losing data--have you written and tested the decompressor yet? ;-)

If I'm wrong, please correct me--I'm interested in seeing how it actually works if this isn't how it works.

See ya,

Tom
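To put a number on that, here is a small Python sketch of the mapping as I guessed it above (my guess at the scheme, not the actual program), averaged over random bytes:

import os

def code_length(byte):
    # Guessed scheme: 0x00-0x3F -> 7-bit code ("1" + 6 bits),
    # 0x40-0x7F -> 8-bit code ("00" + 6 bits), 0x80-0xFF -> 9-bit code ("01" + 7 bits).
    if byte < 0x40:
        return 7
    elif byte < 0x80:
        return 8
    else:
        return 9

data = os.urandom(1000000)
total_bits = sum(code_length(b) for b in data)
print(total_bits / len(data))   # about 8.25 bits per original 8-bit byte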

Aalfabob
08-09-2004, 21:24

Yes, this would have been a problem, but I had already solved it, and it involves a little bit more than that. Trust me, I will have the program ready for someone to test out, or more than one person if everyone wishes.

This is how the testing is going to be done: first, I'm going to send the decompressor to the tester. Next, they send me all the files they want me to test (please try to stay under 10 MB; right now it can handle up to 100 MB because I haven't put in the memory swapping yet, but since it's just a test I don't see the need for anything bigger. But if you wish to try bigger ones...). Then I will compress the files they sent me and send them back to the tester to be decompressed and verified. This way I can't make a program that works for only that one file, so it should be pretty good proof.

ThomasTuttle
08-09-2004, 21:28

Okay, I'll be glad to help... I'm skeptical, but interested.

djcapelis
08-09-2004, 22:51
Look, the math is quite simple on why this doesn't work...

Let me reduce it to 4 bits to make things simple.

Let's say I make a program and claim that it can compress things to 4 bits. That means there are 16 possible outputs. Only 16 possible outputs. It is mathematically impossible to compress more than 16 files into 4 bits with the same procedure, no matter how many times you run it. (In the recursive case you'd have to keep track of how many times you ran it, which would immediately add to the file size.) The same concept applies when you claim to be able to compress any 1-megabyte file into 500 bytes.

It is simply absurd. Compression algorithms only work because they are good at finding patterns and taking patterned data and manipulating it. This is why you will almost never be able to compress random data.

Thanks for playing, hit the books and come back with something that's possible.
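To see the pigeonhole argument concretely, here is a tiny Python sketch (a generic counting check, not aimed at any particular program): any "compressor" that maps every 4-bit input to a strictly shorter bit string must send two different inputs to the same output, so it cannot be reversed.

from itertools import product

# All 16 possible 4-bit inputs, and every bit string shorter than 4 bits.
inputs = ["".join(bits) for bits in product("01", repeat=4)]
shorter = [""] + ["".join(bits) for n in (1, 2, 3) for bits in product("01", repeat=n)]

print(len(inputs), "inputs vs", len(shorter), "possible shorter outputs")
# 16 inputs vs 15 outputs: by the pigeonhole principle, some two inputs must
# collide on the same output, so no lossless decompressor can exist.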

FizMan
08-09-2004, 23:21
I'd be interested to test out the decompressor as well. My email is fizixman@vgamp.com so let me know when... and send out a non-disclosure agreement or something or other if you're worried.


EDIT: Let's not forget that "Jack" here on ChiefDelphi is his friend... so Aalfabob, you should get together with Jack and show him your program working, and then Jack can testify. :D

MikeDubreuil
09-09-2004, 00:23
I'd be interested in seeing this work as well. You can contact me at DubreuilM@wit.edu.

I'm most interested in seeing how the test will handle a 5MB dump from /dev/random on a UNIX machine.

ErichKeane
09-09-2004, 00:27
Mike!! I had no idea anyone else from here went to WIT! I'm a freshman on campus, Tudbury mid. Such a small world, eh?

Anyway, I think this is going to be one of those situations where someone supposedly has something solved in their mind, but then just "forgets" one small, simple thing that sinks the whole boat.

Elgin Clock
09-09-2004, 01:03
Anyway, I think this is going to be one of those situations where someone supposedly has something solved in their mind, but then just "forgets" one small, simple thing that sinks the whole boat.
If that is the case, then I hope both sides of the debate that this has turned into graciously apologize to each other, and no one says "See, I told you it couldn't be done" or whatever.

I am for the most part skeptical till I see something work in front of me, but love to dream that it could.

Don't rain on this person's parade until you absolutely see it not work (and even then, suggest ways that it might be able to work).



By the way, I have come up with a great idea for a perpetual motion machine, but until I can get a prototype made, I won't reveal my new idea.

One reason is that I don't want anyone else to take the credit for it, but the second reason is that if it does not work in real life like it does on paper, then I don't want the kind of responses that Aalfabob is receiving and the embarrassment of letting myself down in public.

Remember those two words, people: Gracious Professionalism.
Make your grandmas proud. Don't blindly say that something will or will not work until you see the proof that it does or does not.

FizMan
09-09-2004, 10:09
lolk, I love trying to invent perpetual motion machines! If you ever want someone to take a fresh look at your machine (because I find that a fresh look often points out that little flaw that is so obvious... yet so hidden), just make me sign a confidentiality agreement or something.

Though lol, sometimes I feel bad for saying, "Umm... you realize blah blah blah so it won't work..." ;_;

Aalfabob
09-09-2004, 15:35
Alright, here's an update on how build 5 is going:
I was going to try a new method I came up with about a day ago because I thought it would be a little better. After running a few tests on it, I found that only some files ran better than with the past method, and sometimes the files even got larger. I haven't finished fully testing this program, because when a file is changed by a pass it ends up with a totally different make-up inside, which means it could actually regain the lost bytes and compress below the original size. Although the first method is a little more complex, I should still be able to meet the deadline I gave before if nothing 'pops up' (life stuff). And if something does happen, it won't delay the test by much. I'm putting as much time as I can into programming it right now and don't have work for the next couple of days.

I am taking a break from that method and moving back to the first, which I thought was going to be very slow, but I redesigned most of it to make it actually faster than the new method. I've checked a couple of the files by hand after a couple of runs to see if the data is coming out right, and everything seems fine. That's about it for now.

edit - Some good news, though: I had already finished the compressor for the first method, so I can basically copy most of the functions over, and with some changes in file setup it should be good to go.

Fulcrum2000
09-09-2004, 15:38
I've checked a couple of the files by hand after a couple of runs to see if the data is coming out right, and everything seems fine. That's about it for now.

Sounds good; my $10,000 offer is still valid...

FizMan
09-09-2004, 18:59
So Aalfa, do you take certified cheques/money order or have paypal? roffle

Aalfabob
09-09-2004, 23:20
Please correct me if I'm wrong, but I think this can help prove it a little better:

So say I have 512 bytes of data I'm compressing. This can contain 3.74 * 10^693 different combinations, correct? So say I add a byte to that (513 bytes); the combinations that creates are 6.16 * 10^693, or twice as much.

So say I have one byte as a header that counts the number of times the data has been compressed. That header can hold 256 values. So depending on which way those data bytes come out (3.74 * 10^693 possibilities), the counter can hold 256 different values, so wouldn't adding that 1 counting byte actually make the file have 9.57 * 10^695 combinations (256 * (3.74 * 10^693))? Now that is a lot more combinations available for the same amount of data. Hopefully I did that right.

Data:
512 byte combinations = 3.74 * 10^693
513 byte combinations = 6.16 * 10^693
513 with one byte being a counter = 9.57 * 10^695

If im correct i think that this can prove that that many pieces of randomly generated code can fit in that space. And plus im using a 2 byte main header which can contain 65536 runs.

MrToast
09-09-2004, 23:48
Any chance of a Mac OS X compile? Interestingly enough, I'm also collaborating with a friend in producing a new compression scheme that uses recursive compression. Don't ask me about it, though. I'm just building the interface.

mrtoast@gmail.com

MrToast

121st post! Go Rhode Warriors!

rbayer
10-09-2004, 00:03
Please correct me if I'm wrong, but I think this can help prove it a little better:

So say I have 512 bytes of data I'm compressing. This can contain 3.74 * 10^693 different combinations, correct? So say I add a byte to that (513 bytes); the combinations that creates are 6.16 * 10^693, or twice as much.

So say I have one byte as a header that can count the number of times the data has been compressed. That header can hold 256 values. So depending on which way those bytes come out (3.74 * 10^693), the counter can hold 256 different values, so wouldn't adding that 1 byte for counting actually give the file 9.57 * 10^695 combinations (256 * (3.74 * 10^693))? Now this is a lot more combinations available for the same amount of data. Hopefully I did that right.

Data:
512 byte combinations = 3.74 * 10^693
513 byte combinations = 6.16 * 10^693
513 with one byte being a counter = 9.57 * 10^695

If I'm correct, I think this can prove that that many pieces of randomly generated data can fit in that space. Plus, I'm using a 2-byte main header which can count 65536 runs.
It doesn't matter _what_ is stored in that extra byte. It's still just 8 bits. Adding 1 more byte, regardless of contents, always gives you 256 (that's 2^8) times as many possible combinations. A bit is a bit is a bit, regardless of how your program interprets it. Also, keep in mind that 2^4096 (4096 is the number of bits in 512 bytes, so 2^4096 is the number of different possible 512-byte files) is still _astronomically_ smaller than 2^8388608, the number of different possible 1MB files. I really can't stress this enough: it has been _proven_ that you cannot compress every file of a given size down to a file smaller than the original (even if only by 1 bit). This has been PROVEN. Trying to do it is basically the same as trying to find an O(n) comparison-based sorting algorithm (proven to be impossible) or trying to solve the halting problem (also proven impossible). While you're at it, why don't you find a positive-integer solution to x^3 + y^3 = z^3.

Also, I _love_ math and arguing about math, etc., but _please_ use 2^x instead of 10^x. With binary stuff it makes it SO much easier to understand what's actually going on.
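If a concrete illustration helps, here's a throwaway C++ snippet (my own toy code, nothing to do with Aalfabob's program): for any length n there are 2^n possible n-bit files but only 2^n - 1 files that are strictly shorter, so any scheme that claims to shrink every file has to send two different inputs to the same output.

// Throwaway C++ illustrating the pigeonhole argument: for every length n
// there are 2^n possible n-bit files, but only 2^n - 1 files that are
// strictly shorter, so no lossless scheme can shrink every n-bit file.
#include <cstdio>

int main() {
    for (int n = 1; n <= 16; ++n) {
        unsigned long long inputs = 1ULL << n;            // 2^n files of exactly n bits
        unsigned long long shorterOutputs = inputs - 1;   // 2^0 + 2^1 + ... + 2^(n-1)
        std::printf("n = %2d bits: %6llu inputs, only %6llu shorter outputs\n",
                    n, inputs, shorterOutputs);
    }
    return 0;
}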

Rob

Max Lobovsky
10-09-2004, 00:20
Please correct me if I'm wrong, but I think this can help prove it a little better:

So say I have 512 bytes of data I'm compressing. This can contain 3.74 * 10^693 different combinations, correct? So say I add a byte to that (513 bytes); the combinations that creates are 6.16 * 10^693, or twice as much.

So say I have one byte as a header that can count the number of times the data has been compressed. That header can hold 256 values. So depending on which way those bytes come out (3.74 * 10^693), the counter can hold 256 different values, so wouldn't adding that 1 byte for counting actually give the file 9.57 * 10^695 combinations (256 * (3.74 * 10^693))? Now this is a lot more combinations available for the same amount of data. Hopefully I did that right.

Data:
512 byte combinations = 3.74 * 10^693
513 byte combinations = 6.16 * 10^693
513 with one byte being a counter = 9.57 * 10^695

If I'm correct, I think this can prove that that many pieces of randomly generated data can fit in that space. Plus, I'm using a 2-byte main header which can count 65536 runs.
No, you are not correct: there are 2^8 = 256 times as many 513-byte sets as 512-byte sets, not twice as many; that is exactly the number of extra combinations a one-byte header adds. There really is no argument here: you cannot consistently store a file in a space smaller than the file itself, because otherwise you wouldn't be able to store all possible files...

Mark McLeod
10-09-2004, 10:07
Some of you may be judging the algorithm an automatic failure because it cannot possibly compress every possible file to 515 bytes. The algorithm doesn't have to be equally successful at compressing every class of file, and files don't all have to be reduced to 515 bytes to be considered successfully compressed. The most "successful" compression technologies become so by specializing in one class of file or another, e.g., imagery, text, object, and executable files. Our friend here hasn't yet had the opportunity to run rigorous tests on the wide variety of files within even one class to discover the limitations of his algorithm.

This is a learning experience in developing an algorithm. Everyone can help as independent testers, with developing test procedures, mathematical proofs, test sets, critique, etc. Just keep it positive. The mathematical reasoning posted above is very good experience for Aalfabob and for any of us for the type of proof that will be required whenever you develop a commercially viable algorithm.

Don't be discouraged when you uncover shortcomings of your method.

Aalfabob
10-09-2004, 15:09
Isn't the formula combinations = bytes ^ values,

so, like, a 512-byte piece of data ^ 256 possible characters = 3.742e+693 combinations?

edit - Never mind, I got the wrong formula from someone.

Rickertsen2
10-09-2004, 16:10
Isn't the formula combinations = bytes ^ values,

so, like, a 512-byte piece of data ^ 256 possible characters = 3.742e+693 combinations?

edit - Never mind, I got the wrong formula from someone.

The formula for the number of distinct values that can be stored in a binary number of length x bits is simply:

2^x

Regardless of how you group these bits or how your program interprets them, you will always be limited by this formula.

wun
10-09-2004, 16:40
The one thing I have been wondering is whether your magical algorithm calls on outside data (e.g., using a dictionary outside the "compressed" file)?

rbayer
10-09-2004, 16:48
Isn't the formula combinations = bytes ^ values,

so, like, a 512-byte piece of data ^ 256 possible characters = 3.742e+693 combinations?

edit - Never mind, I got the wrong formula from someone.
Close, but not quite. If you want to think of it that way, it's actually 256 ^ 512. Think about it this way: there are 256 possibilities for the first byte, times 256 for the second, times 256 for the third, and so on. Thus, you get 256*256*256*... = 256 ^ 512.
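(Quick arithmetic check, tying the two notations together: 256^512 = (2^8)^512 = 2^4096, which is roughly 10^1233. The 3.74 * 10^693 figure quoted above is what the reversed formula gives: 512^256 = 2^2304, or roughly 3.7 * 10^693.)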

piotrm
10-09-2004, 17:15
Wow, I can't believe this thread has been running for this long. From the very first posts it was evident to me that this is a scam: the subtle mentions of financial gain, the need for patent money in order to make the big greens, etc... The interesting part is that one established Chief Delphi member mentioned that he knows Aalfabob, the author of this wonderful compression scheme.

However, I realize odd things can come together once in a while, and all of these scam signs may have been inadvertent, with Aalfabob genuinely believing he has this algorithm. If so, then I apologize. However, I am 99.999% sure that such a bold compression scheme is impossible, as has probably been stated a few times already in this thread.

On a humorous note: take a look at this http://lzip.sourceforge.net/.

I do so register my vote (opinion) as above on this issue.

Ryan M.
10-09-2004, 19:44
Wow, I can't believe this thread has been running for this long. From the very first posts it was evident to me that this is a scam: the subtle mentions of financial gain, the need for patent money in order to make the big greens, etc... The interesting part is that one established Chief Delphi member mentioned that he knows Aalfabob, the author of this wonderful compression scheme. Well, assuming he is a scammer, all we have to do is wait for his release of the program and he'll expose himself. Don't send him money and you're safe. :)

On a humorous note: take a look at this http://lzip.sourceforge.net/.

I do so register my vote (opinion) as above on this issue. Wow! Thanks for the link! My computer has sooo much room now that I compressed most of my personal files using lzip. I had done that before using gzip, but lzip was a ton better! Compression is a great thing. Why did you say it was humorous? It seems more amazing than anything to me.


NOTE: I didn't really do this. You shouldn't either.

J Flex 188
10-09-2004, 20:54
"Known issues with lzip 1.0:
Attempting to compress the lzip or lunzip programs themselves will trap your system in an infinite loop, and may cause the platter from your hard disks to shoot out of their drive bays at close to the speed of sound. Attempting to uncompress either of the exectuables will suck your computer into a minature black hole, which we believe from our benchmarks (speculatively) exits in an anitmatter universe parallel to our own. If you are interested in exploring this possibilty, please write to us once you get there. "

That's from the download section of the page; he was probably referring to that. I know it gave me a chuckle, hehe, as did the whole program being gzipped. =D


Well, assuming he is a scammer, all we have to do is wait for his release of the program and he'll expose himself. Don't send him money and you're safe. :)

Wow! Thanks for the link! My computer has sooo much room now that I compressed most of my personal files using lzip. I had done that before using gzip, but lzip was a ton better! Compression is a great thing. Why did you say it was humorous? It seems more amazing than anything to me.


NOTE: I didn't really do this. You shouldn't either.

Aalfabob
10-09-2004, 22:54
I had like a three-paragraph post about how it was putting out smaller files and stuff, but the server went down right after I sent it, so it was lost.

Anyway, I've decided to let someone take a look at how it works and decide whether they think it will work. You guys decide who you think would do the best job and who I can trust.

sanddrag
10-09-2004, 23:29
You guys decide who you think would do the best job and who I can trust. You can trust me (look at my rep and post count), but I am no programmer. Off the top of my head (please don't get personal with this, I'm just throwing out names), some names I would trust are Andy Baker and Andy Brockway, but I don't believe they are programmers. But who knows, maybe a non-programmer, average computer user would be a prime candidate for the test.

The one thing I'm worried about is trusting you: what if you sent one of us something that could be potentially damaging to our computers?

Don't get me wrong, I respect you, your ideas, and your claims, but I have a hard time trusting any stranger - you know how it is, I'm sure. It's not you personally; it is any stranger to me.

Aalfabob
10-09-2004, 23:47
You can trust me (look at my rep and post count), but I am no programmer. Off the top of my head (please don't get personal with this, I'm just throwing out names), some names I would trust are Andy Baker and Andy Brockway, but I don't believe they are programmers. But who knows, maybe a non-programmer, average computer user would be a prime candidate for the test.

The one thing I'm worried about is trusting you: what if you sent one of us something that could be potentially damaging to our computers?

Don't get me wrong, I respect you, your ideas, and your claims, but I have a hard time trusting any stranger - you know how it is, I'm sure. It's not you personally; it is any stranger to me.

I understand. I've sent someone the ideas already, and he doesn't really see anything wrong with the idea (I don't think), but he wants to check the source for any bugs or errors that could create a false output. The actual method isn't too complex, so I should have news on this by tomorrow or the day after.

rbayer
11-09-2004, 00:54
If you want a trusted party, please feel free to use me. I'm a CS and math major at Carnegie Mellon, so I pretty well know my stuff and would be willing to give an honest opinion. Also, my opinion tends to be pretty well-respected when it comes to programming issues around these parts. In any event, send me an email (FIRSTprograms@mn.rr.com) with the compressor/decompressor and I'll test it and post results to CD within a few hours.

Rob

FizMan
11-09-2004, 01:03
You ummm really have no reason to trust me... no good rep... half decent programmer; not great like these guys... I can say you can trust me, but you can't really trust that.

I have nothing good to my name... well, except for this:
http://vgamp.com/storage/fizixmanbig.jpg

COMON! I'm a saxy beast, you GOTTA trust me now to do a good job with your code! <3

(plus your avatar is teh pwn... I'm going to steal it *stoled*)

Andy Baker
11-09-2004, 11:44
If you want a trusted party, please feel free to use me. I'm a CS and math major at Carnegie Mellon, so I pretty well know my stuff and would be willing to give an honest opinion. Also, my opinion tends to be pretty well-respected when it comes to programming issues around these parts. In any event, send me an email (FIRSTprograms@mn.rr.com) with the compressor/decompressor and I'll test it and post results to CD within a few hours.

Rob


For what it is worth, my vote is to let Rob try this thing out. He is not only a long-time FIRSTer, but he has done tireless work to develop things for the FIRST community. He sees beyond his team and his personal gain. His efforts deserve the FIRST community's trust.

Andy B.

Jeff Rodriguez
11-09-2004, 12:46
/me holds up parchment with 'Rob' scribbled on it.

Rob. He's got the know-how.

/me puts parchment in jug and walks away.

Aalfabob
11-09-2004, 17:35
If you want a trusted party, please feel free to use me. I'm a CS and math major at Carnegie Mellon, so I pretty well know my stuff and would be willing to give an honest opinion. Also, my opinion tends to be pretty well-respected when it comes to programming issues around these parts. In any event, send me an email (FIRSTprograms@mn.rr.com) with the compressor/decompressor and I'll test it and post results to CD within a few hours.

Rob

I'll send you a PowerPoint presentation of it when I finish it. Shouldn't take too long. The way it works seems pretty simple to reverse to me, but I'm going to have a couple of programs output some graphs of data and whatnot to show all parts of the compression and decompression.


(plus your avatar is teh pwn... I'm going to steal it *stoled*)

Lol :D

rbayer
11-09-2004, 18:22
I'll send you a PowerPoint presentation of it when I finish it. Shouldn't take too long. The way it works seems pretty simple to reverse to me, but I'm going to have a couple of programs output some graphs of data and whatnot to show all parts of the compression and decompression.

A PowerPoint presentation? Why not just the compressor/decompressor programs, or at least one of them? Based on your earlier posts/graphs, I'm guessing you've already written the compressor and decompressor, so if you could just attach them to an email, that would be great.

More on why you can trust me not to steal your ideas:
1. Eagle Scout
2. Well-respected around here
3. Other teams have trusted me (on numerous occasions) with their top-secret robot code in order to get some feedback on it. In ALL cases, I respected their wishes and never divulged any details. In fact, even when a team came up with a really cool idea (which many did), I didn't incorporate it into my own team's code, out of respect for the original programmers' wishes.

Please, if you want people to believe you about this new compression scheme, I'm your guy. Send me a working compressor and/or decompressor and I promise to provide an honest opinion as well as keeping everything completely confidential. I will NOT, however, endorse a compression scheme simply because you know how to use Excel and make lots of pretty graphs.

Also, if you haven't actually programmed the compressor/decompressor yet, then just send me a short (1-2 paragraph) description of how you plan to do it and I will give you as much feedback as I can on the feasibility of your scheme.

Rob

Aalfabob
11-09-2004, 18:29
A PowerPoint presentation? Why not just the compressor/decompressor programs, or at least one of them? Based on your earlier posts/graphs, I'm guessing you've already written the compressor and decompressor, so if you could just attach them to an email, that would be great.

More on why you can trust me not to steal your ideas:
1. Eagle Scout
2. Well-respected around here
3. Other teams have trusted me (on numerous occasions) with their top-secret robot code in order to get some feedback on it. In ALL cases, I respected their wishes and never divulged any details. In fact, even when a team came up with a really cool idea (which many did), I didn't incorporate it into my own team's code, out of respect for the original programmers' wishes.

Please, if you want people to believe you about this new compression scheme, I'm your guy. Send me a working compressor and/or decompressor and I promise to provide an honest opinion as well as keeping everything completely confidential. I will NOT, however, endorse a compression scheme simply because you know how to use Excel and make lots of pretty graphs.

Also, if you haven't actually programmed the compressor/decompressor yet, then just send me a short (1-2 paragraph) description of how you plan to do it and I will give you as much feedback as I can on the feasibility of your scheme.

Rob

I have a full running compressor, but it was really messy and slow, and had no options, like selecting a file name and whatnot. I have been working on a new compressor portion that uses bit operations I really hadn't thought about when making the first one. So the new one is like 500 lines shorter and a lot faster, but it isn't finished yet because it doesn't output to a file that can be recompressed.

I can give you build 1 of the compressor and a Word document on how it works if you like. The document pretty much shows how it can reverse and how it compresses/uncompresses. I'll try to send it to you by tonight; I have to leave in 30 minutes for a couple of hours, but when I come back I should have plenty of time.

rbayer
11-09-2004, 18:34
I have a full running compressor, but it was really messy and slow, and had no options, like selecting a file name and whatnot. I have been working on a new compressor portion that uses bit operations I really hadn't thought about when making the first one. So the new one is like 500 lines shorter and a lot faster, but it isn't finished yet because it doesn't output to a file that can be recompressed.

I can give you build 1 of the compressor and a Word document on how it works if you like. The document pretty much shows how it can reverse and how it compresses/uncompresses. I'll try to send it to you by tonight; I have to leave in 30 minutes for a couple of hours, but when I come back I should have plenty of time.
Sounds great. If you need anything from me between now and then, either email me (FIRSTprograms@mn.rr.com) or IM me (FIRSTisLife) or PM me here. For the sake of everyone else reading these boards, let's try to avoid just posting to the forum if it's something that only concerns you and me.

Rob

Adrian Wong
11-09-2004, 23:47
...and the plot thickens...

rbayer
12-09-2004, 14:38
An update:

Last night, Aalfabob sent me his compressor along with a short description of how it works. I then proceeded to take 5 files from my computer (1 text file, 1 JPG image, 1 exe, 1 rar file, and 1 file of random data) and run them through his compressor. All files ended up being right around 800 bytes. I then emailed the compressed files back to him, and am currently waiting for him to decompress them and send them back to me. I'll let you guys know if they come back the same as the originals as soon as I get them.

-Rob

MikeDubreuil
12-09-2004, 15:15
I'll let you guys know if they come back the same as the originals as soon as I get them.
It would be a good idea to use an MD5 checksum to determine whether the decompressed files are identical to the originals.

Fulcrum2000
12-09-2004, 15:33
It would be a good idea to use an MD5 checksum to determine whether the decompressed files are identical to the originals.

Also, make sure you rename(!) or move the original files before receiving the decompressed ones from Aalfabob... It would be even better to compare the files on a different PC than the one you used for compression.
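A plain byte-for-byte compare is enough for that check, too; something quick along these lines would do (a generic sketch, nothing to do with the compressor itself):

// Minimal byte-for-byte file comparison for checking that a decompressed
// file is identical to the original.
#include <cstdio>
#include <fstream>
#include <iostream>

int main(int argc, char* argv[]) {
    if (argc != 3) {
        std::cerr << "usage: filecmp <original> <decompressed>\n";
        return 2;
    }
    std::ifstream a(argv[1], std::ios::binary), b(argv[2], std::ios::binary);
    if (!a || !b) {
        std::cerr << "could not open one of the files\n";
        return 2;
    }
    long long offset = 0;
    while (true) {
        int ca = a.get(), cb = b.get();
        if (ca != cb) {                      // differing byte, or one file ended early
            std::cout << "files differ at byte " << offset << "\n";
            return 1;
        }
        if (ca == EOF) break;                // both files ended at the same point
        ++offset;
    }
    std::cout << "files are identical (" << offset << " bytes)\n";
    return 0;
}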

Rafi A
12-09-2004, 15:34
...and the plot thickens...
then it congeals

Ryan M.
12-09-2004, 17:24
An update:

Last night, Aalfabob sent me his compressor along with a short description of how it works. I then proceeded to take 5 files from my computer (1 text file, 1 JPG image, 1 exe, 1 rar file, and 1 file of random data) and run them through his compressor. All files ended up being right around 800 bytes. I then emailed the compressed files back to him, and am currently waiting for him to decompress them and send them back to me. I'll let you guys know if they come back the same as the originals as soon as I get them.

-Rob
How large were each of the respective files? 801 bytes? ;)

Aalfabob
12-09-2004, 19:12
Alright, there's something going on with my C++.
A variable is changing after a certain amount of time when I'm not even referring to it in the whole program except at the start. I've checked to see if I misplaced a line somewhere and I haven't; the compiler says the variable doesn't even come up in the area where it's changing. Any ideas? It seems very odd that it's doing this, unless another program is somehow editing the memory while it's running.

Ryan M.
12-09-2004, 19:22
Alright, there's something going on with my C++.
A variable is changing after a certain amount of time when I'm not even referring to it in the whole program except at the start. I've checked to see if I misplaced a line somewhere and I haven't; the compiler says the variable doesn't even come up in the area where it's changing. Any ideas? It seems very odd that it's doing this, unless another program is somehow editing the memory while it's running.
It sounds like a stray pointer to me, but it's hard to know for sure. Does it happen consistently (i.e., is this variable unexpectedly changed every time, or just once in a while) and does it get changed in the same way? If there are any pointers, make sure you initialize each one before use. I know it sounds obvious, but it happens.
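For example, an off-by-one write like this (a made-up snippet, obviously not your actual code) will quietly change a variable that the loop never mentions by name:

// Made-up example of how an off-by-one array write can corrupt an
// unrelated variable that is never assigned by name.
#include <cstdio>

struct State {
    int counts[4];
    int fileSize;   // never written by name below, yet it still changes
};

int main() {
    State s;
    s.fileSize = 123456;

    for (int i = 0; i <= 4; ++i) {   // bug: should be i < 4
        s.counts[i] = 7;             // when i == 4 this writes one slot past the
    }                                // array, which in this layout lands on fileSize

    std::printf("fileSize is now %d\n", s.fileSize);
    return 0;
}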

Aalfabob
12-09-2004, 19:32
It sounds like a stray pointer to me, but it's hard to know for sure. Does it happen consistently (i.e., is this variable unexpectedly changed every time, or just once in a while) and does it get changed in the same way? If there are any pointers, make sure you initialize each one before use. I know it sounds obvious, but it happens.

It happens every time and changes to a different value. I was using this variable to hold the file size, and it is not used in any functions. It also seems to happen at about the same time from the start of the program. I ran a debug and nothing seems to be wrong except the one variable changing. The variable stays the same through the whole program (which runs in a loop) for a while, and then it just changes.

Aalfabob
12-09-2004, 21:29
RBayer checked over my code a little and found a little problem in it. In one of my if statements I put an = instead of an ==, which somehow threw off the results. With this fixed, it seems that the files are getting larger. But in another build I've made without this error, they seem to get smaller. There is probably an error in that one too, but I'll check it another day just in case it can get some files to really small sizes.

The problem wasn't really with the ability to uncompress the data back to its normal state, but with the way the data comes out each time it is compressed. So say you compress it once and it gains compression; the way it is put back into the file makes it impossible to gain any more compression, and it will usually lose some. I don't know how this actually happens, but there must be a law that keeps random data from being compressed (not the way stated before with the n-1 or whatever it was).

And as stated before, it is impossible to do this, lol. If anyone wishes to take a look at how I tried to do this, just leave me your email and I'll send you a .doc, or I could just post it here if anyone wishes. Hopefully no one has wasted too much time on this thread.

FizMan
12-09-2004, 21:39
Post it here please!

Alan Anderson
12-09-2004, 22:00
RBayer checked over my code a little and found a little problem in it. In one of my if statements I put an = instead of an ==, which somehow threw off the results.
That single feature of C syntax has probably caused more programs to fail than any other bug in the history of computing.
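The classic form of it, as a generic example rather than anything from Aalfabob's program:

// Classic C/C++ pitfall: assignment where a comparison was intended.
#include <cstdio>

int main() {
    int mode = 0;

    if (mode = 1) {        // assigns 1 to mode; the condition is then always true
        std::printf("this branch always runs\n");
    }
    if (mode == 1) {       // what was actually meant: a comparison
        std::printf("this branch runs only when mode is 1\n");
    }
    return 0;
}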
...there must be a law that keeps random data from being compressed (not the way stated before with the n-1 or whatever it was).
I actually thought that explanation was quite well presented.

The "law" is pretty simple: you need a certain number of bits in order to represent a certain amount of information. If a file has redundancy, or repeating patterns, it contains less information than if it had no such patterns. The redundancy can be removed and the file can be represented in a smaller form. Text tends to exhibit statistical patterns. Executable programs and pictures and audio have other patterns. Truly random data by definition has no redundancy, and thus can't be compressed without losing information.

Max Lobovsky
12-09-2004, 22:15
If a file has redundancy, or repeating patterns, it contains less information than if it had no such patterns. The redundancy can be removed and the file can be represented in a smaller form. Text tends to exhibit statistical patterns. Executable programs and pictures and audio have other patterns. Truly random data by definition has no redundancy, and thus can't be compressed without losing information.
That isn't entirely true. Of course, as stated many times in this thread, no compression scheme can reduce the size of all possible files, or, alternatively, consistently reduce the size of a stream of random data. The problem with Mr. Anderson's explanation is that the idea of "truly random data" doesn't really have any effect on the matter. Given a finite piece of data, how does one decide whether it's random or has patterns? Well, it's impossible in general, but one way we can sort finite pieces of data by randomness is to choose a compression scheme and check whether it can compress the piece of data or not. Of course, this will not yield the same classification of randomness as any other compression scheme would.

The reason we can sort of decide whether data is random is that we can easily spot some types of patterns and use that as our "compression scheme."

Aalfabob
12-09-2004, 22:34
Alright, here is a little paper I wrote on how it works. I hope it explains it well:

The first step in this process is just to load the file. This can be done all at once, which saves a ton of time, or 128 bytes at a time as they are needed.
After the data is loaded, the compressor works with a 128-byte chunk (or less) at a time. It is only less at the end of the file, because it cannot always get a full 128-byte chunk there. More on why it breaks the file into chunks later on.

The range method listed in the outline works like this:
1) It takes in 2 bytes; for this example they will be named byte1 and byte2.
2) It then uses simple math: if byte1 > byte2 then bit1 = 1, else bit1 = 0.
3) After the first compare, it finds the current range in which byte2 can lie within the byte values 0 - 255.
4) It then takes the middle of the current range and compares it to byte2 again, logging the result as > or <= (1 or 0).
5) It repeats this until only 1 integer can fit in the range, which must be byte2.
Now, for this to actually be able to get back byte2 from only the binary number (the > or <= bits), byte1 needs to be stored with it in the file. So when it is output, it is saved like this: byte1, then byte2's range in bits.
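In code, the idea looks roughly like this (a simplified sketch for illustration, not the actual build):

// Simplified sketch of the range method described above: byte1 is kept
// as-is and byte2 is replaced by the comparison bits that pin it down.
#include <cstdio>
#include <vector>

// Encode byte2 relative to byte1 as a sequence of >/<= bits.
static std::vector<int> encode(int byte1, int byte2) {
    std::vector<int> bits;
    int lo = 0, hi = 255;
    int pivot = byte1;                                 // the first compare uses byte1 itself
    while (lo < hi) {
        if (pivot > byte2) { bits.push_back(1); hi = pivot - 1; }
        else               { bits.push_back(0); lo = pivot; }
        if (lo < hi) pivot = lo + (hi - lo + 1) / 2;   // middle of the current range
    }
    return bits;                                       // now lo == hi == byte2
}

// Recover byte2 using byte1 and the stored bits, mirroring the encoder.
static int decode(int byte1, const std::vector<int>& bits) {
    int lo = 0, hi = 255;
    int pivot = byte1;
    for (size_t i = 0; lo < hi; ++i) {
        if (bits[i] == 1) hi = pivot - 1;
        else              lo = pivot;
        if (lo < hi) pivot = lo + (hi - lo + 1) / 2;
    }
    return lo;
}

int main() {
    for (int byte1 = 0; byte1 < 256; ++byte1)
        for (int byte2 = 0; byte2 < 256; ++byte2)
            if (decode(byte1, encode(byte1, byte2)) != byte2)
                std::printf("round trip failed for %d, %d\n", byte1, byte2);
    std::printf("round trip checked for all byte pairs\n");
    return 0;
}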

------------------------------------------------------------------------------------------------------------

From all of the combinations that can be made in the range compression method, 15.5% of them come out with 7 or fewer bits, 37.5% are 9 bits, and 50.0% are 8 bits.

To make use of this, bit operations are used on an entire 128-byte section to push its bytes into the 15.5% range. Sometimes only some of them fit, which can make that section (128 bytes) come out larger, but not by much.

The bit operations I use are as follows:
1) Every 2-bit rotation.
2) Every 4-bit rotation.
3) Every 8-bit rotation.
4) Opposite values of all bits in a byte.
5) Opposite values of the odd bits in a byte.
6) Opposite values of the even bits in a byte.
7) Every other two bits of a byte are opposite.
8) Undefined for now.

So say you put byte1 and byte2 into the range function to get byte2's binary range value. You would do all 128 bytes of the section like this and add up the total loss/gain of bits. As you know, it could gain bits from this, so I use these operations to help the bytes lose bits, since they become totally new combinations afterwards. So it runs through all 256 values of the bit-flip byte to find the best one (the operations are stored in a byte before that section; for example, if bit operations 1 and 5 were used, the header would be 10001000, and they are run in that order on decompression).

Now, instead of gaining bits, the section can actually lose some, giving a small amount of compression.
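Roughly, the search over the operation byte looks like this (again just a sketch: only the complement operations are modeled and the bit cost is a placeholder, since the point is the search itself):

// Sketch of trying all 256 values of the operation byte on a 128-byte
// section and keeping the cheapest. The cost function and the unmodeled
// operations are stand-ins, not the real ones.
#include <cstdint>
#include <cstdio>
#include <vector>

static uint8_t applyOps(uint8_t b, uint8_t ops) {
    if (ops & 0x08) b = (uint8_t)~b;   // op 4: complement all bits
    if (ops & 0x10) b ^= 0xAA;         // op 5: complement the odd bits (assumed layout)
    if (ops & 0x20) b ^= 0x55;         // op 6: complement the even bits (assumed layout)
    return b;                          // ops 1-3, 7, 8 (the rotations) are not modeled here
}

// Placeholder for "how many bits the range method would spend on this section".
static int bitCost(const std::vector<uint8_t>& section) {
    int cost = 0;
    for (uint8_t b : section) cost += (b < 64 || b > 191) ? 7 : 9;   // made-up rule
    return cost;
}

int main() {
    std::vector<uint8_t> section(128, 170);     // toy 128-byte section
    int bestCost = 1 << 30;
    int bestOps = 0;
    for (int ops = 0; ops < 256; ++ops) {       // every possible value of the operation byte
        std::vector<uint8_t> t = section;
        for (uint8_t& b : t) b = applyOps(b, (uint8_t)ops);
        int cost = bitCost(t);
        if (cost < bestCost) { bestCost = cost; bestOps = ops; }
    }
    std::printf("best operation byte: %02X, estimated cost: %d bits\n", bestOps, bestCost);
    return 0;
}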

------------------------------------------------------------------------------------------------------------

So this is how the file setup would work:

Main Run Counter Header = HH;
128 Byte Section Operation Log = L;
Uncompressed byte (byte1) = U;
Compressed byte (byte2) = C;

HH, L, U, C, U, C, U, C, ... , L, U, C, ...
(the first "..." stands for the rest of the 128-byte section)
------------------------------------------------------------------------------------------------------------

That's about it. You should now be able to see that this process is reversible because it uses predefined values. This includes the byte range method: the decompressor reads in 1 bit at a time and narrows the range the same way the compressor did, just in reverse.

Hopefully this helps.

ahecht
13-09-2004, 00:55
Aalfabob, I'm sorry I assumed the worst about you, and thank you for admitting your mistake rather than making endless excuses or disappearing. It takes a strong person to admit that they were wrong.

Rickertsen2
13-09-2004, 07:20
Aalfabob, I'm sorry I assumed the worst about you, and thank you for admitting your mistake rather than making endless excuses or disappearing. It takes a strong person to admit that they were wrong.
I agree.

FizMan
13-09-2004, 09:12
$@#$@#$@#$@#... now I won't be getting any stake in the money... ;_;

Tough luck, Aalfabob... but look on the bright side, it was a good learning experience for you!

Fulcrum2000
13-09-2004, 13:27
Aalfabob, I'm sorry I assumed the worst about you, and thank you for admitting your mistake rather than making endless excuses or disappearing. It takes a strong person to admit that they were wrong.

Have to agree also :)
Sorry for my strong reaction, but I have seen it happen before. In many cases people have tried to make money out of their great 'super-compression' discoveries, but you obviously thought what you were doing was possible.

suneel112
13-09-2004, 14:25
I also agree, and (being an aspiring inventor) I know the feeling of apathy and displeasure toward my own designs very well. I just hope you continue, fix any mistakes, and make a great product (don't sell it to MS, though).

Aalfabob
13-09-2004, 14:46
No problem about the assuming stuff. One good thing that came from this is that I now know C++ pretty well, which should help a ton when it's time to start programming those robots.

ThomasTuttle
25-09-2004, 13:31
Aalfabob, your compression scheme is not so bad. I would guess that it works very well on uncompressed photo files with large gradient areas, audio files, and other things where byte values do not jump around a lot. If I understand right, you are basically encoding the difference between two bytes? Since you subdivide the range of values for byte2, very low or very high bytes probably yield a better result than values near 127/128, since there's a smaller range to encode. I have one suggestion: rather than doing: U C U C U C ... and encoding only every other byte, output just the first byte and then encode each next byte in relation to the previous one. So the decompressor knows the first byte and can find each next byte from the compressed data. Also think about alternate ways of encoding the differences between bytes...
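Something along these lines is what I mean; a rough sketch of the delta idea (not tied to the actual program):

// Rough sketch of the suggestion: store the first byte, then store each
// following byte as its difference from the previous one, so smooth data
// (gradients, audio) turns into lots of small values that are easier to code.
#include <cstdint>
#include <cstdio>
#include <vector>

static std::vector<uint8_t> deltaEncode(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    if (in.empty()) return out;
    out.push_back(in[0]);                                // first byte stored as-is
    for (size_t i = 1; i < in.size(); ++i)
        out.push_back((uint8_t)(in[i] - in[i - 1]));     // wraps mod 256, still reversible
    return out;
}

static std::vector<uint8_t> deltaDecode(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    if (in.empty()) return out;
    out.push_back(in[0]);
    for (size_t i = 1; i < in.size(); ++i)
        out.push_back((uint8_t)(out[i - 1] + in[i]));
    return out;
}

int main() {
    std::vector<uint8_t> ramp;                           // a smooth "gradient" signal
    for (int i = 0; i < 16; ++i) ramp.push_back((uint8_t)(i * 3));

    std::vector<uint8_t> enc = deltaEncode(ramp);
    std::vector<uint8_t> dec = deltaDecode(enc);

    std::printf("deltas:");
    for (uint8_t d : enc) std::printf(" %d", d);
    std::printf("\nround trip %s\n", dec == ramp ? "ok" : "FAILED");
    return 0;
}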

Now that we've heard the guts of two different versions of the program explained, perhaps it is time to share a copy of it for people to try? I would be glad to help...

Just my $0.02

ThomasTuttle
03-10-2004, 14:32
Why has this thread suddenly become so quiet?

ErichKeane
03-10-2004, 16:21
Because the topic ended...?