New compression method

I made a new way to compress files a couple of days ago; I’ve written the program and tested it. The only problem is that most people will not believe that what I’ve done is possible, and I can’t actually send them how it works because I haven’t patented it yet. But here’s my question: if you are able to compress the same file over and over again, would it be possible to reach very small sizes? I’ve tried a 1 MB file so far and it reached 515 bytes, and it is fully reversible.

It’s possible that recompressing a file can make it smaller, depending on the compression scheme used, but I think most modern compression schemes compress as much as they can on their first pass (sometimes because their first pass really includes several passes at compressing the data).

Anyway, compressing a file from 1 MB to 515 bytes doesn’t really say anything about your compression scheme. If you give me a file of any size, I can very simply write a compression scheme to compress it to 0 bytes. If you can take arbitrary files and consistently compress them to a small size, then you have a compression scheme of merit.
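To make that concrete, here’s a deliberately silly sketch in Python (the file name is just a placeholder) of a “compressor” that gets one specific, known file down to 0 bytes. The information hasn’t gone anywhere; it’s just baked into the program, which is why a single data point proves nothing:

```python
# A joke "compressor" that reaches 0 bytes -- but only for one known file.
# KNOWN_FILE is a hypothetical path; the file's entire contents are baked
# into the decompressor, so no information has actually been removed.

KNOWN_FILE = "benchmark.bin"

def compress(path):
    with open(path, "rb") as f, open(KNOWN_FILE, "rb") as k:
        if f.read() == k.read():
            return b""   # "compressed" to 0 bytes
    raise ValueError("this scheme only handles the one file it was built for")

def decompress(blob):
    assert blob == b""
    with open(KNOWN_FILE, "rb") as known:
        return known.read()   # the data was hiding in the program all along
```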

I don’t know. When a brand new member with no team number comes in here and, in their very first post, posts the equivalent of digital snake oil, I become a bit suspicious. I’m not sure what this guy is trying to pull (such as whether he is an existing member who set up a dummy account for this or just a random troll), but I can assure people that something is up.

Besides, with most compression methods, compressing an already compressed file results in a slightly larger file size.

I agree that it is a little unbelievable. Alaphabob, I’d trust almost anyone here with something that isn’t protected yet. Just ask anyone with 5+ “rep(utation) points” to look over your algorithm. (Those are the green dots on the left of the darker-grey bar above their post.) I’m not saying that less-rep’d people aren’t trustworthy, but it’s something to reassure you.

That’s because almost all compression schemes work by creating a library at some point in the file and putting multi-byte strings into it. The rest of the file is then encoded by putting a one-byte key in place of the original string. If a string occurs more than once, you save space. For instance, if a text file contains the string “happy” three times, you put “happy” in your library once and three one-byte markers in the text, for a total of around 8 bytes (there are probably also bits separating the different items in the library, etc., which is why I say “around”). The original three happies took up 15 bytes.

When you recompress the file, you end up compressing a file with no, or very few, redundancies, which are what make the library method work so well.
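A toy version of that idea in Python, just to illustrate the library/marker mechanism (the marker byte and framing are invented for the example, not any real format):

```python
# Toy library-style compression, as described above: the repeated string
# goes into a small library once, and each occurrence in the body is
# replaced by a one-byte marker. Format details here are made up.

TEXT = b"happy birthday, happy new year, happy holidays"
MARKER = b"\x01"                        # one-byte key standing in for "happy"

library = b"happy"
header = bytes([len(library)])          # 1 byte: length of the library entry
body = TEXT.replace(b"happy", MARKER)   # three 5-byte words become 1-byte markers

compressed = header + library + body
print(len(TEXT), "->", len(compressed)) # 46 -> 40 bytes

# Decompression: read the library entry back out, then expand the markers.
entry_len = compressed[0]
entry = compressed[1:1 + entry_len]
restored = compressed[1 + entry_len:].replace(MARKER, entry)
assert restored == TEXT
```

Feed `compressed` back in and there’s almost nothing left to replace (“happy” now appears only once, in the library), which is why the second pass buys so little.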

EDIT:
Why the heck did I choose happy? Why not something cool, like FIRST? :smiley:

Alright, first off, I’m not trying to pull anything. Why would I create an account and waste my time just to mess around with people? A friend gave me this site to ask a few people whether they think it would be possible. Second, my compression theorem isn’t like the others; it doesn’t run on how many times certain characters show up in a file as a whole. This makes it able to recompress the same file over and over again, almost always gaining some compression. This also means that it can work on any type of file: .zip, .exe, .jpg, etc. But it does reach a limit in the file sizes it can reach; with the current program I have made, it can compress any file type and size down to 508 bytes, usually fluctuating between 508 and 515 bytes. Just because one file is larger than another doesn’t mean it cannot hit this limit; it just means that more attempts must be made to reach it. I have some data charts if anyone wishes to see them.

Is it lossy or lossless?

Lossless; what would be the point of compressing a data file if it were corrupted when uncompressed?

I am going to see if any large businesses are interested in this, and if not I will make it open source; this is one of the reasons why I am trusting no one. Even if someone is trustworthy, there is still always that very small chance of it getting out.

Well, why did you post anything at all if you are trusting no one? Without your algorithm it is very difficult to help you.

Compression is like factoring. In factoring, you take a complex equation and can define it simply by its solutions. In compression you do a similar thing. However, you will eventually run into a floor no matter how good a compression system you use. This is due to the fact that the information is still there, just in a compressed format. I am guessing your algorithm has some sort of system for recording what it has done so that it can be undone. That record requires space. The more times you compress, the closer the file is to becoming “prime” toward the algorithm. Eventually you reach a point where the information needed to expand the file makes the file large enough that compression will not make the whole set any smaller.
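A rough way to put that in symbols (just a sketch; $h$ stands for the assumed fixed bookkeeping each pass adds, and $r_n$ for the fraction of the file pass $n$ manages to save):

$$|x_{n+1}| = (1 - r_n)\,|x_n| + h$$

Once $r_n |x_n| \le h$, a pass no longer pays for its own overhead, and the size bottoms out; that floor is the point described above.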

So basically what ryan morehart said, but in more generic terms.

I understand the problem of you not being able to help me because I’m not releasing how it works. Is there any way I could make it open source for a little while until I am able to open it up for commercial use? I want to be able to keep the theorem if I ever decide to make money with it, and I want to make sure no one can steal it. Would making it open source save it from being stolen? I’ve looked into patents, but there is no way I can afford the $4000 to get one and the $1500 to keep it updated every couple of years. If anyone has a link or something to help me out here, please post it.

I’ll be happy to post all the information needed about it as soon as it’s safe. And I do understand that this seems impossible, but trust me, it’s not :slight_smile:.

Well, here’s a site which lists the most commonly used open source licenses. Read through them and see what you like. Make sure you choose one which prevents the commercial reuse of the source code.

Edit:
Hm, actually, according to them, “open source” licenses do not prevent commercial use. Whatever… :rolleyes:

Go to www.maximumcompression.com and run your utility against their test files. You’ll be able to compare your results against a fairly large set of benchmarks. Post your results. If you really beat those benchmarks, then you’ll need to have a few volunteers verify your results. For that you can distribute a binary without source, along with a non-disclosure agreement.

Alright, let me rebuild my program (7 days max), because I have a couple of new ideas I would like to try out with it. I will post the scores by then or earlier. Hopefully I will be able to get it done a lot sooner, but it depends on how much work I have.

I’m surprised that site doesn’t have a file full of pseudo-random data. While very complete in testing different programs, the files it chooses seem rather arbitrary.

I’m not sure what you mean by “pseudo”, but it’s mathematically impossible for a compressor to consistently compress random data. I don’t know much information theory at all, but I know that random data is essentially pure information, and there is no way to encode pure information into a smaller amount of information. (This is incidentally why already-compressed files, including lossy media compression for images, sound, etc., don’t compress well: they have already converted the data to nearly pure information.)
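For what it’s worth, the standard counting argument behind that impossibility goes roughly like this:

$$\#\{\text{files of } n \text{ bits}\} = 2^n, \qquad \#\{\text{files shorter than } n \text{ bits}\} = \sum_{k=0}^{n-1} 2^k = 2^n - 1$$

So a lossless compressor cannot shrink every $n$-bit file: there aren’t enough shorter outputs to go around, at least two inputs would have to share one, and decompression couldn’t tell them apart. Running the compressor again on its own output doesn’t escape this; it only tightens the pigeonhole.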

The site’s choice of files seems pretty logical to me. It has a selection of files that are commonly compressed. It might be interesting if they tried recompressing some very common compressed file(s), maybe like the Fedora Linux distribution, or Microsoft Office 2k3.

We’ll see :slight_smile:; sometimes the impossible can be done.

Turns out this entire time Aalfabob has been compressing files with the same sequence of information repeated over and over again :wink:

j/k. So are you consistently compressing files to around 508-515 bytes despite major differences in their uncompressed sizes? A 5 MB file would compress to 514 bytes, and a 500 KB file would also compress to, say, 510 bytes? I find that very fascinating…

Yep. The only thing that changes with file size is how many times it needs to be run through the compressor to get the same results as, say, a file half its size. To keep track of this, 2 bytes are put at the start of the file as its main header, recording how many times it has been compressed (up to 65535 times).
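Just to illustrate what I mean by the 2-byte header, here’s a rough sketch (not my actual code; the byte order and layout are only picked for the example):

```python
import struct

# Rough sketch of a 2-byte pass-count header (0-65535), as described above.
# Byte order and layout are assumptions for illustration only.

def add_header(payload: bytes, passes: int) -> bytes:
    return struct.pack(">H", passes) + payload      # ">H" = big-endian unsigned 16-bit

def read_header(blob: bytes):
    (passes,) = struct.unpack(">H", blob[:2])
    return passes, blob[2:]

framed = add_header(b"compressed data would go here", 818)
print(read_header(framed)[0])                       # -> 818
```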

The last time I ran it on a 1 MB file, it took around 818 runs to get down to 515 bytes, but most of those runs were spent between 2500 bytes and 515 bytes, since it was only gaining 1-10 bytes per try. I made some graphs from the logs put out by the program. If I can get my friend to give me some space on his server again, I can post them.

Right now I’m reprogramming the compression portion of the program, because the last one was pretty slow due to the way I was reading from the file. The new one pretty much just has to run a couple of if-then statements and change a byte. Hopefully it will also beat the common compressors of today in speed, but I’ll have to see how well I can get it programmed. This new approach should also be able to chop off around 2%-5% of the file every pass.

edit - I’ll just put up the graphs here to give you an idea of what I’m talking about. Also, the second chart actually loses some compression sometimes, but the next pass usually gains it back.

In case any of you are still in doubt about this “new compression scheme”, I encourage you to read this discussion of exactly this matter: comp.compression Frequently Asked Questions (part 1/3), Section [9] Compression of random data (WEB, Gilbert and others).

A quote from this document:

It is mathematically impossible to create a program compressing without loss *all* files by at least one bit (see below and also item 73 in part 2 of this FAQ). Yet from time to time some people claim to have invented a new algorithm for doing so. Such algorithms are claimed to compress random data and to be applicable recursively, that is, applying the compressor to the compressed output of the previous run, possibly multiple times. Fantastic compression ratios of over 100:1 on random data are claimed to be actually obtained.

Wow, I had no clue anyone else was claiming they could do this. I’ve read the paper, and from what I understand, my method does not fit his argument for the most part. It uses no fancy math and no special numbers, which I’m guessing most of those people try to work with because they’re “special”, without any real proof or idea that they would actually do something. But I have to admit, a lot of the processes he talks about that are flawed I had thought about at one time, and I turned them down because it was easily noticed that they were flawed. I check my ideas from start to finish before I go around claiming that I invented something that works.

Another argument he made was that some of these methods used groups of bits to show compression. Most of those methods are majorly flawed because there was no proof they could ever be reversed. But I do know for a fact that a computer can tell whether a group of bits starts with 1, 00, or 01, which makes the groups easily separated. There is also another method I had made that is also a way to separate the bytes, but I’ll explain that in my post when my program is finished (it was thrown away after I found a better, faster way).
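As a quick sketch of just the separation step I’m describing (nothing about the rest of my method), the groups 1, 00, and 01 can always be told apart because none of them is the start of another:

```python
# Splitting a bit string into the groups "1", "00", "01".
# Since no group is a prefix of another, the decoder always knows
# where one group ends and the next begins.

def split_groups(bits: str):
    groups, i = [], 0
    while i < len(bits):
        if bits[i] == "1":
            groups.append("1")
            i += 1
        else:                          # starts with 0, so it must be "00" or "01"
            groups.append(bits[i:i + 2])
            i += 2
    return groups

print(split_groups("100011"))          # -> ['1', '00', '01', '1']
```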

If this is truly just about whether I am lying, just give me 7 days, as I posted earlier, and the new program will be finished. But just because some guy writes a paper on whether this is possible does not mean he knows 100% what he’s talking about. I am positive that he has not tested every single way you can compress a file, which makes his argument about there being no possible way invalid. Every once in a while something impossible happens; I’m sure you can think of hundreds of examples, but please just give me the time to prove this. I think the short wait of only 7 days isn’t going to kill you. And if you still really think it’s so impossible that there is no chance at all, you can leave this thread and forget all about it.

edit - Btw, when I finish the program I will post a link to a video of it running, and if that’s not enough (because some people are going to say, “Ahh, that can be easily faked”), I’ll post some other way it can be proven, unless I somehow get the money for a temporary patent, in which case I’ll just put the source and the method up for everyone.

You have piqued my interest, but I won’t believe anything till I see it. Honestly, just think about what you are claiming, and give us a reason why we should believe you and not just think you’re a nut. We have been given lots of empty promises but no hard evidence.

One question: why did you choose these forums to post your discovery? Sure, we are all nerds here, and many of us are interested in this sort of thing, but I am sure there are forums dedicated to compression. Why Chiefdelphi? I truly would like to believe that your claims are real, but until I see proof, you are a nut in my book. This reminds me of the human cloning people a while back who suddenly vanished into the ethersphere when asked to prove what they had done. I have seen too many things like this that all turn out to be nothing. If I were claiming what you are claiming, you would think I was nuts too, so you can’t really blame us.

I’ll admit that many things that have in the past been dismissed as utterly impossible are now integral parts of our everyday lives. Maybe this is one of them, but somehow I am doubtful. I sincerely hope you prove me wrong.

I’m not really sure what I’m talking about, and I may be completely wrong, but could this possibly fall under a copyright, which is a much easier process? I think there is even some sort of implied copyright on everything that you don’t actually have to file for, but it won’t hold up in court as well as an official one. I looked into this at one point a long time ago, but I don’t really remember much.