regards

]]>I have gone through your articles. I had a similar idea, but not as clear or as thoroughly described as yours. Using your statically generated symbol table, I have reached the point where:

Total Symbols : 162

Hard Symbols : 90 //that are not duplicate

Max position of duplicate : 227

Max duplicates of single symbol : 4

Out Length bits : 1610 bits without encoding duplicates

Out Length Bytes : 201.25 bytes

————————————–

Task : 94 bytes to be encoded in 63

Now the task is to encode only the 94 duplicates, which are definitely known. You mentioned “If we could represent unique data set <=202 byte(worst case), we can use remaining 54 byte to represent duplicates.” Please explain: do you propose Huffman codes, or one of the statically generated codes that you posted? Remember that the duplicates involve 72 symbols in total, because there are 162 symbols overall and 90 of them are hard symbols that don't repeat. As per your bitdiffmap, here are the proposed codes for the 72 symbols:

Pos

regards

]]>About storing the position, I mentioned:

“Read 256 block of data, mark the duplicates, remember these relative to current position of the number (so we don’t require 8bits for position)”

It is the same thing you are referring to, i.e. if the third item is a duplicate, then its value is present among the earlier two numbers, so we need just 1 bit to store the position.
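The relative-position idea can be sketched in a few lines. This is only an illustration, not the working copy mentioned in this thread; the function name `backref_bits` and the 0-based indexing are assumptions made for the example.

```python
def backref_bits(index: int) -> int:
    """Bits needed to pick one of the `index` items that precede a duplicate.

    `index` is the 0-based position of the duplicate inside the block, so it
    is also the number of earlier items the duplicate could be a copy of.
    """
    if index <= 1:
        # Zero or one earlier item: there is nothing to choose between.
        return 0
    # Smallest bit width that can address `index` distinct candidates.
    return (index - 1).bit_length()


# The third item (index 2) can only repeat one of two earlier values.
third_item_bits = backref_bits(2)
# The last item of a 256-item block (index 255) has 255 candidates.
last_item_bits = backref_bits(255)
```

A fixed 8-bit position is therefore only needed near the end of the block; early duplicates cost far less.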

About the implementation: I already have a working copy that can compress blocks with up to 42 duplicates. I want to fine-tune it further and announce it on this blog next week.

Thanks & regards

Keshav K Shetty

Regarding: “How to use above theory when all numbers are not unique?”.

There are some more effective solutions for storing information about unique/non-unique number positions.

You can use combinatorics (I use the combinadic; Wikipedia has a nice article about it).

Let’s say there are 100 non-unique and 156 unique values: marking their positions will take fewer bits than the 256 of a plain bitmap, but the count of unique numbers must also be specified, which takes 7 or 8 bits.
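To make the combinadic suggestion concrete, here is a minimal sketch of ranking and unranking a set of positions in the combinatorial number system. The function names are hypothetical, and this is one common way to do it rather than necessarily the exact scheme used by the commenter.

```python
from math import comb


def combinadic_rank(positions):
    """Map a set of distinct 0-based positions to a single integer."""
    # The i-th smallest position p contributes C(p, i + 1).
    return sum(comb(p, i + 1) for i, p in enumerate(sorted(positions)))


def combinadic_unrank(rank, k):
    """Recover the k positions (ascending) from their combinadic rank."""
    positions = []
    for i in range(k, 0, -1):
        p = i - 1                      # C(i - 1, i) == 0, always fits
        while comb(p + 1, i) <= rank:  # find the largest p with C(p, i) <= rank
            p += 1
        rank -= comb(p, i)
        positions.append(p)
    return positions[::-1]


def mask_bits(n, k):
    """Bits needed to store which k of n positions are marked."""
    return (comb(n, k) - 1).bit_length()


# Positions of the two repeats in an 8-item block, as a toy case.
rank = combinadic_rank([4, 6])
restored = combinadic_unrank(rank, 2)
# Cost of marking 100 non-unique positions out of 256.
bits_100_of_256 = mask_bits(256, 100)
```

For 100 marked positions out of 256 this comes to roughly 243 bits, so once the count of unique numbers is added the saving over a 256-bit bitmap is modest; it grows as the split moves away from half and half.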

As for how to store the repeated values: use the positions of the unique values to represent each non-unique number.

Let’s say we have sequence (0-255):

4 6 10 32 6 85 10 2 …

We know which numbers are unique. To represent this I’ll use a bit sequence that marks the unique numbers:

1 1 1 1 0 1 0 1 …

All the zeroes (non-unique values) can be stored using far fewer bits than originally. In this example the second 6 can be stored using 2 bits, because only 4 unique values precede it, and so on…
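The example above can be traced with a short sketch. It assumes that "unique" means "first occurrence in the block" and that each repeat is stored as an index into the list of values seen so far; the name `encode_duplicates` is made up for this illustration.

```python
def encode_duplicates(values):
    """Split a block into a first-occurrence bitmap and compact back-references."""
    seen = []    # distinct values, in order of first appearance
    flags = []   # 1 = first occurrence, 0 = repeat of an earlier value
    refs = []    # (index into `seen`, bit width) for every repeat
    for v in values:
        if v in seen:
            flags.append(0)
            # Enough bits to address every distinct value seen so far.
            width = (len(seen) - 1).bit_length()
            refs.append((seen.index(v), width))
        else:
            flags.append(1)
            seen.append(v)
    return flags, refs


flags, refs = encode_duplicates([4, 6, 10, 32, 6, 85, 10, 2])
# flags == [1, 1, 1, 1, 0, 1, 0, 1]
# refs  == [(1, 2), (2, 3)]
```

The second 6 costs 2 bits because four distinct values precede it, and the second 10 costs 3 bits because by then there are five.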

The only thing I have not used is sorting for data storage, but this area is very interesting to me.

One piece of advice: try to implement (program) this before getting too excited about good results on paper 😉

Best regards!

Raimonds

]]>