About storing the position, I mentioned:

“Read 256 block of data, mark the duplicates, remember these relative to current position of the number (so we don’t require 8bits for position)”

It is the same thing you are referring to, i.e. if the third item is a duplicate, then its value is present in the earlier two numbers, so we need just 1 bit to store the position.
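To make the bit count concrete, here is a minimal sketch of how many bits a relative position needs. The function name `position_bits` and the 0-based indexing are assumptions made for illustration, not part of the working copy mentioned in this comment.

```python
def position_bits(index: int) -> int:
    """Bits needed to point at one of the `index` earlier items.

    `index` is the 0-based position of the duplicate inside the block,
    so there are exactly `index` earlier candidates to choose from.
    """
    if index <= 1:
        return 0  # zero or one earlier candidate: nothing to choose
    # Smallest width that can hold the values 0 .. index-1.
    return (index - 1).bit_length()


if __name__ == "__main__":
    # Third item (index 2) has two earlier candidates.
    print(position_bits(2))
    # Last item of a 256-number block (index 255).
    print(position_bits(255))
```

Only the very last items of a 256-number block get close to the full 8 bits; earlier duplicates are cheaper.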

About the implementation: I already have a working copy that can compress blocks with up to 42 duplicates. I want to fine-tune it further and announce it on this blog next week.

Thanks & regards

Keshav K Shetty

Regarding: “How to use above theory when all numbers are not unique?”

There are more effective ways to store the information about which positions hold unique and non-unique numbers.

You can use combinatorics (I use the combinadic; Wikipedia has a nice article about it).
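For readers who have not met the combinadic (combinatorial number system), here is a minimal sketch of ranking and unranking a set of positions. The names `rank` and `unrank` are chosen for illustration, and this is not necessarily how my own code does it. The idea is that a set of k positions out of n maps to a single integer below C(n, k), which is then stored instead of an n-bit mask.

```python
from math import comb


def rank(positions):
    """Map a set of distinct 0-based positions to its combinadic index."""
    # With c_1 < c_2 < ... < c_k, the index is C(c_1, 1) + ... + C(c_k, k).
    return sum(comb(c, i) for i, c in enumerate(sorted(positions), start=1))


def unrank(r, k):
    """Inverse of rank: recover the k positions from the index r."""
    positions = []
    for i in range(k, 0, -1):
        # Greedily take the largest c with C(c, i) <= r.
        c = i - 1
        while comb(c + 1, i) <= r:
            c += 1
        positions.append(c)
        r -= comb(c, i)
    return positions[::-1]


if __name__ == "__main__":
    index = rank([4, 6])      # two marked positions among the first 8
    print(index, unrank(index, 2))
```

The linear search in `unrank` is kept simple on purpose; a real implementation would probably use a table or a binary search.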

Let’s say there are 100 non-unique and 156 unique values: storing which positions are which will take fewer than 256 bits, but the count of unique numbers must also be stored, which takes 7 or 8 bits.

As for storing the repeated values themselves: represent each non-unique number by the position of its matching value among the unique values.
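The size of that saving is easy to check with a few lines. This is a rough sketch, and `mask_bits` is a name made up for the example: it counts the bits needed to store one index out of the C(n, k) possible ways to place k non-unique positions among n.

```python
from math import comb


def mask_bits(n: int, k: int) -> int:
    """Bits needed to store one index in the range 0 .. C(n, k) - 1."""
    return (comb(n, k) - 1).bit_length()


if __name__ == "__main__":
    n = 256
    for k in (10, 42, 100, 128):
        # Compare against the plain n-bit mask (one bit per position).
        print(k, mask_bits(n, k), "bits instead of", n)
```

The saving is modest when the split is close to even and grows as the split gets more lopsided, so it is worth printing the numbers before counting on them.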

Let’s say we have this sequence (values 0-255):

4 6 10 32 6 85 10 2 …

We know which numbers are unique; to represent this I’ll use a bit sequence in which 1 marks a unique number:

1 1 1 1 0 1 0 1 …

All the zeroes (non-unique numbers) can be stored using far fewer bits than the original 8. In this example the second 6 can be stored using 2 bits, because only 4 unique values come before it, and so on…

The only thing I did not use is sorting when storing the data, but that area is very interesting to me.

My only advice: try to implement (program) this before getting too excited about good results on paper 😉
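A minimal sketch of this scheme on the example sequence is below. The function name `encode` and the output layout (flags, unique values, and (index, width) pairs) are assumptions made for illustration; packing into an actual bit stream is left out.

```python
def encode(values):
    """Split a block into unique flags, unique values and duplicate references.

    Each duplicate is stored as (index, width): the index of its value among
    the unique values seen so far, and the number of bits that index needs.
    """
    seen = {}                      # value -> index in order of first appearance
    flags, uniques, refs = [], [], []
    for v in values:
        if v in seen:
            flags.append(0)
            # Enough bits to address any of the len(seen) unique values so far.
            width = (len(seen) - 1).bit_length()
            refs.append((seen[v], width))
        else:
            flags.append(1)
            seen[v] = len(seen)
            uniques.append(v)
    return flags, uniques, refs


if __name__ == "__main__":
    flags, uniques, refs = encode([4, 6, 10, 32, 6, 85, 10, 2])
    print(flags)    # the 1/0 sequence shown above
    print(uniques)  # values still stored in full
    print(refs)     # (index among uniques so far, bit width) per duplicate
```

The width grows slowly as more unique values appear, so duplicates early in the block are the cheapest.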

Best regards!

Raimonds
