Data compression, Featured, Headline »

[9 Jan 2011 by Keshav Shetty | 3 Comments | 5,597 views]
Random data compression – Lower band & upper band filtering

Wish you all a happy new year 2011.
In this article I will explain how a set of unique elements can be represented with fewer bits by filtering the elements into two sets: an upper band and a lower band.

A few users suggested changing the title, because the title says “Random data” whereas the article describes unique elements. Let me clarify: the million random digits file contains on average 90-110 duplicate (non-unique) bytes for every 256 bytes. If we could represent the unique data set in <=202 bytes (worst case), we can use the remaining 54 bytes to …
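As a rough illustration of that statistic (this is my sketch, not code from the article; the name count_duplicates is invented for the example), the following C snippet counts the duplicate bytes in each 256-byte block of a file. For uniformly random bytes the expected count is roughly 256/e ≈ 94 duplicates per block, consistent with the 90-110 range quoted above.

    #include <stdio.h>

    /* Count how many of the 256 bytes in one block are duplicates,
       i.e. values that already appeared earlier in the same block. */
    static int count_duplicates(const unsigned char block[256])
    {
        int seen[256] = {0};
        int dups = 0;
        for (int i = 0; i < 256; i++) {
            if (seen[block[i]])
                dups++;             /* value seen before: duplicate */
            else
                seen[block[i]] = 1; /* first occurrence: unique */
        }
        return dups;
    }

    int main(void)
    {
        unsigned char block[256];
        while (fread(block, 1, sizeof block, stdin) == sizeof block)
            printf("%d duplicates in this block\n", count_duplicates(block));
        return 0;
    }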

Data compression, Featured, Headline »

[3 Oct 2010 by Keshav Shetty | 2 Comments | 5,286 views]
Random data compression – Risk based selection

In my previous article I mentioned that once the tree structure is ready, we can fill it and regenerate the random input along with one-bit diff encoding.
I observed that as the tree reaches its middle levels, the number of available options grows to 64+ elements, which requires a minimum of 6 bits per selection.
With respect to the million random digits file, this approach cannot accommodate more than 27 duplicates.
We need a better approach to selecting elements than one-bit diff encoding.
I used a modified version of one-bit diff encoding, which I call …
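The 6-bit figure is just the cost of a fixed-length code: choosing one of n candidates takes ceil(log2 n) bits, so 64 or more candidates cost at least 6 bits each. A tiny C sketch of that calculation (illustrative only; selection_bits is a name I made up):

    #include <stdio.h>

    /* Bits needed to pick one element out of n candidates
       with a plain fixed-length code: ceil(log2(n)). */
    static int selection_bits(int n)
    {
        int bits = 0;
        while ((1 << bits) < n)
            bits++;
        return bits;
    }

    int main(void)
    {
        printf("%d\n", selection_bits(64));  /* prints 6 */
        printf("%d\n", selection_bits(100)); /* prints 7 */
        return 0;
    }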

Data compression, Featured, Headline »

[3 Oct 2010 by Keshav Shetty | 6 Comments | 9,925 views]
Random data compression – Using Tree

In this article I will explain how a binary search tree can be used to model the unique numbers. This will be an interesting article, as the best case scenario requires only 64 bytes.
As you are aware, a binary search tree contains at most two children per node; all nodes on the left side are lower than the current node, and all nodes on the right side are higher.
How can we use such a tree to model the random unique numbers?
In the case of 256 unique numbers we will have 256 nodes, the leftmost …
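For readers who want to experiment, here is a minimal C sketch of the insert operation for such a tree (my own illustration based on the description above, not the article's code; the struct and function names are invented, and error handling is omitted):

    #include <stdlib.h>

    /* One node of the binary search tree modelling the unique values. */
    struct node {
        unsigned char value;
        struct node *left, *right;
    };

    /* Insert a value: smaller values go left, larger go right;
       equal values are ignored because the set holds unique numbers.
       Returns the (possibly new) subtree root. */
    static struct node *insert(struct node *root, unsigned char value)
    {
        if (root == NULL) {
            struct node *n = malloc(sizeof *n);
            n->value = value;
            n->left = n->right = NULL;
            return n;
        }
        if (value < root->value)
            root->left = insert(root->left, value);
        else if (value > root->value)
            root->right = insert(root->right, value);
        return root;
    }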

Data compression, Featured, Headline »

[2 Oct 2010 by Keshav Shetty | 4 Comments | 5,583 views]
Random data compression – diff encoding with merge sort

The last article didn’t go well with my readers, so I decided to elaborate with an example.
Let’s assume we have eight random inputs, say 5, 1, 3, 1, 2, 3, 0, 5.
In this input the 4th, 6th and 8th elements are non-unique, i.e. they have already appeared at least once.
For each number we need an identifier or flag to mark it as unique or duplicate, so we will have bit info like 0 0 0 1 0 1 0 1.
When the 4th element appears, which is a duplicate of one of the past unique values, …
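The flag sequence above is easy to reproduce. Here is a short C sketch (my own, for illustration) that prints the unique/duplicate bits for the example input:

    #include <stdio.h>

    int main(void)
    {
        int input[8] = {5, 1, 3, 1, 2, 3, 0, 5};
        int seen[256] = {0};

        /* One flag bit per element: 0 = unique so far, 1 = duplicate. */
        for (int i = 0; i < 8; i++) {
            printf("%d ", seen[input[i]] ? 1 : 0);
            seen[input[i]] = 1;
        }
        printf("\n"); /* prints: 0 0 0 1 0 1 0 1 */
        return 0;
    }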

Data compression, Featured, Headline »

[1 Oct 2010 by Keshav Shetty | 8 Comments | 5,076 views]
Random data compression – One bit diff encoding continued

This is a continuation of my previous article on Random data compression – One bit diff encoding.
Sorry for the long gap between this and the last article; I was too busy.
After my previous article, many readers asked how I achieved compression with up to 42 duplicate bytes.
Here is the algorithm I used (a sketch of the first two steps follows the list):
1. Read 256 bytes from the input.
2. Process the bytes sequentially and mark each as unique or duplicate; this needs 256 bits, or 32 bytes. (Can’t avoid this overhead.)
3. For each byte marked as non-unique, remember its position using one-bit diff encoding. (Details …
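As noted above, here is a rough C sketch of steps 1 and 2 only (step 3, the one-bit diff encoding of positions, is the subject of the rest of the article). This is my illustration, not the author's code, and the LSB-first bit ordering in the flag map is an arbitrary choice:

    #include <stdio.h>

    int main(void)
    {
        unsigned char block[256];
        unsigned char flags[32] = {0}; /* 256 bits, one per input byte */
        int seen[256] = {0};

        /* Step 1: read a 256-byte block from the input. */
        if (fread(block, 1, sizeof block, stdin) != sizeof block)
            return 1;

        /* Step 2: mark each byte unique (0) or duplicate (1),
           storing the flags LSB-first in a 32-byte map. */
        for (int i = 0; i < 256; i++) {
            if (seen[block[i]])
                flags[i / 8] |= (unsigned char)(1 << (i % 8));
            seen[block[i]] = 1;
        }

        fwrite(flags, 1, sizeof flags, stdout);
        return 0;
    }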