## Articles tagged with: lossless data compression

### Data compression, Featured, Headline »

Wishing you all a happy new year 2011.

In this article I will explain how a set of unique elements can be represented with fewer bits by filtering the elements into two sets: an upper and a lower band.

A few users suggested changing the title, because the title says “Random data” whereas the article describes unique elements. Let me clarify: the million random digit file contains on average 90–110 duplicate (non-unique) bytes for every 256 bytes. If we could represent the unique data set in <=202 bytes (worst case), we could use the remaining 54 bytes to …

### Data compression, Featured, Headline »

In my previous article I mentioned that once the tree structure is ready, we can fill it and regenerate the random input along with one bit diff encoding.

As I observed, once the tree reaches its middle levels the number of available options grows to 64+ elements, which results in a minimum of 6 bits per selection.

With respect to the million random digit file, this approach can accommodate no more than 27 duplicates.

We need a better approach to selecting elements than one bit diff encoding.

I used a modified version of one bit diff encoding, which I call …

### Data compression, Featured, Headline »

In this article I will explain how a B Tree can be used to model the unique numbers. This will be an interesting article, as the best case scenario requires only 64 bytes.

As you are aware, a B Tree, or balanced tree, contains a maximum of two children for each node; all nodes on the left side are lower than the current node, and all nodes on the right side are higher than the current node.

How can we use a B Tree to model the random unique numbers?

In the case of 256 unique numbers we will have 256 nodes; the left most …
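The insert-left-if-lower, right-if-higher rule described above behaves like a binary search tree. A minimal sketch of that structure (the dict-based representation and function names are my own illustration, not from the article):

```python
def bst_insert(tree, value):
    """Insert a value into a binary search tree represented as nested
    dicts: {'v': value, 'l': left_subtree, 'r': right_subtree}."""
    if tree is None:
        return {'v': value, 'l': None, 'r': None}
    if value < tree['v']:
        tree['l'] = bst_insert(tree['l'], value)  # lower values go left
    else:
        tree['r'] = bst_insert(tree['r'], value)  # higher values go right
    return tree

def inorder(tree):
    """In-order traversal recovers the values in sorted order."""
    if tree is None:
        return []
    return inorder(tree['l']) + [tree['v']] + inorder(tree['r'])
```

Inserting 256 unique byte values this way yields 256 nodes, and an in-order walk reproduces them in sorted order.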

### Data compression, Featured, Headline »

The last article didn’t go well with my readers, so I decided to elaborate with an example.

Let’s assume we have eight random inputs, say 5, 1, 3, 1, 2, 3, 0, 5.

In this input the 4th, 6th and 8th elements are non-unique, i.e. they have already appeared at least once.

For each number we need an identifier or flag to mark it as unique or duplicate, so we will have bit info like 0 0 0 1 0 1 0 1.

When the 4th element appears, which is a duplicate of one of the past unique values, …
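The flagging pass for this example can be sketched in a few lines (the function name is my own; only the input and the flag output come from the article):

```python
def mark_duplicates(block):
    """Flag each value: 0 = first occurrence (unique), 1 = duplicate."""
    seen = set()
    flags = []
    for value in block:
        flags.append(1 if value in seen else 0)
        seen.add(value)
    return flags

print(mark_duplicates([5, 1, 3, 1, 2, 3, 0, 5]))  # [0, 0, 0, 1, 0, 1, 0, 1]
```

The output matches the bit info above: the 4th, 6th and 8th positions are flagged as duplicates.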

### Data compression, Featured, Headline »

This is a continuation of my previous article on Random data compression – One bit dif encoding.

Sorry for the long gap between this and the last article; I was too busy.

After my previous article, many asked how I achieved compression with up to 42 bytes of duplicates.

Here is the algorithm I used.

1. Read 256 bytes from the input.

2. Process sequentially and mark unique and duplicate values; this needs 256 bits, or 32 bytes. (Can’t avoid this loss)

3. For each value marked as non-unique, remember the position using one bit diff encoding. (Details …
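Steps 1–2 can be sketched as follows; step 3 (the one bit diff encoding) is elided in this excerpt, so only the 32-byte flag bitmap is shown. Function names and the bit ordering are my own assumptions:

```python
def flag_bitmap(block):
    """Steps 1-2: mark each of the 256 input bytes as unique (0) or
    duplicate (1) and pack the flags into a 32-byte bitmap."""
    assert len(block) == 256
    seen = set()
    bitmap = bytearray(32)  # 256 bits
    for i, value in enumerate(block):
        if value in seen:
            bitmap[i // 8] |= 1 << (7 - i % 8)  # set bit i for a duplicate
        seen.add(value)
    return bytes(bitmap)
```

A block of 256 distinct values yields an all-zero bitmap; a block repeating one value sets every bit after the first.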

### Data compression, Headline »

This is a continuation of my previous article on Random data compression – How to use merge sort?

As I mentioned earlier, it is hard or impossible (as of now) to compress the million random digit file. As per my analysis, the file contains around 90–110 duplicate or repeated bytes within every block of 256 bytes. When the input data is very pure or nearly pure (I mean unique), or when it is highly polluted or noisy (I mean full of duplicates), it is easy to compress. Maybe we can borrow the idea …

### Data compression, Headline »

This is a continuation of my previous article on Random data compression possibilities.

Some readers asked how reverse merge sort (merge unsort) can be used to represent 256 unique values using 128 bytes (best case) or 224.25 bytes (worst case).

As illustrated in the previous article, using merge sort we sort the random input and store a bit indicating the list from which we picked the smaller number. Effectively we are reshuffling the original positions, and the stored bit information records how each input moved from its place in the original list to its place in the sorted list. Let’s take an example …
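The merge step with pick-bit recording can be sketched as below. This is a minimal illustration of the idea under my own naming; the article’s exact bit convention may differ:

```python
def merge_with_bits(left, right):
    """Merge two sorted runs, emitting one bit per pick:
    0 = taken from the left run, 1 = taken from the right run."""
    merged, bits = [], []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); bits.append(0); i += 1
        else:
            merged.append(right[j]); bits.append(1); j += 1
    # leftover elements need no bits: the other run is exhausted
    merged += left[i:] + right[j:]
    return merged, bits
```

For example, merging [1, 4] and [2, 3] yields [1, 2, 3, 4] with pick bits [0, 1, 1]; replaying those bits against the sorted output is what lets the merge be “unsorted” back to the original interleaving.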

### Data compression, Headline »

This is a continuation of my previous article on Random data compression possibilities.

Compression of unique values

As I indicated in the previous article, if within every 256 bytes each value appears only once, then we can achieve compression using any of the techniques below.

1. Using insertion sort – by remembering the positions. Using this technique, for every 256 bytes we can save exactly 32 bytes: for the first element there is no need to remember any position, and for the next value one bit is sufficient to remember whether the current element was inserted after or before the previous …
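The position-remembering idea can be sketched as follows. The k-th insertion has k+1 possible slots, so it costs ceil(log2(k+1)) bits; summed over 256 elements that is 1793 bits, about 224 bytes, which is consistent with the ~32 bytes of savings claimed above. The function names are my own illustration:

```python
import bisect
import math

def insertion_positions(block):
    """For each element, record the index at which it is inserted
    into the growing sorted list."""
    sorted_list, positions = [], []
    for value in block:
        pos = bisect.bisect_left(sorted_list, value)
        positions.append(pos)
        sorted_list.insert(pos, value)
    return positions

def bits_needed(n):
    """Inserting among k existing elements offers k+1 slots,
    so the k-th insertion costs ceil(log2(k+1)) bits."""
    return sum(math.ceil(math.log2(k + 1)) for k in range(n))
```

The first element costs 0 bits and the second exactly 1 bit (before or after), matching the description above.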

### Data compression, Featured, Headline »

Random data compression lossless – Is it possible?

The answer is NO (…. and yes!!!)

This is a little lengthy blog post to show and prove the difficulties and possible ways forward. The article is divided into three sections.

(Before you read further, I suggest you read Mark Nelson’s “The Million Random Digit Challenge”.)

1. Why it is not possible to compress random data (a brief introduction to Kolmogorov theory).

2. Why it is possible (an introduction to quantum theory, matter and anti-matter).

3. The future of compression and a possible solution with transition representation and unsorting techniques.

If you want to skip …