Home » Data compression, Featured, Headline

Random data compression – Using Tree

3 October 2010 by Keshav Shetty 10,374 views 6 Comments

In this article I will explain how a B Tree can be used to model the unique numbers. This will be interesting article as best case scenario requires only 64bytes.

As you are aware B Tree or a balanced tree contains maximum two child for each node, all nodes left side will be lower than current node and all right side nodes will be higher than current node.

How can we use B Tree to model the random unique numbers?

In case of 256 unique numbers we will have 256 nodes, left most node starts with zero and right most node 255. For each node we need two flags to identify weather it contains left node and right node.

Recursively we can build the tree structure. Once tree structure is ready we can fill each node with value by using pre-order traverse.

Lets take an example, Assume we have random 16 inputs (7, 12, 9, 3, 13, 6, 8, 15, 1, 4, 2, 11, 0, 14, 5, 10)

Respective tree will be


If we knew tree structure we can refill values of each node by starting from left most to rightmost.

However after building tree we need addition info find which child node appears next.

In the above example we are sure that 7 appeared first in the input array. Next number either 3 or 12 (one of the open child).

We need one bit to select either 3 or 12(That bit will be “1”). we end up with 12 as next input. Now as next input should be within 3, 9, 13. We can go on applying one bit dif encoding to select proper child.

Whenever a child is selected its children are added to available options for selecting next node.

This way we need 256bits(for node flag) + one bit dif encoding info of each child.

Best case and worst case scenario

Using tree in best case scenario i.e each node has maximum only one node, that happens when input is in ascending order we need just 2^(n+1) bits i.e for 256 input we need 512bits or 64 bytes. JUST 64 byte for 256 unique values!!!.

In worst case scenario we need 64byte + bit diff info of worst case = 2047bits (i.e ~256 bytes) No saving.

Worst case scenario happens when child node added to available options is consumed at the end.

Using this technique by rearranging million random digit I could compress up to 27bytes non unique numbers. Worst than merge sort. This technique will be effective provided we find a way to increase the level of tree which results into nodes with single child more. I will write more about this later.

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)


  • Jeff said:

    You claim worst case is 2047 bits = 255.875 bytes < 256 bytes! If you can ever come up with something that saves even just 1 bit every time, you've solved the problem.

  • Jeff said:

    Oh, nevermind. Forgot these are unique elements, not random.

  • R said:

    The structure described is a binary tree, not B-Tree!

  • Traverse Blog said:

    Traverse Keshav…

    […] find a way to increase the level of tree which results into nodes with single c […]…

  • Ibrahim said:

    Is a binary tree, please correct it is not btree.


  • Keshav Shetty (author) said:

    Yes, It is not btree, we can use balanced tree, I will amend the contents.

Leave your response!

Add your comment below, or trackback from your own site. You can also subscribe to these comments via RSS.

Be nice. Keep it clean. Stay on topic. No spam.

You can use these tags:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

This is a Gravatar-enabled weblog. To get your own globally-recognized-avatar, please register at Gravatar.