Random data compression – Using Tree
In this article I will explain how a B Tree can be used to model the unique numbers. This will be interesting article as best case scenario requires only 64bytes.
As you are aware B Tree or a balanced tree contains maximum two child for each node, all nodes left side will be lower than current node and all right side nodes will be higher than current node.
How can we use B Tree to model the random unique numbers?
In case of 256 unique numbers we will have 256 nodes, left most node starts with zero and right most node 255. For each node we need two flags to identify weather it contains left node and right node.
Recursively we can build the tree structure. Once tree structure is ready we can fill each node with value by using pre-order traverse.
Lets take an example, Assume we have random 16 inputs (7, 12, 9, 3, 13, 6, 8, 15, 1, 4, 2, 11, 0, 14, 5, 10)
Respective tree will be

If we knew tree structure we can refill values of each node by starting from left most to rightmost.
However after building tree we need addition info find which child node appears next.
In the above example we are sure that 7 appeared first in the input array. Next number either 3 or 12 (one of the open child).
We need one bit to select either 3 or 12(That bit will be “1″). we end up with 12 as next input. Now as next input should be within 3, 9, 13. We can go on applying one bit dif encoding to select proper child.
Whenever a child is selected its children are added to available options for selecting next node.
This way we need 256bits(for node flag) + one bit dif encoding info of each child.
Best case and worst case scenario
Using tree in best case scenario i.e each node has maximum only one node, that happens when input is in ascending order we need just 2^(n+1) bits i.e for 256 input we need 512bits or 64 bytes. JUST 64 byte for 256 unique values!!!.
In worst case scenario we need 64byte + bit diff info of worst case = 2047bits (i.e ~256 bytes) No saving.
Worst case scenario happens when child node added to available options is consumed at the end.
Using this technique by rearranging million random digit I could compress up to 27bytes non unique numbers. Worst than merge sort. This technique will be effective provided we find a way to increase the level of tree which results into nodes with single child more. I will write more about this later.










You claim worst case is 2047 bits = 255.875 bytes < 256 bytes! If you can ever come up with something that saves even just 1 bit every time, you've solved the problem.
Oh, nevermind. Forgot these are unique elements, not random.
The structure described is a binary tree, not B-Tree!
Traverse Keshav…
[...] find a way to increase the level of tree which results into nodes with single c [...]…
Leave your response!
About
Check my other blog at:
Subscribe
Get this Wordpress newsletter widget
for newsletter software
User menu
Pages
Categories
Blogroll
Recent Posts
Recent Comments
Most Commented
Most Viewed