In this article I will explain how a set of unique elements can be represented with fewer bits by filtering elements into two sets, an upper band and a lower band.

A few users suggested changing the title, because the title says “Random data” whereas the article describes unique elements. Let me clarify: the million random digit file contains on average 90-110 bytes of duplicate or non-unique data for every 256 bytes. If we could represent the unique data set in <=202 bytes (worst case), we could use the remaining 54 bytes to represent the duplicates. Refer to the previous article here.

Coming to the filtering technique: in every set of size 2^n, we can form exactly two sets where half the elements belong to the upper part and half belong to the lower part. E.g., in a set of 256 unique values we can filter out a first or upper array containing elements between 0 and 127 and a lower array containing 128 to 255. By parsing the input and remembering with a flag (false for the upper set, true for the lower set) whether each element moved to the upper or lower set, we can locate the original position. Apply this method to each newly created set recursively.

Let's take the same input used in the previous article, of size 16, i.e.: (7, 12, 9, 3, 13, 6, 8, 15, 1, 4, 2, 11, 0, 14, 5, 10)

Elements 0-7 belong to the upper set and 8-15 to the lower set. The upper and lower sets will each have an equal number of elements, (input set size)/2. When an element moves to the upper set we record a false, and when an element moves to the lower set we record a true. Once either the upper or lower set reaches the new set size (i.e. half of the original input size) we can stop parsing further, because the remaining elements are implied.

In the above example: total number of elements = 16, size of each new subset = 8, elements in the first/upper set are 0-7 and elements in the lower set are 8-15. After applying the filter the first time we get the following output.

This method is similar to reverse merge sort; the difference is that here we use a top-to-bottom approach, whereas merge sort works bottom-up.

Using this method we can represent 256 unique elements and save 128 bytes (best case) to 31.75 bytes (worst case), as illustrated below.

The worst case happens when the last element of each set belongs to the other set, and vice versa.

As I said at the beginning, this method is not sufficient to address the million random digit file, where I need to save at least 54 bytes for every 256 bytes.

In the next article I will write about how a unidirectional graph can be used to represent a set of unique elements. Till then, please pass your comments.

]]>As I observed, as the tree reaches its mid levels the available options increase to 64+ elements, which results in a minimum of 6 bits per selection.

With respect to the million random digit file, this approach can accommodate no more than 27 duplicates.

We need a better approach for selecting elements than one bit diff encoding.

I used a modified version of one bit diff encoding; I call it risk based selection.

Let's assume I have the available selections (4, 9, 40, 102, 150, 165, 172, 197, 245) and the actual selection is 245. Using one bit diff encoding I need 3 bits, i.e. “111”, to select 245 among these inputs.

In risk based selection I try to make two lists, with the maximum number of elements in one list and the remaining in the other, based on a particular bit.

If you analyze the image below you will see that the 4th bit of the available options yields seven zeros. If I know that the 4th bit of the number I am selecting is “1”, I am left with only two remaining numbers.

That is, among the available numbers the 4th bit has the maximum variance. (I am aware the 6th bit also has a maximum count of ones and the 7th bit a maximum count of zeros; let's select the first possibility.)

The number I am going to select, 245, has its 4th bit as “1”. If I store this bit, my available options reduce to two, i.e. 150 and 245.

As you can see in the image, the red marked bits are selected for the next round. I repeat the same procedure on the remaining options until I am left with only one option.

The green box indicates the second level of filtering. After the second bit selection I am left with 245, which is the number I am looking for. This way I need only two bits (i.e. “11”) to select this particular node.

This selection is more effective than one bit diff encoding here: 2 bits instead of 3.

Note: in the worst case scenario we may end up with 8 bits for each node. You can see this yourself by applying the method to the example below.

Assume the available options are (1, 2, 4, 8, 16, 32, 64, 128) and the number to be selected is 1. We will end up with 8 bits.
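Risk based selection can be sketched as below. This is an assumption-laden sketch, not the article's exact procedure: the name `riskEncode` is mine, and I assume ties between equally lopsided bits are broken toward the higher bit position; the article's own tie-breaking rule may differ.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Repeatedly pick the bit position with the most lopsided 0/1 split among
// the remaining options, store the target's bit at that position, and keep
// only the options that agree with it, until one option remains.
std::string riskEncode(std::vector<uint8_t> options, uint8_t target) {
    std::string bits;
    while (options.size() > 1) {
        int bestBit = -1;               // distinct options always differ in some bit
        size_t bestMajority = 0;
        for (int b = 7; b >= 0; --b) {
            size_t ones = 0;
            for (uint8_t v : options) ones += (v >> b) & 1;
            size_t zeros = options.size() - ones;
            if (ones == 0 || zeros == 0) continue;   // bit does not discriminate
            size_t majority = ones > zeros ? ones : zeros;
            if (majority > bestMajority) { bestMajority = majority; bestBit = b; }
        }
        bool targetBit = (target >> bestBit) & 1;
        bits += targetBit ? '1' : '0';
        std::vector<uint8_t> kept;
        for (uint8_t v : options)
            if ((((v >> bestBit) & 1) != 0) == targetBit) kept.push_back(v);
        options = kept;
    }
    return bits;
}
```

For the nine options of the worked example with target 245, this sketch produces “11”, matching the two-bit selection described in the text.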

]]>As you are aware, a B Tree (more precisely, a binary search tree) contains at most two children for each node; all nodes on the left side are lower than the current node and all nodes on the right side are higher than the current node.

How can we use a B Tree to model random unique numbers?

In the case of 256 unique numbers we will have 256 nodes; the leftmost node starts with zero and the rightmost node is 255. For each node we need two flags to identify whether it has a left child and a right child.

Recursively we can build the tree structure. Once the tree structure is ready, we can fill each node with a value using an in-order (leftmost to rightmost) traversal.

Let's take an example. Assume we have 16 random inputs: (7, 12, 9, 3, 13, 6, 8, 15, 1, 4, 2, 11, 0, 14, 5, 10)

The resulting tree will be

If we know the tree structure, we can refill the values of each node by starting from the leftmost node and moving to the rightmost.

However, after building the tree we need additional info to find which child node appears next.

In the above example we are sure that 7 appeared first in the input array. The next number is either 3 or 12 (one of the open children).

We need one bit to select either 3 or 12 (that bit will be “1”), and we end up with 12 as the next input. Now the next input must be among 3, 9 and 13. We can go on applying one bit diff encoding to select the proper child.

Whenever a child is selected, its children are added to the available options for selecting the next node.

This way we need 256 bits (for the node flags) + the one bit diff encoding info for each child.
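The structure-building part can be sketched as below. This is a sketch under my own naming (`Node`, `insert`, `emitFlags`, `refill` are not from the article); I assume the two flags per node are emitted in pre-order, while values are refilled leftmost to rightmost as described above.

```cpp
#include <memory>
#include <vector>

struct Node {
    int value = -1;
    std::unique_ptr<Node> left, right;
};

// Insert values in input order using the search-tree rule:
// smaller values go left, larger values go right.
void insert(std::unique_ptr<Node>& n, int v) {
    if (!n) { n = std::make_unique<Node>(); n->value = v; return; }
    if (v < n->value) insert(n->left, v); else insert(n->right, v);
}

// Emit two flags per node (has-left, has-right), visiting nodes in pre-order.
void emitFlags(const Node* n, std::vector<bool>& bits) {
    if (!n) return;
    bits.push_back(n->left != nullptr);
    bits.push_back(n->right != nullptr);
    emitFlags(n->left.get(), bits);
    emitFlags(n->right.get(), bits);
}

// Refill values leftmost to rightmost (in-order) with 0, 1, 2, ...
void refill(Node* n, int& next) {
    if (!n) return;
    refill(n->left.get(), next);
    n->value = next++;
    refill(n->right.get(), next);
}
```

For the 16-input example this emits 32 structure flags (2 per node), and refilling leftmost to rightmost puts 7 back at the root, since seven of the inputs are smaller than 7.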

**Best case and worst case scenario**

Using the tree in the best case scenario, i.e. when each node has at most one child (which happens when the input is in ascending order), we need just the 2 flag bits per node, i.e. 2^(n+1) bits for 2^n inputs; for 256 inputs we need 512 bits or 64 bytes. JUST 64 bytes for 256 unique values!!!

In the worst case scenario we need 64 bytes + the bit diff info of the worst case = 2047 bits (i.e. ~256 bytes). No saving.

The worst case scenario happens when a child node added to the available options is consumed only at the end.

Using this technique, by rearranging the million random digit file I could compress up to 27 bytes of non-unique numbers. Worse than merge sort. This technique will be effective provided we find a way to increase the depth of the tree so that more nodes have a single child. I will write more about this later.

]]>Let's assume we have eight random inputs, say 5, 1, 3, 1, 2, 3, 0, 5.

In this input the 4th, 6th and 8th elements are non-unique, i.e. they have already appeared at least once.

For each number we need an identifier or flag to mark it as unique or duplicate. So we will have bit info like 0 0 0 1 0 1 0 1.

When the 4th element appears, which is a duplicate of one of the past unique values, it must exist within (5, 1, 3).

Similarly, the 6th element must exist within (5, 1, 3, 2), and the same applies to the 8th element, which must exist within (5, 1, 3, 2, 0).

Let's go back to the algorithm mentioned in my previous article.

1. Read 256 bytes from the input (in the above example we read 8 inputs of 3 bits each).

2. Process sequentially and mark uniques and duplicates; that results in 0 0 0 1 0 1 0 1. We need to store this info to be used at the time of decompression. For the above 8 inputs we need 8 bits (for 256 inputs we need 256 bits).

3. For each element marked as non-unique, remember its position using one bit diff encoding.

That is, for the 4th element, which is a duplicate of the 2nd element, one bit diff encoding gives “01”; for the 6th element we get “10”, and for the 8th element we get “000”. The complete table of one bit diff encoding is available here.

4. Remove the non-unique data and rearrange the list by moving elements forward; at the end, fill the empty space with the missing numbers. In the image above, the bottom array is the rearranged one.

Now the array contains only unique elements, so we can apply reverse merge sort to sort it and store the bit info.

So the compressed file contains the unique/duplicate flags generated in step (2) + the one bit diff encoding info of the duplicate numbers generated in step (3) + the merge sort bit info.

While decompressing, use the merge sort bit info to recreate the unique array, use the unique/duplicate flags to note which elements were duplicates, and then use the bit diff info to refill the duplicates.
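Steps (2) and (4) above can be sketched as follows. This is a sketch: `markAndRearrange` is my own name, and the `std::set` is used purely for clarity.

```cpp
#include <set>
#include <vector>

// Step 2: mark each input as unique (0) or duplicate (1).
// Step 4: keep uniques in arrival order, then append the missing values.
// `limit` is the value range (8 in this example, 256 in general).
void markAndRearrange(const std::vector<int>& input, int limit,
                      std::vector<bool>& dupFlag, std::vector<int>& uniques) {
    std::set<int> seen;
    for (int v : input) {
        bool dup = seen.count(v) > 0;
        dupFlag.push_back(dup);            // 1 = duplicate of an earlier value
        if (!dup) { seen.insert(v); uniques.push_back(v); }
    }
    for (int v = 0; v < limit; ++v)        // fill empty space with missing numbers
        if (!seen.count(v)) uniques.push_back(v);
}
```

For the example input 5, 1, 3, 1, 2, 3, 0, 5 this yields the flags 0 0 0 1 0 1 0 1 and the rearranged list 5, 1, 3, 2, 0, 4, 6, 7.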

I hope this example clears many of your doubts.

The next article will be an interesting one, based on B Tree.

]]>Sorry for the long gap between this and the last article. I was too busy.

After my previous article, many asked how I achieved compression of up to 42 bytes of duplicates.

Here we go with the algorithm I used.

1. Read 256 bytes from the input.

2. Process sequentially and mark uniques and duplicates; we need 256 bits or 32 bytes. (We can't avoid this cost.)

3. For each element marked as non-unique, remember its position using one bit diff encoding (details in the previous article). This requires far fewer than 8 bits, depending on where the non-unique data appears. We consume fewer bits if the non-unique data appears at the beginning of the list rather than at the end.

4. Remove the non-unique data and rearrange the list by moving elements forward; at the end, fill the empty space with the missing numbers.

5. Apply merge sort and store the bit information of the sorting. (Refer to the article on how to use merge sort.)

While decompressing

1. Use merge sort data to recreate the unique data list.

2. Apply non unique flags to rearrange the list.

3. Use one bit dif encoding to fill duplicate or non unique numbers.

Using the above technique I could compress random data by up to 42 bytes. I repeat: up to, not always. If the non-unique data appears at the beginning of the list we can accommodate up to 42 bytes, whereas if it appears at the end of the list we can hardly compress 12~14 bytes.

Note: this method is not effective for the Million Random Digit file from RAND. We need a different approach.

]]>As I mentioned earlier, it is hard or impossible (as of now) to compress the million random digit file. As per my analysis, the file contains around 90-110 bytes of duplicates or repeated numbers within every block of 256 bytes. When input data is very pure or near pure (I mean in uniqueness), or when input data is highly polluted or noisy (I mean full of duplicates), it is easy to compress. Maybe we can borrow an idea from chemistry: in the periodic table, lighter elements can be combined using nuclear fusion (stars) and heavier elements can be broken using nuclear fission (uranium). In both cases energy is released, which we can relate to compression. However, the periodic table also shows that elements with highly stable nuclei are hard to break or fuse, e.g. iron.

I can see the same pattern in the million random digit file: when uniqueness reaches an extreme high or extreme low we can easily compress the data.

The compressor I have developed so far can achieve compression when there are fewer than 42 or more than 156 duplicates in every 256 bytes.

Before I go further with my compression theory, I have to write one more article about the one bit difference variable encoder. This is a known algorithm based on probabilities or possibilities.

Regarding bijective mapping, you may have observed it is a one-to-one mapping; however, this fails to address relations within elements of the donor domain set or the acceptor domain set.

One bit variant encoding is based on the number of possibilities: if we have 4 unique possibilities then we need 2 bits (00, 01, 10, 11) to represent all possible values. How many bits do we require when we have 5 possibilities, or say 6?

In one of my earlier articles I mentioned using insertion sort or bubble sort, stating that with this technique every possible combination can be represented in 1792 bits or 224 bytes. I need to refactor this statement: we need far fewer bits.

Assume you have a sorted list of 256 all-unique values. When a random input comes, say 26, you pick it from the initialized list; actually, you pick its position. Since it is the first input and there are 256 possibilities, we need eight bits to represent the position. For the next random input we need to choose among the remaining 255 possibilities; in this case, except for the last position, we need eight bits. From here on it depends on the position we pick from the remaining elements. We can calculate the required bits for the best case (when inputs arrive in sorted order) and the worst case (when inputs arrive in reverse sorted order) as below.

If you refer to the illustration below you will quickly see that as the number of possibilities changes, the required bits change while maintaining the uniqueness of the bit stream.

The cells in yellow are the one bit diff encodings based on the first entry in each row.

Using the above calculation, you can represent every 256 unique bytes using 1545-1793 bits (193.125-224.125 bytes).

In the next article I will explain how I achieved compression with 42 duplicates.

]]>Some readers asked how reverse merge sort (merge unsort) can be used to represent 256 unique values using 128 bytes (best case) or 224.125 bytes (worst case).

As illustrated in the previous article, using merge sort we sort the random input and store one bit per pick, recording which list the smaller number came from. Effectively we are reshuffling the original positions, and the stored bit information represents how each input moved from the original list to the sorted list. Let's take an example of a random array of 16 numbers varying from 0…15.

As shown in the image above, after sorting the positions are reshuffled, and the stored bit information represents exactly where each element moved from.

The encoding code is a recursive function, recursing down to lists of size 2 and merging back up to the final merge of two lists of 128. You can see merge sort in action here.

The merge sort encoding code is given below (C++).

void Transform::mergeSort(unsigned int beginPosition, unsigned int endPosition) {
    unsigned int midPosition = beginPosition + (endPosition - beginPosition) / 2;
    if (midPosition > beginPosition && midPosition < endPosition) {
        // Recursive call until list is of size 2
        mergeSort(beginPosition, midPosition);
        mergeSort(midPosition + 1, endPosition);
    }
    unsigned int elementsInEachList = (endPosition - beginPosition + 1) / 2;
    unsigned char list1[256];
    unsigned char list2[256];
    for (unsigned int i = beginPosition; i < beginPosition + elementsInEachList; i++) {
        list1[i] = transformedByte[i];
    }
    for (unsigned int i = midPosition + 1; i < midPosition + 1 + elementsInEachList; i++) {
        list2[i] = transformedByte[i];
    }
    unsigned int pendingInList1 = elementsInEachList;
    unsigned int list1Pointer = beginPosition;
    unsigned int pendingInList2 = elementsInEachList;
    unsigned int list2Pointer = midPosition + 1;
    unsigned int mainArrayPointer = beginPosition;
    while (pendingInList1 > 0 && pendingInList2 > 0) {
        if (list1[list1Pointer] < list2[list2Pointer]) {
            transformedByte[mainArrayPointer] = list1[list1Pointer];
            mainArrayPointer++;
            list1Pointer++;
            pendingInList1--;
            bitWriter.write(false); // Store bit 0: item picked from first list
        } else {
            transformedByte[mainArrayPointer] = list2[list2Pointer];
            mainArrayPointer++;
            list2Pointer++;
            pendingInList2--;
            bitWriter.write(true); // Store bit 1: item picked from second list
        }
    }
    // Whichever list still has elements needs no further bit information
    while (pendingInList1 > 0) {
        transformedByte[mainArrayPointer] = list1[list1Pointer];
        mainArrayPointer++;
        list1Pointer++;
        pendingInList1--;
    }
    while (pendingInList2 > 0) {
        transformedByte[mainArrayPointer] = list2[list2Pointer];
        mainArrayPointer++;
        list2Pointer++;
        pendingInList2--;
    }
}

As you can see above, when the items in one list run out (either the first or the second list), we don't require sorting bit information for the remaining elements.
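This is also where the best-case and worst-case byte counts come from. The following count (a sketch; `mergeBits` is my own helper) charges m bits per merge of two m-element lists in the best case and 2m-1 bits in the worst case:

```cpp
// Total sorting bits for n inputs (n a power of two): merging two
// m-element lists emits at least m bits and at most 2m-1 bits, because
// the tail of whichever list remains needs no bits.
void mergeBits(int n, int& best, int& worst) {
    best = 0;
    worst = 0;
    for (int m = 1; m < n; m *= 2) {   // list size at each merge level
        int merges = n / (2 * m);      // number of merges at this level
        best += merges * m;
        worst += merges * (2 * m - 1);
    }
}
```

For n = 256 this gives 1024 bits (128 bytes) in the best case and 1793 bits (about 224.125 bytes) in the worst case.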

Please note that in the above code the random input is stored in the transformedByte variable, which is initialized in a separate method for each 256-byte input. The object bitWriter is of a separate class, BitWriter (a utility class), which is initialized in the constructor. BitWriter accumulates bit information and writes to the output file whenever the accumulated bits cross 1 byte.

The merge sort decoding code is given below (C++).

void Transform::reverseMergeSort(unsigned int beginPosition, unsigned int endPosition) {
    unsigned int midPosition = beginPosition + (endPosition - beginPosition) / 2;
    if (midPosition > beginPosition && midPosition < endPosition) {
        // Recursive call until list is of size 2
        reverseMergeSort(beginPosition, midPosition);
        reverseMergeSort(midPosition + 1, endPosition);
    }
    unsigned int elementsInEachList = (endPosition - beginPosition + 1) / 2;
    unsigned char list1[256];
    unsigned char list2[256];
    for (unsigned int i = beginPosition; i < beginPosition + elementsInEachList; i++) {
        list1[i] = inputPositionArray[i];
    }
    for (unsigned int i = midPosition + 1; i < midPosition + 1 + elementsInEachList; i++) {
        list2[i] = inputPositionArray[i];
    }
    unsigned int pendingInList1 = elementsInEachList;
    unsigned int list1Pointer = beginPosition;
    unsigned int pendingInList2 = elementsInEachList;
    unsigned int list2Pointer = midPosition + 1;
    unsigned int mainArrayPointer = beginPosition;
    while (pendingInList1 > 0 && pendingInList2 > 0) {
        bool nextBit = bitReader.genNextBit(); // Read bit information stored while encoding
        if (nextBit == false) {
            inputPositionArray[mainArrayPointer] = list1[list1Pointer];
            mainArrayPointer++;
            list1Pointer++;
            pendingInList1--;
        } else {
            inputPositionArray[mainArrayPointer] = list2[list2Pointer];
            mainArrayPointer++;
            list2Pointer++;
            pendingInList2--;
        }
    }
    // Whichever list still has elements was stored without bit information
    while (pendingInList1 > 0) {
        inputPositionArray[mainArrayPointer] = list1[list1Pointer];
        mainArrayPointer++;
        list1Pointer++;
        pendingInList1--;
    }
    while (pendingInList2 > 0) {
        inputPositionArray[mainArrayPointer] = list2[list2Pointer];
        mainArrayPointer++;
        list2Pointer++;
        pendingInList2--;
    }
}

Before calling reverseMergeSort, inputPositionArray is initialized with 0, 1, 2, … as the default initial positions. The object bitReader is of a separate class, BitReader (a utility class), which is initialized in the constructor and provides an API to read one bit at a time (not a byte); internally it reads a byte and serves one bit at a time. These utility classes were developed by me exclusively for the above purpose.

After exiting reverseMergeSort, inputPositionArray does not hold the original input; instead it holds the positions of the inputs in the original data. We have to go through one more loop to construct the original input, as below.

for (int i = 0; i < 256; i++) {
    inputData[inputPositionArray[i]] = i;
}

Now inputData contains original data.

In the next article I will explain how I compressed random data with a maximum of 42 duplicates (non-unique values) for every 256 input bytes.

]]>An oxymoron is usually defined as a phrase in which two words of contradictory meaning are brought together:

Read till end

1) Clearly misunderstood

2) Exact Estimate

3) Small Crowd

4) Act Naturally

5) Found Missing

6) Fully Empty

7) Pretty ugly

8) Seriously funny

9) Only choice

10) Original copies

And the Mother of all……

.

.

.

.

.

.

11) Happily Married

]]>Visit Bangalore city traffic police

Check your vehicle traffic violation fines here “Search for Traffic Violations”

Understand and follow all traffic sign in Bangalore and India – Traffic sign

Various spot fine – List of Traffic offences, section of law and fine amount

Traffic Provisions – Traffic Do’s & Don’ts

Traffic Road Markings – Traffic Do’s & Don’ts

Traffic Rules and Regulations – Traffic Do’s & Don’ts

Traffic Lights – Traffic Do’s & Don’ts

Traffic Advice – Traffic Do’s & Don’ts

Tips on Good Driving – Traffic Do’s & Don’ts

]]>Step (8) is my own finding.

You can download pdf version here

Introduction

The Rubik’s Cube (also spelled Rubick’s Cube, or Rubix Cube) is one of the most puzzling toys of all time. It ranks as one of the most cherished 80s icons, and few people have ever claimed to solve it on their own. In fact, most solutions have come from mathematicians using group theory. But it is solvable. Trial and error will get you nowhere, so below is a 7 step process that will get you to sweet, sweet resolution.

Using this method, even an idiot can solve a cube in seven basic steps. Note the diagram below. Each face of the cube is assigned a letter. During the following steps, specific faces require a sequence of twists (or quarter turns). The letter “i” means inverse, or counter-clockwise twists. For example, in the sequence Ri U Fi Ui, you would rotate the Right face counterclockwise a quarter turn, the Upper face clockwise a quarter turn, the Front face counterclockwise a quarter turn, and the Upper face counterclockwise a quarter turn. Before you start each move, make sure your thumbs are on the F side of the cube to ensure consistent orientation for all the sequences. To turn a face in the right direction, imagine that you are facing that side of the cube. If you mess up along the way, just restart from Step 1. Let’s begin.

STEP 1: Solve the Upper Green Cross

To solve the green cross, you have to solve the green edge pieces on their own. This should be easy to figure out on your own. Should you ever have an edge piece in the correct place but flipped the wrong way, use this step to flip it without affecting the other three green edges: hold the cube with the piece in the upper-right position as in the diagram, and do the sequence Ri U Fi Ui. The edge piece should now be solved, and you can work the next edge piece until the cube looks like the right diagram below.

STEP 2: Solve the Green Corners

Find the corner piece in the bottom layer that belongs on top. Turn the bottom layer until the piece is directly below its home in the top layer. Hold the cube with the piece on the lower-front-right and its home at the upper-front-right, and then do the sequence Ri Di R D 1, 3, or 5 times until that corner is solved. If you find a corner piece that’s already in the top layer but in the wrong spot or flipped the wrong way, hold the cube with the piece in the upper-front-right position and do Ri Di R D once. Now the piece is on the bottom and ready to be solved using the Ri Di R D sequence.

STEP 3: Solve the Middle Layer Edges

Flip the cube so green is on the bottom. Find the yellow-red edge piece. If it’s on top, turn it so it matches one of the diagrams below, then do the corresponding sequence to solve it. If the red-yellow edge is somewhere in the middle layer but in the wrong place or flipped the wrong way, hold the cube so the edge is in the front-right position and do either sequence once: U R Ui Ri Ui Fi U F or Ui Fi U F U R Ui Ri (this may require that you rotate the cube to a new face). After the move, the piece is in the top layer and you can solve it as described above. Repeat for the other 3 middle-layer edges.

STEP 4: Solve the Upper Blue Cross

Turn the top layer until the edges match one of the diagrams. Repeat the following sequence as many times as it takes to get a blue cross: F R U Ri Ui Fi.

STEP 5: Solve the Top Edges

Hold the cube with red in front. Turn the top layer until the red and blue edge piece is solved as in the diagram, and then repeat the following sequence until the yellow and blue edge piece is also solved on the right side: R U Ri U R U U Ri. Now turn the whole cube so that white is the front face. If the top white edge isn’t solved, do the sequence again followed by an extra U.

STEP 6: Solve the Top Corners

Find a corner piece that’s in the right place and hold the cube with that piece above your right thumb. Don’t turn the top layer at all, as it will mess up all the effort from Step 5. Do the following sequence once or twice to put the rest of the corners into place: U R Ui Li U Ri Ui L. If you can’t find a corner piece in the right place, just do the sequence once before you start.

STEP 7: Solve the Top Corners (Again)

You’re almost done, you beautiful mind, you. Hold the cube with the red in front. Keep turning the top layer until the upper-front-right corner needs to be flipped to have blue on top, like in the diagram. Do the sequence Ri Di R D 2 or 4 times to get blue into position. The cube will get scrambled in the process, but don’t worry. With red still in front, keep turning the top layer and repeating the sequence to flip the upper-right corner pieces.

STEP 8: Solve the Top Corners (Again – Special case)

Try this if you end up with two side corner pieces oriented in the right direction but in the wrong positions. Check the ‘before’ image.

]]>