<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adityon &#187; Data compression</title>
	<atom:link href="http://blog.adityon.com/category/data-compression/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.adityon.com</link>
	<description>Simpler solution for complex problem. Think different - Keshav Shetty</description>
	<lastBuildDate>Sat, 04 Jun 2011 17:32:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Random data compression &#8211; Lowerband &amp; upper band filtering</title>
		<link>http://blog.adityon.com/2011/01/random-data-compression-lowerband-upper-band-filtering/</link>
		<comments>http://blog.adityon.com/2011/01/random-data-compression-lowerband-upper-band-filtering/#comments</comments>
		<pubDate>Sun, 09 Jan 2011 15:12:25 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[comp.compression]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[magic compressor]]></category>
		<category><![CDATA[million random digit]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2011/01/random-data-compression-lowerband-upper-band-filtering/</guid>
		<description><![CDATA[Wishing you all a happy new year 2011.
In this article I will explain how a set of unique elements can be represented with fewer bits by filtering the elements into two sets, an upper band and a lower band.



A few readers suggested changing the title, because the title says &#8220;Random data&#8221; whereas the article deals with unique elements. Let me clarify that the million random digit file contains on average 90-110 bytes of duplicates (non-unique values) in every 256 bytes. If we could represent the unique data set in &#60;=202 bytes (worst case), we can use the remaining 54 bytes to ...]]></description>
			<content:encoded><![CDATA[<p>Wishing you all a happy new year 2011.</p>
<p>In this article I will explain how a set of unique elements can be represented with fewer bits by filtering the elements into two sets, an upper band and a lower band.</p>
<table cellpadding="1" width="573" style="WIDTH: 573px; HEIGHT: 116px" border="1" cellspacing="1">
<tbody>
<tr>
<td>A few readers suggested changing the title, because the title says &#8220;Random data&#8221; whereas the article deals with unique elements. Let me clarify that the million random digit file contains on average 90-110 bytes of duplicates (non-unique values) in every 256 bytes. If we could represent the unique data set in &lt;=202 bytes (worst case), we could use the remaining 54 bytes to represent the duplicates. Refer to the previous article <a href="http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/" target="_blank">here</a>.</td>
</tr>
</tbody>
</table>
<p>Coming to the filtering technique: a set of 2^n unique values (0 to 2^n-1) splits into exactly two subsets, with half of the elements belonging to the upper part and half to the lower part. For example, a set of 256 unique values can be filtered into a first (upper) array containing the elements 0 to 127 and a lower array containing 128 to 255. While parsing the input we record a flag for each element (false for the upper set, true for the lower set) saying whether it moved to the upper or the lower set. These flags let us locate the original positions. The same method is then applied recursively to each newly created set.</p>
<p>Let's take the same 16-element input used in the previous article: (7, 12, 9, 3, 13, 6, 8, 15, 1, 4, 2, 11, 0, 14, 5, 10)</p>
<p>Elements 0-7 belong to the upper set and 8-15 belong to the lower set. The upper and lower sets each end up with (input set size)/2 elements. When an element moves to the upper set we record false for its position, and when it moves to the lower set we record true. As soon as either the upper or the lower set reaches its full size (half of the original input size), we can stop parsing, because all remaining elements must belong to the other set.</p>
<p>In the above example the total number of elements is 16 and the size of each new subset is 8. The upper set holds elements 0-7 and the lower set holds elements 8-15. Applying the filter for the first time gives the following output.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2011/01/filter.png" style="BORDER-BOTTOM: #000000 1px solid; BORDER-LEFT: #000000 1px solid; WIDTH: 600px; DISPLAY: inline; HEIGHT: 290px; BORDER-TOP: #000000 1px solid; BORDER-RIGHT: #000000 1px solid" height="290" alt="filter.png" width="600"/></p>
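<p>The filtering step described above can be sketched in C++ as follows. This is a minimal illustration under my own assumptions, not the code behind the images. The input is a permutation of 0..n-1, each set's flags are written before the flags of its two subsets, and bandEncode and bandDecode are hypothetical names. The decoder needs only the set size and the flags, because the stop rule tells it which flags were never stored.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Encode a permutation `seq` of the values lo .. lo+size-1 (hypothetical helper).
// Values below mid go to the author's "upper" band (flag false), the rest to the
// "lower" band (flag true). Flags stop once either band is full, because every
// remaining element must then belong to the other band.
void bandEncode(const std::vector<int>& seq, int lo, std::vector<bool>& flags) {
    const std::size_t size = seq.size();
    if (size <= 1) return;                      // a single value needs no flags
    const std::size_t upperCap = size / 2, lowerCap = size - upperCap;
    const int mid = lo + static_cast<int>(upperCap);
    std::vector<int> upper, lower;
    for (int v : seq) {
        const bool toLower = v >= mid;
        if (upper.size() < upperCap && lower.size() < lowerCap) flags.push_back(toLower);
        (toLower ? lower : upper).push_back(v);
    }
    bandEncode(upper, lo, flags);               // recurse on each band, upper first
    bandEncode(lower, mid, flags);
}

// Rebuild the original order of lo .. lo+size-1 from the flags, reading them in
// the same order bandEncode wrote them. `pos` is the read position in `flags`.
std::vector<int> bandDecode(std::size_t size, int lo, const std::vector<bool>& flags,
                            std::size_t& pos) {
    if (size == 0) return {};
    if (size == 1) return {lo};
    const std::size_t upperCap = size / 2, lowerCap = size - upperCap;
    const int mid = lo + static_cast<int>(upperCap);
    std::vector<bool> isLower(size);
    std::size_t nUpper = 0, nLower = 0;
    for (std::size_t k = 0; k < size; ++k) {
        bool toLower;
        if (nUpper < upperCap && nLower < lowerCap) {
            assert(pos < flags.size());
            toLower = flags[pos++];
        } else {
            toLower = (nUpper == upperCap);     // implied: the full band takes nothing more
        }
        isLower[k] = toLower;
        (toLower ? nLower : nUpper)++;
    }
    const std::vector<int> upper = bandDecode(upperCap, lo, flags, pos);
    const std::vector<int> lower = bandDecode(lowerCap, mid, flags, pos);
    std::vector<int> out;                       // put each band back into its original slots
    std::size_t u = 0, l = 0;
    for (std::size_t k = 0; k < size; ++k) out.push_back(isLower[k] ? lower[l++] : upper[u++]);
    return out;
}
```

<p>For the 16-element input, calling bandEncode(input, 0, flags) and then bandDecode(16, 0, flags, pos) with pos = 0 returns the original sequence. At the first level the upper set only fills up with 5, the 15th element. So 15 flags are stored, and the band of the last element, 10, is implied.</p>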
<p>This method is similar to <a href="http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/" target="_blank">reverse merge sort</a>. The difference is that here we work top-down, whereas merge sort works bottom-up.</p>
<p>Using this method we can represent 256 unique elements while saving between 128 bytes (best case) and 31.75 bytes (worst case), as illustrated below.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2009/12/mergesort22.png" style="BORDER-BOTTOM: #000000 1px solid; BORDER-LEFT: #000000 1px solid; WIDTH: 540px; DISPLAY: inline; HEIGHT: 234px; BORDER-TOP: #000000 1px solid; BORDER-RIGHT: #000000 1px solid" height="234" alt="mergesort2.png" width="540"/></p>
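<p>The worst case happens when the last element of each set belongs to the other band, i.e. one band only becomes full with the second-to-last element. In that case a flag is needed for every element except the last one.</p>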
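<p>The best-case and worst-case figures can be checked by counting flags level by level. This sketch uses hypothetical function names and assumes the block size is a power of two. A set of size elements costs at least size/2 flags, when one band fills first. It costs at most size-1 flags, in the worst case described above. Each set then splits into two sets of half the size.</p>

```cpp
#include <cassert>

// Fewest flags for a set of `size` elements, summed over all recursion levels.
unsigned long bandFlagsBest(unsigned long size) {
    assert((size & (size - 1)) == 0);           // power of two (or zero)
    if (size <= 1) return 0;
    return size / 2 + 2 * bandFlagsBest(size / 2);
}

// Most flags: every set stops only before its last element.
unsigned long bandFlagsWorst(unsigned long size) {
    assert((size & (size - 1)) == 0);
    if (size <= 1) return 0;
    return (size - 1) + 2 * bandFlagsWorst(size / 2);
}
```

<p>For 256 elements this gives 1024 bits (128 bytes) in the best case and 1793 bits (224.125 bytes) in the worst case. The raw block is 2048 bits.</p>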
<p>As I said at the beginning, this method is not sufficient for the million random digit file, where I need to save at least 54 bytes for every 256 bytes.</p>
<p>In the next article I will write about how a set of unique elements can be represented using a unidirectional graph. Until then, please post your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2011/01/random-data-compression-lowerband-upper-band-filtering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; Risk based selection</title>
		<link>http://blog.adityon.com/2010/10/random-data-compression-risk-based-selection/</link>
		<comments>http://blog.adityon.com/2010/10/random-data-compression-risk-based-selection/#comments</comments>
		<pubDate>Sun, 03 Oct 2010 15:39:18 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[comp.compression]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[magic compressor]]></category>
		<category><![CDATA[million random digit]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2010/10/random-data-compression-risk-based-selection/</guid>
		<description><![CDATA[In my previous article I mentioned that once the tree structure is ready we can fill it and regenerate the random input along with one bit diff encoding.
As I observed, once the tree reaches the middle levels the number of available options grows to 64 or more elements, which results in at least 6 bits per selection.
With respect to the million random digit file, this approach can accommodate no more than 27 duplicates.
We need a better approach to selecting elements than one bit diff encoding.
I used a modified version of one bit diff encoding, which I call ...]]></description>
			<content:encoded><![CDATA[<p>In my previous article I mentioned that once the tree structure is ready we can fill it and regenerate the random input along with one bit diff encoding.</p>
<p>As I observed, once the tree reaches the middle levels the number of available options grows to 64 or more elements, which results in at least 6 bits per selection.</p>
<p>With respect to the million random digit file, this approach can accommodate no more than 27 duplicates.</p>
<p>We need a better approach to selecting elements than one bit diff encoding.</p>
<p>I used a modified version of one bit diff encoding, which I call risk based selection.</p>
<p>Let's assume the available selections are (4, 9, 40, 102, 150, 165, 172, 197, 245) and the actual selection was 245. Using one bit diff encoding I need 3 bits, i.e. &#8220;111&#8221;, to select 245 among these inputs.</p>
<p>In risk based selection I try to split the options into two lists based on one particular bit, with as many numbers as possible in one list and the remaining numbers in the other.</p>
<p>If you analyze the image below you will see that the 4th bit (counting from the most significant bit) is zero in 7 of the available options. If I know that the 4th bit of the number I am selecting is &#8220;1&#8221;, then I am left with only two remaining numbers.</p>
<p>In other words, the 4th bit splits the available numbers most unevenly. (I am aware that the 6th bit, which is 1 in seven of the numbers, and the 7th bit, which is 0 in seven of them, split them just as unevenly; let's take the first possibility.)</p>
<p>The number I am going to select, 245, has &#8220;1&#8221; as its 4th bit. If I store this bit, my available options reduce to two: 150 and 245.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2010/10/riskbasedSelection.png" style="BORDER-BOTTOM: #000000 1px solid; BORDER-LEFT: #000000 1px solid; WIDTH: 434px; DISPLAY: inline; HEIGHT: 399px; BORDER-TOP: #000000 1px solid; BORDER-RIGHT: #000000 1px solid" height="399" alt="riskbasedSelection.png" width="434"/></p>
<p>As you can see in the image, the bits marked in red are selected for the next round. I repeat the same procedure on the remaining options until only one option is left.</p>
<p>The green box indicates the second level of filtering. After the second bit selection I am left with 245, which is the number I am looking for. This way I need only two bits (i.e. &#8220;11&#8221;) to select this node.</p>
<p>Here this selection is more effective than one bit diff encoding, which requires 3 bits instead of 2.</p>
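<p>Note: in the worst case we may end up with 8 bits for each node. You can check this yourself by applying the method to the example below.</p>
<p>Assume the available options are (1, 2, 4, 8, 16, 32, 64, 128) and the number to be selected is 1. Taking the most significant candidate bit first, every round removes only one option, so we end up with 7 bits. Adding 0 as a ninth option and selecting it takes the full 8 bits.</p>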
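<p>A minimal C++ sketch of risk based selection is shown below. The tie-breaking rule is my assumption. Bits are scanned from the most significant (the "1st" bit in the text) to the least significant, and the first bit with the most uneven split wins; this reproduces the choice of the 4th bit above. Because that choice depends only on the remaining candidates, the decoder can repeat it and needs nothing but the stored bits. The function names are hypothetical, and the candidates must be distinct 8-bit values.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit position (7 = most significant) whose split of the candidates is most uneven.
// Ties go to the more significant bit; bits on which all candidates agree are skipped.
int mostUnevenBit(const std::vector<std::uint8_t>& cands) {
    int best = -1;
    std::size_t bestMinority = SIZE_MAX;
    for (int b = 7; b >= 0; --b) {
        std::size_t ones = 0;
        for (std::uint8_t c : cands) ones += (c >> b) & 1u;
        const std::size_t minority = std::min(ones, cands.size() - ones);
        if (minority > 0 && minority < bestMinority) { bestMinority = minority; best = b; }
    }
    assert(best >= 0);                           // fails only if candidates repeat
    return best;
}

// Keep only the candidates whose bit `b` equals `bit`.
std::vector<std::uint8_t> keepWithBit(const std::vector<std::uint8_t>& cands, int b, unsigned bit) {
    std::vector<std::uint8_t> next;
    for (std::uint8_t c : cands)
        if (((c >> b) & 1u) == bit) next.push_back(c);
    return next;
}

// Store the target's value of the chosen bit each round until one candidate is left.
std::vector<unsigned> encodeSelection(std::vector<std::uint8_t> cands, std::uint8_t target) {
    std::vector<unsigned> bits;
    while (cands.size() > 1) {
        const int b = mostUnevenBit(cands);
        const unsigned bit = (target >> b) & 1u;
        bits.push_back(bit);
        cands = keepWithBit(cands, b, bit);
    }
    return bits;
}

// Replay the same choices to recover the selected value from the stored bits.
std::uint8_t decodeSelection(std::vector<std::uint8_t> cands, const std::vector<unsigned>& bits) {
    std::size_t i = 0;
    while (cands.size() > 1) {
        assert(i < bits.size());
        const int b = mostUnevenBit(cands);
        cands = keepWithBit(cands, b, bits[i++]);
    }
    assert(!cands.empty());
    return cands.front();
}
```

<p>With the nine options of the first example and target 245, the sketch stores the bits 1 and 1, matching the &#8220;11&#8221; above. For (1, 2, 4, ..., 128) and target 1 it stores 7 bits. With 0 added as a ninth option, selecting 0 costs 8 bits.</p>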
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2010/10/random-data-compression-risk-based-selection/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; Using Tree</title>
		<link>http://blog.adityon.com/2010/10/random-data-compression-using-tree/</link>
		<comments>http://blog.adityon.com/2010/10/random-data-compression-using-tree/#comments</comments>
		<pubDate>Sun, 03 Oct 2010 11:14:12 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[comp.compression]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[magic compressor]]></category>
		<category><![CDATA[million random digit]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2010/10/random-data-compression-using-tree/</guid>
		<description><![CDATA[In this article I will explain how a binary search tree (B Tree) can be used to model the unique numbers. This will be an interesting article, as the best case requires only 64 bytes.
As you are aware, in a binary search tree each node has at most two children; all nodes on the left side are lower than the current node and all nodes on the right side are higher.
How can we use a binary search tree to model the random unique numbers?
In the case of 256 unique numbers we will have 256 nodes; the leftmost ...]]></description>
			<content:encoded><![CDATA[<p>In this article I will explain how a binary search tree (B Tree) can be used to model the unique numbers. This will be an interesting article, as the best case requires only 64 bytes.</p>
<p>As you are aware, in a binary search tree each node has at most two children; all nodes on the left side are lower than the current node and all nodes on the right side are higher.</p>
<p>How can we use a binary search tree to model the random unique numbers?</p>
<p>In the case of 256 unique numbers we will have 256 nodes; the leftmost node holds zero and the rightmost node holds 255. For each node we need two flags to indicate whether it has a left child and a right child.</p>
<p>We can build the tree structure recursively. Once the tree structure is ready, we can fill each node with its value using an in-order traversal, from the leftmost node to the rightmost one.</p>
<p>Let's take an example. Assume we have these 16 random inputs: (7, 12, 9, 3, 13, 6, 8, 15, 1, 4, 2, 11, 0, 14, 5, 10)</p>
<p>The resulting tree is:</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2010/10/binaryTree.jpeg" style="BORDER-BOTTOM: #000000 1px solid; BORDER-LEFT: #000000 1px solid; WIDTH: 600px; DISPLAY: inline; HEIGHT: 427px; BORDER-TOP: #000000 1px solid; BORDER-RIGHT: #000000 1px solid" height="427" alt="binaryTree.jpeg" width="600"/></p>
<p>If we know the tree structure, we can refill the value of each node by going from the leftmost node to the rightmost one.</p>
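<p>The idea of storing only the shape and refilling the values from left to right can be sketched in C++ as below. This illustration rests on my own assumptions. The two child flags of each node are written in pre-order (node, then left subtree, then right subtree), the input is a permutation of 0..n-1, and all names are hypothetical. An in-order walk of a binary search tree visits its values in ascending order, so numbering the nodes 0, 1, 2, ... during that walk restores every value.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int value = -1;                    // -1 until filled in
    std::unique_ptr<Node> left, right;
};

// Standard binary search tree insertion: smaller values go left.
void bstInsert(std::unique_ptr<Node>& root, int v) {
    if (!root) {
        root = std::make_unique<Node>();
        root->value = v;
        return;
    }
    bstInsert(v < root->value ? root->left : root->right, v);
}

// Pre-order walk writing two flags per node: has-left, has-right.
void writeShape(const Node* n, std::vector<bool>& flags) {
    flags.push_back(n->left != nullptr);
    flags.push_back(n->right != nullptr);
    if (n->left) writeShape(n->left.get(), flags);
    if (n->right) writeShape(n->right.get(), flags);
}

// Rebuild an unfilled tree of the same shape by reading the flags in the same order.
std::unique_ptr<Node> readShape(const std::vector<bool>& flags, std::size_t& pos) {
    assert(pos + 2 <= flags.size());
    auto n = std::make_unique<Node>();
    const bool hasLeft = flags[pos++];
    const bool hasRight = flags[pos++];
    if (hasLeft) n->left = readShape(flags, pos);
    if (hasRight) n->right = readShape(flags, pos);
    return n;
}

// In-order walk: the leftmost node receives 0, the rightmost node n-1.
void fillInOrder(Node* n, int& next) {
    if (!n) return;
    fillInOrder(n->left.get(), next);
    n->value = next++;
    fillInOrder(n->right.get(), next);
}

// Pre-order list of values, handy for comparing two trees.
void preOrderValues(const Node* n, std::vector<int>& out) {
    if (!n) return;
    out.push_back(n->value);
    preOrderValues(n->left.get(), out);
    preOrderValues(n->right.get(), out);
}
```

<p>Insert the 16 example values in input order and write the shape; this gives 32 flags, two per node. Reading them back and numbering the nodes during an in-order walk rebuilds a tree whose pre-order values match the original, with 7 at the root.</p>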
<p>However, after building the tree we need additional information to find which child node appears next.</p>
<p>In the above example we know that 7 appeared first in the input array, since it is the root. The next number is either 3 or 12 (one of the open children).</p>
<p>We need one bit to select either 3 or 12 (that bit will be &#8220;1&#8221;), and we end up with 12 as the next input. The next input must then be one of 3, 9 and 13. We can go on applying one bit diff encoding to select the proper child.</p>
<p>Whenever a child is selected, its children are added to the available options for selecting the next node.</p>
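<p>The bookkeeping of open children can also be sketched. I assume the open nodes are kept in ascending order, which matches the &#8220;3 or 12&#8221; example where 12 is the second option. Under that assumption, the function below reports two numbers for every input value: its index among the open nodes, and how many nodes were open. These are the numbers a one bit diff encoder would then turn into bits. The function name is hypothetical and the input must be a permutation of 0..n-1.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// For each input value: (index among the open nodes, number of open nodes).
std::vector<std::pair<std::size_t, std::size_t>> openChildChoices(const std::vector<int>& input) {
    std::vector<std::pair<std::size_t, std::size_t>> choices;
    if (input.empty()) return choices;
    const std::size_t n = input.size();
    std::vector<int> leftChild(n, -1), rightChild(n, -1);
    // Build the binary search tree by inserting the values in input order.
    for (std::size_t k = 1; k < n; ++k) {
        int node = input[0];
        const int v = input[k];
        while (true) {
            int& child = v < node ? leftChild[node] : rightChild[node];
            if (child < 0) { child = v; break; }
            node = child;
        }
    }
    // Replay the input: each chosen node leaves the open set and opens its children.
    std::set<int> openNodes = {input[0]};
    for (int v : input) {
        const auto it = openNodes.find(v);
        assert(it != openNodes.end());          // a node is open once its parent has appeared
        choices.emplace_back(static_cast<std::size_t>(std::distance(openNodes.begin(), it)),
                             openNodes.size());
        openNodes.erase(it);
        if (leftChild[v] >= 0) openNodes.insert(leftChild[v]);
        if (rightChild[v] >= 0) openNodes.insert(rightChild[v]);
    }
    return choices;
}
```

<p>For the 16-element input the first three entries are (0, 1), (1, 2) and (1, 3). First, 7 is the only option. Then 12 is option 1 of {3, 12}. Then 9 is option 1 of {3, 9, 13}. For input in ascending order every step has a single open node, so no selection bits are needed.</p>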
<p>This way we need 512 bits for the node flags (two per node) plus the one bit diff encoding info for each child.</p>
<p><strong><span style="TEXT-DECORATION: underline">Best case and worst case scenario</span></strong></p>
<p>In the best case, each node has at most one child, which happens when the input is in ascending order. Then the tree needs just 2^(n+1) bits for 2^n inputs; for 256 inputs that is 512 bits, or 64 bytes. JUST 64 bytes for 256 unique values!!!</p>
<p>In the worst case we need 64 bytes + the worst-case bit diff info = 2047 bits (i.e. ~256 bytes). No saving.</p>
<p>The worst case happens when the child nodes added to the available options are consumed last.</p>
<p>By rearranging the million random digit file with this technique I could compress blocks with up to 27 bytes of non-unique numbers, which is worse than merge sort. This technique will be effective only if we find a way to increase the depth of the tree so that more nodes have a single child. I will write more about this later.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2010/10/random-data-compression-using-tree/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; diff encoding with merge sort</title>
		<link>http://blog.adityon.com/2010/10/random-data-compression-diff-encoding-with-merge-sort/</link>
		<comments>http://blog.adityon.com/2010/10/random-data-compression-diff-encoding-with-merge-sort/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 18:12:31 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[comp.compression]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[magic compressor]]></category>
		<category><![CDATA[million random digit]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2010/10/random-data-compression-diff-encoding-with-merge-sort/</guid>
		<description><![CDATA[My last article didn&#8217;t go down well with readers, so I decided to elaborate with an example.
Let's assume we have eight random inputs, say 5, 1, 3, 1, 2, 3, 0, 5.
In this input the 4th, 6th and 8th elements are non-unique, i.e. they have already appeared at least once.
For each number we need an identifier or flag marking it as unique or duplicate, so we get the bit info 0 0 0 1 0 1 0 1.
When the 4th element appears as a duplicate of one of the earlier unique values, ...]]></description>
			<content:encoded><![CDATA[<p>My last article didn&#8217;t go down well with readers, so I decided to elaborate with an example.</p>
<p>Let's assume we have eight random inputs, say 5, 1, 3, 1, 2, 3, 0, 5.</p>
<p>In this input the 4th, 6th and 8th elements are non-unique, i.e. they have already appeared at least once.</p>
<p>For each number we need an identifier or flag marking it as unique or duplicate, so we get the bit info 0 0 0 1 0 1 0 1.</p>
<p>When the 4th element appears as a duplicate of an earlier unique value, it must be one of (5, 1, 3).</p>
<p>Similarly, the 6th element must be one of (5, 1, 3, 2), and the 8th element must be one of (5, 1, 3, 2, 0).</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2010/10/difencoding.png" style="BORDER-BOTTOM: #000000 1px solid; BORDER-LEFT: #000000 1px solid; BORDER-TOP: #000000 1px solid; BORDER-RIGHT: #000000 1px solid" height="417" alt="difencoding.png" width="512"/></p>
<p>Let's go back to the algorithm described in my previous article.</p>
<p>1. Read 256 bytes from the input (in the above example we read 8 inputs of 3 bits each).</p>
<p>2. Process the input sequentially and mark unique values and duplicates, which gives 0 0 0 1 0 1 0 1. We need to store this info for use at decompression time: 8 bits for the 8 inputs above (256 bits for 256 inputs).</p>
<p>3. For each element marked as non-unique, remember its position using one bit diff encoding.</p>
<p>For example, the 4th element is a duplicate of the 2nd element, and one bit diff encoding gives &#8220;01&#8221;. For the 6th element we get &#8220;10&#8221; and for the 8th element &#8220;000&#8221;. The complete table of one bit diff encoding is available <a href="http://blog.adityon.com/wp-content/uploads/2010/10/256BitDifMap.txt" target="_blank" title="256 variance of one bit dif encoding">here</a>.</p>
<p>4. Remove the non-unique data and shift the remaining elements forward, then fill the empty space at the end with the missing numbers. In the image above, the bottom array is the rearranged one.</p>
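<p>Steps 2 to 4 can be sketched in C++ as follows. The one bit diff code used here is my reconstruction from the examples in this series, not the author's published table. With k options and m = floor(log2 k), the first 2(k - 2^m) positions get m+1 bits and the remaining positions get m bits. This reproduces &#8220;01&#8221;, &#8220;10&#8221; and &#8220;000&#8221; above. Appending the missing values in ascending order in step 4 is also an assumption, and all function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Step 2: flag each input as unique (false) or a repeat of an earlier value (true).
std::vector<bool> duplicateFlags(const std::vector<int>& input, int alphabetSize) {
    std::vector<bool> seen(alphabetSize, false), flags;
    for (int v : input) {
        flags.push_back(seen[v]);
        seen[v] = true;
    }
    return flags;
}

// Assumed one bit diff code for position `index` among `options` choices.
std::string oneBitDiffCode(std::size_t index, std::size_t options) {
    assert(index < options);
    std::size_t m = 0;
    while ((std::size_t{2} << m) <= options) ++m;          // m = floor(log2(options))
    const std::size_t r = options - (std::size_t{1} << m);
    const std::size_t bits = index < 2 * r ? m + 1 : m;     // longer codes come first
    const std::size_t value = index < 2 * r ? index : index - r;
    std::string code;
    for (std::size_t b = bits; b > 0; --b) code += ((value >> (b - 1)) & 1) ? '1' : '0';
    return code;
}

// Step 3: code each duplicate by its position among the distinct values seen so far.
std::vector<std::string> duplicateCodes(const std::vector<int>& input) {
    std::vector<int> uniques;                               // first appearances, in order
    std::vector<std::string> codes;
    for (int v : input) {
        std::size_t pos = 0;
        while (pos < uniques.size() && uniques[pos] != v) ++pos;
        if (pos == uniques.size()) uniques.push_back(v);
        else codes.push_back(oneBitDiffCode(pos, uniques.size()));
    }
    return codes;
}

// Step 4: first appearances in order, then the values that never appeared.
std::vector<int> uniqueArrangement(const std::vector<int>& input, int alphabetSize) {
    std::vector<bool> seen(alphabetSize, false);
    std::vector<int> out;
    for (int v : input)
        if (!seen[v]) { seen[v] = true; out.push_back(v); }
    for (int v = 0; v < alphabetSize; ++v)
        if (!seen[v]) out.push_back(v);
    return out;
}
```

<p>For the input 5, 1, 3, 1, 2, 3, 0, 5 these functions produce three things. The flags are 0 0 0 1 0 1 0 1, the codes are 01, 10 and 000, and the arrangement is 5, 1, 3, 2, 0, 4, 6, 7, which is the array handed to the merge sort step.</p>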
<p>Now the array contains only unique elements, so we can apply reverse merge sort to sort it and store the resulting bit info.</p>
<p>So the compressed file contains the unique/duplicate flags generated in step 2, plus the one bit diff encoding info of the duplicates generated in step 3, plus the merge sort bit info.</p>
<p>While decompressing, use the merge sort bit info to recreate the unique array, use the unique/duplicate flags to note which positions were duplicates, and then use the bit diff info to refill the duplicates.</p>
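<p>Decompression can be sketched the same way. This sketch assumes the unique array has already been recovered from the merge sort bits. It reads the duplicate codes with my reconstruction of one bit diff encoding, not the author's published table. With k options, m = floor(log2 k) and r = k - 2^m, read m bits as x. If x &lt; r, one more bit b follows and the position is 2x + b; otherwise the position is x + r. Function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Read one position among `options` choices from the bit string, advancing `pos`.
std::size_t readPosition(const std::string& bits, std::size_t& pos, std::size_t options) {
    assert(options > 0);
    std::size_t m = 0;
    while ((std::size_t{2} << m) <= options) ++m;          // m = floor(log2(options))
    const std::size_t r = options - (std::size_t{1} << m);
    std::size_t x = 0;
    for (std::size_t i = 0; i < m; ++i) {
        assert(pos < bits.size());
        x = 2 * x + (bits[pos++] == '1' ? 1 : 0);
    }
    if (x < r) {                                            // a longer code: one more bit
        assert(pos < bits.size());
        return 2 * x + (bits[pos++] == '1' ? 1 : 0);
    }
    return x + r;
}

// Rebuild the block from the unique array, the duplicate flags and the duplicate codes.
std::vector<int> rebuildBlock(const std::vector<int>& uniqueArray,
                              const std::vector<bool>& duplicateFlags,
                              const std::string& codeBits) {
    std::vector<int> out;
    std::size_t uniquesSoFar = 0, pos = 0;
    for (bool isDuplicate : duplicateFlags) {
        if (!isDuplicate) {
            out.push_back(uniqueArray[uniquesSoFar++]);      // next new value
        } else {
            const std::size_t p = readPosition(codeBits, pos, uniquesSoFar);
            out.push_back(uniqueArray[p]);                  // repeat of an earlier value
        }
    }
    return out;
}
```

<p>The inputs for the example are the arrangement 5, 1, 3, 2, 0, 4, 6, 7, the flags 0 0 0 1 0 1 0 1, and the bits 0110000 (the codes 01, 10 and 000 joined). From these the sketch returns 5, 1, 3, 1, 2, 3, 0, 5.</p>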
<p>I hope this example clears up many of your doubts.</p>
<p>The next article will be an interesting one, based on a binary search tree (B Tree).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2010/10/random-data-compression-diff-encoding-with-merge-sort/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; One bit diff encoding continued</title>
		<link>http://blog.adityon.com/2010/10/random-data-compression-one-bit-diff-encoding-continued/</link>
		<comments>http://blog.adityon.com/2010/10/random-data-compression-one-bit-diff-encoding-continued/#comments</comments>
		<pubDate>Fri, 01 Oct 2010 12:42:14 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[comp.compression]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[magic compressor]]></category>
		<category><![CDATA[million random digit]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2010/10/random-data-compression-one-bit-diff-encoding-continued/</guid>
		<description><![CDATA[This is a continuation of my previous article on Random data compression - One bit diff encoding.
Sorry for the long gap between this article and the last one; I was too busy.
After my previous article, many readers asked how I achieved compression of blocks with up to 42 bytes of duplicates.
Here is the algorithm I used.
1. Read 256 bytes from the input.
2. Process the input sequentially and mark unique values and duplicates; this needs 256 bits, or 32 bytes. (This cost can&#8217;t be avoided.)
3. For each element marked as non-unique, remember its position using one bit diff encoding. (Details ...]]></description>
			<content:encoded><![CDATA[<p>This is a continuation of my <a href="http://blog.adityon.com/2010/06/random-data-compression-one-bit-diff-encoding/" target="_blank">previous article on Random data compression - One bit diff encoding</a>.</p>
<p>Sorry for the long gap between this article and the last one; I was too busy.</p>
<p>After my previous article, many readers asked how I achieved compression of blocks with up to 42 bytes of duplicates.</p>
<p>Here is the algorithm I used.</p>
<p>1. Read 256 bytes from the input.</p>
<p>2. Process the input sequentially and mark unique values and duplicates; this needs 256 bits, or 32 bytes. (This cost can&#8217;t be avoided.)</p>
<p>3. For each element marked as non-unique, remember its position using one bit diff encoding (details in the previous article). This usually requires fewer than 8 bits, depending on where the non-unique data appears. We consume fewer bits if the non-unique data appears at the beginning of the list rather than at the end.</p>
<p>4. Remove the non-unique data and shift the remaining elements forward, then fill the empty space at the end with the missing numbers.</p>
<p>5. Apply merge sort and store the bit information of the sort. (See the <a href="http://blog.adityon.com/2010/06/random-data-compression-is-it-possible-how-to-use-merge-sort/" target="_blank">article</a> on how to use merge sort.)</p>
<p>While decompressing:</p>
<p>1. Use the merge sort data to recreate the unique data list.</p>
<p>2. Apply the non-unique flags to rearrange the list.</p>
<p>3. Use the one bit diff encoding to fill in the duplicate (non-unique) numbers.</p>
<p>Using the above technique I could compress random data with up to 42 bytes of duplicates. I repeat: up to, not always. If the non-unique values appear at the beginning of the list we can accommodate up to 42 bytes of them. If they appear at the end of the list we can hardly handle more than 12~14 bytes.</p>
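<p>The effect of position can be illustrated with a small calculation. I assume that one bit diff encoding spends floor(log2 k) or floor(log2 k) + 1 bits when there are k earlier unique values to choose from. The hypothetical helpers below return both lengths.</p>

```cpp
#include <cassert>
#include <cstddef>

// Shorter code length for one of k options: floor(log2 k).
std::size_t shortCodeBits(std::size_t k) {
    assert(k > 0);
    std::size_t m = 0;
    while ((std::size_t{2} << m) <= k) ++m;
    return m;
}

// Longer code length: one extra bit unless k is a power of two.
std::size_t longCodeBits(std::size_t k) {
    const std::size_t m = shortCodeBits(k);
    return (std::size_t{1} << m) == k ? m : m + 1;
}
```

<p>A duplicate arriving after 3 unique values costs 1 or 2 bits. One arriving after 200 unique values costs 7 or 8 bits, nearly as much as the raw byte.</p>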
<p>Note: This method is not effective for the Million Random Digit file from RAND. We need a different approach.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2010/10/random-data-compression-one-bit-diff-encoding-continued/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; One bit diff encoding</title>
		<link>http://blog.adityon.com/2010/06/random-data-compression-one-bit-diff-encoding/</link>
		<comments>http://blog.adityon.com/2010/06/random-data-compression-one-bit-diff-encoding/#comments</comments>
		<pubDate>Sat, 05 Jun 2010 18:10:03 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[comp.compression]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[magic compressor]]></category>
		<category><![CDATA[million random digit]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2010/06/random-data-compression-one-bit-diff-encoding/</guid>
		<description><![CDATA[This is a continuation of my previous article on Random data compression-How to use merge sort?
As I mentioned earlier, it is hard or (as of now) impossible to compress the million random digit file. According to my analysis, the million random digit file contains around 90-110 bytes of duplicates (repeated numbers) in every 256-byte block. When the input data is very pure or nearly pure (i.e. almost all unique), or highly polluted with noise (i.e. full of duplicates), it is easy to compress. Maybe we can borrow the idea ...]]></description>
			<content:encoded><![CDATA[<p>This is a continuation of my <a href="http://blog.adityon.com/2010/06/random-data-compression-is-it-possible-how-to-use-merge-sort/" target="_blank">previous article on Random data compression-How to use merge sort?</a></p>
<p>As I mentioned earlier, it is hard or (as of now) impossible to compress the <a href="http://marknelson.us/2006/06/20/million-digit-challenge/" target="_blank">million random digit</a> file. According to my analysis, the million random digit file contains around 90-110 bytes of duplicates (repeated numbers) in every 256-byte block. When the input data is very pure or nearly pure (i.e. almost all unique), or highly polluted with noise (i.e. full of duplicates), it is easy to compress. Maybe we can borrow an idea from chemistry. In the periodic table, light elements can be combined by nuclear fusion (stars) and heavy elements can be broken apart by nuclear fission (uranium). In both cases energy is released, which we can relate to compression. However, elements with highly stable nuclei, e.g. lead, are hard to break or fuse.</p>
<p>I see the same pattern in the million random digit file: when uniqueness is extremely high or extremely low, we can easily compress the data.</p>
<p>The compressor I have developed so far achieves compression when a 256-byte block contains fewer than 42 or more than 156 doubles (duplicates).</p>
<p>Before I go further with my compression theory, I have to write one more article about the one bit difference variable encoder. It is one of the known algorithms based on probability or possibilities.</p>
<p>About bijective mappings: you might have observed that a bijection is a one-to-one mapping. However, it does not capture relations among the elements of the domain set or of the codomain set.</p>
<p>One bit variant encoding is based on the number of possibilities. If we have 4 unique possibilities, we need 2 bits (00, 01, 10, 11) to represent all possible values. How many bits do we need when we have 5 or 6 possibilities?</p>
<p>In one of my earlier articles I mentioned using insertion sort or bubble sort. I stated that with this technique every possible combination can be represented in 1792 bits or 224 bytes. I need to revise this statement: we need fewer bits.</p>
<p>Assume that you have a sorted list of all 256 unique values. When a random input arrives, say 26, you pick it from the initialized list; what you actually record is its position. Since it is the first input and there are 256 possibilities, we need 8 bits to represent the position. For the next random input we choose among the remaining 255 possibilities. In this case we need 8 bits for every position except the last one, which needs 7. So the cost depends on which position we pick among the remaining elements. We can calculate the required bits for the best case (input in sorted order) and the worst case (input in reverse sorted order) as below.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2010/06/onebitdiffencoding.png" style="BORDER-BOTTOM: #000000 1px solid; BORDER-LEFT: #000000 1px solid; WIDTH: 419px; DISPLAY: inline; HEIGHT: 161px; BORDER-TOP: #000000 1px solid; BORDER-RIGHT: #000000 1px solid" height="161" alt="onebitdiffencoding.png" width="419"/></p>
<p>If you look at the illustration below, you will quickly see that as the number of possibilities changes, the number of required bits changes too, while the bit stream stays uniquely decodable.</p>
<p><a href="http://blog.adityon.com/wp-content/uploads/2010/06/1bitdiff.jpg" target="_blank"><img src="http://blog.adityon.com/wp-content/uploads/2010/06/1bitdiff.jpg" style="WIDTH: 600px; DISPLAY: inline; HEIGHT: 263px" title="Click for enlarge" height="263" width="600" alt="1bitdiff.jpg"/></a></p>
<p>The cells in yellow are the one bit diff codes, based on the first entry in each row.</p>
<p>Using the above calculation, every 256 unique bytes can be represented in 1545-1793 bits (193.125-224.125 bytes).</p>
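<p>The range can be recomputed with a short sketch. I assume, as the 255-option remark above suggests, that with k remaining options each position costs either floor(log2 k) bits or one bit more. The sketch sums the cheaper and the dearer length over k = 256, 255, ..., 1. Under this reading the totals come out at 1546 and 1793 bits. Which input order reaches each bound depends on whether the shorter codes sit at the front or the back of the list. Function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>

// floor(log2 k) for k >= 1.
std::size_t floorLog2(std::size_t k) {
    assert(k > 0);
    std::size_t m = 0;
    while ((std::size_t{2} << m) <= k) ++m;
    return m;
}

// Total bits to record every pick of `n` unique values when each pick uses the
// shorter (cheapest == true) or the longer code length for the remaining options.
std::size_t totalPickBits(std::size_t n, bool cheapest) {
    std::size_t total = 0;
    for (std::size_t k = n; k >= 1; --k) {
        const std::size_t m = floorLog2(k);
        const bool powerOfTwo = (std::size_t{1} << m) == k;
        total += (cheapest || powerOfTwo) ? m : m + 1;
    }
    return total;
}
```

<p>totalPickBits(256, false) returns 1793 (224.125 bytes), the upper figure quoted above. totalPickBits(256, true) returns 1546.</p>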
<p>In the next article I will explain how I achieved compression with 42 doubles.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2010/06/random-data-compression-one-bit-diff-encoding/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; Is it possible? How to use merge sort?</title>
		<link>http://blog.adityon.com/2010/06/random-data-compression-is-it-possible-how-to-use-merge-sort/</link>
		<comments>http://blog.adityon.com/2010/06/random-data-compression-is-it-possible-how-to-use-merge-sort/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 14:14:45 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[Kolmogorov complexity]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[merge sort]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2010/06/random-data-compression-is-it-possible-how-to-use-merge-sort/</guid>
		<description><![CDATA[This is a continuation of my previous article on Random data compression possibilities.
Some readers asked how reverse merge sort (merge unsort) can be used to represent 256 unique values in 128 bytes (best case) or 224.25 bytes (worst case).
As illustrated in the previous article, we sort the random input using merge sort and store one bit per comparison, recording which list the smaller number was picked from. In effect we are reshuffling the original positions, and the stored bits represent how each input moved from its position in the original list to its position in the sorted list. Let's take an example ...]]></description>
			<content:encoded><![CDATA[<p>This is a continuation of my <a href="http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/" target="_blank">previous article on Random data compression possibilities</a>.</p>
<p>Some readers asked how reverse merge sort (merge unsort) can be used to represent 256 unique values in 128 bytes (best case) or 224.25 bytes (worst case).</p>
<p>As illustrated in the previous article, we sort the random input using merge sort and store one bit per comparison, recording which list the smaller number was picked from. In effect we are reshuffling the original positions, and the stored bits represent how each input moved from its position in the original list to its position in the sorted list. Let's take an example of a random array of 16 numbers ranging from 0 to 15.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2010/06/MergeUnsort.jpg" style="WIDTH: 600px; DISPLAY: inline; HEIGHT: 153px" height="153" alt="MergeUnsort.jpg" width="600"/></p>
<p>As shown in the image above, sorting reshuffles the positions, and the stored bits record exactly where each element moved from.</p>
<p>The encoder is a recursive function: it starts by merging single elements into sorted lists of size 2 and finishes by merging the two lists of 128. You can see the <a href="http://blog.adityon.com/wp-content/uploads/2009/12/mergesort1.JPG" target="_blank">merge sort in action here</a>.</p>
<p>The merge sort encoding code is given below (C++).</p>
<pre id="code" name="code" class="java">
   void Transform::mergeSort(unsigned int beginPosition, unsigned int endPosition) {
        // Sorts transformedByte[beginPosition..endPosition] (inclusive, size a power of two)
        // and writes one bit per comparison through bitWriter.
        unsigned int midPosition = beginPosition + (endPosition-beginPosition)/2;
        if (midPosition&gt;beginPosition &amp;&amp; midPosition&lt;endPosition) { // Recursive call until list is of size 2
                mergeSort(beginPosition,midPosition);
                mergeSort(midPosition+1,endPosition);
        }
        unsigned int elementsInEachList = (endPosition-beginPosition+1)/2;
        // Copies of the two sorted halves, indexed by absolute position (0..255)
        unsigned char list1[256];
        unsigned char list2[256];
        for(unsigned int i=beginPosition;i&lt;beginPosition+elementsInEachList;i++) {
                list1[i] = transformedByte[i];
        }
        for(unsigned int i=midPosition+1;i&lt;midPosition+1+elementsInEachList;i++) {
                list2[i] = transformedByte[i];
        }
        unsigned int pendingInList1 = elementsInEachList;
        unsigned int list1Pointer = beginPosition;
        unsigned int pendingInList2 = elementsInEachList;
        unsigned int list2Pointer = midPosition+1;
        unsigned int mainArrayPointer = beginPosition;
        // While both lists still have elements, every pick costs one bit
        while(pendingInList1&gt;0 &amp;&amp; pendingInList2&gt;0) {
                if (list1[list1Pointer]&lt;list2[list2Pointer]) {
                        transformedByte[mainArrayPointer++] = list1[list1Pointer++];
                        pendingInList1--;
                        bitWriter.write(false); // Store bit 0: item picked from first list
                } else {
                        transformedByte[mainArrayPointer++] = list2[list2Pointer++];
                        pendingInList2--;
                        bitWriter.write(true); // Store bit 1: item picked from second list
                }
        }
        // One list is exhausted: copy the rest of the other list, no bits needed
        while(pendingInList1&gt;0) {
                transformedByte[mainArrayPointer++] = list1[list1Pointer++];
                pendingInList1--;
        }
        while(pendingInList2&gt;0) {
                transformedByte[mainArrayPointer++] = list2[list2Pointer++];
                pendingInList2--;
        }
   }
</pre>
<p>As you can see above, once one list runs out (either the first or the second), no sorting bits are needed for the remaining elements.</p>
<p>Please note that in the above code the random input is stored in the transformedByte variable and is initialized in a separate method for each 256-byte input. The object bitWriter is an instance of a separate utility class, BitWriter, which is initialized in the constructor. BitWriter accumulates bit information and writes it to the output file whenever a full byte has accumulated.</p>
<p>The merge sort decoding code is given below (C++).</p>
<pre id="code" name="code" class="java">

  void Transform::reverseMergeSort(unsigned int beginPosition, unsigned int endPosition) {
        // Replays the merges of mergeSort on inputPositionArray, taking each
        // decision from the stored bits instead of from a comparison.
        unsigned int midPosition = beginPosition + (endPosition-beginPosition)/2;
        if (midPosition&gt;beginPosition &amp;&amp; midPosition&lt;endPosition) { // Recursive call until list is of size 2
                // Decode both halves first, in the same order mergeSort encoded them
                reverseMergeSort(beginPosition,midPosition);
                reverseMergeSort(midPosition+1,endPosition);
        }
        unsigned int elementsInEachList = (endPosition-beginPosition+1)/2;
        unsigned char list1[256];
        unsigned char list2[256];
        for(unsigned int i=beginPosition;i&lt;beginPosition+elementsInEachList;i++) {
                list1[i] = inputPositionArray[i];
        }
        for(unsigned int i=midPosition+1;i&lt;midPosition+1+elementsInEachList;i++) {
                list2[i] = inputPositionArray[i];
        }
        unsigned int pendingInList1 = elementsInEachList;
        unsigned int list1Pointer = beginPosition;
        unsigned int pendingInList2 = elementsInEachList;
        unsigned int list2Pointer = midPosition+1;
        unsigned int mainArrayPointer = beginPosition;
        while(pendingInList1&gt;0 &amp;&amp; pendingInList2&gt;0) {
                bool nextBit = bitReader.genNextBit(); // 0 = take from first list, 1 = take from second list
                if (nextBit==false) {
                        inputPositionArray[mainArrayPointer++] = list1[list1Pointer++];
                        pendingInList1--;
                } else {
                        inputPositionArray[mainArrayPointer++] = list2[list2Pointer++];
                        pendingInList2--;
                }
        }
        // One list is exhausted: the encoder wrote no bits for the rest, so just copy it
        while(pendingInList1&gt;0) {
                inputPositionArray[mainArrayPointer++] = list1[list1Pointer++];
                pendingInList1--;
        }
        while(pendingInList2&gt;0) {
                inputPositionArray[mainArrayPointer++] = list2[list2Pointer++];
                pendingInList2--;
        }
}
</pre>
<p>Before calling reverseMergeSort, inputPositionArray is initialized with 0, 1, 2, &#8230; as the default initial positions. The object bitReader is an instance of a separate utility class, BitReader, which is initialized in the constructor and provides an API to read one bit at a time (rather than a byte); internally it reads a byte and hands it out one bit at a time. I developed these utility classes exclusively for this purpose.</p>
<p>After reverseMergeSort returns, inputPositionArray does not hold the original input; instead, inputPositionArray[v] holds the original position of value v. We need one more loop to reconstruct the original input, as below.</p>
<pre id="code" name="code" class="java">
   for(int i=0; i&lt;256;i++) {
       inputData[inputPositionArray[i]] = i;
   }
</pre>
<p>Now inputData contains the original data.</p>
<p>In the next article I will describe how I compressed random data that has at most 42 duplicates (non-unique values) in every 256 input bytes.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2010/06/random-data-compression-is-it-possible-how-to-use-merge-sort/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; Is it possible? (Part 2)</title>
		<link>http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/</link>
		<comments>http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/#comments</comments>
		<pubDate>Sun, 20 Dec 2009 10:44:15 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[Kolmogorov complexity]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[merge sort]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[unsorting algorithm]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/</guid>
		<description><![CDATA[This is a continuation of my previous article on Random data compression possibilities.
Compression of unique values
As I indicated in the previous article, if each value appears only once in every 256 bytes, then we can achieve compression using any of the techniques below.
1. Using insertion sort &#8211; by remembering the position. Using this technique we save a constant 31.875 bytes for every 256 bytes, i.e. the first element needs no position, the next value needs one bit to record whether it was inserted before or after the previous ...]]></description>
			<content:encoded><![CDATA[<p>This is a continuation of my <a href="http://blog.adityon.com/2009/12/random-data-compression-is-it-possible/">previous article on Random data compression possibilities</a>.</p>
<p><strong><span style="TEXT-DECORATION: underline">Compression of unique values</span></strong></p>
<p>As I indicated in the previous article, if each value appears only once in every 256 bytes, then we can achieve compression using any of the techniques below.</p>
<p>1. Using insertion sort &#8211; by remembering the position. <br/>Using this technique we save a constant amount for every 256 bytes: the first element needs no position, the next value needs one bit to record whether it was inserted before or after the previous number, each of the next two values needs two bits, each of the next four needs three bits, and so on, until each of the last 128 values needs 8 bits. The saving is constant &#8211; every possible ordering takes 1793 bits, or 224.125 bytes. (We save 256 &#8211; 224.125 = 31.875 bytes.)</p>
<p>2. Using the factorial of 256 <br/>In this approach we enumerate all possible orderings; there are 256! of them, so any one of them can be represented with ceil(log2(256!)) = 1684 bits. The saving is constant &#8211; every possible ordering takes 1684 bits, or 210.5 bytes. (We save 256 &#8211; 210.5 = 45.5 bytes.)</p>
<p>3. Using reverse merge sort &#8211; In this approach we merge sort the data and store one bit per comparison, recording which list the smaller number was picked from. The advantage of this technique is that when one list runs out, we don&#8217;t need to store any unsorting information for the remaining elements of the other list; we just append them, as illustrated below.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2009/12/mergesort1.JPG" alt="mergesort1.JPG" height="249" border="0" width="538"/></p>
<p>In the best case (when all elements are already in ascending order) each of the 8 merge levels emits one bit for half of the elements, so we need only (8/2 bits) &#215; 256 = 1024 bits = 128 bytes for all 256 input bytes. In the worst case we need 1793 bits, as illustrated below.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2009/12/mergesort22.png" alt="mergesort2.png" height="234" width="540"/></p>
<p>Depending on the order in which the unique numbers appear in the list, we save between 128 bytes (best case) and 31.875 bytes (worst case).</p>
<p>Similarly, we could use a reverse quick sort technique. However, merge sort stores less unsorting information than quick sort, because in merge sort, once one list runs out, we need no unsorting information for the remaining elements of the other list. Quick sort lacks this feature, but reverse quick sort can still be used with fewer than 2048 bits.</p>
<p><strong><span style="TEXT-DECORATION: underline">How can the above theory be used when not all numbers are unique?</span></strong></p>
<p>The above theory works when all numbers are unique, but in real data such a scenario is rare; instead there will be some duplicates and some missing unique numbers.</p>
<p>In fact, the random digits created by the RAND group contain on average about 100 duplicate (repeated) bytes in every block of 256 numbers. For such data we need to patch the dataset by removing the non-unique values and inserting the missing unique numbers, so that the reverse merge sort above can be applied. When we remove a duplicate we need to retain its position and actual value. (There is no need to retain the inserted missing numbers, which are discarded after unsorting.) But patching this way costs a lot of bytes: e.g., if 100 numbers are duplicates (i.e. 100 numbers are missing), we need 100 &#215; 2 = 200 bytes (very costly).</p>
<p>However, there is a better patch, as below <br/>1. Read a 256-byte block of data and mark the duplicates (non-unique values), remembering their positions relative to the current position of the number (so we don&#8217;t require 8 bits per position). Now we know the missing unique numbers. <br/>2. Rearrange the input data by eliminating duplicates, so that the unique numbers move to the upper part of the list, and fill in the missing unique numbers at the end of the list. (Make sure all missing numbers are filled in ascending order, so that we can avoid storing unsorting information for the missing numbers, which is of no use to us &#8211; this saves a few bytes.) <br/>3. Perform merge sort on the list excluding the inserted missing numbers; the compressed file will then contain the unsorting information + duplicate positions + duplicate values. <br/>4. If there are fewer than 32 missing numbers, we can store each position using eight bits; if there are more than 32, we use one bit per byte indicating whether it is a duplicate or not.</p>
<p>Using the above technique I developed a compressor that can compress random data with up to 42 duplicates (missing unique numbers) per block. (I used the Million Random Digits of RAND, with some modification to keep at most 42 missing numbers in every 256 input bytes.) However, I have yet to find a way to address the other 60 or so bytes, so that RAND&#8217;s million digit file can be compressed by at least one byte.</p>
<p>I will fine-tune my compressor and announce the results in the next article, next year (after Christmas/New Year). In that article I will also introduce another technique: transition or state representation for random data.</p>
<p>Wishing you all a happy New Year 2010.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Random data compression &#8211; Is it possible? Part 1</title>
		<link>http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-1/</link>
		<comments>http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-1/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 13:55:53 +0000</pubDate>
		<dc:creator>Keshav Shetty</dc:creator>
				<category><![CDATA[Data compression]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Headline]]></category>
		<category><![CDATA[infinity compression]]></category>
		<category><![CDATA[Kolmogorov complexity]]></category>
		<category><![CDATA[lossless data compression]]></category>
		<category><![CDATA[lossless random data compression]]></category>
		<category><![CDATA[random data compression]]></category>
		<category><![CDATA[random data compressor]]></category>

		<guid isPermaLink="false">http://blog.adityon.com/2009/12/random-data-compression-is-it-possible/</guid>
		<description><![CDATA[Random data compression lossless &#8211; Is it possible?
The answer is NO (&#8230;. and yes!!!)
This is a somewhat lengthy post showing the difficulties and the possible ways forward. The article is divided into three sections.
(Before you read further, I suggest reading Mark Nelson&#8217;s &#8220;The Million Random Digit Challenge&#8221;)
1. Why it is not possible to compress random data. (brief introduction to Kolmogorov theory)
2. Why it is possible (Quantum theory, matter and anti matter introduction)
3. The future for compression and possible solution with transition representation and unsorting techniques
If you want to skip ...]]></description>
			<content:encoded><![CDATA[<p><strong>Random data compression lossless &#8211; Is it possible?</strong></p>
<p>The answer is NO (&#8230;. and yes!!!)</p>
<p>This is a somewhat lengthy post showing the difficulties and the possible ways forward. The article is divided into three sections.</p>
<p>(Before you read further, I suggest reading Mark Nelson&#8217;s &#8220;<a href="http://marknelson.us/2006/06/20/million-digit-challenge/" target="_blank">The Million Random Digit Challenge</a>&#8221;)</p>
<p>1. Why it is not possible to compress random data. (brief introduction to Kolmogorov theory)</p>
<p>2. Why it is possible (Quantum theory, matter and anti matter introduction)</p>
<p>3. The future for compression and possible solution with transition representation and unsorting techniques</p>
<p>If you want to skip this introduction and continue with the next article, please click &#8211; <a href="http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/">using unsorting techniques for random data compression</a>.</p>
<p><strong style="COLOR: #0000ff"><span style="TEXT-DECORATION: underline">1. Why it is not possible to compress random data. (brief introduction to Kolmogorov theory)</span></strong></p>
<p>As per the pigeonhole principle &#8211; if n pigeons are put into m pigeonholes with n &gt; m, then at least one pigeonhole must contain more than one pigeon, and if n &lt; m there must be at least one empty pigeonhole. In pure mathematical terms &#8211; there is no injective function on finite sets whose codomain is smaller than its domain.</p>
<p><img src="http://blog.adityon.com/wp-content/uploads/2009/12/datacompression1.JPG" alt="datacompression1.JPG" height="283" width="307"/></p>
<p>The Russian mathematician Kolmogorov&#8217;s complexity theory shows that, with the uniform probability distribution on the space of bitstrings of length n, the probability that a string is incompressible by c is at least 1 &#8722; 2<sup>&#8722;c+1</sup> + 2<sup>&#8722;n</sup> (for more details please refer to <a href="http://en.wikipedia.org/wiki/Kolmogorov_complexity" target="_blank">wikipedia here</a>).</p>
<p>Let&#8217;s take an arithmetic approach to illustrate the above theory.</p>
<p>A byte, i.e. 8 bits, can represent 256 different values, i.e. 0, 1, 2, &#8230; 255. As you can see there are 256 possible values in total, and if each one appears exactly once in a 256-byte block, you cannot represent all these possibilities with a set smaller than 256.</p>
<p>All existing compression algorithms can compress a data set in which not all possibilities are present. E.g., if only 8 <strong><span style="COLOR: #0000ff"><strong>unique</strong></span></strong> byte values are present in the input data, then we need just 3 bits (0, 1, 2, &#8230; 7 &#8211; 8 possibilities in total) to represent each byte of the input.</p>
<p>Note: Each algorithm works differently; however, all existing algorithms, such as dictionary-based methods, LZW, Huffman, RLE etc., work based on patterns, repetitions, frequency of occurrence, or uniqueness of the data.</p>
<p>People often fail to grasp the simplicity of the above theory and claim they have developed a magic data compressor that can compress any type of data. (Refer to <a href="http://www.faqs.org/faqs/compression-faq/part1/" target="_blank">comp.compression</a>)</p>
<p>(Note: I am not claiming any such magic compressor &#8211; but I claim such possibilities.)</p>
<p><strong><span style="TEXT-DECORATION: underline"><span style="COLOR: #0000ff">2. Why do I believe that random data compression is possible?</span></span></strong></p>
<p>Don&#8217;t get me wrong: I strongly believe in Kolmogorov complexity and number theory; it is impossible to develop a magic compressor that can compress any type of DATA.</p>
<p>In my school days I learnt that our universe was in the form of bindurupi (in the form of a dot, or originated from nothing) and expanded (and is still expanding), generating countless possibilities and varieties. As per quantum theory, matter and anti matter join together &#8211; and during this process new variations appear. A better example: each couple produces a totally different baby. The new baby is neither 100% father nor 100% mother (nor even 50% father + 50% mother); the baby might have a few features of the father, the mother and the ancestors, but it also has its own uniqueness. Genes keep mutating, but based on what?</p>
<p>Note: &#8220;Nothing&#8221; doesn&#8217;t mean empty or zero; we are still not clear about nothing. Sanskrit texts and Sanathan mythology mention Brahma vidya, which deals with the separation of matter and anti matter.</p>
<p>Matter is the materialistic or physical world, which we can see, feel, represent and store (random data is also a type of matter), whereas anti matter we cannot imagine or represent; even quantum theory couldn&#8217;t provide any visual way to see or feel it. (Maybe human beings are not yet ready to digest this, because we still live in a materialistic world.)</p>
<p>As per Einstein&#8217;s mass&#8211;energy relationship E = mc<sup>2</sup>, energy &#8220;E&#8221; is generated when an object of mass &#8220;m&#8221; is converted to energy; neither mass nor energy is conserved separately. This is the basis of nuclear fusion.</p>
<p>Now you might be thinking &#8211; What is the relation between Einstein&#8217;s mass/energy theory and random data compression?</p>
<p>In one of his lectures Einstein mentioned that you can convert mass &#8220;m&#8221; to energy &#8220;E&#8221;, and that neither mass nor energy can be destroyed. Similarly, you can <span style="COLOR: #ff8040"><strong style="COLOR: #ff0000">convert energy to matter</strong></span>!</p>
<p>Someone in the audience asked Einstein how it is possible to generate mass from energy &#8211; effectively, if you have enough energy, you could generate any type of object, like gold, uranium etc.</p>
<p>Einstein said it is outside the definition of physics, and that a new world opens up through a spiritual path to prove it. Most likely Einstein had an idea of anti matter at that time. In my view anti matter is a huge energy reservoir which acted on matter to generate the variety of materials we see today. (Imagine nuclear fission of heavier material; please note I believe there should be different types of anti matter, similar to the matter we see.)</p>
<p>We have heard of many such things in ancient India, and even today it is believed that a few people have the capability to generate objects, e.g.:</p>
<p>1. The Pandavas had the Akshya patre &#8211; to generate food <br/>2. Kamadhenu &#8211; a holy cow that could provide anything sage Vasista asked for <br/>3. Divine Sri Sathya Sai Baba</p>
<p>I will not go further down the spiritual path, which I cannot address properly and in which I do not have sufficient knowledge.</p>
<p>As long as we treat the material as data, we cannot compress random data. The day we find a way to represent anti matter, we can achieve this; in other words, we should treat existing random data as a transition or state representation, because data is the static form of Brahma vidya, and it is anti matter that generates all variations.</p>
<p><span style="TEXT-DECORATION: underline">What does transition or state representation mean?</span></p>
<p>There exists a huge set of static data &#8211; its size is beyond human imagination. The so-called anti matter acts on this matter and generates new sets of static data. (Is this how the universe is expanding?)</p>
<p>The static data is growing! Then how can it be static? (I use &#8220;static&#8221; to mean the default or initial set.) During this process the matter passes through different stages. E.g., radioactive material decays over a long period of time into another material; at intervals during this long process we have a variety of data or material. If we know the state of the item, we can accurately tell the process or the time involved; similarly, if we know all the processes it passed through, we can define the state of the material.</p>
<p>A Kolmogorov Turing machine actually generates data from the static data described above. We call it highly random, or high entropy, because we cannot define it as a set or within a domain; we human beings have not yet reached the stage where we can define randomness. Note: random doesn&#8217;t mean uncertainty or an unguessable future; it is predefined beyond the imagination or understanding of materialistic humans.</p>
<p>Summary: As long as we treat random numbers as data we cannot compress them, because number theory, which applies to data, clearly proves that it is impossible. However, if one day we find a way to treat them as either anti matter or state transitions, then we can change the way we represent the data.</p>
<p><strong><span style="COLOR: #0000ff"><strong><span style="TEXT-DECORATION: underline">3. The future for compression and possible solution with transition representation and unsorting techniques</span></strong></span></strong></p>
<p>In the section above I mentioned state or transition representation; if we represent random data as states or transitions, we see a totally different scenario.</p>
<p>I will take a small portion of the static data mentioned above, i.e. a set of 256 unique values. That means I need 8 bits to represent each value. If each value appears only once in every 256 values, then none of the existing algorithms can compress the data. (Although the chance of such data occurring in practice is not high, such data is highly random and cannot be compressed by them.)</p>
<p><span style="COLOR: #800080">[I will try to explain this in a conventional manner instead of a mathematical one; I don't want my readers to get confused by mathematical <span style="COLOR: rgb(128,0,128)">formulas</span>, so I will avoid any formula or function unless it is unavoidable.]</span></p>
<p>Using an unsorting algorithm we can compress such data by between 10% and 50%!!!</p>
<p>The catch here is that the data set is fixed; only the positions differ (the data is not organized). When all numbers are unique, we need to find a way to represent the original positions using fewer than 8 bits per value.</p>
<p>I hope I have not confused you; let me describe it another way.</p>
<p>I have a set of 256 values, all unique &#8211; in this scenario I need 8 bits to represent each value. The values might be 250, 140, 0, 130, 239, &#8230; and so on, each appearing only once.</p>
<p>If you use any of the existing compression algorithms, you cannot achieve any compression, since there are no repetitions or patterns (the data is highly random and each value is unique).</p>
<p>In fact, most compressors produce output of exactly this form, i.e. highly unique with no repeating patterns (close to unique within every 256-byte block, but not exactly). A good example is the random digits created by the RAND group; please refer to Mark Nelson&#8217;s <a href="http://marknelson.us/2006/06/20/million-digit-challenge" target="_blank">The Million Random Digit Challenge Revisited</a>. (This data is very close to containing all 256 values in each block, but it is not 100% unique &#8211; that is based on Kolmogorov complexity.)</p>
<p>My methods can compress such data when it is fully unique within each block (but not Kolmogorov complexity or Turing-machine-based random data &#8211; because that is not 100% unique within each 256-byte block; I have yet to reach there).</p>
<p><strong><span style="COLOR: #0000ff"><strong><span style="TEXT-DECORATION: underline">What is an unsorting algorithm?</span></strong></span></strong> In our graduation or engineering courses we studied various sorting algorithms like bubble sort, quick sort, merge sort etc. However, we never came across anything called an unsorting algorithm! &#8211; Why do we need one?</p>
<p>An unsorting algorithm shuffles sorted items back into their original order or positions. Although such algorithms may not be very useful in practical applications, I will demonstrate compressing 256 unique values with one.</p>
<p>If there are 256 values, then there are 256 positions, which means I still need 8 bits to store each position directly. However, an unsorting algorithm is not just about placing the sorted items back into the original list; it also uses the minimal storage required to represent the original positions.</p>
<p>Let&#8217;s take merge sort: with a list of two items we need just 1 bit (in both the best and the worst case) to unsort back to the original list; that bit records which of the two is smaller. With four items we need 4 bits in the best case and 5 bits in the worst case; with 8 items we need 12 bits in the best case and 17 bits in the worst case. (Note: 8 unique values would otherwise need 24 bits.) With 256 unique items we need 1024 bits (128 bytes) in the best case and 1793 bits in the worst case, whereas 256 unique values stored directly require 2048 bits. As you may have observed, reverse merge sort achieves the compression mentioned above. I will not go into details, because that would fill another page. If you need further details please <a href="http://blog.adityon.com/contact-me/" target="_blank">contact me</a>. Please write to me if you need working code for such compression using reverse merge sort. Similar results can be obtained using reverse quick sort.</p>
<p>The above compressor works well when every value is unique within each block of 256 values. However, real data is not that unique, and Kolmogorov complexity doesn&#8217;t guarantee such entropy within a predefined set of data.</p>
<p>The above concept is the same as in probability theory: if we toss a coin, the chance of getting heads or tails is 50% each. However, that doesn&#8217;t guarantee that of every two successive tosses one will be heads and the other tails. But over a large number of tosses you will end up with about 50% heads and 50% tails.</p>
<p>Coming back to the random digits created by the RAND group &#8211; I observed that in every 256-byte block the number of unique values lies anywhere between 146 and 166. That means if we find a way to patch the missing unique numbers within every 256-byte block, we will have reached random data compression; however, the patched unique numbers + unsorting information must fit in at most 2047 bits (i.e. fewer than 2048) for every 256-byte block. I am still looking for such a possibility.</p>
<p>Now the other approach: as I mentioned in the previous section, instead of representing random data as data, we represent it as states or transitions, and a new concept opens up, i.e. we treat the random digits of the RAND group as unique transitions of another process.</p>
<p>Continue reading the next article &#8211; <a href="http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/">using unsorting techniques for random data compression</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-1/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

