<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Reports from the Valley</title>
	<atom:link href="http://kurtrose.com/thevalley/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://kurtrose.com/thevalley</link>
	<description>Kurt's Blog</description>
	<pubDate>Sat, 27 Jun 2009 06:56:33 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
	<language>en</language>
			<item>
		<title>Extending Python (improved)&#8230;</title>
		<link>http://kurtrose.com/thevalley/?p=75</link>
		<comments>http://kurtrose.com/thevalley/?p=75#comments</comments>
		<pubDate>Sat, 27 Jun 2009 06:56:33 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=75</guid>
		<description><![CDATA[I&#8217;ve fixed a bug in the previous HTML crushing function, and greatly improved the performance.  Also, I will explain the way the algorithm works in a bit more detail.
The basic concept of the algorithm is to do a single (destructive) pass over the HTML string, without allocating any new memory.  To achieve this, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve fixed a bug in the <a href="http://kurtrose.com/thevalley/?p=62">previous HTML crushing function</a>, and greatly improved the performance.  Also, I will explain the way the algorithm works in a bit more detail.</p>
<p>The basic concept of the algorithm is to do a single (destructive) pass over the HTML string, without allocating any new memory.  To achieve this, it keeps track of two offsets inside the HTML string: copy_from and copy_to.  On each pass through the loop, copy_from is incremented.  The loop body then checks the syntactic state at the copy_from location within the HTML and sets some flags accordingly.  Depending on how those flags have been set, the character at offset copy_from may or may not be copied to offset copy_to.</p>
<p>This operation can be visualized as &#8220;crushing&#8221; the string, by moving all of the text parts of the HTML next to each other on the left side of the string.  The final result is that the shorter string consisting of only the text part of the HTML is written over the original HTML string.</p>
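<p>The two-offset idea described above can be sketched in a few lines.  This is a simplified illustration only, not the full function: the name crush_tags is hypothetical, and it handles plain tags but none of the &lt;script&gt;, &lt;style&gt;, or &lt;body&gt; special cases.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical, simplified illustration of the two-offset "crush".
 * Drops everything from '<' through '>' and keeps the rest.
 * No <script>, <style>, or <body> handling. */
size_t crush_tags(wchar_t *html)
{
    assert(html != NULL);
    size_t copy_to = 0;   /* next slot to write a kept character */
    size_t copy_from;     /* character currently being examined  */
    int intext = 1;       /* 1 while outside a tag               */

    for (copy_from = 0; html[copy_from] != L'\0'; copy_from++) {
        /* Check for the close of the previous tag first ... */
        if (copy_from > 0 && html[copy_from - 1] == L'>')
            intext = 1;
        /* ... then for the opening of a new one. */
        if (html[copy_from] == L'<')
            intext = 0;
        if (intext)
            html[copy_to++] = html[copy_from];
    }
    html[copy_to] = L'\0'; /* terminate the shortened string */
    return copy_to;        /* length of the text left behind */
}
```

<p>Because copy_to can never get ahead of copy_from, each write lands on a character that has already been examined, which is why no second buffer is needed.</p>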
<p>The following improvements were made for this version of the code:</p>
<ul>
<li>If no &lt;body&gt; tag is present, the function simply grabs everything (useful for snippets of HTML rather than only whole pages).</li>
<li>It only checks for special tags (&lt;script&gt;, &lt;body&gt;, &lt;style&gt;, &lt;/script&gt;, &lt;/body&gt;, &lt;/style&gt;) when a &lt; character has been detected.  This reduces the number of comparisons that must be made on every character down to three.</li>
<li>Fixed a major bug where the code checked for the closure of the previous tag after it checked for the opening of the current tag.  If the end of one tag touched the beginning of the next, all of the second tag would be included as if it were text (e.g. &quot;&lt;/p&gt;&lt;/b&gt;&quot; would be crushed to &quot;/b&gt;&quot; instead of &quot;&quot;).</li>
</ul>
<div style="font-family: Courier; line-height: 1">
//<br />
//&nbsp;does&nbsp;what&nbsp;name&nbsp;implies:&nbsp;case&nbsp;insensitive&nbsp;wide-character&nbsp;string&nbsp;comparison&nbsp;out&nbsp;to&nbsp;the&nbsp;count&#8217;th&nbsp;character<br />
//&nbsp;NOTE:&nbsp;wcsincmp&nbsp;is&nbsp;safe&nbsp;to&nbsp;call&nbsp;even&nbsp;if&nbsp;one&nbsp;or&nbsp;the&nbsp;other&nbsp;string&nbsp;terminates&nbsp;before&nbsp;count,&nbsp;but&nbsp;is&nbsp;not&nbsp;safe<br />
//&nbsp;to&nbsp;call&nbsp;if&nbsp;BOTH&nbsp;strings&nbsp;&nbsp;may&nbsp;terminate&nbsp;before&nbsp;count<br />
//<br />
int&nbsp;wcsincmp(const&nbsp;wchar_t*&nbsp;str1,&nbsp;const&nbsp;wchar_t*&nbsp;str2,&nbsp;size_t&nbsp;count)<br />
{<br />
&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;i;<br />
&nbsp;&nbsp;&nbsp;&nbsp;for(i=0;&nbsp;i&lt;count;&nbsp;i++)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(towupper(str1[i])&nbsp;!=&nbsp;towupper(str2[i]))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;1;<br />
}<br />
&nbsp;<br />
&nbsp;<br />
//<br />
//&nbsp;crushes&nbsp;the&nbsp;HTML&nbsp;string&nbsp;down,&nbsp;leaving&nbsp;only&nbsp;text&nbsp;behind;&nbsp;returns&nbsp;the&nbsp;length&nbsp;of&nbsp;the&nbsp;new&nbsp;string<br />
//<br />
size_t&nbsp;clean(wchar_t*&nbsp;html)<br />
{<br />
&nbsp;&nbsp;&nbsp;size_t&nbsp;copy_to&nbsp;=&nbsp;0;&nbsp;//the&nbsp;location&nbsp;inside&nbsp;string&nbsp;html&nbsp;that&nbsp;characters&nbsp;are&nbsp;being&nbsp;copied&nbsp;to<br />
&nbsp;&nbsp;&nbsp;size_t&nbsp;copy_from&nbsp;=&nbsp;0;&nbsp;//the&nbsp;location&nbsp;inside&nbsp;string&nbsp;html&nbsp;that&nbsp;characters&nbsp;are&nbsp;being&nbsp;copied&nbsp;from<br />
&nbsp;&nbsp;&nbsp;int&nbsp;intext&nbsp;=&nbsp;1;&nbsp;//are&nbsp;we&nbsp;currently&nbsp;looking&nbsp;at&nbsp;text&nbsp;or&nbsp;attributes&nbsp;inside&nbsp;a&nbsp;tag<br />
&nbsp;&nbsp;&nbsp;int&nbsp;inscript&nbsp;=&nbsp;0;&nbsp;//are&nbsp;we&nbsp;currently&nbsp;inside&nbsp;a&nbsp;javascript&nbsp;tag?<br />
&nbsp;&nbsp;&nbsp;int&nbsp;instyle&nbsp;=&nbsp;0;&nbsp;//are&nbsp;we&nbsp;currently&nbsp;inside&nbsp;a&nbsp;css&nbsp;style&nbsp;tag?<br />
&nbsp;&nbsp;&nbsp;int&nbsp;found_body&nbsp;=&nbsp;0;<br />
&nbsp;&nbsp;&nbsp;//keep&nbsp;going&nbsp;until&nbsp;we&nbsp;reach&nbsp;the&nbsp;end&nbsp;of&nbsp;the&nbsp;string&nbsp;(some&nbsp;conditions&nbsp;inside&nbsp;may&nbsp;cause&nbsp;early&nbsp;termination&nbsp;of&nbsp;the&nbsp;loop)<br />
&nbsp;&nbsp;&nbsp;for(&nbsp;;&nbsp;html[copy_from]&nbsp;!=&nbsp;L'\0';&nbsp;copy_from++)<br />
&nbsp;&nbsp;&nbsp;{&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(copy_from&nbsp;&amp;&amp;&nbsp;html[copy_from-1]&nbsp;==&nbsp;L'&gt;')&nbsp;//&nbsp;skips&nbsp;this&nbsp;one&nbsp;if&nbsp;copy_from&nbsp;is&nbsp;zero<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;intext&nbsp;=&nbsp;1;&nbsp;//if&nbsp;LAST&nbsp;character&nbsp;is&nbsp;'&gt;',&nbsp;we&nbsp;are&nbsp;out&nbsp;of&nbsp;a&nbsp;tag<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(html[copy_from]&nbsp;==&nbsp;L'&lt;')<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;intext&nbsp;=&nbsp;0;&nbsp;//if&nbsp;CURRENT&nbsp;character&nbsp;is&nbsp;'&lt;',&nbsp;we&nbsp;are&nbsp;back&nbsp;in&nbsp;a&nbsp;tag<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//since&nbsp;we&nbsp;are&nbsp;entering&nbsp;a&nbsp;tag,&nbsp;check&nbsp;to&nbsp;see&nbsp;if&nbsp;it&nbsp;is&nbsp;one&nbsp;of&nbsp;the&nbsp;special&nbsp;tags<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(wcsincmp(html+copy_from,&nbsp;L&quot;&lt;script&quot;,&nbsp;wcslen(L&quot;&lt;script&quot;)))&nbsp;//NOTE:&nbsp;decent&nbsp;compiler&nbsp;will&nbsp;move&nbsp;wcslen&nbsp;call&nbsp;out&nbsp;of&nbsp;loop<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inscript&nbsp;=&nbsp;1;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(wcsincmp(html+copy_from,&nbsp;L&quot;&lt;style&quot;,&nbsp;wcslen(L&quot;&lt;style&quot;)))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;instyle&nbsp;=&nbsp;1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(wcsincmp(html+copy_from,&nbsp;L&quot;&lt;body&quot;,&nbsp;wcslen(L&quot;&lt;body&quot;)))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;copy_to&nbsp;=&nbsp;0;&nbsp;//if&nbsp;we&nbsp;find&nbsp;an&nbsp;opening&nbsp;body&nbsp;tag,&nbsp;reset&nbsp;to&nbsp;the&nbsp;beginning&nbsp;of&nbsp;the&nbsp;string<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;found_body&nbsp;=&nbsp;1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(found_body&nbsp;&#038;&&nbsp;wcsincmp(html+copy_from,&nbsp;L&quot;&lt;/body&gt;&quot;,&nbsp;&nbsp;wcslen(L&quot;&lt;/body&quot;)))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;break;&nbsp;//if&nbsp;we&nbsp;find&nbsp;the&nbsp;end&nbsp;body&nbsp;tag,&nbsp;we&nbsp;are&nbsp;done;&nbsp;exit&nbsp;the&nbsp;loop<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(inscript)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;//inside&nbsp;a&nbsp;script,&nbsp;look&nbsp;for&nbsp;the&nbsp;end&nbsp;of&nbsp;the&nbsp;script<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(wcsincmp(html+copy_from,&nbsp;L&quot;&lt;/script&gt;&quot;,&nbsp;wcslen(L&quot;&lt;/script&gt;&quot;)))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inscript&nbsp;=&nbsp;0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(instyle)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;//inside&nbsp;a&nbsp;style,&nbsp;look&nbsp;for&nbsp;the&nbsp;end&nbsp;of&nbsp;the&nbsp;style<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(wcsincmp(html+copy_from,&nbsp;L&quot;&lt;/style&gt;&quot;,&nbsp;wcslen(L&quot;&lt;/style&gt;&quot;)))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;instyle&nbsp;=&nbsp;0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(intext&nbsp;&amp;&amp;&nbsp;!inscript&nbsp;&amp;&amp;&nbsp;!instyle)&nbsp;//skip&nbsp;non-text,&nbsp;script,&nbsp;and&nbsp;style<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;html[copy_to++]&nbsp;=&nbsp;html[copy_from];//copy,&nbsp;then&nbsp;increment&nbsp;pointer&nbsp;for&nbsp;next&nbsp;time<br />
&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;html[copy_to]&nbsp;=&nbsp;L'\0';&nbsp;//pointer&nbsp;was&nbsp;left&nbsp;one&nbsp;past&nbsp;the&nbsp;last&nbsp;valid&nbsp;character&nbsp;by&nbsp;loop<br />
&nbsp;&nbsp;&nbsp;return&nbsp;copy_to;&nbsp;//length&nbsp;is&nbsp;equal&nbsp;to&nbsp;the&nbsp;last&nbsp;index&nbsp;+&nbsp;1,&nbsp;aka&nbsp;the&nbsp;index&nbsp;of&nbsp;the&nbsp;null&nbsp;terminator<br />
}
</div>
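<p>The NOTE on wcsincmp above says it is not safe to call when both strings may terminate before count.  In clean() that never happens, because one argument is always a literal at least count characters long.  For more general use, a bounded variant might look like the sketch below; the name wcsincmp_bounded is hypothetical, and treating two strings that end together as a match is an assumption about the desired behavior.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical variant of wcsincmp: returns 1 if the first `count`
 * characters match case-insensitively, 0 otherwise.
 * It stops at a terminator, so it never reads past the end of either string. */
int wcsincmp_bounded(const wchar_t *str1, const wchar_t *str2, size_t count)
{
    assert(str1 != NULL && str2 != NULL);
    size_t i;
    for (i = 0; i < count; i++) {
        if (towupper((wint_t)str1[i]) != towupper((wint_t)str2[i]))
            return 0; /* mismatch, or only one string has ended */
        if (str1[i] == L'\0')
            return 1; /* both strings ended together before count */
    }
    return 1;
}
```

<p>The extra terminator check costs one comparison per character, so it is a trade of a little speed for safety on arbitrary inputs.</p>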
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=75</wfw:commentRss>
		</item>
		<item>
		<title>Extending Python&#8230;</title>
		<link>http://kurtrose.com/thevalley/?p=62</link>
		<comments>http://kurtrose.com/thevalley/?p=62#comments</comments>
		<pubDate>Mon, 15 Jun 2009 11:34:11 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=62</guid>
		<description><![CDATA[I have successfully created my first C Python module!
The purpose of the module is to extract just the text from an HTML document.  The problem with existing solutions, is that they are all based on semantically parsing the HTML into some kind of tree structure.  This is much more complicated than necessary, and [...]]]></description>
			<content:encoded><![CDATA[<p>I have successfully created my first C Python module!</p>
<p>The purpose of the module is to extract just the text from an HTML document.  The problem with existing solutions is that they are all based on semantically parsing the HTML into some kind of tree structure.  This is much more complicated than necessary, and also brittle (e.g. an unclosed tag can cause the parsing to fail, even though this has no impact on extracting the text).</p>
<p>So, what this module does is simply go left to right down the string, copying everything that is not a tag to the beginning of the string.  This performs an extremely efficient in-place &#8220;compaction&#8221;.</p>
<p>I thought about implementing this method in Python; however, the problem that I could not get around was the immutability of strings in Python.  Because a string cannot be modified in place, every change would have to build a new string, which seems like it would require far too many copies for C-style character-by-character processing.</p>
<p>Here is the core algorithm.  As always, feel free to include this in whatever you are working on.</p>
<div style="font-family: Courier; line-height: 1">
&nbsp;<br />
#include&nbsp;&lt;wchar.h&gt;<br />
#include&nbsp;&lt;wctype.h&gt;<br />
&nbsp;<br />
//<br />
//&nbsp;does&nbsp;what&nbsp;name&nbsp;implies:&nbsp;case&nbsp;insensitive&nbsp;wide-character&nbsp;string&nbsp;comparison&nbsp;out&nbsp;to&nbsp;the&nbsp;count&#8217;th&nbsp;character<br />
//<br />
int&nbsp;wcsincmp(const&nbsp;wchar_t*&nbsp;str1,&nbsp;const&nbsp;wchar_t*&nbsp;str2,&nbsp;size_t&nbsp;count)<br />
{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;i;<br />
&nbsp;&nbsp;&nbsp;for(i=0;&nbsp;i&lt;count;&nbsp;i++)<br />
&nbsp;&nbsp;&nbsp;if(towupper(str1[i])&nbsp;!=&nbsp;towupper(str2[i]))&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;0;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;return&nbsp;1;<br />
}<br />
&nbsp;<br />
//<br />
//&nbsp;crushes&nbsp;the&nbsp;HTML&nbsp;string&nbsp;down,&nbsp;leaving&nbsp;only&nbsp;text&nbsp;behind;&nbsp;returns&nbsp;the&nbsp;length&nbsp;of&nbsp;the&nbsp;new&nbsp;string&nbsp;(minus&nbsp;null&nbsp;terminator,&nbsp;as&nbsp;standard&nbsp;strlen&nbsp;does)<br />
//<br />
size_t&nbsp;clean(wchar_t*&nbsp;html)<br />
{<br />
&nbsp;&nbsp;&nbsp;size_t&nbsp;copy_to&nbsp;=&nbsp;0;&nbsp;//the&nbsp;location&nbsp;inside&nbsp;string&nbsp;html&nbsp;that&nbsp;characters&nbsp;are&nbsp;being&nbsp;copied&nbsp;to<br />
&nbsp;&nbsp;&nbsp;size_t&nbsp;copy_from&nbsp;=&nbsp;0;&nbsp;//the&nbsp;location&nbsp;inside&nbsp;string&nbsp;html&nbsp;that&nbsp;characters&nbsp;are&nbsp;being&nbsp;copied&nbsp;from<br />
&nbsp;&nbsp;&nbsp;int&nbsp;intext&nbsp;=&nbsp;0;&nbsp;//are&nbsp;we&nbsp;currently&nbsp;looking&nbsp;at&nbsp;text&nbsp;or&nbsp;attributes&nbsp;inside&nbsp;a&nbsp;tag<br />
&nbsp;&nbsp;&nbsp;int&nbsp;inscript&nbsp;=&nbsp;0;&nbsp;//are&nbsp;we&nbsp;currently&nbsp;inside&nbsp;a&nbsp;javascript&nbsp;tag?<br />
&nbsp;&nbsp;&nbsp;for(&nbsp;;&nbsp;html[copy_from]&nbsp;!=&nbsp;L'\0'&nbsp;&amp;&amp;&nbsp;!wcsincmp(html+copy_from,&nbsp;L&quot;&lt;body&quot;,&nbsp;wcslen(L&quot;&lt;body&quot;));&nbsp;copy_from++);&nbsp;//find&nbsp;the&nbsp;start&nbsp;of&nbsp;the&nbsp;body<br />
&nbsp;&nbsp;&nbsp;for(&nbsp;;&nbsp;html[copy_from]&nbsp;!=&nbsp;L'\0'&nbsp;&amp;&amp;&nbsp;html[copy_from]&nbsp;!=&nbsp;L'&gt;';&nbsp;copy_from++);&nbsp;//find&nbsp;the&nbsp;end&nbsp;of&nbsp;the&nbsp;body&nbsp;tag<br />
&nbsp;&nbsp;&nbsp;if(copy_from&nbsp;==&nbsp;0)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;0;&nbsp;//empty&nbsp;string;&nbsp;loop&nbsp;below&nbsp;requires&nbsp;that&nbsp;copy_from&nbsp;has&nbsp;advanced&nbsp;at&nbsp;least&nbsp;1&nbsp;character<br />
&nbsp;&nbsp;&nbsp;for(&nbsp;;&nbsp;html[copy_from]&nbsp;!=&nbsp;L'\0'&nbsp;&amp;&amp;&nbsp;!wcsincmp(html+copy_from,&nbsp;L&quot;&lt;/body&gt;&quot;,&nbsp;wcslen(L&quot;&lt;/body&quot;));&nbsp;copy_from++)<br />
&nbsp;&nbsp;&nbsp;{&nbsp;//finally,&nbsp;go&nbsp;through&nbsp;the&nbsp;body&nbsp;until&nbsp;we&nbsp;find&nbsp;the&nbsp;end&nbsp;of&nbsp;body&nbsp;tag&nbsp;or&nbsp;the&nbsp;end&nbsp;of&nbsp;the&nbsp;file<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(wcsincmp(html+copy_from,&nbsp;L&quot;&lt;script&quot;,&nbsp;wcslen(L&quot;&lt;script&quot;)))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inscript&nbsp;=&nbsp;1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(inscript)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;//inside&nbsp;a&nbsp;script,&nbsp;just&nbsp;look&nbsp;for&nbsp;the&nbsp;end&nbsp;of&nbsp;the&nbsp;script<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(wcsincmp(html+copy_from,&nbsp;L&quot;&lt;/script&gt;&quot;,&nbsp;wcslen(L&quot;&lt;/script&gt;&quot;)))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inscript&nbsp;=&nbsp;0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;//otherwise,&nbsp;are&nbsp;we&nbsp;in&nbsp;a&nbsp;tag&nbsp;or&nbsp;not&nbsp;in&nbsp;a&nbsp;tag?<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(html[copy_from-1]&nbsp;==&nbsp;L'&gt;')<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;intext&nbsp;=&nbsp;1;&nbsp;//if&nbsp;LAST&nbsp;character&nbsp;is&nbsp;'&gt;',&nbsp;start&nbsp;copying&nbsp;again<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(html[copy_from]&nbsp;==&nbsp;L'&lt;')<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;intext&nbsp;=&nbsp;0;&nbsp;//if&nbsp;CURRENT&nbsp;character&nbsp;is&nbsp;'&lt;',&nbsp;stop&nbsp;copying<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if(intext)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;html[copy_to++]&nbsp;=&nbsp;html[copy_from];&nbsp;//copy,&nbsp;then&nbsp;increment&nbsp;pointer&nbsp;for&nbsp;next&nbsp;time&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;html[copy_to]&nbsp;=&nbsp;L'\0';&nbsp;//pointer&nbsp;was&nbsp;left&nbsp;one&nbsp;past&nbsp;the&nbsp;last&nbsp;valid&nbsp;character&nbsp;by&nbsp;loop<br />
&nbsp;&nbsp;&nbsp;return&nbsp;copy_to;&nbsp;<br />
}
</div>
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=62</wfw:commentRss>
		</item>
		<item>
		<title>A better way to get random numbers</title>
		<link>http://kurtrose.com/thevalley/?p=54</link>
		<comments>http://kurtrose.com/thevalley/?p=54#comments</comments>
		<pubDate>Sat, 30 May 2009 21:23:44 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=54</guid>
		<description><![CDATA[First, the punchline.  There is a method to generate a sequence of pseudo-random numbers more than five times faster than the standard C rand() (138 million random integers per second versus 26 million on my machine).  Even better, this method only requires seven lines of code.

int lfsr() //linear feedback shift register
{
&#160;&#160;&#160;const int taps = [...]]]></description>
			<content:encoded><![CDATA[<p>First, the punchline.  There is a method to generate a sequence of pseudo-random numbers more than five times faster than the standard C rand() (138 million random integers per second versus 26 million on my machine).  Even better, this method only requires seven lines of code.</p>
<p><span style="font-family: courier,monospace;"><br />
int lfsr() //linear feedback shift register<br />
{<br />
&nbsp;&nbsp;&nbsp;const int taps = 0x8000000C; //this is a magic number<br />
&nbsp;&nbsp;&nbsp;static int rnd_num = 1;<br />
&nbsp;&nbsp;&nbsp;if(0x80000000 &amp; rnd_num)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rnd_num = (rnd_num &lt;&lt; 1) ^ taps;<br />
&nbsp;&nbsp;&nbsp;else<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rnd_num = rnd_num &lt;&lt; 1;<br />
&nbsp;&nbsp;&nbsp;return rnd_num;<br />
}<br />
</span></p>
<p>A Linear Feedback Shift Register is a method used by hardware engineers to generate pseudo-random numbers at extremely high speed.  As in 3Gbps speed.  When implemented in hardware, an LFSR actually takes fewer transistors than a counter.  I&#8217;m not going to go into the theory of operation.  If you&#8217;d like to explore that, Wikipedia has a <a href="http://en.wikipedia.org/wiki/Linear_feedback_shift_register">pretty good article</a>.</p>
<p>Feel free to cut and paste the above function into whatever project you are working on.</p>
<p>The only downside of an LFSR is that it is totally unsuited to cryptography.  A maximal period 32-bit LFSR cycles through every possible nonzero 32 bit integer before repeating, and each output is the entire internal state.  This means if you know the taps and the last number, you know all future numbers that will be generated.  A generator with more &#8220;hidden state&#8221;, such as the Mersenne Twister algorithm used by many higher level languages, makes an observer collect hundreds of outputs before being able to make this prediction.</p>
<p>So, don&#8217;t use this method if your purpose is cryptography.  However, if you are running a Monte Carlo simulation or generating random inputs for unit tests, a Linear Feedback Shift Register is a much faster alternative to the default C rand().</p>
<p>Edit:<br />
You should only do this substitution in C.  If you are working in a higher level language, the language overhead will probably cost more than using the built-in call to the default C implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=54</wfw:commentRss>
		</item>
		<item>
		<title>Javascript Execution Environment</title>
		<link>http://kurtrose.com/thevalley/?p=48</link>
		<comments>http://kurtrose.com/thevalley/?p=48#comments</comments>
		<pubDate>Sat, 24 Jan 2009 06:46:24 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=48</guid>
		<description><![CDATA[I&#8217;ve set up a Javascript execution environment.
It isn&#8217;t much &#8212; there are two elements in the page.  The bottom part is a Javascript command line which executes whatever single line of Javascript code is entered.  The top part is an edit area to write larger pieces of Javascript code.  The larger piece [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://kurtrose.com/JavaCommander/">I&#8217;ve set up a Javascript execution environment.</a></p>
<p>It isn&#8217;t much &#8212; there are two elements in the page.  The bottom part is a Javascript command line which executes whatever single line of Javascript code is entered.  The top part is an edit area to write larger pieces of Javascript code.  The larger piece of code can be executed by calling the &#8220;run()&#8221; function from the command line.</p>
<p>Also, a command history is maintained on the command line.  Use the up and down arrows to scroll through previously entered commands.  The contents of the top editor area can be stored and restored by using the File-&gt;Save Cookie and File-&gt;Load Cookie functions.</p>
<p>Go ahead and try it; you can start with alert(&quot;Hello World!&quot;) in the command line.</p>
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=48</wfw:commentRss>
		</item>
		<item>
		<title>Learning Linux: syntax highlighting in nano</title>
		<link>http://kurtrose.com/thevalley/?p=39</link>
		<comments>http://kurtrose.com/thevalley/?p=39#comments</comments>
		<pubDate>Wed, 21 Jan 2009 07:43:43 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=39</guid>
		<description><![CDATA[In case you are not familiar with it, nano is a basic text editor that comes with many Linux distributions.  It has a lower learning curve than vi or emacs, but also fewer features.
However, you can spruce nano up a bit by enabling features in its configuration file, &#8220;.nanorc&#8221; under your home directory.  Specifically, you [...]]]></description>
			<content:encoded><![CDATA[<p>In case you are not familiar with it, nano is a basic text editor that comes with many Linux distributions.  It has a lower learning curve than vi or emacs, but also fewer features.</p>
<p>However, you can spruce nano up a bit by enabling features in its configuration file, &#8220;.nanorc&#8221; under your home directory.  Specifically, you can provide regular expression based syntax highlighting rules.</p>
<p>For your entertainment, I present meta-syntax highlighting rules for nano.  These syntax rules, when put into your .nanorc file, will cause nano to syntax highlight the .nanorc file itself.</p>
<div style="font-family:courier">
syntax &#34;nanorc&#34; &#34;\.nanorc$&#34;<br />
color brightred &#34;(color|syntax|set|start=|end=)&#34;<br />
color green &#34;(red|brightred|green|brightgreen|blue|brightblue|cyan|brightcyan|white|brightwhite|yellow|brightyellow)&#34;<br />
color brightcyan &#34;#+(.*)&#34;<br />
color brightyellow &#34;\&#34;(([^']|\\\&#34;)*|\[(^]]|\\\]|\^\])*\])*\&#34;&#34;<br />
color brightblue &#34;[^\\\^]\[(\^\])?([^]]|[\\]{1,3,5,7,9}\])*\]&#34;<br />
color brightgreen &#34;\(&#34; &#34;\)&#34; &#34;[|]&#34;<br />
color brightred &#34;\^&#34;<br />
color red &#34;\\.&#34;
</div>
<p>Here is what this looks like when applied (to itself):</p>
<p><img src="http://www.kurtrose.com/thevalley/wp-content/images/nano_meta_highlight.jpg" /></p>
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=39</wfw:commentRss>
		</item>
		<item>
		<title>How to make a cool photo quality 3&#8242;x3&#8242; Hubble image poster for under $5&#8230;</title>
		<link>http://kurtrose.com/thevalley/?p=31</link>
		<comments>http://kurtrose.com/thevalley/?p=31#comments</comments>
		<pubDate>Thu, 14 Aug 2008 06:53:12 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=31</guid>
		<description><![CDATA[
Download a high quality image from here: Best of Hubble Large Images.
Cut the picture into 1800&#215;1200 pixel chunks (this will look good printed out on standard 4&#215;6 photos).  There are probably many ways to do this, the easiest (free) method I have found so far is to use Irfanview.  Use &#8220;Edit-&#62;Create Custom Crop [...]]]></description>
			<content:encoded><![CDATA[<ol>
<li>Download a high quality image from here: <a href="http://opostaff.stsci.edu/~levay/picks/big.html">Best of Hubble Large Images</a>.</li>
<li>Cut the picture into 1800&#215;1200 pixel chunks (this will look good printed out on standard 4&#215;6 photos).  There are probably many ways to do this; the easiest (free) method I have found so far is to use <a href="http://www.irfanview.com/">IrfanView</a>.  Use &#8220;Edit-&gt;Create Custom Crop Selection(Shift+C)&#8221;.  This will let you select an exact rectangle of pixels.  Then cut and paste your selection into a separate image file and save that one.  Work your way across the image 1800 pixels at a time until you reach the end, then go 1200 pixels down and repeat.</li>
<li>Upload the set of smaller images to a professional photo printer.  The best I have found so far is <a href="http://www.clarkcolor.com">Clark Color</a>.  Printing the photos shouldn&#8217;t cost more than $0.08 each plus shipping.</li>
<li>Tape the photos up to a flat, clean wall.  (Or you could get fancy with glue and frames, but I promised this would be under $5).</li>
</ol>
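<p>The arithmetic behind the steps above can be sketched as follows.  The function names are hypothetical, and the 10800&#215;10800 pixel image size is an assumption chosen so that 1800&#215;1200 tiles printed as 6&#215;4 inch photos (300 pixels per inch) cover exactly 3 feet by 3 feet; a real Hubble image will have different dimensions.</p>

```c
#include <assert.h>

#define TILE_W 1800 /* pixels across one photo (6 inches at 300 ppi) */
#define TILE_H 1200 /* pixels down one photo (4 inches at 300 ppi)   */

/* Number of tiles needed to cover `pixels` along one axis, rounding up
 * so a partial tile at the edge still gets printed. */
int tiles_along(int pixels, int tile)
{
    assert(pixels > 0 && tile > 0);
    return (pixels + tile - 1) / tile;
}

/* Total photos needed for an image of the given size. */
int photo_count(int width, int height)
{
    return tiles_along(width, TILE_W) * tiles_along(height, TILE_H);
}

/* Print cost in cents, before shipping. */
int cost_cents(int photos, int cents_each)
{
    return photos * cents_each;
}
```

<p>Under those assumptions, photo_count(10800, 10800) gives 6 columns by 9 rows, or 54 photos, and cost_cents(54, 8) gives 432 cents, which is where the under $5 figure comes from before shipping.</p>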
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=31</wfw:commentRss>
		</item>
		<item>
		<title>Photonics talk</title>
		<link>http://kurtrose.com/thevalley/?p=11</link>
		<comments>http://kurtrose.com/thevalley/?p=11#comments</comments>
		<pubDate>Fri, 08 Aug 2008 07:06:10 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Technical Talks]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[ieee]]></category>

		<category><![CDATA[photonics]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=11</guid>
		<description><![CDATA[I went to a talk put on by IEEE on Tuesday.
Abstract
Research Group Info
That has a lot of information, but assumes you are already basically familiar with the field.  Here is a layman&#8217;s summary: using lithographic techniques, one can carve laser devices out of a thin dielectric (non-conducting) surface.  The reason this is possible [...]]]></description>
			<content:encoded><![CDATA[<p>I went to a talk put on by IEEE on Tuesday.<br />
<a href="http://ewh.ieee.org/r6/scv/leos/archive/leosabs20080805.html">Abstract</a><br />
<a href="http://mpdg.usc.edu/">Research Group Info</a></p>
<p>That has a lot of information, but assumes you are already basically familiar with the field.  Here is a layman&#8217;s summary: using lithographic techniques, one can carve laser devices out of a thin dielectric (non-conducting) surface.  The reason this is possible is that modern lithography technology can etch features smaller than visible light wavelengths.</p>
<div class="wp-caption alignleft" style="width: 110px"><a href="http://kurtrose.com/thevalley/wp-content/images/usc_photonic_crystal_defect_laser.jpg"><img title="Photonic Crystal Defect Laser" src="http://kurtrose.com/thevalley/wp-content/images/usc_photonic_crystal_defect_laser.jpg" alt="" width="100" height="100" /></a><p class="wp-caption-text">Image Credit USC Viterbi School of Engineering</p></div>
<p>If the structure is carved out in just the right shape, then, sort of like a pendulum, it will have a natural oscillation frequency.  If the film is excited with energy, it will radiate coherent light at a frequency determined by its geometry.  Currently, that energy is in the form of light, but hopefully in the future it will be an electric current.</p>
<p>The complicated part is figuring out what geometry will get the light to come out efficiently, that is, in the desired direction and at the desired frequency.  This takes supercomputer simulations and intuition to determine.</p>
<p>Fascinating stuff, and some of the people in the audience asked some great questions.  Dr. O&#8217;Brien was mostly interested in getting the technology to the point of being able to integrate with semiconductors.</p>
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=11</wfw:commentRss>
		</item>
		<item>
		<title>Hello World!</title>
		<link>http://kurtrose.com/thevalley/?p=5</link>
		<comments>http://kurtrose.com/thevalley/?p=5#comments</comments>
		<pubDate>Thu, 07 Aug 2008 06:30:28 +0000</pubDate>
		<dc:creator>Kurt</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=5</guid>
		<description><![CDATA[Hello anyone who is actually reading this right now.
I&#8217;m not sure if I&#8217;m going to use this for anything, but it was neat to set up.
]]></description>
			<content:encoded><![CDATA[<p>Hello anyone who is actually reading this right now.</p>
<p>I&#8217;m not sure if I&#8217;m going to use this for anything, but it was neat to set up.</p>
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=5</wfw:commentRss>
		</item>
		<item>
		<title>Hello World!</title>
		<link>http://kurtrose.com/thevalley/?p=3</link>
		<comments>http://kurtrose.com/thevalley/?p=3#comments</comments>
		<pubDate>Thu, 07 Aug 2008 06:26:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kurtrose.com/thevalley/?p=3</guid>
		<description><![CDATA[Hello friends and family that would actually be reading this.
]]></description>
			<content:encoded><![CDATA[<p>Hello friends and family that would actually be reading this.</p>
]]></content:encoded>
			<wfw:commentRss>http://kurtrose.com/thevalley/?feed=rss2&amp;p=3</wfw:commentRss>
		</item>
	</channel>
</rss>
