<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Neofreko</title>
	<atom:link href="http://blog.neofreko.com/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.neofreko.com</link>
	<description>Nothing but neofreko</description>
	<lastBuildDate>Tue, 07 Feb 2012 08:51:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>#php #unicode #insertcursewordhere</title>
		<link>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/</link>
		<comments>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 08:44:22 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=922</guid>
		<description><![CDATA[PHP and Unicode: just a well-known secret. My story began with the SOLR DataImportHandler (DIH). It was way too slow, so I ended up building another tool to replace it: something friendly to CPU and memory. I did it. Not. After &#8230; <a href="http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>PHP and Unicode: just a well-known secret.</p>
<p>My story began with the SOLR DataImportHandler (DIH). It was way too slow, so I ended up building another tool to replace it: something friendly to CPU and memory. I did it. Not.</p>
<p>After indexing, I realized that my text was full of ???????. WTF. Yeah, it&#8217;s an encoding problem. So I spent a day trying to solve this thing. What worked for me was this <a href="http://www.php.net/manual/en/ref.mbstring.php#50298">advice from 2005</a>:</p>
<blockquote><p>PHP can input and output Unicode, but a little different from what Microsoft means: when Microsoft says &#8220;Unicode&#8221;, it unexplicitly means little-endian UTF-16 with BOM (FF FE = chr(255).chr(254)), whereas PHP&#8217;s &#8220;UTF-16&#8221; means big-endian with BOM. For this reason, PHP does not seem to be able to output Unicode CSV file for Microsoft Excel. Solving this problem is quite simple: just put BOM in front of UTF-16LE string.</p>
<p>Example:</p>
<p>$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding($utf8_str, 'UTF-16LE', 'UTF-8');</p></blockquote>
<p>I get no ??? characters anymore. I don&#8217;t know if this is the proper way to do it, and I still get an occasional &#8220;invalid multibyte sequence&#8221; warning from htmlspecialchars. I think I&#8217;ll classify this solution as a &#8220;miracle&#8221;.</p>
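<p>For reference, here is what the quoted snippet does at the byte level, sketched in Python rather than PHP (my choice for illustration, not part of the original advice). It only shows how the BOM and the UTF-16LE bytes fit together; it is not a fix for the indexing problem, and the function name is made up.</p>

```python
import codecs


def to_excel_utf16(text):
    """Encode text as little-endian UTF-16 with a leading BOM (FF FE)."""
    # codecs.BOM_UTF16_LE is the two bytes FF FE, the same prefix the
    # quoted PHP snippet builds with chr(255).chr(254).
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    payload = to_excel_utf16("caf\u00e9")
    print(payload[:2].hex())         # the BOM bytes
    print(payload.decode("utf-16"))  # the generic codec consumes the BOM
```

<p>Decoding the result with the generic utf-16 codec is a quick way to check that the BOM and the byte order agree.</p>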
<p>When will PHP 6 finally come?</p>
<p><strong>Update:</strong></p>
<p><strong>CRAP. DOES NOT WORK!</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Mahout</title>
		<link>http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/</link>
		<comments>http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/#comments</comments>
		<pubDate>Sat, 14 Jan 2012 11:11:00 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[mahout]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=919</guid>
		<description><![CDATA[My brain exploded. That&#8217;s pretty much my limit. So, yes, I&#8217;ve been interested in SOLR, Apache Tika, and of course Mahout. The promise of classifying and clustering data is enough to persuade me to dig up examples about Mahout. So far, &#8230; <a href="http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>My brain exploded. That&#8217;s pretty much my limit.</p>
<p>So, yes, I&#8217;ve been interested in SOLR, Apache Tika, and of course Mahout. The promise of classifying and clustering data is enough to persuade me to dig up examples about Mahout. So far, what really helps is the Seinfeld demo example. It gives me a proper example to try. We can replace the data with our own to get the gist of how Mahout works.</p>
<p>However, I haven&#8217;t gotten the gist yet. So far, I&#8217;ve tried to cluster 2 data sources. One of them is the blog posts from navinot.com. Here&#8217;s an excerpt from the cluster dump:</p>
<p>C-18 [Ponsel, Mobile, Internet, Mobile internet, Iphone]<br />
- /6 Hal Tentang Mobile Internet.txt<br />
- /Mobile Application_ Masa Depan Yang Ditunggu?.txt<br />
- /Netbook_ Bakal Lenyap Seperti PDA?.txt<br />
- /Premium Mobile Internet?.txt<br />
- /The Gaps in Indonesian Internet.txt<br />
- /iPhone &amp; Telkomsel_ Deal or No Deal?.txt</p>
<p>I was imagining Mahout would cluster the posts into groups by similarity. My guess is that they were clustered by keyword. I was using k-means.</p>
<p>Anyway, obviously we need to filter out stopwords. Mahout can read directly from a SOLR/Lucene index, but I didn&#8217;t have much luck with it. Something to do with empty terms or whatever. Feeding my raw data to SOLR and then querying it back out into text files will probably make a decent workaround.</p>
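<p>A minimal sketch of the stopword filtering idea, in plain Python. The stopword list here is a tiny made-up sample of Indonesian and English words, and none of this is Mahout or Lucene API; a real run would use a proper list and the analyzer that feeds the index.</p>

```python
import re

# Hypothetical stopword list: a few common Indonesian and English words.
STOPWORDS = {"yang", "di", "dan", "the", "in", "or", "no", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(text, stopwords=STOPWORDS):
    """Return the tokens of text that are not in the stopword set."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("The Gaps in Indonesian Internet"))
```

<p>Running the post titles through something like this before clustering should keep words such as "the" and "yang" from ending up as cluster keywords.</p>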
<p>That&#8217;s a wrap for today. Time for Pocket Legend!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to use Lucene 3.4 with Mahout 0.5</title>
		<link>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 07:39:09 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=916</guid>
		<description><![CDATA[As you may have found out the frustrating way, Mahout 0.5 was built against Lucene 3.1 dependencies. How on earth can we use Lucene 3.4 then? My SOLR is 3.4, and I want to use its index to play with Mahout. Fear not. &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As you may have found out the frustrating way, Mahout 0.5 was built against Lucene 3.1 dependencies. How on earth can we use Lucene 3.4 then? My SOLR is 3.4, and I want to use its index to play with Mahout.</p>
<p>Fear not. Just download Mahout 0.5, both source and binaries. Extract them; they will land in the same folder, i.e. mahout-distribution-0.5. Now, open up the pom.xml. Find lucene and replace 3.1.0 with 3.4.0. I reckon there are only 4 of them. Then do mvn install. You may want to skip tests with: mvn -DskipTests=true install.</p>
<p>Once done, do: export MAHOUT_CORE=1</p>
<p>Run mahout from the mahout-distribution-0.5/bin folder.</p>
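<p>If you would rather script the pom.xml edit than do it by hand, here is a rough Python sketch. It assumes the Lucene entries are ordinary dependency blocks with an org.apache.lucene groupId and a literal version tag; I have not checked that against the real Mahout 0.5 pom.xml, and the sample pom text is made up.</p>

```python
import re

DEPENDENCY = re.compile(r"<dependency>.*?</dependency>", re.DOTALL)


def bump_lucene(pom_text, old="3.1.0", new="3.4.0"):
    """Replace the version only inside dependency blocks that mention Lucene."""
    old_tag = "<version>" + old + "</version>"
    new_tag = "<version>" + new + "</version>"
    changed = 0

    def rewrite(match):
        nonlocal changed
        block = match.group(0)
        if "org.apache.lucene" not in block or old_tag not in block:
            return block  # leave unrelated dependencies untouched
        changed += 1
        return block.replace(old_tag, new_tag)

    return DEPENDENCY.sub(rewrite, pom_text), changed


# Hypothetical pom fragment, for illustration only.
SAMPLE_POM = """
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>3.1.0</version>
</dependency>
"""

if __name__ == "__main__":
    new_pom, count = bump_lucene(SAMPLE_POM)
    print(count, "dependency block(s) updated")
```

<p>Scoping the replacement to Lucene blocks avoids touching some other dependency that happens to be at 3.1.0 too.</p>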
<p>I don&#8217;t get index incompatibility errors anymore. But I keep getting complaints about not enough term vectors on the documents, even though I&#8217;ve updated schema.xml and reindexed my docs.</p>
<p>Will write more once I get past it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>11 Things I want in Japan</title>
		<link>http://blog.neofreko.com/index.php/2011/12/16/11-things-i-want-in-japan/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/16/11-things-i-want-in-japan/#comments</comments>
		<pubDate>Thu, 15 Dec 2011 20:02:18 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[I love Japan]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=913</guid>
		<description><![CDATA[When Hiro asked me what I particularly wanted to see in Japan, I spaced out. It turned out that I didn&#8217;t really have a list. But when I thought about it again, I did have a short list[1]. Hatsune &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/16/11-things-i-want-in-japan/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When Hiro asked me what I particularly wanted to see in Japan, I spaced out. It turned out that I didn&#8217;t really have a list. But when I thought about it again, I did have a short list[1].</p>
<ol>
<li>Hatsune Miku: either owning her figure or, ultimately, watching her live concert.</li>
<li>Dollfie: see one, touch it. Owning one can wait until I have moved into my own apartment (and have permission from my wife).</li>
<li>Akihabara. It&#8217;s a common item on everyone&#8217;s list, I believe. Visiting a maid cafe would probably be nice.</li>
<li>Comiket. Seeing a field of figures would be awesome. The crowd looks scary tho.</li>
<li>Tokyo Game Show. Same as above.</li>
<li>My own figure collection. I believe my taste differs from @dannychoo&#8217;s. Not on the Dollfie part tho <img src='http://blog.neofreko.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> . I have been wanting a Macross figure and other figures I saw in some of dannychoo&#8217;s pics.</li>
<li>See cosplayers at Yoyogi Park.</li>
<li>Tanabata festival. Fireworks show. I want to wear a yukata someday.</li>
<li>Taking photos of lots of sailorfuku school girls.</li>
<li>Speaking of sailorfuku, I love the girl band Scandal. I want to have their merchandise.</li>
<li>Dir En Grey concert? I want to scream along to the song Dozing Green.</li>
</ol>
<div>More will come.</div>
<div>Yeah, I am in Japan right now. It still feels unreal, even after almost a week. I keep telling my wife that I am living in a dream. She says it&#8217;s the jet lag.</div>
<div>The tag of this blog finally came true.</div>
<div>Yeayyyyyyyyyyyyyyyyyyyyyyyyyyy! Aaaaaaaaaaaa! Saiko desuuuuuuuuuuuuuuuuuuuu! Yeaaaayyyyyyyyyyy! *rolling on the floor, bursting happiness tears*</div>
<div>Footnote:</div>
<div>[1] This is pretty common for me. My brain spins slowly, and it tends to work best in writing, when I can take more time to think. This results in me being unspontaneous.</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/16/11-things-i-want-in-japan/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>bacula-fd authentication failed</title>
		<link>http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/#comments</comments>
		<pubDate>Tue, 06 Dec 2011 08:49:24 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[bacula]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=910</guid>
		<description><![CDATA[So, I&#8217;ve been trying to set up a two-tier Bacula. Stuck on &#8220;cannot connect to client&#8221;. To grab more clues, run this line on the bacula-fd machine: sudo /usr/sbin/bacula-fd -f -d100 -c /etc/bacula/bacula-fd.conf Then do the bconsole dance on the bacula-dir machine. Use the &#8220;status&#8221; command to &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>So, I&#8217;ve been trying to set up a two-tier Bacula. Stuck on &#8220;cannot connect to client&#8221;.</p>
<p>To grab more clues, run this line on the bacula-fd machine:</p>
<p>sudo /usr/sbin/bacula-fd -f -d100 -c /etc/bacula/bacula-fd.conf</p>
<p>Then do the bconsole dance on the bacula-dir machine. Use the &#8220;status&#8221; command to test the connection to the client. If you see &#8220;cram-md5 authentication failed&#8221; in the bacula-fd output, then you have the same problem as I did. Otherwise, check the connection between bacula-dir and bacula-fd.</p>
<p>Here&#8217;s the solution:</p>
<p>in bacula-fd.conf:</p>
<pre>Director {
  Name = bacula-director
  Password = "remote-fd-passwd"
}</pre>
<p>&#8220;Name&#8221; should be your bacula-dir Name. You can find this in bacula-dir.conf. See below:</p>
<pre>Director {                            # define myself
  Name = bacula-director
  DIRport = 9101                # where we listen for UA connections
  QueryFile = "/etc/bacula/scripts/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run/bacula"
  Maximum Concurrent Jobs = 1
  Password = "blahblahblah"         # Console password
  Messages = Daemon
  DirAddress = 127.0.0.1
}</pre>
<p>Then the password in bacula-fd.conf should be the same as the one in your client definition in bacula-dir.conf, e.g.:</p>
<pre>Client {
  Name = remote-fd
  Address = remote.fd.ip
  FDPort = 9102
  Catalog = MyCatalog
  Password = "remote-fd-passwd"          # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}</pre>
<p>Don&#8217;t forget to restart bacula-dir and bacula-fd after modifying conf files. Good luck!</p>
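<p>To double check the two files against each other, here is a small Python sketch. It is a naive parser written for this post, not a Bacula tool: it assumes simple "Type { key = value }" blocks with no nested braces and no "#" inside passwords, and the function names are made up.</p>

```python
import re

RESOURCE = re.compile(r"(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_resources(conf_text):
    """Parse 'Type { key = value ... }' blocks into (type, settings) pairs."""
    resources = []
    for kind, body in RESOURCE.findall(conf_text):
        settings = {}
        for line in body.splitlines():
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip().lower()] = value.strip().strip('"')
        resources.append((kind, settings))
    return resources


def find_password(conf_text, kind, name):
    """Return the Password of the resource with this type and Name, if any."""
    for res_kind, settings in parse_resources(conf_text):
        if res_kind == kind and settings.get("name") == name:
            return settings.get("password")
    return None


def passwords_match(fd_conf, dir_conf, director_name, client_name):
    """True when the fd's Director password equals the dir's Client password."""
    fd_pw = find_password(fd_conf, "Director", director_name)
    dir_pw = find_password(dir_conf, "Client", client_name)
    return fd_pw is not None and fd_pw == dir_pw


if __name__ == "__main__":
    fd = 'Director {\n  Name = bacula-director\n  Password = "remote-fd-passwd"\n}\n'
    dr = 'Client {\n  Name = remote-fd\n  Password = "remote-fd-passwd"  # fd\n}\n'
    print(passwords_match(fd, dr, "bacula-director", "remote-fd"))
```

<p>If this reports a mismatch, fix the conf files first and then restart both daemons.</p>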
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bacula Backup Management</title>
		<link>http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/#comments</comments>
		<pubDate>Sat, 03 Dec 2011 16:02:38 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=907</guid>
		<description><![CDATA[So, I&#8217;ve been evaluating backup management solutions. A simple shell script won&#8217;t do, since I want auto-rotation, better scheduling and incremental backup support (storage friendly). An open source solution is a no-brainer priority. So, I&#8217;m taking Bacula from bacula.org for a spin &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>So, I&#8217;ve been evaluating backup management solutions. A simple shell script won&#8217;t do, since I want auto-rotation, better scheduling and incremental backup support (storage friendly). An open source solution is a no-brainer priority. So, I&#8217;m taking Bacula from <a href="http://bacula.org" target="_blank">bacula.org</a> for a spin for a few days to understand how it works. So far so good. It has a good scheduler with better-than-cron syntax, e.g. 1st mon at 23:05 schedules a backup on the first Monday of the month at 23:05. Neat eh? <a href="https://help.ubuntu.com/community/Bacula" target="_blank">Installing Bacula in Ubuntu</a> is a pretty straightforward process. There&#8217;s a fatal misconfiguration tho. It&#8217;s known and simple to fix.</p>
<blockquote><p>The definition of the catalog Mycatalog contains a line starting with &#8216; dbname = &#8220;bacula;&#8221;&#8216;. The semicolon inside the quotes should follow the quotes, so should start with &#8216; dbname = &#8220;bacula&#8221; ;&#8217;</p></blockquote>
<p>Another tip: Pool resources do not enable automatic volume naming by default. This is pretty annoying for a newbie, and it would be way better to have it enabled by default so it works out of the box. To enable it, add a label format option to your Pool resource definition. Something like this:</p>
<pre>Pool {
  Name = File
  Pool Type = Backup
  Volume Use Duration = 23h
  LabelFormat = "VolFile-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}"
}</pre>
<p>It will automagically create a properly named Pool Volume when the job runs, e.g. VolFile-2011-12-02.</p>
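<p>To see what name that LabelFormat should produce for a given day, here is a rough Python sketch. It assumes ${Month:p/2/0/r} and ${Day:p/2/0/r} mean zero-padding to two digits; volume_label is a made-up helper, not part of Bacula.</p>

```python
from datetime import date


def volume_label(day, prefix="VolFile"):
    """Mimic the LabelFormat above: prefix, year, zero-padded month and day."""
    return "{}-{:04d}-{:02d}-{:02d}".format(prefix, day.year, day.month, day.day)


if __name__ == "__main__":
    print(volume_label(date(2011, 12, 2)))
```

<p>Together with Volume Use Duration = 23h, this should give roughly one volume per daily run, each labelled with its date.</p>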
<p>You can use the bat GUI to list your jobs and volumes. To restore files, see my tips <a href="http://neofreko.posterous.com/restoring-backup-from-bacula" target="_blank">here</a>.</p>
<p>PS:</p>
<p>When you change bacula-sd.conf, restart the bacula-director service as well as the bacula-sd service.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On the obvious</title>
		<link>http://blog.neofreko.com/index.php/2011/11/04/on-the-obvious/</link>
		<comments>http://blog.neofreko.com/index.php/2011/11/04/on-the-obvious/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 06:16:47 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Youtube]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=904</guid>
		<description><![CDATA[Why would Google put a Youtube sidebar (pop up) on Google+? Because it is logical. Youtube has a lot of great content. I love watching CN Blue videos on it all the time. Why does it feel awkward? Awkward? How? You mean the &#8230; <a href="http://blog.neofreko.com/index.php/2011/11/04/on-the-obvious/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Why would Google put a Youtube sidebar (pop up) on Google+?</p>
<p><em>Because it is logical. Youtube has a lot of great content. I love watching CN Blue videos on it all the time.</em></p>
<p>Why does it feel awkward?</p>
<p><em>Awkward? How? You mean the pop up window? Because making the Youtube sidebar reside in the G+ page would take up space. It&#8217;s logical.</em></p>
<p>Will other buttons from the Google product catalog follow suit?</p>
<p><em>Obviously. Perhaps. Aside from Youtube? Blogger?</em></p>
<p>Are you sure?</p>
<p><em>What? Why are you looking at me like that? Do I look like Vic?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/11/04/on-the-obvious/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Javascript is the new cool</title>
		<link>http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/</link>
		<comments>http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/#comments</comments>
		<pubDate>Thu, 03 Nov 2011 09:23:57 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=900</guid>
		<description><![CDATA[Still on NLP. You read about UIMA and the Stanford parser (typed dependency) the other day. I&#8217;ve been wondering if there is an online service provider for the Stanford parser. Lo and behold, there is. Although it would be better to spend &#8230; <a href="http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Still on NLP. You read about UIMA and the <a title="on Cloning Siri, understanding the query" href="http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/" target="_blank">Stanford parser</a> (typed dependency) the other day. I&#8217;ve been wondering if there is an online service provider for the Stanford parser. Lo and behold, there is. Although it would be better to spend some cash and run my own Stanford parser, this service should suffice to test my idea. You can find it <a href="http://nlp.naturalparsing.com/documentation/datatypes" target="_blank">here</a>, along with a JSONP API to access it.</p>
<p>As for more resources on Javascript and NLP, there are some on <a title="Javascript NLP" href="http://www.chrisumbel.com/article/node_js_natural_language_nlp" target="_blank">github</a>. There are some entity extractors as well. The conclusion is that some <a href="https://github.com/spencermountain/nlp-node/blob/master/lib/singularize.js" target="_blank">extractors simply cannot get away from using a dictionary</a>. Maybe I will end up with one.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>At the end of the tide</title>
		<link>http://blog.neofreko.com/index.php/2011/11/03/at-the-end-of-the-tide/</link>
		<comments>http://blog.neofreko.com/index.php/2011/11/03/at-the-end-of-the-tide/#comments</comments>
		<pubDate>Thu, 03 Nov 2011 07:37:22 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Regular Hours]]></category>
		<category><![CDATA[Daydreaming]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/index.php/2011/11/03/at-the-end-of-the-tide/</guid>
		<description><![CDATA[I&#8217;m holding &#8220;Naked Conversations&#8221;, reading through one of its chapters. Apparently, blog projects at big companies started back in 2003. I was still in college, getting myself familiar with Delphi. Eight years later, blogging has gone mainstream. Some may even &#8230; <a href="http://blog.neofreko.com/index.php/2011/11/03/at-the-end-of-the-tide/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m holding &#8220;Naked Conversations&#8221;, reading through one of its chapters. Apparently, blog projects at big companies started back in 2003. I was still in college, getting myself familiar with Delphi. Eight years later, blogging has gone mainstream. Some may even say: outdated.</p>
<p>That made me think: I, and many of us, have been trapped at the end of the tide for too long. We never see the small ripple, but we always end up being washed away by the resulting tide. That is probably the &#8220;perk&#8221; of living in a developing country. To make it even worse, the internet that has been helping us close the gap has also been keeping us away from the ripples: the small changes and controversial innovations that may be the next mainstream.</p>
<p>Needless to say, even with those ripples put right before our eyes, determining the next mainstream would take an eagle&#8217;s hunch. Still, the years of gap feel unfair. Those years could have been spent on learning and honing what matters most.</p>
<p>All we can do is close more of the gap and catch up faster. Hopefully we will land on the same plateau: hear what everyone hears, see what everyone sees.</p>
<p>Sounds like the American dream, eh?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/11/03/at-the-end-of-the-tide/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>on Cloning Siri, understanding the query</title>
		<link>http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/</link>
		<comments>http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 15:09:36 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[dependency parser]]></category>
		<category><![CDATA[Siri]]></category>
		<category><![CDATA[typed dependency]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=894</guid>
		<description><![CDATA[Well, UIMA is interesting. I haven&#8217;t been able to make the feature extraction work. However, I got the gist that it works similarly to NLTK, with an additional benefit: we can pipeline several analyses by configuring an XML file. This is almost as &#8230; <a href="http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Well, <a title="on Cloning Siri" href="http://blog.neofreko.com/index.php/2011/10/31/on-cloning-siri/" target="_blank">UIMA is interesting</a>. I haven&#8217;t been able to make the feature extraction work. However, I got the gist that it works similarly to NLTK, with an additional benefit: we can pipeline several analyses by configuring an XML file. This is almost as sweet as SOLR config.</p>
<p>Today, I&#8217;ve just found another approach to understanding a user query. I thought it would help a lot if we could determine the subject, predicate and object of a query. We don&#8217;t need to understand the whole sentence, but we do need to extract the essence of the query. What should our clone do if the user says: how is the weather? where is bandung? do I have any meeting today?</p>
<p>Fortunately, there are free implementations of typed dependency parsing. If you want to know more about typed dependency, just google it. I will only give you an example of it. Given the query &#8220;how is the weather in jakarta&#8221;, typed dependency analysis will give us:</p>
<pre>advmod(is-2, how-1)
det(weather-4, the-3)
nsubj(is-2, weather-4)</pre>
<p>From this output, we can use the presence of a subject or object to determine the essence of a query. The example above shows us that weather is probably the essence of the query. You can test more typed dependencies <a title="Standford Typed Dependency" href="http://nlp.stanford.edu:8080/parser/index.jsp" target="_blank">here</a>. Below are some more examples:</p>
<pre>Do I have meeting today</pre>
<pre>aux(have-3, do-1)
nsubj(have-3, I-2)
dobj(have-3, meeting-4)
tmod(have-3, today-5)</pre>
<pre>call John</pre>
<pre>amod(John-2, call-1)</pre>
<pre>make appointment with John on 3</pre>
<pre>dobj(make-1, appointment-2)
prep_with(make-1, John-4)
prep_on(John-4, 3-6)</pre>
<pre>texts John, send me detail</pre>
<pre>prep_text(send-4, John-2)
nsubj(detail-6, me-5)
ccomp(send-4, detail-6)</pre>
<p>From the examples above, it is possible for us to choose a pattern as a trigger for a datasource query. However, it will not always be adequate. Some questions may still be hard to understand because they are too vague, such as: how do I get home. To understand this, we need to be aware that &#8220;home&#8221; is a destination/location. This should trigger some sort of map datasource.</p>
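<p>Here is a minimal Python sketch of that idea: read the parser output shown above and pick one word as the essence. The rule used here (prefer the direct object, then the subject) is only my assumption for illustration, and the function names are made up.</p>

```python
import re

# Matches one line of output such as "nsubj(is-2, weather-4)".
DEP = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(output):
    """Turn parser output lines into (relation, head, dependent) triples."""
    triples = []
    for line in output.splitlines():
        match = DEP.fullmatch(line.strip())
        if match:
            relation, head, _, dependent, _ = match.groups()
            triples.append((relation, head, dependent))
    return triples


def essence(triples):
    """Assumed rule: prefer the direct object, then fall back to the subject."""
    for wanted in ("dobj", "nsubj"):
        for relation, _, dependent in triples:
            if relation == wanted:
                return dependent
    return None


if __name__ == "__main__":
    weather = "advmod(is-2, how-1)\ndet(weather-4, the-3)\nnsubj(is-2, weather-4)"
    print(essence(parse_dependencies(weather)))
```

<p>A query with neither relation, like the "call John" output above, returns nothing, which is exactly the kind of case that needs another analysis to step in.</p>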
<p>I have been imagining the clone as a pluggable framework. The main function of the host program is to provide as many analyses as it can, via plugins, and then decide which datasource plugin to trigger. Typed dependency should be one plugin, <a href="http://uima.apache.org/d/uima-addons-current/ConfigurableFeatureExtractor/CFE_UG.html" target="_blank">feature extraction</a> should be another plugin.</p>
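<p>A very rough Python sketch of how such a host could look. Everything here is hypothetical (class, method and plugin names), and the keyword analyzer is just a toy stand-in for a real typed dependency or feature extraction plugin.</p>

```python
class Host:
    """Collects findings from analysis plugins, then picks a datasource."""

    def __init__(self):
        self.analyzers = []    # callables: query -> dict of findings
        self.datasources = []  # (wants, handler) pairs, checked in order

    def add_analyzer(self, analyzer):
        self.analyzers.append(analyzer)

    def add_datasource(self, wants, handler):
        self.datasources.append((wants, handler))

    def handle(self, query):
        findings = {}
        for analyzer in self.analyzers:
            findings.update(analyzer(query))
        for wants, handler in self.datasources:
            if wants(findings):
                return handler(query, findings)
        return None  # no datasource claimed the query


def keyword_analyzer(query):
    """Toy stand-in for a real analysis plugin such as typed dependency."""
    words = query.lower().replace("?", "").split()
    return {"topic": "weather"} if "weather" in words else {}


def build_host():
    host = Host()
    host.add_analyzer(keyword_analyzer)
    host.add_datasource(
        lambda findings: findings.get("topic") == "weather",
        lambda query, findings: "weather datasource",
    )
    return host


if __name__ == "__main__":
    print(build_host().handle("how is the weather?"))
```

<p>The host only merges findings and asks each datasource whether it wants the query, so new analyses can be added without touching the datasources.</p>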
<p>Hmm, interesting.</p>
<p>PS:</p>
<p><a href="http://stackoverflow.com/questions/2705888/rdf-of-sentences" target="_blank">There are more dependency parsers</a> I still need to check.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

