<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Neofreko &#187; Dev Hours</title>
	<atom:link href="http://blog.neofreko.com/index.php/category/dev-hours/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.neofreko.com</link>
	<description>Nothing but neofreko</description>
	<lastBuildDate>Tue, 07 Feb 2012 08:51:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>#php #unicode #insertcursewordhere</title>
		<link>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/</link>
		<comments>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 08:44:22 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=922</guid>
		<description><![CDATA[PHP and Unicode: just a well-known secret. My story began with SOLR DIH. It was way too slow, so I ended up building another tool to replace DIH. Something friendly to CPU and memory. I did it. Not. After &#8230; <a href="http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>PHP and Unicode: just a well-known secret.</p>
<p>My story began with SOLR DIH (the DataImportHandler). It was way too slow, so I ended up building another tool to replace DIH. Something friendly to CPU and memory. I did it. Not.</p>
<p>After indexing I realized that my text was full of ???????. WTF. Yeah, it&#8217;s an encoding problem. So I spent a day trying to solve this thing. What worked for me was this <a href="http://www.php.net/manual/en/ref.mbstring.php#50298">advice from 2005</a>:</p>
<blockquote><p>PHP can input and output Unicode, but a little different from what Microsoft means: when Microsoft says &#8220;Unicode&#8221;, it unexplicitly means little-endian UTF-16 with BOM (FF FE = chr(255).chr(254)), whereas PHP&#8217;s &#8220;UTF-16&#8221; means big-endian with BOM. For this reason, PHP does not seem to be able to output Unicode CSV file for Microsoft Excel. Solving this problem is quite simple: just put BOM infront of UTF-16LE string.</p>
<p>Example:</p>
<p><code>$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding($utf8_str, 'UTF-16LE', 'UTF-8');</code></p></blockquote>
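<p>For comparison, the same trick can be sketched in Python. This only illustrates the quoted advice (BOM first, then the UTF-16LE bytes); the function name <code>to_excel_unicode</code> is made up for the example.</p>

```python
import codecs


def to_excel_unicode(utf8_bytes):
    """Re-encode UTF-8 bytes as UTF-16LE with a leading BOM (FF FE)."""
    text = utf8_bytes.decode("utf-8")
    # codecs.BOM_UTF16_LE is the two bytes FF FE, i.e. chr(255).chr(254) in PHP.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    print(to_excel_unicode("héllo".encode("utf-8")))
```

<p>Note that <code>decode("utf-8")</code> raises an error on invalid input instead of silently producing question marks, which at least shows where the bad bytes are.</p>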
<p>I get no ??? chars anymore. I don&#8217;t know if it is the proper way to do it, and I still get an occasional htmlspecialchars &#8220;invalid multibyte sequence&#8221; warning. I think I&#8217;ll classify this solution as a &#8220;miracle&#8221;.</p>
<p>When will PHP 6 finally come?</p>
<p><strong>Update:</strong></p>
<p><strong>CRAP. DOES NOT WORK!</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Mahout</title>
		<link>http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/</link>
		<comments>http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/#comments</comments>
		<pubDate>Sat, 14 Jan 2012 11:11:00 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[mahout]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=919</guid>
		<description><![CDATA[My brain exploded. That&#8217;s pretty much my limit. So, yes, I&#8217;ve been interested in SOLR, Apache Tika, and of course Mahout. The promise of classifying and clustering data is enough to persuade me to dig up examples about Mahout. So far, &#8230; <a href="http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>My brain exploded. That&#8217;s pretty much my limit.</p>
<p>So, yes, I&#8217;ve been interested in SOLR, Apache Tika, and of course Mahout. The promise of classifying and clustering data is enough to persuade me to dig up examples about Mahout. So far, what really helps is the Seinfeld demo example. It gives me a proper example to try. We can replace the data with our own to get the gist of how Mahout works.</p>
<p>However, I haven&#8217;t got the gist yet. So far, I&#8217;ve tried to cluster 2 data sources. One of them is the blog posts from navinot.com. Here&#8217;s an excerpt from the cluster-dump:</p>
<p>C-18 [Ponsel, Mobile, Internet, Mobile internet, Iphone]<br />
- /6 Hal Tentang Mobile Internet.txt<br />
- /Mobile Application_ Masa Depan Yang Ditunggu?.txt<br />
- /Netbook_ Bakal Lenyap Seperti PDA?.txt<br />
- /Premium Mobile Internet?.txt<br />
- /The Gaps in Indonesian Internet.txt<br />
- /iPhone &amp; Telkomsel_ Deal or No Deal?.txt</p>
<p>I was imagining Mahout would cluster them into groups of similar posts. My guess is that it clustered them by keyword. I was using k-means.</p>
<p>Anyway, obviously we need to filter out stopwords. Mahout can read directly from a SOLR/Lucene index, but I didn&#8217;t have much luck with it. Something to do with empty terms or whatever. Feeding my raw data to SOLR and then querying it back out as text files would probably make a decent workaround.</p>
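<p>A minimal sketch of the stopword filtering idea, in Python. The stopword list here is a tiny made-up sample (a real one would be much longer), and <code>remove_stopwords</code> is a hypothetical helper, not something Mahout provides.</p>

```python
import re

# Tiny illustrative stopword list; an assumption, not a real corpus list.
STOPWORDS = {"the", "in", "or", "yang", "di", "dan"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("The Gaps in Indonesian Internet"))
```

<p>Running the post titles through a filter like this before vectorizing should keep filler words from dominating the cluster labels.</p>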
<p>That&#8217;s a wrap for today. Time for Pocket Legend!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2012/01/14/belajar-mahout/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to use Lucene 3.4 with Mahout 0.5</title>
		<link>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 07:39:09 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=916</guid>
		<description><![CDATA[As you may have found to your frustration, Mahout 0.5 was built with Lucene 3.1 dependencies. How on earth can we use Lucene 3.4 then? My SOLR is 3.4, and I want to use its index to play with Mahout. Fear not. &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As you may have found to your frustration, Mahout 0.5 was built with Lucene 3.1 dependencies. How on earth can we use Lucene 3.4 then? My SOLR is 3.4, and I want to use its index to play with Mahout.</p>
<p>Fear not. Just download Mahout 0.5, both source and binaries. Extract them; both will end up in the same folder, i.e. mahout-distribution-0.5. Now, open up the pom.xml there. Find the Lucene entries and replace 3.1.0 with 3.4.0. I reckon there are only 4 of them. Then do <code>mvn install</code>. You may want to skip tests with: <code>mvn -DskipTests=true install</code>.</p>
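<p>If you would rather script the pom.xml edit than do it by hand, here is a rough Python sketch. It assumes each Lucene version string sits within a few lines after a line mentioning &#8220;lucene&#8221;, which may not match the real pom.xml layout; <code>bump_lucene_version</code> and its <code>window</code> parameter are made up for the example.</p>

```python
def bump_lucene_version(pom_text, old="3.1.0", new="3.4.0", window=3):
    """Replace `old` with `new` only on lines that follow a Lucene mention."""
    lines = pom_text.splitlines()
    for i, line in enumerate(lines):
        if old in line:
            # Look at this line plus the `window` lines before it.
            context = " ".join(lines[max(0, i - window):i + 1]).lower()
            if "lucene" in context:
                lines[i] = line.replace(old, new)
    return "\n".join(lines)
```

<p>Read the file, pass its text through the function, and diff the result against the original before running the build.</p>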
<p>Once done, do: <code>export MAHOUT_CORE=1</code></p>
<p>Run mahout from the mahout-distribution-0.5/bin folder.</p>
<p>I don&#8217;t get the index incompatibility anymore. But I keep getting complaints about not enough term vectors on documents, even though I&#8217;ve set up schema.xml and reindexed my docs.</p>
<p>Will write more once I get past it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>bacula-fd authentication failed</title>
		<link>http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/#comments</comments>
		<pubDate>Tue, 06 Dec 2011 08:49:24 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[bacula]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=910</guid>
		<description><![CDATA[So, I&#8217;ve been trying to set up a two-tier bacula. Stuck on &#8220;cannot connect to client&#8221;. To grab more clues, run this line on the bacula-fd machine: sudo /usr/sbin/bacula-fd -f -d100 -c /etc/bacula/bacula-fd.conf Then do the bconsole dance on the bacula-dir machine. Use the &#8220;status&#8221; command to &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>So, I&#8217;ve been trying to set up a two-tier bacula. Stuck on &#8220;cannot connect to client&#8221;.</p>
<p>To grab more clues, run this line on the bacula-fd machine:</p>
<pre>sudo /usr/sbin/bacula-fd -f -d100 -c /etc/bacula/bacula-fd.conf</pre>
<p>Then do the bconsole dance on the bacula-dir machine. Use the &#8220;status&#8221; command to test the connection to the client. If you see &#8220;cram-md5 authentication failed&#8221; in the bacula-fd output, then you have the same problem as I did. Otherwise, check the connection between bacula-dir and bacula-fd.</p>
<p>Here&#8217;s the solution:</p>
<p>In bacula-fd.conf:</p>
<pre>Director {
  Name = bacula-director
  Password = "remote-fd-passwd"
}</pre>
<p>&#8220;Name&#8221; should be your bacula-dir Name. You can find it in bacula-dir.conf. See below:</p>
<pre>Director {                            # define myself
  Name = bacula-director
  DIRport = 9101                # where we listen for UA connections
  QueryFile = "/etc/bacula/scripts/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run/bacula"
  Maximum Concurrent Jobs = 1
  Password = "blahblahblah"         # Console password
  Messages = Daemon
  DirAddress = 127.0.0.1
}</pre>
<p>Then the Password in bacula-fd.conf should be the same as the one in your Client definition in bacula-dir.conf, e.g.:</p>
<pre>Client {
  Name = remote-fd
  Address = remote.fd.ip
  FDPort = 9102
  Catalog = MyCatalog
  Password = "remote-fd-passwd"          # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}</pre>
<p>Don&#8217;t forget to restart bacula-dir and bacula-fd after modifying conf files. Good luck!</p>
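<p>The two rules above (matching Director name, matching password) can be checked with a small script. This is only a sketch with a deliberately naive parser: it assumes simple <code>Key = value</code> lines, no nested braces, and no <code>#</code> inside values. The helper names are made up for the example.</p>

```python
import re


def parse_resources(conf_text, resource_type):
    """Return a list of dicts, one per `resource_type { ... }` block."""
    pattern = re.compile(resource_type + r"\s*\{(.*?)\}", re.DOTALL)
    resources = []
    for body in pattern.findall(conf_text):
        settings = {}
        for line in body.splitlines():
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip().lower()] = value.strip().strip('"')
        resources.append(settings)
    return resources


def fd_matches_director(fd_conf, dir_conf, client_name):
    """Check that bacula-fd.conf agrees with bacula-dir.conf for one client."""
    director = parse_resources(dir_conf, "Director")[0]
    client = next(c for c in parse_resources(dir_conf, "Client")
                  if c["name"] == client_name)
    return any(d["name"] == director["name"]
               and d["password"] == client["password"]
               for d in parse_resources(fd_conf, "Director"))
```

<p>Feed it the text of both conf files and the client Name; a <code>False</code> result means either the Director name or the password differs.</p>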
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/06/bacula-fd-authentication-failed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bacula Backup Management</title>
		<link>http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/#comments</comments>
		<pubDate>Sat, 03 Dec 2011 16:02:38 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=907</guid>
		<description><![CDATA[So, I&#8217;ve been evaluating backup management solutions. A simple shell script won&#8217;t do, since I want auto-rotation, better scheduling and incremental backup support (storage friendly). Going open source is a no-brainer. So I&#8217;m taking bacula from bacula.org for a spin &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>So, I&#8217;ve been evaluating backup management solutions. A simple shell script won&#8217;t do, since I want auto-rotation, better scheduling and incremental backup support (storage friendly). Going open source is a no-brainer. So I&#8217;m taking bacula from <a href="http://bacula.org" target="_blank">bacula.org</a> for a spin for a few days to understand how it works. So far so good. It has a good scheduler with better-than-cron syntax, e.g. <code>1st mon at 23:05</code> schedules a backup on the first Monday of the month at 23:05. Neat, eh? <a href="https://help.ubuntu.com/community/Bacula" target="_blank">Installing Bacula in Ubuntu</a> is a pretty straightforward process. There&#8217;s a fatal misconfiguration, though. It&#8217;s known and simple to fix:</p>
<blockquote><p>The definition of the catalog Mycatalog contains a line starting with &#8216; dbname = &#8220;bacula;&#8221;&#8217;. The semicolon inside the quotes should follow the quotes, so should start with &#8216; dbname = &#8220;bacula&#8221; ;&#8217;</p></blockquote>
<p>Another tip: by default, Pool resources don&#8217;t enable automatic volume naming. This is pretty annoying for a newbie, and it would be way better to have it enabled by default so things work out of the box. To enable it, add a label format option to your Pool resource definition. Something like this:</p>
<pre>Pool {
  Name = File
  Pool Type = Backup
  Volume Use Duration = 23h
  LabelFormat = "VolFile-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}"
}</pre>
<p>It will automagically create a properly named Pool Volume when a job runs, e.g. VolFile-2011-12-02.</p>
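<p>To see what name that label format should produce for a given day, here is a small Python sketch. It only mimics the pattern (prefix, year, zero-padded month and day); it is not how Bacula expands variables internally, and <code>volume_label</code> is a made-up helper.</p>

```python
from datetime import date


def volume_label(day, prefix="VolFile"):
    """Build a label like the LabelFormat above: prefix-YYYY-MM-DD."""
    # {:02d} zero-pads to two digits, mirroring p/2/0/r in the Bacula format.
    return "{}-{}-{:02d}-{:02d}".format(prefix, day.year, day.month, day.day)


if __name__ == "__main__":
    print(volume_label(date(2011, 12, 2)))
```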
<p>You can use the bat GUI to list your jobs and volumes. To restore files, see my tips <a href="http://neofreko.posterous.com/restoring-backup-from-bacula" target="_blank">here</a>.</p>
<p>PS:</p>
<p>When you change bacula-sd.conf, restart the bacula-director service as well as the bacula-sd service.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/03/bacula-backup-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Javascript is the new cool</title>
		<link>http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/</link>
		<comments>http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/#comments</comments>
		<pubDate>Thu, 03 Nov 2011 09:23:57 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=900</guid>
		<description><![CDATA[Still on NLP. You&#8217;ve read about UIMA and the Stanford parser (typed dependency) the other day. I&#8217;ve been wondering if there is an online service provider for the Stanford parser. Lo and behold, there is. Although it would be better to spend &#8230; <a href="http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Still on NLP. You&#8217;ve read about UIMA and the <a title="on Cloning Siri, understanding the query" href="http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/" target="_blank">Stanford parser</a> (typed dependency) the other day. I&#8217;ve been wondering if there is an online service provider for the Stanford parser. Lo and behold, there is. Although it would be better to spend some cash and run my own Stanford parser, this service should suffice to test my idea. You can find it <a href="http://nlp.naturalparsing.com/documentation/datatypes" target="_blank">here</a>, along with a JSONP API to access it.</p>
<p>As for more resources on JavaScript and NLP, there are some on <a title="Javascript NLP" href="http://www.chrisumbel.com/article/node_js_natural_language_nlp" target="_blank">github</a>. There are some entity extractors as well. The takeaway is that some <a href="https://github.com/spencermountain/nlp-node/blob/master/lib/singularize.js" target="_blank">extractors simply cannot get away from using a dictionary</a>. Maybe I will end up with one.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/11/03/javascript-is-the-new-cool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>on Cloning Siri, understanding the query</title>
		<link>http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/</link>
		<comments>http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 15:09:36 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[dependency parser]]></category>
		<category><![CDATA[Siri]]></category>
		<category><![CDATA[typed dependency]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=894</guid>
		<description><![CDATA[Well, UIMA is interesting. I haven&#8217;t been able to make the feature extraction work. However, I got the gist that it works similarly to NLTK, with an additional benefit: we can pipeline several analyses by configuring an XML file. This is almost as &#8230; <a href="http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Well, <a title="on Cloning Siri" href="http://blog.neofreko.com/index.php/2011/10/31/on-cloning-siri/" target="_blank">UIMA is interesting</a>. I haven&#8217;t been able to make the feature extraction work. However, I got the gist that it works similarly to NLTK, with an additional benefit: we can pipeline several analyses by configuring an XML file. This is almost as sweet as SOLR config.</p>
<p>Today, I just found another approach to understanding a user query. I thought it would help a lot if we could determine the Subject, Predicate and Object of a query. We don&#8217;t need to understand the whole sentence, but we do need to extract the essence of the query. What should our clone do if the user says: how is the weather? where is bandung? do I have any meeting today?</p>
<p>Fortunately there are free implementations of typed dependency parsing. If you want to know more about typed dependencies, just google it. I will only give you an example. Given the query &#8220;how is the weather in jakarta&#8221;, typed dependency analysis will give us:</p>
<pre>advmod(is-2, how-1)
det(weather-4, the-3)
nsubj(is-2, weather-4)</pre>
<p>From this output, we can use the presence of a subject or object to determine the essence of a query. The example above shows that weather is probably the essence of the query. You can try more typed dependency parses <a title="Standford Typed Dependency" href="http://nlp.stanford.edu:8080/parser/index.jsp" target="_blank">here</a>. Below are some more examples:</p>
<pre>Do I have meeting today</pre>
<pre>aux(have-3, do-1)
nsubj(have-3, I-2)
dobj(have-3, meeting-4)
tmod(have-3, today-5)</pre>
<pre>call John</pre>
<pre>amod(John-2, call-1)</pre>
<pre>make appointment with John on 3</pre>
<pre>dobj(make-1, appointment-2)
prep_with(make-1, John-4)
prep_on(John-4, 3-6)</pre>
<pre>texts John, send me detail</pre>
<pre>prep_text(send-4, John-2)
nsubj(detail-6, me-5)
ccomp(send-4, detail-6)</pre>
<p>From the examples above, it is possible for us to choose a pattern as a trigger for a datasource query. However, this will not always be adequate. Some questions may still be hard to understand because they are too vague, such as: how do I get home. To understand this, we need to be aware that &#8220;home&#8221; is a destination/location. This should trigger some sort of map datasource.</p>
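<p>To make the pattern idea concrete, here is a rough Python sketch that reads parser output like the examples above and guesses the essence. The rule it uses (prefer a direct object, fall back to the subject) is just my assumption for illustration, and both function names are made up.</p>

```python
import re

# Matches one relation line such as "nsubj(is-2, weather-4)".
RELATION = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(parser_output):
    """Turn parser output lines into (relation, head, dependent) tuples."""
    triples = []
    for line in parser_output.splitlines():
        match = RELATION.match(line.strip())
        if match:
            relation, head, _, dependent, _ = match.groups()
            triples.append((relation, head, dependent))
    return triples


def essence(triples):
    """Guess the essence: prefer a direct object, else a subject, else None."""
    for wanted in ("dobj", "nsubj"):
        for relation, _head, dependent in triples:
            if relation == wanted:
                return dependent
    return None
```

<p>A query with neither relation returns <code>None</code>, which is exactly the vague case where a plain pattern is not enough.</p>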
<p>I have been imagining the clone as a pluggable framework. The main job of the host program is to provide as many analyses as it can, via plugins, and then decide which datasource plugin to trigger. Typed dependency should be one plugin, and <a href="http://uima.apache.org/d/uima-addons-current/ConfigurableFeatureExtractor/CFE_UG.html" target="_blank">feature extraction</a> should be another.</p>
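<p>Such a host could be sketched in Python roughly like this. Everything here is hypothetical (the <code>Host</code> class, the method names, the plain word split standing in for a real analysis); it only shows the shape of the idea: run all analyzers, then let each datasource decide whether it wants the query.</p>

```python
class Host:
    """Runs every analysis plugin, then asks datasources who wants the query."""

    def __init__(self):
        self.analyzers = {}    # name -> function(query) -> analysis result
        self.datasources = []  # list of (name, wants) pairs

    def add_analyzer(self, name, analyze):
        self.analyzers[name] = analyze

    def add_datasource(self, name, wants):
        self.datasources.append((name, wants))

    def route(self, query):
        """Return the names of datasources triggered by the query."""
        analyses = {name: analyze(query)
                    for name, analyze in self.analyzers.items()}
        return [name for name, wants in self.datasources if wants(analyses)]


if __name__ == "__main__":
    host = Host()
    # Stand-in analyzer: a plain word split instead of a real parser.
    host.add_analyzer("words", lambda query: query.lower().split())
    host.add_datasource("weather", lambda a: "weather" in a["words"])
    host.add_datasource("map", lambda a: "home" in a["words"])
    print(host.route("how do I get home"))
```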
<p>Hmm, interesting.</p>
<p>PS:</p>
<p><a href="http://stackoverflow.com/questions/2705888/rdf-of-sentences" target="_blank">There are more dependency parsers</a> I still need to check.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/11/01/on-cloning-siri-understanding-the-query/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>on Cloning Siri</title>
		<link>http://blog.neofreko.com/index.php/2011/10/31/on-cloning-siri/</link>
		<comments>http://blog.neofreko.com/index.php/2011/10/31/on-cloning-siri/#comments</comments>
		<pubDate>Mon, 31 Oct 2011 15:24:52 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[NLTK]]></category>
		<category><![CDATA[Semantic]]></category>
		<category><![CDATA[Siri]]></category>
		<category><![CDATA[UIMA]]></category>
		<category><![CDATA[Wolfram Alpha]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=890</guid>
		<description><![CDATA[Ya ya ya, it&#8217;s a novel goal. Nevertheless, it&#8217;s an interesting journey to take. To make our clone clever, it must be smart enough to understand any general query. &#8220;Who is Obama?&#8221;. &#8220;Where is Taj Mahal?&#8221;. To answer this, we &#8230; <a href="http://blog.neofreko.com/index.php/2011/10/31/on-cloning-siri/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Ya ya ya, it&#8217;s a novel goal. Nevertheless, it&#8217;s an interesting journey to take.</p>
<p>To make our clone clever, it must be smart enough to understand any general query. &#8220;Who is Obama?&#8221;. &#8220;Where is Taj Mahal?&#8221;. To answer these, we can simply forward the query to Wolfram Alpha. With a simple trick, we can also answer a floating question such as: &#8220;How&#8217;s the weather tomorrow?&#8221;. How? Simply add the current geolocation to the query and then pass it along to Wolfram Alpha, e.g. &#8220;How&#8217;s the weather tomorrow jakarta&#8221;. Don&#8217;t worry, Wolfram Alpha will understand what you mean.</p>
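<p>The geolocation trick is simple enough to sketch in Python. How the location is obtained and how the query is sent to Wolfram Alpha are left out; <code>localize_query</code> is a made-up helper that only does the string part.</p>

```python
def localize_query(query, location):
    """Append the current location to a question that does not mention it."""
    bare = query.rstrip(" ?")
    if location.lower() in bare.lower():
        return bare
    return "{} {}".format(bare, location)


if __name__ == "__main__":
    print(localize_query("How's the weather tomorrow?", "jakarta"))
```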
<p>Now, the hard part. We need to teach our Siri clone about ourselves. I wish Wolfram Alpha were open enough that we could add new information to its database. Unfortunately it&#8217;s not, except for enterprise users. So some kind of information/data mining solution is inevitable.</p>
<p><a href="http://nltk.org" target="_blank">NLTK</a> on Python is a good candidate to solve the problem. I picture a sentence getting tagged. We then extract the Subject, Verb and Object and pass them along to the appropriate datasource provider. A question such as &#8220;do I have a meeting tomorrow?&#8221; should trigger the Calendar datasource. A datasource would be an addon that registers its trigger on a Verb and checks for the presence of other tag types within the sentence.</p>
<p>Another solution may come from the Apache UIMA project. I am looking at its <a href="http://uima.apache.org/d/uima-addons-current/ConfigurableFeatureExtractor/CFE_UG.html" target="_blank">Configurable Feature Extractor</a> addon. It is capable of tagging and identifying entities. Compared to the simple pattern matching in our first solution, this second alternative has more metadata to match against. Further, we can combine it with SOLR to harness its search engine power. Boosting, synonyms, stopwords and whatnot.</p>
<p>Do you have something else in mind? I am being a bit practical here because I can&#8217;t comprehend much math <img src='http://blog.neofreko.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/10/31/on-cloning-siri/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Tungsten Replicator</title>
		<link>http://blog.neofreko.com/index.php/2011/10/11/tungsten-replicator/</link>
		<comments>http://blog.neofreko.com/index.php/2011/10/11/tungsten-replicator/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 07:50:22 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[tungsten]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=875</guid>
		<description><![CDATA[Been looking for a hassle-free replication service. Well, so far I have only tried traditional mysql master slave. The problem was provisioning (oh boy, I use this word finally) the slave. Usually, I mysqldump-ed the master then mysqlimport them on &#8230; <a href="http://blog.neofreko.com/index.php/2011/10/11/tungsten-replicator/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Been looking for a hassle-free replication service. Well, so far I have only tried traditional MySQL master-slave replication. The problem was provisioning (oh boy, I finally get to use this word) the slave. Usually, I mysqldump-ed the master and then imported the dump on the slave. Then I needed to set the master log file, log position, etc. mysqldump can take some hours and there&#8217;s a possibility of a performance penalty. Luckily you can use <a href="http://www.percona.com/doc/percona-xtrabackup/howtos/recipes_ibkx_local.html" target="_blank">innobackupex</a> from Percona. The performance penalty can be avoided (I&#8217;m not completely sure) and the log position is automatically recorded in the dump. No more manually taking note of the master position on the master db. But you still need to manually set the log position and master log file on the slave db.</p>
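<p>That last manual step can at least be scripted. Here is a Python sketch that builds the <code>CHANGE MASTER TO</code> statement from the recorded position. It assumes the backup recorded the binlog file name and position as whitespace-separated text (as in an <code>xtrabackup_binlog_info</code> file); the function name and the host name in the usage line are made up.</p>

```python
def change_master_statement(binlog_info, master_host):
    """Build a CHANGE MASTER TO statement from 'file position' text."""
    log_file, log_pos = binlog_info.split()[:2]
    return ("CHANGE MASTER TO MASTER_HOST='{}', "
            "MASTER_LOG_FILE='{}', MASTER_LOG_POS={};"
            .format(master_host, log_file, int(log_pos)))


if __name__ == "__main__":
    print(change_master_statement("mysql-bin.000003\t57\n", "db1.example"))
```

<p>You would still need to add the replication user and password before running the statement on the slave.</p>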
<p>Welcome <a href="http://code.google.com/p/tungsten-replicator/" target="_blank">Tungsten Replicator</a>. The basic principles are the same. We need to enable the replication settings in MySQL and dump the master db to provision the slave node. However, we don&#8217;t need to set the log position and master log file, as Tungsten Replicator will do this for us. Further, we can provision multiple slaves at once with this tool. Neat, eh?</p>
<p>I&#8217;ve barely scratched the surface here with Tungsten. Here are some commands that may come in handy:</p>
<p><code>replicator start/stop #start/stop the replicator manager</code></p>
<p>We can have many replication services (clusters). When you want to reset a replication service (redo from the beginning), make sure to stop the replicator first. Then drop the tungsten_&lt;your service&gt; database before innobackupex-ing the master db to be used on the slaves.</p>
<p><code>trepctl -service &lt;your service&gt; status</code></p>
<p>Useful for checking replication status. There will probably be slave problems, especially when binlog-do-db does not cover all databases.</p>
<p>I&#8217;ll write more on tungsten later. So far, it eases my pain.</p>
<p>References:</p>
<ol>
<li>Getting started with Tungsten. Learn how to setup replication services. It&#8217;s a command line away. <a href="http://datacharmer.blogspot.com/2011/06/getting-started-with-tungsten.html">http://datacharmer.blogspot.com/2011/06/getting-started-with-tungsten.html</a></li>
<li>The Cookbook. Various tricks you can do with Tungsten. I want to test direct replication to provision new slave in order to omit innobackupex activity. <a href="http://code.google.com/p/tungsten-replicator/wiki/TRCBasicInstallation">http://code.google.com/p/tungsten-replicator/wiki/TRCBasicInstallation</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/10/11/tungsten-replicator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WordPress Opengraph Plugin for Facebook Timeline</title>
		<link>http://blog.neofreko.com/index.php/2011/10/01/wordpress-opengraph-plugin-for-facebook-timeline/</link>
		<comments>http://blog.neofreko.com/index.php/2011/10/01/wordpress-opengraph-plugin-for-facebook-timeline/#comments</comments>
		<pubDate>Fri, 30 Sep 2011 17:01:59 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[opengraph]]></category>
		<category><![CDATA[timeline]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=866</guid>
		<description><![CDATA[Yes, my previous copy paste research on opengraph and facebook plugin finally got some proper love. It&#8217;s now, still copy pasted, wrapped in a wordpress plugin. Based on Post Ender sample wp plugin, modified option page. Well, that&#8217;s it. If &#8230; <a href="http://blog.neofreko.com/index.php/2011/10/01/wordpress-opengraph-plugin-for-facebook-timeline/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Yes, my <a title="Crash Course, Facebook Timeline Opengraph" href="http://blog.neofreko.com/index.php/2011/09/30/crash-test-facebook-timeline-opengraph/" target="_blank">previous</a> copy paste research on opengraph and facebook plugin finally got some proper love. It&#8217;s now, still copy pasted, wrapped in a wordpress plugin. Based on Post Ender sample wp plugin, modified option page.</p>
<p>Well, that&#8217;s it. If you want some sort of boilerplate to fire up your opengraph app, then download this plugin. Remember, it&#8217;s not pretty at all.</p>
<p>Download <a href="http://www.box.net/shared/rlo69bikxdlh4oepg3q2" target="_blank">here</a></p>
<p><a href="http://blog.neofreko.com/wp-content/uploads/2011/10/opengrapher-plugin.png"><img class="alignnone size-medium wp-image-867" title="opengrapher-plugin" src="http://blog.neofreko.com/wp-content/uploads/2011/10/opengrapher-plugin-300x163.png" alt="" width="300" height="163" /></a>.</p>
<p><strong>Update:</strong></p>
<p>If you are using an old app (created before the Facebook Timeline announcement), make sure you have enabled the advanced auth dialog option in your app&#8217;s advanced settings.</p>
<p>Bug fix 0.1.1:</p>
<ul>
<li>Fix invalid og:type when viewing single post. Should have been article instead of (default) website</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/10/01/wordpress-opengraph-plugin-for-facebook-timeline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

