<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Neofreko &#187; solr</title>
	<atom:link href="http://blog.neofreko.com/index.php/tag/solr/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.neofreko.com</link>
	<description>Nothing but neofreko</description>
	<lastBuildDate>Tue, 07 Feb 2012 08:51:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>#php #unicode #insertcursewordhere</title>
		<link>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/</link>
		<comments>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 08:44:22 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=922</guid>
		<description><![CDATA[PHP and Unicode: just a well-known secret. My story began with SOLR DIH. It was way too slow. So I ended up building another tool to replace DIH, something friendlier to CPU and memory. I did it. Not. After &#8230; <a href="http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>PHP and Unicode: just a well-known secret.</p>
<p>My story began with SOLR DIH (the DataImportHandler). It was way too slow. So I ended up building another tool to replace DIH, something friendlier to CPU and memory. I did it. Not.</p>
<p>After indexing I realized that my text was full of ???????. WTF. Yeah, it&#8217;s an encoding problem. So I spent a day trying to solve this thing. What worked for me was this <a href="http://www.php.net/manual/en/ref.mbstring.php#50298">advice from 2005</a>:</p>
<blockquote><p>PHP can input and output Unicode, but a little different from what Microsoft means: when Microsoft says &#8220;Unicode&#8221;, it unexplicitly means little-endian UTF-16 with BOM(FF FE = chr(255).chr(254)), whereas PHP&#8217;s &#8220;UTF-16&#8221; means big-endian with BOM. For this reason, PHP does not seem to be able to output Unicode CSV file for Microsoft Excel. Solving this problem is quite simple: just put BOM infront of UTF-16LE string.</p>
<p>Example:</p>
<p>$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding($utf8_str, 'UTF-16LE', 'UTF-8');</p></blockquote>
<p>I get no ??? chars anymore. I don&#8217;t know if this is the proper way to do it, and I still get the occasional htmlspecialchars &#8220;invalid multibyte sequence&#8221; message. I think I&#8217;ll classify this solution as a &#8220;miracle&#8221;.</p>
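<p>For anyone who wants to poke at the quoted advice outside PHP, here is a minimal Python sketch of the same conversion: decode UTF-8, re-encode as UTF-16LE, and put the FF FE BOM in front. The function name and the sample text are made up for illustration. The last line only shows one common way ??? can appear (forcing non-ASCII text into an encoding that cannot hold it); whether that is what happened in my indexer is an assumption.</p>
<pre>
```python
import codecs

def to_excel_unicode(utf8_bytes):
    """Re-encode UTF-8 bytes as UTF-16LE with a leading BOM (FF FE)."""
    text = utf8_bytes.decode("utf-8")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Hypothetical sample text, not taken from the real index.
sample_text = "na\u00efve caf\u00e9"
converted = to_excel_unicode(sample_text.encode("utf-8"))

# The BOM comes first; every character in this sample then takes two bytes.
starts_with_bom = converted[:2] == bytes([255, 254])

# One common source of "?" characters: squeezing non-ASCII text into
# an encoding that cannot represent it.
lossy = sample_text.encode("ascii", errors="replace")
```
</pre>
<p>Keep in mind that this produces UTF-16 bytes meant for Excel. SOLR update requests are normally sent as UTF-8, so treat the sketch as an experiment rather than a fix.</p>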
<p>When will PHP 6 finally come?</p>
<p><strong>Update:</strong></p>
<p><strong>CRAP. DOES NOT WORK!</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2012/02/07/php-unicode-insertcursewordhere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to use Lucene 3.4 with Mahout 0.5</title>
		<link>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/</link>
		<comments>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 07:39:09 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=916</guid>
		<description><![CDATA[As you may have found to your frustration, Mahout 0.5 was built against Lucene 3.1. How on earth can we use Lucene 3.4 then? My SOLR is 3.4, and I want to use its index to play with Mahout. Fear not. &#8230; <a href="http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As you may have found to your frustration, Mahout 0.5 was built against Lucene 3.1. How on earth can we use Lucene 3.4 then? My SOLR is 3.4, and I want to use its index to play with Mahout.</p>
<p>Fear not. Just download Mahout 0.5, both source and binaries. Extract them; both end up in the same folder, i.e. mahout-distribution-0.5. Now open the pom.xml in that folder. Find the lucene entries and replace 3.1.0 with 3.4.0; I reckon there are only four of them. Then run mvn install. You may want to skip the tests with: mvn -DskipTests=true install.</p>
<p>Once done, do: export MAHOUT_CORE=1</p>
<p>Run mahout from the mahout-distribution-0.5/bin folder.</p>
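<p>The pom.xml edit can also be scripted. Below is a rough Python sketch, not part of Mahout or Maven: it swaps 3.1.0 for 3.4.0 only on lines that sit within a few lines of a mention of lucene. The function name and the three-line window are my own assumptions about how the pom.xml is laid out, so compare the count it reports with the handful of replacements you expect before running mvn install.</p>
<pre>
```python
def bump_lucene_version(pom_text, old="3.1.0", new="3.4.0", window=3):
    """Replace old with new only near a line that mentions lucene.

    Returns the new text and the number of lines that were changed.
    """
    out = []
    changed = 0
    last_lucene = None  # index of the most recent line mentioning lucene
    for i, line in enumerate(pom_text.splitlines()):
        if "lucene" in line.lower():
            last_lucene = i
        near = last_lucene is not None and window >= i - last_lucene
        if near and old in line:
            line = line.replace(old, new)
            changed += 1
        out.append(line)
    result = "\n".join(out)
    if pom_text.endswith("\n"):
        result += "\n"  # splitlines() drops the final newline, so restore it
    return result, changed
```
</pre>
<p>Read pom.xml into a string, pass it in, and write the returned text back only if the count looks right.</p>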
<p>I don&#8217;t get the index incompatibility anymore. But I keep getting complaints that documents do not have enough term vectors, even though I&#8217;ve updated schema.xml and reindexed my docs.</p>
<p>Will write more once I get past it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2011/12/30/how-to-use-lucene-3-4-with-mahout-0-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Four Tips on Designing SOLR Schema</title>
		<link>http://blog.neofreko.com/index.php/2008/08/20/tips-on-designing-solr-schema/</link>
		<comments>http://blog.neofreko.com/index.php/2008/08/20/tips-on-designing-solr-schema/#comments</comments>
		<pubDate>Wed, 20 Aug 2008 01:51:56 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Dev Hours]]></category>
		<category><![CDATA[One Post Per Day]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=815</guid>
		<description><![CDATA[However similar SOLR is to a database, designing a schema for each is distinctly different. SOLR is optimized for search; a database, on the other hand, is commonly designed to store (related) data. SOLR has its own behaviour when &#8230; <a href="http://blog.neofreko.com/index.php/2008/08/20/tips-on-designing-solr-schema/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a title="tips" href="http://flickr.com/photos/78364563@N00/13553883"><img class="alignright" style="margin: 4px; float: right;" src="http://farm1.static.flickr.com/9/13553883_1f97989a2d_m.jpg" alt="" /></a></p>
<p>However similar SOLR is to a database, designing a schema for each is distinctly different.</p>
<ul>
<li>SOLR is optimized for search; a database, on the other hand, is commonly designed to store (related) data.</li>
<li>SOLR has its own configurable behaviour when storing and querying data, i.e. indexing behaviour and query behaviour.</li>
</ul>
<p>So how do you design a schema in SOLR?</p>
<ol>
<li><strong>Cover all basic data.</strong> Make sure to index everything you need to search on. Indexing more data won’t hurt; storage is cheap.</li>
<li><strong>Cover common search behaviour.</strong> Do you search over several fields? The Dismax query type sometimes fits that need, as it searches word by word. Store in one field or in multiple fields? Or both? SOLR has a copyField feature. You can use it to store concatenated values.</li>
<li><strong>Work on the relevancy and scoring.</strong> Set up proper score boosting for your search queries. You may find it necessary to ignore the score, to use FunctionQuery to tweak scoring, or to fill a “formula” field, e.g. linear, product, sum, etc.</li>
<li><strong>Practice makes perfect.</strong> Gather feedback from your users; check which searches work and which do not.</li>
</ol>
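<p>To make the second tip concrete, here is a small Python sketch that builds a Dismax request searching several fields with different boosts. The endpoint URL, the field names and the boost values are all hypothetical, and it assumes a SOLR version that understands the defType parameter.</p>
<pre>
```python
from urllib.parse import urlencode

def dismax_url(base_url, user_query, field_boosts, rows=10):
    """Build a SOLR select URL that searches several fields at once."""
    # qf lists the fields to search, each with a boost, e.g. title^2.0
    qf = " ".join(f"{name}^{boost}" for name, boost in field_boosts.items())
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": qf,
        "fl": "*,score",
        "rows": rows,
    }
    return base_url + "?" + urlencode(params)

url = dismax_url(
    "http://localhost:8983/solr/select",  # assumed local SOLR endpoint
    "canon powershot",
    {"title": 2.0, "description": 1.0, "all_text": 0.5},  # hypothetical fields
)
```
</pre>
<p>The all_text field stands for a catch-all field filled through copyField, which is one way to get the &#8220;both&#8221; option mentioned above.</p>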
<p>FAQ (to be answered soon):</p>
<ul>
<li>Why SOLR? Is a database not enough?</li>
<li>Would it be fair to compare a database to Lucene?</li>
</ul>
<p><em>Photo by <a title="Link to estherase's photostream" href="http://flickr.com/photos/estherase/"><strong>estherase</strong></a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2008/08/20/tips-on-designing-solr-schema/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Going Further with (Business and) SOLR</title>
		<link>http://blog.neofreko.com/index.php/2008/06/02/lebih-jauh-dengan-bisnis-dan-solr/</link>
		<comments>http://blog.neofreko.com/index.php/2008/06/02/lebih-jauh-dengan-bisnis-dan-solr/#comments</comments>
		<pubDate>Mon, 02 Jun 2008 01:54:49 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Ideas]]></category>
		<category><![CDATA[idea]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=786</guid>
		<description><![CDATA[Why SOLR? Is Google site search/Google Co-op not enough? Google does index reliably, but it seems Google will not be able to offer much semantic interpretation. This is where a SOLR implementation can help us a great deal. The search model exemplified by Google is an evolution of &#8230; <a href="http://blog.neofreko.com/index.php/2008/06/02/lebih-jauh-dengan-bisnis-dan-solr/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<pre><a title="Just Full Of Ideas" href="http://flickr.com/photos/17731548@N00/981372736"><img class="alignright" style="float: right;" src="http://farm2.static.flickr.com/1288/981372736_74e2d99d8f_m.jpg" alt="" /></a></pre>
<p><strong>Why SOLR? Is Google site search/Google Co-op not enough?</strong></p>
<p>Google does index reliably, but it seems Google will not be able to offer much semantic interpretation. This is where a SOLR implementation can help us a great deal. The search model exemplified by Google is an evolution of old-style search. Although this model works well, search needs keep growing, and eventually the experience of searching through Google will not be able to accommodate every consumer demand. Consumers need a more memorable search experience. Vertical search is one answer, and one hope. Vertical search can only be built if all the data, and the relations among that data, are available. And of course only the owner of the data knows and holds that information. Yes, that means everyone is qualified to put SOLR to use. And the data owner is indeed the right party to control the search experience.</p>
<p>A case in point. Suppose this is the data we have:</p>
<pre>SOLR for Dummies
Pennington, Havoc
7888XXXXX
USD $27</pre>
<p>Can Google answer the questions <em>“Which books were written by Havoc Pennington?”</em> or <em>“Which books cost less than $30?”</em> Google cannot answer either of them unless it knows the semantics, the meaning, of the data in that text. With SOLR, by contrast, we can answer them. Because we know that “SOLR for Dummies” is a book title, we can index it under the book_title field in SOLR. Then “Pennington, Havoc” can be indexed under “author”, and so on. That lets us find the data we want more accurately.</p>
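<p>As a rough illustration, the Python sketch below maps the four raw lines to named fields and builds the two questions in SOLR query syntax. Only book_title and author are field names from this post; isbn, currency, price and both helper functions are assumptions made for the example.</p>
<pre>
```python
# The four lines of raw text, mapped to named fields. book_title and author
# come from the post; isbn, currency and price are assumed field names.
book = {
    "book_title": "SOLR for Dummies",
    "author": "Pennington, Havoc",
    "isbn": "7888XXXXX",
    "currency": "USD",
    "price": 27,
}

def phrase_query(field, value):
    """Match an exact phrase in a single field."""
    escaped = value.replace('"', '\\"')
    return f'{field}:"{escaped}"'

def below_query(field, limit):
    """Match values strictly below limit, using an exclusive range."""
    return f"{field}:{{* TO {limit}}}"

by_author = phrase_query("author", "Pennington, Havoc")
cheap = below_query("price", 30)
```
</pre>
<p>Neither query is possible while the record is one undifferentiated blob of text; both become simple once each value lives in its own field.</p>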
<p><strong>Who are our potential clients?</strong></p>
<p>To recap what we touched on earlier, here are several kinds of potential clients.</p>
<ol>
<li> <em>Blogs.</em> Data in blogs is almost certainly far from structured. These are the hardest potential clients. They are potential clients because the pile of data is huge and seemingly nobody has yet managed to turn it into something valuable. Hmmm, wait, let us imagine. Could there be a query like “find reviews of SOLR written by Akhmad Fathonih”? With the common structure of blog content (title, excerpt, permalink, full content, category, and tags) we can already deliver extra value. As internet users&#8217; demand for and trust in peer review and citizen journalism keep rising, the query I just mentioned is bound to show up.</li>
<li><em>e-commerce sites.</em> Any site in the vein of Amazon, eBay or Craigslist will be easier to index because its data is already structured and the relations between the data are already clear.</li>
<li><em>All data owners</em> who want their data to be more discoverable for their users.</li>
</ol>
<p><strong>What should we bootstrap first?</strong></p>
<p>Besides the infrastructure (cores, storage, bandwidth, etc.) on which we will deploy SOLR, plugins (for CMSes and other data management systems) are one of the critical pieces. This factor helps determine how low the barrier to entry is for consumers. The easier it is for consumers to use our service, the better our chances of winning consumers, data (which could possibly be mined) and other opportunities.</p>
<p><strong>What do you say? </strong></p>
<p>While I will think and write more on this subject, I&#8217;m open to any discussion, whether you are rushing to execute this plan before anybody else or simply want some routine-breaking chit-chat.</p>
<dl>
<dt>Photo:
</dt>
<dt>Source</dt>
<dd><a href="http://flickr.com/">Flickr</a></dd>
<dt>Author</dt>
<dd><a href="http://flickr.com/photos/17731548@N00">Cayusa</a></dd>
<dt>License</dt>
<dd><a href="http://creativecommons.org/licenses/by-nc/2.0/"><img src="http://i.creativecommons.org/l/by-nc/3.0/80x15.png" alt="" /></a></dd>
</dl>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2008/06/02/lebih-jauh-dengan-bisnis-dan-solr/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Going Vertical with SOLR: So What Is SOLR?</title>
		<link>http://blog.neofreko.com/index.php/2008/05/30/going-vertical-with-solr-apa-sih-solr/</link>
		<comments>http://blog.neofreko.com/index.php/2008/05/30/going-vertical-with-solr-apa-sih-solr/#comments</comments>
		<pubDate>Fri, 30 May 2008 15:14:32 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[vertical search]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=785</guid>
		<description><![CDATA[Hehehe, I must admit that I skipped something important in my previous post. Perhaps that is what plunged it into the abyss of comment-less disgrace. Nyahahahahah. To tell you the truth, SOLR is great. SOLR is actually similar to a flat database &#8230; <a href="http://blog.neofreko.com/index.php/2008/05/30/going-vertical-with-solr-apa-sih-solr/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a title="Wireless Video Rocket" href="http://flickr.com/photos/44124348109@N01/222610933"><img class="alignleft" style="float: left; margin-left: 2px; margin-right: 2px;" src="http://farm1.static.flickr.com/95/222610933_b478a4b972_m.jpg" alt="rocket" /></a>Hehehe, I must admit that I skipped something important in my <a title="Selling custom search using SOLR" href="http://blog.neofreko.com/index.php/2008/05/28/selling-custom-search-using-solr/">previous</a> post. Perhaps that is what plunged it into the abyss of comment-less disgrace. Nyahahahahah.</p>
<p>To tell you the truth, SOLR is great. SOLR is actually similar to a flat database optimized for searching. Just as in a database, SOLR has what are called fields. Whereas a common DBMS can hold many tables, in SOLR you can only create one &#8220;table&#8221;. So how does it differ from a typical database?</p>
<p>As in a typical database, fields in SOLR can be indexed. What sets SOLR apart from an ordinary database is that indexing uses an algorithm we define ourselves. For example, we can index with whitespace removed so that a record can be matched by the keywords &#8220;PowerShot&#8221;, &#8220;Power-shot&#8221;, &#8220;power shot&#8221;, or &#8220;power/shot&#8221;. With a typical database you can indeed simulate the same thing, but you would have to process the keyword yourself before forwarding it to the database as a query. You won&#8217;t need such activity when dealing with SOLR. In the SOLR world, the keyword is analyzed by SOLR itself. The process may be exactly the same as the one used at indexing time, or entirely different. We can define the procedure to suit our needs. For example, take this definition from the SOLR schema file:</p>
<blockquote><p>A text field that uses WordDelimiterFilter to enable splitting and matching of words on case-change, alpha numeric boundaries, and non-alphanumeric chars, so that a query of &#8220;wifi&#8221; or &#8220;wi fi&#8221; could match a document containing &#8220;Wi-Fi&#8221;.</p>
<p>Synonyms and stopwords are customized by external files, and stemming is enabled. Duplicate tokens at the same position (which may result from Stemmed Synonyms or WordDelim parts) are removed.</p></blockquote>
<p>I guess the quote above shows you how interesting a SOLR field is. Hehehe. The complete sample schema can be seen <a title="schema.xml" href="http://svn.apache.org/viewvc/lucene/solr/trunk/example/solr/conf/schema.xml?view=markup">here</a>. The excerpt that goes with the quote above looks like this:</p>
<pre>&lt;!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
        words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
        so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
        Synonyms and stopwords are customized by external files, and stemming is enabled.
        Duplicate tokens at the same position (which may result from Stemmed Synonyms or
        WordDelim parts) are removed.
        --&gt;
    &lt;fieldType name="text" class="solr.TextField" positionIncrementGap="100"&gt;
      &lt;analyzer type="index"&gt;
        &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
        &lt;!-- in this example, we will only use synonyms at query time
        &lt;filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/&gt;
        --&gt;
        &lt;!-- Case insensitive stop word removal.
             enablePositionIncrements=true ensures that a 'gap' is left to
             allow for accurate phrase queries.
        --&gt;
        &lt;filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                /&gt;
        &lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/&gt;
        &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
        &lt;filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/&gt;
        &lt;filter class="solr.RemoveDuplicatesTokenFilterFactory"/&gt;
      &lt;/analyzer&gt;
      &lt;analyzer type="query"&gt;
        &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
        &lt;filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/&gt;
        &lt;filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/&gt;
        &lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/&gt;
        &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
        &lt;filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/&gt;
        &lt;filter class="solr.RemoveDuplicatesTokenFilterFactory"/&gt;
      &lt;/analyzer&gt;
    &lt;/fieldType&gt;</pre>
<p>See, there is a dedicated section for the analysis done at indexing time, and a separately defined procedure for processing the query entered by the user.</p>
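<p>To get a feel for why separate index and query chains are useful, here is a toy Python imitation. It is a sketch under heavy simplification, not how SOLR is implemented: it only splits and lowercases, skips stopwords, synonyms and stemming, ignores term positions, and joins all parts rather than only word parts. The names analyze and matches are made up.</p>
<pre>
```python
import re

# Split on case changes, letter/digit boundaries and punctuation,
# loosely imitating WordDelimiterFilter followed by LowerCaseFilter.
PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")

def analyze(text, catenate):
    """Return the set of terms produced for text.

    catenate=True loosely mimics the index chain (catenateWords="1"),
    catenate=False loosely mimics the query chain (catenateWords="0").
    """
    terms = set()
    for token in text.split():  # whitespace tokenizer
        parts = [p.lower() for p in PART.findall(token)]
        terms.update(parts)
        if catenate and len(parts) > 1:
            terms.add("".join(parts))  # "Wi-Fi" also yields "wifi"
    return terms

def matches(document, query):
    """True when every query term occurs among the document terms."""
    query_terms = analyze(query, catenate=False)
    return query_terms.issubset(analyze(document, catenate=True))
```
</pre>
<p>With these definitions, matches("Wi-Fi", "wifi") and matches("PowerShot", "power/shot") both hold, because the index side stores the split parts as well as the joined form.</p>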
<p>So what is the advantage? It clearly wins because this process has already been &#8220;refactored&#8221;: you no longer need to handle it yourself, as you would with an ordinary database solution. This means you can provide a different and better experience. Of course, it can also mean your content becomes more discoverable and carries more value than plain text.</p>
<p><a title="Vertical search on Wikipedia" href="http://en.wikipedia.org/wiki/Vertical_search">Vertical search</a> is the closest thing we can think of. No longer like Google as it is today (yes, Google may also have the data to do vertical search), but perhaps something like the AOL example I gave the other day.</p>
<p>Photo source:</p>
<dl>
<dt>Source</dt>
<dd><a href="http://flickr.com/">Flickr</a></dd>
<dt>Author</dt>
<dd><a href="http://flickr.com/photos/44124348109@N01">jurvetson</a></dd>
<dt>License</dt>
<dd><a href="http://creativecommons.org/licenses/by/2.0/"><img src="http://i.creativecommons.org/l/by/3.0/80x15.png" alt="" /></a></dd>
</dl>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2008/05/30/going-vertical-with-solr-apa-sih-solr/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ideas: Selling custom search (using SOLR)</title>
		<link>http://blog.neofreko.com/index.php/2008/05/28/selling-custom-search-using-solr/</link>
		<comments>http://blog.neofreko.com/index.php/2008/05/28/selling-custom-search-using-solr/#comments</comments>
		<pubDate>Wed, 28 May 2008 10:23:50 +0000</pubDate>
		<dc:creator>Akhmad Fathonih</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Indonesia]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[web-services]]></category>

		<guid isPermaLink="false">http://blog.neofreko.com/?p=784</guid>
		<description><![CDATA[SOLR is a standalone enterprise search server with a web-services-like API (per definition). Many reputable sites use SOLR as their search backend. See a sample on AOL or the complete list here. So, will it work? What do &#8230; <a href="http://blog.neofreko.com/index.php/2008/05/28/selling-custom-search-using-solr/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a title="Hong Kong International Finance Center" href="http://flickr.com/photos/7578081@N07/2293376525"><img class="alignright" style="float: right;" src="http://farm4.static.flickr.com/3172/2293376525_3f84f2350f_m.jpg" alt="" /></a>SOLR is a standalone enterprise search server with a web-services-like API (<a href="http://lucene.apache.org/solr/features.html">per definition</a>). Many reputable sites use SOLR as their search backend. See a <a href="http://realestate.aol.com/listings-chicago-IL/sort-Price/order-1">sample on AOL</a> or the complete list <a href="http://wiki.apache.org/solr/PublicServers">here</a>.</p>
<p>So, will it work? What do we sell?</p>
<ul>
<li><strong>Sell grid power.</strong> This is a no-brainer, but it shouldn&#8217;t be first on the priority list. For most sites, search is a non-dominant action, so they may already have the computing power needed to run their site (search activity included).</li>
<li><strong>Tuneable search experience.</strong> Users will no longer depend on what their CMS provides; instead they can create their own search experience using their own defined fields and search weighting.</li>
<li><strong>Leverage content discoverability. </strong>Some sites may have a large bulk of data sitting on their servers. Indexing this data (in SOLR) may give them added value. E.g. faceted data can be displayed to attract visitors. It&#8217;s like &#8220;simple&#8221; data mining.</li>
<li><strong>Targeted ads <img src='http://blog.neofreko.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> .</strong> We can push ads onto the search page.</li>
</ul>
<p>Then, who are our potential customers?</p>
<ul>
<li><strong>e-commerce sites.</strong> There are many e-commerce sites in Indonesia. Many of them are simply built from scratch or on an off-the-shelf FOSS CMS such as osCommerce, which is <a href="http://forums.oscommerce.com/index.php?s=9320f2c2dd82b38be541d55b0ff0b6cb&amp;showtopic=303261&amp;pid=1252885&amp;st=0&amp;#entry1252885">lacking</a> a good search feature. Yes, MySQL full-text search just won&#8217;t cut it.</li>
<li><strong>document-intensive sites.</strong> E.g. libraries (OMG, there are lots of libraries out there) and .go.id sites.</li>
<li><strong>any content-rich site</strong></li>
</ul>
<p>Ok, so how are we going to execute it, technically?</p>
<ul>
<li>provide a REST API, or</li>
<li>simply expose the SOLR endpoint for integration and searching purposes</li>
</ul>
<p>The former option would enable us to push our ads. The latter seems to fit a premium service, where users can process raw SOLR data.</p>
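<p>A minimal Python sketch of the first option: a thin layer that accepts a few public parameters and translates them into a restricted set of SOLR parameters, so the raw endpoint is never exposed. The parameter names, the sort whitelist and the row cap are hypothetical choices, not part of SOLR.</p>
<pre>
```python
from urllib.parse import urlencode

ALLOWED_SORTS = {"score desc", "price asc", "price desc"}  # hypothetical
MAX_ROWS = 50  # hypothetical cap on results per page

def to_solr_params(request_args):
    """Translate public API arguments into a restricted set of SOLR params."""
    rows = min(int(request_args.get("per_page", 10)), MAX_ROWS)
    page = max(int(request_args.get("page", 1)), 1)
    sort = request_args.get("sort", "score desc")
    if sort not in ALLOWED_SORTS:
        sort = "score desc"  # fall back instead of passing unknown input on
    return {
        "q": request_args.get("q", "*:*"),
        "start": (page - 1) * rows,
        "rows": rows,
        "sort": sort,
        "wt": "json",
    }

def solr_url(base_url, request_args):
    """Build the internal SOLR URL for one public API request."""
    return base_url + "?" + urlencode(to_solr_params(request_args))
```
</pre>
<p>The user query is passed through as is, so a real service would likely want to restrict or escape it too. The ad slot can then be added to the response this layer returns.</p>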
<p>So, anyone interested? Or desperately wanting one for your own site? Or do you suddenly want to <em>start your own web crawler</em>? That&#8217;s an interesting idea &#8230; Going vertical, anyone?</p>
<p>Photo source: <a title="Link to swisscan's photostream" href="http://flickr.com/photos/swisscan/"><strong>swisscan</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neofreko.com/index.php/2008/05/28/selling-custom-search-using-solr/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

