Learning Mahout

My brain exploded. That’s pretty much my limit.

So, yes, I’ve been interested in SOLR, Apache Tika, and of course Mahout. The promise of classifying and clustering data is enough to persuade me to dig up examples about Mahout. So far, what has really helped is the Seinfeld demo example. It gives me a proper example to try. We can replace the data with our own to get the gist of how Mahout works.

However, I haven’t gotten the gist yet. So far, I’ve tried to cluster 2 data sources. One of them is the blog posts from navinot.com. Here’s an excerpt from the cluster dump:

C-18 [Ponsel, Mobile, Internet, Mobile internet, Iphone]
- /6 Hal Tentang Mobile Internet.txt
- /Mobile Application_ Masa Depan Yang Ditunggu?.txt
- /Netbook_ Bakal Lenyap Seperti PDA?.txt
- /Premium Mobile Internet?.txt
- /The Gaps in Indonesian Internet.txt
- /iPhone & Telkomsel_ Deal or No Deal?.txt

I was imagining Mahout would cluster the posts into groups of similar posts. My guess is that they were clustered by keyword. I was using k-means.
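To make the "clustered by keyword" guess a bit more concrete, here is a toy sketch of k-means over plain word counts. This is not Mahout's code and not how I ran it; the function names, the naive "first k documents as starting centroids" seeding, and the sample sentences in the usage note are all made up for illustration.

```python
from collections import Counter


def tf_vector(text, vocab):
    """Turn a text into a list of word counts, one slot per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_docs(docs, k, iterations=10):
    """Return a cluster index for each document (toy k-means)."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = [tf_vector(doc, vocab) for doc in docs]
    # Assumption: naive seeding, the first k documents are the centroids.
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

Calling cluster_docs(["mobile internet phone", "coffee sugar milk", "iphone mobile internet", "coffee sugar"], 2) puts the two "mobile internet" texts in one group and the two "coffee" texts in the other, purely because they share words. That is roughly what I suspect happened to the posts above.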

Anyway, obviously we need to filter out stopwords. Mahout can read directly from a SOLR/Lucene index, but I didn’t have much luck with it. Something to do with empty terms or whatever. Probably, feeding my raw data to SOLR and then querying it back out into text files will make a decent workaround.
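As a rough idea of the stopword step, here is a minimal sketch that strips stopwords from a piece of text before it gets written out as a text file. The tiny mixed Indonesian/English stopword list and the function name are my own placeholders, not anything from Mahout or SOLR; a real list would be much longer.

```python
# Assumption: a tiny sample list for illustration only.
STOPWORDS = {"yang", "dan", "di", "the", "a", "of", "in"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop every whitespace-separated word that is in the stopword set."""
    kept = [word for word in text.split() if word.lower() not in stopwords]
    return " ".join(kept)
```

For example, remove_stopwords("The Gaps in Indonesian Internet") leaves "Gaps Indonesian Internet", so the clustering only sees the words that actually say something about the post.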

That’s a wrap for today. Time for Pocket Legend!

 

How to use Lucene 3.4 with Mahout 0.5

As you may have found out the frustrating way, Mahout 0.5 was built with Lucene 3.1 dependencies. How on earth can we use Lucene 3.4 then? My SOLR is 3.4, and I want to use its index to play with Mahout.

Fear not. Just download Mahout 0.5, both source and binaries. Extract them; both will end up in the same folder, i.e. mahout-distribution-0.5. Now, open up the pom.xml. Find lucene and replace 3.1.0 with 3.4.0. I reckon there are only 4 of them. Then do mvn install. You may want to skip tests with: mvn -DskipTests=true install.
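If you would rather not hunt through the pom.xml by hand, the find-and-replace could be sketched like this. It assumes the Lucene version is written literally as <version>3.1.0</version> inside <dependency> blocks that mention lucene; I have not checked that this matches the exact layout of the Mahout 0.5 pom, and bump_lucene is just a name I made up.

```python
import re

# Matches one <dependency>...</dependency> block at a time.
DEPENDENCY = re.compile(r"<dependency>.*?</dependency>", re.DOTALL)


def bump_lucene(pom_text, old="3.1.0", new="3.4.0"):
    """Replace the version only in dependency blocks that mention lucene."""

    def patch(match):
        block = match.group(0)
        if "lucene" in block:
            return block.replace(
                f"<version>{old}</version>", f"<version>{new}</version>"
            )
        return block  # leave non-Lucene dependencies untouched

    return DEPENDENCY.sub(patch, pom_text)
```

You would read the pom.xml into a string, pass it through bump_lucene, and write the result back before running mvn install. Only Lucene dependencies are touched, so another library that happens to be at 3.1.0 stays as it is.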

Once done, do: export MAHOUT_CORE=1

Run mahout from the mahout-distribution-0.5/bin folder.

I don’t get the index incompatibility anymore. But I keep getting complaints about not enough term vectors on the documents, even though I’ve set up the schema.xml and reindexed my docs.

Will write more once I get past it.

11 Things I want in Japan

When Hiro asked me what I would particularly like to see in Japan, I spaced out. It turned out that I didn’t really have a list. But when I think about it all over again, I do have a short list[1].

  1. Hatsune Miku, either having her figure or ultimately watching her live concert.
  2. Dollfie: see one, touch it. Owning one can wait until I have moved into my own apartment (and have permission from my wife).
  3. Akihabara. It’s a common thing in everyone’s list I believe. Probably visiting a maid cafe would be nice.
  4. Comiket. Seeing a field of figures would be awesome. The crowd looks scary tho.
  5. Tokyo Game Show. Same as above.
  6. My own figure collection. I believe I have different taste from @dannychoo. Not on the Dollfie part tho :D . I have been wanting a Macross figure and other figures I saw in some of dannychoo’s pics.
  7. Seeing cosplayers at Yoyogi Park.
  8. Tanabata festival. Fireworks show. I want to wear a yukata someday.
  9. Taking photos of lots of sailor fuku school girls.
  10. Talking of sailor fuku, I love the girl band Scandal. I want to have their merchandise.
  11. Dir En Grey concert? I want to scream along to the song Dozing Green.
More will come.
Yeah, I am in Japan right now. It still feels unreal, even after almost a week. I keep telling my wife I am living in a dream. She says it’s the jet lag.
The tag of this blog finally came true.
Yeayyyyyyyyyyyyyyyyyyyyyyyyyyy! Aaaaaaaaaaaa! Saiko desuuuuuuuuuuuuuuuuuuuu! Yeaaaayyyyyyyyyyy! *rolling on the floor, bursting happiness tears*
Footnote:
[1] This is pretty common for me. My brain spins slowly, and it tends to work best in writing, when I can take more time to think. This results in me being unspontaneous.