Updates from November, 2011

  • Akhmad Fathonih 10:09 pm on 11/1/2011
    Tags: dependency parser, typed dependency

    on Cloning Siri, understanding the query 

    Well, UIMA is interesting. I haven’t been able to make the feature extraction work. However, I got the gist that it works similarly to NLTK, with one additional benefit: we can chain several analyses into a pipeline just by configuring an XML file. This is almost as sweet as a SOLR config.

    Today, I found another approach to understanding the user’s query. I think it will help a lot if we can determine the subject, predicate and object of a query. We don’t need to understand the whole sentence, but we do need to extract its essence. What should our clone do if the user says “how is the weather?”, “where is Bandung?” or “do I have any meetings today?”

    Fortunately, there are free implementations of typed dependency parsing. If you want to know more about typed dependencies, just google it; I will only give you an example here. Given the query “how is the weather in jakarta”, typed dependency analysis gives us, among other relations:

    advmod(is-2, how-1)
    det(weather-4, the-3)
    nsubj(is-2, weather-4)

    From this output, we can use the presence of a subject or object to determine the essence of a query. The example above shows that weather is probably the essence of this query. You can test more typed dependencies online. Below are some more examples:

    Do I have meeting today
    aux(have-3, do-1)
    nsubj(have-3, I-2)
    dobj(have-3, meeting-4)
    tmod(have-3, today-5)

    call John
    amod(John-2, call-1)

    make appointment with John on 3
    dobj(make-1, appointment-2)
    prep_with(make-1, John-4)
    prep_on(John-4, 3-6)

    texts John, send me detail
    prep_text(send-4, John-2)
    nsubj(detail-6, me-5)
    ccomp(send-4, detail-6)
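
    A minimal Python sketch of the “essence” idea, assuming the parser output is available as text lines like the ones above. The helper names (parse_deps, essence) and the rule “prefer dobj, fall back to nsubj” are my own illustrative assumptions, not part of any parser’s API:

```python
import re
from typing import List, NamedTuple, Optional


class Dep(NamedTuple):
    """One typed dependency: rel(governor-index, dependent-index)."""
    rel: str
    gov: str
    gov_idx: int
    dep: str
    dep_idx: int


# Matches lines such as "nsubj(is-2, weather-4)" or "prep_on(John-4, 3-6)".
_DEP_RE = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_deps(lines: List[str]) -> List[Dep]:
    """Turn the parser's text output into Dep tuples, skipping lines that don't match."""
    deps = []
    for line in lines:
        m = _DEP_RE.match(line.strip())
        if m:
            rel, gov, gov_idx, dep, dep_idx = m.groups()
            deps.append(Dep(rel, gov, int(gov_idx), dep, int(dep_idx)))
    return deps


def essence(deps: List[Dep]) -> Optional[str]:
    """Heuristic: prefer the direct object, fall back to the nominal subject."""
    for wanted in ("dobj", "nsubj"):
        for d in deps:
            if d.rel == wanted:
                return d.dep
    return None


if __name__ == "__main__":
    weather = ["advmod(is-2, how-1)", "det(weather-4, the-3)", "nsubj(is-2, weather-4)"]
    meeting = ["aux(have-3, do-1)", "nsubj(have-3, I-2)",
               "dobj(have-3, meeting-4)", "tmod(have-3, today-5)"]
    print(essence(parse_deps(weather)))  # weather
    print(essence(parse_deps(meeting)))  # meeting
```

    For the “call John” parse, which has neither relation, essence() returns None, so a query like that needs a different rule.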

    From the examples above, we can choose a pattern to act as a trigger for a datasource query. However, this will not always be adequate. Some questions are still hard to understand because they are too vague, such as “how do I get home?”. To understand this one, we need to know that “home” is a destination/location, which should trigger some sort of map datasource.
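
    Choosing a pattern as a trigger, plus the “home is a location” knowledge, could look roughly like the following sketch. The trigger table and the location word list are hypothetical placeholders, and the triples are assumed to come from a typed dependency parser with the word indices dropped:

```python
from typing import Iterable, Optional, Tuple

# (relation, governor, dependent) with the word indices dropped
Triple = Tuple[str, str, str]

# Hypothetical trigger table: (relation, dependent word) -> datasource name.
TRIGGERS = {
    ("nsubj", "weather"): "weather",
    ("dobj", "meeting"): "calendar",
    ("dobj", "appointment"): "calendar",
}

# Hypothetical lexicon of words we know denote a destination/location.
LOCATIONS = {"home", "office", "bandung", "jakarta"}


def choose_datasource(triples: Iterable[Triple], tokens: Iterable[str]) -> Optional[str]:
    # 1. Try the dependency patterns first.
    for rel, _gov, dep in triples:
        ds = TRIGGERS.get((rel, dep.lower()))
        if ds is not None:
            return ds
    # 2. Fall back to lexical knowledge: a known location suggests a map query.
    if any(tok.lower().strip("?") in LOCATIONS for tok in tokens):
        return "map"
    return None


if __name__ == "__main__":
    print(choose_datasource([("nsubj", "is", "weather")], "how is the weather in jakarta".split()))
    print(choose_datasource([], "how do I get home?".split()))
```

    Checking the dependency patterns before the location lexicon keeps “how is the weather in jakarta” on the weather datasource even though it mentions a known place.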

    I have been imagining the clone as a pluggable framework. The main function of the host program is to provide as many analyses as it can via plugins, and then decide which datasource plugin to trigger. Typed dependency parsing should be one plugin; feature extraction should be another.
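
    A rough sketch of such a pluggable host in Python. Everything here (Host, the toy tokenizer and location_tagger plugins, the datasource predicates) is hypothetical and only illustrates the flow: run every analysis plugin, collect annotations, then trigger the first datasource plugin that accepts them:

```python
from typing import Callable, Dict, List, Optional, Tuple

Annotations = Dict[str, object]
Analyzer = Callable[[str, Annotations], None]  # adds annotations in place
Acceptor = Callable[[Annotations], bool]       # a datasource's trigger test


class Host:
    """Runs every analysis plugin, then picks the first datasource that accepts the result."""

    def __init__(self) -> None:
        self.analyzers: List[Analyzer] = []
        self.datasources: List[Tuple[str, Acceptor]] = []

    def add_analyzer(self, analyzer: Analyzer) -> None:
        self.analyzers.append(analyzer)

    def add_datasource(self, name: str, accepts: Acceptor) -> None:
        self.datasources.append((name, accepts))

    def handle(self, query: str) -> Optional[str]:
        ann: Annotations = {}
        for analyze in self.analyzers:  # analyzers run in registration order
            analyze(query, ann)
        for name, accepts in self.datasources:
            if accepts(ann):
                return name
        return None


# Toy analysis plugins standing in for typed dependency / feature extraction.
def tokenizer(query: str, ann: Annotations) -> None:
    ann["tokens"] = [t.strip("?,.!").lower() for t in query.split()]


def location_tagger(query: str, ann: Annotations) -> None:
    known = {"home", "office", "bandung", "jakarta"}  # hypothetical gazetteer
    ann["locations"] = [t for t in ann.get("tokens", []) if t in known]


def build_demo_host() -> Host:
    host = Host()
    host.add_analyzer(tokenizer)        # must run before location_tagger
    host.add_analyzer(location_tagger)
    host.add_datasource("weather", lambda a: "weather" in a.get("tokens", []))
    host.add_datasource("calendar", lambda a: "meeting" in a.get("tokens", []))
    host.add_datasource("map", lambda a: bool(a.get("locations")))
    return host


if __name__ == "__main__":
    host = build_demo_host()
    for q in ("do I have any meeting today?", "how do I get home", "how is the weather in jakarta"):
        print(q, "->", host.handle(q))
```

    Datasource order matters in this sketch: the weather plugin is checked before the map plugin, otherwise “how is the weather in jakarta” would be routed to the map because it mentions a known location.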

    Hmm, interesting.

    PS:

    There are more dependency parsers I still need to check.

     
  • Akhmad Fathonih 10:24 pm on 10/31/2011
    Tags: NLP, NLTK, UIMA, Wolfram Alpha

    on Cloning Siri 

    Ya ya ya, it’s a novel goal. Nevertheless, it’s an interesting journey to take.

    To make our clone clever, it must be smart enough to understand any general query: “Who is Obama?”, “Where is the Taj Mahal?”. To answer these, we can simply forward the query to Wolfram Alpha. With a simple trick, we can also answer a question that lacks context, such as “How’s the weather tomorrow?”. How? Simply append the current geolocation to the query and pass it along to Wolfram Alpha, e.g. “How’s the weather tomorrow jakarta”. Don’t worry, Wolfram Alpha will understand what you mean.
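
    The geolocation trick might be sketched like this. The keyword list is a hypothetical heuristic, the city is assumed to come from the device’s geolocation, and the endpoint is assumed to be Wolfram Alpha’s v2 query API, which requires your own AppID. The sketch only builds the URL; it does not send the request:

```python
from urllib.parse import urlencode

# Assumed Wolfram Alpha v2 query endpoint; it needs an AppID of your own.
WOLFRAM_ALPHA_QUERY_URL = "https://api.wolframalpha.com/v2/query"

# Hypothetical list of words that make a question location-dependent.
LOCATION_DEPENDENT = ("weather", "forecast", "temperature")


def localize(query: str, city: str) -> str:
    """Append the user's current city to location-dependent questions that lack one."""
    lowered = query.lower()
    if any(w in lowered for w in LOCATION_DEPENDENT) and city.lower() not in lowered:
        return f"{query.rstrip(' ?')} {city}"
    return query


def wolfram_alpha_url(query: str, city: str, appid: str) -> str:
    """Build (but do not fetch) the query URL for the localized question."""
    params = {"appid": appid, "input": localize(query, city)}
    return WOLFRAM_ALPHA_QUERY_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(localize("How's the weather tomorrow?", "jakarta"))  # How's the weather tomorrow jakarta
    print(localize("Who is Obama?", "jakarta"))                # Who is Obama?
```

    General questions such as “Who is Obama?” pass through untouched; only questions that look location-dependent get the city appended.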

    Now, the hard part: we need to teach our Siri clone about ourselves. I wish Wolfram Alpha were open enough to let us add new information to its database. Unfortunately, it isn’t, except for enterprise users. So building our own information/data mining solution is inevitable.

    NLTK for Python is a good candidate for solving this problem. I picture it like this: a sentence gets tagged, then we extract the subject, verb and object and pass them along to the appropriate datasource provider. A question such as “do I have a meeting tomorrow?” should trigger the Calendar datasource. A datasource will be an add-on that registers its trigger verbs and checks whether other tag types are present in the sentence.
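
    Here is a minimal sketch of that add-on registry. It assumes sentences arrive already tagged as (word, Penn Treebank tag) pairs, which is the shape nltk.pos_tag() returns (NLTK’s tokenizer and tagger data must be downloaded first). The Datasource and Registry classes and the hand-made tagging in the demo are hypothetical; a real tagger may tag a sentence differently:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# (word, Penn Treebank tag) pairs, the shape nltk.pos_tag() returns.
Tagged = List[Tuple[str, str]]


@dataclass
class Datasource:
    name: str
    verbs: Set[str]          # trigger verbs the add-on registers
    required_tags: Set[str]  # other tags that must be present in the sentence


class Registry:
    def __init__(self) -> None:
        self._by_verb: Dict[str, List[Datasource]] = {}

    def register(self, ds: Datasource) -> None:
        for verb in ds.verbs:
            self._by_verb.setdefault(verb, []).append(ds)

    def dispatch(self, tagged: Tagged) -> Optional[str]:
        tags = {tag for _, tag in tagged}
        for word, tag in tagged:
            if not tag.startswith("VB"):      # only verbs can trigger
                continue
            for ds in self._by_verb.get(word.lower(), []):
                if ds.required_tags <= tags:  # all required tag types present
                    return ds.name
        return None


def demo_registry() -> Registry:
    reg = Registry()
    reg.register(Datasource("calendar", {"have"}, {"PRP", "NN"}))
    reg.register(Datasource("phone", {"call"}, {"NNP"}))
    return reg


if __name__ == "__main__":
    # Hand-tagged for illustration; in practice: nltk.pos_tag(nltk.word_tokenize(text))
    sentence = [("do", "VBP"), ("I", "PRP"), ("have", "VB"),
                ("a", "DT"), ("meeting", "NN"), ("tomorrow", "NN")]
    print(demo_registry().dispatch(sentence))  # calendar
```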

    Another solution may come from the Apache UIMA project. I am looking at its Configurable Feature Extractor add-on, which can tag text and identify entities. Compared to the simple pattern matching of the first solution, this alternative gives us more metadata to match against. Further, we can combine it with SOLR to harness its search engine features: boosting, synonyms, stopwords and whatnot.
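
    To illustrate what stopwords, synonyms and boosting buy us when matching a query against datasources, here is a toy in-memory scorer. It is not Solr’s API or scoring formula; the word lists, field names and boost values are hypothetical:

```python
from typing import Dict, List

# Hypothetical analysis settings, loosely mirroring what a search schema configures.
STOPWORDS = {"do", "i", "a", "an", "the", "is", "how", "any", "for"}
SYNONYMS = {"appointment": "meeting", "forecast": "weather"}
FIELD_BOOSTS = {"title": 3.0, "body": 1.0}  # a title match counts 3x a body match


def analyze(text: str) -> List[str]:
    """Lowercase, strip punctuation, drop stopwords, map synonyms to one canonical term."""
    terms = []
    for tok in text.lower().split():
        tok = tok.strip("?,.!")
        if tok and tok not in STOPWORDS:
            terms.append(SYNONYMS.get(tok, tok))
    return terms


def score(query: str, doc: Dict[str, str]) -> float:
    """Sum of boosted counts of distinct query terms found in each field."""
    q = set(analyze(query))
    return sum(boost * len(q & set(analyze(doc.get(field, ""))))
               for field, boost in FIELD_BOOSTS.items())


def best_datasource(query: str, docs: Dict[str, Dict[str, str]]) -> str:
    return max(docs, key=lambda name: score(query, docs[name]))


DOCS = {
    "calendar": {"title": "meeting calendar", "body": "appointment schedule events"},
    "weather": {"title": "weather", "body": "forecast temperature rain"},
}

if __name__ == "__main__":
    print(best_datasource("do I have an appointment today?", DOCS))  # calendar
    print(best_datasource("how is the forecast for tomorrow", DOCS))  # weather
```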

    Do you have something else in mind? I am being a bit practical here because I can’t comprehend much math :(

     
  • Akhmad Fathonih 2:50 pm on 10/11/2011
    Tags: mysql, replication, tungsten   

    Tungsten Replicator 

    I’ve been looking for a hassle-free replication service. So far, I have only tried traditional MySQL master-slave replication. The problem was provisioning (oh boy, I finally get to use this word) the slave. Usually, I mysqldump-ed the master and then loaded the dump on the slave. Then I needed to set the master log file, log position, etc. on the slave. mysqldump can take hours, and it may impose a performance penalty on the master. Luckily, you can use innobackupex from Percona instead. The performance penalty can be avoided (I’m not completely sure about this), and the binlog position is recorded automatically in the backup (in the xtrabackup_binlog_info file). No more manually noting down the master position on the master db. But you still need to set the log position and master log file manually on the slave db.
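
    That remaining manual step can at least be scripted. This sketch assumes the innobackupex backup directory contains xtrabackup_binlog_info with the binlog file name and position on its first line; the host, user and password values are placeholders:

```python
from typing import Tuple


def read_binlog_info(text: str) -> Tuple[str, int]:
    """Parse xtrabackup_binlog_info: '<binlog file> <position>' on the first line.
    Newer versions may append a GTID column, which is ignored here."""
    fields = text.strip().splitlines()[0].split()
    return fields[0], int(fields[1])


def _quote(value: str) -> str:
    # MySQL string literal (default sql_mode): escape backslashes, double single quotes.
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def change_master_sql(host: str, user: str, password: str,
                      log_file: str, log_pos: int) -> str:
    """Build the CHANGE MASTER TO statement to run on the slave."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={_quote(host)}, MASTER_USER={_quote(user)}, "
        f"MASTER_PASSWORD={_quote(password)}, "
        f"MASTER_LOG_FILE={_quote(log_file)}, MASTER_LOG_POS={int(log_pos)};"
    )


if __name__ == "__main__":
    # Illustrative contents; read the real file from your backup directory instead.
    log_file, log_pos = read_binlog_info("mysql-bin.000003\t1234\n")
    print(change_master_sql("master.example.com", "repl", "secret", log_file, log_pos))
```

    Run the generated statement on the slave, followed by START SLAVE.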

    Enter Tungsten Replicator. The basic principles are the same: we need to enable the replication settings in MySQL and dump the master db to provision the slave node. However, we don’t need to set the log position and master log file, as Tungsten Replicator does this for us. Further, we can provision multiple slaves at once with this tool. Neat, eh?

    I’ve barely scratched the surface of Tungsten here. Here are some commands that may come in handy:

    replicator start/stop #start/stop the replicator manager

    We can have many replication services (a cluster). When you want to reset a replication service (redo it from the beginning), make sure to stop the replicator first. Then drop the tungsten_<your service> database before running innobackupex on the master db to provision the slaves.
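
    The reset sequence described above, sketched as a dry-run plan in Python. It only builds the commands and does not run them; the backup directory is a hypothetical default, the mysql step assumes the client can log in with your default credentials, and each command should be checked against your own setup before use:

```python
import re
import shlex
from typing import List


def reset_plan(service: str, backup_dir: str = "/var/backups/mysql") -> List[List[str]]:
    """Ordered commands for resetting a Tungsten replication service (dry run only)."""
    # Refuse anything but a plain identifier so the DROP statement stays safe.
    if not re.fullmatch(r"\w+", service):
        raise ValueError(f"unexpected service name: {service!r}")
    return [
        ["replicator", "stop"],                                          # 1. stop the replicator
        ["mysql", "-e", f"DROP DATABASE IF EXISTS tungsten_{service}"],  # 2. drop its tracking db
        ["innobackupex", backup_dir],                                    # 3. back up the master
    ]


if __name__ == "__main__":
    for cmd in reset_plan("alpha"):
        print(shlex.join(cmd))
```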

    trepctl -service <your service> status

    Useful for checking replication status. There may be problems on the slave, especially when you use binlog-do-db to log only some of the databases.

    I’ll write more on tungsten later. So far, it eases my pain.

    References:

    1. Getting started with Tungsten: learn how to set up replication services; it’s just a command line away. http://datacharmer.blogspot.com/2011/06/getting-started-with-tungsten.html
    2. The Cookbook: various tricks you can do with Tungsten. I want to test direct replication to provision a new slave so I can skip the innobackupex step. http://code.google.com/p/tungsten-replicator/wiki/TRCBasicInstallation
     