On Cloning Siri: Understanding the Query

Well, UIMA is interesting. I haven’t been able to make the feature extraction work yet. However, I got the gist that it works much like NLTK, with one additional benefit: we can construct a pipeline of several analyses by configuring an XML file. This is almost as sweet as a Solr config.

Today, I’ve just found another approach to understanding a user query. I think it would help a lot if we could determine the subject, predicate and object of a query. We don’t need to understand the whole sentence, but we do need to extract the essence of the query. What should our clone do if the user says: how is the weather? where is bandung? do I have any meeting today?

Fortunately, there are free implementations of typed dependency parsing. If you want to know more about typed dependencies, just google it. I will only give you one example. Given the query “how is the weather in jakarta”, typed dependency analysis will give us:

advmod(is-2, how-1)
det(weather-4, the-3)
nsubj(is-2, weather-4)

From this output, we can use the presence of a subject or object to determine the essence of a query. The example above suggests that “weather” is probably the essence of the query. You can test more typed dependencies here. Below are some more examples:

Do I have meeting today
aux(have-3, do-1)
nsubj(have-3, I-2)
dobj(have-3, meeting-4)
tmod(have-3, today-5)

call John
amod(John-2, call-1)

make appointment with John on 3
dobj(make-1, appointment-2)
prep_with(make-1, John-4)
prep_on(John-4, 3-6)

texts John, send me detail
prep_text(send-4, John-2)
nsubj(detail-6, me-5)
ccomp(send-4, detail-6)
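
To make the idea concrete, here is a minimal Python sketch that reads parser output in the text format shown above and picks a candidate essence. It assumes the dependencies arrive as plain text, one per line. The names Dep, parse_deps and essence are hypothetical, and the rule "prefer the direct object, then fall back to the subject" is only an assumed heuristic, not something the parser provides.

```python
import re
from typing import List, NamedTuple, Optional


class Dep(NamedTuple):
    """One typed dependency, e.g. nsubj(is-2, weather-4)."""
    rel: str
    head: str
    head_idx: int
    dep: str
    dep_idx: int


# Matches lines such as "prep_with(make-1, John-4)".
_DEP_RE = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_deps(text: str) -> List[Dep]:
    """Parse the textual parser output into Dep tuples, skipping other lines."""
    deps = []
    for line in text.strip().splitlines():
        match = _DEP_RE.match(line.strip())
        if match:
            rel, head, head_idx, dep, dep_idx = match.groups()
            deps.append(Dep(rel, head, int(head_idx), dep, int(dep_idx)))
    return deps


def essence(deps: List[Dep]) -> Optional[str]:
    """Assumed heuristic: use the direct object if present, else the subject."""
    for rel in ("dobj", "nsubj"):
        for d in deps:
            if d.rel == rel:
                return d.dep
    return None


WEATHER = """
advmod(is-2, how-1)
det(weather-4, the-3)
nsubj(is-2, weather-4)
"""

MEETING = """
aux(have-3, do-1)
nsubj(have-3, I-2)
dobj(have-3, meeting-4)
tmod(have-3, today-5)
"""

print(essence(parse_deps(WEATHER)))  # weather
print(essence(parse_deps(MEETING)))  # meeting
```

For “call John” the parser output above contains neither a subject nor an object, so this heuristic returns None. That is a hint that a single rule will not cover every query.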

From the examples above, it is possible for us to choose a pattern as a trigger for a datasource query. However, this will not always be adequate. Some questions may still be hard to understand because they are too vague, such as: how do I get home. To understand this, we need to be aware that “home” is a destination/location. This should trigger some sort of map datasource.
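
A rough sketch of such triggers in Python could look like this. Everything in it is an assumption made for illustration: the trigger table, the tiny location word list, the datasource names and the function choose_datasource are all hypothetical. It first tries the essence word, and falls back to checking whether any word in the query is a known location.

```python
from typing import Iterable, Optional

# Hypothetical mapping from an essence word to a datasource name.
ESSENCE_TRIGGERS = {
    "weather": "weather",
    "meeting": "calendar",
    "appointment": "calendar",
}

# Hypothetical lexicon of words we treat as destinations/locations.
LOCATION_WORDS = {"home"}


def choose_datasource(essence: Optional[str], tokens: Iterable[str]) -> Optional[str]:
    """Pick a datasource name from the essence, else from location words."""
    if essence is not None:
        source = ESSENCE_TRIGGERS.get(essence.lower())
        if source is not None:
            return source
    # Fallback for vague queries: any known location word triggers the map.
    if any(token.lower() in LOCATION_WORDS for token in tokens):
        return "map"
    return None


print(choose_datasource("weather", "how is the weather in jakarta".split()))  # weather
print(choose_datasource(None, "how do I get home".split()))                   # map
```

A real location lexicon would have to come from somewhere else, such as a gazetteer or the user's saved places. The word set here only stands in for that knowledge.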

I have been imagining the clone as a pluggable framework. The main function of the host program is to provide as many analyses as it can, via plugins, and then decide which datasource plugin to trigger. Typed dependency should be one plugin; feature extraction should be another.
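
As a thought experiment, the host could be as small as the Python sketch below. It is only one possible shape under my stated assumptions: the Host class, the plugin signatures and the two toy plugins are hypothetical. The toy analyzer merely splits the query into words and stands in for a real typed dependency or feature extraction plugin.

```python
from typing import Callable, Dict, List, Optional

Analysis = Dict[str, object]
# An analysis plugin turns the raw query into some named results.
AnalysisPlugin = Callable[[str], Analysis]
# A datasource plugin answers the query, or returns None if it does not apply.
DatasourcePlugin = Callable[[str, Analysis], Optional[str]]


class Host:
    """Runs every analysis plugin, then asks datasource plugins in order."""

    def __init__(self) -> None:
        self.analyzers: List[AnalysisPlugin] = []
        self.datasources: List[DatasourcePlugin] = []

    def add_analyzer(self, plugin: AnalysisPlugin) -> None:
        self.analyzers.append(plugin)

    def add_datasource(self, plugin: DatasourcePlugin) -> None:
        self.datasources.append(plugin)

    def handle(self, query: str) -> Optional[str]:
        analysis: Analysis = {}
        for analyze in self.analyzers:
            analysis.update(analyze(query))
        for datasource in self.datasources:
            answer = datasource(query, analysis)
            if answer is not None:
                return answer
        return None


def token_analyzer(query: str) -> Analysis:
    """Toy stand-in for a real analysis plugin: lowercase word list."""
    return {"tokens": query.lower().split()}


def weather_source(query: str, analysis: Analysis) -> Optional[str]:
    """Toy datasource: applies only when the word 'weather' was seen."""
    if "weather" in analysis.get("tokens", []):
        return "[weather datasource]"
    return None


host = Host()
host.add_analyzer(token_analyzer)
host.add_datasource(weather_source)
print(host.handle("how is the weather"))  # [weather datasource]
```

Letting each datasource plugin decide for itself whether it applies keeps the host simple, at the cost of making the plugin order matter when several of them match.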

Hmm, interesting.

PS:

There are more dependency parsers I still need to check.