On Cloning Siri

Ya ya ya, it’s a novel goal. Nevertheless, it’s an interesting journey to take.

To make our clone clever, it must be smart enough to understand any general query: “Who is Obama?”, “Where is the Taj Mahal?”. To answer these, we can simply forward the query to Wolfram Alpha. With a simple trick, we can also answer a floating question (one that leaves its context unstated) such as “How’s the weather tomorrow?”. How? Simply add the current geolocation to the query, then pass it along to Wolfram Alpha, e.g. “How’s the weather tomorrow jakarta”. Don’t worry, Wolfram Alpha will understand what you mean.
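Here is a rough sketch of that trick in Python. The function names, the `DEMO` app id and the hard-coded location are made up for illustration. The endpoint and the `appid`/`input` parameters are what I understand the Wolfram Alpha query API to expect, so double check them against the official docs before relying on this.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wolfram Alpha query API; verify against the docs.
WOLFRAM_ENDPOINT = "https://api.wolframalpha.com/v2/query"


def add_location(query, location):
    """Append the current location to a floating question.

    The trailing question mark is dropped so the location reads as
    part of the question, like in the example above.
    """
    return f"{query.strip().rstrip('?').strip()} {location}"


def build_url(query, app_id):
    """Build the request URL. app_id is your own API key (hypothetical here)."""
    return WOLFRAM_ENDPOINT + "?" + urlencode({"appid": app_id, "input": query})


# The location would come from the device; it is hard-coded here.
question = add_location("How's the weather tomorrow?", "jakarta")
url = build_url(question, "DEMO")
```

Fetching `url` and parsing the response is left out on purpose, since that part depends on which output format you ask the API for.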

Now, the hard part. We need to teach our Siri clone about ourselves. I wish Wolfram Alpha were open enough that we could add new information to its database. Unfortunately it is not, unless you are an enterprise user. So some kind of information/data mining solution of our own is unavoidable.

NLTK on Python is a good candidate to solve the problem. Picture a sentence that has been tagged with parts of speech. We then extract the Subject, Verb and Object and pass them along to the appropriate datasource provider. A question such as “do I have a meeting tomorrow?” should trigger the Calendar datasource. A datasource will be an addon which registers its trigger on the Verb, and checks whether the other Tag types it needs are available within the sentence.
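To make the idea a bit more concrete, here is a minimal sketch of the datasource registry. Everything in it (the `Datasource` class, `register`, `dispatch`, the trigger verbs chosen for the calendar) is hypothetical and only illustrates the matching. It works on `(word, tag)` pairs with Penn Treebank tags, which is the shape `nltk.pos_tag` returns. The `tag_sentence` helper assumes NLTK and its tokenizer and tagger data are installed.

```python
from dataclasses import dataclass, field


@dataclass
class Datasource:
    """An addon that registers trigger verbs and the tags it needs."""
    name: str
    trigger_verbs: set
    required_tags: set = field(default_factory=set)

    def matches(self, tagged):
        # Collect every verb (tags starting with VB) and every tag in the sentence.
        verbs = {word.lower() for word, tag in tagged if tag.startswith("VB")}
        tags = {tag for _, tag in tagged}
        # Trigger on the verb, then check the other required tags are present.
        return bool(verbs & self.trigger_verbs) and self.required_tags <= tags


REGISTRY = []


def register(datasource):
    REGISTRY.append(datasource)


def dispatch(tagged):
    """Return the name of the first datasource that matches, or None."""
    for datasource in REGISTRY:
        if datasource.matches(tagged):
            return datasource.name
    return None


def tag_sentence(sentence):
    """Tag a raw sentence with NLTK (needs the NLTK data packages downloaded)."""
    import nltk
    return nltk.pos_tag(nltk.word_tokenize(sentence))


# Hypothetical calendar addon: triggers on "have"/"schedule" plus a noun.
register(Datasource("calendar", {"have", "schedule"}, {"NN"}))
```

With NLTK set up, `dispatch(tag_sentence("do I have a meeting tomorrow?"))` is how the pieces would be wired together. Matching only on a verb plus a tag type is obviously crude, which is exactly the limitation of this first approach.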

Another solution may come from the Apache UIMA project. I am looking at its Configurable Feature Extractor addon. It is capable of tagging and identifying entities. Compared to the simple pattern matching in our first solution, this second alternative has more metadata to match against. Further, we can combine it with Solr to harness its search engine power: boosting, synonyms, stopwords and whatnot.

Do you have something else in mind? I am being a bit practical here because I can’t comprehend much math :(