SPARQLZ: Autonomic Search

Back in the 20th century, Tim Berners-Lee finally got the time from his boss to lay down the foundations for the World Wide Web. More than a decade before that, Vint Cerf and a few others (notably Bob Kahn), working with DARPA, had laid down the foundations for the internet. Little did they know that the combination of their inventions would lead to one of the biggest technological evolutions in the history of humanity. Today, Tim and Vint still hold prestigious positions, and both are focused on the future of the web. Tim leads the charge in getting what he calls ‘Linked Data’ on the web, while Vint is the ‘Chief Internet Evangelist’ at Google. While both men remain focused on their original passions, they also tend to agree about the future of the web. What I’d like to discuss today is a little bit about their collective vision, some of the questions that remain, and an answer to those questions.

Tim is very excited about what he calls ‘Linked Data’. To get an idea of what ‘Linked Data’ is, it helps to start with something more familiar: ‘unlinked data’. As the web has grown, a lot of data has been thrown onto it. Websites, web apps, web portals, essentially anything web related has some set of data associated with it. Think about your Facebook profile, your Twitter stream, the blogs you visit: all of these have data associated with them. The problem is that for a long time this data was ‘unstructured’ or ‘unlinked’. We couldn’t, for example, look at the data on the web as a whole and consider it ‘structured’ in some logical way. People put data on the web haphazardly, to fit their own ends, so the data often made sense in the context of their own site but not in the context of the web as a whole. Data was ‘siloed’ in the sense that the data that lived in your Facebook profile had no connection to the data that lived on the blogs you visit. This has been fine for the web thus far. Thankfully, Google recognized that since all this data is unstructured, indexing websites and letting people run keyword searches over that index would let people query ‘the web as a whole’ in some sense. That has worked well for quite some time. However, the nature of the web is changing, and the linked data movement is arguably the most important part of that change.

The linked data movement is an attempt to give data on the web a structure: to make data a first-class citizen on the web and to make sure that the links, or relationships, between pieces of data are recognizable. It exposes the common properties of data via the relations they bear to one another. Why is this important? Why should you care?

Structured data is a big part of the reason Oracle & IBM are multi-billion dollar companies (ok, more so Oracle). IBM researchers pioneered the idea of a ‘relational database’, and Oracle was among the first to sell one commercially. A relational database is essentially a tool that gives a set of data ‘structure’ (here I am using structured data and linked data interchangeably). Companies want their data to have ‘structure’ because it allows them to query their data on a much deeper level than the keyword query you’re accustomed to. In essence, they can get their computers to do a lot more work for them than before. Structured data allows them to quickly outline exactly what they want. Because the computer can understand both that outline and the data’s structure, it can go get what is needed and bring it back quickly. In techie jargon, this is equivalent to running a ‘SQL query’ on a ‘relational database’. If companies were restricted to running keyword queries on their datasets, a person would have to sit there and comb through the results to find what was ultimately needed. A keyword will bring back lists upon lists of results, while a ‘SQL query’ lets you ask the computer something much more complex and brings back fine-grained results. What is the overall benefit? The computer does a lot more work for you when you ask it a complex question, whereas you do a lot more work yourself when you simply give it a keyword. On a conceptual level, data has much more semantic meaning for a machine when it is structured (linked) than when it is unstructured. As such, a computer can be asked more difficult questions when it understands more about the data.
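
To make the contrast between a keyword query and a ‘SQL query’ concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The listings table, its columns and its rows are all invented for illustration; they are not taken from any real dataset.

```python
import sqlite3

# Hypothetical listings table; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, description TEXT, "
    "city TEXT, bedrooms INTEGER, bathrooms INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO listings (description, city, bedrooms, bathrooms, price) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Cozy house near Portland parks", "Portland", 3, 2, 185000),
        ("Large house, Portland schools nearby", "Portland", 4, 3, 420000),
        ("House with a Portland-style porch", "Seattle", 3, 2, 190000),
    ],
)


def keyword_search(term):
    """Keyword style: return every row that mentions the term; a person sifts the rest."""
    rows = conn.execute(
        "SELECT id FROM listings WHERE description LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


def structured_search(city, bedrooms, bathrooms, max_price):
    """Structured style: the database applies every condition itself."""
    rows = conn.execute(
        "SELECT id FROM listings WHERE city = ? AND bedrooms = ? "
        "AND bathrooms = ? AND price < ? ORDER BY id",
        (city, bedrooms, bathrooms, max_price),
    )
    return [row[0] for row in rows]


keyword_hits = keyword_search("Portland")
structured_hits = structured_search("Portland", 3, 2, 200000)
print(keyword_hits, structured_hits)
```

The keyword search returns all three listings, including the Seattle one that merely mentions Portland, while the structured query returns only the one listing that actually satisfies every condition.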

This segues nicely into another nascent meme on the web called the ‘Semantic Web’. The Semantic Web is what linked data enables. It is a web in which machines can understand much more of what is going on behind the scenes and communicate more meaningfully with each other, thus taking work off the hands of human beings. The Semantic Web allows you to ask it more difficult questions like ‘tell me when a 3 bedroom 2 bathroom house is available in Portland, Oregon for less than $200,000’ or ‘tell me when a Java development job is available in Portland, Oregon with a salary above $65,000’. Answering questions like these on the current web means checking and rechecking real-estate or job sites for hours, days and weeks until your conditions are met. On the Semantic Web, it means asking once and having the web find out for you. Again, this is only possible because there is linked or structured data behind the scenes that machines can navigate with your query in mind. The overall benefit to you is much less time wasted combing through search results over and over to find something that matches what you want. One metaphor we often use to describe this: while keyword search allows you to traverse the ‘surface’ of the web, the Semantic Web allows you to traverse the ‘volume’ or ‘depth’ of the web.
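
As a rough sketch of what ‘navigating linked data’ can mean, the snippet below stores a few facts as (subject, predicate, object) triples and answers the house question by following links from a city to the houses located in it. The identifiers, predicate names and helper functions are hypothetical, and a real deployment would use a proper linked data store and query language rather than a Python set.

```python
# Linked data as (subject, predicate, object) triples. All identifiers are made up.
TRIPLES = {
    ("house:1", "type", "House"),
    ("house:1", "bedrooms", 3),
    ("house:1", "bathrooms", 2),
    ("house:1", "price", 185000),
    ("house:1", "locatedIn", "city:portland"),
    ("house:2", "type", "House"),
    ("house:2", "bedrooms", 3),
    ("house:2", "bathrooms", 2),
    ("house:2", "price", 240000),
    ("house:2", "locatedIn", "city:portland"),
    ("city:portland", "name", "Portland"),
    ("city:portland", "state", "Oregon"),
}


def objects(subject, predicate):
    """All objects linked from `subject` via `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects that link to `obj` via `predicate`, in a stable order."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def affordable_houses(city_name, state, bedrooms, bathrooms, max_price):
    """Follow the links: city name -> city node -> houses located there -> their properties."""
    cities = [c for c in subjects("name", city_name) if state in objects(c, "state")]
    matches = []
    for city in cities:
        for house in subjects("locatedIn", city):
            if (
                "House" in objects(house, "type")
                and bedrooms in objects(house, "bedrooms")
                and bathrooms in objects(house, "bathrooms")
                and any(price < max_price for price in objects(house, "price"))
            ):
                matches.append(house)
    return matches


result = affordable_houses("Portland", "Oregon", 3, 2, 200000)
print(result)
```

The point of the sketch is that the city and the house are separate things joined by a link, so the machine can walk from one to the other instead of matching words on a page.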

So can you see why Tim & Vint are so excited about Linked Data/Semantic Web? I’ll let them speak for themselves below:

Here are the originals:

http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html ~16 minutes
http://www.tubechop.com/watch/78187 ~1 minute

Great. So why aren’t we using Linked Data?

So if all this is true, why aren’t we already using Linked/Structured Data regularly and realizing the Semantic Web? Why hasn’t search changed to fit this new, promising model? Why do we still use keywords to find stuff on the internet? Why hasn’t the ‘dark matter’ beneath the surface of the web been illuminated? The number one reason is this: you have to be a programmer to write a query over linked data! The average non-programmer cannot write a linked data query, or even easily reuse one a programmer has written. To tap all this potential value, the web is in desperate need of a slick, easy-to-use graphical user interface that allows even an average person to build these queries intuitively and then share them with their peers.

Introducing Real-Time Faceted Search

Real-Time Faceted Search is a journey; a journey that enables you to easily tell the internet what you’re looking for and let the internet find it for you. It starts with the same keyword everyone is familiar with, but then guides the user through a series of contextual prompts, or ‘facets’, that gradually build out a ‘complex query’ far more powerful than a typical keyword search. Through a number of GUI techniques, the user is able to select or fill in elements of this query in intuitive ways, with all the code crunching hidden in the background. The query can then be run over mashed-up linked data sources. This enables the everyday user to build deep queries over linked data and receive much more relevant results. By itself, real-time faceted search is certainly interesting, but there are some addendums to it that make it much more compelling.
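
One way to picture the facet-by-facet build-up is the sketch below. It is not the actual SPARQLZ implementation; the FacetedQuery class, its operator names and the item fields are assumptions made up for this example. Each answered prompt adds one condition, and the same object can both describe itself as a sentence and test a candidate item.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical facet operators a UI prompt might offer.
OPERATORS = {"is": operator.eq, "less than": operator.lt, "more than": operator.gt}


@dataclass
class FacetedQuery:
    keyword: str  # the starting keyword, e.g. "house"
    facets: list = field(default_factory=list)  # (field name, operator name, value)

    def add_facet(self, name, op, value):
        """Record the answer to one contextual prompt."""
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op}")
        self.facets.append((name, op, value))
        return self  # returning self lets prompts be chained

    def sentence(self):
        """Human-readable form of the query built so far."""
        parts = [f"{name} {op} {value}" for name, op, value in self.facets]
        if not parts:
            return f"tell me when a {self.keyword} is available"
        return f"tell me when a {self.keyword} is available with " + ", ".join(parts)

    def matches(self, item):
        """True if the item is of the right kind and satisfies every facet."""
        if item.get("kind") != self.keyword:
            return False
        return all(
            name in item and OPERATORS[op](item[name], value)
            for name, op, value in self.facets
        )


query = (
    FacetedQuery("house")
    .add_facet("bedrooms", "is", 3)
    .add_facet("city", "is", "Portland")
    .add_facet("price", "less than", 200000)
)
print(query.sentence())
```

The user only ever picks a field, an operator and a value; the structured query is assembled behind the scenes.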

Continuous, push based queries

Every time you complete a query via this real-time faceted search UI, that query runs continuously, in the background, independently of you. It continues to mine the present flow of information for the conditions you have concocted. In a way, it can be thought of as ‘searching the future’. When you finish your query, its conditions may not have been met, but there may be a point in the future at which they are. Your query will find those conditions when they are fulfilled and then push the result to you. The idea of push is the same idea that underlies the ‘real-time web’: events are pushed to the user instead of the user needing to ‘poll’ the web for events. Say you register the example query from above, ‘tell me when a 3 bedroom 2 bathroom house is available in Portland, Oregon for less than $200,000’, but nothing presently matches it. That query will run continuously for you and push matching results to you through any mechanism of your choosing: text, e-mail, IM, iPhone/Android app, etc. You never need to come back and check whether your conditions have been met; you are told ‘in real-time’.
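
The push model can be sketched in a few lines. This is only an in-memory illustration under simplifying assumptions: the ContinuousQueryService class and the event fields are hypothetical, and the notify callback stands in for whatever delivery channel (text, e-mail, app) the user chose.

```python
class ContinuousQueryService:
    """Keeps registered queries alive and pushes matching events to their owners."""

    def __init__(self):
        self._subscriptions = []  # list of (predicate, notify) pairs

    def register(self, predicate, notify):
        """Register a standing query; `notify` is called for every future match."""
        self._subscriptions.append((predicate, notify))

    def publish(self, event):
        """Feed one new event through every standing query; return the number of pushes."""
        delivered = 0
        for predicate, notify in self._subscriptions:
            if predicate(event):
                notify(event)
                delivered += 1
        return delivered


def wanted_house(event):
    """The example query: 3 bed, 2 bath, Portland, under $200,000."""
    return (
        event["city"] == "Portland"
        and event["bedrooms"] == 3
        and event["bathrooms"] == 2
        and event["price"] < 200000
    )


inbox = []  # stands in for the user's phone or mailbox
service = ContinuousQueryService()
service.register(wanted_house, inbox.append)

# Nothing matches today...
service.publish({"city": "Portland", "bedrooms": 3, "bathrooms": 2, "price": 250000})
# ...but a matching listing shows up later and is pushed without the user asking again.
service.publish({"city": "Portland", "bedrooms": 3, "bathrooms": 2, "price": 189000})
print(inbox)
```

The user registers once; from then on the service does the checking each time new information arrives.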

Shareability

Arguably one of the most important addendums to real-time faceted search is the ability for users to share these queries with one another. Imagine that as you’re using the real-time faceted search UI and building your deep query, a query sentence is slowly growing along with it at the bottom of your screen. Imagine that sentence matches my example: ‘tell me when a 3 bedroom 2 bathroom house is available in Portland, Oregon for less than $200,000’. Now imagine a new user coming along with the intention of doing some real-estate research. He does a quick keyword search for ‘real-estate’, sees that you’ve already built a query for real-estate research and simply reuses what you’ve done. Say he rewrites ‘3 bedroom 2 bathroom’ as ‘4 bedroom 3 bathroom’, ‘Portland, Oregon’ as ‘Seattle, Washington’ and ‘less than $200,000’ as ‘less than $500,000’. He can now register that query without ever needing to go through the UI and build his own. Via this sharing, these queries can be turned into a social network of immense value.
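
Reusing a shared query amounts to copying it and swapping a few values. The sketch below assumes a hypothetical HouseQuery record and a hypothetical find_shared lookup by tag; neither is a documented SPARQLZ structure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HouseQuery:
    """A shareable query; frozen so that reuse never alters the original."""
    bedrooms: int
    bathrooms: int
    city: str
    state: str
    max_price: int
    tags: tuple = ("real-estate",)

    def sentence(self):
        return (
            f"tell me when a {self.bedrooms} bedroom {self.bathrooms} bathroom house "
            f"is available in {self.city} {self.state} for less than ${self.max_price:,}"
        )


# Queries other users have already built and shared.
shared_queries = [HouseQuery(3, 2, "Portland", "Oregon", 200000)]


def find_shared(keyword):
    """Keyword search over shared queries by tag."""
    return [q for q in shared_queries if keyword in q.tags]


original = find_shared("real-estate")[0]
# The new user rewrites only the parts that differ.
reused = replace(
    original, bedrooms=4, bathrooms=3, city="Seattle", state="Washington", max_price=500000
)
print(reused.sentence())
```

Because the copy is independent, the original author’s query keeps running unchanged while the new user registers the modified one.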

Extensibility

Extensibility is simple: to make a service extensible, one must build tools into it that empower 3rd party users to extend it, use it, make it better, etc. There is a wealth of really bright minds on the Internet who enjoy extending web services they like, either for fun or for profit. Twitter & Facebook, for example, each have an ecosystem of 3rd party applications that has extended their service and made it more popular than ever. It is easy to see how a service that enables anyone to create deep, continuous queries could benefit from 3rd party development. For example, developers could further refine the filtering on their queries by executing code on the query results, they could build applications that use these continuous deep queries as a data platform, or they could use ‘webhooks’ to call another web service when an update occurs. While this can be a bit confusing, suffice it to say that the possibilities are endless for what a 3rd party developer could do with a platform for these deep queries.
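
For readers unfamiliar with webhooks, the idea is simply an HTTP POST sent to a URL the developer registered, carrying the update as its payload. The sketch below uses Python’s standard urllib; the URL, the payload fields and the function names are placeholders invented for this example, and the actual network call is swapped for a recorder so the snippet runs offline.

```python
import json
import urllib.request


def build_webhook_request(url, update):
    """Package one query update as a JSON POST request."""
    body = json.dumps(update).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_with_urllib(request):
    """Default sender: performs the real HTTP call and returns the status code."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def notify_webhook(url, update, send=send_with_urllib):
    """Call the developer's web service when an update occurs."""
    return send(build_webhook_request(url, update))


# Offline demonstration: record the request instead of sending it.
sent = []


def record(request):
    sent.append(request)
    return 200


status = notify_webhook(
    "https://example.com/hooks/new-listing",  # placeholder URL
    {"query": "3 bed 2 bath Portland under 200000", "price": 189000},
    send=record,
)
print(status, sent[0].get_method(), sent[0].full_url)
```

Passing the sender in as a parameter is a design choice for the sketch: it keeps the payload-building logic testable without a live endpoint.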

Rateability

The final social element is the ability to rate updates that come through these real-time faceted search queries. If you’ve been imagining what I’ve been describing correctly, then you’ve been picturing a dashboard where you see a stream of the latest updates from your queries that are relevant and useful to you. Now imagine being able to tab between the ‘latest updates’ and the ‘highest rated’ updates. Highest rated updates are those that others have seen and rated favorably, and they percolate to the top of your highest rated list. This introduces another level of filtering beyond what has been described so far: social filtering. So now you can see the latest stuff to come from your queries as well as what your peers have rated highly. Both the chance of being distracted by useless or boring information and the effort required to avoid it are minimized. Moreover, this service will allow users to rate the queries themselves, so when you go to reuse a query, you know you’re choosing the best one.
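
The two tabs are just two orderings of the same stream. Here is a small sketch, assuming a hypothetical Update record with an arrival counter and a list of peer ratings from 1 to 5; the averaging rule is one plausible choice, not a description of how SPARQLZ ranks.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    title: str
    received_at: int  # arrival order; larger means more recent
    ratings: list = field(default_factory=list)  # peer ratings, e.g. 1 to 5

    def average_rating(self):
        """Mean rating, or 0.0 for updates nobody has rated yet."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def latest(updates):
    """The 'latest updates' tab: newest first."""
    return sorted(updates, key=lambda u: u.received_at, reverse=True)


def highest_rated(updates):
    """The 'highest rated' tab: best average first, newer updates breaking ties."""
    return sorted(updates, key=lambda u: (u.average_rating(), u.received_at), reverse=True)


stream = [
    Update("Bungalow on Elm St", received_at=1, ratings=[5, 4]),
    Update("Fixer-upper downtown", received_at=2, ratings=[2]),
    Update("New listing, no ratings yet", received_at=3),
]

latest_titles = [u.title for u in latest(stream)]
top_titles = [u.title for u in highest_rated(stream)]
print(latest_titles)
print(top_titles)
```

With this sample data the newest, unrated listing leads the ‘latest’ tab, while the older but well-rated bungalow leads the ‘highest rated’ tab.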

Conclusion

A continuous, real-time, faceted search that is extensible, shareable and rateable. Whew, that’s a mouthful. Luckily, we’ve put this description entirely under one name: SPARQLZ. SPARQLZ is the future of information discovery. It is a general discovery technology that applies from the most basic consumer use cases to the most advanced enterprise use cases. The simple ability to mash up multiple sources of information and run deep queries over those sources, queries that do not require a programmer, is valuable to anyone who needs information in a timely manner. The nascent ‘Internet of Things’ movement will need such powerful-yet-simple querying technology as well. In closing, I’d like to quote some profound words from a profound man, on his vision for SPARQLZ:

“I know it was my idea to consider the notion of an ENGINE as in “discovery engine”. But the more I think about it, this conjures up the (undesirable) notion of a mechanical motor that only drives searching faster and further by just feeding it more gas…like a muscle car or hemi for search. Rather than mechanical and carbon-offensive, I’d like to think of SPARQLZ as a PROCESS, an organic SYSTEM–sentient, alive, evolving…much more like a Web of neurons connected via synapses, firing spontaneously, continuously alerting the user whenever there are relevant events…in other words, SPARQLZ is AUTONOMIC SEARCH.”

The Future is Bright.
