We need smarter “real-time” search engines
There was an interesting article in the Guardian recently highlighting a study which claims that users mostly ignore “real-time” search results in Google. If you’re not familiar with real-time results, they are the search results from Twitter, Facebook, etc. that appear at the very top, before the web results.
The article suggests this may be because a short, 140-character tweet does not contain much information and so is not very useful to searchers. I tend to agree. However, this isn’t necessarily a bad thing.
Historically, we’ve been used to reading news articles, whether published in old media (newspapers, magazines, etc.) or new media (online news sites, blogs, etc.), that always present a holistic view of the topic being reported on. The information is logically organized into an intro, a body and a conclusion. This format largely makes sense to us, and is easy to read and follow.
However, the new age of information dissemination is much more fragmented and, dare I say, iterative. News about a topic does not necessarily arrive with all the facts at once; instead, each small snippet of news (aka a tweet) adds iteratively and chronologically to the overall story. To understand a breaking news story, we may have to scan multiple tweets to understand the context, pick out the relevant facts and then read the commentary. This is why I believe most real-time search engines are currently doing it wrong. Most of them present commentary (tweets) about popular (or not so popular) topics, which is great, but only for the person who is already familiar with the subject being reported on.
Last year, if I searched for “hudson plane crash”, there wasn’t a single real-time search engine that consolidated all the facts and presented a timeline of events. Instead, users had to piece together (very carefully, and often mistakenly) the chronological sequence of events. Most users gave up after reading the useless commentary (“omg… a plane has crashed into the hudson!”) and instead waited until news outlets like Yahoo! News, AP, etc. consolidated and reported the entire story.
But it doesn’t have to be that way. Fortunately, all the important and relevant pieces of information are out there, contained in short tweets. Real-time search engines should be getting smarter about collecting tweets, analyzing them for relevant and trustworthy content, and combining them into “super tweets”. These super tweets could be structured (depending on the subject) or unstructured. An engine could even use Natural Language Processing (NLP) to stitch tweets together into ad-hoc news stories on the fly. Then, when a user searches for “hudson plane crash”, they wouldn’t just see the last 10 tweets about the topic, but a timeline of events, the most recent pictures, videos and commentary, and rich media like the plane’s path, local emergency numbers, affected airport/airline delays, etc.
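As a rough illustration of the collect, filter and combine idea, here is a minimal Python sketch. It is not how any existing engine works. The `Tweet` type, the `author_trust` score, the word-overlap `relevance` measure, the thresholds and the sample tweets are all assumptions invented for this example; a real system would need far better relevance, trust and duplicate detection.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    text: str
    posted_at: datetime
    author_trust: float  # hypothetical 0..1 score; computing it is out of scope


def tokens(text: str) -> frozenset[str]:
    """Lowercased alphanumeric words in the text."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, text: str) -> float:
    """Fraction of query words found in the text (a crude stand-in for ranking)."""
    query_words = tokens(query)
    if not query_words:
        return 0.0
    return len(query_words & tokens(text)) / len(query_words)


def build_timeline(query: str, tweets: list[Tweet],
                   min_relevance: float = 0.5,
                   min_trust: float = 0.5) -> list[Tweet]:
    """Keep relevant, trusted tweets, order them by time, drop exact repeats."""
    kept = [t for t in tweets
            if relevance(query, t.text) >= min_relevance
            and t.author_trust >= min_trust]
    kept.sort(key=lambda t: t.posted_at)

    seen: set[frozenset[str]] = set()
    timeline: list[Tweet] = []
    for tweet in kept:
        key = tokens(tweet.text)  # same word set counts as a repeat
        if key not in seen:
            seen.add(key)
            timeline.append(tweet)
    return timeline


# Invented sample data: texts, times and trust scores are placeholders.
sample = [
    Tweet("Hudson plane crash update: rescue boats are on the scene",
          datetime(2009, 1, 15, 16, 10), 0.9),
    Tweet("omg... a plane has crashed into the hudson!",
          datetime(2009, 1, 15, 15, 36), 0.2),
    Tweet("Plane down in the Hudson, ferries heading over to help",
          datetime(2009, 1, 15, 15, 40), 0.8),
    Tweet("Great coffee this morning",
          datetime(2009, 1, 15, 15, 0), 0.9),
    Tweet("plane down in the hudson ferries heading over to help",
          datetime(2009, 1, 15, 15, 45), 0.7),
]

for tweet in build_timeline("hudson plane crash", sample):
    print(f"{tweet.posted_at:%H:%M}  {tweet.text}")
```

With this sample, the low-trust “omg” tweet, the off-topic tweet and the repeated tweet are filtered out, and the remaining two are listed in chronological order. The hard parts of the idea live in exactly the pieces this sketch fakes: deciding who is trustworthy and what counts as the same fact.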
Each and every day, we’re producing an incredible amount of new information. The future won’t just be about effectively searching that huge corpus, but about consuming information in a concise and logical format, hopefully one that is created automatically.
Luckily, I get to work on these types of problems at Yahoo! Research. If you’re interested in working on similar problems, take a look at our open positions.
PrasSarkar.com
3 Comments
Great post, Pras. Take a look at TipTop http://FeelTipTop.com to see a number of things you are dreaming already working pretty well.
Mar 11th, 2010
Thanks for the tip Shyam, though I don’t see TipTop doing more than other real-time search engines. Am I missing something here?
Mar 11th, 2010
Really? I think you need to spend a bit more time on TipTop. There are at least a dozen powerful features in TipTop that no other search engine, real-time or not, supports. If you cannot find them even after another look, please check out our FAQ, blog posts, articles on the web, etc. Thanks.
Mar 16th, 2010