Thanks to some recent extra hours by our team, we're ready to release our latest
labs prototype to the public in a hipness-compliant alpha state:
Infolust is a new context search engine. It's for all those moments when you're looking at a website, think "Now that's interesting," and would like to know more. Now that "more" is just a single click away: Infolust compares an existing web page with all Wikipedia pages and fetches up to 10 related ones. It's a first step toward bringing our Similarity Engine to consumers for free. While we're still working hard on improving things like the IE6 bookmarklet, especially for pages that contain frames, we need to gather as many statistics as possible (of course nothing personal or address-related whatsoever) to optimize Infolust for content sources other than Wikipedia and to improve backend efficiency.
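To make the "compare a page with all Wikipedia pages and fetch up to 10 related ones" step concrete, here is a minimal sketch using plain TF-IDF vectors and cosine similarity. This is a swapped-in textbook technique, not our Similarity Engine (which does a full semantic analysis); the tiny `ARTICLES` corpus and all function names are hypothetical stand-ins.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the Wikipedia corpus: title -> plain article text.
ARTICLES = {
    "Burkina Faso": "burkina faso is a landlocked country in west africa ouagadougou is its capital",
    "Ouagadougou": "ouagadougou is the capital of burkina faso",
    "Isaac Asimov": "isaac asimov wrote the foundation series of science fiction novels",
}


def tokenize(text):
    """Lowercase alphabetic tokens; deliberately naive (no stemming, no stop words)."""
    return re.findall(r"[a-z]+", text.lower())


def weigh(tokens, idf):
    """TF-IDF vector as a {term: weight} dict; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(articles):
    """Compute smoothed IDF weights and one TF-IDF vector per article."""
    tokenized = {title: tokenize(text) for title, text in articles.items()}
    n = len(tokenized)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = {title: weigh(toks, idf) for title, toks in tokenized.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_articles(page_text, idf, vectors, k=10):
    """Return up to k article titles ranked by similarity to page_text."""
    query = weigh(tokenize(page_text), idf)
    scored = [(cosine(query, vec), title) for title, vec in vectors.items()]
    # Keep only articles sharing at least one weighted term with the page
    scored = [(s, title) for s, title in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


if __name__ == "__main__":
    idf, vectors = build_index(ARTICLES)
    print(related_articles("travel tips for burkina faso and ouagadougou", idf, vectors))
```

Because the whole page is reduced to a single vector, a page made of many unrelated stories produces a blended query that matches no single article particularly well.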
So far the results are surprisingly good for a third of the queries, quite OK for another third, and not so good for the last third. The main issue isn't determining the similarity itself but distinguishing the actual content of a query page from things like navigation, footers, etc., especially because we don't use any "look up Wikipedia page titles in the text" tricks and instead rely on a full semantic analysis and comparison of pages.
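Separating content from navigation and footers is often approached with a link-density heuristic: blocks that are short or consist mostly of link text are treated as boilerplate. The sketch below implements that heuristic with Python's standard `html.parser`; it is an illustration of a common technique, not the method Infolust uses, and the tag list and thresholds (`max_link_density=0.5`, `min_words=5`) are assumptions.

```python
from html.parser import HTMLParser

# Tags that start or end a text block (illustrative choice, not exhaustive)
BLOCK_TAGS = {"p", "div", "li", "td", "section", "article",
              "header", "footer", "nav", "h1", "h2", "h3"}


class BlockCollector(HTMLParser):
    """Collects text blocks and counts how many of their words sit inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (total_words, link_words, text)
        self._text = []
        self._link_words = 0
        self._in_link = 0
        self._skip = 0          # depth inside <script>/<style>

    def _flush(self):
        text = " ".join(self._text).strip()
        if text:
            self.blocks.append((len(text.split()), self._link_words, text))
        self._text = []
        self._link_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        words = data.split()
        if self._skip or not words:
            return
        self._text.append(" ".join(words))
        if self._in_link:
            self._link_words += len(words)

    def close(self):
        super().close()   # delivers any buffered trailing text first
        self._flush()


def main_content(html, max_link_density=0.5, min_words=5):
    """Keep only blocks that are long enough and not dominated by links."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = [text for words, links, text in parser.blocks
            if words >= min_words and links / words <= max_link_density]
    return "\n".join(kept)
```

Real pages need tuned thresholds; a page that is mostly short linked headlines would likely lose nearly all of its text under this rule.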
That said, we'd be more than happy to hear your suggestions, sightings, and comments, and we hope you enjoy what Infolust is doing so far!
I got pretty good results for http://www.burkina.at: the system understood the site's theme, "Burkina Faso", but failed to return the "important" articles on the country (such as the article on the country itself or the one on its capital).
I tried Asimov's Foundation trilogy at amazon.de (http://www.infolust.com/results?url=http%3A%2F%2Fwww.amazon.de%2FFoundation-Isaac-Asimov%2Fdp%2F3453164172%2F) and the results were great: the publisher, the author, other books by the author, other authors in the same genre, etc.
My blog (http://infolost.com/bookmarklet?url=http%3A//blog.bookworm.at/) and the System One webpage gave pretty good results too.
It would be interesting to know whether the tool can also "score" a text, e.g. by how much context it carries. That way one could distinguish chatter from relevant content. I don't know whether that would be useful anywhere.
Just some quick thoughts.
When displaying the first lines of a Wikipedia article, it would be great to run a search & replace first to avoid some strange characters :-)
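A minimal sketch of such a search & replace, assuming the "strange characters" are undecoded HTML entities and leftover wiki markup such as `[[links]]` and `'''bold'''` quotes (an assumption; the comment does not say which characters show up). `clean_snippet` and `max_chars` are hypothetical names; `html.unescape` is the standard-library entity decoder.

```python
import html
import re


def clean_snippet(raw, max_chars=200):
    """Turn the start of a Wikipedia article into a display-safe snippet (hypothetical helper)."""
    text = html.unescape(raw)  # &amp; -> &, &#233; -> é
    # [[Target|Label]] -> Label, [[Target]] -> Target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    # Drop the '' / ''' quote runs wiki markup uses for italics and bold
    text = re.sub(r"'{2,}", "", text)
    # Collapse newlines and repeated spaces
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) > max_chars:
        # Cut at the last full word that fits and mark the truncation
        text = text[:max_chars].rsplit(" ", 1)[0] + "..."
    return text
```

For example, `clean_snippet("'''Ouagadougou''' is the capital of [[Burkina Faso]] &amp; its [[List of cities|largest city]].")` yields `Ouagadougou is the capital of Burkina Faso & its largest city.`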
Apart from related Wikipedia articles, it would also be highly interesting for people to get related websites (there are various possible approaches to that). Is this something you are currently working on?
Re popurls.com: the problem is that its content consists of 50 headlines from completely different stories, but our similarity engine tries to find an "overall/common idea" of the text.
As meta-refresh is widely used out there, I'd strongly recommend implementing support for it.
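A minimal sketch of meta-refresh support, assuming the crawler already has the raw HTML and the URL it was fetched from: it finds a `<meta http-equiv="refresh">` tag and resolves its `url=` target so the page behind the redirect can be analyzed instead. `meta_refresh_target` is a hypothetical helper, and its parsing is simplified compared with the refresh algorithm in the HTML specification (for example, it stops a URL at the first `;`).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content is not None:
            return
        attrs = {k.lower(): (v or "") for k, v in attrs}
        if attrs.get("http-equiv", "").strip().lower() == "refresh":
            self.content = attrs.get("content", "")


def meta_refresh_target(html, base_url):
    """Return the absolute redirect URL, or None if the page has no meta-refresh redirect."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    if not finder.content:
        return None
    match = re.search(r"url\s*=\s*['\"]?([^'\"\s;]+)", finder.content, re.IGNORECASE)
    if not match:
        return None  # e.g. content="30": a periodic reload, not a redirect
    # Relative targets are resolved against the page's own URL
    return urljoin(base_url, match.group(1))
```

A crawler would then fetch the returned URL and repeat, capping the number of hops (five, say, as an arbitrary example) so that refresh loops terminate.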
What about also adding related websites alongside the Wikipedia articles that are currently returned?