Infolust launched

Thanks to some extra hours our team put in recently, we're ready to release our latest labs prototype to the public in a hipness-compliant alpha state:

Infolust is a new context search engine. It's for all those moments when you sit in front of a website, think "Now that's interesting" and would like to know more. Now this "more" is just a single click away. Infolust compares an existing web page with all Wikipedia pages and fetches up to 10 related ones. It's a first step toward bringing our Similarity Engine to consumers for free.
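To make the "compare a page with all articles and fetch up to 10 related ones" idea concrete, here is a minimal sketch. It is not how the Similarity Engine works: it swaps in a plain bag-of-words cosine similarity, and the names `term_vector`, `cosine` and `related_articles` as well as the tiny in-memory article collection are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text and count word occurrences (a plain bag of words)."""
    return Counter(re.findall(r"[a-zäöüß]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_articles(page_text, articles, limit=10):
    """Return up to `limit` (title, score) pairs, best match first.

    `articles` is a hypothetical mapping of article title to article text.
    Articles that share no terms with the page are dropped.
    """
    query = term_vector(page_text)
    scored = [(title, cosine(query, term_vector(text)))
              for title, text in articles.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Calling `related_articles(page_text, articles)` with the text of a query page returns at most 10 titles, ordered by score. A raw term-count comparison like this is easily dominated by navigation and footer text, which is the kind of noise a real system has to filter out first.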

While we're still working hard on improving things like the IE6 bookmarklet capabilities, especially for pages that contain frames, we need to gather as many usage statistics as possible (of course nothing personal or address-related whatsoever) to optimize Infolust for content sources beyond Wikipedia and to improve the backend efficiency.

So far the results are surprisingly good for a third of the queries, quite OK for another third and not so good for the last third. The main issue isn't so much determining the similarity itself as distinguishing between the actual content of a query page and things like navigation, footers etc., especially because we don't do any "look up Wikipedia page titles in the text" tricks but rather rely on a full semantic analysis and comparison of pages.

That said, we'd be more than happy to hear your suggestions / sightings / comments, and we hope you enjoy what Infolust is doing so far!
16 comments on this so far.
Found a funny bug: http://www.infolust.com/results?url=http%3A%2F%2Fwww.infolust.com -> "Hungarian: We're sorry, but currently Infolust only supports English and German texts." ;-)

I got pretty good results for http://www.burkina.at: the system understood the site's theme, "Burkina Faso", but failed to return the "important" articles on the country (such as the one about the country itself, or the one about its capital).
There is definitely room for improvement on this, thanks for the feedback. How do you like the general idea?
I tried

Asimov's Foundation trilogy at amazon.de
http://www.infolust.com/results?url=http%3A%2F%2Fwww.amazon.de%2FFoundation-Isaac-Asimov%2Fdp%2F3453164172%2F

and the results were great: the publisher, the author, other books by the author, other authors in the same genre etc.

My blog
http://infolost.com/bookmarklet?url=http%3A//blog.bookworm.at/ and the
System One webpage gave pretty good results too.

I also ran into the Hungarian bug and got no results for most of the sites I searched - but anyway the idea is nice - maybe not alpha mode - keep it unreal
@Michael: While I don't see much direct use for the tool, it's an impressive showcase and can probably become useful in some tools, e.g. as a Firefox extension or as a "context" microformat. I imagine a Technorati-like service aggregating texts across the web that share the same topic (without anyone having to choose tags).

It would be interesting to know if the tool can also "score" a text, e.g. by whether it carries a lot of context or not. That way one could distinguish chatter from relevance. I don't know if that is useful anywhere.

Just some quick thoughts.
Detecting meta-refresh would be a highly appreciated feature - for example: http://www.gamper.com
When displaying the first lines of a Wikipedia article, it would be great to run a search & replace beforehand, thus avoiding some strange characters :-)

Apart from related Wikipedia articles, it would also be highly interesting for people to get related websites (there are various possible approaches to get those too). Is this something you are currently working on?
When trying http://popurls.com/ I had expected more - any idea why it's not working that effectively in this case? Forgot to mention that it's a really nice tool. I'm sure many will enjoy it!
Re gamper.com - no, we don't support <meta> refresh currently, only "real" HTTP redirects. But even if we did, there's not much text on gamper.com that could be used for similarity.
Re popurls.com - the problem is that the content consists of 50 headlines from completely different stories, but our similarity tries to find an "overall/common idea" of the text.
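For readers wondering what detecting a <meta> refresh would involve, here is a minimal sketch using only Python's standard library. It is an illustration under assumptions, not Infolust's code: the class `MetaRefreshFinder` and the function `find_meta_refresh` are hypothetical names, and the regular expression only covers the common `content="0; url=..."` form.

```python
import re
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute usually looks like "0; url=http://example.com/".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                          attributes.get("content", ""), re.IGNORECASE)
        if match:
            self.target = match.group(1).strip()


def find_meta_refresh(html_text):
    """Return the refresh target URL, or None if the page has none."""
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    return finder.target
```

A crawler could call `find_meta_refresh` on a fetched page and, if it returns a URL, fetch that URL instead, ideally with a cap on the number of hops so that refresh loops cannot run forever.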
Thank you for the feedback.
As meta-refresh is widely used out there I'd truly recommend this to be implemented.
What about also integrating related websites alongside the currently available Wikipedia articles?
Does the parsing also get "Markenf&uuml;hrung" correctly? When trying http://www.brandflow.at the term "Markenführung" is found in the title, the meta information and the text. Infolust focuses on "Wachstum", which is somewhat strange.
There seems to be a problem with www.hypotirol.com, the server delivers a 302 to www.hypotirol.com/index.shtml. Infolust obviously doesn't get this correctly (I get "Unknown: We're sorry, but currently Infolust only supports English and German texts.").
Hypotirol does two redirects and both seem to be JavaScript-based - but Infolust's results on the "real" page are fine: http://www.infolust.com/results?url=http%3A%2F%2Fwww.hypotirol.com%2Fde%2Findex.shtml
What about &uuml; or &ouml; and the Brandflow website?
Yes, we convert all HTML entities to their Unicode equivalents, and when I look at the debug output I see that the content of brandflow gets parsed perfectly.
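As an illustration of what converting HTML entities to their Unicode equivalents means, here is a small sketch based on Python's standard `html.unescape`. The wrapper name `decode_entities` is an assumption for this example and says nothing about how Infolust implements the step.

```python
import html


def decode_entities(text):
    """Replace named and numeric HTML entities with their Unicode characters."""
    return html.unescape(text)


# "Markenf&uuml;hrung" and "Markenführung" become the same string after
# decoding, so both spellings count as the same term.
decoded = decode_entities("Markenf&uuml;hrung")
```

Once decoded, the entity form from the page source and the plain form typed by a user can be compared directly.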
Good to hear - any idea why "Markenführung" isn't named in the results list then?
There are still problems with http://www.xyzmo.com.