The MetaSieve Blog

July 29, 2008

Cuil the Media Marvel

Filed under: Uncategorized — Tags: , , , — metasieve @ 5:19 pm

If you followed the news over the last few days, you could get the impression that the end of Google is imminent.
Cuil Exits Stealth Mode With A Massive Search Engine
Google Beats Cuil Hands Down In Size And Relevance, But That Isn’t The Whole Story
And….Cuil Goes Offline
Do Not Mistype Cuil

The traditional media are already speculating about the death of Google:
The Google Killer is here..it’s gone..it’s back again
Will Cuil Kill Google?
Cuil Hopes to be a Google-Killer

This made me very curious, so I started reading a lot of articles about Cuil. The search engine business is not a world of miracles; the knowledge is just not spread very widely. So if someone claims to do something completely new, like “indexing at 10% cost of google”, it calls for attention.

One interesting fact is that most media organizations seem to find it very difficult to get their facts straight. I read about as many explanations of the difference between Cuil and Google as I read reports. It seems most media outlets don’t know what the PageRank algorithm is, but have no problem explaining it wrongly. ZDNet Germany writes, for example: “The most important difference between Cuil and Google is the method with which the search engine evaluates the relevance of search results. While Google captures all clicks on a search result and calculates the relevance of a page in terms of the PageRank, the startup uses a contextual evaluation of the results.”

This is just plain wrong. PageRank has nothing to do with clicks on search results. PageRank is an algorithm that uses the links between pages to evaluate the importance of the pages they point to. It is based on the concept of a random surfer who travels the web by randomly clicking on the links of a page. The more often the random surfer arrives on a page, the higher that page’s rank. The words in and around a link (the anchor text) are a separate signal that tells the search engine which keywords the linked page is relevant for. Oh, and by the way: the name doesn’t derive from “page” as in web page, but from “Page” as in Larry Page.
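The random surfer idea can be sketched in a few lines of Python. This is a minimal textbook-style power iteration, not Google's production system. The toy link graph, the damping factor of 0.85, the iteration count and the function name are all assumptions made for illustration. Note that the only input is the link structure; no click data is involved.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the surfer equally likely to be on any page.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank sitting on pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Otherwise the surfer follows one of the links on the current page.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", nobody links to "d".
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up with the highest rank, and the page nobody links to ends up with the lowest.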

Golem boldly asserts the difference is that Cuil “has collected 121 billion web pages and sorts them not only by link analysis and traffic, but also captures the content and tries to put it in context”. The article contradicts itself, contradicts other media and contradicts reality.

First, the article gives the impression that this is a large number compared to Google’s index size. Google stopped telling people how large its index is some years ago, so nobody can check that claim directly. But if you check how many results you get for common searches like “dog” or “house”, Cuil returns only half as many results as Google does. So either they are not very effective at finding anything in their pile of web pages, or 121 billion is not as impressive a number as it sounds.

Next, the article claims they use link analysis. That sounded strange to me as they claim to NOT use PageRank, the most common scoring algorithm based on link analysis. So what type of link analysis do they use?

I had to do more research. Luckily, I found a great article by one of the Cuil protagonists that reveals a lot about how these people think about search engine technology. They do not use link analysis or scoring because “Page rank is lengthy analysis of a global nature and will cause you to buy more machines and get bogged down on this one complicated step”.

Obviously they are not smart enough to do a simple PageRank implementation (every student at our university builds one as part of the reinforcement learning courses), and they just ignore the well-known problem of spamdexing.

Obviously, the Cuil people were not yet in the industry when the technology Cuil uses was state of the art. Otherwise they would know that a whole generation of search engines died because they did not use PageRank. WebCrawler went bust, and AltaVista buys its search results from Yahoo.

So, they don’t use link analysis. Do they use traffic analysis? Cuil claims they don’t record any personal data about their users, so obviously they don’t analyze their own traffic. And they don’t provide an analytics service that would let them capture the traffic on other sites. So what kind of traffic do they analyze? I conclude that Golem was just adding some random attributes in the hope that nobody would notice they are wrong.

The last claim is that they only analyze the content. That is something I can believe. It is consistent with the Patterson paper, which dismisses almost every achievement of search engine technology of the last ten years as too complicated and essentially unnecessary. At no point does the paper address the real problem of people who put a lot of popular keywords on their page just to lure people to porn. It seems all the thinking behind Cuil is based on the premise that every website will cooperate nicely. When you see how much porn you already get at Cuil with simple searches, this does not promise a bright future.
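To see why ranking on content alone invites keyword stuffing, consider a toy sketch. The scoring function below is a deliberately naive term count and is not a description of what Cuil actually does; the two example pages and all names are made up for illustration.

```python
import re
from collections import Counter


def content_score(query, text):
    """Toy content-only score: how often the query terms occur on the page."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(words[term] for term in query.lower().split())


# Hypothetical pages: one genuine article, one stuffed with popular keywords.
honest_page = "Our guide explains how to train a dog and how to choose dog food."
stuffed_page = "dog dog dog house dog cheap dog dog free dog " * 5 + "click here"

query = "dog"
honest_score = content_score(query, honest_page)
stuffed_score = content_score(query, stuffed_page)
print("honest page: ", honest_score)
print("stuffed page:", stuffed_score)
```

The stuffed page wins easily, because the page author controls every signal the scorer looks at. Link-based signals are harder to fake because they depend on what other sites do.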

TechCrunch highlights that Cuil can crawl 90% cheaper than Google. This may be true, as Cuil has eliminated almost every complex procedure from the crawl. But when you read the Patterson paper, you discover that they just moved those procedures to query time. So if you add up all the costs of the search engine, you don’t get any cost advantage. I would bet that it actually gets more expensive, because there are far fewer ways to optimize at query time.

So, in the end, the only achievement Cuil can claim is selling a product that was discredited 10 years ago and making people believe it is a revolution. It is a bit like selling unpasteurized milk while claiming that it is just too expensive to get rid of the bacteria and that people want it cheap.

If you want to read a good overview of the subject after all this ranting, you should have a look at the new marketing blog.

July 22, 2008

Press Release: Game search engine provides you with more time for gaming

Filed under: Uncategorized — Tags: , , , , , , , — Björn Wilmsmann @ 4:57 pm

Bochum, Germany. If you’re passionate about computer and video gaming, you can now keep yourself informed for free at GameSear.ch, using modern semantic search technology. GameSear.ch is for everyone who is looking for information about his or her favourite game, as well as for companies in the gaming market that want to stay on top of the most recent developments.

The Problem:

Information about games is scattered across the Web. There are hundreds of sites about gaming. Staying informed without missing anything is becoming more and more difficult and means hard work.

Someone searching for information about the latest game frequently has to stop playing his or her favourite game for hours. Many gamers simply don’t like that.

Gamers can now leave this painful work to GameSear.ch.

The Idea:

GameSear.ch is a product by MetaSieve, the German technology leader when it comes to search engines. If you like gaming, you don’t want to spend hours searching just because you need a cheat code, want to know something about the latest game or simply want to find like-minded people. Gamers want to play, not search. Says Florian Dömges, CEO at MetaSieve: “I wouldn’t want to stop playing Bioshock for 2 hours just to search for information. However, I like to stay informed. When I’m through with a game, I’m quickly ready for something new. GameSear.ch helps me to stay informed with minimum effort.”

It couldn’t be easier:

The gamer enters a search term in the search box at GameSear.ch. GameSear.ch then sends dozens of queries to the most important information sources and processes the results.

The gamer sees everything that’s relevant to the search term at first glance. The results include news, reviews, patches, forum entries and products.

The user can switch between these categories using tabs. If you frequently search for the same term you can register and collect a list of bookmarks for easy access to search results.
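For technically minded readers, the fan-out described above can be sketched roughly as follows. This is only an illustration of the general meta-search pattern, not the actual GameSear.ch implementation: the three source functions are hypothetical stand-ins that return canned results instead of calling real sites.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical sources; each returns (category, title) pairs for a search term.
def search_news(term):
    return [("news", f"{term} sequel announced")]


def search_reviews(term):
    return [("reviews", f"{term} review")]


def search_forums(term):
    return [("forum entries", f"Stuck in {term}, need help")]


SOURCES = [search_news, search_reviews, search_forums]


def metasearch(term, sources=SOURCES):
    """Query all sources in parallel and group the hits by category."""
    grouped = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map keeps the order of `sources`, so the grouping is deterministic.
        for hits in pool.map(lambda source: source(term), sources):
            for category, title in hits:
                grouped.setdefault(category, []).append(title)
    return grouped


results = metasearch("Bioshock")
for category, titles in results.items():
    print(category, titles)
```

Each key of the grouped result corresponds to one tab in the user interface.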

GameSear.ch is currently available in English and German. Depending on the language, different information, custom-tailored for the respective market, is displayed.

GameSear.ch is a product by MetaSieve.

July 21, 2008

Better news on GameSear.ch

Filed under: Uncategorized — Tags: , , , , — metasieve @ 6:37 pm

Today we implemented another new feature on GameSear.ch.

The news search tab used to deliver news that was always a bit outdated.

We streamlined the processing pipeline and added a new scoring algorithm for news, and now the news is as current as it should be. We will add more news sources in the coming days, so stay tuned.

July 18, 2008

GameSear.ch: All game information at your fingertips

Filed under: Uncategorized — Tags: , , , , , , , , — Björn Wilmsmann @ 11:47 am

We did it!

We built the most amazing game search engine and released it to the web today.

GameSear.ch aggregates information sources like nothing else on the web. It is more than a meta-search; it is a meta-meta-search. It is built on our proprietary ContentMat.ch technology, which enables us to create an awesome user experience.

We integrate feed information, news, cheats and reviews: just everything that is great for gamers.

Now that we have a basic version on the site we will unleash a thunderstorm of new features that we will present on this blog.

Google gets less innovative from day to day

Filed under: Uncategorized — Tags: , , — metasieve @ 11:42 am

It seems Google is really losing it. I always expected they would have a rude awakening, but now the horsemen of the apocalypse seem to be on their way to Mountain View. I don’t think it began with the day care story, but that was the strongest signal so far.

They stopped being innovative a long time ago, and now it seems their only way to get at least a little growth is to buy it. I congratulate Rambler on their deal, but Google should be careful not to run into the same problem IBM and Microsoft had when they got too big.

But let’s see if the 19,000-nerd amoeba can find a way out of its misery.

Our new Blog

Filed under: Uncategorized — Tags: , , , , , — metasieve @ 11:28 am

We moved our blog to WordPress because we didn’t want to build blog software. We want to build the best search software there is.

As usual, it did not take us long to find the right one. Both Björn and I had already worked with WordPress, so when we decided this morning that we needed better blogging software, it was immediately clear what we would do.

We had experimented with Serendipity, but even the little management you have to do there would have kept us from our mission. Why spend an hour configuring a blogging system when you can just sign up at WordPress?
