We have taken the plunge and decided to create our own search engine. Welcome to "Searchy". Admittedly it is not the catchiest of names, but the domain was available and had a history as a metacrawler since 2001, so we decided it would do for now.
We then went hunting for open source search engine software and discovered Nutch and Solr. Unfortunately, we could only index 500 million pages in six weeks, so we decided that it was not going to be the solution for us.
What we wanted was something faster and more adaptable.
We began writing our own code, starting with the spider to fetch the data and then other software to correlate the data using our own algorithm.
Writing an algorithm seems extremely simple until you work out that every rule needs to relate to about 100 others, and that all of them need to play ball with the off-site rules. Everything then has to be jumbled together into a score that lists the data in a sequence that makes sense and is correct.
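As a rough illustration only (this is not Searchy's actual algorithm), combining several rules into one score can be sketched as a weighted sum. The Page type, the two rule functions and the weights below are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str


# Hypothetical on-page rules. Each returns a value between 0 and 1.
def title_match(page: Page, query: str) -> float:
    """1.0 if the query appears in the title, otherwise 0.0."""
    return 1.0 if query.lower() in page.title.lower() else 0.0


def body_match(page: Page, query: str) -> float:
    """Share of body words equal to the query, scaled and capped at 1.0."""
    words = page.body.lower().split()
    if not words:
        return 0.0
    hits = words.count(query.lower())
    return min(hits / len(words) * 10, 1.0)


# Assumed weights: in a real engine these would interact with many other rules.
RULES = [(title_match, 2.0), (body_match, 1.0)]


def score(page: Page, query: str) -> float:
    """Combine every rule into a single number via a weighted sum."""
    return sum(weight * rule(page, query) for rule, weight in RULES)


def rank(pages: list[Page], query: str) -> list[Page]:
    """Order pages from highest score to lowest."""
    return sorted(pages, key=lambda p: score(p, query), reverse=True)
```

Calling rank(pages, "tea") on a small list of Page objects returns them ordered by score. The hard part described above is that real rules are not independent, so changing one weight tends to shift how all the others behave.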
Like the old search engines, Searchy at first fell foul of rewarding the over-optimised sites and keyword stuffers on the internet. But then we had a brainwave: we scrapped most of the rules we had adopted from Google and began creating our own set of rules based on logic and language.
Interestingly, our own rules produced a very different set of results, with the arguably better (and a little cranky) websites coming to the fore.
I think we have a long way to go, especially as we underestimated the size of the internet. However, we now have a copy of the UK internet and have it indexed and searchable. It is interesting to note how many of the UK's websites do not see the light of day under Google, and how many of the old SEO algorithmic favourites (bolds, H1s etc.) are obviously still favoured by Google, regardless of its claims to be state of the art and all-singing, all-dancing!
On backlinks, it was interesting that for some searches Searchy came close to replicating Google's search results even before we began adding the off-page rules to the algorithm.
We will keep you posted here on our developments! If search is your thing and you want to get involved, then give us a call on 020 8405 6418. Although the search engine is not profitable yet, we are looking for angel investors to get involved now that stage one is complete.