To best serve our clients, we need a better understanding of Google's algorithm. In the past we have tested individual ideas and concepts regarding optimisation to give an idea of what works and what does not; however, this approach is unscientific.
Whilst we used to publish our findings and try to educate the other search agencies that use our white-label services, we soon realised that the UK search arena is not ready for our version of SEO.
The first thing we need to do is to copy the Internet. Whilst we have access to over 50 servers, and our storage capacity runs into the hundreds of terabytes, we are not Google. All we can do is make a copy of part of the Internet, for example by selecting UK websites, defined by SLD (.co.uk etc.) and by geographic IP.
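The SLD-based selection described above could be sketched as a simple hostname filter. The suffix list below and the function name are illustrative assumptions, not our actual crawler code, and the geographic-IP check is omitted:

```python
# Hypothetical sketch: decide whether a crawled URL belongs to the
# "UK Internet" sample by its second-level domain suffix.
from urllib.parse import urlparse

# Assumed suffix list for illustration only.
UK_SUFFIXES = (".co.uk", ".org.uk", ".ac.uk", ".gov.uk", ".me.uk", ".uk")

def is_uk_host(url: str) -> bool:
    """Return True if the URL's hostname ends in a UK domain suffix."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(UK_SUFFIXES)

print(is_uk_host("https://www.example.co.uk/page"))  # True
print(is_uk_host("https://www.example.com/page"))    # False
```

In practice a crawler would combine a check like this with a GeoIP lookup, since many UK businesses sit on .com domains.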
We have 1,000,000 almost-complete websites in our database (incomplete in that images and scripts are removed, pages compressed, etc.) as our test bed. We try to avoid websites like bbc.co.uk, with its 60 million pages, and Wikipedia, instead indexing the more manageable sites.
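The stripping-and-compressing step described above might look something like the following. This is an assumed sketch, not our production code; a real stripper would use a proper HTML parser rather than regular expressions:

```python
# Illustrative sketch: remove <script> blocks and <img> tags from a
# page, then gzip the result before storage, mirroring the
# "images, scripts removed, compressed" approach described above.
import gzip
import re

def strip_page(html: str) -> str:
    """Drop script blocks and image tags to shrink the stored page."""
    html = re.sub(r"<script\b.*?</script>", "", html, flags=re.S | re.I)
    html = re.sub(r"<img\b[^>]*>", "", html, flags=re.I)
    return html

def store_page(html: str) -> bytes:
    """Return the stripped page as gzip-compressed bytes."""
    return gzip.compress(strip_page(html).encode("utf-8"))

raw = "<p>Hello</p><script>track();</script><img src='a.png'>"
stored = store_page(raw)
print(gzip.decompress(stored).decode("utf-8"))  # <p>Hello</p>
```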
As we are experienced in search, especially in optimising for Google and in unpaid search in general, we already know the vast majority of the 200 to 300 variables included in Google's algorithm. From previous research we have a good idea of the weights Google puts on particular variables, which variables are currently deemed the most important, and so on.
We decided to score variables on a scale from -100 to 100. The negative range is important: it lets us set up a system where, for example, if H1 is used excessively or a website is guilty of keyword stuffing, over-optimising a website carries clear scoring detractions.
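The -100 to 100 idea can be sketched as below. The variable choices, thresholds, and penalty slopes here are invented for illustration; the real algorithm scores 200 to 300 variables with weights derived from our research:

```python
# Minimal sketch of the -100..100 scoring scale, where over-optimisation
# (too many H1s, keyword stuffing) pushes a variable's score negative.

def score_h1_usage(h1_count: int) -> int:
    """One H1 is rewarded; excessive H1s attract a negative score."""
    if h1_count == 1:
        return 100                              # ideal usage
    if h1_count == 0:
        return 0                                # nothing to reward or punish
    return max(-100, -25 * (h1_count - 1))      # over-optimisation detracts

def score_keyword_density(density: float) -> int:
    """Reasonable keyword density is rewarded; stuffing scores negatively."""
    if density <= 0.03:                         # assumed 3% threshold
        return int(density / 0.03 * 100)
    return max(-100, -int((density - 0.03) * 2000))

print(score_h1_usage(1))            # 100
print(score_h1_usage(6))            # -100 (excessive H1 use)
print(score_keyword_density(0.10))  # -100 (keyword stuffing)
```

The point of the sketch is the shape of the scale, not the numbers: a variable stops contributing and starts detracting once it crosses an over-optimisation threshold.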
After copying the UK Internet and writing the first version of the algorithm, we have begun to create our own search engine which, far from being intended to replace Google as an alternative in UK search, is designed to emulate Google's current results.
Of course, problems begin to arise around data storage. When we index some e-commerce sites, the amount of data can become huge even with all of our page stripping. Where are we going to draw the line? If our test data is too small, then our algorithm will be inaccurate.
If you're reading this page with interest and want to get involved, then please call Benedict on 020 8405 6418, preferably later in the working day, when we have time to talk about these things. It is one of the disadvantages of UK Internet search research that we find ourselves almost in isolation, and often have to resort to our American cousins for information and assistance.
The project is at the stage where we have written the code necessary to continue copying the UK Internet. The bots have already copied nearly 100 TB of data. What we are working on now is the algorithm.
One of the advantages of having nine years' experience of SEO is that Benedict Sykes can list off variables that include many of the best-practice SEO techniques and, more importantly, many that are not currently utilised by your average digital agency.
However, when writing the algorithm one can imagine the difficulties that Brin and Page faced: in creating any list of rules, how best to make the algorithm as accurate as possible whilst presenting the user with exactly the information they are seeking?
Fortunately, with endless dictionaries, lists of synonyms, access to endless search-engine backlink checkers, and of course our own data on the 3,000-odd websites we have created, we are in with a chance.
It is proving to be an exercise in endless tweaking. We have managed to emulate Google search results with 50% accuracy. Whilst we can get the first two or three pages of a Google search very similar, we fall down when search queries use more generic expressions.
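One simple way the 50% accuracy figure could be measured is the overlap between our top-N results and Google's top-N for the same query. This set-overlap metric is an assumption for illustration; the actual comparison may weight rank positions as well:

```python
# Hypothetical sketch: score how closely our results match Google's
# for a query, as the fraction of Google's top-n URLs we also return.

def overlap_accuracy(ours: list[str], google: list[str], n: int = 30) -> float:
    """Fraction of Google's top-n results that also appear in our top-n."""
    ours_top, google_top = set(ours[:n]), set(google[:n])
    if not google_top:
        return 0.0
    return len(ours_top & google_top) / len(google_top)

google = ["a.co.uk", "b.co.uk", "c.co.uk", "d.co.uk"]
ours   = ["a.co.uk", "c.co.uk", "x.co.uk", "y.co.uk"]
print(overlap_accuracy(ours, google))  # 0.5
```

A rank-sensitive variant (e.g. rewarding matches at the same position more heavily) would better reflect how similar the first two or three pages look to a user.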
What slows the process, of course, is when websites appear in a Google search which we have neither identified nor copied.
What we hope to end up with, of course, is the complete algorithm. We realise that at the end of every research period Google will no doubt have made changes.
However, that is technology, and we believe that as more and more of our algorithm becomes accurate, we will be able to take a website currently listing on page 10 of Google for a keyword or key term and know, not through hunches or past research, not through ideas and concepts, but EXACTLY what that site is lacking in terms of advanced optimisation and off-page metrics in order to rank number one in Google for that search term.
Fortunately, we can recreate THE MAJORITY of external metrics: backlinks, social media, etc. In order to negate the external factors that Google can take into consideration (as it has a copy of the entire Internet), we have built 3,000 websites to work as a substitute.
Why are we doing this? We already know how to translate digital data into Google-friendly data. We already know the balance between content optimisation, site structure, internal and external use of anchor text, and the creation of effective, valuable backlinks. I think what we are really focusing on is efficiency.
Are we a successful SEO company because we are better at identifying and implementing higher-scoring algorithm variables than other companies, or is it a case that the work we currently do contains more positives than negatives? Perhaps the answer lies in the fact that you FOUND this site and are reading this page.
"I think we are striving for the most efficient use of resources: where can we improve on what we are doing well and are being rewarded for by the Googlebot, and where can we weed out work which is either algorithmically inefficient or just plain wrong?"