Wednesday, April 07, 2010
I never tire of listening to experts like Mike Grehan speaking about the new signals search engines are beginning to look at, because it's so important to bust the myths about how search engines work.
To hear many people talk, today's major engines are faced with little more than a slightly beefed-up, slightly larger version of a closed database search. Need the medical records for your patient Johnny Jones from your closed database of 500 medical records? Just type in johnny or jones or johnny jones, and you're good to go. Isn't that search, in a nutshell? It is, if you can guarantee that you're referring to a nutshell like that. But web search is nothing like that.
The World Wide Web now has a trillion pages or page-like entities... that Google knows about. (They don't know what to do with all of them, but they'll admit to the trillion.) Some observers estimate that there will soon be five trillion of these in total, too many to index or handle. Who knows, maybe 10% of that could be useful to a user or worthy of indexing. But until some signal tells the search engine to index them in earnest, they'll just sit there, invisible. That's out of necessity: there's just too much.
The difference isn't only quantitative, it's also qualitative. User queries have all sorts of intents, and search engines aren't just trying to show you "all the pages that match". There are too many pages that match, in one way or another. The task of measuring relevancy, quality, and intent is far more complex than it looks at first.
And on top of that, people are trying to game the algorithm. Millions of people. This is known as "adversarial" information retrieval in an "open" system where anyone can post information or spam. The complexity of rank ordering results on a particular keyword query therefore rises exponentially.
In light of all this, search engines have done a pretty good job of looking at off-page signals to tell what's useful, relevant, and interesting. The major push began with the linking structure of the web, and now the effort has vastly expanded to many other emerging signals, especially user behavior (consumption of the content, clickstreams, user trails) and new types of sharing and linking behavior in social media.
This is a must, because any mechanical counting and measuring exercise is bound to disappoint users if it isn't incredibly sophisticated and subtle. Think links. Thousands of SEO experts are still teaching you tricks for how to get "authoritative" inbound links to your sites and pages. But do users want to see truly remarkable content, or content that scored highly in part because someone followed an SEO to-do list? And how, then, do we measure what is truly remarkable?
Now that Twitter is a key source of evidence for the remarkability of content, let's consider it as an interesting behavioral lab. Look at two kinds of signal. The first is where you ask a few friends to retweet your article or observation, and they do. A prickly variation of that is where you have a much larger circle of friends, or you orchestrate semi-fake friends to do your bidding, with significant automation involved.
But another type of remarkable happens when your contribution truly makes non-confidantes want to retweet and otherwise mention you. That is when your article or insight achieves "breakout" beyond your circle of confidantes, and when further signals of user satisfaction confirm it later on, as people stumble on it.
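To make the two kinds of signal concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the function name `breakout_share`, the account names, and the idea that an author's circle is known in advance are all made up for this example, and nothing here describes how any search engine actually scores retweets.

```python
def breakout_share(author_circle, retweeters):
    """Return the fraction of distinct retweeters outside the author's circle.

    Hypothetical metric for illustration only: 0.0 means every retweet
    came from a confidante, 1.0 means every retweet came from a stranger.
    """
    distinct = set(retweeters)  # count each account once, however often it retweets
    if not distinct:
        return 0.0
    outside = distinct - set(author_circle)
    return len(outside) / len(distinct)


# Made-up accounts: the author's circle of confidantes.
circle = {"ann", "bob", "cy"}

# First kind of signal: friends retweet because they were asked to.
favor_retweets = ["ann", "bob", "cy", "ann"]

# Second kind: the post spreads to people the author doesn't know.
breakout_retweets = ["ann", "dee", "eli", "fay", "gus"]

print(breakout_share(circle, favor_retweets))     # 0.0
print(breakout_share(circle, breakout_retweets))  # 0.8
```

The catch is that a real engine has no tidy list of who belongs to whose circle, and semi-fake friends are built precisely to look like strangers.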
Telling the difference is an incredible challenge for search engines. Garden-variety tactical optimization will work to a degree, mainly because some signals of interest will tend to dwarf the many instances of "zero effort or interest". But we should all hope that search engines get better and better at sniffing out the difference between truly remarkable (or remarkably relevant to you the end user) and these counterfeit signals that can be manufactured by tacticians simply going through the motions.
Labels: search engine relevancy
Posted by
Andrew Goodman