Wednesday, April 07, 2010
I never tire of listening to experts like Mike Grehan speaking about the new signals search engines are beginning to look at, because it's so important to bust the myths about how search engines work.
To hear many people talk, today's major engines are faced with little more than a slightly beefed-up, slightly larger version of a closed database search. Need the medical records for your patient Johnny Jones from your closed database of 500 medical records? Just type in johnny, or jones, or johnny jones, and you're good to go. Isn't that search, in a nutshell? It is, if you can guarantee that you're working inside a nutshell like that. But web search is nothing like that.
The World Wide Web now has a trillion pages or page-like entities... that Google knows about. (They don't know what to do with all of them, but they'll admit to the trillion.) Some observers estimate that there will soon be five trillion of these in total, too many to index or handle. Who knows, maybe 10% of that could be useful to a user or worthy of indexing. But until some signal tells the search engine to index them in earnest, they'll just sit there, invisible. That's out of necessity: there's just too much.
The difference isn't only quantitative, it's also qualitative. User queries have all sorts of intents, and search engines aren't just trying to show you "all the pages that match". There are too many pages that match, in one way or another. The task of measuring relevancy, quality, and intent is far more complex than it looks at first.
And on top of that, people are trying to game the algorithm. Millions of people. This is what's known as "adversarial" information retrieval, in an "open" system where anyone can post information or spam. The complexity of rank-ordering results for a particular keyword query therefore rises dramatically.
In light of all this, search engines have done a pretty good job of looking at off-page signals to tell what's useful, relevant, and interesting. The major push began with the linking structure of the web, and the effort has now expanded to many other emerging signals, especially user behavior (content consumption, clickstreams, user trails) and new types of sharing and linking behavior in social media.
This is a must, because any mechanical counting and measuring exercise is bound to disappoint users if it isn't incredibly sophisticated and subtle. Think links. Thousands of SEO experts are still teaching you tricks for how to get "authoritative" inbound links to your sites and pages. But do users want to see truly remarkable content, or content that scored highly partly because someone followed an SEO to-do list? And how, then, do we measure what is truly remarkable?
Now that Twitter is a key source of evidence for the remarkability of content, let's consider it as an interesting behavioral lab. Look at two kinds of signal. The first is where you ask a few friends to retweet your article or observation, and they do. A prickly variation of that is where you have a much larger circle of friends, or you orchestrate semi-fake friends to do your bidding, with significant automation involved.
But another kind of remarkable happens when your contribution truly makes non-confidantes want to retweet and otherwise mention you: your article or insight achieves "breakout" beyond your circle of confidantes, and further signals of user satisfaction confirm it later on, as people stumble on it.
Telling the difference is an incredible challenge for search engines. Garden-variety tactical optimization will work to a degree, mainly because some signals of interest will tend to dwarf the many instances of "zero effort or interest". But we should all hope that search engines get better and better at sniffing out the difference between the truly remarkable (or the remarkably relevant to you, the end user) and the counterfeit signals that tacticians can manufacture simply by going through the motions.
Labels: search engine relevancy
Posted by Andrew Goodman
Friday, July 04, 2008
Those well versed in the search game can easily chronicle how the act of social linking gave way to the link economy. When Google gave quality links value, the game was all about how to get them. Can't get favorable external mentions? Swap links. Google discounting reciprocal linking? Join up in a large, elaborate interlinking scheme that passes PageRank to members. Oh, but you knew Google would get wise to that too, didn't you? And that they would introduce Other Ranking Factors and spam tests to try to get the real good stuff to bubble back up to the top again?
Now, word's out that Yelp management won't take such schemes lying down when it comes to local business owners banding together to write positive reviews in order to boost each other's reputations and rankings in category listings. The business owners protest; Yelp sticks to its guns. Is it Orwellian? Consumer friendly? Or should anyone get their shorts in a knot about a few glowing reviews of the local pull-taffy-and-bubble-tea emporium? I mean, who doesn't like taffy?
Well, maybe it's a bit of both. Google's practices and philosophy are very similar. To paraphrase: "We reserve the right to torch your rankings if we suspect any shenanigans. Sorry."
It all boils down to the fact that neither Google nor Yelp ratings are literally "correct." Both are open to interpretation and game-playing. However, in a broader sense, businesses can develop strong reputations by being visible on these properties, and they can do so without cheating. For now, the publishers' attempts to stem cheating will be tinged with arbitrariness, and some howls of protest might be legit. Longer term, these sites will allow for deeper probing into claims: peers will be able to find peers and get a better sense of what's real.
Make no mistake about it, though: relevancy rankings, and business ratings and reviews, are serious business. Consumers depend on them. Businesses with strong ratings often deserve them. It would be a huge shame if the Googles and Yelps of the world were forced to give in to scaremongering about their imperfect technology, perhaps even to plaster "For Entertainment Value Only" across their pages, like some cheap carnival psychic.
In case you missed it: National Taffy Day
Labels: local search, relevance, reviews, search engine relevancy
Posted by Andrew Goodman
Monday, May 14, 2007
Picked up that little hack from TheGrokDotCom for showing Google SERPs with no ads.
It can't be news to Google that the ads have to be more relevant to some users than the adjacent organic results, at least some of the time, or Google's main cash cow is kaput.
Luckily for them, they've been thinking about it a long time.
Would I rather see something like:

[Screenshot not preserved]

or...

[Screenshot not preserved]
At the very least, it's not a slam dunk either way.
You get the feeling Google has thought a fair bit about the relative attractiveness of the organic and paid listings on commercially oriented queries.
Users (not me, not Google) have to agree, or Google would be out of business. But Google can do plenty to gently tip the balance towards the ads, to ensure that Jakob Nielsen's "box blindness" scenario (now four years old!) doesn't sink the company. Part of that is how you regulate and display the ads. But surely another side of it has to do with how attractively the organic SERPs are displayed: placements, usefulness of text snippets, and yep, even what counts as "relevancy." In the most generous interpretation, Google has it neatly bifurcated, so commercially oriented searchers get what they need while informational searchers get theirs.
Labels: search engine advertising, search engine relevancy, usability
Posted by Andrew Goodman
Friday, April 13, 2007
Teoma, and Direct Hit before it, were always sentimental favorites in the race for relevancy. Good to hear Ask.com is revamping both technologies for inclusion in a new algorithm code-named Edison.
It's always been the way to build a better search engine company. Build a better search engine.
Labels: ask.com, direct hit, edison, search engine relevancy
Posted by Andrew Goodman