Taking Disambiguation Seriously
Today's search engines do a lot of things well, but one thing they don't do well - as astutely noted by erudite literary critic and National Post columnist Robert Fulford - is understand the precise meaning of the words you type. (Fulford's a crafty one. He was using Dogpile as early as 1997, and like any other sensible person wanting specific answers about search, he visits Search Engine Watch.)
I've been listening to the "disambiguation technology" story since 1999, when fledgling semantic technology companies like Oingo (now Applied Semantics) and Simpli (acquired by NetZero and then sold to pay-per-click keyword advertising service Search123) were launched. The examples are easy to grasp. A user types in "genesis," and it could mean dozens of things including a rock band, a biblical book, or a product. Java can mean coffee or a programming language.
I'm no linguist, but it's clear enough that there is a part of our brain where we file all these different meanings, storing them as, for all intents and purposes, different words. Insofar as search engines are too stupid to help me disambiguate my query about "storage," for example (I am searching for shelves, but a lot of the search results give me info about esoteric RAID disk storage), they do a poor job of putting the right kind of information in front of the user. They might even do a poor job of putting the right ads in front of users, which could ultimately cost them money and frustrate advertisers. It makes little sense, for example, to subject sellers of low-margin shelving products to the higher prices that may be paid by advertisers wanting to advertise fancy RAID disk storage. But there is no way around it as long as the search engine does only "dumb" keyword matching.
As far as our brains are concerned, it's relatively inconsequential that the word "storage" describes a shelf and that the same sequence of letters describes a system for filing terabytes of data. We don't see them as the same thing, although they might be related. What would be needed, in an ideal world, is a unique code (or "word") for every substantially different concept and sub-concept. So the storage relating to data might be called "storbage" or something.
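To make the contrast concrete, here is a minimal sketch of "dumb" keyword matching next to matching on a unique concept code. Everything in it is hypothetical: the advertisers, the bids, and the concept codes such as "storage/shelving" are invented for illustration and do not describe how any real ad system files its ads.

```python
# Hypothetical ads: names, concept codes, and bids are invented for illustration.
ADS = [
    {"advertiser": "ShelfCo", "keyword": "storage",
     "concept": "storage/shelving", "bid": 0.10},
    {"advertiser": "RaidWorks", "keyword": "storage",
     "concept": "storage/data", "bid": 2.50},
]


def match_by_keyword(query, ads):
    """Dumb keyword matching: return every ad whose keyword is a query word."""
    words = query.lower().split()
    return [ad["advertiser"] for ad in ads if ad["keyword"] in words]


def match_by_concept(concept, ads):
    """Concept matching: return only ads filed under the same concept code."""
    return [ad["advertiser"] for ad in ads if ad["concept"] == concept]


# The shelf seller and the RAID seller both match the bare keyword...
print(match_by_keyword("garage storage", ADS))
# ...but only the shelf seller matches the shelving concept.
print(match_by_concept("storage/shelving", ADS))
```

With keyword matching, the shelf seller ends up competing against the much higher RAID bid. With a distinct code per concept, the two never meet. The hard part, which this sketch skips, is deciding which concept a given query belongs to.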
Well, that might be impractical, but scientists *can* lend support to efforts to link researchers more closely with the concepts they're seeking. The old-fashioned way, the way that information science has done it for centuries, is to hard-wire an ontology (or classification system) and make everyone conform to it. Another approach, the more contemporary one, is to work with conceptual "meaning maps" to help classify information.
To be sure, end users can disambiguate their own queries to a certain extent. But should they be required to do all the heavy lifting of weeding out all the wrong meanings of words like java, storage, genesis, etc.? And will such efforts sometimes exclude pages because the phrases typed aren't matched on otherwise high-quality pages about the topic?
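The second worry can be shown with a toy example. The sketch below assumes a deliberately naive engine that returns only pages containing every query word exactly. The page ids and page texts are made up.

```python
# Hypothetical mini-index: page ids and texts are invented for illustration.
PAGES = {
    "shelf-guide": "how to choose shelving units and racks for your garage",
    "raid-howto": "raid disk storage arrays for servers",
    "closet-tips": "closet storage ideas with shelves and bins",
}


def search(query, pages):
    """Return ids of pages that contain every query word (strict AND match)."""
    terms = query.lower().split()
    return sorted(
        page_id
        for page_id, text in pages.items()
        if all(term in text.split() for term in terms)
    )


# The bare query pulls in the RAID page alongside the closet page.
print(search("storage", PAGES))
# Adding "shelves" drops the RAID page, but the shelving guide never
# appears at all, because it says "shelving" rather than "shelves".
print(search("storage shelves", PAGES))
```

The user's extra word does remove the wrong meaning, but a perfectly good page about shelving stays invisible because its wording differs from the words typed.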
Given how diligently I've been trying to puzzle through all these issues, then, perhaps it isn't unfair or one-sided of me to reprint excerpts from a recent commentary by Applied Semantics - applied in this case as a critique of Google's much-ballyhooed Content Targeted Advertising.
"Google enters the contextual targeting advertising arena... Yahoo! upgrades its search technologies... Overture claims to have similar technology... So, what does all this mean?
"It means that there is an increasing demand and need for an innovative, durable contextual targeting solution for online advertisers. Major players like Google, Yahoo! and Overture are raising the stakes by entering the space. Online publishers and advertisers have to realize that a company's popularity, size and name recognition are not always indicative of its solution. In fact, many of these mega-companies' contextual targeting applications are not effectively and efficiently getting the job done.
"Amongst these huge players is a small giant, packing the punch with the best contextual targeting product in the marketplace - AdSense. AdSense extracts the meaning of a web page and dynamically generates ads comprised of P4P (pay-for-placement) search terms and results on the fly. What makes AdSense technology different from the competition is its filtering capabilities, technology approach and partnership structure.
"In a head-to-head comparison between AdSense and Google's content targeting product, we found that AdSense has superior technology and obvious advantages in many categories:
If Google's advertising program lacks semantic technology, then this hole must exist in its search technology as well.
Whether it's brought about by a new era of robust metadata protocols, or technology such as Applied Semantics' CIRCA, it will be a pleasure to be able to type in the phrase "green jacket" and only receive results relating to the Masters golf tournament, or alternatively, only results NOT related to golf but rather to green jackets in general, according to my preference.
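As a rough sketch of what that preference switch might look like, the code below sorts results for "green jacket" into a golf sense and a non-golf sense, then filters by what the user asked for. The hand-written cue-word list is an assumption made purely for illustration. It is not how CIRCA or any metadata protocol works, and the sample results are invented.

```python
# Assumed cue words for the golf sense; a real system would need far more
# than a hand-written list like this one.
GOLF_CUES = {"masters", "augusta", "golf", "tournament"}

# Hypothetical search results as (title, text) pairs.
RESULTS = [
    ("Masters champion",
     "the green jacket is awarded at the masters golf tournament in augusta"),
    ("Spring fashion",
     "a green jacket pairs well with tan trousers"),
]


def is_golf_sense(text):
    """Guess the golf sense if the text shares any word with the cue list."""
    return bool(set(text.lower().split()) & GOLF_CUES)


def filter_results(results, want_golf):
    """Keep only the titles whose guessed sense matches the user's preference."""
    return [title for title, text in results
            if is_golf_sense(text) == want_golf]


print(filter_results(RESULTS, want_golf=True))
print(filter_results(RESULTS, want_golf=False))
```

The filter itself is trivial. What the semantic technology vendors are really selling is a far better version of the sense-guessing step.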
And advertisers will be pleased the day that they can reach customers who are really interested in disk storage, for example, rather than having their ad show up on all kinds of pages relating to storage cabinets, just because some keywords happen to match.
Disclaimer: Applied Semantics' comments about Google are mostly speculative.
Posted by Andrew Goodman

