Traffick - Search Engine Enlightenment


Tuesday, April 15, 2003

Taking Disambiguation Seriously

Today's search engines do a lot of things well, but one thing they don't do well - as astutely noted by erudite literary critic and National Post columnist Robert Fulford - is understand the precise meaning of the words you type. (Fulford's a crafty one. He was using Dogpile as early as 1997, and like any other sensible person wanting specific answers about search, he visits Search Engine Watch.)

I've been listening to the "disambiguation technology" story since 1999, when fledgling semantic technology companies like Oingo (now Applied Semantics) and Simpli (acquired by NetZero and then sold to pay-per-click keyword advertising service Search123) were launched. The examples are easy to grasp. A user types in "genesis," and it could mean dozens of things including a rock band, a biblical book, or a product. Java can mean coffee or a programming language.

I'm no linguist, but it's clear enough that there is a part of our brain where we file all these different meanings, storing them as, for all intents and purposes, different words. Take my query about "storage": I am searching for shelves, but a lot of the search results give me info about esoteric RAID disk storage. Insofar as search engines are too stupid to help me disambiguate a query like that, they do a poor job of putting the right kind of information in front of the user. They might even do a poor job of putting the right ads in front of users, which could ultimately cost them money and frustrate advertisers. It makes little sense, for example, to subject sellers of low-margin shelving products to the higher prices that advertisers of fancy RAID disk storage may be willing to pay. But there is no way around it as long as the search engine does only "dumb" keyword matching.

As far as our brains are concerned, it's relatively inconsequential that the word "storage" describes a shelf and the same sequence of letters describes a system for filing terabytes of data. We don't see them as the same thing, although they might be related. What would be needed, in an ideal world, is a unique code (or "word") for every substantially different concept and sub-concept. So the storage relating to data might be called "storbage" or something.
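To make the "storage" problem concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the three mini-documents, the `storage#furniture` and `storage#data` codes (my stand-ins for "storbage"), and the function names. It is not how any real search engine works, only a picture of the difference between matching letters and matching concepts.

```python
# Hypothetical mini-corpus: document ids and text are made up for illustration.
DOCS = {
    "shelving-guide": "garage storage shelves and cabinets for tools",
    "raid-primer": "raid disk storage arrays for terabytes of data",
    "closet-tips": "closet storage bins and shelves",
}

# The same documents, hand-tagged with an invented code per concept.
SENSE_TAGS = {
    "shelving-guide": {"storage#furniture"},
    "raid-primer": {"storage#data"},
    "closet-tips": {"storage#furniture"},
}


def keyword_match(query, docs):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(word in text.split() for word in words)
    )


def sense_match(sense, tags):
    """Return ids of documents tagged with the given concept code."""
    return sorted(doc_id for doc_id, senses in tags.items() if sense in senses)


if __name__ == "__main__":
    # "Dumb" matching returns shelves and RAID arrays alike.
    print(keyword_match("storage", DOCS))
    # Matching on a concept code keeps the two meanings apart.
    print(sense_match("storage#furniture", SENSE_TAGS))
    print(sense_match("storage#data", SENSE_TAGS))
```

The keyword search for "storage" returns all three documents, while the concept codes split them cleanly. The catch, of course, is that somebody or something has to assign those codes in the first place.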

Well that might be impractical, but scientists *can* lend support to efforts to link researchers more closely with the concepts they're seeking. The old-fashioned way, the way that information science has done it for centuries, is to hard-wire an ontology (or classification system) and make everyone conform to it. Another approach, the more contemporary one, is to work with conceptual "meaning maps" to help classify information.
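As a rough idea of what a "meaning map" could look like, here is a minimal sketch, again in Python. It assumes a tiny hand-built map from each ambiguous term to a few senses, each with a set of related words, and it picks the sense whose related words overlap most with the rest of the query. The map contents, the sense labels, and the `disambiguate` function are all hypothetical; this is a simple word-overlap heuristic and makes no claim to resemble CIRCA or any other commercial system.

```python
# Hypothetical "meaning map": each ambiguous term points to a few senses,
# and each sense lists words that tend to appear alongside it.
MEANING_MAP = {
    "java": {
        "java#coffee": {"coffee", "bean", "roast", "cup", "espresso"},
        "java#programming": {"programming", "code", "compiler", "class", "applet"},
    },
    "genesis": {
        "genesis#band": {"band", "album", "rock", "concert", "tour"},
        "genesis#bible": {"bible", "book", "creation", "testament"},
    },
}


def disambiguate(term, context):
    """Pick the sense whose related words overlap most with the context.

    Returns None when the term is not in the map or no sense overlaps.
    """
    senses = MEANING_MAP.get(term.lower())
    if not senses:
        return None
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, 0
    for sense, related in senses.items():
        overlap = len(related & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(disambiguate("java", "best dark roast java bean"))
    print(disambiguate("java", "java class will not build with the compiler"))
    print(disambiguate("genesis", "genesis tour dates"))
    print(disambiguate("java", "java"))  # no context, so no decision
```

Note what happens with the bare query "java": with no surrounding words, the sketch gives up and returns nothing. That is exactly the situation a one-word search puts the engine in, which is why page content or user behaviour has to supply the missing context.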

To be sure, end users can disambiguate their own queries to a certain extent. But should they be required to do all the heavy lifting of weeding out all the wrong meanings of words like java, storage, genesis, etc.? And will such efforts sometimes exclude pages because the phrases typed aren't matched on otherwise high-quality pages about the topic?

Given how diligently I've been trying to puzzle through all these issues, then, perhaps it isn't unfair or one-sided of me to reprint excerpts from a recent commentary by Applied Semantics - applied in this case as a critique of Google's much-ballyhooed Content Targeted Advertising.

"Google enters the contextual targeting advertising arena... Yahoo! upgrades its search technologies... Overture claims to have similar technology... So, what does all this mean?

"It means that there is an increasing demand and need for an innovative, durable contextual targeting solution for online advertisers. Major players like Google, Yahoo! and Overture are raising the stakes by entering the space. Online publishers and advertisers have to realize that a company's popularity, size and name recognition is not always indicative of its solution. In fact, many of these mega-company's contextual targeting applications are not effectively and efficiently getting the job done.

"Amongst these huge players is a small giant, packing the punch with the best contextual targeting product in the marketplace - AdSense. AdSense extracts the meaning of a web page and dynamically generates ads comprised of P4P (pay-for-placement) search terms and results on the fly. What makes AdSense technology different from the competition is its filtering capabilities, technology approach and partnership structure.

"In a head-to-head comparison between AdSense and Google's content targeting product, we found that AdSense has superior technology and obvious advantages in many categories:

  • "AdSense technology processes actual content of web page, whereas Google technology is only based on URL/web log statistics
  • "AdSense uses its CIRCA(tm) ontology to understand and extract key themes of a page, whereas Google relies solely upon user trends and patterns which are often inconsistent
  • "AdSense aims to balance relevancy with CPC value to maximize effective CPM; Google has less effective CPM maximization
  • "Ability to discern ambiguous terms - AdSense disambiguates, unlike Google who has no disambiguation capability
  • "AdSense returns granular, precise keyword results; Google offers inconsistent results based on user search data; performs best on broad/general topics
  • "AdSense provides advanced filtering technology; Google has no objectionable or competitive filtering mechanisms
  • "AdSense offers customized ad designs tailored to customer preferences; Google has no ad design customization capabilities
  • "AdSense offers a revenue share model, ensuring that we all share in the upside and are committed to developing the optimal implementation to ensure the highest user satisfaction."
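The bullet about balancing relevancy with CPC value is worth a quick worked example. Effective CPM is the revenue earned per thousand ad impressions, which works out to click-through rate times cost per click times 1,000. The sketch below uses two made-up ads on a page about shelving; the click-through rates and prices are assumptions chosen only to show the arithmetic, not figures from Applied Semantics or Google.

```python
def effective_cpm(ctr, cpc):
    """Revenue per 1,000 impressions: click-through rate x cost per click x 1000."""
    return ctr * cpc * 1000


# Hypothetical candidate ads for a page about garage shelving.
# An off-topic ad may pay more per click but gets clicked far less often.
CANDIDATE_ADS = [
    {"ad": "RAID disk arrays", "cpc": 2.00, "ctr": 0.001},
    {"ad": "garage shelving", "cpc": 0.25, "ctr": 0.02},
]


def best_ad(ads):
    """Choose the ad with the highest effective CPM, not the highest CPC."""
    return max(ads, key=lambda ad: effective_cpm(ad["ctr"], ad["cpc"]))


if __name__ == "__main__":
    for ad in CANDIDATE_ADS:
        print(ad["ad"], round(effective_cpm(ad["ctr"], ad["cpc"]), 2))
    print("winner:", best_ad(CANDIDATE_ADS)["ad"])
```

With these invented numbers the cheap but relevant shelving ad earns roughly $5 per thousand impressions against roughly $2 for the pricey RAID ad, so relevance wins even though its cost per click is an eighth of the other's.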

If Google's advertising program lacks semantic technology, then this hole must exist in its search technology as well.

Whether it's brought about by a new era of robust metadata protocols, or technology such as Applied Semantics' CIRCA, it will be a pleasure to be able to type in the phrase "green jacket" and only receive results relating to the Masters golf tournament, or alternatively, only results NOT related to golf but rather to green jackets in general, according to my preference.

And advertisers will be pleased the day that they can reach customers who are really interested in disk storage, for example, rather than having their ad show up on all kinds of pages relating to storage cabinets, just because some keywords happen to match.

Disclaimer: Applied Semantics' comments about Google are mostly speculative.

Posted by Andrew Goodman
© 1999 - 2007 Traffick.com. All Rights Reserved
