Traffick - The Business of Search Engines & Web Portals

Tuesday, April 15, 2003

Taking Disambiguation Seriously

Today's search engines do a lot of things well, but one thing they don't do well - as astutely noted by erudite literary critic and National Post columnist Robert Fulford - is understand the precise meaning of the words you type. (Fulford's a crafty one. He was using Dogpile as early as 1997, and like any other sensible person wanting specific answers about search, he visits Search Engine Watch.)

I've been listening to the "disambiguation technology" story since 1999, when fledgling semantic technology companies like Oingo (now Applied Semantics) and Simpli (acquired by NetZero and then sold to pay-per-click keyword advertising service Search123) were launched. The examples are easy to grasp. A user types in "genesis," and it could mean dozens of things, including a rock band, a biblical book, or a product. "Java" can mean coffee or a programming language.

I'm no linguist, but it's clear enough that there is a part of our brain where we file all these different meanings, storing them as, for all intents and purposes, different words. Insofar as search engines are too stupid to help me disambiguate my query about "storage," for example (I am searching for shelves, but a lot of the search results give me info about esoteric RAID disk storage), they do a poor job of putting the right kind of information in front of the user. They might even do a poor job of putting the right ads in front of users, which could ultimately cost them money and frustrate advertisers. It makes little sense, for example, to subject sellers of low-margin shelving products to the higher prices that may be paid by advertisers wanting to advertise fancy RAID disk storage. But there is no way around it as long as the search engine does only "dumb" keyword matching.

As far as our brains are concerned, it's relatively inconsequential that the word "storage" describes a shelf and that the same sequence of letters describes a system for filing terabytes of data. We don't see them as the same thing, although they might be related. What would be needed, in an ideal world, is a unique code (or "word") for every substantially different concept and sub-concept. So the storage relating to data might be called "storbage" or something.
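To make the "storage" problem concrete, here is a minimal Python sketch contrasting "dumb" keyword matching with matching on a concept code. Everything in it is invented for illustration: the two sample pages, the concept labels (storage/furniture, storage/data), and the function names. It is not a description of how any real search engine works.

```python
# Toy corpus: each page carries its text plus a hand-assigned concept code.
# The pages and the concept codes are hypothetical, made up for this sketch.
PAGES = [
    {
        "title": "Garage shelving units",
        "text": "steel storage shelves for the garage",
        "concept": "storage/furniture",
    },
    {
        "title": "RAID arrays explained",
        "text": "raid disk storage for terabytes of data",
        "concept": "storage/data",
    },
]


def keyword_match(query, pages):
    """Return titles of pages whose text contains every word of the query."""
    words = query.lower().split()
    return [
        page["title"]
        for page in pages
        if all(word in page["text"].split() for word in words)
    ]


def concept_match(concept, pages):
    """Return titles of pages tagged with exactly the given concept code."""
    return [page["title"] for page in pages if page["concept"] == concept]


# Keyword matching cannot tell shelves from disk arrays.
print(keyword_match("storage", PAGES))
# Matching on the concept code keeps only the shelving page.
print(concept_match("storage/furniture", PAGES))
```

The keyword search returns both pages because the same sequence of letters appears in each; the concept search returns only the shelving page. The hard part, of course, is assigning those codes in the first place, which the sketch conveniently does by hand.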

Well, that might be impractical, but scientists *can* lend support to efforts to link researchers more closely with the concepts they're seeking. The old-fashioned way, the way that information science has done it for centuries, is to hard-wire an ontology (or classification system) and make everyone conform to it. Another approach, the more contemporary one, is to work with conceptual "meaning maps" to help classify information.
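As a rough illustration of the "meaning map" idea, the Python sketch below links each sense of an ambiguous word to a handful of words that tend to appear near it, then picks the sense that overlaps most with the surrounding text. The map entries, sense labels, and function name are all assumptions made up for this example. It is a simple word-overlap heuristic, not Applied Semantics' CIRCA or any vendor's actual method.

```python
# A tiny hand-wired "meaning map": each sense of an ambiguous word is
# linked to words that tend to appear near it. All entries are hypothetical.
MEANING_MAP = {
    "java": {
        "java/coffee": {"coffee", "bean", "roast", "cup", "espresso"},
        "java/programming": {"code", "compiler", "class", "programming", "software"},
    },
    "genesis": {
        "genesis/band": {"band", "album", "rock", "tour", "song"},
        "genesis/bible": {"bible", "book", "chapter", "creation", "verse"},
    },
}


def disambiguate(word, context):
    """Pick the sense whose related words overlap most with the context.

    Returns None if the word is not in the map or no sense overlaps at all.
    On a tie, the sense listed first in the map wins.
    """
    senses = MEANING_MAP.get(word.lower())
    if not senses:
        return None
    context_words = set(context.lower().split())
    best_sense, best_score = None, 0
    for sense, related in senses.items():
        score = len(related & context_words)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


print(disambiguate("java", "how to roast a coffee bean at home"))
print(disambiguate("java", "java compiler error in my class"))
print(disambiguate("genesis", "what is the weather today"))
```

The first call resolves to the coffee sense, the second to the programming sense, and the third returns None because nothing in the context hints at either meaning of "genesis." A real system would need a far larger map and a smarter scoring scheme, but the division of labor is the same: the map carries the meanings so the searcher doesn't have to.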

To be sure, end users can disambiguate their own queries to a certain extent. But should they be required to do all the heavy lifting of weeding out all the wrong meanings of words like java, storage, genesis, etc.? And will such efforts sometimes exclude pages because the phrases typed aren't matched on otherwise high-quality pages about the topic?
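Both questions can be seen in a small Python sketch of do-it-yourself disambiguation, where the searcher adds required words and excludes unwanted ones. The sample pages and the `search` function are hypothetical, invented for this example; real engines have their own query syntax and matching rules.

```python
# Three made-up pages. The third is about home storage but never uses
# the word "storage" in its text.
PAGES = [
    {"title": "Choosing garage shelving",
     "text": "sturdy shelves and cabinets for garage storage"},
    {"title": "RAID basics",
     "text": "raid disk storage keeps data safe across drives"},
    {"title": "Closet organizers that work",
     "text": "bins racks and hooks to tidy every closet"},
]


def search(pages, required, excluded=()):
    """Keep pages containing all required words and none of the excluded ones."""
    hits = []
    for page in pages:
        words = set(page["text"].lower().split())
        has_required = all(word in words for word in required)
        has_excluded = any(word in words for word in excluded)
        if has_required and not has_excluded:
            hits.append(page["title"])
    return hits


# A bare query mixes both senses of "storage".
print(search(PAGES, ["storage"]))
# Excluding data-related words weeds out the RAID page.
print(search(PAGES, ["storage"], excluded=["raid", "disk"]))
# Adding a required word narrows the results the same way.
print(search(PAGES, ["storage", "shelves"]))
```

The user's extra effort does remove the RAID page, but in none of the three queries does the closet organizer page show up, even though it is arguably just what a shelf shopper wants. It simply never uses the words that were typed.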

Given how diligently I've been trying to puzzle through all these issues, then, perhaps it isn't unfair or one-sided of me to reprint excerpts from a recent commentary by Applied Semantics - applied in this case as a critique of Google's much-ballyhooed Content Targeted Advertising.

"Google enters the contextual targeting advertising arena... Yahoo! upgrades its search technologies... Overture claims to have similar technology... So, what does all this mean?

"It means that there is an increasing demand and need for an innovative, durable contextual targeting solution for online advertisers. Major players like Google, Yahoo! and Overture are raising the stakes by entering the space. Online publishers and advertisers have to realize that a company's popularity, size and name recognition is not always indicative of its solution. In fact, many of these mega-company's contextual targeting applications are not effectively and efficiently getting the job done.

"Amongst these huge players is a small giant, packing the punch with the best contextual targeting product in the marketplace - AdSense. AdSense extracts the meaning of a web page and dynamically generates ads comprised of P4P (pay-for-placement) search terms and results on the fly. What makes AdSense technology different from the competition is its filtering capabilities, technology approach and partnership structure.

"In a head-to-head comparison between AdSense and Google's content targeting product, we found that AdSense has superior technology and obvious advantages in many categories:

  • "AdSense technology processes actual content of web page, whereas Google technology is only based on URL/web log statistics
  • "AdSense uses its CIRCA(tm) ontology to understand and extract key themes of a page, whereas Google relies solely upon user trends and patterns which are often inconsistent
  • "AdSense aims to balance relevancy with CPC value to maximize effective CPM; Google has less effective CPM maximization
  • "Ability to discern ambiguous terms - AdSense disambiguates, unlike Google who has no disambiguation capability
  • "AdSense returns granular, precise keyword results; Google offers inconsistent results based on user search data; performs best on broad/general topics
  • "AdSense provides advanced filtering technology; Google has no objectionable or competitive filtering mechanisms
  • "AdSense offers customized ad designs tailored to customer preferences; Google has no ad design customization capabilities
  • "AdSense offers a revenue share model, ensuring that we all share in the upside and are committed to developing the optimal implementation to ensure the highest user satisfaction."

If Google's advertising program lacks semantic technology, then this hole must exist in its search technology as well.

Whether it's brought about by a new era of robust metadata protocols, or technology such as Applied Semantics' CIRCA, it will be a pleasure to be able to type in the phrase "green jacket" and only receive results relating to the Masters golf tournament, or alternatively, only results NOT related to golf but rather to green jackets in general, according to my preference.

And advertisers will be pleased the day that they can reach customers who are really interested in disk storage, for example, rather than having their ad show up on all kinds of pages relating to storage cabinets, just because some keywords happen to match.

Disclaimer: Applied Semantics' comments about Google are mostly speculative.

Posted by Andrew Goodman








 

