Friday, August 01, 2008
So basically, what SearchMonkey does is allow publishers to push their customized content look and feel right into the Yahoo Search results. Well, not push, actually, but make available, based on open formats. So for users who opt into the widget for any given company's rich content, if the listing for that content comes up in search results, the custom look-and-feel info is included. So if it's a Yelp review, you get a bit of the Yelp "richness, look, and feel" right on Yahoo.
When I search for "g h johnson toronto," a review from HomeStars comes up in the SERPs. The user (if they have opted in) sees a small unobtrusive HomeStars logo. It goes no farther than that unless they also click on the down arrow. This provides custom, rich information without leaving the search results page: our custom star rating icon, the average rating, a synopsis of the most recent review, and three links so that a searcher could then click for photos, all reviews in that city, or top rated companies in the furniture category.
It's pretty cool that Yahoo is allowing for this level of richness for custom content right in the SERPs, but in a mostly opt-in environment. This could go in any number of directions, but for now, unsurprisingly, they are taking a cautious route.
Labels: metadata, search
Wednesday, July 16, 2008
This very timely post by Google Fellow Amit Singhal gives us a brilliant capsule summary of past and present trends in information retrieval. Most of those who pay close attention will be familiar with the high-level trends, as well as some of the bells and whistles that search engines have added that do a great job of guessing at user intent.
In nearly every bullet point though, there is an unspoken assumption that is at odds with the thinking of at least half the "folks new to search" I encounter - those just now thinking about beefing up their business' visibility on search engines. Namely: Singhal's post assumes that the search technology has something substantive to index, consider, and rank for the benefit of those seeking information. Today's search engines couldn't care less about ranking "sites," "pages," and "companies," outside of any context like "there is something textual on that site, page, and to do with that company that we can search for."
In other words, doing better in search isn't primarily about labeling, it's about substance, and most of that substance - this is what Singhal's bullet points merely assume, but do not spell out in the painfully scolding type of language that will be needed to get the message to sink in for those seeking magic ranking elixirs - is text.
Yes, we can specifically find images, videos, and yadda yadda yadda, if they are indeed helpfully labeled and we go looking for them. Universal and blended search will include these in results, too.
That being said, newbies to search still overfocus on the zen of labeling and ranking, even when it comes to pages, sites, and presences which are largely empty vessels.
For search engines to find and rank and care about you - you've got to have a strategy to go ahead and create that (still largely textual, written, compelling, relevant, useful) content. Seems obvious, right? Apparently, not to everyone.
This reminds me of the very recent post by Vanessa Fox where she expresses skepticism about Google being "better" at "crawling Flash." Tain't nothin' wrong with animations, and someday heck yeah, wouldn't it be nice if the technology got better at OCR, image recognition, and mind-reading. But for now, all you're doing is sort of better-labeling and better-handling what is really, in the end, still an empty vessel when it comes to the core rank-and-feature-textual-content ethos of search engines as narrated by Singhal. The indignation expressed by some of the commenters on Fox's post indicates that the message hasn't really sunk in. People like to think, hope -- even demand! -- that search engines should care about empty pages that people merely label. (Sorry for calling all your ponderous Flash introduction homepages "empty." That's just the way I feel. Hoping that'll get noticed on search engines because you spent a bunch of cash making an animation... well, that's just a cop-out. Is it really content?)
Now, as before, first-party-metadata-only reasons to rank and feature pages and sites are weak: it comes back to the fact that people lie about the value and relevancy of their own pages, and everyone wants to "rank high." Slightly better handling of non-textual content doesn't change that. The shortage of compelling textual content continues, in spite of a sea of tens of billions of thin, uninteresting pages; that shortage is especially acute among companies just beginning to dip toes in the water of "how am I going to rank on the search engines," and those revisiting how to "spruce up" their flagging rankings. Would that there were a quick fix.
Labels: metadata, search
Monday, February 18, 2008
I have a pet peeve with Basecamp.
If you use this app, you know that when you create a message, you are automatically going to slot it into a "category". This works like tagging - all messages in a category are grouped for future reference. Things are also findable by project and chronologically.
But the default is that you MUST enter a category, and although you can make your own categories, the one that's alphabetically first and always staring you in the face is the oddly-named "Assets." What lazy people do on my teams (including me) is leave a bunch of stuff in "assets." I also created my own custom category called "blah."
Of course, the default category should be "None." Or at the very least, "Miscellaneous."
Labels: basecamp, metadata
On the general concept of metadata, Steve Yegge: "Metadata is any kind of description or model of something else. The comments in your code are just a natural-language description of the computation. What makes metadata meta-data is that it's not strictly necessary. If I have a dog with some pedigree paperwork, and I lose the paperwork, I still have a perfectly valid dog."
Sunday, March 04, 2007
Tom Foremski in SiliconValleyWatcher lists all the ways that we the people are expected to help search engines do their jobs - so search technology isn't pulling its weight as compared with the human element.
Eye-opening at first, but rather than a blow-by-blow response, why not sum it up this way:
More detailed response:
- Should you do extra work to label your content or install sitemaps? Hmm, only if you want to be found.
- We're talking about publishers, not people. Therefore, purveyors of information (and/or products and offers) in a hugely competitive, open environment. Tom's article, for example, sports ads for conferences as well as Edelman, the world's largest PR firm. Seems like a tag or two might be a decent tradeoff for the exposure. You don't expect the engine to actually write the content for you, so what's a bit of extra metadata between friends? As for "people," it's the users that are getting a good deal out of the extra work you might do to label your content.
In short, the claim that "people should just find me" is a bit like building an all-graphics site and hoping people will find you when they search for "guitar pick." Or sitting on your back porch strumming "Galveston" and praying you'll be invited onto American Idol.
- Tags or labels are indispensable when it comes to some kinds of content, such as videos or photos.
- If something is useful or popular enough, depending on the community, third-party tagging can be helpful. What's the incentive to do this? Interesting question. What's my incentive to type this sentence? But yes, I think there is a huge bunch of unlabeled stuff that probably will stay unlabeled because there is no incentive to label it. That doesn't mean search engines aren't going to try to "organize it and make it universally accessible."
- The article's general tone seems to suggest that the search engines are stingy about "sending their robots around." Far from it! Even relatively unpopular sites are spidered frequently nowadays.
- Search engines have advanced in many ways over the past few years. One of them is their sheer storage capacity. Index size is a huge challenge, which brings us to:
- The claim that corporate search engines are doing a better job of letting publishers take the lazy way out is a bit odd. It's a much smaller dataset, so stuff is much easier to find. But I'll grant that it is interesting that some of these technologies are quite good at recognizing industry-specific patterns, and autocategorizing content -- no user tagging required. But that's a whole internal debate in the info retrieval field. I'm sure some companies use human categorization!
- Search is a bit like matchmaking, and the meaning of what "search" is has expanded. Take the emerging field of local search. Now add the premise that "refine is the new search" (I don't think it really is on its own, but users definitely want to be able to "drill down" to get exactly what they want by telling the search engine). And hey, why not toss in the idea of geolocation & mapping. So I'm a user and I'm looking for a hardware store, let's say. Let's say I also want to find a hardware store that sells a certain brand of doorbell. I'd prefer it be within 20 minutes driving distance. And I want to find one that is "open 24 hrs." (just for argument's sake). None of that is ever going to be findable without a huge amount of research, unless of course the "publisher" (hardware store owners) is willing to upload their information in a structured format. By uploading that info, buyer and seller connect more easily. By not uploading it, you choose "not to be on the map." It's your choice.
- Things like Google Base are arguably research projects to help Google find out what are some common categorization schemas in a given industry - or a whole category, like brick and mortar retail. (If "open 24 hrs." is a common one, then maybe it'll come up more often in search and navigation databases as a yes/no item down the road, let's say.)
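To make the hardware store example a bit more concrete, here's a minimal sketch of why the structured upload matters. Everything in it is made up for illustration: the `StoreListing` fields, the `refine` function, the store names, and the brand are assumptions, not anything Google Base or any real local search product actually uses. The drive time is assumed to be precomputed for the searcher.

```python
from dataclasses import dataclass, field


@dataclass
class StoreListing:
    """Hypothetical structured record a store owner might upload."""
    name: str
    category: str
    brands: set[str] = field(default_factory=set)  # brands the store says it carries
    drive_minutes: int = 0          # assumed precomputed drive time from the searcher
    open_24_hours: bool = False     # the yes/no item from the example


def refine(listings, category, brand, max_drive_minutes, must_be_open_24_hours):
    """Drill down: keep only listings matching every criterion the user gave."""
    return [
        s for s in listings
        if s.category == category
        and brand in s.brands
        and s.drive_minutes <= max_drive_minutes
        and (s.open_24_hours or not must_be_open_24_hours)
    ]


listings = [
    StoreListing("Main Street Hardware", "hardware store", {"Acme Chimes"}, 12, True),
    StoreListing("Corner Tools", "hardware store", {"Acme Chimes"}, 35, True),
    StoreListing("Night Owl Hardware", "hardware store", {"Other Brand"}, 10, True),
    # This owner uploaded nothing beyond a name and category.
    StoreListing("Unlisted Hardware", "hardware store"),
]

matches = refine(listings, "hardware store", "Acme Chimes", 20, True)
print([s.name for s in matches])
```

The store that uploaded no details never shows up in a refined search, however good its doorbell selection might be. That's the "not on the map" choice in miniature.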
Labels: galveston, google base, local search, mapping, metadata, search engines, sitemaps