Thursday, January 15, 2009

Gag Me with a Spoon! - Latent Semantic Indexing?


For years it has been thought that Google uses word-relationship technologies, one of which has been dubbed “latent semantic indexing,” or LSI. When you think about it, the concept sounds like something out of a sci-fi movie. Not only does Google index the words that appear in a document, but it also examines the document collection as a whole, comparing each document to the others in order to determine which ones use similar words. The really amazing thing is how well it correlates these semantically similar pages, in a way that is strikingly close to how a human would classify the same information.

We recently discovered an excellent example of this, one so relevant that it almost seems as if an actual human being had adjusted the search results by hand. We did a search for “gag me with a spoon.” We all remember that phrase, right? Well, some of the younger folk these days have no idea what it means, so a simple Google search is the logical solution. The sixth result for this query was a Wikipedia entry about “Valspeak.”



It just so happens that Valspeak is the term used to describe the kind of speech, or sociolect, associated with the phrase “gag me with a spoon.” In other words, it is the language of “valley girls.” That may not seem unusual so far, because you might expect the page to list some examples of Valspeak, one of which would be “gag me with a spoon.” But the amazing thing is that this phrase does not appear even once on the entire page. Scour it as much as you like, but the phrase we queried is nowhere to be found. This is simply an excellent example of latent semantic indexing, in which Google has taken terms that do appear, such as “valley girls,” “surfer slang,” “Southern California,” or even “Clueless,” and compared them to the terms on pages containing the phrase “gag me with a spoon.” As you might guess, these pages probably have a large number of terms in common, and thus Google succeeds in returning a search result that is quite relevant to the query even though it does not contain the queried phrase.
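To make the idea concrete, here is a minimal sketch in Python of how a latent semantic approach can match a page that never contains the query terms. It is a toy illustration only: the five-page corpus, the page names, and every helper function are hypothetical, and nothing here is meant to represent how Google actually ranks pages. The sketch builds a term co-occurrence matrix, finds its top two eigenvectors by power iteration (these are the leading left singular vectors of the term-document matrix), and compares the query to each page inside that two-dimensional "topic" space. Real LSI implementations typically also weight terms and rescale by the singular values, which is skipped here for brevity.

```python
import math

# Hypothetical toy corpus. The "valspeak" page never uses "gag" or "spoon".
DOCS = {
    "phrase_page_1": "gag spoon valley slang",
    "valspeak": "valley slang",
    "phrase_page_2": "gag spoon valley",
    "code_page_1": "python code",
    "code_page_2": "python code",
}
TERMS = sorted({word for text in DOCS.values() for word in text.split()})


def to_vector(text):
    """Turn text into a bag-of-words count vector over TERMS."""
    words = text.split()
    return [float(words.count(term)) for term in TERMS]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_eigenvectors(matrix, k, iterations=500):
    """Top-k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    vectors = []
    for _ in range(k):
        v = [1.0] * size
        for _ in range(iterations):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in work])
        vectors.append(v)
        # Remove the component just found so the next pass finds the next one.
        work = [[work[i][j] - eigenvalue * v[i] * v[j] for j in range(size)]
                for i in range(size)]
    return vectors


def project(vector, basis):
    """Coordinates of a term-space vector in the latent space."""
    return [dot(axis, vector) for axis in basis]


doc_vectors = {name: to_vector(text) for name, text in DOCS.items()}
n = len(TERMS)
# Term-by-term co-occurrence counts summed over all pages.
cooccurrence = [[sum(d[i] * d[j] for d in doc_vectors.values())
                 for j in range(n)] for i in range(n)]
basis = top_eigenvectors(cooccurrence, k=2)

query = to_vector("gag spoon")
raw_score = cosine(query, doc_vectors["valspeak"])
latent_score = cosine(project(query, basis),
                      project(doc_vectors["valspeak"], basis))
unrelated_score = cosine(project(query, basis),
                         project(doc_vectors["code_page_1"], basis))

print("raw term match with valspeak page:", raw_score)
print("latent match with valspeak page:  ", round(latent_score, 3))
print("latent match with unrelated page: ", round(unrelated_score, 3))
```

In this toy setup the raw term match is zero, because the query and the "valspeak" page share no words. In the latent space the same page scores close to 1, because "valley" and "slang" co-occur with "gag" and "spoon" on the other pages, while the unrelated page scores near zero.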

So what does this mean for search marketers? Anyone can easily run a search to find the terms that Google considers relevant to certain keyphrases. Simply search for ~search marketing. The ~ character causes semantically related terms, such as online marketing and Internet marketing, to appear in bold in the search results. It might be a good idea to include some of this terminology along with your target keyphrases in order to take full advantage of latent semantic indexing and increase the relevancy of your pages.


About the Author: Peter Hamilton is the Project Manager in charge of the Seattle office of ArteWorks SEO. His interest and experience in search engine optimization are largely focused on social media optimization and the multimedia facets of exposure, specifically video SEO. To learn more about this search engine optimization company, visit www.arteworks.biz.



6 Comments:

At January 16, 2009 12:31 PM ,
Blogger theGypsy said...

Hi gang, Dave here... I wouldn't be throwing around LSA/I too quickly; it is how the myth gets perpetuated. Google originally purchased Applied Semantics in 2003 for the ad-serving stuff (AdSense/AdWords), and there are many legitimate arguments why it never made it to the regular organic side of things.

I find it best to talk about it in terms of semantic systems, not LSI per se. You might want to look into PLSI, HTMM and even phrase-based indexing and retrieval as well. They are all related approaches that Google has also looked at.

Of note is the phrase-based stuff, as Google also purchased related technology (from Anna Patterson) shortly after the Applied purchase... so we could infer that it is being used (an equally dangerous assumption).

In the end it's best to simply talk about semantic relations and how they apply in SEO... not misleading peeps with that which we DON'T know.

... have a great weekend

  At January 16, 2009 3:40 PM ,
Blogger Peter Hamilton - Arteworks SEO said...

Thanks so much for the insight, Dave! After hearing your argument, I would agree that we "don't know" exactly how Google handles semantic relations. I would also agree that considering how semantic relations impact SEO is definitely the moral of the story. I don't know much about Hidden Topic Markov Models, so thanks for mentioning them. I'll check 'em out!

  At January 16, 2009 4:47 PM ,
Blogger theGypsy said...

Not a problem at all - feel free to get in touch and talk shop, compare notes and the like.

I sent you a lead via Twitter, and I would also check out some phrase-based indexing and retrieval... and Microsoft has some interesting semantic papers and patents. The main thing to express is the related concepts, more than us guessing at specifics. Besides, simply understanding these concepts can be some heavy lifting; one tries to make it more digestible for the general SEO public.

It's always great to see some more technical topics getting tackled, so kudos on that.

Talk soon... Dave

  At January 16, 2009 5:45 PM ,
Anonymous Anonymous said...

Did you ever think to look at the page's link profile? There are quite a few sites linking to that page using the term "gag me with a spoon" as the anchor text.

  At January 16, 2009 9:54 PM ,
Anonymous Online Internet Faxing said...

At least he who has the most links will generally be at the top before all the related phrases. I doubt it would be that hard at all to beat out that wiki page for the phrase with that kind of keyword density.

  At January 17, 2009 12:07 AM ,
Blogger Matt Foster, CEO, ArteWorks SEO said...

But isn't that the point, Anonymous? That LSI uses such things as inbound linkage, et cetera, to determine related terms?

 
