Sunday, December 9, 2007

Google Spellchecker

Most of us couldn't communicate with the outside world without a spellchecker. As you send off an email or put the finishing touches on a document, a trusty spellchecker makes sure you haven't made any blatant errors. Google has a built-in spellchecker, too. When Google thinks it can spell individual words or complete phrases in your search query better than you can, it takes the liberty of suggesting a "better" search and hyperlinks the suggestion directly to a query.

Suggestions aside, Google assumes that you know what you're talking about and returns the results you asked for, provided your query produced any. For example, if you search for hydrecefallus, Google will ask if you meant hydrocephalus.

If your query finds no results for the spelling you provided and Google believes it knows better, it automatically runs a new search using its own suggestion. Thus, a search for hydrecefallus that (hopefully) finds no results sparks a Google-initiated search for hydrocephalus.

Mind you, Google does not arbitrarily come up with its suggestions, but builds them based on its own database of words and phrases found while indexing the Web. If you search for nonsense like kweghgjdlsggaa, you'll get no results and be offered no suggestions.
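The three cases described above can be summarized in a short sketch. This is only a model of the behavior as described here, not Google's actual logic: the function name handle_query, the $search and $suggest callbacks, and the result counts are all hypothetical stand-ins invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the behavior described above.
# $search->($q)  returns a result count for $q (stand-in for a real search).
# $suggest->($q) returns a suggested spelling, or undef if there is none.
# Returns a list: (query actually searched, result count, suggestion shown).
sub handle_query {
    my ($query, $search, $suggest) = @_;
    my $count      = $search->($query);
    my $suggestion = $suggest->($query);

    # Results found: keep the original query and pass along any suggestion.
    return ($query, $count, $suggestion) if $count > 0;

    # No results, but a suggestion exists: search again using the suggestion.
    return ($suggestion, $search->($suggestion), undef) if defined $suggestion;

    # No results and no suggestion (nonsense input, for example).
    return ($query, 0, undef);
}

# Stand-in data, invented for illustration only.
my %fake_counts  = (hydrocephalus => 1000);
my %fake_suggest = (hydrecefallus => 'hydrocephalus');
my $search  = sub { $fake_counts{ $_[0] } // 0 };
my $suggest = sub { $fake_suggest{ $_[0] } };

for my $q (qw(hydrocephalus hydrecefallus kweghgjdlsggaa)) {
    my ($searched, $count) = handle_query($q, $search, $suggest);
    printf "%s: searched for '%s', %d results\n", $q, $searched, $count;
}
```

Passing the search and suggestion lookups in as callbacks keeps the sketch runnable without any network access.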

A lovely side effect of these suggestions is a quick and easy way to check the relative frequency of spellings. Query for a particular spelling, and note the number of results. Then click on Google's suggested spelling and note the number of results. It's surprising how close the counts are sometimes, indicating an oft-misspelled word or phrase.

If you find yourself turning to Google to compare spellings, you might want to automate the process of comparing phrases.
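As a rough illustration of the comparison, the following sketch turns two result counts into percentage shares. The function name spelling_shares is hypothetical, and the counts are placeholders typed in by hand rather than real Google results; the sketch does not contact Google at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given result counts for two spellings, return each one's share
# of the combined total as a percentage.
sub spelling_shares {
    my ($count_a, $count_b) = @_;
    my $total = $count_a + $count_b;
    return (0, 0) if $total == 0;    # neither spelling was found
    return (100 * $count_a / $total, 100 * $count_b / $total);
}

# Placeholder counts; substitute the numbers Google reports for each spelling.
my ($typed_count, $suggested_count) = (250, 750);
my ($typed_share, $suggested_share) = spelling_shares($typed_count, $suggested_count);

printf "typed spelling: %.1f%%, suggested spelling: %.1f%%\n",
    $typed_share, $suggested_share;
```

Shares close to 50/50 point to the kind of oft-misspelled word or phrase mentioned above.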

Embrace Misspellings

Don't make the mistake of automatically dismissing the proffered results from a misspelled word, particularly a proper name. I've been a fan of cartoonist Bill Mauldin for years now, but I repeatedly misspell his name as "Bill Maudlin." And judging from a quick Google search, I'm not the only one. There is no law stating that every page must be spellchecked before it goes online, so it's often worth taking a look at results despite misspellings.

As an experiment, try searching for two misspelled words on a related topic, such as normotensive hydrocephalis. What kind of information did you get? Could the information you got, if any, be grouped into a particular online genre?

At the time of this writing, the search for normotensive hydrocephalis gets only three results. The content here is generally from people dealing with various neurosurgical problems. Again, there is no law that states that all web materials have to be spellchecked.

Use this to your advantage as a researcher. When you're looking for laypeople's accounts of illness and injury, the content you want might actually be misspelled more often than not. On the other hand, when you're looking for highly technical information or references from credible sources, avoiding misspelled queries will bring you closer to the information you seek.

Spelling on the Command Line

The fact that Google gathers its spellings from across the Web instead of a dictionary means it can out-spell most email and word-processor spellcheckers. An email spellchecker won't catch that you've just misspelled the name of comedian Dave Shapel (or is it Dave Chapelle?), while Google's spellchecker will catch the error.

While this hack won't replace your standard spellcheckers with Google, the code in this section will show you how to bring the spellchecker a bit closer to your desktop.

The code

This code contacts the Google API and asks for a spelling suggestion for the supplied word or phrase. If you're not already accustomed to using the command line to get things done, this hack probably won't make contacting Google any easier than opening a web browser. But for command-line junkies, it's a quick way to tap the power of Google spelling.

Save the following code as spell.pl, and be sure to replace the placeholder text 'insert your key' with your own Google API key:

#!/usr/local/bin/perl
# spell.pl
# Contact Google for spelling suggestions!
# Usage: perl spell.pl <query>

use strict;

# Use the SOAP::Lite Perl module.
use SOAP::Lite;

# Your Google API developer's key.
my $google_key = 'insert your key';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# Take the query from the command line.
my $query = join(' ', @ARGV) or die "Usage: perl spell.pl <query>\n";

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Query Google.
my $results = $google_search->doSpellingSuggestion($google_key, $query);

# Print the suggestion, if Google returned one.
if ($results) {
    print "$results\n";
}

This script is similar to any bare-bones Perl script for contacting the Google API, but it uses the doSpellingSuggestion method instead of the standard search method.

Running the code

Run the script from the command line, passing in any word or phrase you want to check, like this:

% perl spell.pl insert word or phrase

By passing in Dave Shapel, you can see how Google suggests you spell his name:

% perl spell.pl Dave Shapel

Dave Chapelle

If you pass in a correct spelling, Google returns no suggestion and the script prints nothing at all.
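If silence for a correct spelling feels ambiguous, the result can be reported explicitly. The helper below is a hypothetical addition, not part of spell.pl, and it assumes that a missing suggestion arrives as an undefined or empty value. The sample values are supplied by hand instead of coming from a live API call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the value returned by doSpellingSuggestion
# into a one-line report, covering the case where nothing came back.
sub report_suggestion {
    my ($query, $suggestion) = @_;
    return "$query: no suggestion\n"
        unless defined $suggestion && length $suggestion;
    return "$query: did you mean $suggestion?\n";
}

# Example calls with hand-supplied values instead of a live API call.
print report_suggestion('Dave Shapel', 'Dave Chapelle');
print report_suggestion('hydrocephalus', undef);
```

In spell.pl, the final if block could be replaced with a single print of report_suggestion($query, $results).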

To use this script, you still need to figure out for yourself which words are questionable. But when you need to double-check a name or phrase quickly, you can think of Google as your own personal lexiconographer (or is that lexicographer?).
