Tuesday, October 18, 2005

Updation Celebration

Twice recently people have forwarded email messages with a term of note:

The problem I am having is to have the changes reflected back to the database table. i.e. updation.

We suggest you to please contact xxxxx for the updation.

It's pretty clear to me where this is coming from. In the first instance especially, update is one of the four basic database operations: select, insert, delete, update. The other three have nice nominal forms: selection, insertion, deletion. Update is, uh, ... update. The temptation to make a form parallel with the rest is very strong.

In fact, it's a strong enough temptation that Google gets 195,000 hits, with much un-self-conscious use of the term.

I note that most of the instances I see are in informal contexts -- blogs or forum posts.

In searching for cites, I also found a forum post entitled "Updation is not a word." Yeah, well ... not yet.

Monday, October 17, 2005

Invite, your comments

I'm on a GoogleFight kick, since watching terms slug it out in Flash feels like you're doing research, sort of, but hey, fun. But the question actually came first, which was this: when did invite (1,580,000) start becoming a serious contender for invitation (15,400,000)? Such as I'll send you an invite, a sentence I might hear several times a day.[1]

It's clearly not a new phenomenon, since most dictionaries list it as a noun, "informal." I suppose my real question is whether people who use an invite regularly also use an invitation in more formal contexts or whether it's more dialectal -- you use one or the other more-or-less exclusively. (I'll send you a wedding invite.)

[1] Mind you, no one's inviting me, and certainly not if I persist in asking "Did you just say an invite?"

Let's vote on how to spell that

Technically not an Evolving English thing per se, although it might pertain to the subspecies of evolving English that we might call Evolving Orthography. ("Evolving English Lite" would capture it in spirit.)

Anyway, Joel Spolsky comments today in passing on how Google does things fundamentally differently than other companies. Here's the cite that caught my eye:

Look at how Google does spell checking: it's not based on dictionaries; it's based on word usage statistics of the entire Internet, which is why Google knows how to correct my name, misspelled, and Microsoft Word doesn't.

As anyone who's mistyped a phrase in Google knows, Google is eerily good at DWIM searching.[1]

What's interesting to me about Joel's comment is that from Google's perspective, the accuracy of a term -- specifically its spelling -- is effectively a democratic process. Put another way, Google does not care what any given authority might say about the correctness of a particular spelling; instead, it is the ultimate in descriptivist empiricism -- the term with the most usage is the "more correct" term.
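To make "the term with the most usage wins" concrete, here is a minimal sketch of a spell checker driven by usage counts rather than by a dictionary. It is not Google's actual method, which isn't described here; the `USAGE` table, its made-up counts, and the helper names `one_edit_away` and `suggest` are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical usage counts standing in for "word usage statistics".
# The words and numbers are invented for illustration only.
USAGE = Counter({"spolsky": 900, "spooky": 300, "spools": 120, "sporty": 80})

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def one_edit_away(word):
    """Return every string one deletion, swap, substitution, or insertion from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)


def suggest(word, usage=USAGE):
    """Pick the most-used known word among the input and its one-edit neighbors."""
    word = word.lower()
    candidates = {w for w in one_edit_away(word) | {word} if w in usage}
    if not candidates:
        return word  # nothing known nearby, so leave the input alone
    return max(candidates, key=lambda w: usage[w])


print(suggest("spolksy"))  # two letters transposed
print(suggest("spoky"))    # one letter missing
```

Note that no authority is consulted anywhere: a spelling is "right" only in the sense that more people have typed it.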

Given this hypothesis, let's see if I can devise a way to test it. Using GoogleFight, I'll compare some terms whose official spelling, speaking very broadly, might be open to debate:

light (468,000,000) versus lite (53,100,000)
Not a close contest, but that's still a respectable number of hits for a comparatively new variant. (One in nine, right?)

night (421,000,000) versus nite (8,780,000)
Clearly lite has made more inroads into light than nite has into night.

dependent (131,000,000) versus dependant (7,430,000)
I guess I can tell my writers that empirical evidence overwhelmingly favors the first. (And, may I add, whew.)

checkbox (9,880,000) versus check box (7,670,000)
(Original conclusion, since revised: it appears that common high-tech usage has not yet had broad influence.) Aha. Now I can go back to our editorial committee and tell them to get with the 21st century.

Update: Someone pointed out (see Comments) that I was searching for "check+box" (two words on same page), not on the literal string "check box". Stats updated, conclusion updated. (I thought I'd looked it up as a literal, but guess not.)

collectable (7,600,000) versus collectible (15,900,000)
Bet you didn't think that one would be this close, did you?

canceling (5,210,000) versus cancelling (3,950,000)
Ooh, close one. Brits, obviously.

through (2,350,000,000) versus thru (35,600,000)
I'm sure purists everywhere are relieved.

cachable (95,300) versus cacheable (362,000)
This, if I read it right, contradicts what we're told in our corporate styleguide (2,100,000) or style guide (25,900,000).

dialog (71,900,000) versus dialogue (124,000,000)
The former I would bet is both more American and definitely far, far more prevalent in computers ("dialog box").

vendor (172,000,000) versus vender (8,980,000)

hiccup (2,060,000) versus hiccough (166,000)
Well, that seems pretty clear.

donut (3,330,000) versus doughnut (1,760,000)
The historical spelling is the loser here.
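For anyone who'd rather see shares than raw hit counts, here is a small sketch that takes the numbers quoted above and works out which spelling wins each fight and by how much. The counts are copied from this post; the function name `winner_and_share` is just something I made up for the example.

```python
# Hit counts copied from the comparisons above, as ((term, hits), (term, hits)).
PAIRS = [
    (("light", 468_000_000), ("lite", 53_100_000)),
    (("night", 421_000_000), ("nite", 8_780_000)),
    (("dependent", 131_000_000), ("dependant", 7_430_000)),
    (("checkbox", 9_880_000), ("check box", 7_670_000)),
    (("collectable", 7_600_000), ("collectible", 15_900_000)),
    (("canceling", 5_210_000), ("cancelling", 3_950_000)),
    (("through", 2_350_000_000), ("thru", 35_600_000)),
    (("cachable", 95_300), ("cacheable", 362_000)),
    (("dialog", 71_900_000), ("dialogue", 124_000_000)),
    (("vendor", 172_000_000), ("vender", 8_980_000)),
    (("hiccup", 2_060_000), ("hiccough", 166_000)),
    (("donut", 3_330_000), ("doughnut", 1_760_000)),
]


def winner_and_share(pair):
    """Return the more-used term and its fraction of the pair's combined hits."""
    (a, a_hits), (b, b_hits) = pair
    total = a_hits + b_hits
    top, top_hits = (a, a_hits) if a_hits >= b_hits else (b, b_hits)
    return top, top_hits / total


for pair in PAIRS:
    top, share = winner_and_share(pair)
    print(f"{top:12s} {share:6.1%}")
```

A share near 50% marks a genuinely contested spelling; a share in the high 90s means the variant has barely made a dent.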

I don't think there are too many surprises in there, and nothing that would contradict what a reasonably contemporary authority would say. But of course the point is that Google cares not a whit for what authorities say (the learnèd opinions of an august "Usage Panel," say); it's going entirely by what people actually use. Then again, do people actually use what an authority (AHD, for example) says they should? Well, mostly, but people do what they want, and no one is going to put lite back into the can.

[1] In finding a link for DWIM, I ran across the Wikipedia entry, which says this: "Obviously, no real-life implementations of DWIM exist for any platform." I wonder if one could say that Google goes some way toward refuting this statement.