to B or not to b
(FORTUNE Magazine) – like many people, I don't use capital letters when I type e-mail. but when I got a new computer a few months ago, it had Microsoft software that automatically capitalizes the first letters of some words. (I'm using it now.) early on, I noticed some oddities. I was writing an e-mail to a friend about the campaign-financing scandals of the Clinton administration, and I referred to a very peripheral figure named Pauline konchanalak, whose last name I inadvertently misspelled. on second reference, I happened, with equal inadvertence, to spell her name correctly. but this time the surname popped up as Kanchanalak! Microsoft knew to capitalize Kanchanalak and yet not konchanalak!
soon I was noticing other peculiarities. for instance, most over-the-counter drugs were capitalized, like Excedrin and Tylenol, but prescription drugs were much harder to predict. thus, Claritin is up, but celebrex is down. Zoloft is up, prozac and paxil down. does Microsoft ask Pfizer or merck to pay for brand-name recognition? was there an upper-case shakedown going on?
given names also held surprises. why Karen and Sharon, but not nancy or mary? Stephen, but not steven?
or consider these shockers: Muhammad, Mohamed, Buddha, and Confucius are up, but not allah, jesus, or moses!
it got worse. all these policies were unstable over time! capitalization practices changed even as I experimented with them. in fact, I've had to manually capitalize several words in this story because they've lost their capacity to do it themselves. had I worn them out? stephen and zoloft and microsoft itself no longer perform for me! even viagra is spent.
it occurred to me that many of the anomalies had something to do with word length. the longer the word, the more likely it was to be capitalized. four-letter words were always down, as far as I could tell. five-letter words, on the other hand, seemed to be right on the fulcrum. most were down--like jesus, moses, and allah--yet there were exceptions, like Xerox and Karen. maybe there was something special about the letters 'x' and 'k' that threw such words into a different category. I eagerly tested my new theory, but with bitterly disappointing results. kafka, kadar, kemal, xhosa, and hoxha. yet Kodak, Exxon, and Akaka. meanwhile, unaccountably exalted outliers sprang from my control group: Helen, Miami, Judah.
had microsoft considered the repercussions of meting out all these preferences and slights? leaving allah down while honoring Exxon--was that prudent?
I called microsoft and spoke with simon marks, the 30-year-old, London-born product manager for the microsoft office division. light streamed in, and order was restored, as marks opened my eyes to the structure of microsoftian capitalism.
marks was a gentleman too: even as he dashed my pathetically wrongheaded hypotheses, he bucked up my self-esteem. as I had so keenly picked up, he explained, the lengths of words were 'absolutely key.' and yet there was nothing determinative about the number of letters that any word contained.
'let's take a step back,' he suggested. capitalization was just a narrow aspect of the broader function performed by the spell-checking software, he explained. when microsoft's spell-check notices a word it doesn't recognize, it regards it as a possible mistyping. but it does not presume to automatically correct anything unless it feels very confident that it knows what was intended. so in most cases, spell-check merely alerts the writer to an array of possibilities by underlining the putatively mistyped word in red. when I type moses, for instance, spell-check puts a red squiggly line beneath the uncapitalized prophet's name. (how could I have written a whole piece on this subject and failed to notice the red squiggly lines?) if I then pursue the matter further in the tools menu, I discover spell-check's ample grounds for hesitation; for all it knows, I may be trying to type mosses, moss, Moses, muses, moseys, modes, musses, muss, or mossy! only when the spell-checker's algorithms develop a much higher degree of certainty about what I am trying to say will they dare to 'auto-correct' me.
the instability I thought I had observed simply reflected my having inadvertently toggled off the auto-correct feature for certain words whenever, in the course of my research, I used the backspace key in a certain manner. (even marks wasn't sure why the auto-correct wasn't resuming for those words when I rebooted, as it was supposed to.)
as for the miraculous recognition of Kanchanalak, marks explained that microsoft is always updating the spell-check lexicon to keep up with words that are in common current usage. Kanchanalak had been in the news in about 2000, when the lexicon for my 2002 version of microsoft word was being compiled. the surname might not be recognized by earlier lexicons--or even later ones. though the lexicon is continually revised, it is not continually expanded. rather, it is maintained at a ceiling of about 200,000 words. if it becomes too inclusive--recognizing obscure words rather than interpreting them as likely mistypings--it becomes less useful for the majority of users.
similarly, the seemingly willy-nilly capitalization of drug brand names was determined by the popularity of those brands at the time my lexicon was being compiled, together with the usual issues posed by the resemblance of the brand name to other possibly intended words. microsoft certainly doesn't ask companies to pay for capitalization, marks noted, taking no offense.
and so it was that in the space of about ten minutes, marks righted my orthographically toppling world. overarching, benevolent algorithms brought harmony and meaning to it all.
except, of course, the part about how to toggle the auto-correct back on for stephen and zoloft and microsoft. but marks said he'd have a tech guy get back to me on that.