Heard the latest about forensic linguistics?

It’s not often that forensic linguistics makes the news. It’s not nearly as sexy or yucky as the forensics that originates in the pathologist’s lab or at the murder site. There’s actually a scientific book, called Maggots, Murder, and Men, that tells you how to determine when someone was killed, on the basis of the type of insects that are now feasting on the corpse.

 

Compared to this, nailing someone by their writing habits is not great TV. But it does happen, most recently in the case of the authorship of e-mails alleged to have been written by Mark Zuckerberg, the subject of a New York Times article (7/24/11) at http://www.nytimes.com/2011/07/24/opinion/sunday/24gray.html. Apparently someone is trying to claim the right to half of the Facebook fortune, on the basis of e-mails supposedly written by Zuckerberg.

 

It just goes to prove that Zuckerberg and Facebook are truly endless sources of machination and intrigue…and that they spread out, in an evil, endless spiral, involving every person, activity, and discipline on earth. Forensic linguistics was bound to get involved somehow.

 

The consulting linguist, big gun Gerald McMenamin, emeritus professor of linguistics at California State University, Fresno, studied the e-mails, found 11 different style markers across punctuation, spelling, and grammar, and opined that Zuckerberg could not have been the author.

 

Once again, heavy-hitting forensic linguists weigh in with their opinions. The splendidly credentialed Ronald Butters questions whether the (non-)capitalization of Internet has any value, given that there are only three examples (yet he himself practices this very kind of qualitative analysis).

 

The debate flares up again: does one have a written idiolect? No one thinks he or she has a writing style, but the fact is that we all do (except me and other forensic linguists, of course). All your individual writing practices, all those choices made on the fly, do have a pattern.

 

At the very simplest, you choose to do one thing, but not the other, many times in each sentence: to use one word but not another to express a particular meaning, to write dates and numbers in a certain way…and so on. You can see why the number of features can be very large and why even as few as 11 significant differences could point to the answer to questions of authorship.
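For the curious, here is a rough sketch in Python of what tallying such either/or choices might look like. The three markers below (capitalizing "Internet", writing "cannot" as one word or two, typing three dots versus the single ellipsis character) are my own hypothetical examples, not the 11 markers from the case, and the function names are invented for illustration. It only counts; deciding which markers matter is still the analyst's job.

```python
import re

# Hypothetical style markers: each name maps to two competing variants,
# written as regular expressions. These are illustrative assumptions only.
MARKERS = {
    "internet_caps": (r"\bInternet\b", r"\binternet\b"),
    "cannot": (r"\bcannot\b", r"\bcan not\b"),
    "ellipsis": (r"\.\.\.", r"…"),
}

def marker_profile(text):
    """Return the preferred variant ('A' or 'B') per marker, or None if unseen or tied."""
    profile = {}
    for name, (variant_a, variant_b) in MARKERS.items():
        n_a = len(re.findall(variant_a, text))
        n_b = len(re.findall(variant_b, text))
        if n_a == n_b:
            profile[name] = None  # no evidence either way
        else:
            profile[name] = "A" if n_a > n_b else "B"
    return profile

def differing_markers(known, questioned):
    """List the markers on which the two texts show opposite preferences."""
    p_known = marker_profile(known)
    p_questioned = marker_profile(questioned)
    return [
        name for name in MARKERS
        if p_known[name] and p_questioned[name] and p_known[name] != p_questioned[name]
    ]

known = "I cannot use the Internet... sorry"
questioned = "I can not use the internet… sorry"
print(differing_markers(known, questioned))
```

With real texts, a marker that shows up only two or three times tells you very little, which is why the quantity of data matters so much.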

 

So the answer to the “linguistic fingerprint/DNA” question is yes and no (yes, if the quantity and quality of the data are sufficient). People are largely unaware that they have a writing style – and of how many decisions are made on the fly. Indeed, enough of these decisions (even allowing for intra-writer variants) can define an individual’s writing style; the profiler just needs enough information.

 

The circumstances for definitive judgment must be right. There must be enough data. Carole Chaski is right when she says there are hundreds of features. A text can contain enough of them, in sufficiently similar patterns, to identify a writer. Quantity of data is important. A couple of examples can only point to likelihood of authorship.

 

But the rarity/peculiarity of the style-markers (something the article didn’t mention) is important – I’ve made authorship judgment calls on the basis of just one marker, if it’s sufficiently rare or idiosyncratic. One writer repeatedly used “puke” as a noun, verb, and adjective; it appeared in everything he wrote. Another writer used a grammatical construction which occurs in British and Canadian (but not American) English.

 

How about computers? Don’t they count things really well? Sure, but computers can never acquire anything resembling a human being’s intuitions about language use. When you consider human beings who have spent their lives studying language, you’re talking about a vast corpus of observed data, and the resultant ability to identify and weigh the qualitatively interesting, as noted just above, as opposed to just mindlessly counting things.

 

There are decisions that may differentiate one writer from another, broadly speaking. The one proposed by Chaski makes no sense at all to me. It says that your writing style is defined by where the main noun occurs in its phrase: at the beginning, middle, or end. With all due respect to my erudite colleague, I fail to see how this represents any writer’s behavior, any choice, any defining marker. The only factor that appears to govern the position of the main noun in English is emphasis, which pushes nouns to the beginning or the end, depending on the intended meaning.

 

There are broadly defined choices, however, of which the writer is unaware. One that I like to use is the writer’s preferred method for linking clauses, whether by starting a new sentence, by coordination (and, but, etc.), by subordination (that/which or adverb plus clause), or by some combination of the three. This decision on how to end one rather large chunk of information and begin another is, again, made on the fly, and writers tend to stick to the same pattern.
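A crude Python sketch of a clause-linking profile follows. The word lists and the function name are my own assumptions for illustration, and the counting is deliberately naive: it will over-count "and" joining two nouns or "that" used as a demonstrative, which is exactly the kind of distinction a human analyst sorts out by reading.

```python
import re

# Illustrative, incomplete word lists (assumptions, not a standard inventory).
COORDINATORS = {"and", "but", "or", "so", "yet"}
SUBORDINATORS = {"that", "which", "because", "although", "when", "while", "if"}

def linking_profile(text):
    """Return the share of each linking strategy among all links counted."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = {
        # Sentence-final punctuation followed by whitespace or end of text.
        "new_sentence": len(re.findall(r"[.!?]+(?:\s|$)", text)),
        "coordination": sum(w in COORDINATORS for w in words),
        "subordination": sum(w in SUBORDINATORS for w in words),
    }
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty text
    return {kind: n / total for kind, n in counts.items()}

choppy = "I came. I saw. I left."
chained = "I came and I saw but I left because it rained."
print(linking_profile(choppy))
print(linking_profile(chained))
```

Two writers reporting the same events can end up with very different proportions, and it is the stability of those proportions across a writer's texts that makes the feature useful.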

 

The Times article closes with the hope that technology will save us from the bumblings of the human brain. I maintain that there is no technology nearly equal to the sublime powers of the human brain.