Lehrstuhl für Englische Sprachwissenschaft

    Close Up: Forensic Linguistics and Corpora

    What is Forensic Linguistics?

    Before we start with a close up on the topic “Forensic Linguistics and Corpora”, we first have to define what forensic linguistics is.

    One way to look at it is that any text or message becomes a forensic text as soon as it plays a role in a legal context. A letter or a telephone call, for example, becomes a forensic text or audio file as soon as it is connected to a criminal act. Even objects that seem to contain little linguistic information, such as bills or parking tickets, can become forensic texts, since they are legal texts (Olsson 2008: 1).

    Forensic linguistics is the application of linguistic knowledge in legal matters and deals with any use of language in connection with the law (Olsson 2008: 3).

    In which fields are forensic linguists active?

    The areas forensic linguists cover include, among others:

    (a) Forensic linguists give evidence in court: the forensic linguist may act as an expert witness and give an objective scientific opinion on a piece of evidence (Olsson 2008: 13-14), e.g. concerning the authorship of a ransom note or the adequacy of consumer product warnings. They may also assess whether a text was plagiarised (Cotterill 2010: 578), whether the voice of a suspect on an answering machine can tell us something about where the speaker comes from, or whether his/her accent is fake rather than real.

    (b) The forensic linguist describes the language of the law.

    How are corpora used in forensic linguistics?

    To answer that question, we have to distinguish between two types of corpora. The first type comprises pre-existing general corpora, which offer a vast amount of linguistic information and are representative of a certain kind of language use, e.g. everyday language or academic language. Such corpora are used, for instance, in cases of authorship attribution, where salient excerpts of the text in question are compared to a pre-existing corpus like the BoE (Bank of English) (Cotterill 2010: 580f.). In this way, rare grammatical phenomena or unusual words and word combinations may be identified as idiosyncratic usage that points to one specific author, as shown in the sample case Letters to an actress.

    The second way corpora are used in forensic linguistics can again be illustrated with an example from the field of authorship attribution. Suppose the authorities already have a suspect in custody and ask a forensic linguist whether a certain text, e.g. a letter, was written by the suspect. The forensic linguist then compiles a corpus from all texts provided by the police that can be attributed to the suspect, such as e-mails, diary entries, comments on social platforms, papers, etc. The next step is to compare this purpose-built corpus with the text in question to determine whether the texts were written by the same author or by different individuals.
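
    The comparison step described above can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions, not the procedure a forensic linguist would actually follow: the texts are invented, the function names (tokenize, ngrams, shared_ngrams) are hypothetical, and overlap in three-word sequences is used as a simple stand-in for a proper stylistic comparison.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return the set of all n-token sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_ngrams(known_texts, questioned_text, n=3):
    """Return the n-grams found both in the known-author corpus
    and in the questioned text, in sorted order."""
    known = set()
    for text in known_texts:
        known |= ngrams(tokenize(text), n)
    return sorted(known & ngrams(tokenize(questioned_text), n))

# Invented example data: texts attributed to a suspect and a questioned letter.
known_texts = [
    "I shall be at the office anon, so do not wait up for me.",
    "Do not wait up for the delivery, it will come anon.",
]
questioned_text = "Leave the money by the gate and do not wait up for anyone."

shared = shared_ngrams(known_texts, questioned_text)
for gram in shared:
    print(" ".join(gram))
```

    Shared sequences are only a starting point. Each one would still have to be checked against a general corpus, because a sequence that most writers use says nothing about an individual author.
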
    However, corpora in forensic linguistics are not only used for authorship attribution. A further area of application is the language of the law and legal discourse in general. Here, large corpora provide enough data to reveal recurrent patterns in legal language.

    Two case studies

    The Bentley case (as related by Coulthard 2006)

    Probably one of the most widely known cases in which forensic linguistics played a role is the Bentley case. Derek Bentley, a 19-year-old with developmental disorders, was charged with the murder of a policeman during a robbery in 1952 and hanged for the alleged crime in 1953. Due to protests from his family, the case was re-opened in 1993. Part of the investigation team of the re-opened case was the forensic linguist Malcolm Coulthard. The main evidence used against Bentley was a statement he was supposed to have made after his arrest. However, Coulthard claimed that the statement had been tampered with by the policemen involved in the case. Under oath, the policemen had insisted at the time that the statement was the verbatim record of a monologue by Bentley, without interference from the police. As a result, Bentley had been found guilty and hanged.

    A legitimate statement?

    Coulthard re-examined the statement to see whether it had been made by Bentley himself without any interference from the authorities. In his analysis, Coulthard found several rather specific lexical items and phraseological units that are more likely to be used by a policeman than by a lay person, especially one with a low level of verbal communication skills, as was the case with Bentley. In addition, Coulthard found several phrases in which Bentley denied various events. In a narrative, speakers usually report what did happen, not what did NOT happen. The denials therefore pointed to questioning by the authorities rather than a monologue.

    Further evidence that Bentley did not phrase this statement entirely by himself was found in the use of ‘then’. ‘Then’ is often used to describe continuing processes or events, usually in initial position (‘then I’) when speakers refer to themselves. In Bentley's statement, however, ‘then’ was usually found after ‘I’: ‘I then …’. This combination is far rarer, except in the language of policemen. Coulthard was sure that the authorities had interfered with the statement and had been, at least in part, the authors of the document. However, an expert's intuition alone is not enough to stand up in court. To support his findings, Coulthard used two corpora that were created for this case: one consisted of witness statements, the other of statements by policemen. Coulthard then compared the frequencies of ‘then’ before and after ‘I’ across the two corpora. He found that ‘I then’ is used roughly ten times more often in the police corpus than in the witness corpus. This was eventually sufficient to undermine the validity of the statement that had been used against Bentley and had led to his hanging. Bentley was posthumously pardoned.
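
    The kind of frequency comparison described above can be illustrated with a short Python sketch. It is not Coulthard's actual procedure or data: the two mini-corpora are invented, the function names (tokenize, bigram_rate) are hypothetical, and normalising to occurrences per 1,000 tokens is one common convention assumed here so that corpora of different sizes can be compared.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bigram_rate(texts, first, second):
    """Return how often `first` is directly followed by `second`,
    per 1,000 tokens, across all texts of a corpus."""
    hits = 0
    total = 0
    for text in texts:
        tokens = tokenize(text)
        total += len(tokens)
        hits += sum(1 for a, b in zip(tokens, tokens[1:])
                    if (a, b) == (first, second))
    return 1000 * hits / total if total else 0.0

# Invented toy corpora; real corpora would contain many full statements.
police_statements = [
    "I then went to the station. I then saw the suspect leave.",
    "I approached the vehicle and I then cautioned the driver.",
]
witness_statements = [
    "Then I went home and then I called my sister.",
    "I saw him run off. Then I shouted for help.",
]

for name, corpus in [("police", police_statements),
                     ("witness", witness_statements)]:
    print(name,
          "| 'I then':", round(bigram_rate(corpus, "i", "then"), 1),
          "| 'then I':", round(bigram_rate(corpus, "then", "i"), 1))
```

    With realistic corpus sizes, the ratio of the two ‘I then’ rates is the figure of interest. The toy data here only demonstrate the counting, not the actual result of the case.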


    Letters to an actress (as related by Hyland et al. 2012)

    In this sample case, several anonymous e-mails and letters had been written to an actress, claiming that she was being observed by the government. At that time, the actress was involved in a criminal investigation. When investigating the letters, the police suspected that she had written them herself to distract them from the other case she was involved in. A forensic linguist was hired to scrutinise the letters.

    Who were the authors?

    The first step was to determine whether the anonymous texts had been written by different authors or by the same author all along. To investigate this, the forensic linguist analysed the anonymous letters, looking for salient features in the writing style. The first feature the linguist discovered was cohesion among the anonymous letters: in various letters, the author referred to utterances used in previous letters, which indicates that these letters are likely to have been written by the same author. Further, the phrase ‘your every move is being monitored’ recurred across the texts. The linguist checked the phrase in different general corpora and various search engines and found that its usage is quite rare. Hence, the chances that different authors would use this rarely formulated phrase were rather low. This again suggests a common author for the anonymous texts. Another feature the linguist identified as odd was the repeated use of ‘trace’ in connection with ‘car’. The linguist again searched several general corpora for this combination and found it to be exceptionally scarce. What seemed odd here was that ‘trace’ is usually used of objects that are currently missing and being looked for. In this case, the author most probably intended ‘track’, which is used when the movements of an object are being followed.
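
    Both checks described above, looking up a phrase and looking up a word combination in a general corpus, can be sketched in Python. The sketch rests on several assumptions: the reference text is a tiny invented stand-in for a real general corpus, the function names (phrase_count, cooccurrence_count) are hypothetical, and "in connection with" is approximated as "within four tokens on either side", which is an arbitrary choice.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def phrase_count(tokens, phrase):
    """Count how often the token sequence of `phrase` occurs in `tokens`."""
    target = tokenize(phrase)
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == target)

def cooccurrence_count(tokens, node, collocate, window=4):
    """Count occurrences of `node` that have `collocate` within
    `window` tokens to the left or right."""
    hits = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if collocate in context:
                hits += 1
    return hits

# Invented stand-in for a general reference corpus.
reference_text = (
    "Police tried to trace the missing woman. The courier will track the parcel. "
    "Officers used cameras to track the car across the city. "
    "They could not trace the owner of the wallet."
)
reference_tokens = tokenize(reference_text)

print("phrase hits:", phrase_count(reference_tokens, "your every move is being monitored"))
print("trace + car:", cooccurrence_count(reference_tokens, "trace", "car"))
print("track + car:", cooccurrence_count(reference_tokens, "track", "car"))
```

    The sketch matches exact word forms only, so ‘traced’ or ‘cars’ would be missed. A real corpus query would normally search by lemma and report frequencies relative to corpus size.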

    Was the actress the author all along?

    The last question to answer was whether the actress had written the letters herself in order to distract the police from the main investigation that was going on at that time. To settle this, known letters written by the actress about the anonymous letters were used as forensic material. The same minor confusion of ‘track’ and ‘trace’ was found in the letters written by the actress. This was the evidence that led to the conclusion that all the letters had in fact been written by the actress herself. Apart from the trace/track feature, other rarely used expressions were found across the material, corroborating the linguist's previous findings.

    Further Aspects

    What can I do with it?

    As you can see, corpora offer a powerful basis for a variety of criminal investigations. If this chapter got you interested in corpus linguistics, you might want to find out for yourself whether you have the potential to be a forensic linguist. Ask a friend to provide you with three texts, two of which are by the same author, without telling you the authors' names. Using the methods you have just read about, try to figure out which two texts belong to the same author. This is also a good opportunity to practise the use of corpora and corpus-analytical tools, which will be helpful for your further studies, in which corpora will in all probability play a part.

    Problems and challenges

    Whatever the advantages of using forensic linguistics in criminal investigation are, it also has its limits. A few of these issues are examined below.


    A major issue the field of forensic linguistics has to face is its standing in court. Since the practice of presenting corpus-linguistic evidence in court is rather new, lawyers often do not regard it as reliable. This is hardly surprising, since most people, judges and authorities included, are not yet familiar with forensic linguistics. This makes it harder for linguists to present and support their findings in the courtroom. The lack of familiarity is even more critical in the American system, in which the verdict is reached by a jury of ordinary citizens. Presenting evidence obtained with unfamiliar methods might confuse the jury rather than convince it, causing the jurors to disregard the findings of the linguistic analysis.

    Sample size

    Another challenge forensic linguists have to face is the size of the material provided. Often, the text in question consists of only a few lines, which makes it harder to discern idiosyncratic language use. The shorter the text, the lower the chances of finding grammatical irregularities or rarely used terms and expressions, especially at a time when short text messages and Twitter messages are common modes of written communication.

    The future of forensic linguistics

    Experts are working on compiling corpora specifically for forensic linguistic analyses, e.g. a corpus of suicide notes, and on improving the methods and results. With the help of advancing technologies and software, the work done by forensic linguists is expected to become more accurate and reliable in the near future.

    Hopefully, the authorities as well as the general public will become more familiar with the topic and with the benefits of forensic linguistics for criminal investigations and for linguistic evidence in court.

    Among its applications, forensic linguistics adds a fairly new method of law enforcement. Especially at a time when terrorist threats, ransom notes and the like have become part of everyday life, forensic linguists can assist in analysing such threats and in identifying the person or group behind them.