The Quantitative Big Picture

Computational approaches to authorial style and attribution are predominantly quantitative: the various techniques rely on the ability of computers to identify patterns in large datasets (e.g. the frequencies of words across the plays of multiple authors), and on those patterns aligning with the authorial style of one author or another. In our investigation of Behn’s authorial style, Alan Hogarth and I have been using various exploratory testing methods (Zeta, t-tests, Principal Components Analysis) to establish how distinct Behn is as a dramatic writer when compared to her contemporaries (other genres will follow shortly). The results suggest that Behn does, indeed, have a linguistic profile that stands in contrast to that of her contemporaries. Whilst a future blog post will talk more about these high-level patterns and comparisons, the present discussion introduces some of the specific words that characterise Behn’s authorial style, on the basis of their contrastive frequencies in her works versus those of contemporary playwrights. This work adds a qualitative dimension to the investigation, with the aim of better understanding how and why some words appear as marker words for one author rather than another.

Figure 1: PCA results for 33 Restoration comedies using 100 most frequent function words

Figure 1 shows a Principal Components Analysis of the 100 most frequent function words in a corpus of 33 comedies written by five Restoration dramatists, including Behn. This statistical sorting hat organises the observations (the plays) according to the most dominant trends in the variables (the frequencies of the words). The right-hand side of the line highlights the area where most of Behn’s comedies cluster, largely distinct from the comedies of Dryden, D’Urfey, Shadwell and Ravenscroft. The plays of D’Urfey (green crosses) intersect with some of Behn’s works, suggesting stylistic similarities; the plays of Dryden and Ravenscroft, by comparison, are much more distinct. Looking at the linguistic features that underlie this distribution of plays helps to explain these points of overlap between authors.
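For readers unfamiliar with the technique, the following is a minimal sketch of how a PCA separates texts by word frequency. It is not the analysis behind Figure 1: the play labels, the three words and every frequency value are invented for illustration, and the first component is found by simple power iteration rather than by whatever software was used for the real corpus.

```python
import math

def standardise(rows):
    """Centre each column on 0 and scale it to unit sample variance.
    Assumes no column is constant (that would give a zero divisor)."""
    n, cols = len(rows), list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def first_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component,
    using power iteration on the covariance matrix of the standardised data."""
    z = standardise(rows)
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] + [0.0] * (p - 1)  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in z]
    return v, scores

# HYPOTHETICAL data: occurrences per 1,000 words, invented for this sketch.
words = ["thou", "oh", "the"]
toy = {
    "author_A_play_1": [9.0, 6.0, 30.0],
    "author_A_play_2": [8.0, 7.0, 31.0],
    "author_A_play_3": [10.0, 6.5, 29.0],
    "author_B_play_1": [3.0, 2.0, 38.0],
    "author_B_play_2": [2.0, 3.0, 40.0],
    "author_B_play_3": [4.0, 2.5, 39.0],
}

loadings, scores = first_component(list(toy.values()))
for play, score in zip(toy, scores):
    print(f"{play}: {score:+.2f}")      # position of each play on component 1
for word, loading in zip(words, loadings):
    print(f"{word}: {loading:+.2f}")    # contribution of each word to component 1
```

In this toy case the plays of the invented author A fall on one side of the first component and those of author B on the other, while the loadings indicate which words pull in which direction. The scores correspond to the positions of plays in a chart like Figure 1, and the loadings to the words overlaid on it.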

Figure 2 shows the same distribution of plays, now with the words underlying their organisation. Within the Behn area of the chart, we find second-person pronoun forms (thou, thee, thy), connectives (and, so) and interjections (oh, ha). The rest of this post will discuss the functionality and distribution of interjections in Behn’s plays, compared with those of her contemporaries, and consider what this information may suggest about her approach to dramatic writing.


Figure 2: PCA results for 33 Restoration comedies using 100 most frequent function words, showing distributional relationship between word frequencies and play texts.


Interjections are powerful small words. They prototypically signal the emotional attitude of the speaker towards their situation, and this affective function has implications for their use in dramatic language. They can be grouped into two types: Primary forms, which function solely as interjections and generally have an atypical phonological and graphological structure (e.g. ugh), and Secondary forms, which are words transferred from other word classes, including euphemistic and taboo words (e.g. damn, blimey).

Interjections have been discussed by various scholars of the history of the language. Corpus-based studies show an increase in forms and frequencies of use in drama and other literary genres over the early modern period. This is thought to reflect a cultural shift towards the subjective expression of emotion, which was conveniently encapsulated through the use of interjection forms (see the work of Irma Taavitsainen (1995, 1997) and Culpeper and Kytö (2010)).

Whilst interjections can convey a speaker’s state of mind (e.g. oh may signal a speaker’s shock and surprise), they also have other pragmatic meanings. Interjections can be conative; that is, they instigate a hearer’s reaction (verbal or behavioural) to the speaker’s utterance. They can also organise the discourse, signalling the end of a turn, for example.

In the PCA results, Behn’s authorial style was differentiated from that of the four comparison authors on the basis of the frequencies of oh and ha. These interjections are the two most frequent forms in the comedy corpus as a whole (i.e. the most common forms in the 33 comedies combined). So why are they particularly significant in Behn’s plays? Is she using them differently when compared with the four male dramatists?


Oh has an undisputed position as the top interjection in the English language. It is the most frequent form in the extant literary canon for the early modern period, and retains this position in corpus-based analyses of present-day English (for discussion of these distributional trends, see Taavitsainen 1995; Norrick 2009). In Behn’s drama it has three main functions:

  • speaker-oriented: to signal primarily negative emotions and comprehension
  • addressee-oriented: to attract attention and direct a proposition, often combined with a vocative e.g. ‘oh Alphonso’
  • audience-oriented: to introduce an expository frame that licenses the narration of events on stage e.g. ‘oh I am weary of this’

These functions are comparable with previous surveys of early modern English drama (e.g. Culpeper and Kytö 2010). Indeed, Behn’s use of oh does not appear particularly distinctive when compared to her contemporaries: statistically, the frequency of oh in Behn’s comedies is not significantly different to that of the four male writers (Student’s t-test, p > 0.05). This suggests that, quantitatively, oh is a relatively consistent feature across Restoration comedy, which fits its status as the prototypical interjection in English. Its placement in the PCA chart therefore suggests that these pragmatic functions intersect with other stylistic choices associated with the 100 most frequent function words, such as the second-person pronouns and connectives, and that it is this co-textual environment which distinguishes oh in Behn’s linguistic style from that of other authors. We plan on looking further into this feature using corpus linguistic techniques, such as collocate analysis.
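As a rough illustration of the kind of comparison reported above, the sketch below computes the statistic for a Student’s two-sample t-test (pooled variance) using only the Python standard library. The two samples are invented per-play frequencies, not values from the comedy corpus, and the two-tailed 5% threshold is an assumption, since the post does not state how the test was configured.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test,
    which assumes both groups share a common variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df

# HYPOTHETICAL frequencies of a word per 1,000 words in each play.
author_a = [2.0, 4.0, 6.0]
author_b = [1.0, 2.0, 3.0]

t, df = student_t(author_a, author_b)

# Two-tailed critical value of t at the 5% level for 4 degrees of freedom.
CRITICAL_T_DF4 = 2.776
significant = abs(t) > CRITICAL_T_DF4
print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

Here t is about 1.55 on 4 degrees of freedom, below the critical value, so the difference between the two invented authors would not count as significant (p > 0.05). In practice a library routine such as `scipy.stats.ttest_ind` returns the p-value directly.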


Ha is the second most frequent interjection in the 33-comedy corpus overall. In previous studies of early modern texts (e.g. Culpeper and Kytö 2010), ha shows a similar functional range to oh, used for speaker-, addressee- and textual-oriented functions, including the signalling of laughter in play texts. However, in the comedy corpus, ha shows significant differences in its distribution and use at an authorial level, particularly when compared with oh. Figure 3 shows the frequency of ha for each play for each author. Of these, D’Urfey uses ha significantly more frequently than the other authors (p < 0.05). Behn’s usage contrasts significantly with Dryden’s, as does Shadwell’s. Therefore, the frequency profile for ha suggests that D’Urfey is a high-frequency user; Behn, Shadwell and Ravenscroft are mid-frequency users; and Dryden uses ha the least.

Figure 3: Frequency of HA in comedies of Behn and four contemporary writers.

This quantitative picture is very different from that of oh. A qualitative perspective sheds some light on what underpins these frequencies. The five plays representing Dryden include only two examples of ha, both of which occur in his adaptation of Molière’s Amphitryon (1668; Dryden 1690). Both have a conative function, eliciting a response from the hearer, and are found in heated exchanges discussing sexual misdemeanours (arising from comic misunderstandings):

Amphitryon: Made haste to Bed: Ha, was it not so? Go on –  (Aside) And stab me with each Syllable thou speak’st (p.28)

Yet it seems that, for Dryden, the other functions associated with ha in the period were achieved through other methods, or were not required.

Conversely, the high frequency of ha in D’Urfey’s plays correlates with a broader functional base. The interjection is used for addressee-oriented functions, as in Dryden, but also for speaker-oriented expression, such as signalling frustration. Its most prominent function in D’Urfey’s plays, however, is to signal laughter. The forms occur in strings of three to five (five being the longest string found in any of the five authors). In the next stage of analysis, these laughter strings will be categorised as a form distinct from other uses of ha, to see how that affects the distribution of the interjection as a marker of authorial style.

Behn’s profile is similar to D’Urfey’s in her use of ha for speaker-oriented functions (e.g. comprehension, perception) and to mark laughter (strings of three and four forms). However, Behn does not make extensive use of the addressee-oriented function seen in the plays of Dryden and D’Urfey, suggesting her repertoire did not include, or require, the interjection for this purpose. This finding has a potentially diagnostic application when investigating her dramatic dubia.

Valeria: Ha, ha, ha – I laugh to think how thou art fitted with a Lover, a fellow that I warrant loves every new Face he sees. (The Rover, p.30)
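The planned separation of laughter strings from other uses of ha could be approached along the following lines. This is only a sketch under stated assumptions: the regular expression, the function names and the threshold of two or more consecutive forms counting as laughter are all hypothetical choices, and real play texts would need handling for spelling variants and speech prefixes.

```python
import re

# A run is one or more "ha" tokens separated only by whitespace or
# simple punctuation (ASSUMED separators; early modern printing varies).
HA_RUN = re.compile(r"\bha\b(?:[\s,!.;:-]*\bha\b)*", re.IGNORECASE)
HA_TOKEN = re.compile(r"\bha\b", re.IGNORECASE)

def ha_runs(text):
    """Return the number of 'ha' tokens in each consecutive run."""
    return [len(HA_TOKEN.findall(m.group(0))) for m in HA_RUN.finditer(text)]

def split_ha(text, min_laughter_len=2):
    """Count laughter strings separately from isolated uses of 'ha'.
    min_laughter_len is a HYPOTHETICAL threshold, not the project's."""
    runs = ha_runs(text)
    return {
        "laughter_strings": sum(1 for n in runs if n >= min_laughter_len),
        "other_ha": sum(1 for n in runs if n < min_laughter_len),
    }

# The two quotations given in this post.
rover = "Ha, ha, ha – I laugh to think how thou art fitted with a Lover"
amphitryon = "Made haste to Bed: Ha, was it not so? Go on"

print(ha_runs(rover), split_ha(rover))
print(ha_runs(amphitryon), split_ha(amphitryon))
```

Applied to the two quotations, Valeria’s line yields one laughter string of three forms, while Amphitryon’s line yields a single isolated ha (the word haste is not matched, because the pattern requires ha to stand as a whole word).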


Interjections and Authorial Style

The functional similarities and differences identified in relation to oh and ha provide an important perspective on the identification of these items as marker words in the quantitative analysis of Behn’s dramatic style, and that of her contemporaries. In this post, I’ve discussed only a few examples from the rich dataset of interjectional forms and functions found in the comedy corpus. In the next steps, we will develop these findings to help inform our investigation of the dubia plays; in particular, the forms and functions of interjections (especially oh) provide an instructive perspective on the suspected editorial interference of Charles Gildon within Behn’s posthumously published play The Younger Brother (1696). We also plan on exploring interjections across genres, to establish the extent to which authors show continuities in their stylistic choices when working within different generic conventions, and the ramifications this may have for investigating authorship in mixed-genre datasets.


  • Culpeper, Jonathan, and Merja Kytö. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Studies in English Language. Cambridge, UK; New York: Cambridge University Press.
  • Norrick, Neal R. 2009. ‘Interjections as Pragmatic Markers’. Journal of Pragmatics 41 (5): 866–91.
  • Taavitsainen, Irma. 1995. ‘Interjections in Early Modern English: From Imitation of Spoken to Conventions of Written Language’. In Historical Pragmatics: Pragmatic Developments in the History of English, edited by Andreas H. Jucker, 439–66. Amsterdam; Philadelphia: John Benjamins Publishing Company.
  • ———. 1997. ‘Genre Conventions: Personal Affect in Fiction and Non-Fiction in Early Modern English’. In English in Transition: Corpus-Based Studies in Linguistic Variation and Genre Styles, edited by Matti Rissanen, Merja Kytö, and Kirsi Heikkonen. Topics in English Linguistics 23. Berlin; New York: Mouton de Gruyter.