Assignment 2: Working With A Corpus
Part I: Picking A Corpus - Jane Austen
I chose Jane Austen’s books because they are available on Project Gutenberg and because I enjoy the film adaptations of her writing. Keira Knightley in Pride and Prejudice, Kate Winslet in Sense and Sensibility, and Dakota Johnson in Persuasion play some of my all-time favourite film characters adapted from novels. I can use the films as a basis for comparison with what the analysis tools reveal in the books, which can give me a richer understanding of how adaptations work. Jane Austen also published her novels anonymously (her first, Sense and Sensibility, was credited only to “A Lady”) at a time when female writers were often looked down upon. I wanted to see whether that could have affected her way of writing. Although I am aware that Austen’s books alone cannot show the differences between feminine and masculine writing trends, I might be able to extract surface-level topics or subjects in Austen’s writing that pertain to feminine matters.
Jane Austen (1775-1817) was an English novelist who wrote during a time when female writers faced challenges and were often looked down upon in the literary world. Austen’s novels often explored the social and economic aspects of her time, focusing on the lives and relationships of the British landed gentry. While her novels were published anonymously, she gained recognition posthumously, and her works have since become classics.
Part II: General Corpus Research
Pride and Prejudice (1813):
Book Reference: 42671
Pride and Prejudice follows the story of Elizabeth Bennet, one of five sisters, as she navigates the social hierarchy and expectations of marriage in early 19th-century England. The novel explores themes of love, class, and personal growth, and it is particularly known for its portrayal of the complex relationship between Elizabeth and Mr. Darcy.
Sense and Sensibility (1811):
Book Reference: 21839
Sense and Sensibility revolves around the Dashwood sisters, Elinor and Marianne, who face financial difficulties after their father’s death. The novel contrasts Elinor’s practical and reserved demeanour (sense) with Marianne’s passionate and emotional nature (sensibility). The story explores the challenges and rewards of balancing reason and emotion in matters of love and societal expectations.
Persuasion (1818):
Book Reference: 105
Persuasion tells the story of Anne Elliot, who, persuaded by others, broke off an engagement with Captain Frederick Wentworth years earlier. The novel explores themes of second chances and the consequences of societal expectations. As Anne and Captain Wentworth cross paths again, they must confront their past and navigate the complexities of love and social status.
Part III: Corpus Analysis
In the analysis in R, the most frequently used words (MFW) across the three texts consist mainly of character names. However, paging further through the list shows what Austen’s texts mainly revolve around: feelings, courtship, and family dynamics. Words like pleasure, happiness, comfort, and heart make it onto the list, while the rest are mostly character names. There also seems to be a focus on female characters throughout the texts. Compared with the films, it holds true that most of Jane Austen’s work is about women and their dynamics, either in family or in love. She explores sisterhood in all her texts, and that is reflected in the analysis. The MFW analysis also shows that elizabeth is the most common word in the combined corpus. Looking more closely at the individual plots, this makes sense, since Austen gives that name to characters in two different books: Elizabeth Bennet in Pride and Prejudice and Elizabeth Elliot, Anne’s elder sister, in Persuasion.
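To illustrate the kind of most-frequent-word count described above, here is a minimal sketch in Python rather than R. It is not the script used for this assignment: the sample sentence, the stop-word list, and the function names `tokenize` and `most_frequent_words` are all assumptions made for demonstration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def most_frequent_words(text, stopwords=frozenset(), n=10):
    """Return the n most frequent words that are not in stopwords."""
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    return counts.most_common(n)

# Hypothetical sample; a real run would read a full Gutenberg text from a file.
sample = "Elizabeth felt pleasure. Elizabeth felt the happiness of her sister."
stop = frozenset({"the", "of", "her"})
top_words = most_frequent_words(sample, stop, n=3)
print(top_words)
```

Because the tokens are lowercased, a name such as Elizabeth is counted as elizabeth, which matches how it appears in the MFW list. The simple pattern also splits possessives such as “Austen’s” into two tokens, so a fuller analysis would want a more careful tokenizer.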
Moving on to the linear analysis of the words, I chose to compare each of the other two texts to Pride and Prejudice. “Affections” and “acquainted” appear to be some of the most common words shared between the texts. Going back to the main plots, this makes sense, since most of Austen’s writing revolves around courtship, affection, and acquaintance. Relationships are a major theme in all three texts. However, an interesting finding is that family does not appear on the chart comparing Pride and Prejudice with Persuasion. Looking further at the two books, Pride and Prejudice focuses heavily on family, whereas Persuasion follows the life of Anne Elliot and her broken engagement. The word family does appear in the comparison between Sense and Sensibility and Pride and Prejudice, because both focus on the dynamics of courtship within the sphere of sisterhood and close family. The name Elizabeth also overlaps between Pride and Prejudice and Persuasion, as discussed previously in the MFW analysis.
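A comparison of this kind can be sketched by pairing each word’s relative frequency in one text with its relative frequency in another. The Python below is a rough illustration under stated assumptions, not the R procedure used here: the two short strings stand in for full novels, and `relative_frequencies` and `shared_words` are hypothetical helper names.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word to its frequency per 1,000 tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: 1000 * c / total for w, c in counts.items()}

def shared_words(freq_a, freq_b):
    """Return words found in both texts with their paired frequencies."""
    return {w: (freq_a[w], freq_b[w]) for w in freq_a.keys() & freq_b.keys()}

# Hypothetical stand-ins for two novels.
text_a = "family and affection and family"
text_b = "affection without acquaintance"
pairs = shared_words(relative_frequencies(text_a), relative_frequencies(text_b))
print(pairs)
```

In this sketch a word drops out of the comparison only when it is entirely absent from one of the texts. A real chart may also apply a frequency threshold, so a word missing from the chart is not necessarily missing from the book.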
Overall, the analysis in R suggests that Austen’s themes are well represented through a linear distant analysis. Austen’s writing does appear feminine in the sense that it focuses on emotions and women more than on actions and movement. It does make me wonder whether publishing anonymously, rather than under her own name, affected how her books were received in her society. Would readers have responded differently had they believed a man was writing about emotions? Would women have been more inclined to pick up the books to understand how men see them and the interplay of emotions of the opposite sex? The books cover romance and family, so it does raise the question of feminine versus masculine writing. It would be interesting to take Austen’s books and compare them against those of a male writer of the same period to measure the prevalence of emotional adjectives in each.
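One simple way to approach such a comparison would be to count how often words from an emotion word list occur per 1,000 tokens in each author’s text. The sketch below is only an assumption about how this could be done: the five-word lexicon and both sample sentences are invented for illustration, and the list is not restricted to adjectives, which would need part-of-speech tagging.

```python
import re

# Hypothetical mini-lexicon; a real study would need a curated word list.
EMOTION_WORDS = frozenset({"happy", "pleasure", "affection", "love", "comfort"})

def emotion_rate(text, lexicon=EMOTION_WORDS):
    """Return lexicon hits per 1,000 tokens (0.0 for an empty text)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in lexicon)
    return 1000 * hits / len(tokens)

# Invented sentences standing in for two authors' texts.
rate_a = emotion_rate("She felt love and comfort at home")
rate_b = emotion_rate("He rode to town and sold the horse")
print(rate_a, rate_b)
```

Normalising by text length matters here, because novels of different lengths would otherwise not be comparable on raw counts.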
Voyant Tools is a web-based text analysis tool that allows users to visualise and analyse textual data and to find patterns within a text corpus. We previously concluded that Jane Austen tends to focus on the dynamics of love and family relationships in her stories. We therefore compared how common words related to this theme are distributed across each entire text, using the Bubblelines tool in Voyant Tools. The set of words we compiled for this analysis was pleasure, happiness, heart, love, family, and affection.
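The underlying idea of tracking chosen words across a text can be approximated by splitting the text into equal-sized segments and counting each word per segment. The following Python sketch is an assumption about how such a distribution could be computed by hand; it does not reproduce how Voyant Tools works internally, and `segment_counts` and the sample string are hypothetical.

```python
import re

def segment_counts(text, terms, segments=10):
    """Count each term's occurrences in equal-sized consecutive segments."""
    tokens = re.findall(r"[a-z]+", text.lower())
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    result = {t: [0] * segments for t in terms}
    for i, tok in enumerate(tokens):
        if tok in result:
            # min() guards the last segment against any rounding overflow
            result[tok][min(i // size, segments - 1)] += 1
    return result

# Hypothetical six-word sample split into three segments of two tokens each.
counts = segment_counts("love love family heart love family",
                        ["love", "family"], segments=3)
print(counts)
```

Each list reads from the start of the text to the end, so a cluster of high numbers shows where in the story a word is concentrated.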
It’s crucial to note that some words have multiple meanings, and their interpretation depends on the context in which they are used. This is a limitation of the tools employed in this assignment: while Voyant Tools offers collocation tools, it doesn’t account for the nuanced meanings of words. A potential future enhancement would be to use advanced language models such as GPT-4 for a more in-depth analysis of words within their context. Such an analysis could, for example, surface the words unique to each text and, conversely, the words and patterns the texts share.
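[Figure: Voyant Tools visualisation, Persuasion (1818)]
[Figure: Voyant Tools visualisation, Sense and Sensibility (1811)]
[Figure: Voyant Tools visualisation, Pride and Prejudice (1813)]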
Based on the analysis of the three texts above, we can identify some common patterns in the frequency of certain words. The Bubblelines visualisations, which show how frequently and where the chosen words occur, appear quite similar for Pride and Prejudice and Sense and Sensibility, with corresponding patterns for words such as “happy” and “family.” Persuasion, however, shows distinct patterns and frequencies, suggesting differences in style, themes, and topics. Persuasion delves into themes of regret, second chances, and societal expectations. In contrast, Sense and Sensibility explores the interplay between reason and emotion, while Pride and Prejudice is renowned for its exploration of love, social class, and personal growth. Our inspiration for this analysis comes from applications like R and Voyant Tools, which aid in uncovering relationships and patterns among texts, leading to deeper insights and conclusions. Notably, we were surprised by how rapidly we could identify patterns and detect stylistic outliers within Jane Austen’s various texts.
In conclusion, it is crucial to conduct a comprehensive analysis when working with a corpus. Leveraging computer technologies, such as R and Voyant Tools, allows us to discern clearer patterns and statistical data, ultimately leading to a better understanding of the textual data.