Unheard Voices
Exploring the Complex Relations Between Indigenous Peoples and Settlers in American Colonialism
Brief
A Computational Textual Analysis of 18th & 19th-Century Native American Understandings of Identity
Client
DH199
Online Statistical Computing Reference (OSCR)
Duration
Winter 2020
10 weeks
Roles
Textual Analysis
Word Frequency & Collocation Analysis
Team: Eustina Kim, Michelle Lee, Priyana Patel, Vicki Truong
Project Overview
The Corpus
American State Papers (1789 to 1838)
A collection of legislative and executive documents from Congress. The analysis used documents categorized as “Indian Affairs.”
Treaty Council Notes (1784 to 1814)
The United States and Native Americans used peace treaties to foster mutual respect and discuss land acquisition.
Research Questions
How do Native leaders talk about being “Indian” and notions of difference?
What are the dominant themes in the corpus, and which documents are strongly correlated with the themes?
Text Preprocessing
Using the Natural Language Toolkit (NLTK) to Clean and Prepare the Documents
First, we tokenized the corpus, breaking each document into individual words known as tokens. We then converted these tokens to lowercase so that the analysis would not be case-sensitive. We also omitted non-alphabetical tokens and removed stop words. Stop words are common words that add little meaning to the analysis, such as “the,” “and,” and “is.”
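The steps above can be sketched in a few lines. This is a minimal, standard-library stand-in rather than the project's actual NLTK code: the regex tokenizer and the short STOP_WORDS set are assumptions for illustration (NLTK provides word_tokenize and a much longer English stop-word list).

```python
import re

# Hypothetical, abbreviated stop-word list; NLTK's English list is much longer.
STOP_WORDS = {"the", "and", "is", "a", "an", "of", "to", "in"}

def preprocess(text):
    """Tokenize, lowercase, keep alphabetic tokens, and drop stop words."""
    # Split into runs of word characters and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Lowercase so that "The" and "the" count as the same token.
    tokens = [t.lower() for t in tokens]
    # Drop numbers and punctuation.
    tokens = [t for t in tokens if t.isalpha()]
    # Drop stop words.
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The chiefs and the President met in 1790."))
# prints ['chiefs', 'president', 'met']
```

Keeping each step on its own line makes it easy to swap in a different tokenizer or stop-word list without touching the rest of the pipeline.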
Word Frequency
Word Frequency Quantifies How Often Particular Words Occur Across a Corpus
Using the wordcloud library in Python, we visualized the most commonly used words throughout the documents. Based on the 50 most frequently occurring terms within the corpus, I broke down the results by the pronouns “i,” “we,” and “you.”
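The counting behind this step can be sketched with the standard library alone. The function names and the sample sentence are hypothetical. One assumption worth flagging: NLTK's English stop-word list includes “i,” “we,” and “you,” so this sketch counts pronouns on lowercased tokens taken before stop-word removal.

```python
from collections import Counter

PRONOUNS = ("i", "we", "you")

def top_terms(tokens, n=50):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)

def pronoun_counts(tokens, pronouns=PRONOUNS):
    """Count how often each pronoun of interest occurs."""
    counts = Counter(tokens)
    return {p: counts[p] for p in pronouns}

# Hypothetical sample; assumed to be lowercased tokens from the corpus.
tokens = "i hope we agree and i thank you as we part i say".split()
print(top_terms(tokens, n=1))
print(pronoun_counts(tokens))
```

The resulting counts could then be passed to a word cloud or bar chart for the visual comparison.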
Collocation Analysis
Collocation Analysis Is the Process of Examining Statistically Significant Word Pairings
I first computed the 15 most frequently occurring bigrams (pairs of adjacent terms) and trigrams (triples of adjacent terms) to understand common themes between speaking parties. To better understand differences by speaker, I then filtered the n-grams to those containing “i,” “we,” or “you.”
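A minimal sketch of the n-gram counting and pronoun filter, using only the standard library. It ranks n-grams by raw count, as the paragraph above describes, rather than by a statistical association measure; the function names and the sample tokens are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))

def top_ngrams(tokens, n, k=15, must_contain=None):
    """Return the k most frequent n-grams, optionally only those
    that contain a given word such as a pronoun."""
    grams = ngrams(tokens, n)
    if must_contain is not None:
        grams = [g for g in grams if must_contain in g]
    return Counter(grams).most_common(k)

# Hypothetical sample tokens.
tokens = "we are brothers we are friends you are father".split()
print(top_ngrams(tokens, 2, k=3))                    # most frequent bigrams
print(top_ngrams(tokens, 3, k=3, must_contain="we")) # trigrams containing "we"
```

Running the filter once per pronoun gives one ranked list per speaker perspective, which can then be compared side by side.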
Topic Modeling
Topic Modeling Is a Type of Statistical Modeling for Discovering Abstract Topics (Gensim & pyLDAvis)
To determine how many topics to use, we compared coherence scores, which measure the degree of semantic similarity between the high-scoring words within a topic. We then built our LDA model with six topics, each a group of related terms, and labeled each group.
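To make the idea of a coherence score concrete, here is a hand-rolled sketch of one common variant, a UMass-style score averaged over word pairs. This is an assumption for illustration: the write-up does not say which coherence measure was used, and Gensim's CoherenceModel offers several. The function name and sample documents are hypothetical, and every top word is assumed to appear in at least one document.

```python
import math
from itertools import combinations

def umass_coherence(top_words, documents):
    """Average UMass-style coherence for one topic's top words.

    top_words: words ordered from most to least probable in the topic.
    documents: list of tokenized documents (lists of strings).
    Scores closer to zero mean the words co-occur in documents more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        w_j, w_i = top_words[j], top_words[i]
        # log((D(w_i, w_j) + 1) / D(w_j)); the +1 avoids log(0).
        scores.append(math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j)))
    return sum(scores) / len(scores)

# Hypothetical tokenized documents.
docs = [
    ["land", "treaty", "peace"],
    ["land", "treaty"],
    ["war", "chief"],
    ["war", "chief", "land"],
]
print(umass_coherence(["land", "treaty"], docs))  # words that often co-occur
print(umass_coherence(["land", "chief"], docs))   # words that rarely co-occur
```

In practice, a score like this would be averaged over all topics of a model, and the comparison repeated for models trained with different numbers of topics.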
Findings
Word Frequency
“i” was the most frequently occurring pronoun, and “you” was the least frequent.
Collocation Analysis
The highlighted examples indicate an effort to create distance, underscoring differences between the speaker and the person/people they address.
Topic Modeling
Topic 2 accounts for the largest proportion of the corpus and has the greatest number of documents most strongly related to it. Topic 5 stands out with relevant terms that are more community-based in nature.
Discussion
There’s an Ongoing Issue With History Being Presented From a Euro-American Perspective
Our research reveals the hierarchies of power within Native American communities and how they contrast with those of the American government. Native Americans communicated using familial language (father, brother) and unifying pronouns (we, us) to speak on behalf of their communities. The topic labels reveal a contested relationship between Euro-Americans and Native Americans, especially over land and treaties. The textual analysis highlights the dynamics between these two groups in meetings, times of war, and land negotiations.
A Precursor for Gender Roles in Native American Communities
With a better understanding of the familial power dynamic in Native communities, we can take a more in-depth look at how gender defined communal roles. Men took on responsibility as decision-makers: they attended council meetings and acted as their tribes’ representatives, making decisions about trade, land, and war. These roles are consistent with the male-dominated terms and major themes in our findings. Future research could examine how power and control translate across duties and responsibilities within tribes.