Research Guides: Text Analysis: Definitions

What is Text Analysis?

Definition of Text Analysis:

Text analysis is the process of sorting and analyzing data contained in text for research purposes.

Text mining entails cleaning, marking up, organizing, and parsing the content of a corpus.

By using digital text analysis tools, we can easily search and examine word frequencies, patterns, and relationships.

Source: Introduction to Text Mining Presentation by Mitch Fraas and Katie Rawson (2013)
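To make "word frequencies" and "patterns" concrete, here is a minimal sketch in Python using only the standard library. The sample sentence is invented for illustration, and the tokenizer (lowercase runs of the letters a to z) is a simplifying assumption that ignores accented characters, numbers, and hyphenated words.

```python
import re
from collections import Counter

# Hypothetical sample text; any plain-text string would do.
text = (
    "The whale swam near the ship. The ship followed the whale, "
    "and the whale dove under the ship."
)

# Lowercase the text and pull out runs of letters as word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Word frequencies: how often each word occurs.
word_counts = Counter(tokens)

# Simple patterns: adjacent word pairs (bigrams) and how often each occurs.
bigram_counts = Counter(zip(tokens, tokens[1:]))

print(word_counts.most_common(3))
print(bigram_counts.most_common(2))
```

Tools such as Voyant or Word Counter do this kind of counting behind the scenes, with more careful tokenization and many more options.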

Commonly Used Terms:

APIs (Application Programming Interfaces): Interfaces written by the owners of content to provide a clean, machine-readable version of that content. Many databases and websites with large amounts of data make their APIs available so that people can reuse the data. For instance, Twitter has a public API.

Corpus/Corpora: A corpus (plural: corpora) is a collection of texts or documents, e.g., web pages or journal articles.

Crawling: A method of automatically finding links within a website, following those links, and scraping the information from the linked pages.

Parsing: Refers to the process of (syntactic) analysis of text, i.e. identifying how a sentence follows the grammatical rules of a language. It breaks down a unit/sentence into its component parts. You can also parse files into their component parts.

Scraping: Scraping information from a website is the automated equivalent of going to the website, highlighting and copying the information, and pasting it somewhere else.

Text (and data) mining: Text mining is the data analysis of natural language works, such as articles and books, using text as a form of data. It is often joined with data mining, the numeric analysis of data works, like filings and reports, and referred to as "text and data mining" or, simply, "TDM."

Source: Elsevier's Text & Data Mining Glossary
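The terms crawling, scraping, and parsing can be illustrated together with a short Python sketch that uses only the standard library. The page content, the example.org addresses, and the class name LinkAndTextScraper are all hypothetical. A real crawler would also download each page and follow the links it finds, which is left out here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextScraper(HTMLParser):
    """Collects link targets (for crawling) and visible text (for scraping)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Crawling step: record where each <a href="..."> points.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Scraping step: keep the non-empty text found between tags.
        if data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page content and address, standing in for a downloaded page.
page = """
<html><body>
  <h1>Sample Archive</h1>
  <p>Read the <a href="/letters/1.html">first letter</a> or the
     <a href="https://example.org/about">about page</a>.</p>
</body></html>
"""

scraper = LinkAndTextScraper("https://example.org/index.html")
scraper.feed(page)  # parsing: the HTML is broken into tags and text

print(scraper.links)
print(" ".join(scraper.text_parts))
```

Before scraping a real site, check whether it offers an API and what its terms of use allow; an API, where available, usually gives cleaner data than scraped pages.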

List of Free Digitized Texts:

List of Text Analysis (TA) & Data Visualization (DV) Tools:

DiRT Directory: Text Mining
A list featuring a variety of text mining tools with brief descriptions of each.
Google Books Ngram Viewer
Ngram produces graphs which display word frequencies over time.
Voyant Tools
Voyant is a free web-based tool for text analysis and visualization.
Word Counter
Word Counter displays most common words and phrases used.
Wordle
Wordle generates "word clouds" from text you provide.

Examples of How Text Analysis is Used:

How to Do Super Simple Textual Analysis by Aleszu Bajak
A Storybench staff member follows an example of an in-class exercise used by Derek Willis in his data reporting class at Georgetown University.
Matthew Jockers' Blog
Matthew Jockers outlines various text analysis methods in his blog. His book "Text Analysis with R for Students of Literature" is a good overview of one specific tool, R, for text analysis.
Sapping Attention
Ben Schmidt's blog on text analysis, digital humanities, and 19th-century literature.
The Stone and the Shell
This is Ted Underwood's blog. It includes descriptions of text analysis methods and discussions of text analysis as a methodology. "We don't already understand the broad outlines of literary history" is a good jumping-off point.
Wine Dark Sea
Wine Dark Sea is a blog that includes analysis and visualization of Early Modern texts. See, in particular, the categories "Quant Theory" and "Visualizing English Print."

The text analysis process involves:

Acquiring a corpus

Preparing the text

Choosing an analytical tool

Analyzing the results
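The four steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the two-document corpus is invented, the stop-word list is deliberately tiny, and relative word frequency stands in for whatever analytical tool a real project would choose.

```python
import re
from collections import Counter

# Step 1: acquire a corpus. Hypothetical in-memory documents stand in for files.
corpus = {
    "doc_a": "The sea was calm. The sailors watched the sea all night.",
    "doc_b": "The city was loud, and the streets of the city never slept.",
}

# Illustrative stop-word list; real projects usually use a much longer one.
STOP_WORDS = {"the", "was", "and", "of", "all"}


def prepare(text):
    """Step 2: lowercase, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def relative_frequencies(tokens):
    """Step 3: the chosen tool here is each word's share of all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


# Step 4: analyze the results by comparing the top word in each document.
results = {
    name: relative_frequencies(prepare(text)) for name, text in corpus.items()
}
for name, freqs in results.items():
    top_word = max(freqs, key=freqs.get)
    print(name, top_word, round(freqs[top_word], 2))
```

Using relative rather than raw frequencies makes documents of different lengths comparable, which matters once a corpus contains more than a handful of texts.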