Text Analysis

With text analysis, you can create word clouds, run proximity searches, and show how frequently a word appears across a body of text.
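
As a rough illustration of a proximity search, the sketch below finds places where two words occur within a few tokens of each other. The function names, the sample sentence, and the `max_distance` parameter are assumptions made for this example. Dedicated text analysis tools usually offer more refined versions of the same idea.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def proximity_search(text, word_a, word_b, max_distance=5):
    """Return (position_a, position_b) pairs of token positions where
    word_a and word_b occur within max_distance tokens of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == word_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == word_b.lower()]
    return [
        (i, j)
        for i in positions_a
        for j in positions_b
        if 0 < abs(i - j) <= max_distance
    ]


# Hypothetical sample text, used only for illustration.
sample = "The whale surfaced near the ship. Hours later the captain saw the whale again."
hits = proximity_search(sample, "whale", "ship", max_distance=4)
print(hits)
```

Raising `max_distance` widens the window, so the second mention of "whale" would also match once the window is large enough.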

What is Text Analysis?

Definition of Text Analysis:

Text analysis is the process of sorting and analyzing data contained in text for research purposes.

Text mining entails cleaning, marking up, organizing, and parsing content of a corpus.

By using digital text analysis tools, we can easily search and examine word frequencies, patterns, and relationships.

Source: Introduction to Text Mining Presentation by Mitch Fraas and Katie Rawson (2013)
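
To make the idea of examining word frequencies concrete, here is a minimal sketch that uses only the Python standard library. The sample sentence and the `word_frequencies` helper are invented for illustration and are not taken from the presentation cited above.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word occurs in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, used only for illustration.
sample_text = "Text analysis turns text into data. Data about text can then be counted."
frequencies = word_frequencies(sample_text)

# Show the three most frequent words with their counts.
for word, count in frequencies.most_common(3):
    print(word, count)
```

Counts like these are also the usual input for a word cloud, where each word is drawn at a size that reflects its count.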

Commonly Used Terms:

APIs (Application Programming Interfaces): Interfaces written by the owners of the content to provide a clean, machine-readable version of that content. Many databases and websites with large amounts of data make their APIs available so that people can reuse the data. For instance, Twitter has a public API.

Corpus/Corpora: A corpus (plural: corpora) is a collection of texts or documents, e.g., web pages or journal articles.

Crawling: A method for automatically finding the links within a website, following those links, and scraping information from the pages they lead to.

Parsing: Refers to the process of (syntactic) analysis of text, i.e. identifying how a sentence follows the grammatical rules of a language. It breaks down a unit/sentence into its component parts. You can also parse files into their component parts.

Scraping: Automatically extracting information from a website, much as you would by manually visiting the site, highlighting and copying the information, and pasting it somewhere else.

Text (and data) mining: Text mining is the data analysis of natural language works, such as articles and books, using text as a form of data. It is often joined with data mining, the numeric analysis of data works, like filings and reports, and referred to as "text and data mining" or, simply, "TDM."

Source: Elsevier's Text & Data Mining Glossary
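
The terms above can be hard to picture in the abstract, so the following sketch shows parsing, scraping, and the link-finding step of crawling on a small HTML string. It uses Python's standard `html.parser` module. The `PageScraper` class, the sample HTML, and the `example.org` addresses are assumptions made for illustration; nothing is fetched from the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScraper(HTMLParser):
    """Parse an HTML page, collecting its text (scraping) and the
    links it contains (the link-finding step of crawling)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text that is not inside <script> or <style>.
        cleaned = " ".join(data.split())
        if self._skip_depth == 0 and cleaned:
            self.text_parts.append(cleaned)


# Hypothetical page content, used only for illustration.
SAMPLE_HTML = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body>
<h1>Text Analysis</h1>
<p>Read the <a href="/guide/tools">tools list</a> and the <a href="https://example.org/glossary">glossary</a></p>
</body></html>
"""

scraper = PageScraper(base_url="https://example.org/guide/")
scraper.feed(SAMPLE_HTML)
page_text = " ".join(scraper.text_parts)

print(scraper.links)  # links a crawler could visit next
print(page_text)      # scraped text, ready for analysis
```

A real crawler would fetch each collected link in turn (for example with `urllib.request`) and repeat the process, ideally while respecting the site's terms of use and its `robots.txt` file.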

List of Text Analysis (TA) & Data Visualization (DV) Tools:

Examples of How Text Analysis is Used:


The text analysis process involves:

Acquiring a corpus

Preparing the text

Choosing an analytical tool

Analyzing the results
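
The four steps above can be sketched end to end in a few lines of Python. This is a simplified illustration under stated assumptions: the corpus is a tiny in-memory stand-in, the stop-word list is hypothetical and far shorter than those used in practice, and the "analytical tool" is a plain word count.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def acquire_corpus():
    """Step 1: acquire a corpus (here, three invented documents)."""
    return {
        "doc1": "The sea was calm, and the ship sailed to the island.",
        "doc2": "A storm hit the ship in the night.",
        "doc3": "The island is quiet; the sea is not.",
    }


def prepare_text(text):
    """Step 2: prepare the text by lowercasing, tokenizing, and
    dropping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def count_words(corpus):
    """Step 3: apply the chosen analytical tool, a per-document word count."""
    return {name: Counter(prepare_text(text)) for name, text in corpus.items()}


def frequency_across_corpus(counts, word):
    """Step 4: analyze the results by looking up one word in every document."""
    return {name: counter[word] for name, counter in counts.items()}


corpus = acquire_corpus()
counts = count_words(corpus)
ship_counts = frequency_across_corpus(counts, "ship")
print(ship_counts)
```

In practice each step is usually more involved: the corpus may come from an API or a scraped website, and the analysis may be done in a dedicated tool instead of a hand-written count.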

