Text Analysis

Using text analysis you can create word clouds, do proximity searches, and show frequency of a word across data.

R is a statistical programming language that has recently seen increased use in text analysis. RStudio is an integrated development environment (IDE) for working on R projects.

Why R?

R's data analysis tools work well with what is known as "unstructured data," and from a data perspective, narrative writing is essentially unstructured text. Two R packages were designed specifically for unstructured text: sentiment (now archived, but still usable) and tm. Beyond these two packages, text analysts and digital humanists continue to find new ways to apply R's data tools to text analysis:

  • Assess token distribution to see where words appear across a text
  • Analyze vocabulary richness with word frequencies, type-token ratios, and hapax singles
  • Parse XML to work with TEI
  • Use the mallet package for topic modeling
  • Demonstrate your results with visualization tools such as word clouds, distribution plots, and text mining models
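
As a rough illustration of the first two items in the list above, the sketch below computes word frequencies, a type-token ratio, single-occurrence words, and the positions of one word using only base R. The sample sentence, the variable names, and the very simple tokenizing rule (lowercase, keep letters and spaces only) are assumptions made for this example; a real project would more likely read a full text from a file and use a package such as tm for cleanup.

```r
# Toy text (an assumption for this sketch); a real project might use readLines().
text <- "The cat sat on the mat. The dog sat on the log."

# Lowercase, drop everything except letters and spaces, then split on spaces.
clean  <- gsub("[^a-z ]", "", tolower(text))
tokens <- strsplit(clean, " +")[[1]]

# Word frequencies, most frequent first.
freqs <- sort(table(tokens), decreasing = TRUE)

# Type-token ratio: number of distinct words divided by total number of words.
ttr <- length(unique(tokens)) / length(tokens)

# Words that occur exactly once in the text.
hapax <- names(freqs)[as.vector(freqs) == 1]

# Token distribution: the positions at which a chosen word appears.
sat_positions <- which(tokens == "sat")

print(freqs)
print(ttr)
print(hapax)
print(sat_positions)
```

A higher type-token ratio suggests a more varied vocabulary, although the ratio tends to fall as texts get longer, so it is most useful for comparing samples of similar length.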

Text Analysis Resources

R Programming

For general questions about programming in R, the R-Bloggers community is a go-to resource. Stack Overflow is another helpful troubleshooting community, and you may find yourself asking questions of your own as you begin new projects.

The Florida State University Libraries

© 2022 Florida State University Libraries | 116 Honors Way | Tallahassee, FL 32306 | (850) 644-2706