Text Mining and Self Organizing Maps: Visualizing Shakespeare

After my previous exploration of Self Organizing Maps, I decided to apply the tool to a text mining problem: can we visualize how similar to or different from one another Shakespeare’s characters and plays are, based on an analysis of their words?

This tutorial walks through a couple of examples in R and suggests some further exploration. It’s split into two sequential parts:

Self Organizing Maps and Text Mining – Visualizing Shakespeare (Part 1)

Self Organizing Maps and Text Mining – Visualizing Shakespeare (Part 2)


Introduction to Self Organizing Maps in R

This semester I’ve been playing around with Self Organizing Maps (SOMs) using the “kohonen” package in R. SOMs let you visualize very high-dimensional data on a simplified two-dimensional map that preserves proximity: observations that are similar in the original data end up on the same or neighboring cells of the map. I’ve written up an introductory tutorial on making SOMs with the kohonen package:
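The basic kohonen workflow fits in a few lines. This is a minimal sketch, not the tutorial's code: it substitutes R's built-in iris measurements for the NBA stats, and the 6 x 6 hexagonal grid and `rlen = 200` training iterations are illustrative assumptions rather than recommended settings.

```r
library(kohonen)

# Stand-in data (assumption): iris measurements instead of NBA player stats.
# Scale each column so no single variable dominates the distance calculation.
X <- scale(as.matrix(iris[, 1:4]))

set.seed(42)  # SOM training starts from random codebook vectors
iris_som <- som(X,
                grid = somgrid(xdim = 6, ydim = 6, topo = "hexagonal"),
                rlen = 200)  # number of passes over the data

# Codebook vectors: what each map cell "looks like" across the four variables
plot(iris_som, type = "codes")

# Where each observation landed, colored by species
plot(iris_som, type = "mapping", col = as.integer(iris$Species), pch = 19)
```

Each row of `X` is assigned to its best-matching unit (stored in `iris_som$unit.classif`). Because a winning unit's neighbors are updated along with it during training, similar observations end up on nearby cells, which is the proximity-preserving property described above.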

https://clarkdatalabs.github.io/soms/SOM_NBA

This workshop plays around with NBA player stats from the 2015/2016 season. Disclaimer: I know next to nothing about basketball.

Self Organizing Map depicting NBA Player Position Predictions

If you like this post, keep an eye out for the next one. Within the next month I’ll post a tutorial on using SOMs to visualize text mined from the works of Shakespeare. Disclaimer: I know next to nothing about the works of Shakespeare.


Creating R Tutorials Using RMarkdown: Code Chunk Options

Two of us here in the Digital Project Studio have recently been working through an R script developed for a workshop on basic mapping in R. The goal was to turn the script, which was used alongside in-person instruction, into a usable self-directed tutorial. To do this we used R Markdown, an authoring format that combines R code and text into reproducible, dynamic documents, presentations, and webpages. Our introductory tutorial will get you set up and started with R Markdown. In this post we’ll share some of the additional features we’ve learned about while using it.

The instructions and file downloads for the R mapping workshop we created are available here: http://clarkdatalabs.github.io/mapping_R/

The zipped download includes the R Markdown file we used to create the workshop instructions: script_markup.Rmd
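In R Markdown, code chunk options are handled by knitr. As a minimal sketch (these particular values are assumptions for illustration, not necessarily what script_markup.Rmd uses), document-wide defaults can be set once with `knitr::opts_chunk$set()`. An individual chunk can then override them in its header, for example `{r, echo=FALSE}` to hide the code or `{r, eval=FALSE}` to show code without running it.

```r
# Document-wide chunk defaults. In an .Rmd file this call usually sits in a
# setup chunk near the top so it applies to every chunk that follows.
knitr::opts_chunk$set(
  echo      = TRUE,     # show the R code next to its output (good for tutorials)
  message   = FALSE,    # hide messages such as package start-up notes
  warning   = FALSE,    # hide warnings in the rendered page
  fig.align = "center"  # center plots in the output
)

# Rendering the document from the R console (requires the file to be present):
# rmarkdown::render("script_markup.Rmd")
```

Setting defaults globally keeps each chunk header short; only the chunks that behave differently need their own options.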


Working with Large Data Sets

This tutorial walks through managing a large data set in R. The sample data set, on precipitation in the Great Lakes region, was retrieved from GLERL. It is a multi-tab Excel file that needs to be cleaned up in R before it can be used efficiently. The tutorial also covers general methods for handling large data sets and the problems you may run into, so that the techniques can be applied to other types of data.
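A common first step with a multi-tab workbook is to read every tab at once. The sketch below uses the readxl package, which is our assumption and not necessarily what the tutorial uses. The file name in the commented-out call is hypothetical, and `stack_sheets()` (also a hypothetical helper) only makes sense if every tab shares the same columns.

```r
library(readxl)

# Read every tab of an Excel workbook into a named list of data frames,
# one element per sheet.
read_all_sheets <- function(path) {
  sheet_names <- excel_sheets(path)
  sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
  names(sheets) <- sheet_names
  sheets
}

# Hypothetical helper: stack tabs that share the same columns into one table,
# recording which sheet each row came from.
stack_sheets <- function(sheets) {
  tagged <- Map(function(df, name) {
    df$sheet <- name
    df
  }, sheets, names(sheets))
  do.call(rbind, unname(tagged))
}

# Hypothetical file name; substitute the workbook downloaded from GLERL.
# precip <- read_all_sheets("great_lakes_precipitation.xlsx")
# precip_all <- stack_sheets(precip)
```

Keeping the sheets in a list first lets you inspect and clean each tab before deciding whether stacking them is appropriate.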
