Introduction

The Stanford Natural Language Processing (NLP) Group at Stanford University offers an open suite of language analysis tools that anyone can use. Most of the tools work only with English, but some are also available for Chinese, Spanish, German, and Arabic. This tutorial focuses on the English tool sets, specifically the Named Entity Recognizer and the Parts of Speech Tagger. These tools can help you pinpoint and extract specific locations or organizations from a text, look at the complexity of sentence structure, or even find hesitations in transcripts of English as a Second Language learners and see where they pause the longest. The technology has many applications in research and learning.

Named Entity Recognizer

The Named Entity Recognizer (NER) labels words in a text that are names of things, such as people, organizations, and locations, and even gene and protein names. Once your text has been run through NER, the output will look something like the image below, with the NER window on the left and the Terminal output on the right:

Parts Of Speech Tagger

The Parts of Speech Tagger lets you paste large quantities of text into the tagger, which then assigns a part of speech, such as noun, verb, or adjective, to each word. The English model used in this tutorial has a reported tagging accuracy of 96.97%. When you run it, the output will look something like what you see below:

Let’s get started with these tools!

Getting Started

Installing Java

You’ll need Java version 1.8 or later installed on your computer to run the Stanford NLP software. To install Java, go to Oracle’s website, click the Agree to Terms button, and then choose the download for your operating system.

Here are some additional instructions on how to install Java if you run into difficulties.
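Before downloading the tools, you can confirm your Java setup. Typing java -version in Terminal prints the installed version. As an alternative, the small program below (the class name JavaCheck is only a hypothetical example) reports the Java version and the maximum memory (heap) the JVM will use. The commands later in this tutorial start Java with -mx1000m, which asks for a maximum heap of roughly 1000 MB, so running this check with the same flag shows the limit the JVM actually applies.

```java
// JavaCheck.java: a minimal sketch that reports the running Java version
// and the maximum heap size available to the JVM.
public class JavaCheck {

    // Returns the major version number, e.g. 8 for "1.8" and 17 for "17".
    static int majorVersion(String spec) {
        // Java 8 and earlier report "1.x"; Java 9 and later report just the major number.
        if (spec.startsWith("1.")) {
            spec = spec.substring(2);
        }
        int dot = spec.indexOf('.');
        return Integer.parseInt(dot >= 0 ? spec.substring(0, dot) : spec);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Java specification version: " + spec);
        System.out.println(major >= 8
                ? "OK: Java 1.8 or later is installed"
                : "Too old: install Java 1.8 or later");

        // maxMemory() reflects the -mx / -Xmx setting, or the JVM default if none was given.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Maximum heap: about " + maxMb + " MB");
    }
}
```

To try it, compile with javac JavaCheck.java and run java -mx1000m JavaCheck. The reported heap may come out slightly below 1000 MB, since the JVM reserves part of the heap internally.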

Part 1: Using the Named Entity Recognizer (NER)

Download the Named Entity Recognizer (NER) Software

The Named Entity Recognizer (NER) labels words in a text that are names of things, such as people, organizations, and locations, and even gene and protein names. This software is free, and you can download it here.

Make sure to save the NER files on your Desktop or some other easily accessible place on your computer. Once the file has finished downloading, unzip it by double-clicking it:

I like to rename the unzipped folder to just stanford-ner, so that it’s easier to reach from the Terminal window.

Using NER Through Terminal

Next, open Terminal and navigate to the stanford-ner folder.

To learn how to open Terminal on a Mac or Command Prompt on Windows, check out the tutorials below:

If you have a Mac, check out this video to learn more
If you have Windows 8, check out this video to learn more
This post shows how to open the Command Prompt on systems older than Windows 8

After you’re in the stanford-ner folder in Terminal, copy and paste the following into the Terminal window:

java -mx1000m -jar stanford-ner.jar

Doing this should cause the Stanford Named Entity Recognizer to open:

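Inside this window, you can delete the current text and paste in your own. Next, we need to load a classifier, which is a machine learning model that takes data items and places each one into one of k classes, that is, one of a fixed set of categories; for NER, the classes are entity types such as person, location, and organization. To do this, go to “Classifier” and select “Load CRF from File” (CRF stands for conditional random field, the type of model Stanford NER uses):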

Next, select the “english.muc.7class.distsim.crf.ser.gz” classifier from the classifiers folder and click “Open”:

Several tags should now appear on the right-hand side of the NER window, and the “Run NER” button at the bottom should now be active. Go ahead and click it.
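The “7class” in the classifier’s filename refers to the seven entity classes the model recognizes: LOCATION, PERSON, ORGANIZATION, MONEY, PERCENT, DATE, and TIME, with every other word labeled O (not an entity). So in this case k is 7, plus O. If you later want to pull whole entities out of tagged text in your own program, here is a minimal Java sketch. It assumes the tagged text is a sequence of space-separated word/LABEL tokens, the slash-separated style the command-line version of the classifier can produce; that format, the class name EntityGrouper, and the sample sentence are assumptions for illustration, not output captured from the GUI.

```java
import java.util.ArrayList;
import java.util.List;

// EntityGrouper.java: a sketch that turns word/LABEL tokens into entity spans.
// ASSUMPTION: input is space-separated "word/LABEL" tokens, where the label
// "O" marks words that are not part of any named entity.
public class EntityGrouper {

    // Groups consecutive tokens that share a label, so
    // "Stanford/ORGANIZATION University/ORGANIZATION" becomes
    // "ORGANIZATION: Stanford University". Tokens labeled O are skipped.
    static List<String> group(String tagged) {
        List<String> entities = new ArrayList<>();
        StringBuilder words = new StringBuilder();
        String current = "O";
        for (String token : tagged.trim().split("\\s+")) {
            // Split on the LAST slash so words containing "/" stay intact.
            int slash = token.lastIndexOf('/');
            String word = slash >= 0 ? token.substring(0, slash) : token;
            String label = slash >= 0 ? token.substring(slash + 1) : "O";
            if (!label.equals(current)) {
                flush(entities, current, words); // close the previous run of tokens
                current = label;
            }
            if (!label.equals("O")) {
                if (words.length() > 0) {
                    words.append(' ');
                }
                words.append(word);
            }
        }
        flush(entities, current, words); // close the final run, if any
        return entities;
    }

    // Records the collected words as one entity (unless the label is O) and resets the buffer.
    private static void flush(List<String> out, String label, StringBuilder words) {
        if (!label.equals("O") && words.length() > 0) {
            out.add(label + ": " + words);
        }
        words.setLength(0);
    }

    public static void main(String[] args) {
        String tagged = "Stanford/ORGANIZATION University/ORGANIZATION is/O in/O "
                + "California/LOCATION ./O";
        for (String entity : group(tagged)) {
            System.out.println(entity);
        }
    }
}
```

Running it prints ORGANIZATION: Stanford University and then LOCATION: California. Because the sketch merges any adjacent tokens with the same label, two separate names written back to back with the same type would be reported as a single entity; that is a simplification of this example.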

The Results

After you click “Run NER,” two things should happen. First, the NER window should highlight the words in the text that match each tag listed on the right, like so:

Second, the Terminal window should also list all the tags for location, organization, date, money, person, time, and so on:

And you’re done learning how to use Stanford’s Named Entity Recognizer! Now on to the Parts of Speech Tagger.

Part 2: Using the Parts of Speech Tagger

Download the Parts Of Speech Tagger

The Parts of Speech Tagger lets you paste large quantities of text into the tagger, which assigns a part of speech, such as noun, verb, or adjective, to each word. If you need to tag the parts of speech in your document, you can download it here.

Go ahead and click the “basic English Stanford Tagger” link, since we’ll only be analyzing English text.

Many of the steps here are similar to those described above. This tagger uses the ‘english-left3words-distsim.tagger’ model, which has a reported accuracy of 96.97%. You can read answers to common questions about the Parts of Speech Tagger here.
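To get a feel for what a 96.97% accuracy means in practice, here is a back-of-the-envelope estimate for a hypothetical text of N = 1000 words. It assumes the reported accuracy a = 0.9697 carries over to your own text, which may not hold if your text is very different from what the model was evaluated on:

```latex
\text{expected mistagged words} \approx N \times (1 - a)
  = 1000 \times (1 - 0.9697)
  = 1000 \times 0.0303
  \approx 30
```

In other words, you should expect roughly three tagging errors per hundred words, so it is worth spot-checking the output before relying on it.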

Using the Parts of Speech Tagger Through Terminal

Open a Terminal window and navigate to the “stanford-postagger” folder that you just downloaded (see the instructions above on using Terminal to navigate to a folder). Once you’re in the folder, copy and paste the following command into the Terminal window:

java -mx1000m -jar stanford-postagger.jar

Once this command runs, the following window will appear:

You can copy and paste the text you’d like to tag in the first text box and click “Tag Sentence!”

The Results

The output will look something like this:

You’ll notice that each part-of-speech tag is attached to its word with an underscore (“_”). The tags are based on the University of Pennsylvania (Penn) Treebank tag set, and the University of Leeds has a good key to these tags available here (e.g., JJ = adjective, NN = noun).
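If you want to work with the tagger’s output outside the GUI, for example to compare how often different parts of speech appear in a text, the word_TAG format is easy to split apart. The Java sketch below counts how often each tag appears and looks up descriptions for a handful of Penn Treebank tags. The class name TagCounter and the sample sentence are hypothetical, and the sample is hand-written rather than real tagger output.

```java
import java.util.Map;
import java.util.TreeMap;

// TagCounter.java: a sketch that splits tagger output of the form word_TAG
// and counts how often each tag occurs.
public class TagCounter {

    // Descriptions for a few Penn Treebank tags only; not the full tag set.
    static final Map<String, String> DESCRIPTIONS = new TreeMap<>();
    static {
        DESCRIPTIONS.put("DT", "determiner");
        DESCRIPTIONS.put("JJ", "adjective");
        DESCRIPTIONS.put("NN", "noun, singular or mass");
        DESCRIPTIONS.put("NNS", "noun, plural");
        DESCRIPTIONS.put("VBZ", "verb, 3rd person singular present");
    }

    // Counts tags in space-separated word_TAG tokens, sorted by tag name.
    static Map<String, Integer> countTags(String tagged) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Split on the LAST underscore so a word containing "_" keeps it.
            int us = token.lastIndexOf('_');
            if (us < 0) {
                continue; // not a word_TAG token; skip it
            }
            String tag = token.substring(us + 1);
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String tagged = "The_DT quick_JJ fox_NN jumps_VBZ ._.";
        for (Map.Entry<String, Integer> e : countTags(tagged).entrySet()) {
            String desc = DESCRIPTIONS.getOrDefault(e.getKey(), "see the full tag list");
            System.out.println(e.getKey() + " (" + desc + "): " + e.getValue());
        }
    }
}
```

A sorted map keeps the printed counts in a stable, alphabetical order, which makes results from different texts easier to compare. For tags missing from the small description table, consult the full tag list mentioned above.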

Additional Sources

If you’d like to learn more about Stanford’s Natural Language Processing software and other free software tools, visit the group’s home site, which also links to additional resources.

Thanks for reading!

Stanford’s Natural Language Processing Software: Text Tagging and Finding Named Entities
