Date

I used the nltk n-gram functions to pull common phrases from a set of student essays. Before counting, I also filtered out very common words (stopwords). Thanks to nltk's popularity and a few Stack Overflow posts, this turned out to be a fairly quick exercise. I'm not sure how useful it is yet, but it's pretty simple to set up, so maybe I'll try it on a few low-stakes assignments this fall.
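The counting step itself is simple. nltk.ngrams slides a window of n tokens along the list, and nltk.FreqDist (a subclass of Python's collections.Counter) tallies the windows. Here's a rough stdlib-only sketch of that idea. The ngrams helper and the sample tokens are made up for illustration and aren't part of the essay script.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return consecutive n-token tuples (a sliding window over the list)."""
    # zip stops at the shortest slice, giving len(tokens) - n + 1 windows
    # when n <= len(tokens)
    return list(zip(*(tokens[i:] for i in range(n))))

# made-up sample tokens, already lowercased and stopword-filtered
tokens = ["team", "agreements", "helped", "team", "agreements"]

counts = Counter(ngrams(tokens, 2))
for gram, count in counts.most_common(2):
    print(str(count).rjust(3), ':', ' '.join(gram))
# prints:
#   2 : team agreements
#   1 : agreements helped
```

With nltk the equivalent is nltk.FreqDist(nltk.ngrams(tokens, 2)).most_common(2).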

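import nltk
from nltk.corpus import stopwords
import yaml

# one-time setup: nltk.download('punkt') and nltk.download('stopwords')
# (recent NLTK releases use 'punkt_tab' instead of 'punkt' for word_tokenize)

def print_ngrams(tokens, num_words=2, depth=100):
    """Print the `depth` most frequent n-grams of length `num_words`."""
    grams = nltk.ngrams(tokens, num_words)
    ngfd = nltk.FreqDist(grams)
    for gram, count in ngfd.most_common(depth):
        print(str(count).rjust(3), ':', ' '.join(gram))

# load in essay text: a list of records with 'assignment' and 'Essay' keys
with open('essays.yaml') as f:
    essays = yaml.safe_load(f)

# combine the HW20 essays, separated by spaces so words don't merge across essays
raw = ' '.join(e['Essay'] for e in essays if e['assignment'] == 'HW20')

# tokenize with nltk
tokens = nltk.word_tokenize(raw)
# lowercase and filter out punctuation (keep purely alphabetic tokens)
tokens = [token.lower() for token in tokens if token.isalpha()]
# drop English stopwords (still thinking about how best to deal with these);
# a set makes each membership test fast
stop = set(stopwords.words('english'))
tokens = [token for token in tokens if token not in stop]

for num in [1, 2, 3]:
    print(str(num) + '-grams')
    print_ngrams(tokens, num_words=num, depth=20)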
1-grams
232 : project
212 : data
147 : would
122 : group
120 : work
114 : team
103 : able
 99 : class
 79 : also
 76 : could
 75 : one
 73 : course
 73 : learned
 73 : time
 71 : semester
 69 : proud
 69 : think
 64 : skills
 61 : get
 59 : feel
2-grams
 19 : team agreements
 18 : feel like
 16 : sage math
 14 : would say
 13 : data analysis
 12 : skills developed
 12 : math cloud
 11 : beginning semester
 11 : sonoma state
 10 : working team
 10 : group members
 10 : data sets
 10 : blower door
  9 : team leader
  9 : able get
  9 : learned lot
  8 : team agreement
  8 : energy data
  8 : also learned
  8 : real world
3-grams
 12 : sage math cloud
  4 : skills developed course
  4 : semester long project
  4 : worked well together
  3 : would team leader
  3 : adhere team agreements
  3 : first homework assignment
  3 : team agreements first
  3 : environmental technology center
  3 : blower door test
  2 : amount time work
  2 : great learning experience
  2 : developed throughout course
  2 : feel proud work
  2 : future jobs projects
  2 : ability work others
  2 : proud poster put
  2 : work project semester
  2 : information able find
  2 : give us different
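One thing these lists hint at: because stopwords are removed before the n-grams are built, words that were separated by a stopword end up next to each other. A 3-gram like "adhere team agreements" probably stands in for a longer phrase such as "adhere to the team agreements". One common alternative (not what I did above) is to build n-grams on the full token stream and then drop any n-gram that starts or ends with a stopword. Below is a stdlib-only sketch comparing the two orders. The tiny STOP set and the sample sentence are made up for the demo; with nltk you'd use set(stopwords.words('english')) instead.

```python
from collections import Counter

# Hypothetical mini stopword list for this demo only; with nltk you would
# use set(stopwords.words('english')) instead.
STOP = {"a", "an", "and", "of", "the", "to"}

def ngrams(tokens, n):
    """Consecutive n-token tuples (a sliding window, like nltk.ngrams)."""
    return list(zip(*(tokens[i:] for i in range(n))))

def remove_then_count(tokens, n, stop=STOP):
    """Drop stopwords first, then count n-grams (the order used above)."""
    kept = [t for t in tokens if t not in stop]
    return Counter(ngrams(kept, n))

def count_then_filter(tokens, n, stop=STOP):
    """Count n-grams on the full token stream, keeping only those that
    neither start nor end with a stopword (inner stopwords are kept)."""
    return Counter(g for g in ngrams(tokens, n)
                   if g[0] not in stop and g[-1] not in stop)

tokens = "students adhere to the team agreements".split()
# includes ('adhere', 'team', 'agreements'), which never appears contiguously
print(remove_then_count(tokens, 3))
# empty: every 3-word window here starts or ends with a stopword
print(count_then_filter(tokens, 3))
# includes ('adhere', 'to', 'the', 'team', 'agreements')
print(count_then_filter(tokens, 5))
```

Filtering after counting keeps inner stopwords, so the phrases come out longer, but every one of them actually appears in the text.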