I used NLTK's n-gram functions to pull the most common phrases from a group of student essays. I also filtered the commonly used words (stopwords) out of the text. This ended up being a fairly quick exercise thanks to the popularity of NLTK and a few Stack Overflow posts. I'm not sure how useful it is yet, but it is simple enough to set up that I may try it on a few low-stakes assignments this fall.
import nltk
from nltk.corpus import stopwords
import yaml


def print_ngrams(num_words=2, depth=100):
    """Print the `depth` most frequent n-grams in the global `tokens` list."""
    grams = nltk.ngrams(tokens, num_words)
    ngfd = nltk.FreqDist(grams)
    # most_common() yields (ngram_tuple, count) pairs
    for gram, count in ngfd.most_common(depth):
        print(str(count).rjust(3), ':', ' '.join(gram))


# load in essay text
with open('essays.yaml') as f:
    essays = yaml.safe_load(f)

raw = ''
for e in essays:
    if e['assignment'] == 'HW20':
        # trailing newline keeps words from fusing across essay boundaries
        raw += e['Essay'] + '\n'

# tokenize with nltk
tokens = nltk.word_tokenize(raw)

# filter out punctuation
tokens = [token.lower() for token in tokens if token.isalpha()]

# think about how to deal with stop words
# (build the set once instead of reloading the list for every token)
stop_words = set(stopwords.words('english'))
tokens = [token for token in tokens if token not in stop_words]

for num in [1, 2, 3]:
    print(str(num) + '-grams')
    print_ngrams(num_words=num, depth=20)
1-grams
232 : project
212 : data
147 : would
122 : group
120 : work
114 : team
103 : able
 99 : class
 79 : also
 76 : could
 75 : one
 73 : course
 73 : learned
 73 : time
 71 : semester
 69 : proud
 69 : think
 64 : skills
 61 : get
 59 : feel
2-grams
 19 : team agreements
 18 : feel like
 16 : sage math
 14 : would say
 13 : data analysis
 12 : skills developed
 12 : math cloud
 11 : beginning semester
 11 : sonoma state
 10 : working team
 10 : group members
 10 : data sets
 10 : blower door
  9 : team leader
  9 : able get
  9 : learned lot
  8 : team agreement
  8 : energy data
  8 : also learned
  8 : real world
3-grams
 12 : sage math cloud
  4 : skills developed course
  4 : semester long project
  4 : worked well together
  3 : would team leader
  3 : adhere team agreements
  3 : first homework assignment
  3 : team agreements first
  3 : environmental technology center
  3 : blower door test
  2 : amount time work
  2 : great learning experience
  2 : developed throughout course
  2 : feel proud work
  2 : future jobs projects
  2 : ability work others
  2 : proud poster put
  2 : work project semester
  2 : information able find
  2 : give us different
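On the open question of how to deal with stop words: the script removes them before forming n-grams, so words that were not adjacent in the essays can end up paired. An entry like "able get" in the 2-gram list presumably comes from text such as "able to get". Here is a minimal standard-library sketch comparing that order with one alternative, which forms the n-grams first and then drops any that contain a stop word. The `STOP` set, the sample sentence, and the function names are made up for illustration and are not part of NLTK or the script above.

```python
from collections import Counter

# Tiny illustrative stop word list (an assumption; NLTK's English list is much longer).
STOP = {"i", "was", "to", "the", "a", "of", "we", "were"}


def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def filter_then_count(tokens, n):
    """Order used in the script: drop stop words, then form n-grams."""
    kept = [t for t in tokens if t not in STOP]
    return Counter(ngrams(kept, n))


def count_then_filter(tokens, n):
    """Alternative: form n-grams on the full text, then drop any containing a stop word."""
    return Counter(g for g in ngrams(tokens, n) if not any(t in STOP for t in g))


tokens = "i was able to get the data".split()
print(filter_then_count(tokens, 2))  # includes ('able', 'get'), which never appears verbatim
print(count_then_filter(tokens, 2))  # empty: every adjacent pair here contains a stop word
```

Neither order is clearly right. Filtering first surfaces compressed phrases like "sage math cloud" along with some artificial pairs, while filtering afterwards keeps only phrases that appear verbatim but can discard a lot of text.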