I used NLTK's n-gram functions to pull the most common phrases out of a set of student essays, after filtering the stop words (very common words such as "the" and "of") out of the text. This turned out to be a fairly quick exercise, thanks to NLTK's popularity and a few Stack Overflow posts. I'm not sure yet how useful the results are, but the setup is simple enough that I may try it on a few low-stakes assignments this fall.
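For anyone unfamiliar with the term, an n-gram is just a run of n consecutive words, and the counting step is a frequency table over those runs. Here is a minimal standard-library sketch of that idea, not what NLTK does internally; the helper names `ngrams` and `top_ngrams` and the sample sentence are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # Zipping the list against shifted copies of itself yields the runs;
    # zip stops at the shortest copy, so no partial runs are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(tokens, n, depth):
    """Return the `depth` most frequent n-grams as (ngram, count) pairs."""
    return Counter(ngrams(tokens, n)).most_common(depth)


# Hypothetical sample text, not taken from the essays
sample = "the team met and the team agreed".split()
for gram, count in top_ngrams(sample, 2, 3):
    print(str(count).rjust(3), ':', ' '.join(gram))
```

The full NLTK script follows.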
import nltk
from nltk.corpus import stopwords
import yaml
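def print_ngrams(num_words=2, depth=100):
    # count n-grams over the module-level `tokens` list built below
    grams = nltk.ngrams(tokens, num_words)
    ngfd = nltk.FreqDist(grams)
    # most_common(depth) returns (ngram, count) pairs, most frequent first
    for ng in ngfd.most_common(depth):
        print(str(ng[1]).rjust(3), ':', ' '.join(ng[0]))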
# load in essay text
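with open('essays.yaml') as f:
    # safe_load avoids constructing arbitrary Python objects and, unlike
    # a bare yaml.load(f), does not need an explicit Loader argument
    essays = yaml.safe_load(f)
raw = ''
for e in essays:
    if e['assignment'] == 'HW20':
        raw += e['Essay']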
# tokenize with nltk
tokens = nltk.word_tokenize(raw)
# filter out punctuation
tokens = [token.lower() for token in tokens if token.isalpha()]
# think about how to deal with stop words
# (build the set once rather than reloading the list for every token)
stop_words = set(stopwords.words('english'))
tokens = [token for token in tokens if token not in stop_words]
for num in [1, 2, 3]:
    print(str(num) + '-grams')
    print_ngrams(num_words=num, depth=20)
1-grams
232 : project
212 : data
147 : would
122 : group
120 : work
114 : team
103 : able
 99 : class
 79 : also
 76 : could
 75 : one
 73 : course
 73 : learned
 73 : time
 71 : semester
 69 : proud
 69 : think
 64 : skills
 61 : get
 59 : feel
2-grams
 19 : team agreements
 18 : feel like
 16 : sage math
 14 : would say
 13 : data analysis
 12 : skills developed
 12 : math cloud
 11 : beginning semester
 11 : sonoma state
 10 : working team
 10 : group members
 10 : data sets
 10 : blower door
  9 : team leader
  9 : able get
  9 : learned lot
  8 : team agreement
  8 : energy data
  8 : also learned
  8 : real world
3-grams
 12 : sage math cloud
  4 : skills developed course
  4 : semester long project
  4 : worked well together
  3 : would team leader
  3 : adhere team agreements
  3 : first homework assignment
  3 : team agreements first
  3 : environmental technology center
  3 : blower door test
  2 : amount time work
  2 : great learning experience
  2 : developed throughout course
  2 : feel proud work
  2 : future jobs projects
  2 : ability work others
  2 : proud poster put
  2 : work project semester
  2 : information able find
  2 : give us different
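One side effect of the order of operations in the script above: stop words are removed before the n-grams are built, so entries like "able get" or "beginning semester" join words that were probably not adjacent in the essays (presumably "able to get" and "beginning of the semester"). One possible alternative is to build the n-grams from the full token stream and then drop any that contain a stop word. The sketch below compares the two orders on a made-up sentence. `STOP_WORDS` is a tiny hypothetical list standing in for NLTK's English list, and both function names are invented for this example.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration;
# the script above uses NLTK's English list instead.
STOP_WORDS = {"to", "of", "the", "a", "i", "was"}


def filter_then_ngram(tokens, n):
    """Drop stop words first, then build n-grams (the order used above)."""
    kept = [t for t in tokens if t not in STOP_WORDS]
    return Counter(zip(*(kept[i:] for i in range(n))))


def ngram_then_filter(tokens, n):
    """Build n-grams from all tokens, then drop any containing a stop word."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(g for g in grams if not STOP_WORDS.intersection(g))


# Made-up sentence, not taken from the essays
tokens = "i was able to get the data analysis done".split()

print(sorted(filter_then_ngram(tokens, 2)))
print(sorted(ngram_then_filter(tokens, 2)))
```

The second order only keeps phrases that actually appeared word for word, at the cost of losing phrases that are naturally split by a stop word. Which is more useful for reading student essays is a judgment call.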