We have a Netatmo datalogger installed in our Environmental Technology Center. As a first project, we want to see how accurately we can infer the number of occupants from the observed carbon dioxide levels in the room.

Humans exhale carbon dioxide at a rate of approximately 250 mL per minute. In the classroom space where the sensor is installed, this corresponds to an increase of about 0.18 ppm per minute per person in the carbon dioxide level. By measuring the slope of the carbon dioxide signal during class, we can infer the number of students.
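
The conversion from an exhalation rate to a ppm rate can be sketched as below. The room volume is not stated in this notebook, so the sketch works backwards from the 0.176 ppm per minute figure used in the analysis code (the unrounded form of 0.18) to the volume it implies. It assumes a well-mixed room with no ventilation and no other carbon dioxide sources, and the function names are hypothetical.

```python
# Assumption: well-mixed room, no ventilation, no other CO2 sources.
CO2_ML_PER_MINUTE = 250.0            # exhalation rate per person, from the text
PPM_PER_PERSON_PER_MINUTE = 0.176    # per-person rate used in the analysis code


def ppm_per_minute(room_volume_m3, co2_ml_per_minute=CO2_ML_PER_MINUTE):
    """Rate of CO2 rise (ppm per minute) caused by one person."""
    co2_m3 = co2_ml_per_minute * 1e-6        # 1 mL = 1e-6 cubic metres
    return co2_m3 / room_volume_m3 * 1e6     # volume fraction -> ppm


def implied_room_volume_m3(ppm_rate=PPM_PER_PERSON_PER_MINUTE,
                           co2_ml_per_minute=CO2_ML_PER_MINUTE):
    """Room volume (cubic metres) implied by a per-person ppm rate."""
    # 1 mL of gas in 1 cubic metre of air is exactly 1 ppm by volume.
    return co2_ml_per_minute / ppm_rate


volume = implied_room_volume_m3()
print('Implied room volume: {:.0f} cubic metres'.format(volume))
print('Check: {:.3f} ppm per minute per person'.format(ppm_per_minute(volume)))
```

Under these assumptions the two quoted figures are consistent with a room of roughly 1,400 cubic metres.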

The Netatmo interface allows us to download the data as a CSV file, which I can load using the pandas library.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv('Main Room_6-2-2015.csv', 
                   skiprows=3,       # ignore the first 3 rows of data
                   sep=';',          # semicolon is used to separate data values
                   index_col=1,      # use column 1 as the dates to index the data
                   parse_dates=True) # convert the date string into a date object
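
To see what these `read_csv` options do without the original file, here is a minimal sketch that parses a small in-memory sample. The sample layout (three metadata rows, then a header with a `Timestamp` column and a date column) is an assumption modelled on the call above, and every value in it is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: three metadata rows, a header row, then readings.
sample_csv = """Station;Example Station
Module;Main Room
Exported;2015-06-02
Timestamp;Timezone : America/Los_Angeles;Temperature;Humidity;CO2
1422465000;2015-01-28 09:10:00;20.1;35;600
1422465300;2015-01-28 09:15:00;20.2;35;635
1422465600;2015-01-28 09:20:00;20.3;36;670
"""

sample = pd.read_csv(io.StringIO(sample_csv),
                     skiprows=3,        # skip the three metadata rows
                     sep=';',           # values are separated by semicolons
                     index_col=1,       # the date column becomes the index
                     parse_dates=True)  # parse the index into datetimes

# With a datetime index, rows can be selected by date strings.
window = sample.loc['2015-01-28 09:10':'2015-01-28 09:15', 'CO2']
print(window)
```

Because the index holds datetimes, a pair of date strings selects every reading in that window, which is what the analysis function relies on.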
In [2]:
[Figure: matplotlib plot; image not preserved]

I can define a function that takes the timestamps and carbon dioxide readings for a time range, performs a linear fit, and infers the student attendance from the slope.

In [3]:
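def process_data(begin, end, data, plot=False):
    """Fit a line to the CO2 readings between begin and end and print the estimated occupancy."""
    # Select the window by date; 'Timestamp' is assumed to be in seconds
    x = data.loc[begin:end, 'Timestamp']
    x = x - x.iloc[0]                 # seconds elapsed since the start of the window
    y = data.loc[begin:end, 'CO2']    # carbon dioxide concentration in ppm

    fit = np.polyfit(x, y, 1)         # degree-1 fit; fit[0] is the slope
    room_ppm_per_second = fit[0]
    room_ppm_per_minute = room_ppm_per_second * 60
    ppm_per_student_per_minute = 0.176
    num_students = room_ppm_per_minute / ppm_per_student_per_minute

    if plot:
        yfit = np.polyval(fit, x)     # fitted line evaluated at the sample times
        plt.plot(x, y)
        plt.plot(x, yfit)

    print('Start time = {}'.format(begin))
    print('End time = {}'.format(end))
    print('Carbon dioxide rate of increase {:.2f} ppm per minute'.format(room_ppm_per_minute))
    print('Estimated number of students: {:.0f}'.format(num_students))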
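
As a sanity check on the slope-to-occupancy arithmetic, the sketch below runs the same linear fit on synthetic, noise-free data where the true occupancy is known. The helper `estimate_occupancy`, the 5-minute sampling interval, and the 600 ppm baseline are assumptions made for illustration and are not taken from the sensor data.

```python
import numpy as np

PPM_PER_STUDENT_PER_MINUTE = 0.176   # per-person rate used in process_data


def estimate_occupancy(seconds, co2_ppm,
                       ppm_per_student_per_minute=PPM_PER_STUDENT_PER_MINUTE):
    """Estimate occupancy from the slope of CO2 (ppm) against time (seconds)."""
    slope_per_second, _intercept = np.polyfit(seconds, co2_ppm, 1)
    return slope_per_second * 60 / ppm_per_student_per_minute


# Synthetic 50-minute class sampled every 5 minutes (assumed interval).
seconds = np.arange(0, 3001, 300)
true_students = 40
co2 = 600.0 + true_students * PPM_PER_STUDENT_PER_MINUTE / 60 * seconds

estimate = estimate_occupancy(seconds, co2)
print('Estimated number of students: {:.0f}'.format(estimate))
```

Returning the estimate instead of printing it makes the calculation easy to check; with real, noisy readings the recovered number will not match the true occupancy this exactly.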

Using this function, I can quickly look at different class times and infer the occupancy.

In [4]:
class_starts = (
                ('2015-01-28 09:10', '2015-01-28 10:00'),
                ('2015-01-29 10:00', '2015-01-29 11:00'),
                ('2015-01-26 09:10', '2015-01-26 10:30'),
               )

for begin, end in class_starts:
    process_data(begin, end, data, plot=True)
Start time = 2015-01-28 09:10
End time = 2015-01-28 10:00
Carbon dioxide rate of increase 7.03 ppm per minute
Estimated number of students: 40

Start time = 2015-01-29 10:00
End time = 2015-01-29 11:00
Carbon dioxide rate of increase 9.72 ppm per minute
Estimated number of students: 55

Start time = 2015-01-26 09:10
End time = 2015-01-26 10:30
Carbon dioxide rate of increase 6.34 ppm per minute
Estimated number of students: 36