1 Ellenberg Reading
Please read the introduction to “How Not To Be Wrong” and answer the following questions.
- What was the overall population in the example using planes?
- Explain why some of the scientists and engineers were looking at the wrong sample of this population.
- What other issues of looking at the “wrong” population does this remind you of?
2 CES Sub-populations
Look at the columns of data available in CalEnviroScreen on the CES 4.0 Indicators webpage. You can also look at the CES 4.0 Data Dictionary in the Canvas course materials.
- Pick an outcome in the CalEnviroScreen data that interests you.
- Discuss why that outcome interests you.
- Which subpopulations would you like to compare in this data, in terms of their distributions?
- Why do you think those subpopulations are interesting to investigate? What question might the comparison answer?
3 Wheelan CLT Reading
Read Chapter 8 from Wheelan, Naked Statistics at this link. You can also use this link.
- What does the central limit theorem tell us?
- Come up with an example of the central limit theorem that does not appear in the reading.
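The behaviour the chapter describes can also be checked with a small simulation. The following is a minimal sketch, not part of the reading: the population is made up (a skewed exponential distribution), and `sample_means` is a hypothetical helper name. It draws many random samples, averages each one, and compares the spread of those averages with the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Return the mean of each of n_samples random samples drawn with replacement."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Made-up, strongly skewed population (exponential), chosen only for illustration.
population_rng = random.Random(1)
population = [population_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)

print("population mean:         ", round(statistics.fmean(population), 3))
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("spread of sample means:  ", round(statistics.stdev(means), 3))
print("population sd / sqrt(50):", round(statistics.pstdev(population) / 50 ** 0.5, 3))
```

Plotting a histogram of `means` should show a roughly bell-shaped curve even though the population itself is far from bell-shaped.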
4 CES Descriptive Statistics
Use your spreadsheet with the CES data.
- Compute the mean, median, and standard deviation of the asthma data and the PM2.5 data.
- Choose two other columns and compute these.
- Place your results in a small table and take a screenshot.
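If you want to check your spreadsheet results another way, the same three statistics can be sketched in Python. This is only an illustration and not a required part of the exercise: the numbers below are made-up placeholders rather than CES values, and `summarize` is a hypothetical helper name. `statistics.stdev` computes the sample standard deviation, which corresponds to the spreadsheet `STDEV` function.

```python
import statistics

def summarize(values):
    """Return the mean, median, and sample standard deviation of a column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    }

# Placeholder numbers for illustration only; these are NOT real CES values.
columns = {
    "asthma": [31.0, 52.5, 47.2, 88.1, 60.3],
    "pm25": [8.4, 10.1, 12.7, 9.3, 11.5],
}

for name, values in columns.items():
    stats = summarize(values)
    print(
        f"{name}: mean={stats['mean']:.2f}, "
        f"median={stats['median']:.2f}, stdev={stats['stdev']:.2f}"
    )
```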
5 Wheelan Inference Reading
Read Chapter 9 of Wheelan, Naked Statistics on Statistical Inference. You can find the reading at this link.
- Come up with your own example of a statistical inference
- What is a hypothesis you’d like to test? How would you state the null hypothesis?
- What difference are you looking for?
- How will you measure this quantitatively?
6 CES Subpopulation Spreadsheet Preparation
In this exercise, we will plan out a spreadsheet computation to find the difference between the mean of a CES indicator for an entire population and its mean for a subpopulation. You can refer to the notes for a reminder of how the query to make a sample works.
Make a copy of the data at this link in your Google Drive account. (For this exercise, we’ll use the CES 3.0 data set.)
- Write your hypothesis and a null hypothesis you would like to test
- Choose a variable you are interested in and note the name and column
- Choose another variable to create a sample or subpopulation and note the name and column
- Describe the numerical condition you will use to create your subpopulation
- Describe, in words, the query you’ll use to create your subpopulation
- Briefly say why you chose your variable and subpopulation and what you expect to find
- Describe how you will compute the standard error from your population and subpopulation
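The plan above can be sketched in Python to make the steps concrete. This is a minimal sketch under stated assumptions, not the method from the notes: the column names (`asthma`, `poverty`), the threshold, and the row values are all hypothetical, and the standard error is assumed to be the population standard deviation divided by the square root of the subpopulation size.

```python
import math
import statistics

def subpopulation_comparison(rows, outcome, filter_column, threshold):
    """Compare the mean of `outcome` in a subpopulation with the whole population.

    The subpopulation is every row whose `filter_column` value exceeds `threshold`.
    """
    population = [row[outcome] for row in rows]
    sample = [row[outcome] for row in rows if row[filter_column] > threshold]

    population_mean = statistics.mean(population)
    sample_mean = statistics.mean(sample)
    # Assumption: standard error = population standard deviation / sqrt(sample size).
    standard_error = statistics.pstdev(population) / math.sqrt(len(sample))

    return {
        "population_mean": population_mean,
        "sample_mean": sample_mean,
        "difference": sample_mean - population_mean,
        "sample_size": len(sample),
        "standard_error": standard_error,
    }

# Hypothetical rows for illustration only; these are NOT real CES values.
rows = [
    {"asthma": 30.0, "poverty": 12.0},
    {"asthma": 45.0, "poverty": 35.0},
    {"asthma": 80.0, "poverty": 61.0},
    {"asthma": 65.0, "poverty": 48.0},
    {"asthma": 25.0, "poverty": 9.0},
]

print(subpopulation_comparison(rows, outcome="asthma", filter_column="poverty", threshold=40.0))
```

The filter inside the list comprehension plays the role of the spreadsheet query: it keeps only the rows that meet the numerical condition on the second variable.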
7 Define Terms
In the upcoming class, we will quantify the likelihood that the difference in means you observed in exercise 6 could have occurred by chance. In that context, define the following terms using your choices in exercise 6.
- total population
- subpopulation(s)
- population mean
- sample mean
- standard deviation
- standard error
8 CalEnviroScreen Subpopulation Spreadsheet
We are using the data from this link.
Use your example from exercise 6.
- Explain what your population mean and sample means are.
- What is the (numerical) difference between your sample mean and the population mean?
- Use your sample size to compute the standard error.
- Compare the difference in means to the standard error and comment on how likely it is that your observed difference arose by chance.
- Compute the mean and median of the population and the sample
- Attach screenshots showing the formulas you used.
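One way to turn the comparison between the difference in means and the standard error into a number is sketched below. It is an illustration under stated assumptions rather than the course's prescribed method: it assumes the sample mean is approximately normally distributed (the central limit theorem makes this plausible for reasonably large samples), `chance_probability` is a hypothetical helper name, and the input numbers are made up.

```python
from statistics import NormalDist

def chance_probability(difference, standard_error):
    """Return (z, p) for an observed difference in means.

    z is the difference measured in standard errors; p is the two-sided
    probability of a difference at least that large under a normal model.
    """
    z = difference / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical numbers for illustration only.
z, p = chance_probability(difference=4.0, standard_error=2.0)
print(f"difference = {z:.1f} standard errors, two-sided probability = {p:.3f}")
```

A difference of many standard errors gives a small probability, which suggests the difference is unlikely to be due to chance alone; a difference of less than about one standard error is unremarkable.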
Quiz 1
The standard error quantifies
- the spread of the population distribution
- the spread of the individual samples
- the spread of the averages of repeated samples
- none of the above
Quiz 2
You have measurements of a population. You select 10 samples of the population at random and take the average. You also select 100 samples of the population at random and take an average. Which of these is true?
- the average of 10 samples is likely to be closer to the mean than the average of 100
- the average of 100 samples is likely to be closer to the mean than the average of 10
- the average of 10 samples is closer to the population average than the average of 100
- the average of 100 samples is closer to the population average than the average of 10
- none of the above
Quiz 3
You take a sample of size N, measure each member, and take the average. You then take another sample, twice as large (2N), and take its average. Which are true?
- The standard error of the second sample is larger than the first.
- The standard error of the second sample is smaller than the first.
- The average of the second sample is larger than the first.
- The average of the second sample is smaller than the first.