Data:
First, draw 100 random samples and compute their mean and standard deviation:
def makeHist(data, title, xlabel, ylabel, bins=20):
Result:
Population mean = 16.298769461986048
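As a minimal sketch of this first step, the snippet below draws one sample of 100 values and computes its mean and standard deviation with the standard library. The data set used in this post is not reproduced here, so `population` is a synthetic stand-in whose standard deviation is chosen arbitrarily, and `sample_stats` is a hypothetical helper name.

```python
import random
import statistics


def sample_stats(population, sample_size):
    """Draw one simple random sample (without replacement); return (mean, SD)."""
    sample = random.sample(population, sample_size)
    # pstdev divides by n, matching the usual "standard deviation of these values"
    return statistics.fmean(sample), statistics.pstdev(sample)


random.seed(0)
# Hypothetical stand-in population: the real data set is an assumption here.
population = [random.gauss(16.3, 10.0) for _ in range(20_000)]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_100, sd_100 = sample_stats(population, 100)

print('Population mean =', pop_mean)
print('Population SD =', pop_sd)
print('Sample mean =', mean_100)
print('Sample SD =', sd_100)
```

A single sample of 100 will usually land near the population values but not on them, which is what motivates repeating the experiment.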
Repeat the sampling 1000 times and plot the resulting sample means:
random.seed(0)
Result:
Mean of sample Means = 16.294
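The repeated experiment can be sketched roughly as follows. It again assumes a synthetic `population` rather than the real data, and it leaves out the histogram so that it runs with the standard library alone. The variable names are illustrative, not the original code.

```python
import random
import statistics

random.seed(0)
# Hypothetical stand-in population (the real data set is not shown in the post).
population = [random.gauss(16.3, 10.0) for _ in range(20_000)]

num_trials = 1000
sample_size = 100

# One sample mean per trial
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(num_trials)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print('Mean of sample means =', round(mean_of_means, 3))
print('Standard deviation of sample means =', round(sd_of_means, 3))
print('Population mean =', statistics.fmean(population))
```

The list `sample_means` is what would be passed to a histogram helper such as `makeHist`. Its spread is much narrower than the spread of the population itself.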
To get a tighter bound, we tried:
Then use the pylab.errorbar() function to plot the results for different sample sizes [50, 100, 200, 300, 400, 500, 600]:
pylab.errorbar(xVals, sizeMeans, \
Result:
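The quantities behind such an error-bar plot can be sketched without any plotting library. This is an assumption about how the bars are built: for each sample size, the mean and standard deviation of the sample means are computed over a number of trials, and 1.96 standard deviations is used as the bar half-width. The population, `num_trials`, and the variable names are all hypothetical.

```python
import random
import statistics

random.seed(0)
# Hypothetical stand-in population.
population = [random.gauss(16.3, 10.0) for _ in range(20_000)]

sample_sizes = [50, 100, 200, 300, 400, 500, 600]
num_trials = 100  # assumed number of trials per sample size

size_means, size_sds = [], []
for size in sample_sizes:
    trial_means = [
        statistics.fmean(random.sample(population, size))
        for _ in range(num_trials)
    ]
    size_means.append(statistics.fmean(trial_means))
    size_sds.append(statistics.pstdev(trial_means))

# Half-width of an approximate 95% interval for each sample size
y_err = [1.96 * sd for sd in size_sds]

for size, mean, err in zip(sample_sizes, size_means, y_err):
    print(size, round(mean, 3), '+/-', round(err, 3))

# With matplotlib installed, these lists could be plotted with, for example:
# pylab.errorbar(sample_sizes, size_means, yerr=y_err, fmt='o')
```

The bars shrink as the sample size grows, but with diminishing returns: going from 50 to 200 samples roughly halves the half-width.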
s: the sample standard deviation (i.e., the sample-based estimate of the standard deviation of the population).
n: the size (number of observations) of the sample.
σ: the standard deviation of the population.
def sem(popSD, sampleSize):
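Using the symbols above, the standard error of the mean is conventionally σ / √n, estimated by s / √n when σ is unknown. A function with this signature could be sketched as below. The body is an assumption based on that standard formula, since only the signature appears here.

```python
import math


def sem(popSD, sampleSize):
    """Standard error of the mean: population SD divided by sqrt(sample size)."""
    return popSD / math.sqrt(sampleSize)


# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, sem(10.0, n))
```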
def getDiffs(population, sampleSizes):
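Only the signature of this function is shown, so the following is a guess at its purpose rather than the original code: it measures how far the sample standard deviation is from the population standard deviation, as a percentage, for several sample sizes. The name `get_sd_diffs`, the `num_trials` parameter, and the synthetic population are hypothetical.

```python
import random
import statistics


def get_sd_diffs(population, sample_sizes, num_trials=100):
    """Return {sample size: mean absolute % difference between sample SD and population SD}."""
    pop_sd = statistics.pstdev(population)
    diffs = {}
    for size in sample_sizes:
        total = 0.0
        for _ in range(num_trials):
            sample = random.sample(population, size)
            total += abs(pop_sd - statistics.pstdev(sample))
        diffs[size] = 100 * (total / num_trials) / pop_sd
    return diffs


random.seed(0)
# Hypothetical stand-in population.
population = [random.gauss(16.3, 10.0) for _ in range(20_000)]
diffs = get_sd_diffs(population, [25, 100, 400, 600])

for size, pct in diffs.items():
    print(size, round(pct, 2), '%')
```

If the percentage difference is small for the sample sizes in use, substituting the sample standard deviation for σ in the standard error is a reasonable approximation.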
Does the Distribution of the Population Matter?
def compareDists():
Does Population Size Matter?
popSizes = (10000, 100000, 1000000)
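One way to probe this question, sketched under assumptions, is to keep the sample size fixed and compare the spread of the sample means across the three population sizes. The uniform population, the fixed sample size of 100, and `num_trials` are choices made for this illustration and are not taken from the original code.

```python
import random
import statistics

random.seed(0)
pop_sizes = (10000, 100000, 1000000)
sample_size = 100  # assumed fixed sample size
num_trials = 200   # assumed number of trials per population

sd_of_means_by_pop = {}
for pop_size in pop_sizes:
    # Hypothetical population: uniform values in [0, 1)
    population = [random.random() for _ in range(pop_size)]
    trial_means = [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(num_trials)
    ]
    sd_of_means_by_pop[pop_size] = statistics.pstdev(trial_means)

for pop_size, sd in sd_of_means_by_pop.items():
    print(pop_size, round(sd, 4))
```

The spread of the sample means stays close to σ / √n for every population size, which suggests that the sample size, not the population size, drives the accuracy of the estimate.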
skew: A distribution is skewed if one tail extends out further than the other. A distribution has a positive skew (is skewed to the right) if the tail to the right is longer. It has a negative skew (skewed to the left) if the tail to the left is longer.
Are 200 Samples Enough to Estimate the Mean of the Population?
random.seed(0)
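A test of this kind can be sketched as follows: draw many random samples of size 200, estimate the standard error from each sample alone, and count how often the population mean falls outside 1.96 standard errors of the sample mean. If the estimate behaves well, that should happen about 5% of the time. The population and the trial count are assumptions made for this sketch.

```python
import math
import random
import statistics

random.seed(0)
# Hypothetical stand-in population.
population = [random.gauss(16.3, 10.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)

sample_size = 200
num_trials = 1000  # assumed
num_outside = 0

for _ in range(num_trials):
    sample = random.sample(population, sample_size)
    sample_mean = statistics.fmean(sample)
    # Standard error estimated from the sample SD, not the population SD
    se = statistics.pstdev(sample) / math.sqrt(sample_size)
    if abs(pop_mean - sample_mean) > 1.96 * se:
        num_outside += 1

fraction_outside = num_outside / num_trials
print('Fraction outside 95% confidence interval =', fraction_outside)
```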
What if we use 200 consecutive samples?
for t in range(numTrials):
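The sketch below repeats the same kind of test with one change: each sample is a slice of 200 adjacent values starting at a random position, instead of 200 independently chosen values. It assumes the data has some ordering, simulated here by sorting a synthetic population. That ordering, like the population itself, is an assumption for illustration.

```python
import math
import random
import statistics

random.seed(0)
# Hypothetical ordered population: sorted so that neighbouring values are similar.
population = sorted(random.gauss(16.3, 10.0) for _ in range(20_000))
pop_mean = statistics.fmean(population)

sample_size = 200
num_trials = 1000  # assumed
num_outside = 0

for _ in range(num_trials):
    # Take 200 adjacent values instead of a random sample
    start = random.randrange(len(population) - sample_size + 1)
    sample = population[start:start + sample_size]
    sample_mean = statistics.fmean(sample)
    se = statistics.pstdev(sample) / math.sqrt(sample_size)
    if abs(pop_mean - sample_mean) > 1.96 * se:
        num_outside += 1

fraction_outside = num_outside / num_trials
print('Fraction outside 95% confidence interval =', fraction_outside)
```

Because adjacent values are not independent, the standard error computed from such a slice badly understates the real uncertainty, and the population mean falls outside the interval far more often than 5% of the time.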
Conclusion for the last two tests