# Lecture 6-7 - Stochastic Programs & Inferential Statistics

## Stochastic Programs

• A stochastic model is one in which randomness is present, and variable states are not described by unique values, but rather by probability distributions. The behavior of this model cannot be entirely predicted.
• Events are assumed to be independent of each other.
• Probabilities are always in the range 0 to 1: 0 if the event is impossible, and 1 if it is guaranteed.
• If the probability of an event occurring is p, the probability of it not occurring must be 1-p.
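These rules can be checked with a small simulation. The sketch below is illustrative only: the helper names (`estimate_probability`, `rolled_six`, `two_sixes`) are hypothetical, and the die example is an assumption, not taken from the lecture. It relies on the standard fact that the probabilities of independent events multiply.

```python
import random

def estimate_probability(event, num_trials, rng):
    """Return the fraction of trials in which event(rng) is True."""
    hits = sum(1 for _ in range(num_trials) if event(rng))
    return hits / num_trials

def rolled_six(rng):
    # A fair six-sided die shows 6 with probability 1/6.
    return rng.randint(1, 6) == 6

def two_sixes(rng):
    # Two independent rolls: the probabilities multiply (1/6 * 1/6).
    return rolled_six(rng) and rolled_six(rng)

rng = random.Random(0)  # seeded so the run is repeatable
p_six = estimate_probability(rolled_six, 100_000, rng)
p_not_six = 1 - p_six   # complement rule: 1 - p
p_two = estimate_probability(two_sixes, 100_000, rng)

print(f"P(six)       ~ {p_six:.4f} (exact {1/6:.4f})")
print(f"P(not six)   ~ {p_not_six:.4f} (exact {5/6:.4f})")
print(f"P(two sixes) ~ {p_two:.4f} (exact {1/36:.4f})")
```

The estimates are themselves random, so they only approximate the exact values.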

## Random Walk

• The reason we look at random walk is:
• it is a good example of how to use abstract data types and inheritance to structure programs in general and simulations in particular.

### Defines

• Location
• x, y : coordinates.
• move(deltaX, deltaY): move self with deltaX, deltaY
• getX(), getY()
• distFrom(otherLocation): return the distance between this location and otherLocation
• Field
• drunks: list of drunk objects
• addDrunk(drunk, loc): add a drunk at location loc to the Field's drunks
• moveDrunk(drunk): move the drunk’s location
• getLoc(drunk): get the drunk’s location
• Drunk(object)
• name: its name
• UsualDrunk(Drunk)
• takeStep(): move one step to left/right/up/down direction
• ColdDrunk(Drunk)
• takeStep(): a step towards the south covers two units; the other directions are the same as for UsualDrunk
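The outline above can be sketched as the following classes. This is a minimal sketch under stated assumptions, not the lecture's exact code: `move` returns a new `Location` instead of mutating the old one, `Field` stores drunks in a dict keyed by drunk (the outline says a list), and the `walk` helper is a hypothetical addition for running a simulation.

```python
import random

class Location:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def move(self, deltaX, deltaY):
        # Assumption: return a new Location rather than changing this one.
        return Location(self.x + deltaX, self.y + deltaY)

    def getX(self):
        return self.x

    def getY(self):
        return self.y

    def distFrom(self, other):
        # Euclidean distance between the two locations.
        dx, dy = self.x - other.getX(), self.y - other.getY()
        return (dx ** 2 + dy ** 2) ** 0.5

class Drunk:
    def __init__(self, name=None):
        self.name = name

class UsualDrunk(Drunk):
    def takeStep(self):
        # One unit up, down, right, or left with equal probability.
        return random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])

class ColdDrunk(Drunk):
    def takeStep(self):
        # Same choices, except a southward step covers two units.
        return random.choice([(0, 1), (0, -2), (1, 0), (-1, 0)])

class Field:
    def __init__(self):
        self.drunks = {}  # assumption: maps each drunk to its Location

    def addDrunk(self, drunk, loc):
        if drunk in self.drunks:
            raise ValueError("Duplicate drunk")
        self.drunks[drunk] = loc

    def moveDrunk(self, drunk):
        if drunk not in self.drunks:
            raise ValueError("Drunk not in field")
        dx, dy = drunk.takeStep()
        self.drunks[drunk] = self.drunks[drunk].move(dx, dy)

    def getLoc(self, drunk):
        if drunk not in self.drunks:
            raise ValueError("Drunk not in field")
        return self.drunks[drunk]

def walk(field, drunk, numSteps):
    """Hypothetical helper: move drunk numSteps times, return distance walked."""
    start = field.getLoc(drunk)
    for _ in range(numSteps):
        field.moveDrunk(drunk)
    return start.distFrom(field.getLoc(drunk))

random.seed(0)
for drunkClass in (UsualDrunk, ColdDrunk):
    field, drunk = Field(), drunkClass(drunkClass.__name__)
    field.addDrunk(drunk, Location(0, 0))
    dist = walk(field, drunk, 1000)
    loc = field.getLoc(drunk)
    print(f"{drunk.name}: ended at ({loc.getX()}, {loc.getY()}), distance {dist:.1f}")
```

Because the two subclasses differ only in `takeStep`, the same `Field` and `walk` code serves both kinds of drunk, which is the point of using inheritance here.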

### Simulations

• try 10, 100, 1000, 10000 steps with UsualDrunk and ColdDrunk
• plot every final location

## Inferential Statistics

• Definition:
• use simulations to estimate the value of an unknown quantity by making use of the principle of inferential statistics.
• Detailed explanation
• Population: a set of examples
• Sample: a proper subset of a population
• Goal: Estimate some statistic about the population based on statistics about the sample
• Key fact: If the sample is random, it tends to exhibit the same properties as the population from which it is drawn
• Law of Large Numbers: In repeated independent experiments (e.g., flipping a fair coin 100 times and counting the fraction of heads) with the same expected value (0.5 in this case), the average value of the experiments approaches the expected value as the number of experiments goes to infinity.
• Gambler's Fallacy: the mistaken belief that if deviations from expected behavior occur, these deviations are likely to be evened out by opposite deviations in the future. Because the experiments are independent, past outcomes do not influence future ones.
• Notes on the law of large numbers:
1. "Large" is a relative concept.
2. The law of large numbers does not imply that the absolute difference between the number of heads and the number of tails decreases as the number of flips increases. In fact, we can expect that number to increase.
3. What decreases is the ratio of the absolute difference to the number of flips.
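Points 2 and 3 above can be observed with a short simulation. This is a hedged sketch: `flip_summary` is a hypothetical helper, and since each line of output comes from a single random run, the absolute difference tends to grow but is not guaranteed to grow at every step.

```python
import random

def flip_summary(num_flips, rng):
    """Flip a fair coin num_flips times.

    Returns (fraction of heads, abs(heads - tails), that difference / num_flips).
    """
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    tails = num_flips - heads
    diff = abs(heads - tails)
    return heads / num_flips, diff, diff / num_flips

rng = random.Random(0)  # seeded so the run is repeatable
for exponent in range(2, 7):
    n = 10 ** exponent
    frac, diff, ratio = flip_summary(n, rng)
    print(f"flips={n:>8}  heads fraction={frac:.4f}  "
          f"abs diff={diff:>5}  diff/flips={ratio:.5f}")
```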

### Simulation

#### The ratio heads/tails (the law of large numbers)

• How many samples do we need to look at before we can have justified confidence in our answer?

• Variance is a measure of how much spread there is in the possible outcomes, which helps us judge how much confidence is justified.

• $\text{variance}(X) = \frac{\sum_{x\in{X}}{(x-\mu)^{2}}}{\lvert{X}\rvert}$
• The standard deviation is the square root of the variance. It tells us how closely the values cluster around the mean: if many values are relatively close to the mean, the standard deviation is relatively small.

• $\sigma(X) = \sqrt{\frac{1}{\lvert{X}\rvert}\sum_{x\in{X}}{(x-\mu)^{2}}}$

• where |X| is the size of the collection and ($\mu$) its mean.
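The two formulas translate directly into code. The function names `variance` and `std_dev` below are illustrative choices, and the sample data is made up for the example. Both functions divide by |X|, which matches `statistics.pvariance` and `statistics.pstdev` in the standard library.

```python
def variance(X):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(X) / len(X)
    return sum((x - mean) ** 2 for x in X) / len(X)

def std_dev(X):
    """Population standard deviation: square root of the variance."""
    return variance(X) ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
print(variance(data))  # 4.0
print(std_dev(data))   # 2.0
```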
• For example, increase the number of flips exponentially from 2^4 to 2^20, and for each number of flips run 20 trials to compute the mean and standard deviation of the heads/tails ratio.

• Conclusion

• The ratio heads/tails converges towards 1, and the log of the standard deviation falls linearly with the log of the number of flips per trial.
• By the time we get to about 10^6 coin flips per trial, the standard deviation is roughly three decimal orders of magnitude smaller than the mean.
• As we flip more coins, not only do we have a more precise answer, but more importantly, we also have reason to be more confident that it is close to the right answer.
• The absolute difference between the numbers of heads and tails grows with the number of flips. The next section discusses this further.
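The experiment described above can be sketched as follows. The helper names (`run_trial`, `ratio_stats`) are hypothetical, and to keep the run fast this sketch stops at 2^14 flips per trial rather than 2^20; raising the upper bound of the loop reproduces the full range at the cost of a longer run.

```python
import random
import statistics

def run_trial(num_flips, rng):
    """Flip a fair coin num_flips times and return (heads, tails)."""
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads, num_flips - heads

def ratio_stats(num_flips, num_trials, rng):
    """Return (mean, standard deviation) of heads/tails over num_trials trials."""
    ratios = []
    for _ in range(num_trials):
        heads, tails = run_trial(num_flips, rng)
        if tails == 0:
            continue  # skip the rare trial where the ratio is undefined
        ratios.append(heads / tails)
    return statistics.mean(ratios), statistics.pstdev(ratios)

rng = random.Random(0)  # seeded so the run is repeatable
for exponent in range(4, 15, 2):
    num_flips = 2 ** exponent
    mean, sd = ratio_stats(num_flips, 20, rng)
    print(f"flips=2^{exponent:<2}  mean ratio={mean:.4f}  std dev={sd:.4f}")
```

Plotting the standard deviation against the number of flips on log-log axes is what shows the linear fall noted in the conclusion.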

#### The absolute difference between heads and tails (gambler’s fallacy)

• The coefficient of variation (CV) is the standard deviation divided by the mean. It is commonly used in fields such as engineering or physics when doing quality assurance studies. In general, distributions with a coefficient of variation of less than 1 are considered low-variance.
• For example, the expression "The standard deviation is 15% of the mean" is a CV.
• Conclusion
• The plot of the coefficient of variation for the heads/tails ratio is not much different from the plot of the standard deviation, because the mean is close to 1.
• The relative dispersion (coefficient of variation) of abs(heads - tails) appears to be independent of the number of flips. As the number of flips goes up, the mean of abs(heads - tails) keeps growing. This contradicts the Gambler's Fallacy: the absolute difference is not evened out over time, and only its ratio to the number of flips shrinks.
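A small sketch of this measurement follows. The helper names are hypothetical, and the trial count of 200 is an assumption chosen to make the estimates less noisy. The mean of abs(heads - tails) should grow with the number of flips while the coefficient of variation stays in roughly the same range.

```python
import random
import statistics

def coefficient_of_variation(X):
    """Standard deviation divided by the mean (undefined when the mean is 0)."""
    mean = statistics.mean(X)
    if mean == 0:
        return float("nan")
    return statistics.pstdev(X) / mean

def abs_diff_trials(num_flips, num_trials, rng):
    """Return abs(heads - tails) for each of num_trials independent trials."""
    diffs = []
    for _ in range(num_trials):
        heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
        diffs.append(abs(heads - (num_flips - heads)))
    return diffs

rng = random.Random(0)  # seeded so the run is repeatable
for exponent in range(4, 13, 2):
    diffs = abs_diff_trials(2 ** exponent, 200, rng)
    print(f"flips=2^{exponent:<2}  mean abs diff={statistics.mean(diffs):7.2f}  "
          f"CV={coefficient_of_variation(diffs):.2f}")
```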

### Distributions

• A histogram is a plot designed to show the distribution of values in a set of data.

#### Normal Distributions and Confidence Levels

• Formula of the normal distribution: $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ (where $\mu$ is the mean and $\sigma$ the standard deviation)

• Normal distributions are frequently used in constructing probabilistic models for three reasons:

1. they have nice mathematical properties,
2. many naturally occurring distributions are indeed close to normal
3. they can be used to produce confidence intervals.
• A confidence interval provides a range that is likely to contain the unknown value, together with a confidence level that the unknown value lies within that range.

• Empirical rule for normal distributions:

• 68% of the data will fall within 1 standard deviation of the mean,
• 95% of the data will fall within 2 standard deviations of the mean,
• 99.7% of the data will fall within 3 standard deviations of the mean.
• For example:

• A political poll might indicate that a candidate is likely to get 52% of the vote ±4% (i.e., the confidence interval is 8 percentage points wide) with a confidence level of 95%. What this means is that the pollster believes that 95% of the time the candidate will receive between 48% and 56% of the vote.
• 95%: the confidence level, which corresponds to 2 standard deviations under the empirical rule
• 52%: i.e. 0.52, the mean
• 4%: i.e. 0.04, which is 2 standard deviations, so 1 standard deviation is 0.02
• Samples from a normal distribution can be generated by calling random.gauss(mu, sigma), which returns a randomly chosen floating point number from a normal distribution with mean mu and standard deviation sigma.

• The name gauss refers to the Gaussian distribution, which is another name for the normal distribution.
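The empirical rule can be checked by sampling with `random.gauss`. The sketch below reuses the poll's numbers (mean 0.52, standard deviation 0.02) as an assumption, so "within 2 standard deviations" corresponds to the 48% to 56% range. `fraction_within` is a hypothetical helper.

```python
import random

def fraction_within(samples, mu, sigma, num_std_devs):
    """Return the fraction of samples within num_std_devs * sigma of mu."""
    bound = num_std_devs * sigma
    return sum(1 for x in samples if abs(x - mu) <= bound) / len(samples)

rng = random.Random(0)  # seeded so the run is repeatable
mu, sigma = 0.52, 0.02  # the poll example: mean 52%, std dev 2%
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]

for k in (1, 2, 3):
    frac = fraction_within(samples, mu, sigma, k)
    print(f"within {k} std dev(s) of the mean: {frac:.3f}")
```

The printed fractions should land close to 0.68, 0.95, and 0.997, with small differences from run to run if the seed is changed.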

#### Extension

• Uniform Distribution
• If one were to plot the probability of each possible lottery number being chosen, it would be a flat line. Such distributions are called uniform.
• Exponential and Geometric Distributions
• check page 173 for more details
• Benford’s Distribution
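For the uniform case described above, the flat line can be seen by counting how often each number is drawn. This is an illustrative sketch: `number_frequencies` is a hypothetical helper, and the lottery range of 1 to 10 is an assumption chosen to keep the output short.

```python
import random
from collections import Counter

def number_frequencies(low, high, num_draws, rng):
    """Draw num_draws integers uniformly from [low, high]; return each one's frequency."""
    counts = Counter(rng.randint(low, high) for _ in range(num_draws))
    return {n: counts[n] / num_draws for n in range(low, high + 1)}

rng = random.Random(0)  # seeded so the run is repeatable
freqs = number_frequencies(1, 10, 100_000, rng)
for number, freq in freqs.items():
    # Each frequency should be close to 1/10, i.e. a flat line when plotted.
    print(f"{number:>2}: {freq:.3f}")
```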

## Words

• Law of Large Numbers 大数定律
• Gambler’s Fallacy 赌徒谬误
• Coefficient of Variation 变异系数
• Gaussian Distribution 高斯分布