# Lecture 13

## 1. Statistical Abuses

### 1.1. Anscombe’s Quartet

- Summary statistics for the four data sets are identical:
  - Mean x = 9.0
  - Mean y = 7.5
  - Variance of x = 10.0
  - Variance of y = 3.75
  - Linear regression model: y = 0.5x + 3

- Are the four data sets really similar? No: plotted, they look very different.
- Moral:
**Statistics about the data are not the same as the data.**
**Use visualization tools to look at the data itself.**
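
A minimal Python sketch that checks the summary statistics above. It assumes the standard published values of Anscombe's quartet and uses population variance (dividing by n, not n - 1), which is what gives 10.0 and 3.75; the helper names are illustrative. Up to rounding, all four data sets produce the same numbers, even though their scatter plots look nothing alike.

```python
"""Summary statistics for Anscombe's quartet (standard published values)."""

X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III

QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def population_variance(values):
    """Average squared deviation from the mean (divides by n, not n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def summarize(xs, ys):
    a, b = fit_line(xs, ys)
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": population_variance(xs),
        "var_y": population_variance(ys),
        "slope": a,
        "intercept": b,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean x={s['mean_x']:.2f} mean y={s['mean_y']:.2f} "
              f"var x={s['var_x']:.2f} var y={s['var_y']:.2f} "
              f"fit y={s['slope']:.2f}x+{s['intercept']:.2f}")
```

Identical numbers, different data: plotting each pair of lists (for example as a scatter plot) is what reveals the difference.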

### 1.2. Lying with Pictures

- The same data can be plotted to tell the truth or to mislead, depending on how the axes are drawn
- Moral:
**Look carefully at the axis labels and scales.**
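
One common way a picture misleads is a value axis that does not start at zero. The Python sketch below assumes a bar chart, uses made-up values 3.5 and 3.6, and a hypothetical helper `apparent_ratio`; it computes how much taller one bar is drawn than the other. With the axis at zero the taller bar is about 3% taller; with the axis starting at 3.4 it is drawn twice as tall.

```python
def apparent_ratio(shorter, taller, axis_min=0.0):
    """Ratio of drawn bar heights when the value axis starts at axis_min.

    A bar for value v is drawn with height (v - axis_min), so raising
    axis_min above zero inflates how different two bars look.
    """
    if not axis_min < shorter <= taller:
        raise ValueError("expected axis_min < shorter <= taller")
    return (taller - axis_min) / (shorter - axis_min)


if __name__ == "__main__":
    a, b = 3.5, 3.6  # hypothetical values for two groups
    print(f"true ratio:               {apparent_ratio(a, b):.3f}")
    print(f"drawn ratio, axis at 3.4: {apparent_ratio(a, b, axis_min=3.4):.3f}")
```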

### 1.3. GIGO (Garbage In, Garbage Out)

- Moral:
**Analysis of bad data can lead to dangerous conclusions.**
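
A small illustration of garbage in, garbage out, as a Python sketch under an assumed convention: missing readings are recorded with the sentinel value -999 (the constant `MISSING` and the readings are hypothetical). Averaging the raw list treats the sentinel as a real measurement and yields a meaningless result; the analysis is only as good as the data fed into it.

```python
MISSING = -999  # hypothetical sentinel meaning "no reading recorded"


def naive_mean(values):
    """Mean of every entry, including any sentinel values (garbage in)."""
    return sum(values) / len(values)


def clean_mean(values, missing=MISSING):
    """Mean of the values, ignoring entries equal to the missing-data sentinel."""
    valid = [v for v in values if v != missing]
    if not valid:
        raise ValueError("no valid data")
    return sum(valid) / len(valid)


if __name__ == "__main__":
    temps = [72, 68, MISSING, 75, 70]  # made-up readings
    print(naive_mean(temps))  # sentinel treated as a real temperature
    print(clean_mean(temps))
```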

### 1.4. Non-representative Sampling

- “Convenience sampling” is usually not random, e.g.:
  - Survivor bias, e.g., course evaluations collected at the end of a course, or grading the final exam in 6.00.2x on a curve
  - Non-response bias, e.g., opinion polls conducted by mail or online

- Moral:
**Understand how data was collected, and whether assumptions used in the analysis are satisfied. If not, be wary.**
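
Survivor bias can be simulated directly. The Python sketch below rests on a hypothetical model, not real course data: satisfaction is uniform on 1 to 10, unhappy students (below 5) drop with probability 0.7, and only those who remain answer the end-of-course evaluation. The respondents' average comes out noticeably higher than the average over everyone who enrolled (about 6.7 versus 5.5 in expectation under this model).

```python
import random


def simulate_course(n_students=10_000, seed=0):
    """Return (mean satisfaction of all students, mean among end-of-course respondents).

    Hypothetical model: each student's satisfaction is uniform on 1..10, and a
    student with satisfaction below 5 drops the course with probability 0.7.
    Only students who remain fill out the end-of-course evaluation.
    """
    rng = random.Random(seed)
    everyone = [rng.randint(1, 10) for _ in range(n_students)]
    # Satisfied students always stay; unhappy ones stay with probability 0.3.
    survivors = [s for s in everyone if s >= 5 or rng.random() >= 0.7]
    return sum(everyone) / len(everyone), sum(survivors) / len(survivors)


if __name__ == "__main__":
    all_mean, respondent_mean = simulate_course()
    print(f"all enrolled students:     {all_mean:.2f}")
    print(f"end-of-course respondents: {respondent_mean:.2f}")
```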

### 1.5. A Comforting Statistic?

- 99.8% of the firearms in the U.S. will not be used to commit a violent crime in any given year
- How many privately owned firearms are there in the U.S.?
  - About 300,000,000
  - 300,000,000 × 0.002 = 600,000 firearms used to commit a violent crime each year

- Moral:
**Context matters. A number means little without context.**

### 1.6. Relative to What?

- Consider drugs X and Y for treating acne:
  - X cures acne twice as well as Y
  - X kills twice as many acne patients as Y

- Do you want to take X or Y?
- Suppose Y kills 0.00001% of patients and cures 50% of them; then X kills 0.00002% and cures 100%

- Moral:
**Beware of percentages when you don’t know the baseline**
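
Converting the relative statements into absolute counts makes the baseline visible. The Python sketch below uses exact fractions, takes the rates from the example above, and reads "twice as well" as doubling the cure rate (an assumption, so X cures 100%); `per_patients` is a hypothetical helper. Per 10 million patients, X causes one extra death while curing 5 million more people.

```python
from fractions import Fraction

# Rates from the example: Y kills 0.00001% of patients and cures 50%;
# X is twice as good at curing and twice as deadly.
Y_DEATH = Fraction("0.00001") / 100  # 0.00001% = 1 in 10,000,000
Y_CURE = Fraction(1, 2)
X_DEATH = 2 * Y_DEATH
X_CURE = 2 * Y_CURE


def per_patients(rate, n):
    """Expected number of patients out of n affected at the given rate."""
    return rate * n


if __name__ == "__main__":
    n = 10_000_000
    print("Deaths per 10M: X =", per_patients(X_DEATH, n), " Y =", per_patients(Y_DEATH, n))
    print("Cures per 10M:  X =", per_patients(X_CURE, n), " Y =", per_patients(Y_CURE, n))
```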

### 1.7. Lurking Variable

- Does going to school contribute to the spread of flu?
- Moral:
**Establishing causation**
  - Attempt to control for all variables other than the variables of interest
  - Randomized controlled studies are the gold standard:
    - Start with a population
    - Randomly assign members to either
      - a control group, or
      - a treatment group
    - Treat the two groups identically except with respect to the one thing being evaluated
    - Very hard to do in practice
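
The random-assignment step can be sketched in Python. This is a minimal illustration, not a full study design: `randomly_assign` is a hypothetical helper, and each member's age stands in for a possible lurking variable. Because group membership is decided by a shuffle, the two groups end up with nearly the same average age without anyone controlling for age explicitly; in expectation the same holds for variables nobody measured.

```python
import random


def randomly_assign(population, seed=None):
    """Split population into (control, treatment) groups uniformly at random."""
    rng = random.Random(seed)
    shuffled = list(population)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical population: each member has an age, a possible lurking variable.
    rng = random.Random(1)
    ages = [rng.randint(5, 80) for _ in range(10_000)]
    control, treatment = randomly_assign(ages, seed=2)
    print(f"mean age, control:   {mean(control):.1f}")
    print(f"mean age, treatment: {mean(treatment):.1f}")
```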