## Question 1

Consider the dataset provided in data.csv. These data include two variables: V1 and V2.

a) Make a scatter plot comparing V1 and V2. Does it appear that V1 and V2 are positively correlated?
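
For part a), a minimal *R* sketch follows. It assumes data.csv has a header row naming the columns V1 and V2. The helper name plot_v1_v2 is hypothetical and not part of the assignment.

```r
# Hypothetical helper: scatter plot of V2 against V1.
# Assumes `dat` is a data frame with numeric columns V1 and V2,
# e.g. the result of read.csv("data.csv").
plot_v1_v2 <- function(dat) {
  plot(dat$V1, dat$V2,
       xlab = "V1", ylab = "V2",
       main = "Scatter plot of V2 versus V1",
       pch = 19)  # solid points
  invisible(dat)  # return the data unchanged
}
```

With data.csv in the working directory, `plot_v1_v2(read.csv("data.csv"))` draws the plot. A cloud of points trending upward from left to right would suggest positive correlation.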

(4 points)

b) Compute the Pearson correlation coefficient between V1 and V2, along with the *p*-value for testing whether this correlation differs from zero (you may use the cor.test and/or cor functions in *R*).
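
A sketch for part b) in *R* follows. It assumes data.csv has been read into a data frame with columns V1 and V2, and the helper name pearson_summary is hypothetical. In this sketch, cor gives the point estimate and cor.test gives the two-sided test of zero correlation, which is its default alternative. The alpha = 0.05 threshold in the returned list is the *α*-level from part c).

```r
# Hypothetical helper: Pearson correlation of V1 and V2 with its p-value.
# Assumes `dat` is a data frame with numeric columns V1 and V2,
# e.g. dat <- read.csv("data.csv").
pearson_summary <- function(dat, alpha = 0.05) {
  r <- cor(dat$V1, dat$V2, method = "pearson")        # point estimate
  ct <- cor.test(dat$V1, dat$V2, method = "pearson")  # H0: rho = 0, two-sided
  list(
    r = r,
    p_value = ct$p.value,
    reject_h0 = ct$p.value < alpha  # TRUE if the p-value is below alpha
  )
}
```

Calling `pearson_summary(read.csv("data.csv"))` returns the coefficient, its *p*-value and the rejection decision. The sign of `r` answers the first question in part c).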

(3 points)

c) Is the correlation coefficient you reported in part b) positive or negative? At an *α*-level of 0.05, does the *p*-value lead you to reject the null hypothesis of no correlation? Does this agree or disagree with your observation in part a)? Why or why not?

(3 points)

## Question 2

Consider the following contingency table describing smoking and cancer status among ship builders and non-ship builders (discussed in Lecture 18).


STAT305/605 E100 Fall 2020 SFU Due December 8 5PM PST

a) Collapse the above contingency table over the smoking levels to provide a 2×2 contingency table with ship building status along the columns and cancer status along the rows.
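
The table itself is not reproduced here, so the following *R* sketch only shows the mechanics of collapsing. It assumes the table is entered as a 3-D array with dimensions ordered as cancer status, ship building status, then smoking status. That ordering and the name collapse_over_smoking are assumptions.

```r
# Hypothetical helper: collapse a 3-way table over smoking status.
# Assumed layout: tab[cancer, ship_building, smoking]
# (rows = cancer status, columns = ship building status, strata = smoking).
collapse_over_smoking <- function(tab) {
  stopifnot(length(dim(tab)) == 3)
  # Sum over the third (smoking) dimension, keeping rows and columns.
  margin.table(tab, c(1, 2))
}
```

To use it, enter the Lecture 18 counts with `array(counts, dim = c(2, 2, k), dimnames = ...)`, where `k` is the number of smoking levels. Note that `array` fills its entries in column-major order.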

(3 points)

b) Based on the 2×2 contingency table from part a), is cancer associated with ship building? You may use Fisher's exact test (the fisher.test command in *R*) or a chi-squared test (the chisq.test command in *R*). Throughout this problem, assess significance at the nominal level, without adjusting for multiple testing.
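
A minimal *R* sketch for part b) follows. It assumes the collapsed table is a 2×2 matrix with cancer status in the rows and ship building status in the columns. The helper name and the default alpha = 0.05 are assumptions, because the assignment asks for nominal significance without restating the level here. By default, chisq.test applies Yates' continuity correction to 2×2 tables.

```r
# Hypothetical helper: test association in a 2x2 table two ways.
test_association_2x2 <- function(ct, alpha = 0.05) {
  stopifnot(all(dim(ct) == c(2, 2)))
  fisher <- fisher.test(ct)  # exact test; also reports a conditional MLE odds ratio
  chisq <- chisq.test(ct)    # Pearson chi-squared, Yates-corrected by default for 2x2
  list(
    fisher_p = fisher$p.value,
    odds_ratio = unname(fisher$estimate),
    chisq_p = chisq$p.value,
    fisher_significant = fisher$p.value < alpha
  )
}
```

Fisher's test is the safer choice when some expected counts are small. In that situation, chisq.test warns that its chi-squared approximation may be incorrect.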

(3 points)

c) After stratifying by smoking status, is ship building still associated, no longer associated, never associated, or now associated with cancer? You may use a Mantel-Haenszel test (*R* function mantelhaen.test).
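
For part c), mantelhaen.test accepts the full 3-D array directly and treats its last dimension as the strata. This sketch assumes a cancer × ship building × smoking ordering with smoking as the last dimension. The helper name and the default alpha = 0.05 are hypothetical.

```r
# Hypothetical helper: Mantel-Haenszel test stratified by smoking.
# Assumed layout: tab[cancer, ship_building, smoking], smoking = strata.
stratified_association <- function(tab, alpha = 0.05) {
  stopifnot(length(dim(tab)) == 3)
  mh <- mantelhaen.test(tab)  # continuity-corrected by default for 2 x 2 x K
  list(
    mh_p = mh$p.value,
    common_odds_ratio = unname(mh$estimate),  # reported for 2 x 2 x K tables
    mh_significant = mh$p.value < alpha
  )
}
```

Comparing `mh_p` with the *p*-value from the collapsed table shows whether the association persists, disappears, or appears once smoking is accounted for.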

(2 points)

d) Interpret the above results. For example, could ship building cause cancer, possibly through exposure to hazardous materials and insufficient personal protective equipment (PPE)? Or could ship builders be more likely to smoke than people outside that profession, with smoking leading to cancer? Or is there no association with cancer in either analysis? Or is there an association in both? Answer in at most two sentences.

(2 points)