STAT GU4206/GR5206 Lab 5 solved

$35.00

Category: You will receive a download link of the .ZIP file upon Payment

Description

5/5 - (1 vote)

Goal
The goal of this lab is to investigate the empirical behavior of a common hypothesis testing procedure through
simulation using R. We consider the traditional two-sample t-test.
Two-Sample T-Test
Consider an experiment testing if a 35 year old male’s heart rate statistically differs between a control group
and a dosage group. Let X denote the control group and let Y denote the drug group. One common method
used to solve this problem is the two-sample t-test. The null hypothesis for this study is:
H0 : µ1 − µ2 = ∆0,
where ∆0 is the hypothesized value. The assumptions of the two-sample t-test follow below:
Assumptions
1. X1, X2, . . . , Xm is a random sample from a normal distribution with mean µ1 and variance σ
2
1
.
2. Y1, Y2, . . . , Yn is a random sample from a normal distribution with mean µ2 and variance σ
2
2
.
3. The X and Y samples are independent of one another.
Procedure
The test statistic is
tcalc =
x¯q
− y¯ − ∆0
s
2
1
m +
s
2
2
n
,
where x, ¯ y¯ are the respective sample means and s
2
1
, s2
2 are the respective sample standard deviations.
1
The approximate degrees of freedom is
df =

s
2
1
m +
s
2
2
n
2
(s
2
1
/m)
2
m−1 +
(s
2
2
/n)
2
n−1
Under the null hypothesis, tcalc (or Tcalc) has a student’s t-distribution with df degrees of freedom.
Rejection rules
Alternative Hypothesis P-value calculation
HA : µ1 − µ2 > ∆0 (upper-tailed) P(tcalc > T)
HA : µ1 − µ2 < ∆0 (lower-tailed) P(tcalc < T)
HA : µ1 − µ2 6= ∆0 (two-tailed) 2 ∗ P(|tcalc| > T)
Reject H0 when:
P value ≤ α
Tasks
1) Using the R function t.test, run the two sample t-test on the following simulated dataset. Note that
the t.test function defaults a two-tailed alternative. Also briefly interpret the output.
set.seed(5)
sigma=5
Control <- rnorm(30,mean=10,sd=sigma)
Dosage <- rnorm(35,mean=12,sd=sigma)
#t.test()
2) Write a function called t.test.sim that simulates R different samples of X for control and R different
samples of Y for the drug group and computes the proportion of test statistics that fall in the rejection
region. The function should include the following:
• Inputs:
– R is the number of simulated data sets (simulated test statistics). Let R have default 10,000.
– Parameters mu1, mu2, sigma1 and sigma2 which are the respective true means and true
standard deviations of X & Y . Let the parameters have respective defaults mu1=0, mu2=0,
sigma1=1 and sigma2=1.
– Sample sizes n and m defaulted at m=n=30.
– level is the significance level as a decimal with default at α = .05.
– value is the hypothesized value defaulted at 0.
• The output should be a list with the following labeled elements:
– statistic.list vector of simulated t-statistics (this should have length R).
– pvalue.list vector of empirical p-values (this should have length R).
– rejection.rate is a single number that represents the proportion of simulated test statistics
that fell in the rejection region.
2
I started the function below:
t.test.sim <- function(R=10000,
mu1=0,mu2=0,
sigma1=1,sigma2=1,
m=30,n=30,
level=.05,
value=0,
direction=”Two”) {
#Define empty lists
statistic.list <- rep(0,R)
pvalue.list <- rep(0,R)
#for (i in 1:R) {
# Sample realized data
#Control <-
#Dosage <-
# Testing values
#testing.procedure <-
#statistic.list[i] <-
#pvalue.list[i] <-
#}
#rejection.rate <-
#return()
}
Evaluate your function with the following inputs R=10,mu1=10,mu2=12,sigma1=5 and sigma2=5.
3) Assuming the null hypothesis
H0 : µ1 − µ2 = 0
is true, compute the empirical size (or rejection rate) using 10,000 simulated data sets. Use the function
t.test.sim to accomplish this task and store the object as sim. Output the empirical size quantity
sim$rejection.rate. Comment on this value. What is it close to?
Note: use mu1=mu2=10 (i.e., the null is true). Also set sigma1=5,sigma2=5 and n=m=30.
4) Plot a histogram of the simulated P-values, i.e., hist(sim$pvalue.list). What is the probability
distribution shown from this histogram? Does this surprise you?
5) Plot a histogram illustrating the empirical sampling distribution of the t-statistic, i.e., hist(sim$statistic.list,probability
=TRUE). What is the probability distribution shown from this histogram?
6) Run the following four lines of code:
t.test.sim(R=1000,mu1=10,mu2=10,sigma1=5,sigma2=5)$rejection.rate
t.test.sim(R=1000,mu1=10,mu2=12,sigma1=5,sigma2=5)$rejection.rate
t.test.sim(R=1000,mu1=10,mu2=14,sigma1=5,sigma2=5)$rejection.rate
t.test.sim(R=1000,mu1=10,mu2=16,sigma1=5,sigma2=5)$rejection.rate
Comment on the results.
3
7) Run the following four lines of code:
t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=10,n=10)$rejection.rate
t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=30,n=30)$rejection.rate
t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=50,n=50)$rejection.rate
t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=100,n=100)$rejection.rate
Comment on the results.
8) Extra credit: Modify the t.test.sim() function to investigate how the power and size behave in the
presence of heavy tailed data, i.e., investigate how robust the t-test is in the presence of violations
from normality.
Hint: The Cauchy distribution and the students’ t-distribution with low df are both heavy tailed.
4