Regression HW 3

Advanced Regression Analysis
Author

SEOYEON CHOI

Published

November 21, 2022

고급회귀분석 과제, CH03,07


고급회귀분석 세번째 과제입니다.

제출 기한 : 11월 21일

제출 방법

(pdf 아닌 문서는 미제출로 간주)

주의사항

예) R에서 lm으로 beta의 추정량을 구하면 안 됨. 수업 시간에 배운 식으로 풀이를 적어야 함.

************ R을 이용해서 푸는 문제는, R 코드도 같이 업로드.


1.

어떤 큰 공장에서 동일한 기계들의 정비기록에 관한 표본자료를 취하였다. 이는 기계의 사용연도(age of machines)와 정비비용(maintenance cost) 간에 어떤 관계가 있는가를 밝혀내기 위한 것이다. 그 자료는 다음과 같다 (표본의 크기 \(n = 14\)).

사용연도\(X\)(단위:년) 정비비용\(Y\)(단위:1,000원)
3 39
1 24
5 115
8 105
1 50
4 86
2 67
6 90
9 140
3 112
5 70
7 186
2 43
6 126
dt <- data.frame(y=c(39,24,115,105,50,86,67,90,140,112,70,186,43,126), 
                 x=c(3,1,5,8,1,4,2,6,9,3,5,7,2,6))
x <- matrix(c(rep(1,14), dt$x), ncol=2)
y <- matrix(c(dt$y), ncol=1)

(1)

\(X^\top X, X^\top y, y^\top y\)\((X^\top X)^{-1}\)을 구하시오.

xx <- t(x) %*% x
xx
A matrix: 2 × 2 of type dbl
14 62
62 360
xy <- t(x) %*% y
xy
A matrix: 2 × 1 of type dbl
1253
6714
yy <- t(y) %*% y
yy
A matrix: 1 × 1 of type dbl
138197
xx_i <- solve(xx)
xx_i
A matrix: 2 × 2 of type dbl
0.30100334 -0.05183946
-0.05183946 0.01170569

(2)

\(\hat{\beta} = (X^\top X)^{-1}X^\top y\)를 구하고, 적합된 회귀선형을 써 보아라.

betahat <- xx_i %*%  xy
betahat
A matrix: 2 × 1 of type dbl
29.10702
13.63712

\(y = 29.1 + 13.6x\)

coef(lm(y~x, dt))
(Intercept)
29.1070234113712
x
13.6371237458194

(3)

\(\sigma^2\)을 MSE 로 추정할 경우, \(\hat{\beta}\)의 분산-공분산행렬의 추정은 \(\hat{Var}(\hat{\beta}) = (X^\top X)^{-1}\)(MSE)이다. 먼저 분산분석하여 MSE 를 구하고 \(\hat{Var}(\hat{\beta})\)을 구하시오

n <- 14
p <- 1
In = diag(rep(1,n))
H = x %*% xx_i %*% t(x)

\(SSE = y^\top(I-H)y\)

\(SST = y^\top y - n(\bar{y})^2\)

sse = t(y) %*% (In-H) %*% y
sse
A matrix: 1 × 1 of type dbl
10166.25
sst = t(y) %*% y - n * mean(y)^2
sst
A matrix: 1 × 1 of type dbl
26053.5
ssr = sst- sse
ssr
A matrix: 1 × 1 of type dbl
15887.25
msr = ssr/p
msr
A matrix: 1 × 1 of type dbl
15887.25
mse = sse/(n-p-1)
mse
A matrix: 1 × 1 of type dbl
847.1876
f0 = msr/mse
f0
A matrix: 1 × 1 of type dbl
18.75293
qf(0.95,p,n-p-1)
4.74722534672251

분산분석표

요인 제곱합 자유도(df) 평균제곱(MS) \(F_0\) 유의확률
회귀 15887.25 1 15887.25 18.75293 4.747225
잔차 10166.25 12 847.19
26053.5 13
hat_var_betahat <- mse[,] * xx_i
vcov(lm(y~x, dt))
A matrix: 2 × 2 of type dbl
(Intercept) x
(Intercept) 255.00629 -43.917750
x -43.91775 9.916911

2.

어떤 공정에서 나오는 제품의 강도\((kg/cm^2)\)가 그 공정의 온도와 압력에 어떠한 영향을 받는가를 조사하기 위하여 다음의 데이터를 얻었다.

공정온도\(x_1\)(단위:°C) 공정압력\(x_2\)(단위:\(psi\)) 강도\(y\)(단위:\(kg/cm^2\))
195 57 81.4
179 61 122.2
205 60 101.7
204 62 175.6
201 61 150.3
184 54 64.8
210 58 92.1
209 61 113.8
dt <- data.frame(y=c(81.4,122.2,101.7,175.6,150.3,64.8,92.1,113.8),
                 x1 = c(195,179,205,204,201,184,210,209),
                 x2 = c(57,61,60,62,61,54,58,61))
m2 <- lm(y~., dt)
summary(m2)

Call:
lm(formula = y ~ ., data = dt)

Residuals:
      1       2       3       4       5       6       7       8 
 -5.259 -14.775 -18.742  31.259  17.279  11.744  -3.723 -17.783 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -554.3112   196.9775  -2.814   0.0374 *
x1            -0.1797     0.7626  -0.236   0.8230  
x2            11.8600     3.2301   3.672   0.0144 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.64 on 5 degrees of freedom
Multiple R-squared:  0.7478,    Adjusted R-squared:  0.6469 
F-statistic: 7.412 on 2 and 5 DF,  p-value: 0.03195
anova(m2)
A anova: 3 × 5
Df Sum Sq Mean Sq F value Pr(>F)
<int> <dbl> <dbl> <dbl> <dbl>
x1 1 628.467 628.4670 1.342371 0.29894191
x2 1 6311.728 6311.7278 13.481505 0.01441752
Residuals 5 2340.884 468.1768 NA NA

(1)

선형회귀모형, \(y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j\) 가 성립된다고 가정하고 데이터로부터 회귀모형을 추정하시오.

x = matrix(c(rep(1,8),dt$x1, dt$x2), ncol=3)
y = dt$y
xx <- t(x) %*% x
xx
A matrix: 3 × 3 of type dbl
8 1587 474
1587 315745 94108
474 94108 28136
xx_i <- solve(xx)
xx_i
A matrix: 3 × 3 of type dbl
82.8749894 -0.134598917 -0.945973490
-0.1345989 0.001242266 -0.001887521
-0.9459735 -0.001887521 0.022285408
xy <- t(x) %*% y
xy
A matrix: 3 × 1 of type dbl
901.9
179676.4
54034.3
yy <- t(y) %*% y
yy
A matrix: 1 × 1 of type dbl
110959
b <- xx_i %*% xy
b
A matrix: 3 × 1 of type dbl
-554.3112232
-0.1797396
11.8599927

\(y = -554.31 - 0.1797 x_1 + 11.86 x_2\)

(2)

오차분산이 \(\sigma^2 = 3\)이라 하면, \(Var(\hat{\beta}_0), Var(\hat{\beta}_1), Var(\hat{\beta}_2)\)\(Cov(\hat{\beta}_1, \hat{\beta}_2)\)는 무엇인가?

\(var(\hat{\beta}) = (x^\top x)^{-1} \sigma^2\)

xx_i * 3
A matrix: 3 × 3 of type dbl
248.6249683 -0.403796751 -2.837920471
-0.4037968 0.003726798 -0.005662562
-2.8379205 -0.005662562 0.066856223

\(var(\hat{\beta}_0) = 248.6250\)

\(var(\hat{\beta}_1) = 0.0037\)

\(var(\hat{\beta}_2) = 0.0669\)

\(cov(\hat{\beta}_1, \hat{\beta}_2) = -0.0057\)

(3)

\(x_1 = 200\)°C이고 \(x_2 = 59\text{psi}\)에서 평균 제품의 강도의 추정값 \(\hat{y}\)는 얼마인가? 이의 분산을 \(\sigma^2 = 3\)이라 가정하고 구하시오.

new_dt <- data.frame(x1=200, x2=59)
new_dt
A data.frame: 1 × 2
x1 x2
<dbl> <dbl>
200 59
predict(m2, newdata = new_dt)
1: 109.480424963516
x0 <- c(1,200,59)
x0
  1. 1
  2. 200
  3. 59
mu0 <- x0 %*% b
mu0
A matrix: 1 × 1 of type dbl
109.4804

\(\hat{\mu}_0 = 109.4804\)

(4)

추정된 회귀계수 \(\hat{\beta}_1, \hat{\beta}_2\)의 의미는 무엇인가?

\(\hat{\beta}_1\) : 공정압력이 일정하게 유지되었을 때, 공정온도가 1도씨 증가하면 강도는 0.1797만큼 감소한다.

\(\hat{\beta}_2\) : 공정온도가 일정하게 유지되었을 때, 공정압력이 1psi 증가하면 강도는 11.8600만큼 증가한다.

(5)

분산분석표를 작성하고 \(\alpha = 0.05\)\(F\)-검정을 행하시오.

anova(m2)
A anova: 3 × 5
Df Sum Sq Mean Sq F value Pr(>F)
<int> <dbl> <dbl> <dbl> <dbl>
x1 1 628.467 628.4670 1.342371 0.29894191
x2 1 6311.728 6311.7278 13.481505 0.01441752
Residuals 5 2340.884 468.1768 NA NA
I8 = diag(rep(1,8))
I8
A matrix: 8 × 8 of type dbl
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1
H = x %*% xx_i %*% t(x)
H
A matrix: 8 × 8 of type dbl
0.223303342 0.0473478224 0.0925307250 0.004932881 0.04854185 0.354021685 0.18034566 0.04897604
0.047347822 0.7875815614 0.0003377034 0.178850120 0.18539614 0.121730006 -0.28766297 -0.03358038
0.092530725 0.0003377034 0.1733021360 0.174906227 0.15025388 0.004944942 0.19895553 0.20476885
0.004932881 0.1788501200 0.1749062270 0.274444297 0.21838554 -0.166837528 0.08255641 0.23276205
0.048541845 0.1853961381 0.1502538806 0.218385537 0.18446745 -0.053127978 0.08195337 0.18412975
0.354021685 0.1217300062 0.0049449423 -0.166837528 -0.05312798 0.711046519 0.14493505 -0.11671270
0.180345664 -0.2876629720 0.1989555317 0.082556415 0.08195337 0.144935052 0.38255762 0.21635932
0.048976035 -0.0335803794 0.2047688541 0.232762052 0.18412975 -0.116712699 0.21635932 0.26329707

\(SSE = y^\top (I-H)y\)

\(SST = y^\top y - n(\bar{y})^2\)

sse = t(y) %*% (I8-H) %*% y
sse
A matrix: 1 × 1 of type dbl
2340.884
sst = t(y) %*% y - 8 * mean(y)^2
sst
A matrix: 1 × 1 of type dbl
9281.079
ssr = sst- sse
ssr
A matrix: 1 × 1 of type dbl
6940.195
msr = ssr/2
msr
A matrix: 1 × 1 of type dbl
3470.097
mse = sse/5
mse
A matrix: 1 × 1 of type dbl
468.1768
f0 = msr/mse
f0
A matrix: 1 × 1 of type dbl
7.411938
qf(0.95,2,5)
5.78613504334997
요인 제곱합 자유도(df) 평균제곱(MS) \(F_0\) 유의확률
회귀 6940.195 2 3470.097 7.4119 5.786135
잔차 2340.884 5 468.177
9281.079 7

결론 : 검정통계량의 관측값이 기각역에 속하므로 귀무가설 기각. 즉 모형이 유의수준 0.05하에서 유의하다고 할 수 있다.

(6)

결정계수 \(R^2\)을 구하시오.

summary(m2)$r.squared
0.747778895028607

\(R^2 = \frac{SSR}{SST}\)

ssr/sst
# 0.7478
A matrix: 1 × 1 of type dbl
0.7477789

(7)

수정된 결정계수 \(R^2_{adj}\) 을 구하시오.

summary(m2)$adj
0.64689045304005

\(R^2_{adj} = 1-\frac{SSE(n-p-1)}{SST/(n-1)}\)

1-(sse/5)/(sst/7)
# 0.6469
A matrix: 1 × 1 of type dbl
0.6468905

(8)

\(\sigma^2\)의 추정값 MSE 를 구하시오.

\(\sigma^2 = MSE\)

mse
##468.17687
A matrix: 1 × 1 of type dbl
468.1768

(9)

MSR의 기대값을 \(\sigma^2\)\(\beta_0, \beta_1, \beta_2\)의 함수로 표시하여라.

I8 <- diag(rep(1,8))
I8
A matrix: 8 × 8 of type dbl
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1
one1 <- rep(1,8)
one1
  1. 1
  2. 1
  3. 1
  4. 1
  5. 1
  6. 1
  7. 1
  8. 1
Jn <- one1 %*% t(one1)
Jn
A matrix: 8 × 8 of type dbl
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
t(x) %*% (I8 - Jn/8) %*% x
A matrix: 3 × 3 of type dbl
0 0.000 0.00
0 923.875 78.25
0 78.250 51.50

3.

어떤 공장에서 물의 소비량을 조사하기 위하여 매달의 물소비량(\(y\)), 평균온도(\(x_1\)), 작업일수(\(x_2\))와 작업량(\(x_3\))에 관한 데이터를 얻었다.

물소비량\(y\)(단위:1,000톤) 평균온도\(x_1\)(단위:°C) 작업일수\(x_2\)(단위:일) 작업량\(x_3\)(단위:1,000톤)
2.8 10 27 64
3.9 24 26 72
3.9 25 28 80
4.4 28 26 88
3.1 15 30 81
3.1 18 24 45
3.5 22 27 46
3.6 22 25 69
3.0 12 27 54
3.3 15 25 39
dt <- data.frame(y=c(2.8,3.9,3.9,4.4,3.1,3.1,3.5,3.6,3.0,3.3),
                 x1 = c(10,24,25,28,15,18,22,22,12,15),
                 x2 = c(27,26,28,26,30,24,27,25,27,25),
                 x3 = c(64,72,80,88,81,45,46,69,54,39))
m3 <- lm(y~., dt)
summary(m3)

Call:
lm(formula = y ~ ., data = dt)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23490 -0.07744 -0.02166  0.08840  0.23442 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  2.409213   1.125954   2.140  0.07618 . 
x1           0.069788   0.012640   5.521  0.00149 **
x2          -0.024767   0.044830  -0.552  0.60060   
x3           0.005864   0.005052   1.161  0.28978   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.172 on 6 degrees of freedom
Multiple R-squared:  0.9202,    Adjusted R-squared:  0.8803 
F-statistic: 23.05 on 3 and 6 DF,  p-value: 0.001079
anova(m3)
A anova: 4 × 5
Df Sum Sq Mean Sq F value Pr(>F)
<int> <dbl> <dbl> <dbl> <dbl>
x1 1 2.004315887 2.004315887 67.73858599 0.0001737553
x2 1 0.002273071 0.002273071 0.07682153 0.7909535228
x3 1 0.039877142 0.039877142 1.34770234 0.2897755827
Residuals 6 0.177533900 0.029588983 NA NA

(1)

데이터로부터 회귀모형, \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3\) 을 구하여라. 그리고 어째서 이 모형이 선택되었는가에 대하여 토의하시오.

x = matrix(c(rep(1,10),dt$x1, dt$x2, dt$x3), ncol=4)
y = dt$y
xx <- t(x) %*% x
xx
A matrix: 4 × 4 of type dbl
10 191 265 638
191 3971 5047 12620
265 5047 7049 17038
638 12620 17038 43324
xx_i <- solve(xx)
xx_i
A matrix: 4 × 4 of type dbl
42.8460778 -0.274517258 -1.666784957 0.1044984818
-0.2745173 0.005399764 0.009802163 -0.0013851965
-1.6667850 0.009802163 0.067921641 -0.0050213139
0.1044985 -0.001385196 -0.005021314 0.0008624387
xy <- t(x) %*% y
xy
A matrix: 4 × 1 of type dbl
34.6
686.3
916.0
2249.9
yy <- t(y) %*% y
yy
A matrix: 1 × 1 of type dbl
121.94
b <- xx_i %*% xy
b
A matrix: 4 × 1 of type dbl
2.409213010
0.069788005
-0.024766597
0.005864434

\(\hat{y} = 2.409 + 0.0698x_1 - 0.0248x_2 + 0.0059x_3\)

(2)

\(\hat{\beta}_1, \hat{\beta}_2\)\(\hat{\beta}_3\)의 의미는 무엇인가?

\(\hat{\beta}_1\) : 작업일수, 작업량이 변하지 않을 때, 평균온도가 1도씨 증가하면 물소비량은 69.8톤 증가한다.

\(\hat{\beta}_2\) : 평균온도, 작업량이 변하지 않을 때, 작업일수가 1일 증가하면 물소비량은 24.8톤 감소한다.

\(\hat{\beta}_3\) : 평균온도, 작업일수이 변하지 않을 때, 작업량이 1000톤 증가하면 물소비량은 5.9톤 증가한다.

(3)

\(Var(\hat{\beta}_3)\)을 구하시오. \(\sigma^2\)을 MSE 로 추정하면 \(\hat{Var}(\hat{\beta}_3)\)은 무엇인가?

summary(m3)$coefficients
A matrix: 4 × 4 of type dbl
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.409213010 1.125953765 2.1397087 0.076181015
x1 0.069788005 0.012640155 5.5211353 0.001485354
x2 -0.024766597 0.044830038 -0.5524554 0.600595338
x3 0.005864434 0.005051602 1.1609058 0.289775583

\(var(\beta_3) = (x^\top x)^{-1}_{(4,4)} \sigma^2\)

xx_i[4,4]
## 0.0009*sigma^2
0.000862438702127396
hat_var_b3 <- xx_i[4,4] * mse
hat_var_b3
A matrix: 1 × 1 of type dbl
0.4037738

\(var(\hat{\beta}_3) = 0.0000255\)

(4)

분산분석표를 작성하고, 결정계수 \(R^2\)을 구하시오.

anova(m3)
A anova: 4 × 5
Df Sum Sq Mean Sq F value Pr(>F)
<int> <dbl> <dbl> <dbl> <dbl>
x1 1 2.004315887 2.004315887 67.73858599 0.0001737553
x2 1 0.002273071 0.002273071 0.07682153 0.7909535228
x3 1 0.039877142 0.039877142 1.34770234 0.2897755827
Residuals 6 0.177533900 0.029588983 NA NA
I10 = diag(rep(1,10))
I10
A matrix: 10 × 10 of type dbl
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1
H = x %*% xx_i %*% t(x)
H
A matrix: 10 × 10 of type dbl
0.479007503 -0.005805719 -0.08066882 -0.01958705 0.26225012 0.09655116 -0.23996631 0.10866366 0.28730870 0.11224675
-0.005805719 0.186448902 0.16479959 0.27196857 0.01139329 0.10399383 0.06627062 0.18477003 -0.01335934 0.02952023
-0.080668821 0.164799589 0.33179614 0.23355629 0.24558622 -0.08987596 0.23998739 0.04338569 -0.01509426 -0.07347228
-0.019587049 0.271968566 0.23355629 0.48736361 0.01178437 0.05219047 -0.10292798 0.28777554 -0.11085118 -0.11127263
0.262250120 0.011393292 0.24558622 0.01178437 0.58743384 -0.22034983 0.06947229 -0.08527926 0.20954991 -0.09184095
0.096551163 0.103993827 -0.08987596 0.05219047 -0.22034983 0.36048096 0.08217297 0.20539705 0.10911272 0.30032662
-0.239966310 0.066270622 0.23998739 -0.10292798 0.06947229 0.08217297 0.69646084 -0.11029224 0.06925883 0.22956358
0.108663660 0.184770026 0.04338569 0.28777554 -0.08527926 0.20539705 -0.11029224 0.27283220 0.01617393 0.07657339
0.287308704 -0.013359337 -0.01509426 -0.11085118 0.20954991 0.10911272 0.06925883 0.01617393 0.25886069 0.18903998
0.112246749 0.029520231 -0.07347228 -0.11127263 -0.09184095 0.30032662 0.22956358 0.07657339 0.18903998 0.33931531

\(SSE = y^\top (I-H)y\)

\(SST = y^\top y - n(\bar{y})^2\)

sse = t(y) %*% (I10-H) %*% y
sse
A matrix: 1 × 1 of type dbl
0.1775339
sst = t(y) %*% y - 10 * mean(y)^2
sst
A matrix: 1 × 1 of type dbl
2.224
ssr = sst- sse
ssr
A matrix: 1 × 1 of type dbl
2.046466
msr = ssr/3
msr
A matrix: 1 × 1 of type dbl
0.6821554
mse = sse/6
mse
A matrix: 1 × 1 of type dbl
0.02958898
f0 = msr/mse
f0
A matrix: 1 × 1 of type dbl
23.05437
qf(0.95,3,6)
4.75706266308941
요인 제곱합 자유도(df) 평균제곱(MS) \(F_0\) 유의확률
회귀 2.0465 3 0.6822 23.0544 4.7571
잔차 0.1775 6 0.0296
2.2240 9

\(R^2\)

ssr/sst
#0.9202
A matrix: 1 × 1 of type dbl
0.9201736

(5)

\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 평균 물소비량을 추정하시오. 이 추정된 소비량의 표준편차의 추정값을 구하시오.

new_dt <- data.frame(x1=20, x2=27, x3 = 60)
new_dt
A data.frame: 1 × 3
x1 x2 x3
<dbl> <dbl> <dbl>
20 27 60
predict(m3, newdata = new_dt)
1: 3.48814105556645
x0 <- c(1,20,27,60)
x0
  1. 1
  2. 20
  3. 27
  4. 60
mu0 <- x0 %*% b
mu0
A matrix: 1 × 1 of type dbl
3.488141

\(\hat{\mu}_0 = 3.4881\)

var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0
A matrix: 1 × 1 of type dbl
0.005065205
sqrt(var_mu0)
# se(hat mu0) = 0.0712
A matrix: 1 × 1 of type dbl
0.07117026

(6)

\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 어느 한 달의 물소비량 \(y_s\) 를 예측하여라. 이 예측된 물소비량의 표준편차의 추정값을 구하시오.

x0 <- c(1,20,27,60)
x0
  1. 1
  2. 20
  3. 27
  4. 60
mu0 <- x0 %*% b
mu0
A matrix: 1 × 1 of type dbl
3.488141

\(\hat{y}_s = 3.4881\)

var_ys <- (1+t(x0) %*% xx_i %*% x0) * mse
var_ys
A matrix: 1 × 1 of type dbl
0.03465419
sqrt(var_ys)
# se(hat ys) = 0.1862
A matrix: 1 × 1 of type dbl
0.1861564

(7)

\(y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j\) 라 가정하고 회귀제곱합 SSR의 기대값을 \(\sigma^2\)\(\beta_i, i = 0, 1, 2, 3\)의 함수로 표시하여라.

I10 <- diag(rep(1,10))
I10
A matrix: 10 × 10 of type dbl
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1
one1 <- rep(1,10)
one1
  1. 1
  2. 1
  3. 1
  4. 1
  5. 1
  6. 1
  7. 1
  8. 1
  9. 1
  10. 1
Jn <- one1 %*% t(one1)
Jn
A matrix: 10 × 10 of type dbl
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
t(x) %*% (I10 - Jn/10) %*% x
A matrix: 4 × 4 of type dbl
6.661338e-16 1.160183e-14 1.751377e-14 4.096723e-14
-1.065814e-14 3.229000e+02 -1.450000e+01 4.342000e+02
-1.332268e-14 -1.450000e+01 2.650000e+01 1.310000e+02
-1.421085e-14 4.342000e+02 1.310000e+02 2.619600e+03

(8)

\(\beta_3\)의 95% 신뢰구간을 구하시오. 이 신뢰구간의 의미를 해석하시오.

confint(m3)
A matrix: 4 × 2 of type dbl
2.5 % 97.5 %
(Intercept) -0.345896601 5.16432262
x1 0.038858661 0.10071735
x2 -0.134461748 0.08492855
x3 -0.006496391 0.01822526

\(\beta_3 \pm t_{0.025}(6) se(\hat{\beta}_3)\)

b[4] - qt(0.975,6) * sqrt(hat_var_b3)
A matrix: 1 × 1 of type dbl
-1.548982
b[4] + qt(0.975,6) * sqrt(hat_var_b3)
A matrix: 1 × 1 of type dbl
1.56071
#(-0.0065, 0.0182)

이 신뢰구간이 \(\beta_3\)을 포함하고 있을 것이라고 95% 확신한다.

(9)

\(\beta_1\)의 99% 신뢰구간을 구하시오. 이 신뢰구간의 의미를 해석하시오.

confint(m3, level=0.99)
A matrix: 4 × 2 of type dbl
0.5 % 99.5 %
(Intercept) -1.76517953 6.58360555
x1 0.02292554 0.11665047
x2 -0.19097074 0.14143754
x3 -0.01286402 0.02459289

\(\beta_1 \pm t_{0.005}(6) se(\hat{\beta}_1)\)

hat_var_b1 = xx_i[2,2] * mse
hat_var_b1
A matrix: 1 × 1 of type dbl
0.0001597735
b[2] - qt(0.995,6) * sqrt(hat_var_b1)
A matrix: 1 × 1 of type dbl
0.02292554
b[2] + qt(0.995,6) * sqrt(hat_var_b1)
A matrix: 1 × 1 of type dbl
0.1166505
#(0.0229, 0.1167)

이 신뢰구간이 \(\beta_1\)을 포함하고 있을 것이라고 99% 확신한다.

(10)

\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 평균 물소비량, \(E(y)\)의 95% 신뢰구간을 구하시오.

predict(m3, newdata = new_dt, interval = c("confidence"))
A matrix: 1 × 3 of type dbl
fit lwr upr
1 3.488141 3.313994 3.662288
mu0 - qt(0.975,6) * sqrt(var_mu0)
A matrix: 1 × 1 of type dbl
3.313994
mu0 + qt(0.975,6) * sqrt(var_mu0)
A matrix: 1 × 1 of type dbl
3.662288
#(3.3140, 3.6623)

(11)

가설 \(H_0 : \beta_1 = 0, H_1 : \beta_1 > 0\)\(\alpha = 0.05\)로 검정하시오.

summary(m3)

Call:
lm(formula = y ~ ., data = dt)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23490 -0.07744 -0.02166  0.08840  0.23442 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  2.409213   1.125954   2.140  0.07618 . 
x1           0.069788   0.012640   5.521  0.00149 **
x2          -0.024767   0.044830  -0.552  0.60060   
x3           0.005864   0.005052   1.161  0.28978   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.172 on 6 degrees of freedom
Multiple R-squared:  0.9202,    Adjusted R-squared:  0.8803 
F-statistic: 23.05 on 3 and 6 DF,  p-value: 0.001079

검정통계량 : \(t_0 = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)}\)

t0 <- b[2]/sqrt(hat_var_b1)
t0
A matrix: 1 × 1 of type dbl
5.521135

\(t_0 = 5.5211\)

기각역 : \(t_{0.05}(6) = 1.9432\)

qt(0.95,6)
1.9431802805153

결론 : \(H_0\) 기각 가능. 즉 유의수준 5%에서 \(\beta_2\)는 0보다 크다고 할 수 있다.

\(T = \frac{\hat{\beta}_i - \beta^0_i}{\hat{\sigma} \sqrt{c_{ii}}}\)

(12)

가설 \(H_0 : \beta_1 = \beta_2 = \beta_3\)\(\alpha = 0.05\)로 검정하시오.

library(car) 
Loading required package: carData
linearHypothesis(m3, matrix(c(0,1,-1,0,0,0,1,-1), byrow = T, nrow=2),c(0,0))
A anova: 2 × 6
Res.Df RSS Df Sum of Sq F Pr(>F)
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 8 1.1224764 NA NA NA NA
2 6 0.1775339 2 0.9449425 15.96781 0.003956509

\(H_0 : \beta_1 = \beta_2 = \beta_3\)

\(T= (0,1,-1,0) (0,0,1,-1)\)

\(y= \beta_0 + \beta_1 x_1 + \beta_1 x_2 + \beta_1 x_3 + \epsilon = \beta_0 + \beta_1(x_1+x_2+x_3)\)

z <- matrix(c(rep(1,10), dt$x1+dt$x2+dt$x3), ncol=2)
z
A matrix: 10 × 2 of type dbl
1 101
1 122
1 133
1 142
1 126
1 87
1 95
1 116
1 93
1 79
zz <- t(z) %*% z
zz
A matrix: 2 × 2 of type dbl
10 1094
1094 123754
zz_i <- solve(zz)
zz_i
A matrix: 2 × 2 of type dbl
3.04034002 -0.0268769654
-0.02687697 0.0002456761
zy <- t(z) %*% y
zy
A matrix: 2 × 1 of type dbl
34.6
3852.2
yy <- t(y) %*% y
yy
A matrix: 1 × 1 of type dbl
121.94
z %*% zz_i %*% t(z)
A matrix: 10 × 10 of type dbl
0.11733491 0.073997642 0.051297170 0.03272406 0.065742925 0.146226415 0.12971698 0.08637972 0.133844340 0.162735849
0.07399764 0.139003538 0.173054245 0.20091392 0.151385613 0.030660377 0.05542453 0.12043042 0.049233491 0.005896226
0.05129717 0.173054245 0.236831761 0.28901336 0.196246069 -0.029874214 0.01650943 0.13826651 0.004913522 -0.076257862
0.03272406 0.200913915 0.289013365 0.36109473 0.232950079 -0.079402516 -0.01533019 0.15285967 -0.031348270 -0.143474843
0.06574292 0.151385613 0.196246069 0.23295008 0.167698506 0.008647799 0.04127358 0.12691627 0.033117138 -0.023977987
0.14622642 0.030660377 -0.029874214 -0.07940252 0.008647799 0.223270440 0.17924528 0.06367925 0.190251572 0.267295597
0.12971698 0.055424528 0.016509434 -0.01533019 0.041273585 0.179245283 0.15094340 0.07665094 0.158018868 0.207547170
0.08637972 0.120430425 0.138266509 0.15285967 0.126916274 0.063679245 0.07665094 0.11070165 0.073408019 0.050707547
0.13384434 0.049233491 0.004913522 -0.03134827 0.033117138 0.190251572 0.15801887 0.07340802 0.166077044 0.222484277
0.16273585 0.005896226 -0.076257862 -0.14347484 -0.023977987 0.267295597 0.20754717 0.05070755 0.222484277 0.327044025
gammahat <- zz_i %*% t(z) %*% y
gammahat
A matrix: 2 × 1 of type dbl
1.66031840
0.01645047
e <- y - z %*% gammahat
e
A matrix: 10 × 1 of type dbl
-0.521816038
0.232724057
0.051768868
0.403714623
-0.633077830
0.008490566
0.276886792
0.031426887
-0.190212264
0.340094340
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
1.122476
f0 = ((sse_rm - sse)/(2))/(sse/6)
f0
A matrix: 1 × 1 of type dbl
15.96781
qf(0.95,2,6)
5.14325284978472

검정통계량 : \(\frac{(SSE_{RM} - SSE_{FM})/r}{SSE_{FM}/(n-p-1)}\)

관측값 \(f_0 = 15.968\)

기각역 : \(f_0 > 5.1433\)

결론 : 기각할 수 있다.

(13)

가설 \(H_0 : \beta_1 = \beta_2 + 3\)\(\alpha = 0.05\)로 검정하시오.

\(H_0 : \beta_1 = \beta_2 = \beta_3\)

\(T= (0,1,-1,0)\)

\(y = \beta_0 + (\beta_2+3)x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon\) \(y - 3x_1 = \beta_0 + \beta_2(x_1 + x_2) + \beta_3 x_3 + \epsilon\)

yy <- y-3*dt$x1
yy
  1. -27.2
  2. -68.1
  3. -71.1
  4. -79.6
  5. -41.9
  6. -50.9
  7. -62.5
  8. -62.4
  9. -33
  10. -41.7
z <- matrix(c(rep(1,10), dt$x1+dt$x2,dt$x3), ncol=3)
z
A matrix: 10 × 3 of type dbl
1 37 64
1 50 72
1 53 80
1 54 88
1 45 81
1 42 45
1 49 46
1 47 69
1 39 54
1 40 39
zz <- t(z) %*% z
zz
A matrix: 3 × 3 of type dbl
10 456 638
456 21114 29658
638 29658 43324
zz_i <- solve(zz)
zz_i
A matrix: 3 × 3 of type dbl
6.76054651 -0.160413550 0.0102556645
-0.16041355 0.005038964 -0.0010871974
0.01025566 -0.001087197 0.0006163093
gammahat <- zz_i %*% t(z) %*% yy
gammahat
A matrix: 3 × 1 of type dbl
77.7140851
-3.1683286
0.2025345
e <- yy - z %*% gammahat
e
A matrix: 10 × 1 of type dbl
-0.6481331
-1.9801368
2.9045732
-4.0473741
6.5554097
-4.6583347
5.7174312
-5.1775193
1.9138690
-0.5797851
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
157.327
f0 = ((sse_rm - sse)/(1))/(sse/6)
f0
A matrix: 1 × 1 of type dbl
5311.082
qf(0.95,1,6)
5.9873776072737

검정통계량 : \(\frac{(SSE_RM - SSE_FM)/r)}{(SSE_FM/(n-p-1))}\)

관측값 \(f_0 = 5311.082\)

기각역 : \(f_0 > 5.9874\)

결론 : 기각할 수 있다.

(14)

\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 \(E(y)\)에 관한 가설 \(H_0 : E(y) = 3.5, H_1 : E(y) ̸\neq 3.5\)\(\alpha = 0.05\)로 검정하시오.

sy 풀이

\(T_0 = \frac{\hat{y} - E(y)}{\sqrt{Var{\hat{y}}}} = \frac{\hat{y} - E(Y)}{\sqrt{x^\top (X^\top X)^{-1} x MSE}} \sim t(n-1)\)

x_indi <- matrix(c(1,20,27,60),nrow=1,ncol=4)
yhat <- x_indi %*% b
round(yhat,5)
A matrix: 1 × 1 of type dbl
3.48814

\(Var{y} = x_i (X^\top X)^{-1} x_i^\top \sigma^2\)

var_y <- ((x_indi %*%solve(t(x) %*% x) %*% t(x_indi) ) * mse)[,]
round(var_y,5)
0.00507
t_0 <- (yhat - 3.5)/sqrt(var_y)
round(t_0[,],5)
-0.16663
round(-pt(0.025,9),5)
-0.5097

\(F_0=-0.16663\) > \(t_{0.025}(9)=-0.5097\) 로 귀무가설을 기각하지 못하여 \(E(y) = 3.5\)라는 결론이 나온다.

4.

중회귀모형, \(y = X\beta + \epsilon, \epsilon \sim N(0_n, I\sigma^2)\)에서 \(X\)\(n \times (p + 1)\)행렬이고 rank가 \(p + 1\)이라면 적합된 모형 \(\hat{y} = X\hat{\beta}\)에 대하여 \[\sum^{n}_{j=1} Var(\hat{y}_j ) = (p + 1)\sigma^2\] 이 됨을 증명하여라.

\(\hat{\bf{y}}= X\hat{\beta} = X(X^\top X)^{-1} X^\top y = Hy\)

\(Var(X\hat{\beta}) = XVar(\hat{\beta})X^\top\)

\(Var(\hat{\beta}) = Var((X^\top X)^{-1} X^\top y) = (X^\top X)^{-1} X^\top Var(y)X(X^\top X)^{-1} = (X^\top X)^{-1} X^\top X(X^\top X)^{-1} \sigma^2 = (X^\top X )^{-1} \sigma^2\)

\(Var(\hat{y}) = Var(X\hat{\beta}) = X(X^\top X)^{-1} X^\top \sigma^2\)

\(\to Var(\hat{y}) = HVar(y) H^\top = HH\sigma^2 = H\sigma^2\)

\(\star H^\top = H, H^2 = H, Var(y) = I_n \sigma^2\)

\(\sum^{n}_{j=1} Var(\hat{y}_j) = tr(Var(\hat{y})) = tr(H\sigma^2) = \sigma^2 tr(H) = tr((X^\top X)^{-1} X^\top X)\sigma^2 = tr(I_{p+1})\sigma^2 = (p+1)\sigma^2\)

\(\star tr(X (X^\top X)^{-1} X^\top)\)

5.

중회귀모형에서 다음을 증명하시오.

(1)

\(Cov(e, y) = \sigma^2[I_n − X(X^\top X)^{-1}X^\top]\)

\(\bf{e} = y - \hat{y} = y - Hy = (I-H)y\)

\(Cov(\bf{e},y) = Cov((I-H)y,y) = (I-H)Var(y) = (I-H) \sigma^2\)

\(\star Cov(Ax,y) = A Cov(X,Y)\)

\(\star Var(y) = \sigma^2\)

(2)

\(Cov(e, \hat{y}) = \mathbb{O}_n\)

\(Cov(e(I - H)\bf{y},Hy) = (I-H) Cov(y,y)H^\top = (I-H)H\sigma^2 = \mathbb{O}_n \sigma^2\)

\(\star Cov(Ax,By) = ACov(X,Y)B^\top\)

\(\star H^\top= H\)

\(\star H^2 = H\)

(3)

\(Cov(e, \hat{\beta}) = \mathbb{O}_{n\times(k+1)}\)

\(Cov((I-H)\bf{y},(X^\top X)^{-1} y) = (I-H) Cov(y,y) X (X^\top X)^{-1} = (I - X(X^\top X)^{-1} X^\top ) X (X^\top X)^{-1} \sigma^2 = \{ X(X^\top X)^{-1} - X(X^\top X)^{-1} \} \sigma^2 = \mathbb{O}_{n \times ([+1)}\)

(4)

\(Cov(\epsilon, \hat{\beta}) = \sigma^2 X(X^\top X)^{-1}\)

\(\bf{\hat{\beta}} = (X^\top X)^{-1} X^\top y = (X^\top X)^{-1} X^\top (X\beta + \epsilon) = \beta + (X^\top X)^{-1} X^\top \epsilon\)

\(Cov(\bf{\epsilon} , \beta + (X^\top X)^{-1} X^\top \epsilon ) = Cov(\epsilon, \beta) + Cov(\epsilon, \epsilon)X(X^\top X)^{-1} = \sigma^2 X(X^\top X)^{-1}\)

\(\star Cov(\epsilon, \beta) = 0\)

\(\star Cov(\epsilon, \epsilon) = I_n \sigma^2\)

(5)

\(\sum^{n}_{j=1} e_j y_j = SSE\)

\(\sum^{n}_{j=1} e_j y_j = \bf{e^\top y} = \{ (I - H)y\}^\top y = y^\top (I-H)y\)

\(\star \bf{e}^\top (e_1, \dots , e_n) = (y-\hat{y})^\top\)

\(\star \bf{y}^\top = (y_1 , \dots , y_n)\)

\(SSE = \sum^n_{j=1} (y_i - \hat{y}_j)^2 = (\bf{y} - \hat{y} ) ^\top ( y - \hat{y} ) = e^\top e = \{ (I - H)y \}^\top \{ (I - H)y \} = y^\top (I - H) (I-H)y = y^\top (I - H)y\)

\(\star I - H_H+H^2 = I-H, H^2 = H\)

\(\therefore \sum^{n}_{j=1} e_j y_j = SSE\)

(6)

\(\sum^{n}_{j=1} e_j \hat{y}_j = 0\)

\(\sum^{n}_{j=1} \bf{e_j \hat{y}_j} = e^\top \hat{y} = y^\top (I-H) Hy = y^\top_{1\times n} \mathbb{O}_{n\times n} y_{n \times 1} = 0\)

\(\star H - H^2 = H - H = \mathbb{O}_{n \times n}\)

1 plus.

다음의 데이터에 대하여 모형 \(y_i = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j, \epsilon \sim N(0,\sigma^2)\)이 옳다고 가정하고 질문에 답하여라.

y 1 5 0 4 4 -1
x_1 1 2 1 3 3 3
x_2 1 1 2 1 2 3
library(car)
y <- c(1,5,0,4,4,-1)
x <- matrix(c(1,1,1,1,1,1,1,2,1,3,3,3,1,1,2,1,2,3), ncol=3)
dt <- data.frame(y=y,x1=x[,2],x2 = x[,3])
m <- lm(y~., dt)
summary(m)

Call:
lm(formula = y ~ ., data = dt)

Residuals:
      1       2       3       4       5       6 
-1.1395  1.3488  0.4651 -1.1628  1.4419 -0.9535 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2326     1.9740   1.638   0.2000  
x1            1.5116     0.7713   1.960   0.1449  
x2           -2.6047     0.9288  -2.804   0.0676 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.599 on 3 degrees of freedom
Multiple R-squared:  0.7511,    Adjusted R-squared:  0.5852 
F-statistic: 4.527 on 2 and 3 DF,  p-value: 0.1242
anova(m)
A anova: 3 × 5
Df Sum Sq Mean Sq F value Pr(>F)
<int> <dbl> <dbl> <dbl> <dbl>
x1 1 3.040230 3.04023 1.188454 0.35537433
x2 1 20.118685 20.11868 7.864577 0.06760655
Residuals 3 7.674419 2.55814 NA NA

(1)

\(X^\top X, X^\top y y^\top y\)\((X^\top X)^{-1}\) 을 구하시오.

xx <- t(x) %*% x
xx
A matrix: 3 × 3 of type dbl
6 13 10
13 33 23
10 23 20
xx_i <- solve(xx)
xx_i
A matrix: 3 × 3 of type dbl
1.5232558 -0.34883721 -0.36046512
-0.3488372 0.23255814 -0.09302326
-0.3604651 -0.09302326 0.33720930
xx_i2 <- round(xx_i,2)
xx_i2
A matrix: 3 × 3 of type dbl
1.52 -0.35 -0.36
-0.35 0.23 -0.09
-0.36 -0.09 0.34
xy <- t(x) %*% y
xy
A matrix: 3 × 1 of type dbl
13
32
15
yy <- t(y) %*% y
yy
A matrix: 1 × 1 of type dbl
59

(2)

\(\beta_0, \beta_1, \beta_2, \sigma^2\)를 추정하시오.

coef(m)
(Intercept)
3.23255813953489
x1
1.51162790697674
x2
-2.6046511627907
anova(m)$Mean[3]
2.55813953488372
b <- xx_i %*% xy
b
A matrix: 3 × 1 of type dbl
3.232558
1.511628
-2.604651
b1 <- round(round(xx_i,2)%*% xy,2)
b1
A matrix: 3 × 1 of type dbl
3.16
1.46
-2.46
haty <- x %*% b
haty
A matrix: 6 × 1 of type dbl
2.13953488
3.65116279
-0.46511628
5.16279070
2.55813953
-0.04651163
haty_2 <- round(x %*% b1,2)
haty_2
A matrix: 6 × 1 of type dbl
2.16
3.62
-0.30
5.08
2.62
0.16
e <- y - haty
e
A matrix: 6 × 1 of type dbl
-1.1395349
1.3488372
0.4651163
-1.1627907
1.4418605
-0.9534884
e_2 <- y - haty_2
e_2
A matrix: 6 × 1 of type dbl
-1.16
1.38
0.30
-1.08
1.38
-1.16
sse <- t(e) %*% e
sse
A matrix: 1 × 1 of type dbl
7.674419
sse_2 <- t(e_2) %*% e_2
sse_2
A matrix: 1 × 1 of type dbl
7.7564
mse <- sse / 3
mse
A matrix: 1 × 1 of type dbl
2.55814
mse_2 <- sse_2 / 3
mse_2
A matrix: 1 × 1 of type dbl
2.585467

(3)

\(x_1 = x_2 = 2\)에서 \(E(y)\)의 95% 신뢰구간을 구하시오.

new_dt <- data.frame(x1=2, x2=2)
new_dt
A data.frame: 1 × 2
x1 x2
<dbl> <dbl>
2 2
predict(m, newdata = new_dt,interval = c("confidence"), level = 0.95)
A matrix: 1 × 3 of type dbl
fit lwr upr
1 1.046512 -1.345982 3.439005
x0 <- c(1,2,2)
mu0 <- x0 %*% b
t(x0) %*% xx_i %*% x0
A matrix: 1 × 1 of type dbl
0.2209302
var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0
A matrix: 1 × 1 of type dbl
0.5651704
tse <- qt(0.975,3) * sqrt(var_mu0)
tse
A matrix: 1 × 1 of type dbl
2.392494
mu0 - (tse)
A matrix: 1 × 1 of type dbl
-1.345982
mu0 + (tse)
A matrix: 1 × 1 of type dbl
3.439005

(4)

\(\beta_1\)의 90% 신뢰구간을 구하시오.

summary(m)

Call:
lm(formula = y ~ ., data = dt)

Residuals:
      1       2       3       4       5       6 
-1.1395  1.3488  0.4651 -1.1628  1.4419 -0.9535 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2326     1.9740   1.638   0.2000  
x1            1.5116     0.7713   1.960   0.1449  
x2           -2.6047     0.9288  -2.804   0.0676 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.599 on 3 degrees of freedom
Multiple R-squared:  0.7511,    Adjusted R-squared:  0.5852 
F-statistic: 4.527 on 2 and 3 DF,  p-value: 0.1242
confint(m, level = 0.9)
A matrix: 3 × 2 of type dbl
5 % 95 %
(Intercept) -1.4129961 7.8781124
x1 -0.3035404 3.3267962
x2 -4.7904032 -0.4188991
se_b1 <- sqrt(xx_i[2,2] * mse)
se_b1
A matrix: 1 × 1 of type dbl
0.7713081
tse <- qt(0.95,3) *se_b1
tse
A matrix: 1 × 1 of type dbl
1.815168
b[2] - tse
A matrix: 1 × 1 of type dbl
-0.3035404
b[2] + tse
A matrix: 1 × 1 of type dbl
3.326796

(5)

가설 \(H_0: \beta_1 = \beta_2, H_1: \beta_1 \neq \beta_2\)을 유의수준 \(\alpha = 0.05\)

library(car)
linearHypothesis(m, c(0,1,-1), 0)
A anova: 2 × 6
Res.Df RSS Df Sum of Sq F Pr(>F)
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 30.092308 NA NA NA NA
2 3 7.674419 1 22.41789 8.763357 0.05952979
z <- matrix(c(1,1,1,1,1,1,x[,2]+x[,3]), ncol=2)
zz <- t(z) %*% z
zz
A matrix: 2 × 2 of type dbl
6 23
23 99
zz_i <- solve(zz)
zz_i
A matrix: 2 × 2 of type dbl
1.5230769 -0.35384615
-0.3538462 0.09230769
zy <- t(z) %*% y
zy
A matrix: 2 × 1 of type dbl
13
47
yy <- t(y) %*% y
yy
A matrix: 1 × 1 of type dbl
59
z %*% zz_i %*% t(z)
A matrix: 6 × 6 of type dbl
0.47692308 3.076923e-01 3.076923e-01 0.1384615 -0.03076923 -2.000000e-01
0.30769231 2.307692e-01 2.307692e-01 0.1538462 0.07692308 1.110223e-16
0.30769231 2.307692e-01 2.307692e-01 0.1538462 0.07692308 1.110223e-16
0.13846154 1.538462e-01 1.538462e-01 0.1692308 0.18461538 2.000000e-01
-0.03076923 7.692308e-02 7.692308e-02 0.1846154 0.29230769 4.000000e-01
-0.20000000 1.110223e-16 1.110223e-16 0.2000000 0.40000000 6.000000e-01
gammahat <- zz_i %*% t(z) %*% y
gammahat
A matrix: 2 × 1 of type dbl
3.1692308
-0.2615385
e <- y - z %*% gammahat
e
A matrix: 6 × 1 of type dbl
-1.646154
2.615385
-2.384615
1.876923
2.138462
-2.600000
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
30.09231
f0 = (sse_rm - sse)/(sse/3)
f0
A matrix: 1 × 1 of type dbl
8.763357
qf(0.95,1,3)
10.1279644860139

T검정 방법

q = c(0,1,-1)
var_qb = t(q) %*% xx_i %*% q*  mse
var_qb
A matrix: 1 × 1 of type dbl
1.933478
se_qb = sqrt(var_qb)
se_qb
A matrix: 1 × 1 of type dbl
1.390495
t0 = (t(q) %*% b )/se_qb
t0
A matrix: 1 × 1 of type dbl
2.960297
t0^2
A matrix: 1 × 1 of type dbl
8.763357

(6)

가설 \(H_0 : \beta_1 = 0, H_1 : \beta_1 \neq 0\)을 유의수준 \(\alpha = 0.1\)로 검정하시오. 위 (3)의 질문의 결과와 어떠한 연관을 지을 수 있는가?

t0 <- b[2] / se_b1
qt(0.95,3)
2.35336343480182

(7)

\(x_1 = 2, x_2 = 3\)에서 가설 \(H_0 : E(y) = 4,H_1 : E(y) \neq 4\)\(\alpha = 0.05\)로 검정하시오

new_dt <- data.frame(x1=2, x2=2)
predict(m, newdata = new_dt,interval = c("confidence"), level = 0.95)
A matrix: 1 × 3 of type dbl
fit lwr upr
1 1.046512 -1.345982 3.439005
x0 <- c(1,2,3)
mu0 <- x0 %*% b
t(x0) %*% xx_i %*% x0
A matrix: 1 × 1 of type dbl
0.8139535
var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0
A matrix: 1 × 1 of type dbl
2.082207
t0 <- (mu0-4)/ sqrt(var_mu0)
t0
A matrix: 1 × 1 of type dbl
-3.851834
qt(0.975,3)
3.18244630528371

(8)

가설 \(H_0 : \beta_2 = 0, H_1: \beta_2 <0\)\(\alpha = 0.05\) 로 검정하시오.

se_b2 <- sqrt(xx_i[3,3] * mse)
se_b2
A matrix: 1 × 1 of type dbl
0.9287779
t0<- b[3] / se_b2
t0
A matrix: 1 × 1 of type dbl
-2.804385

(9)

가설 $H_0: _1 = 2_2, H_1 : _1 _2 $를 \(\alpha=0..05\)로 검정하시오.

linearHypothesis(m, c(0,1,-2), 0)
A anova: 2 × 6
Res.Df RSS Df Sum of Sq F Pr(>F)
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 30.797619 NA NA NA NA
2 3 7.674419 1 23.1232 9.039069 0.05737101
z <- matrix(c(1,1,1,1,1,1,2*x[,2]+x[,3]), ncol=2)
zz <- t(z) %*% z
zz
A matrix: 2 × 2 of type dbl
6 36
36 244
zz_i <- solve(zz)
zz_i
A matrix: 2 × 2 of type dbl
1.4523810 -0.21428571
-0.2142857 0.03571429
zy <- t(z) %*% y
zy
A matrix: 2 × 1 of type dbl
13
79
yy <- t(y) %*% y
yy
A matrix: 1 × 1 of type dbl
59
z %*% zz_i %*% t(z)
A matrix: 6 × 6 of type dbl
0.48809524 0.27380952 0.38095238 0.05952381 -0.04761905 -0.15476190
0.27380952 0.20238095 0.23809524 0.13095238 0.09523810 0.05952381
0.38095238 0.23809524 0.30952381 0.09523810 0.02380952 -0.04761905
0.05952381 0.13095238 0.09523810 0.20238095 0.23809524 0.27380952
-0.04761905 0.09523810 0.02380952 0.23809524 0.30952381 0.38095238
-0.15476190 0.05952381 -0.04761905 0.27380952 0.38095238 0.48809524
gammahat <- zz_i %*% t(z) %*% y
gammahat
A matrix: 2 × 1 of type dbl
1.95238095
0.03571429
e <- y - z %*% gammahat
e
A matrix: 6 × 1 of type dbl
-1.059524
2.869048
-2.095238
1.797619
1.761905
-3.273810
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
30.79762
f0 = (sse_rm - sse)/(sse/3)
f0
A matrix: 1 × 1 of type dbl
9.039069
qf(0.95,1,3)
10.1279644860139