고급회귀분석 과제, CH03,07

고급회귀분석 세번째 과제입니다.

제출 기한 : 11월 21일

제출 방법

직접 제출(607호) 도 가능하지만,
문서 작성 후 pdf로 변환(★★★)하여 lms에 제출을 추천

(pdf 아닌 문서는 미제출로 간주)

주의사항

pdf로 꼭 변환하여 제출
풀이가 꼭 있어야 함 (답만 적혀 있는 경우 ’0’점 처리)
부정행위 시 ’F’학점
계산은 R로 해도 되지만 계산 풀이 과정이 꼭 있어야 함!!

예) R에서 lm으로 beta의 추정량을 구하면 안 됨. 수업 시간에 배운 식으로 풀이를 적어야 함.

행렬식 계산은 R로 해도 됩니다.

************ R을 이용해서 푸는 문제는, R 코드도 같이 업로드.

1.

어떤 큰 공장에서 동일한 기계들의 정비기록에 관한 표본자료를 취하였다. 이는 기계의 사용연도(age of machines)와 정비비용(maintenance cost) 간에 어떤 관계가 있는가를 밝혀내기 위한 것이다. 그 자료는 다음과 같다 (표본의 크기 $n = 14$).

사용연도$X$(단위:년)	정비비용$Y$(단위:1,000원)
3	39
1	24
5	115
8	105
1	50
4	86
2	67
6	90
9	140
3	112
5	70
7	186
2	43
6	126

dt <- data.frame(y=c(39,24,115,105,50,86,67,90,140,112,70,186,43,126), 
                 x=c(3,1,5,8,1,4,2,6,9,3,5,7,2,6))

x <- matrix(c(rep(1,14), dt$x), ncol=2)

y <- matrix(c(dt$y), ncol=1)

(1)

$X^\top X, X^\top y, y^\top y$와 $(X^\top X)^{-1}$을 구하시오.

xx <- t(x) %*% x

xx

A matrix: 2 × 2 of type dbl
14	62
62	360

xy <- t(x) %*% y

xy

A matrix: 2 × 1 of type dbl
1253
6714

yy <- t(y) %*% y

yy

A matrix: 1 × 1 of type dbl
138197

xx_i <- solve(xx)

xx_i

A matrix: 2 × 2 of type dbl
0.30100334	-0.05183946
-0.05183946	0.01170569

(2)

$\hat{\beta} = (X^\top X)^{-1}X^\top y$를 구하고, 적합된 회귀선형을 써 보아라.

betahat <- xx_i %*%  xy

betahat

A matrix: 2 × 1 of type dbl
29.10702
13.63712

$y = 29.1 + 13.6x$

coef(lm(y~x, dt))

(Intercept): 29.1070234113712
x: 13.6371237458194

(3)

$\sigma^2$을 MSE 로 추정할 경우, $\hat{\beta}$의 분산-공분산행렬의 추정은 $\hat{Var}(\hat{\beta}) = (X^\top X)^{-1}$(MSE)이다. 먼저 분산분석하여 MSE 를 구하고 $\hat{Var}(\hat{\beta})$을 구하시오

n <- 14

p <- 1

In = diag(rep(1,n))

H = x %*% xx_i %*% t(x)

$SSE = y^\top(I-H)y$

$SST = y^\top y - n(\bar{y})^2$

sse = t(y) %*% (In-H) %*% y
sse

A matrix: 1 × 1 of type dbl
10166.25

sst = t(y) %*% y - n * mean(y)^2
sst

A matrix: 1 × 1 of type dbl
26053.5

ssr = sst- sse
ssr

A matrix: 1 × 1 of type dbl
15887.25

msr = ssr/p
msr

A matrix: 1 × 1 of type dbl
15887.25

mse = sse/(n-p-1)
mse

A matrix: 1 × 1 of type dbl
847.1876

f0 = msr/mse
f0

A matrix: 1 × 1 of type dbl
18.75293

qf(0.95,p,n-p-1)

4.74722534672251

분산분석표

요인	제곱합	자유도(df)	평균제곱(MS)	$F_0$	유의확률
회귀	15887.25	1	15887.25	18.75293	4.747225
잔차	10166.25	12	847.19
계	26053.5	13

hat_var_betahat <- mse[,] * xx_i

vcov(lm(y~x, dt))

A matrix: 2 × 2 of type dbl
	(Intercept)	x
(Intercept)	255.00629	-43.917750
x	-43.91775	9.916911

2.

어떤 공정에서 나오는 제품의 강도$(kg/cm^2)$가 그 공정의 온도와 압력에 어떠한 영향을 받는가를 조사하기 위하여 다음의 데이터를 얻었다.

공정온도$x_1$(단위:°C)	공정압력$x_2$(단위:$psi$)	강도$y$(단위:$kg/cm^2$)
195	57	81.4
179	61	122.2
205	60	101.7
204	62	175.6
201	61	150.3
184	54	64.8
210	58	92.1
209	61	113.8

dt <- data.frame(y=c(81.4,122.2,101.7,175.6,150.3,64.8,92.1,113.8),
                 x1 = c(195,179,205,204,201,184,210,209),
                 x2 = c(57,61,60,62,61,54,58,61))

m2 <- lm(y~., dt)

summary(m2)


Call:
lm(formula = y ~ ., data = dt)

Residuals:
      1       2       3       4       5       6       7       8 
 -5.259 -14.775 -18.742  31.259  17.279  11.744  -3.723 -17.783 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -554.3112   196.9775  -2.814   0.0374 *
x1            -0.1797     0.7626  -0.236   0.8230  
x2            11.8600     3.2301   3.672   0.0144 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.64 on 5 degrees of freedom
Multiple R-squared:  0.7478,    Adjusted R-squared:  0.6469 
F-statistic: 7.412 on 2 and 5 DF,  p-value: 0.03195

anova(m2)

A anova: 3 × 5
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
	<int>	<dbl>	<dbl>	<dbl>	<dbl>
x1	1	628.467	628.4670	1.342371	0.29894191
x2	1	6311.728	6311.7278	13.481505	0.01441752
Residuals	5	2340.884	468.1768	NA	NA

(1)

선형회귀모형, $y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j$ 가 성립된다고 가정하고 데이터로부터 회귀모형을 추정하시오.

x = matrix(c(rep(1,8),dt$x1, dt$x2), ncol=3)

y = dt$y

xx <- t(x) %*% x
xx

A matrix: 3 × 3 of type dbl
8	1587	474
1587	315745	94108
474	94108	28136

xx_i <- solve(xx)
xx_i

A matrix: 3 × 3 of type dbl
82.8749894	-0.134598917	-0.945973490
-0.1345989	0.001242266	-0.001887521
-0.9459735	-0.001887521	0.022285408

xy <- t(x) %*% y
xy

A matrix: 3 × 1 of type dbl
901.9
179676.4
54034.3

yy <- t(y) %*% y
yy

A matrix: 1 × 1 of type dbl
110959

b <- xx_i %*% xy
b

A matrix: 3 × 1 of type dbl
-554.3112232
-0.1797396
11.8599927

$y = -554.31 - 0.1797 x_1 + 11.86 x_2$

(2)

오차분산이 $\sigma^2 = 3$이라 하면, $Var(\hat{\beta}_0), Var(\hat{\beta}_1), Var(\hat{\beta}_2)$와 $Cov(\hat{\beta}_1, \hat{\beta}_2)$는 무엇인가?

$var(\hat{\beta}) = (x^\top x)^{-1} \sigma^2$

xx_i * 3

A matrix: 3 × 3 of type dbl
248.6249683	-0.403796751	-2.837920471
-0.4037968	0.003726798	-0.005662562
-2.8379205	-0.005662562	0.066856223

$var(\hat{\beta}_0) = 248.6250$

$var(\hat{\beta}_1) = 0.0037$

$var(\hat{\beta}_2) = 0.0669$

$cov(\hat{\beta}_1, \hat{\beta}_2) = -0.0057$

(3)

$x_1 = 200$°C이고 $x_2 = 59\text{psi}$에서 평균 제품의 강도의 추정값 $\hat{y}$는 얼마인가? 이의 분산을 $\sigma^2 = 3$이라 가정하고 구하시오.

new_dt <- data.frame(x1=200, x2=59)
new_dt

A data.frame: 1 × 2
x1	x2
<dbl>	<dbl>
200	59

predict(m2, newdata = new_dt)

1: 109.480424963516

x0 <- c(1,200,59)
x0

mu0 <- x0 %*% b
mu0

A matrix: 1 × 1 of type dbl
109.4804

$\hat{\mu}_0 = 109.4804$

(4)

추정된 회귀계수 $\hat{\beta}_1, \hat{\beta}_2$의 의미는 무엇인가?

$\hat{\beta}_1$ : 공정압력이 일정하게 유지되었을 때, 공정온도가 1도씨 증가하면 강도는 0.1797만큼 감소한다.

$\hat{\beta}_2$ : 공정온도가 일정하게 유지되었을 때, 공정압력이 1psi 증가하면 강도는 11.8600만큼 증가한다.

(5)

분산분석표를 작성하고 $\alpha = 0.05$로 $F$-검정을 행하시오.

anova(m2)

A anova: 3 × 5
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
	<int>	<dbl>	<dbl>	<dbl>	<dbl>
x1	1	628.467	628.4670	1.342371	0.29894191
x2	1	6311.728	6311.7278	13.481505	0.01441752
Residuals	5	2340.884	468.1768	NA	NA

I8 = diag(rep(1,8))
I8

A matrix: 8 × 8 of type dbl
1	0	0	0	0	0	0	0
0	1	0	0	0	0	0	0
0	0	1	0	0	0	0	0
0	0	0	1	0	0	0	0
0	0	0	0	1	0	0	0
0	0	0	0	0	1	0	0
0	0	0	0	0	0	1	0
0	0	0	0	0	0	0	1

H = x %*% xx_i %*% t(x)
H

A matrix: 8 × 8 of type dbl
0.223303342	0.0473478224	0.0925307250	0.004932881	0.04854185	0.354021685	0.18034566	0.04897604
0.047347822	0.7875815614	0.0003377034	0.178850120	0.18539614	0.121730006	-0.28766297	-0.03358038
0.092530725	0.0003377034	0.1733021360	0.174906227	0.15025388	0.004944942	0.19895553	0.20476885
0.004932881	0.1788501200	0.1749062270	0.274444297	0.21838554	-0.166837528	0.08255641	0.23276205
0.048541845	0.1853961381	0.1502538806	0.218385537	0.18446745	-0.053127978	0.08195337	0.18412975
0.354021685	0.1217300062	0.0049449423	-0.166837528	-0.05312798	0.711046519	0.14493505	-0.11671270
0.180345664	-0.2876629720	0.1989555317	0.082556415	0.08195337	0.144935052	0.38255762	0.21635932
0.048976035	-0.0335803794	0.2047688541	0.232762052	0.18412975	-0.116712699	0.21635932	0.26329707

$SSE = y^\top (I-H)y$

$SST = y^\top y - n(\bar{y})^2$

sse = t(y) %*% (I8-H) %*% y
sse

A matrix: 1 × 1 of type dbl
2340.884

sst = t(y) %*% y - 8 * mean(y)^2
sst

A matrix: 1 × 1 of type dbl
9281.079

ssr = sst- sse
ssr

A matrix: 1 × 1 of type dbl
6940.195

msr = ssr/2
msr

A matrix: 1 × 1 of type dbl
3470.097

mse = sse/5
mse

A matrix: 1 × 1 of type dbl
468.1768

f0 = msr/mse
f0

A matrix: 1 × 1 of type dbl
7.411938

qf(0.95,2,5)

5.78613504334997

요인	제곱합	자유도(df)	평균제곱(MS)	$F_0$	유의확률
회귀	6940.195	2	3470.097	7.4119	5.786135
잔차	2340.884	5	468.177
계	9281.079	7

결론 : 검정통계량의 관측값이 기각역에 속하므로 귀무가설 기각. 즉 모형이 유의수준 0.05하에서 유의하다고 할 수 있다.

(6)

결정계수 $R^2$을 구하시오.

summary(m2)$r.squared

0.747778895028607

$R^2 = \frac{SSR}{SST}$

ssr/sst
# 0.7478

A matrix: 1 × 1 of type dbl
0.7477789

(7)

수정된 결정계수 $R^2_{adj}$ 을 구하시오.

summary(m2)$adj

0.64689045304005

$R^2_{adj} = 1-\frac{SSE(n-p-1)}{SST/(n-1)}$

1-(sse/5)/(sst/7)
# 0.6469

A matrix: 1 × 1 of type dbl
0.6468905

(8)

$\sigma^2$의 추정값 MSE 를 구하시오.

$\sigma^2 = MSE$

mse
##468.17687

A matrix: 1 × 1 of type dbl
468.1768

(9)

MSR의 기대값을 $\sigma^2$과 $\beta_0, \beta_1, \beta_2$의 함수로 표시하여라.

I8 <- diag(rep(1,8))
I8

A matrix: 8 × 8 of type dbl
1	0	0	0	0	0	0	0
0	1	0	0	0	0	0	0
0	0	1	0	0	0	0	0
0	0	0	1	0	0	0	0
0	0	0	0	1	0	0	0
0	0	0	0	0	1	0	0
0	0	0	0	0	0	1	0
0	0	0	0	0	0	0	1

one1 <- rep(1,8)
one1

Jn <- one1 %*% t(one1)
Jn

A matrix: 8 × 8 of type dbl
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1

t(x) %*% (I8 - Jn/8) %*% x

A matrix: 3 × 3 of type dbl
0	0.000	0.00
0	923.875	78.25
0	78.250	51.50

3.

어떤 공장에서 물의 소비량을 조사하기 위하여 매달의 물소비량($y$), 평균온도($x_1$), 작업일수($x_2$)와 작업량($x_3$)에 관한 데이터를 얻었다.

물소비량$y$(단위:1,000톤)	평균온도$x_1$(단위:°C)	작업일수$x_2$(단위:일)	작업량$x_3$(단위:1,000톤)
2.8	10	27	64
3.9	24	26	72
3.9	25	28	80
4.4	28	26	88
3.1	15	30	81
3.1	18	24	45
3.5	22	27	46
3.6	22	25	69
3.0	12	27	54
3.3	15	25	39

dt <- data.frame(y=c(2.8,3.9,3.9,4.4,3.1,3.1,3.5,3.6,3.0,3.3),
                 x1 = c(10,24,25,28,15,18,22,22,12,15),
                 x2 = c(27,26,28,26,30,24,27,25,27,25),
                 x3 = c(64,72,80,88,81,45,46,69,54,39))

m3 <- lm(y~., dt)

summary(m3)


Call:
lm(formula = y ~ ., data = dt)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23490 -0.07744 -0.02166  0.08840  0.23442 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  2.409213   1.125954   2.140  0.07618 . 
x1           0.069788   0.012640   5.521  0.00149 **
x2          -0.024767   0.044830  -0.552  0.60060   
x3           0.005864   0.005052   1.161  0.28978   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.172 on 6 degrees of freedom
Multiple R-squared:  0.9202,    Adjusted R-squared:  0.8803 
F-statistic: 23.05 on 3 and 6 DF,  p-value: 0.001079

anova(m3)

A anova: 4 × 5
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
	<int>	<dbl>	<dbl>	<dbl>	<dbl>
x1	1	2.004315887	2.004315887	67.73858599	0.0001737553
x2	1	0.002273071	0.002273071	0.07682153	0.7909535228
x3	1	0.039877142	0.039877142	1.34770234	0.2897755827
Residuals	6	0.177533900	0.029588983	NA	NA

(1)

데이터로부터 회귀모형, $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$ 을 구하여라. 그리고 어째서 이 모형이 선택되었는가에 대하여 토의하시오.

x = matrix(c(rep(1,10),dt$x1, dt$x2, dt$x3), ncol=4)

y = dt$y

xx <- t(x) %*% x
xx

A matrix: 4 × 4 of type dbl
10	191	265	638
191	3971	5047	12620
265	5047	7049	17038
638	12620	17038	43324

xx_i <- solve(xx)
xx_i

A matrix: 4 × 4 of type dbl
42.8460778	-0.274517258	-1.666784957	0.1044984818
-0.2745173	0.005399764	0.009802163	-0.0013851965
-1.6667850	0.009802163	0.067921641	-0.0050213139
0.1044985	-0.001385196	-0.005021314	0.0008624387

xy <- t(x) %*% y
xy

A matrix: 4 × 1 of type dbl
34.6
686.3
916.0
2249.9

yy <- t(y) %*% y
yy

A matrix: 1 × 1 of type dbl
121.94

b <- xx_i %*% xy
b

A matrix: 4 × 1 of type dbl
2.409213010
0.069788005
-0.024766597
0.005864434

$\hat{y} = 2.409 + 0.0698x_1 - 0.0248x_2 + 0.0059x_3$

(2)

$\hat{\beta}_1, \hat{\beta}_2$와 $\hat{\beta}_3$의 의미는 무엇인가?

$\hat{\beta}_1$ : 작업일수, 작업량이 변하지 않을 때, 평균온도가 1도씨 증가하면 물소비량은 69.8톤 증가한다.

$\hat{\beta}_2$ : 평균온도, 작업량이 변하지 않을 때, 작업일수가 1일 증가하면 물소비량은 24.8톤 감소한다.

$\hat{\beta}_3$ : 평균온도, 작업일수이 변하지 않을 때, 작업량이 1000톤 증가하면 물소비량은 5.9톤 증가한다.

(3)

$Var(\hat{\beta}_3)$을 구하시오. $\sigma^2$을 MSE 로 추정하면 $\hat{Var}(\hat{\beta}_3)$은 무엇인가?

summary(m3)$coefficients

A matrix: 4 × 4 of type dbl
	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	2.409213010	1.125953765	2.1397087	0.076181015
x1	0.069788005	0.012640155	5.5211353	0.001485354
x2	-0.024766597	0.044830038	-0.5524554	0.600595338
x3	0.005864434	0.005051602	1.1609058	0.289775583

$var(\beta_3) = (x^\top x)^{-1}_{(4,4)} \sigma^2$

xx_i[4,4]
## 0.0009*sigma^2

0.000862438702127396

hat_var_b3 <- xx_i[4,4] * mse
hat_var_b3

A matrix: 1 × 1 of type dbl
0.4037738

$var(\hat{\beta}_3) = 0.0000255$

(4)

분산분석표를 작성하고, 결정계수 $R^2$을 구하시오.

anova(m3)

A anova: 4 × 5
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
	<int>	<dbl>	<dbl>	<dbl>	<dbl>
x1	1	2.004315887	2.004315887	67.73858599	0.0001737553
x2	1	0.002273071	0.002273071	0.07682153	0.7909535228
x3	1	0.039877142	0.039877142	1.34770234	0.2897755827
Residuals	6	0.177533900	0.029588983	NA	NA

I10 = diag(rep(1,10))
I10

A matrix: 10 × 10 of type dbl
1	0	0	0	0	0	0	0	0	0
0	1	0	0	0	0	0	0	0	0
0	0	1	0	0	0	0	0	0	0
0	0	0	1	0	0	0	0	0	0
0	0	0	0	1	0	0	0	0	0
0	0	0	0	0	1	0	0	0	0
0	0	0	0	0	0	1	0	0	0
0	0	0	0	0	0	0	1	0	0
0	0	0	0	0	0	0	0	1	0
0	0	0	0	0	0	0	0	0	1

H = x %*% xx_i %*% t(x)
H

A matrix: 10 × 10 of type dbl
0.479007503	-0.005805719	-0.08066882	-0.01958705	0.26225012	0.09655116	-0.23996631	0.10866366	0.28730870	0.11224675
-0.005805719	0.186448902	0.16479959	0.27196857	0.01139329	0.10399383	0.06627062	0.18477003	-0.01335934	0.02952023
-0.080668821	0.164799589	0.33179614	0.23355629	0.24558622	-0.08987596	0.23998739	0.04338569	-0.01509426	-0.07347228
-0.019587049	0.271968566	0.23355629	0.48736361	0.01178437	0.05219047	-0.10292798	0.28777554	-0.11085118	-0.11127263
0.262250120	0.011393292	0.24558622	0.01178437	0.58743384	-0.22034983	0.06947229	-0.08527926	0.20954991	-0.09184095
0.096551163	0.103993827	-0.08987596	0.05219047	-0.22034983	0.36048096	0.08217297	0.20539705	0.10911272	0.30032662
-0.239966310	0.066270622	0.23998739	-0.10292798	0.06947229	0.08217297	0.69646084	-0.11029224	0.06925883	0.22956358
0.108663660	0.184770026	0.04338569	0.28777554	-0.08527926	0.20539705	-0.11029224	0.27283220	0.01617393	0.07657339
0.287308704	-0.013359337	-0.01509426	-0.11085118	0.20954991	0.10911272	0.06925883	0.01617393	0.25886069	0.18903998
0.112246749	0.029520231	-0.07347228	-0.11127263	-0.09184095	0.30032662	0.22956358	0.07657339	0.18903998	0.33931531

$SSE = y^\top (I-H)y$

$SST = y^\top y - n(\bar{y})^2$

sse = t(y) %*% (I10-H) %*% y
sse

A matrix: 1 × 1 of type dbl
0.1775339

sst = t(y) %*% y - 10 * mean(y)^2
sst

A matrix: 1 × 1 of type dbl
2.224

ssr = sst- sse
ssr

A matrix: 1 × 1 of type dbl
2.046466

msr = ssr/3
msr

A matrix: 1 × 1 of type dbl
0.6821554

mse = sse/6
mse

A matrix: 1 × 1 of type dbl
0.02958898

f0 = msr/mse
f0

A matrix: 1 × 1 of type dbl
23.05437

qf(0.95,3,6)

4.75706266308941

요인	제곱합	자유도(df)	평균제곱(MS)	$F_0$	유의확률
회귀	2.0465	3	0.6822	23.0544	4.7571
잔차	0.1775	6	0.0296
계	2.2240	9

$R^2$

ssr/sst
#0.9202

A matrix: 1 × 1 of type dbl
0.9201736

(5)

$x_1 = 20, x_2 = 27, x_3 = 60$에서 평균 물소비량을 추정하시오. 이 추정된 소비량의 표준편차의 추정값을 구하시오.

new_dt <- data.frame(x1=20, x2=27, x3 = 60)
new_dt

A data.frame: 1 × 3
x1	x2	x3
<dbl>	<dbl>	<dbl>
20	27	60

predict(m3, newdata = new_dt)

1: 3.48814105556645

x0 <- c(1,20,27,60)
x0

mu0 <- x0 %*% b
mu0

A matrix: 1 × 1 of type dbl
3.488141

$\hat{\mu}_0 = 3.4881$

var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0

A matrix: 1 × 1 of type dbl
0.005065205

sqrt(var_mu0)
# se(hat mu0) = 0.0712

A matrix: 1 × 1 of type dbl
0.07117026

(6)

$x_1 = 20, x_2 = 27, x_3 = 60$에서 어느 한 달의 물소비량 $y_s$ 를 예측하여라. 이 예측된 물소비량의 표준편차의 추정값을 구하시오.

x0 <- c(1,20,27,60)
x0

mu0 <- x0 %*% b
mu0

A matrix: 1 × 1 of type dbl
3.488141

$\hat{y}_s = 3.4881$

var_ys <- (1+t(x0) %*% xx_i %*% x0) * mse
var_ys

A matrix: 1 × 1 of type dbl
0.03465419

sqrt(var_ys)
# se(hat ys) = 0.1862

A matrix: 1 × 1 of type dbl
0.1861564

(7)

$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j$ 라 가정하고 회귀제곱합 SSR의 기대값을 $\sigma^2$과 $\beta_i, i = 0, 1, 2, 3$의 함수로 표시하여라.

I10 <- diag(rep(1,10))
I10

A matrix: 10 × 10 of type dbl
1	0	0	0	0	0	0	0	0	0
0	1	0	0	0	0	0	0	0	0
0	0	1	0	0	0	0	0	0	0
0	0	0	1	0	0	0	0	0	0
0	0	0	0	1	0	0	0	0	0
0	0	0	0	0	1	0	0	0	0
0	0	0	0	0	0	1	0	0	0
0	0	0	0	0	0	0	1	0	0
0	0	0	0	0	0	0	0	1	0
0	0	0	0	0	0	0	0	0	1

one1 <- rep(1,10)
one1

Jn <- one1 %*% t(one1)
Jn

A matrix: 10 × 10 of type dbl
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1	1

t(x) %*% (I10 - Jn/10) %*% x

A matrix: 4 × 4 of type dbl
6.661338e-16	1.160183e-14	1.751377e-14	4.096723e-14
-1.065814e-14	3.229000e+02	-1.450000e+01	4.342000e+02
-1.332268e-14	-1.450000e+01	2.650000e+01	1.310000e+02
-1.421085e-14	4.342000e+02	1.310000e+02	2.619600e+03

(8)

$\beta_3$의 95% 신뢰구간을 구하시오. 이 신뢰구간의 의미를 해석하시오.

confint(m3)

A matrix: 4 × 2 of type dbl
	2.5 %	97.5 %
(Intercept)	-0.345896601	5.16432262
x1	0.038858661	0.10071735
x2	-0.134461748	0.08492855
x3	-0.006496391	0.01822526

$\beta_3 \pm t_{0.025}(6) se(\hat{\beta}_3)$

b[4] - qt(0.975,6) * sqrt(hat_var_b3)

A matrix: 1 × 1 of type dbl
-1.548982

b[4] + qt(0.975,6) * sqrt(hat_var_b3)

A matrix: 1 × 1 of type dbl
1.56071

#(-0.0065, 0.0182)

이 신뢰구간이 $\beta_3$을 포함하고 있을 것이라고 95% 확신한다.

(9)

$\beta_1$의 99% 신뢰구간을 구하시오. 이 신뢰구간의 의미를 해석하시오.

confint(m3, level=0.99)

A matrix: 4 × 2 of type dbl
	0.5 %	99.5 %
(Intercept)	-1.76517953	6.58360555
x1	0.02292554	0.11665047
x2	-0.19097074	0.14143754
x3	-0.01286402	0.02459289

$\beta_1 \pm t_{0.005}(6) se(\hat{\beta}_1)$

hat_var_b1 = xx_i[2,2] * mse
hat_var_b1

A matrix: 1 × 1 of type dbl
0.0001597735

b[2] - qt(0.995,6) * sqrt(hat_var_b1)

A matrix: 1 × 1 of type dbl
0.02292554

b[2] + qt(0.995,6) * sqrt(hat_var_b1)

A matrix: 1 × 1 of type dbl
0.1166505

#(0.0229, 0.1167)

이 신뢰구간이 $\beta_1$을 포함하고 있을 것이라고 99% 확신한다.

(10)

$x_1 = 20, x_2 = 27, x_3 = 60$에서 평균 물소비량, $E(y)$의 95% 신뢰구간을 구하시오.

predict(m3, newdata = new_dt, interval = c("confidence"))

A matrix: 1 × 3 of type dbl
	fit	lwr	upr
1	3.488141	3.313994	3.662288

mu0 - qt(0.975,6) * sqrt(var_mu0)

A matrix: 1 × 1 of type dbl
3.313994

mu0 + qt(0.975,6) * sqrt(var_mu0)

A matrix: 1 × 1 of type dbl
3.662288

#(3.3140, 3.6623)

(11)

가설 $H_0 : \beta_1 = 0, H_1 : \beta_1 > 0$을 $\alpha = 0.05$로 검정하시오.

summary(m3)


Call:
lm(formula = y ~ ., data = dt)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23490 -0.07744 -0.02166  0.08840  0.23442 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  2.409213   1.125954   2.140  0.07618 . 
x1           0.069788   0.012640   5.521  0.00149 **
x2          -0.024767   0.044830  -0.552  0.60060   
x3           0.005864   0.005052   1.161  0.28978   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.172 on 6 degrees of freedom
Multiple R-squared:  0.9202,    Adjusted R-squared:  0.8803 
F-statistic: 23.05 on 3 and 6 DF,  p-value: 0.001079

검정통계량 : $t_0 = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)}$

t0 <- b[2]/sqrt(hat_var_b1)
t0

A matrix: 1 × 1 of type dbl
5.521135

$t_0 = 5.5211$

기각역 : $t_{0.05}(6) = 1.9432$

qt(0.95,6)

1.9431802805153

결론 : $H_0$ 기각 가능. 즉 유의수준 5%에서 $\beta_2$는 0보다 크다고 할 수 있다.

$T = \frac{\hat{\beta}_i - \beta^0_i}{\hat{\sigma} \sqrt{c_{ii}}}$

(12)

가설 $H_0 : \beta_1 = \beta_2 = \beta_3$을 $\alpha = 0.05$로 검정하시오.

library(car)

Loading required package: carData

linearHypothesis(m3, matrix(c(0,1,-1,0,0,0,1,-1), byrow = T, nrow=2),c(0,0))

A anova: 2 × 6
	Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
1	8	1.1224764	NA	NA	NA	NA
2	6	0.1775339	2	0.9449425	15.96781	0.003956509

$H_0 : \beta_1 = \beta_2 = \beta_3$

$T= (0,1,-1,0) (0,0,1,-1)$

$y= \beta_0 + \beta_1 x_1 + \beta_1 x_2 + \beta_1 x_3 + \epsilon = \beta_0 + \beta_1(x_1+x_2+x_3)$

z <- matrix(c(rep(1,10), dt$x1+dt$x2+dt$x3), ncol=2)
z

A matrix: 10 × 2 of type dbl
1	101
1	122
1	133
1	142
1	126
1	87
1	95
1	116
1	93
1	79

zz <- t(z) %*% z
zz

A matrix: 2 × 2 of type dbl
10	1094
1094	123754

zz_i <- solve(zz)
zz_i

A matrix: 2 × 2 of type dbl
3.04034002	-0.0268769654
-0.02687697	0.0002456761

zy <- t(z) %*% y
zy

A matrix: 2 × 1 of type dbl
34.6
3852.2

yy <- t(y) %*% y
yy

A matrix: 1 × 1 of type dbl
121.94

z %*% zz_i %*% t(z)

A matrix: 10 × 10 of type dbl
0.11733491	0.073997642	0.051297170	0.03272406	0.065742925	0.146226415	0.12971698	0.08637972	0.133844340	0.162735849
0.07399764	0.139003538	0.173054245	0.20091392	0.151385613	0.030660377	0.05542453	0.12043042	0.049233491	0.005896226
0.05129717	0.173054245	0.236831761	0.28901336	0.196246069	-0.029874214	0.01650943	0.13826651	0.004913522	-0.076257862
0.03272406	0.200913915	0.289013365	0.36109473	0.232950079	-0.079402516	-0.01533019	0.15285967	-0.031348270	-0.143474843
0.06574292	0.151385613	0.196246069	0.23295008	0.167698506	0.008647799	0.04127358	0.12691627	0.033117138	-0.023977987
0.14622642	0.030660377	-0.029874214	-0.07940252	0.008647799	0.223270440	0.17924528	0.06367925	0.190251572	0.267295597
0.12971698	0.055424528	0.016509434	-0.01533019	0.041273585	0.179245283	0.15094340	0.07665094	0.158018868	0.207547170
0.08637972	0.120430425	0.138266509	0.15285967	0.126916274	0.063679245	0.07665094	0.11070165	0.073408019	0.050707547
0.13384434	0.049233491	0.004913522	-0.03134827	0.033117138	0.190251572	0.15801887	0.07340802	0.166077044	0.222484277
0.16273585	0.005896226	-0.076257862	-0.14347484	-0.023977987	0.267295597	0.20754717	0.05070755	0.222484277	0.327044025

gammahat <- zz_i %*% t(z) %*% y
gammahat

A matrix: 2 × 1 of type dbl
1.66031840
0.01645047

e <- y - z %*% gammahat
e

A matrix: 10 × 1 of type dbl
-0.521816038
0.232724057
0.051768868
0.403714623
-0.633077830
0.008490566
0.276886792
0.031426887
-0.190212264
0.340094340

sse_rm <- t(e) %*% e
sse_rm

A matrix: 1 × 1 of type dbl
1.122476

f0 = ((sse_rm - sse)/(2))/(sse/6)
f0

A matrix: 1 × 1 of type dbl
15.96781

qf(0.95,2,6)

5.14325284978472

검정통계량 : $\frac{(SSE_{RM} - SSE_{FM})/r}{SSE_{FM}/(n-p-1)}$

관측값 $f_0 = 15.968$

기각역 : $f_0 > 5.1433$

결론 : 기각할 수 있다.

(13)

가설 $H_0 : \beta_1 = \beta_2 + 3$을 $\alpha = 0.05$로 검정하시오.

$H_0 : \beta_1 = \beta_2 = \beta_3$

$T= (0,1,-1,0)$

$y = \beta_0 + (\beta_2+3)x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$ $y - 3x_1 = \beta_0 + \beta_2(x_1 + x_2) + \beta_3 x_3 + \epsilon$

yy <- y-3*dt$x1
yy

-27.2
-68.1
-71.1
-79.6
-41.9
-50.9
-62.5
-62.4
-33
-41.7

z <- matrix(c(rep(1,10), dt$x1+dt$x2,dt$x3), ncol=3)
z

A matrix: 10 × 3 of type dbl
1	37	64
1	50	72
1	53	80
1	54	88
1	45	81
1	42	45
1	49	46
1	47	69
1	39	54
1	40	39

zz <- t(z) %*% z
zz

A matrix: 3 × 3 of type dbl
10	456	638
456	21114	29658
638	29658	43324

zz_i <- solve(zz)
zz_i

A matrix: 3 × 3 of type dbl
6.76054651	-0.160413550	0.0102556645
-0.16041355	0.005038964	-0.0010871974
0.01025566	-0.001087197	0.0006163093

gammahat <- zz_i %*% t(z) %*% yy
gammahat

A matrix: 3 × 1 of type dbl
77.7140851
-3.1683286
0.2025345

e <- yy - z %*% gammahat
e

A matrix: 10 × 1 of type dbl
-0.6481331
-1.9801368
2.9045732
-4.0473741
6.5554097
-4.6583347
5.7174312
-5.1775193
1.9138690
-0.5797851

sse_rm <- t(e) %*% e
sse_rm

A matrix: 1 × 1 of type dbl
157.327

f0 = ((sse_rm - sse)/(1))/(sse/6)
f0

A matrix: 1 × 1 of type dbl
5311.082

qf(0.95,1,6)

5.9873776072737

검정통계량 : $\frac{(SSE_RM - SSE_FM)/r)}{(SSE_FM/(n-p-1))}$

관측값 $f_0 = 5311.082$

기각역 : $f_0 > 5.9874$

결론 : 기각할 수 있다.

(14)

$x_1 = 20, x_2 = 27, x_3 = 60$에서 $E(y)$에 관한 가설 $H_0 : E(y) = 3.5, H_1 : E(y) ̸\neq 3.5$를 $\alpha = 0.05$로 검정하시오.

sy 풀이

$T_0 = \frac{\hat{y} - E(y)}{\sqrt{Var{\hat{y}}}} = \frac{\hat{y} - E(Y)}{\sqrt{x^\top (X^\top X)^{-1} x MSE}} \sim t(n-1)$

x_indi <- matrix(c(1,20,27,60),nrow=1,ncol=4)

yhat <- x_indi %*% b
round(yhat,5)

A matrix: 1 × 1 of type dbl
3.48814

$Var{y} = x_i (X^\top X)^{-1} x_i^\top \sigma^2$

var_y <- ((x_indi %*%solve(t(x) %*% x) %*% t(x_indi) ) * mse)[,]
round(var_y,5)

0.00507

t_0 <- (yhat - 3.5)/sqrt(var_y)
round(t_0[,],5)

-0.16663

round(-pt(0.025,9),5)

-0.5097

$F_0=-0.16663$ > $t_{0.025}(9)=-0.5097$ 로 귀무가설을 기각하지 못하여 $E(y) = 3.5$라는 결론이 나온다.

4.

중회귀모형, $y = X\beta + \epsilon, \epsilon \sim N(0_n, I\sigma^2)$에서 $X$ 가 $n \times (p + 1)$행렬이고 rank가 $p + 1$이라면 적합된 모형 $\hat{y} = X\hat{\beta}$에 대하여 \[\sum^{n}_{j=1} Var(\hat{y}_j ) = (p + 1)\sigma^2\] 이 됨을 증명하여라.

$\hat{\bf{y}}= X\hat{\beta} = X(X^\top X)^{-1} X^\top y = Hy$

$Var(X\hat{\beta}) = XVar(\hat{\beta})X^\top$

$Var(\hat{\beta}) = Var((X^\top X)^{-1} X^\top y) = (X^\top X)^{-1} X^\top Var(y)X(X^\top X)^{-1} = (X^\top X)^{-1} X^\top X(X^\top X)^{-1} \sigma^2 = (X^\top X )^{-1} \sigma^2$

$Var(\hat{y}) = Var(X\hat{\beta}) = X(X^\top X)^{-1} X^\top \sigma^2$

$\to Var(\hat{y}) = HVar(y) H^\top = HH\sigma^2 = H\sigma^2$

$\star H^\top = H, H^2 = H, Var(y) = I_n \sigma^2$

$\sum^{n}_{j=1} Var(\hat{y}_j) = tr(Var(\hat{y})) = tr(H\sigma^2) = \sigma^2 tr(H) = tr((X^\top X)^{-1} X^\top X)\sigma^2 = tr(I_{p+1})\sigma^2 = (p+1)\sigma^2$

$\star tr(X (X^\top X)^{-1} X^\top)$

5.

중회귀모형에서 다음을 증명하시오.

(1)

$Cov(e, y) = \sigma^2[I_n − X(X^\top X)^{-1}X^\top]$

$\bf{e} = y - \hat{y} = y - Hy = (I-H)y$

$Cov(\bf{e},y) = Cov((I-H)y,y) = (I-H)Var(y) = (I-H) \sigma^2$

$\star Cov(Ax,y) = A Cov(X,Y)$

$\star Var(y) = \sigma^2$

(2)

$Cov(e, \hat{y}) = \mathbb{O}_n$

$Cov(e(I - H)\bf{y},Hy) = (I-H) Cov(y,y)H^\top = (I-H)H\sigma^2 = \mathbb{O}_n \sigma^2$

$\star Cov(Ax,By) = ACov(X,Y)B^\top$

$\star H^\top= H$

$\star H^2 = H$

(3)

$Cov(e, \hat{\beta}) = \mathbb{O}_{n\times(k+1)}$

$Cov((I-H)\bf{y},(X^\top X)^{-1} y) = (I-H) Cov(y,y) X (X^\top X)^{-1} = (I - X(X^\top X)^{-1} X^\top ) X (X^\top X)^{-1} \sigma^2 = \{ X(X^\top X)^{-1} - X(X^\top X)^{-1} \} \sigma^2 = \mathbb{O}_{n \times ([+1)}$

(4)

$Cov(\epsilon, \hat{\beta}) = \sigma^2 X(X^\top X)^{-1}$

$\bf{\hat{\beta}} = (X^\top X)^{-1} X^\top y = (X^\top X)^{-1} X^\top (X\beta + \epsilon) = \beta + (X^\top X)^{-1} X^\top \epsilon$

$Cov(\bf{\epsilon} , \beta + (X^\top X)^{-1} X^\top \epsilon ) = Cov(\epsilon, \beta) + Cov(\epsilon, \epsilon)X(X^\top X)^{-1} = \sigma^2 X(X^\top X)^{-1}$

$\star Cov(\epsilon, \beta) = 0$

$\star Cov(\epsilon, \epsilon) = I_n \sigma^2$

(5)

$\sum^{n}_{j=1} e_j y_j = SSE$

$\sum^{n}_{j=1} e_j y_j = \bf{e^\top y} = \{ (I - H)y\}^\top y = y^\top (I-H)y$

$\star \bf{e}^\top (e_1, \dots , e_n) = (y-\hat{y})^\top$

$\star \bf{y}^\top = (y_1 , \dots , y_n)$

$SSE = \sum^n_{j=1} (y_i - \hat{y}_j)^2 = (\bf{y} - \hat{y} ) ^\top ( y - \hat{y} ) = e^\top e = \{ (I - H)y \}^\top \{ (I - H)y \} = y^\top (I - H) (I-H)y = y^\top (I - H)y$

$\star I - H_H+H^2 = I-H, H^2 = H$

$\therefore \sum^{n}_{j=1} e_j y_j = SSE$

(6)

$\sum^{n}_{j=1} e_j \hat{y}_j = 0$

$\sum^{n}_{j=1} \bf{e_j \hat{y}_j} = e^\top \hat{y} = y^\top (I-H) Hy = y^\top_{1\times n} \mathbb{O}_{n\times n} y_{n \times 1} = 0$

$\star H - H^2 = H - H = \mathbb{O}_{n \times n}$

1 plus.

다음의 데이터에 대하여 모형 $y_i = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j, \epsilon \sim N(0,\sigma^2)$이 옳다고 가정하고 질문에 답하여라.

y	1	5	0	4	4	-1
x_1	1	2	1	3	3	3
x_2	1	1	2	1	2	3

library(car)

y <- c(1,5,0,4,4,-1)

x <- matrix(c(1,1,1,1,1,1,1,2,1,3,3,3,1,1,2,1,2,3), ncol=3)

dt <- data.frame(y=y,x1=x[,2],x2 = x[,3])

m <- lm(y~., dt)

summary(m)


Call:
lm(formula = y ~ ., data = dt)

Residuals:
      1       2       3       4       5       6 
-1.1395  1.3488  0.4651 -1.1628  1.4419 -0.9535 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2326     1.9740   1.638   0.2000  
x1            1.5116     0.7713   1.960   0.1449  
x2           -2.6047     0.9288  -2.804   0.0676 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.599 on 3 degrees of freedom
Multiple R-squared:  0.7511,    Adjusted R-squared:  0.5852 
F-statistic: 4.527 on 2 and 3 DF,  p-value: 0.1242

anova(m)

A anova: 3 × 5
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
	<int>	<dbl>	<dbl>	<dbl>	<dbl>
x1	1	3.040230	3.04023	1.188454	0.35537433
x2	1	20.118685	20.11868	7.864577	0.06760655
Residuals	3	7.674419	2.55814	NA	NA

(1)

$X^\top X, X^\top y y^\top y$와 $(X^\top X)^{-1}$ 을 구하시오.

xx <- t(x) %*% x
xx

A matrix: 3 × 3 of type dbl
6	13	10
13	33	23
10	23	20

xx_i <- solve(xx)
xx_i

A matrix: 3 × 3 of type dbl
1.5232558	-0.34883721	-0.36046512
-0.3488372	0.23255814	-0.09302326
-0.3604651	-0.09302326	0.33720930

xx_i2 <- round(xx_i,2)
xx_i2

A matrix: 3 × 3 of type dbl
1.52	-0.35	-0.36
-0.35	0.23	-0.09
-0.36	-0.09	0.34

xy <- t(x) %*% y
xy

A matrix: 3 × 1 of type dbl
13
32
15

yy <- t(y) %*% y
yy

A matrix: 1 × 1 of type dbl
59

(2)

$\beta_0, \beta_1, \beta_2, \sigma^2$를 추정하시오.

coef(m)

(Intercept): 3.23255813953489
x1: 1.51162790697674
x2: -2.6046511627907

anova(m)$Mean[3]

2.55813953488372

b <- xx_i %*% xy
b

A matrix: 3 × 1 of type dbl
3.232558
1.511628
-2.604651

b1 <- round(round(xx_i,2)%*% xy,2)
b1

A matrix: 3 × 1 of type dbl
3.16
1.46
-2.46

haty <- x %*% b
haty

A matrix: 6 × 1 of type dbl
2.13953488
3.65116279
-0.46511628
5.16279070
2.55813953
-0.04651163

haty_2 <- round(x %*% b1,2)
haty_2

A matrix: 6 × 1 of type dbl
2.16
3.62
-0.30
5.08
2.62
0.16

e <- y - haty
e

A matrix: 6 × 1 of type dbl
-1.1395349
1.3488372
0.4651163
-1.1627907
1.4418605
-0.9534884

e_2 <- y - haty_2
e_2

A matrix: 6 × 1 of type dbl
-1.16
1.38
0.30
-1.08
1.38
-1.16

sse <- t(e) %*% e
sse

A matrix: 1 × 1 of type dbl
7.674419

sse_2 <- t(e_2) %*% e_2
sse_2

A matrix: 1 × 1 of type dbl
7.7564

mse <- sse / 3
mse

A matrix: 1 × 1 of type dbl
2.55814

mse_2 <- sse_2 / 3
mse_2

A matrix: 1 × 1 of type dbl
2.585467

(3)

$x_1 = x_2 = 2$에서 $E(y)$의 95% 신뢰구간을 구하시오.

new_dt <- data.frame(x1=2, x2=2)
new_dt

A data.frame: 1 × 2
x1	x2
<dbl>	<dbl>
2	2

predict(m, newdata = new_dt,interval = c("confidence"), level = 0.95)

A matrix: 1 × 3 of type dbl
	fit	lwr	upr
1	1.046512	-1.345982	3.439005

x0 <- c(1,2,2)

mu0 <- x0 %*% b

t(x0) %*% xx_i %*% x0

A matrix: 1 × 1 of type dbl
0.2209302

var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0

A matrix: 1 × 1 of type dbl
0.5651704

tse <- qt(0.975,3) * sqrt(var_mu0)
tse

A matrix: 1 × 1 of type dbl
2.392494

mu0 - (tse)

A matrix: 1 × 1 of type dbl
-1.345982

mu0 + (tse)

A matrix: 1 × 1 of type dbl
3.439005

(4)

$\beta_1$의 90% 신뢰구간을 구하시오.

summary(m)


Call:
lm(formula = y ~ ., data = dt)

Residuals:
      1       2       3       4       5       6 
-1.1395  1.3488  0.4651 -1.1628  1.4419 -0.9535 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2326     1.9740   1.638   0.2000  
x1            1.5116     0.7713   1.960   0.1449  
x2           -2.6047     0.9288  -2.804   0.0676 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.599 on 3 degrees of freedom
Multiple R-squared:  0.7511,    Adjusted R-squared:  0.5852 
F-statistic: 4.527 on 2 and 3 DF,  p-value: 0.1242

confint(m, level = 0.9)

A matrix: 3 × 2 of type dbl
	5 %	95 %
(Intercept)	-1.4129961	7.8781124
x1	-0.3035404	3.3267962
x2	-4.7904032	-0.4188991

se_b1 <- sqrt(xx_i[2,2] * mse)
se_b1

A matrix: 1 × 1 of type dbl
0.7713081

tse <- qt(0.95,3) *se_b1
tse

A matrix: 1 × 1 of type dbl
1.815168

b[2] - tse

A matrix: 1 × 1 of type dbl
-0.3035404

b[2] + tse

A matrix: 1 × 1 of type dbl
3.326796

(5)

가설 $H_0: \beta_1 = \beta_2, H_1: \beta_1 \neq \beta_2$을 유의수준 $\alpha = 0.05$

library(car)

linearHypothesis(m, c(0,1,-1), 0)

A anova: 2 × 6
	Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
1	4	30.092308	NA	NA	NA	NA
2	3	7.674419	1	22.41789	8.763357	0.05952979

z <- matrix(c(1,1,1,1,1,1,x[,2]+x[,3]), ncol=2)

zz <- t(z) %*% z
zz

A matrix: 2 × 2 of type dbl
6	23
23	99

zz_i <- solve(zz)
zz_i

A matrix: 2 × 2 of type dbl
1.5230769	-0.35384615
-0.3538462	0.09230769

zy <- t(z) %*% y
zy

A matrix: 2 × 1 of type dbl
13
47

yy <- t(y) %*% y
yy

A matrix: 1 × 1 of type dbl
59

z %*% zz_i %*% t(z)

A matrix: 6 × 6 of type dbl
0.47692308	3.076923e-01	3.076923e-01	0.1384615	-0.03076923	-2.000000e-01
0.30769231	2.307692e-01	2.307692e-01	0.1538462	0.07692308	1.110223e-16
0.30769231	2.307692e-01	2.307692e-01	0.1538462	0.07692308	1.110223e-16
0.13846154	1.538462e-01	1.538462e-01	0.1692308	0.18461538	2.000000e-01
-0.03076923	7.692308e-02	7.692308e-02	0.1846154	0.29230769	4.000000e-01
-0.20000000	1.110223e-16	1.110223e-16	0.2000000	0.40000000	6.000000e-01

gammahat <- zz_i %*% t(z) %*% y
gammahat

A matrix: 2 × 1 of type dbl
3.1692308
-0.2615385

e <- y - z %*% gammahat
e

A matrix: 6 × 1 of type dbl
-1.646154
2.615385
-2.384615
1.876923
2.138462
-2.600000

sse_rm <- t(e) %*% e
sse_rm

A matrix: 1 × 1 of type dbl
30.09231

f0 = (sse_rm - sse)/(sse/3)
f0

A matrix: 1 × 1 of type dbl
8.763357

qf(0.95,1,3)

10.1279644860139

T검정 방법

q = c(0,1,-1)

var_qb = t(q) %*% xx_i %*% q*  mse
var_qb

A matrix: 1 × 1 of type dbl
1.933478

se_qb = sqrt(var_qb)
se_qb

A matrix: 1 × 1 of type dbl
1.390495

t0 = (t(q) %*% b )/se_qb
t0

A matrix: 1 × 1 of type dbl
2.960297

t0^2

A matrix: 1 × 1 of type dbl
8.763357

(6)

가설 $H_0 : \beta_1 = 0, H_1 : \beta_1 \neq 0$을 유의수준 $\alpha = 0.1$로 검정하시오. 위 (3)의 질문의 결과와 어떠한 연관을 지을 수 있는가?

t0 <- b[2] / se_b1

qt(0.95,3)

2.35336343480182

(7)

$x_1 = 2, x_2 = 3$에서 가설 $H_0 : E(y) = 4,H_1 : E(y) \neq 4$를 $\alpha = 0.05$로 검정하시오

new_dt <- data.frame(x1=2, x2=2)

predict(m, newdata = new_dt,interval = c("confidence"), level = 0.95)

A matrix: 1 × 3 of type dbl
	fit	lwr	upr
1	1.046512	-1.345982	3.439005

x0 <- c(1,2,3)

mu0 <- x0 %*% b

t(x0) %*% xx_i %*% x0

A matrix: 1 × 1 of type dbl
0.8139535

var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0

A matrix: 1 × 1 of type dbl
2.082207

t0 <- (mu0-4)/ sqrt(var_mu0)
t0

A matrix: 1 × 1 of type dbl
-3.851834

qt(0.975,3)

3.18244630528371

(8)

가설 $H_0 : \beta_2 = 0, H_1: \beta_2 <0$을 $\alpha = 0.05$ 로 검정하시오.

se_b2 <- sqrt(xx_i[3,3] * mse)
se_b2

A matrix: 1 × 1 of type dbl
0.9287779

t0<- b[3] / se_b2
t0

A matrix: 1 × 1 of type dbl
-2.804385

(9)

가설 $H_0: _1 = 2_2, H_1 : _1 _2 $를 $\alpha=0..05$로 검정하시오.

linearHypothesis(m, c(0,1,-2), 0)

A anova: 2 × 6
	Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
1	4	30.797619	NA	NA	NA	NA
2	3	7.674419	1	23.1232	9.039069	0.05737101

z <- matrix(c(1,1,1,1,1,1,2*x[,2]+x[,3]), ncol=2)

zz <- t(z) %*% z
zz

A matrix: 2 × 2 of type dbl
6	36
36	244

zz_i <- solve(zz)
zz_i

A matrix: 2 × 2 of type dbl
1.4523810	-0.21428571
-0.2142857	0.03571429

zy <- t(z) %*% y
zy

A matrix: 2 × 1 of type dbl
13
79

yy <- t(y) %*% y
yy

A matrix: 1 × 1 of type dbl
59

z %*% zz_i %*% t(z)

A matrix: 6 × 6 of type dbl
0.48809524	0.27380952	0.38095238	0.05952381	-0.04761905	-0.15476190
0.27380952	0.20238095	0.23809524	0.13095238	0.09523810	0.05952381
0.38095238	0.23809524	0.30952381	0.09523810	0.02380952	-0.04761905
0.05952381	0.13095238	0.09523810	0.20238095	0.23809524	0.27380952
-0.04761905	0.09523810	0.02380952	0.23809524	0.30952381	0.38095238
-0.15476190	0.05952381	-0.04761905	0.27380952	0.38095238	0.48809524

gammahat <- zz_i %*% t(z) %*% y
gammahat

A matrix: 2 × 1 of type dbl
1.95238095
0.03571429

e <- y - z %*% gammahat
e

A matrix: 6 × 1 of type dbl
-1.059524
2.869048
-2.095238
1.797619
1.761905
-3.273810

sse_rm <- t(e) %*% e
sse_rm

A matrix: 1 × 1 of type dbl
30.79762

f0 = (sse_rm - sse)/(sse/3)
f0

A matrix: 1 × 1 of type dbl
9.039069

qf(0.95,1,3)

10.1279644860139