고급회귀분석 과제, CH03,07
고급회귀분석 세번째 과제입니다.
제출 기한 : 11월 21일
제출 방법
(pdf 아닌 문서는 미제출로 간주)
주의사항
예) R에서 lm으로 beta의 추정량을 구하면 안 됨. 수업 시간에 배운 식으로 풀이를 적어야 함.
************ R을 이용해서 푸는 문제는, R 코드도 같이 업로드.
1.
어떤 큰 공장에서 동일한 기계들의 정비기록에 관한 표본자료를 취하였다. 이는 기계의 사용연도(age of machines)와 정비비용(maintenance cost) 간에 어떤 관계가 있는가를 밝혀내기 위한 것이다. 그 자료는 다음과 같다 (표본의 크기 \(n = 14\)).
3 |
39 |
1 |
24 |
5 |
115 |
8 |
105 |
1 |
50 |
4 |
86 |
2 |
67 |
6 |
90 |
9 |
140 |
3 |
112 |
5 |
70 |
7 |
186 |
2 |
43 |
6 |
126 |
dt <- data.frame(y=c(39,24,115,105,50,86,67,90,140,112,70,186,43,126),
x=c(3,1,5,8,1,4,2,6,9,3,5,7,2,6))
x <- matrix(c(rep(1,14), dt$x), ncol=2)
y <- matrix(c(dt$y), ncol=1)
(1)
\(X^\top X, X^\top y, y^\top y\)와 \((X^\top X)^{-1}\)을 구하시오.
A matrix: 2 × 2 of type dbl
14 |
62 |
62 |
360 |
A matrix: 2 × 1 of type dbl
1253 |
6714 |
A matrix: 1 × 1 of type dbl
138197 |
A matrix: 2 × 2 of type dbl
0.30100334 |
-0.05183946 |
-0.05183946 |
0.01170569 |
(2)
\(\hat{\beta} = (X^\top X)^{-1}X^\top y\)를 구하고, 적합된 회귀선형을 써 보아라.
A matrix: 2 × 1 of type dbl
29.10702 |
13.63712 |
\(y = 29.1 + 13.6x\)
- (Intercept)
- 29.1070234113712
- x
- 13.6371237458194
(3)
\(\sigma^2\)을 MSE 로 추정할 경우, \(\hat{\beta}\)의 분산-공분산행렬의 추정은 \(\hat{Var}(\hat{\beta}) = (X^\top X)^{-1}\)(MSE)이다. 먼저 분산분석하여 MSE 를 구하고 \(\hat{Var}(\hat{\beta})\)을 구하시오
\(SSE = y^\top(I-H)y\)
\(SST = y^\top y - n(\bar{y})^2\)
sse = t(y) %*% (In-H) %*% y
sse
A matrix: 1 × 1 of type dbl
10166.25 |
sst = t(y) %*% y - n * mean(y)^2
sst
A matrix: 1 × 1 of type dbl
26053.5 |
A matrix: 1 × 1 of type dbl
15887.25 |
A matrix: 1 × 1 of type dbl
15887.25 |
A matrix: 1 × 1 of type dbl
847.1876 |
A matrix: 1 × 1 of type dbl
18.75293 |
분산분석표
회귀 |
15887.25 |
1 |
15887.25 |
18.75293 |
4.747225 |
잔차 |
10166.25 |
12 |
847.19 |
|
|
계 |
26053.5 |
13 |
|
|
|
hat_var_betahat <- mse[,] * xx_i
A matrix: 2 × 2 of type dbl
(Intercept) |
255.00629 |
-43.917750 |
x |
-43.91775 |
9.916911 |
2.
어떤 공정에서 나오는 제품의 강도\((kg/cm^2)\)가 그 공정의 온도와 압력에 어떠한 영향을 받는가를 조사하기 위하여 다음의 데이터를 얻었다.
195 |
57 |
81.4 |
179 |
61 |
122.2 |
205 |
60 |
101.7 |
204 |
62 |
175.6 |
201 |
61 |
150.3 |
184 |
54 |
64.8 |
210 |
58 |
92.1 |
209 |
61 |
113.8 |
dt <- data.frame(y=c(81.4,122.2,101.7,175.6,150.3,64.8,92.1,113.8),
x1 = c(195,179,205,204,201,184,210,209),
x2 = c(57,61,60,62,61,54,58,61))
Call:
lm(formula = y ~ ., data = dt)
Residuals:
1 2 3 4 5 6 7 8
-5.259 -14.775 -18.742 31.259 17.279 11.744 -3.723 -17.783
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -554.3112 196.9775 -2.814 0.0374 *
x1 -0.1797 0.7626 -0.236 0.8230
x2 11.8600 3.2301 3.672 0.0144 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 21.64 on 5 degrees of freedom
Multiple R-squared: 0.7478, Adjusted R-squared: 0.6469
F-statistic: 7.412 on 2 and 5 DF, p-value: 0.03195
A anova: 3 × 5
|
<int> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
x1 |
1 |
628.467 |
628.4670 |
1.342371 |
0.29894191 |
x2 |
1 |
6311.728 |
6311.7278 |
13.481505 |
0.01441752 |
Residuals |
5 |
2340.884 |
468.1768 |
NA |
NA |
(1)
선형회귀모형, \(y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j\) 가 성립된다고 가정하고 데이터로부터 회귀모형을 추정하시오.
x = matrix(c(rep(1,8),dt$x1, dt$x2), ncol=3)
A matrix: 3 × 3 of type dbl
8 |
1587 |
474 |
1587 |
315745 |
94108 |
474 |
94108 |
28136 |
A matrix: 3 × 3 of type dbl
82.8749894 |
-0.134598917 |
-0.945973490 |
-0.1345989 |
0.001242266 |
-0.001887521 |
-0.9459735 |
-0.001887521 |
0.022285408 |
A matrix: 3 × 1 of type dbl
901.9 |
179676.4 |
54034.3 |
A matrix: 1 × 1 of type dbl
110959 |
A matrix: 3 × 1 of type dbl
-554.3112232 |
-0.1797396 |
11.8599927 |
\(y = -554.31 - 0.1797 x_1 + 11.86 x_2\)
(2)
오차분산이 \(\sigma^2 = 3\)이라 하면, \(Var(\hat{\beta}_0), Var(\hat{\beta}_1), Var(\hat{\beta}_2)\)와 \(Cov(\hat{\beta}_1, \hat{\beta}_2)\)는 무엇인가?
\(var(\hat{\beta}) = (x^\top x)^{-1} \sigma^2\)
A matrix: 3 × 3 of type dbl
248.6249683 |
-0.403796751 |
-2.837920471 |
-0.4037968 |
0.003726798 |
-0.005662562 |
-2.8379205 |
-0.005662562 |
0.066856223 |
\(var(\hat{\beta}_0) = 248.6250\)
\(var(\hat{\beta}_1) = 0.0037\)
\(var(\hat{\beta}_2) = 0.0669\)
\(cov(\hat{\beta}_1, \hat{\beta}_2) = -0.0057\)
(3)
\(x_1 = 200\)°C이고 \(x_2 = 59\text{psi}\)에서 평균 제품의 강도의 추정값 \(\hat{y}\)는 얼마인가? 이의 분산을 \(\sigma^2 = 3\)이라 가정하고 구하시오.
new_dt <- data.frame(x1=200, x2=59)
new_dt
A data.frame: 1 × 2
<dbl> |
<dbl> |
200 |
59 |
predict(m2, newdata = new_dt)
1: 109.480424963516
A matrix: 1 × 1 of type dbl
109.4804 |
\(\hat{\mu}_0 = 109.4804\)
(4)
추정된 회귀계수 \(\hat{\beta}_1, \hat{\beta}_2\)의 의미는 무엇인가?
\(\hat{\beta}_1\) : 공정압력이 일정하게 유지되었을 때, 공정온도가 1도씨 증가하면 강도는 0.1797만큼 감소한다.
\(\hat{\beta}_2\) : 공정온도가 일정하게 유지되었을 때, 공정압력이 1psi 증가하면 강도는 11.8600만큼 증가한다.
(5)
분산분석표를 작성하고 \(\alpha = 0.05\)로 \(F\)-검정을 행하시오.
A anova: 3 × 5
|
<int> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
x1 |
1 |
628.467 |
628.4670 |
1.342371 |
0.29894191 |
x2 |
1 |
6311.728 |
6311.7278 |
13.481505 |
0.01441752 |
Residuals |
5 |
2340.884 |
468.1768 |
NA |
NA |
A matrix: 8 × 8 of type dbl
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
H = x %*% xx_i %*% t(x)
H
A matrix: 8 × 8 of type dbl
0.223303342 |
0.0473478224 |
0.0925307250 |
0.004932881 |
0.04854185 |
0.354021685 |
0.18034566 |
0.04897604 |
0.047347822 |
0.7875815614 |
0.0003377034 |
0.178850120 |
0.18539614 |
0.121730006 |
-0.28766297 |
-0.03358038 |
0.092530725 |
0.0003377034 |
0.1733021360 |
0.174906227 |
0.15025388 |
0.004944942 |
0.19895553 |
0.20476885 |
0.004932881 |
0.1788501200 |
0.1749062270 |
0.274444297 |
0.21838554 |
-0.166837528 |
0.08255641 |
0.23276205 |
0.048541845 |
0.1853961381 |
0.1502538806 |
0.218385537 |
0.18446745 |
-0.053127978 |
0.08195337 |
0.18412975 |
0.354021685 |
0.1217300062 |
0.0049449423 |
-0.166837528 |
-0.05312798 |
0.711046519 |
0.14493505 |
-0.11671270 |
0.180345664 |
-0.2876629720 |
0.1989555317 |
0.082556415 |
0.08195337 |
0.144935052 |
0.38255762 |
0.21635932 |
0.048976035 |
-0.0335803794 |
0.2047688541 |
0.232762052 |
0.18412975 |
-0.116712699 |
0.21635932 |
0.26329707 |
\(SSE = y^\top (I-H)y\)
\(SST = y^\top y - n(\bar{y})^2\)
sse = t(y) %*% (I8-H) %*% y
sse
A matrix: 1 × 1 of type dbl
2340.884 |
sst = t(y) %*% y - 8 * mean(y)^2
sst
A matrix: 1 × 1 of type dbl
9281.079 |
A matrix: 1 × 1 of type dbl
6940.195 |
A matrix: 1 × 1 of type dbl
3470.097 |
A matrix: 1 × 1 of type dbl
468.1768 |
A matrix: 1 × 1 of type dbl
7.411938 |
회귀 |
6940.195 |
2 |
3470.097 |
7.4119 |
5.786135 |
잔차 |
2340.884 |
5 |
468.177 |
|
|
계 |
9281.079 |
7 |
|
|
|
결론 : 검정통계량의 관측값이 기각역에 속하므로 귀무가설 기각. 즉 모형이 유의수준 0.05하에서 유의하다고 할 수 있다.
(6)
결정계수 \(R^2\)을 구하시오.
\(R^2 = \frac{SSR}{SST}\)
A matrix: 1 × 1 of type dbl
0.7477789 |
(7)
수정된 결정계수 \(R^2_{adj}\) 을 구하시오.
\(R^2_{adj} = 1-\frac{SSE(n-p-1)}{SST/(n-1)}\)
1-(sse/5)/(sst/7)
# 0.6469
A matrix: 1 × 1 of type dbl
0.6468905 |
(8)
\(\sigma^2\)의 추정값 MSE 를 구하시오.
\(\sigma^2 = MSE\)
A matrix: 1 × 1 of type dbl
468.1768 |
(9)
MSR의 기대값을 \(\sigma^2\)과 \(\beta_0, \beta_1, \beta_2\)의 함수로 표시하여라.
A matrix: 8 × 8 of type dbl
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
Jn <- one1 %*% t(one1)
Jn
A matrix: 8 × 8 of type dbl
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
t(x) %*% (I8 - Jn/8) %*% x
A matrix: 3 × 3 of type dbl
0 |
0.000 |
0.00 |
0 |
923.875 |
78.25 |
0 |
78.250 |
51.50 |
3.
어떤 공장에서 물의 소비량을 조사하기 위하여 매달의 물소비량(\(y\)), 평균온도(\(x_1\)), 작업일수(\(x_2\))와 작업량(\(x_3\))에 관한 데이터를 얻었다.
2.8 |
10 |
27 |
64 |
3.9 |
24 |
26 |
72 |
3.9 |
25 |
28 |
80 |
4.4 |
28 |
26 |
88 |
3.1 |
15 |
30 |
81 |
3.1 |
18 |
24 |
45 |
3.5 |
22 |
27 |
46 |
3.6 |
22 |
25 |
69 |
3.0 |
12 |
27 |
54 |
3.3 |
15 |
25 |
39 |
dt <- data.frame(y=c(2.8,3.9,3.9,4.4,3.1,3.1,3.5,3.6,3.0,3.3),
x1 = c(10,24,25,28,15,18,22,22,12,15),
x2 = c(27,26,28,26,30,24,27,25,27,25),
x3 = c(64,72,80,88,81,45,46,69,54,39))
Call:
lm(formula = y ~ ., data = dt)
Residuals:
Min 1Q Median 3Q Max
-0.23490 -0.07744 -0.02166 0.08840 0.23442
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.409213 1.125954 2.140 0.07618 .
x1 0.069788 0.012640 5.521 0.00149 **
x2 -0.024767 0.044830 -0.552 0.60060
x3 0.005864 0.005052 1.161 0.28978
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.172 on 6 degrees of freedom
Multiple R-squared: 0.9202, Adjusted R-squared: 0.8803
F-statistic: 23.05 on 3 and 6 DF, p-value: 0.001079
A anova: 4 × 5
|
<int> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
x1 |
1 |
2.004315887 |
2.004315887 |
67.73858599 |
0.0001737553 |
x2 |
1 |
0.002273071 |
0.002273071 |
0.07682153 |
0.7909535228 |
x3 |
1 |
0.039877142 |
0.039877142 |
1.34770234 |
0.2897755827 |
Residuals |
6 |
0.177533900 |
0.029588983 |
NA |
NA |
(1)
데이터로부터 회귀모형, \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3\) 을 구하여라. 그리고 어째서 이 모형이 선택되었는가에 대하여 토의하시오.
x = matrix(c(rep(1,10),dt$x1, dt$x2, dt$x3), ncol=4)
A matrix: 4 × 4 of type dbl
10 |
191 |
265 |
638 |
191 |
3971 |
5047 |
12620 |
265 |
5047 |
7049 |
17038 |
638 |
12620 |
17038 |
43324 |
A matrix: 4 × 4 of type dbl
42.8460778 |
-0.274517258 |
-1.666784957 |
0.1044984818 |
-0.2745173 |
0.005399764 |
0.009802163 |
-0.0013851965 |
-1.6667850 |
0.009802163 |
0.067921641 |
-0.0050213139 |
0.1044985 |
-0.001385196 |
-0.005021314 |
0.0008624387 |
A matrix: 4 × 1 of type dbl
34.6 |
686.3 |
916.0 |
2249.9 |
A matrix: 1 × 1 of type dbl
121.94 |
A matrix: 4 × 1 of type dbl
2.409213010 |
0.069788005 |
-0.024766597 |
0.005864434 |
\(\hat{y} = 2.409 + 0.0698x_1 - 0.0248x_2 + 0.0059x_3\)
(2)
\(\hat{\beta}_1, \hat{\beta}_2\)와 \(\hat{\beta}_3\)의 의미는 무엇인가?
\(\hat{\beta}_1\) : 작업일수, 작업량이 변하지 않을 때, 평균온도가 1도씨 증가하면 물소비량은 69.8톤 증가한다.
\(\hat{\beta}_2\) : 평균온도, 작업량이 변하지 않을 때, 작업일수가 1일 증가하면 물소비량은 24.8톤 감소한다.
\(\hat{\beta}_3\) : 평균온도, 작업일수이 변하지 않을 때, 작업량이 1000톤 증가하면 물소비량은 5.9톤 증가한다.
(3)
\(Var(\hat{\beta}_3)\)을 구하시오. \(\sigma^2\)을 MSE 로 추정하면 \(\hat{Var}(\hat{\beta}_3)\)은 무엇인가?
A matrix: 4 × 4 of type dbl
(Intercept) |
2.409213010 |
1.125953765 |
2.1397087 |
0.076181015 |
x1 |
0.069788005 |
0.012640155 |
5.5211353 |
0.001485354 |
x2 |
-0.024766597 |
0.044830038 |
-0.5524554 |
0.600595338 |
x3 |
0.005864434 |
0.005051602 |
1.1609058 |
0.289775583 |
\(var(\beta_3) = (x^\top x)^{-1}_{(4,4)} \sigma^2\)
xx_i[4,4]
## 0.0009*sigma^2
0.000862438702127396
hat_var_b3 <- xx_i[4,4] * mse
hat_var_b3
A matrix: 1 × 1 of type dbl
0.4037738 |
\(var(\hat{\beta}_3) = 0.0000255\)
(4)
분산분석표를 작성하고, 결정계수 \(R^2\)을 구하시오.
A anova: 4 × 5
|
<int> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
x1 |
1 |
2.004315887 |
2.004315887 |
67.73858599 |
0.0001737553 |
x2 |
1 |
0.002273071 |
0.002273071 |
0.07682153 |
0.7909535228 |
x3 |
1 |
0.039877142 |
0.039877142 |
1.34770234 |
0.2897755827 |
Residuals |
6 |
0.177533900 |
0.029588983 |
NA |
NA |
I10 = diag(rep(1,10))
I10
A matrix: 10 × 10 of type dbl
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
H = x %*% xx_i %*% t(x)
H
A matrix: 10 × 10 of type dbl
0.479007503 |
-0.005805719 |
-0.08066882 |
-0.01958705 |
0.26225012 |
0.09655116 |
-0.23996631 |
0.10866366 |
0.28730870 |
0.11224675 |
-0.005805719 |
0.186448902 |
0.16479959 |
0.27196857 |
0.01139329 |
0.10399383 |
0.06627062 |
0.18477003 |
-0.01335934 |
0.02952023 |
-0.080668821 |
0.164799589 |
0.33179614 |
0.23355629 |
0.24558622 |
-0.08987596 |
0.23998739 |
0.04338569 |
-0.01509426 |
-0.07347228 |
-0.019587049 |
0.271968566 |
0.23355629 |
0.48736361 |
0.01178437 |
0.05219047 |
-0.10292798 |
0.28777554 |
-0.11085118 |
-0.11127263 |
0.262250120 |
0.011393292 |
0.24558622 |
0.01178437 |
0.58743384 |
-0.22034983 |
0.06947229 |
-0.08527926 |
0.20954991 |
-0.09184095 |
0.096551163 |
0.103993827 |
-0.08987596 |
0.05219047 |
-0.22034983 |
0.36048096 |
0.08217297 |
0.20539705 |
0.10911272 |
0.30032662 |
-0.239966310 |
0.066270622 |
0.23998739 |
-0.10292798 |
0.06947229 |
0.08217297 |
0.69646084 |
-0.11029224 |
0.06925883 |
0.22956358 |
0.108663660 |
0.184770026 |
0.04338569 |
0.28777554 |
-0.08527926 |
0.20539705 |
-0.11029224 |
0.27283220 |
0.01617393 |
0.07657339 |
0.287308704 |
-0.013359337 |
-0.01509426 |
-0.11085118 |
0.20954991 |
0.10911272 |
0.06925883 |
0.01617393 |
0.25886069 |
0.18903998 |
0.112246749 |
0.029520231 |
-0.07347228 |
-0.11127263 |
-0.09184095 |
0.30032662 |
0.22956358 |
0.07657339 |
0.18903998 |
0.33931531 |
\(SSE = y^\top (I-H)y\)
\(SST = y^\top y - n(\bar{y})^2\)
sse = t(y) %*% (I10-H) %*% y
sse
A matrix: 1 × 1 of type dbl
0.1775339 |
sst = t(y) %*% y - 10 * mean(y)^2
sst
A matrix: 1 × 1 of type dbl
2.224 |
A matrix: 1 × 1 of type dbl
2.046466 |
A matrix: 1 × 1 of type dbl
0.6821554 |
A matrix: 1 × 1 of type dbl
0.02958898 |
A matrix: 1 × 1 of type dbl
23.05437 |
회귀 |
2.0465 |
3 |
0.6822 |
23.0544 |
4.7571 |
잔차 |
0.1775 |
6 |
0.0296 |
|
|
계 |
2.2240 |
9 |
|
|
|
\(R^2\)
A matrix: 1 × 1 of type dbl
0.9201736 |
(5)
\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 평균 물소비량을 추정하시오. 이 추정된 소비량의 표준편차의 추정값을 구하시오.
new_dt <- data.frame(x1=20, x2=27, x3 = 60)
new_dt
A data.frame: 1 × 3
<dbl> |
<dbl> |
<dbl> |
20 |
27 |
60 |
predict(m3, newdata = new_dt)
1: 3.48814105556645
A matrix: 1 × 1 of type dbl
3.488141 |
\(\hat{\mu}_0 = 3.4881\)
var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0
A matrix: 1 × 1 of type dbl
0.005065205 |
sqrt(var_mu0)
# se(hat mu0) = 0.0712
A matrix: 1 × 1 of type dbl
0.07117026 |
(6)
\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 어느 한 달의 물소비량 \(y_s\) 를 예측하여라. 이 예측된 물소비량의 표준편차의 추정값을 구하시오.
A matrix: 1 × 1 of type dbl
3.488141 |
\(\hat{y}_s = 3.4881\)
var_ys <- (1+t(x0) %*% xx_i %*% x0) * mse
var_ys
A matrix: 1 × 1 of type dbl
0.03465419 |
sqrt(var_ys)
# se(hat ys) = 0.1862
A matrix: 1 × 1 of type dbl
0.1861564 |
(7)
\(y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j\) 라 가정하고 회귀제곱합 SSR의 기대값을 \(\sigma^2\)과 \(\beta_i, i = 0, 1, 2, 3\)의 함수로 표시하여라.
I10 <- diag(rep(1,10))
I10
A matrix: 10 × 10 of type dbl
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
Jn <- one1 %*% t(one1)
Jn
A matrix: 10 × 10 of type dbl
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
t(x) %*% (I10 - Jn/10) %*% x
A matrix: 4 × 4 of type dbl
6.661338e-16 |
1.160183e-14 |
1.751377e-14 |
4.096723e-14 |
-1.065814e-14 |
3.229000e+02 |
-1.450000e+01 |
4.342000e+02 |
-1.332268e-14 |
-1.450000e+01 |
2.650000e+01 |
1.310000e+02 |
-1.421085e-14 |
4.342000e+02 |
1.310000e+02 |
2.619600e+03 |
(8)
\(\beta_3\)의 95% 신뢰구간을 구하시오. 이 신뢰구간의 의미를 해석하시오.
A matrix: 4 × 2 of type dbl
(Intercept) |
-0.345896601 |
5.16432262 |
x1 |
0.038858661 |
0.10071735 |
x2 |
-0.134461748 |
0.08492855 |
x3 |
-0.006496391 |
0.01822526 |
\(\beta_3 \pm t_{0.025}(6) se(\hat{\beta}_3)\)
b[4] - qt(0.975,6) * sqrt(hat_var_b3)
A matrix: 1 × 1 of type dbl
-1.548982 |
b[4] + qt(0.975,6) * sqrt(hat_var_b3)
A matrix: 1 × 1 of type dbl
1.56071 |
이 신뢰구간이 \(\beta_3\)을 포함하고 있을 것이라고 95% 확신한다.
(9)
\(\beta_1\)의 99% 신뢰구간을 구하시오. 이 신뢰구간의 의미를 해석하시오.
A matrix: 4 × 2 of type dbl
(Intercept) |
-1.76517953 |
6.58360555 |
x1 |
0.02292554 |
0.11665047 |
x2 |
-0.19097074 |
0.14143754 |
x3 |
-0.01286402 |
0.02459289 |
\(\beta_1 \pm t_{0.005}(6) se(\hat{\beta}_1)\)
hat_var_b1 = xx_i[2,2] * mse
hat_var_b1
A matrix: 1 × 1 of type dbl
0.0001597735 |
b[2] - qt(0.995,6) * sqrt(hat_var_b1)
A matrix: 1 × 1 of type dbl
0.02292554 |
b[2] + qt(0.995,6) * sqrt(hat_var_b1)
A matrix: 1 × 1 of type dbl
0.1166505 |
이 신뢰구간이 \(\beta_1\)을 포함하고 있을 것이라고 99% 확신한다.
(10)
\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 평균 물소비량, \(E(y)\)의 95% 신뢰구간을 구하시오.
predict(m3, newdata = new_dt, interval = c("confidence"))
A matrix: 1 × 3 of type dbl
1 |
3.488141 |
3.313994 |
3.662288 |
mu0 - qt(0.975,6) * sqrt(var_mu0)
A matrix: 1 × 1 of type dbl
3.313994 |
mu0 + qt(0.975,6) * sqrt(var_mu0)
A matrix: 1 × 1 of type dbl
3.662288 |
(11)
가설 \(H_0 : \beta_1 = 0, H_1 : \beta_1 > 0\)을 \(\alpha = 0.05\)로 검정하시오.
Call:
lm(formula = y ~ ., data = dt)
Residuals:
Min 1Q Median 3Q Max
-0.23490 -0.07744 -0.02166 0.08840 0.23442
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.409213 1.125954 2.140 0.07618 .
x1 0.069788 0.012640 5.521 0.00149 **
x2 -0.024767 0.044830 -0.552 0.60060
x3 0.005864 0.005052 1.161 0.28978
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.172 on 6 degrees of freedom
Multiple R-squared: 0.9202, Adjusted R-squared: 0.8803
F-statistic: 23.05 on 3 and 6 DF, p-value: 0.001079
검정통계량 : \(t_0 = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)}\)
t0 <- b[2]/sqrt(hat_var_b1)
t0
A matrix: 1 × 1 of type dbl
5.521135 |
\(t_0 = 5.5211\)
기각역 : \(t_{0.05}(6) = 1.9432\)
결론 : \(H_0\) 기각 가능. 즉 유의수준 5%에서 \(\beta_2\)는 0보다 크다고 할 수 있다.
\(T = \frac{\hat{\beta}_i - \beta^0_i}{\hat{\sigma} \sqrt{c_{ii}}}\)
(12)
가설 \(H_0 : \beta_1 = \beta_2 = \beta_3\)을 \(\alpha = 0.05\)로 검정하시오.
Loading required package: carData
linearHypothesis(m3, matrix(c(0,1,-1,0,0,0,1,-1), byrow = T, nrow=2),c(0,0))
A anova: 2 × 6
|
<dbl> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
1 |
8 |
1.1224764 |
NA |
NA |
NA |
NA |
2 |
6 |
0.1775339 |
2 |
0.9449425 |
15.96781 |
0.003956509 |
\(H_0 : \beta_1 = \beta_2 = \beta_3\)
\(T= (0,1,-1,0) (0,0,1,-1)\)
\(y= \beta_0 + \beta_1 x_1 + \beta_1 x_2 + \beta_1 x_3 + \epsilon = \beta_0 + \beta_1(x_1+x_2+x_3)\)
z <- matrix(c(rep(1,10), dt$x1+dt$x2+dt$x3), ncol=2)
z
A matrix: 10 × 2 of type dbl
1 |
101 |
1 |
122 |
1 |
133 |
1 |
142 |
1 |
126 |
1 |
87 |
1 |
95 |
1 |
116 |
1 |
93 |
1 |
79 |
A matrix: 2 × 2 of type dbl
10 |
1094 |
1094 |
123754 |
A matrix: 2 × 2 of type dbl
3.04034002 |
-0.0268769654 |
-0.02687697 |
0.0002456761 |
A matrix: 2 × 1 of type dbl
34.6 |
3852.2 |
A matrix: 1 × 1 of type dbl
121.94 |
A matrix: 10 × 10 of type dbl
0.11733491 |
0.073997642 |
0.051297170 |
0.03272406 |
0.065742925 |
0.146226415 |
0.12971698 |
0.08637972 |
0.133844340 |
0.162735849 |
0.07399764 |
0.139003538 |
0.173054245 |
0.20091392 |
0.151385613 |
0.030660377 |
0.05542453 |
0.12043042 |
0.049233491 |
0.005896226 |
0.05129717 |
0.173054245 |
0.236831761 |
0.28901336 |
0.196246069 |
-0.029874214 |
0.01650943 |
0.13826651 |
0.004913522 |
-0.076257862 |
0.03272406 |
0.200913915 |
0.289013365 |
0.36109473 |
0.232950079 |
-0.079402516 |
-0.01533019 |
0.15285967 |
-0.031348270 |
-0.143474843 |
0.06574292 |
0.151385613 |
0.196246069 |
0.23295008 |
0.167698506 |
0.008647799 |
0.04127358 |
0.12691627 |
0.033117138 |
-0.023977987 |
0.14622642 |
0.030660377 |
-0.029874214 |
-0.07940252 |
0.008647799 |
0.223270440 |
0.17924528 |
0.06367925 |
0.190251572 |
0.267295597 |
0.12971698 |
0.055424528 |
0.016509434 |
-0.01533019 |
0.041273585 |
0.179245283 |
0.15094340 |
0.07665094 |
0.158018868 |
0.207547170 |
0.08637972 |
0.120430425 |
0.138266509 |
0.15285967 |
0.126916274 |
0.063679245 |
0.07665094 |
0.11070165 |
0.073408019 |
0.050707547 |
0.13384434 |
0.049233491 |
0.004913522 |
-0.03134827 |
0.033117138 |
0.190251572 |
0.15801887 |
0.07340802 |
0.166077044 |
0.222484277 |
0.16273585 |
0.005896226 |
-0.076257862 |
-0.14347484 |
-0.023977987 |
0.267295597 |
0.20754717 |
0.05070755 |
0.222484277 |
0.327044025 |
gammahat <- zz_i %*% t(z) %*% y
gammahat
A matrix: 2 × 1 of type dbl
1.66031840 |
0.01645047 |
e <- y - z %*% gammahat
e
A matrix: 10 × 1 of type dbl
-0.521816038 |
0.232724057 |
0.051768868 |
0.403714623 |
-0.633077830 |
0.008490566 |
0.276886792 |
0.031426887 |
-0.190212264 |
0.340094340 |
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
1.122476 |
f0 = ((sse_rm - sse)/(2))/(sse/6)
f0
A matrix: 1 × 1 of type dbl
15.96781 |
검정통계량 : \(\frac{(SSE_{RM} - SSE_{FM})/r}{SSE_{FM}/(n-p-1)}\)
관측값 \(f_0 = 15.968\)
기각역 : \(f_0 > 5.1433\)
결론 : 기각할 수 있다.
(13)
가설 \(H_0 : \beta_1 = \beta_2 + 3\)을 \(\alpha = 0.05\)로 검정하시오.
\(H_0 : \beta_1 = \beta_2 = \beta_3\)
\(T= (0,1,-1,0)\)
\(y = \beta_0 + (\beta_2+3)x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon\) \(y - 3x_1 = \beta_0 + \beta_2(x_1 + x_2) + \beta_3 x_3 + \epsilon\)
- -27.2
- -68.1
- -71.1
- -79.6
- -41.9
- -50.9
- -62.5
- -62.4
- -33
- -41.7
z <- matrix(c(rep(1,10), dt$x1+dt$x2,dt$x3), ncol=3)
z
A matrix: 10 × 3 of type dbl
1 |
37 |
64 |
1 |
50 |
72 |
1 |
53 |
80 |
1 |
54 |
88 |
1 |
45 |
81 |
1 |
42 |
45 |
1 |
49 |
46 |
1 |
47 |
69 |
1 |
39 |
54 |
1 |
40 |
39 |
A matrix: 3 × 3 of type dbl
10 |
456 |
638 |
456 |
21114 |
29658 |
638 |
29658 |
43324 |
A matrix: 3 × 3 of type dbl
6.76054651 |
-0.160413550 |
0.0102556645 |
-0.16041355 |
0.005038964 |
-0.0010871974 |
0.01025566 |
-0.001087197 |
0.0006163093 |
gammahat <- zz_i %*% t(z) %*% yy
gammahat
A matrix: 3 × 1 of type dbl
77.7140851 |
-3.1683286 |
0.2025345 |
e <- yy - z %*% gammahat
e
A matrix: 10 × 1 of type dbl
-0.6481331 |
-1.9801368 |
2.9045732 |
-4.0473741 |
6.5554097 |
-4.6583347 |
5.7174312 |
-5.1775193 |
1.9138690 |
-0.5797851 |
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
157.327 |
f0 = ((sse_rm - sse)/(1))/(sse/6)
f0
A matrix: 1 × 1 of type dbl
5311.082 |
검정통계량 : \(\frac{(SSE_RM - SSE_FM)/r)}{(SSE_FM/(n-p-1))}\)
관측값 \(f_0 = 5311.082\)
기각역 : \(f_0 > 5.9874\)
결론 : 기각할 수 있다.
(14)
\(x_1 = 20, x_2 = 27, x_3 = 60\)에서 \(E(y)\)에 관한 가설 \(H_0 : E(y) = 3.5, H_1 : E(y) ̸\neq 3.5\)를 \(\alpha = 0.05\)로 검정하시오.
sy 풀이
\(T_0 = \frac{\hat{y} - E(y)}{\sqrt{Var{\hat{y}}}} = \frac{\hat{y} - E(Y)}{\sqrt{x^\top (X^\top X)^{-1} x MSE}} \sim t(n-1)\)
x_indi <- matrix(c(1,20,27,60),nrow=1,ncol=4)
yhat <- x_indi %*% b
round(yhat,5)
A matrix: 1 × 1 of type dbl
3.48814 |
\(Var{y} = x_i (X^\top X)^{-1} x_i^\top \sigma^2\)
var_y <- ((x_indi %*%solve(t(x) %*% x) %*% t(x_indi) ) * mse)[,]
round(var_y,5)
0.00507
t_0 <- (yhat - 3.5)/sqrt(var_y)
round(t_0[,],5)
-0.16663
\(F_0=-0.16663\) > \(t_{0.025}(9)=-0.5097\) 로 귀무가설을 기각하지 못하여 \(E(y) = 3.5\)라는 결론이 나온다.
4.
중회귀모형, \(y = X\beta + \epsilon, \epsilon \sim N(0_n, I\sigma^2)\)에서 \(X\) 가 \(n \times (p + 1)\)행렬이고 rank가 \(p + 1\)이라면 적합된 모형 \(\hat{y} = X\hat{\beta}\)에 대하여 \[\sum^{n}_{j=1} Var(\hat{y}_j ) = (p + 1)\sigma^2\] 이 됨을 증명하여라.
\(\hat{\bf{y}}= X\hat{\beta} = X(X^\top X)^{-1} X^\top y = Hy\)
\(Var(X\hat{\beta}) = XVar(\hat{\beta})X^\top\)
\(Var(\hat{\beta}) = Var((X^\top X)^{-1} X^\top y) = (X^\top X)^{-1} X^\top Var(y)X(X^\top X)^{-1} = (X^\top X)^{-1} X^\top X(X^\top X)^{-1} \sigma^2 = (X^\top X )^{-1} \sigma^2\)
\(Var(\hat{y}) = Var(X\hat{\beta}) = X(X^\top X)^{-1} X^\top \sigma^2\)
\(\to Var(\hat{y}) = HVar(y) H^\top = HH\sigma^2 = H\sigma^2\)
\(\star H^\top = H, H^2 = H, Var(y) = I_n \sigma^2\)
\(\sum^{n}_{j=1} Var(\hat{y}_j) = tr(Var(\hat{y})) = tr(H\sigma^2) = \sigma^2 tr(H) = tr((X^\top X)^{-1} X^\top X)\sigma^2 = tr(I_{p+1})\sigma^2 = (p+1)\sigma^2\)
\(\star tr(X (X^\top X)^{-1} X^\top)\)
5.
중회귀모형에서 다음을 증명하시오.
(1)
\(Cov(e, y) = \sigma^2[I_n − X(X^\top X)^{-1}X^\top]\)
\(\bf{e} = y - \hat{y} = y - Hy = (I-H)y\)
\(Cov(\bf{e},y) = Cov((I-H)y,y) = (I-H)Var(y) = (I-H) \sigma^2\)
\(\star Cov(Ax,y) = A Cov(X,Y)\)
\(\star Var(y) = \sigma^2\)
(2)
\(Cov(e, \hat{y}) = \mathbb{O}_n\)
\(Cov(e(I - H)\bf{y},Hy) = (I-H) Cov(y,y)H^\top = (I-H)H\sigma^2 = \mathbb{O}_n \sigma^2\)
\(\star Cov(Ax,By) = ACov(X,Y)B^\top\)
\(\star H^\top= H\)
\(\star H^2 = H\)
(3)
\(Cov(e, \hat{\beta}) = \mathbb{O}_{n\times(k+1)}\)
\(Cov((I-H)\bf{y},(X^\top X)^{-1} y) = (I-H) Cov(y,y) X (X^\top X)^{-1} = (I - X(X^\top X)^{-1} X^\top ) X (X^\top X)^{-1} \sigma^2 = \{ X(X^\top X)^{-1} - X(X^\top X)^{-1} \} \sigma^2 = \mathbb{O}_{n \times ([+1)}\)
(4)
\(Cov(\epsilon, \hat{\beta}) = \sigma^2 X(X^\top X)^{-1}\)
\(\bf{\hat{\beta}} = (X^\top X)^{-1} X^\top y = (X^\top X)^{-1} X^\top (X\beta + \epsilon) = \beta + (X^\top X)^{-1} X^\top \epsilon\)
\(Cov(\bf{\epsilon} , \beta + (X^\top X)^{-1} X^\top \epsilon ) = Cov(\epsilon, \beta) + Cov(\epsilon, \epsilon)X(X^\top X)^{-1} = \sigma^2 X(X^\top X)^{-1}\)
\(\star Cov(\epsilon, \beta) = 0\)
\(\star Cov(\epsilon, \epsilon) = I_n \sigma^2\)
(5)
\(\sum^{n}_{j=1} e_j y_j = SSE\)
\(\sum^{n}_{j=1} e_j y_j = \bf{e^\top y} = \{ (I - H)y\}^\top y = y^\top (I-H)y\)
\(\star \bf{e}^\top (e_1, \dots , e_n) = (y-\hat{y})^\top\)
\(\star \bf{y}^\top = (y_1 , \dots , y_n)\)
\(SSE = \sum^n_{j=1} (y_i - \hat{y}_j)^2 = (\bf{y} - \hat{y} ) ^\top ( y - \hat{y} ) = e^\top e = \{ (I - H)y \}^\top \{ (I - H)y \} = y^\top (I - H) (I-H)y = y^\top (I - H)y\)
\(\star I - H_H+H^2 = I-H, H^2 = H\)
\(\therefore \sum^{n}_{j=1} e_j y_j = SSE\)
(6)
\(\sum^{n}_{j=1} e_j \hat{y}_j = 0\)
\(\sum^{n}_{j=1} \bf{e_j \hat{y}_j} = e^\top \hat{y} = y^\top (I-H) Hy = y^\top_{1\times n} \mathbb{O}_{n\times n} y_{n \times 1} = 0\)
\(\star H - H^2 = H - H = \mathbb{O}_{n \times n}\)
1 plus.
다음의 데이터에 대하여 모형 \(y_i = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \epsilon_j, \epsilon \sim N(0,\sigma^2)\)이 옳다고 가정하고 질문에 답하여라.
x_1 |
1 |
2 |
1 |
3 |
3 |
3 |
x_2 |
1 |
1 |
2 |
1 |
2 |
3 |
x <- matrix(c(1,1,1,1,1,1,1,2,1,3,3,3,1,1,2,1,2,3), ncol=3)
dt <- data.frame(y=y,x1=x[,2],x2 = x[,3])
Call:
lm(formula = y ~ ., data = dt)
Residuals:
1 2 3 4 5 6
-1.1395 1.3488 0.4651 -1.1628 1.4419 -0.9535
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2326 1.9740 1.638 0.2000
x1 1.5116 0.7713 1.960 0.1449
x2 -2.6047 0.9288 -2.804 0.0676 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.599 on 3 degrees of freedom
Multiple R-squared: 0.7511, Adjusted R-squared: 0.5852
F-statistic: 4.527 on 2 and 3 DF, p-value: 0.1242
A anova: 3 × 5
|
<int> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
x1 |
1 |
3.040230 |
3.04023 |
1.188454 |
0.35537433 |
x2 |
1 |
20.118685 |
20.11868 |
7.864577 |
0.06760655 |
Residuals |
3 |
7.674419 |
2.55814 |
NA |
NA |
(1)
\(X^\top X, X^\top y y^\top y\)와 \((X^\top X)^{-1}\) 을 구하시오.
A matrix: 3 × 3 of type dbl
6 |
13 |
10 |
13 |
33 |
23 |
10 |
23 |
20 |
A matrix: 3 × 3 of type dbl
1.5232558 |
-0.34883721 |
-0.36046512 |
-0.3488372 |
0.23255814 |
-0.09302326 |
-0.3604651 |
-0.09302326 |
0.33720930 |
xx_i2 <- round(xx_i,2)
xx_i2
A matrix: 3 × 3 of type dbl
1.52 |
-0.35 |
-0.36 |
-0.35 |
0.23 |
-0.09 |
-0.36 |
-0.09 |
0.34 |
A matrix: 3 × 1 of type dbl
13 |
32 |
15 |
A matrix: 1 × 1 of type dbl
59 |
(2)
\(\beta_0, \beta_1, \beta_2, \sigma^2\)를 추정하시오.
- (Intercept)
- 3.23255813953489
- x1
- 1.51162790697674
- x2
- -2.6046511627907
A matrix: 3 × 1 of type dbl
3.232558 |
1.511628 |
-2.604651 |
b1 <- round(round(xx_i,2)%*% xy,2)
b1
A matrix: 3 × 1 of type dbl
3.16 |
1.46 |
-2.46 |
A matrix: 6 × 1 of type dbl
2.13953488 |
3.65116279 |
-0.46511628 |
5.16279070 |
2.55813953 |
-0.04651163 |
haty_2 <- round(x %*% b1,2)
haty_2
A matrix: 6 × 1 of type dbl
2.16 |
3.62 |
-0.30 |
5.08 |
2.62 |
0.16 |
A matrix: 6 × 1 of type dbl
-1.1395349 |
1.3488372 |
0.4651163 |
-1.1627907 |
1.4418605 |
-0.9534884 |
A matrix: 6 × 1 of type dbl
-1.16 |
1.38 |
0.30 |
-1.08 |
1.38 |
-1.16 |
A matrix: 1 × 1 of type dbl
7.674419 |
sse_2 <- t(e_2) %*% e_2
sse_2
A matrix: 1 × 1 of type dbl
7.7564 |
A matrix: 1 × 1 of type dbl
2.55814 |
A matrix: 1 × 1 of type dbl
2.585467 |
(3)
\(x_1 = x_2 = 2\)에서 \(E(y)\)의 95% 신뢰구간을 구하시오.
new_dt <- data.frame(x1=2, x2=2)
new_dt
A data.frame: 1 × 2
<dbl> |
<dbl> |
2 |
2 |
predict(m, newdata = new_dt,interval = c("confidence"), level = 0.95)
A matrix: 1 × 3 of type dbl
1 |
1.046512 |
-1.345982 |
3.439005 |
A matrix: 1 × 1 of type dbl
0.2209302 |
var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0
A matrix: 1 × 1 of type dbl
0.5651704 |
tse <- qt(0.975,3) * sqrt(var_mu0)
tse
A matrix: 1 × 1 of type dbl
2.392494 |
A matrix: 1 × 1 of type dbl
-1.345982 |
A matrix: 1 × 1 of type dbl
3.439005 |
(4)
\(\beta_1\)의 90% 신뢰구간을 구하시오.
Call:
lm(formula = y ~ ., data = dt)
Residuals:
1 2 3 4 5 6
-1.1395 1.3488 0.4651 -1.1628 1.4419 -0.9535
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2326 1.9740 1.638 0.2000
x1 1.5116 0.7713 1.960 0.1449
x2 -2.6047 0.9288 -2.804 0.0676 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.599 on 3 degrees of freedom
Multiple R-squared: 0.7511, Adjusted R-squared: 0.5852
F-statistic: 4.527 on 2 and 3 DF, p-value: 0.1242
A matrix: 3 × 2 of type dbl
(Intercept) |
-1.4129961 |
7.8781124 |
x1 |
-0.3035404 |
3.3267962 |
x2 |
-4.7904032 |
-0.4188991 |
se_b1 <- sqrt(xx_i[2,2] * mse)
se_b1
A matrix: 1 × 1 of type dbl
0.7713081 |
tse <- qt(0.95,3) *se_b1
tse
A matrix: 1 × 1 of type dbl
1.815168 |
A matrix: 1 × 1 of type dbl
-0.3035404 |
A matrix: 1 × 1 of type dbl
3.326796 |
(5)
가설 \(H_0: \beta_1 = \beta_2, H_1: \beta_1 \neq \beta_2\)을 유의수준 \(\alpha = 0.05\)
linearHypothesis(m, c(0,1,-1), 0)
A anova: 2 × 6
|
<dbl> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
1 |
4 |
30.092308 |
NA |
NA |
NA |
NA |
2 |
3 |
7.674419 |
1 |
22.41789 |
8.763357 |
0.05952979 |
z <- matrix(c(1,1,1,1,1,1,x[,2]+x[,3]), ncol=2)
A matrix: 2 × 2 of type dbl
6 |
23 |
23 |
99 |
A matrix: 2 × 2 of type dbl
1.5230769 |
-0.35384615 |
-0.3538462 |
0.09230769 |
A matrix: 2 × 1 of type dbl
13 |
47 |
A matrix: 1 × 1 of type dbl
59 |
A matrix: 6 × 6 of type dbl
0.47692308 |
3.076923e-01 |
3.076923e-01 |
0.1384615 |
-0.03076923 |
-2.000000e-01 |
0.30769231 |
2.307692e-01 |
2.307692e-01 |
0.1538462 |
0.07692308 |
1.110223e-16 |
0.30769231 |
2.307692e-01 |
2.307692e-01 |
0.1538462 |
0.07692308 |
1.110223e-16 |
0.13846154 |
1.538462e-01 |
1.538462e-01 |
0.1692308 |
0.18461538 |
2.000000e-01 |
-0.03076923 |
7.692308e-02 |
7.692308e-02 |
0.1846154 |
0.29230769 |
4.000000e-01 |
-0.20000000 |
1.110223e-16 |
1.110223e-16 |
0.2000000 |
0.40000000 |
6.000000e-01 |
gammahat <- zz_i %*% t(z) %*% y
gammahat
A matrix: 2 × 1 of type dbl
3.1692308 |
-0.2615385 |
e <- y - z %*% gammahat
e
A matrix: 6 × 1 of type dbl
-1.646154 |
2.615385 |
-2.384615 |
1.876923 |
2.138462 |
-2.600000 |
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
30.09231 |
f0 = (sse_rm - sse)/(sse/3)
f0
A matrix: 1 × 1 of type dbl
8.763357 |
T검정 방법
var_qb = t(q) %*% xx_i %*% q* mse
var_qb
A matrix: 1 × 1 of type dbl
1.933478 |
se_qb = sqrt(var_qb)
se_qb
A matrix: 1 × 1 of type dbl
1.390495 |
t0 = (t(q) %*% b )/se_qb
t0
A matrix: 1 × 1 of type dbl
2.960297 |
A matrix: 1 × 1 of type dbl
8.763357 |
(6)
가설 \(H_0 : \beta_1 = 0, H_1 : \beta_1 \neq 0\)을 유의수준 \(\alpha = 0.1\)로 검정하시오. 위 (3)의 질문의 결과와 어떠한 연관을 지을 수 있는가?
(7)
\(x_1 = 2, x_2 = 3\)에서 가설 \(H_0 : E(y) = 4,H_1 : E(y) \neq 4\)를 \(\alpha = 0.05\)로 검정하시오
new_dt <- data.frame(x1=2, x2=2)
predict(m, newdata = new_dt,interval = c("confidence"), level = 0.95)
A matrix: 1 × 3 of type dbl
1 |
1.046512 |
-1.345982 |
3.439005 |
A matrix: 1 × 1 of type dbl
0.8139535 |
var_mu0 <- (t(x0) %*% xx_i %*% x0) * mse
var_mu0
A matrix: 1 × 1 of type dbl
2.082207 |
t0 <- (mu0-4)/ sqrt(var_mu0)
t0
A matrix: 1 × 1 of type dbl
-3.851834 |
(8)
가설 \(H_0 : \beta_2 = 0, H_1: \beta_2 <0\)을 \(\alpha = 0.05\) 로 검정하시오.
se_b2 <- sqrt(xx_i[3,3] * mse)
se_b2
A matrix: 1 × 1 of type dbl
0.9287779 |
A matrix: 1 × 1 of type dbl
-2.804385 |
(9)
가설 $H_0: _1 = 2_2, H_1 : _1 _2 $를 \(\alpha=0..05\)로 검정하시오.
linearHypothesis(m, c(0,1,-2), 0)
A anova: 2 × 6
|
<dbl> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
<dbl> |
1 |
4 |
30.797619 |
NA |
NA |
NA |
NA |
2 |
3 |
7.674419 |
1 |
23.1232 |
9.039069 |
0.05737101 |
z <- matrix(c(1,1,1,1,1,1,2*x[,2]+x[,3]), ncol=2)
A matrix: 2 × 2 of type dbl
6 |
36 |
36 |
244 |
A matrix: 2 × 2 of type dbl
1.4523810 |
-0.21428571 |
-0.2142857 |
0.03571429 |
A matrix: 2 × 1 of type dbl
13 |
79 |
A matrix: 1 × 1 of type dbl
59 |
A matrix: 6 × 6 of type dbl
0.48809524 |
0.27380952 |
0.38095238 |
0.05952381 |
-0.04761905 |
-0.15476190 |
0.27380952 |
0.20238095 |
0.23809524 |
0.13095238 |
0.09523810 |
0.05952381 |
0.38095238 |
0.23809524 |
0.30952381 |
0.09523810 |
0.02380952 |
-0.04761905 |
0.05952381 |
0.13095238 |
0.09523810 |
0.20238095 |
0.23809524 |
0.27380952 |
-0.04761905 |
0.09523810 |
0.02380952 |
0.23809524 |
0.30952381 |
0.38095238 |
-0.15476190 |
0.05952381 |
-0.04761905 |
0.27380952 |
0.38095238 |
0.48809524 |
gammahat <- zz_i %*% t(z) %*% y
gammahat
A matrix: 2 × 1 of type dbl
1.95238095 |
0.03571429 |
e <- y - z %*% gammahat
e
A matrix: 6 × 1 of type dbl
-1.059524 |
2.869048 |
-2.095238 |
1.797619 |
1.761905 |
-3.273810 |
sse_rm <- t(e) %*% e
sse_rm
A matrix: 1 × 1 of type dbl
30.79762 |
f0 = (sse_rm - sse)/(sse/3)
f0
A matrix: 1 × 1 of type dbl
9.039069 |