Hazard ratio, Odds ratio
로지스틱 회귀분석
\[log(\frac{p}{1-p}) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k\]
# 예제 데이터 생성
set.seed(123)
n <- 200
exam1 <- round(rnorm(n, mean = 70, sd = 10))
exam2 <- round(rnorm(n, mean = 75, sd = 8))
exam3 <- round(rnorm(n, mean = 80, sd = 6))
pass <- rbinom(n, 1, plogis(0.3 + 0.1 * exam1 + 0.2 * exam2 + 0.3 * exam3))
data <- data.frame(exam1, exam2, exam3, pass)
# 로지스틱 회귀분석 모델 학습
model <- glm(pass ~ exam1 + exam2 + exam3, data = data, family = binomial)
Warning message:
“glm.fit: algorithm did not converge”
family = binomial에서 이항분포의 개념
# 모델 요약 정보 출력
summary(model)
Call:
glm(formula = pass ~ exam1 + exam2 + exam3, family = binomial,
data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
2.409e-06 2.409e-06 2.409e-06 2.409e-06 2.409e-06
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.657e+01 4.810e+05 0 1
exam1 -3.384e-11 2.676e+03 0 1
exam2 2.952e-10 3.178e+03 0 1
exam3 2.546e-10 4.359e+03 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.0000e+00 on 199 degrees of freedom
Residual deviance: 1.1603e-09 on 196 degrees of freedom
AIC: 8
Number of Fisher Scoring iterations: 25
# 각 독립변수의 회귀계수와 p-value 출력
coef(summary(model))
A matrix: 4 × 4 of type dbl
(Intercept) |
2.656607e+01 |
480958.636 |
5.523566e-05 |
0.9999559 |
exam1 |
-3.384147e-11 |
2676.045 |
-1.264607e-14 |
1.0000000 |
exam2 |
2.951540e-10 |
3178.418 |
9.286192e-14 |
1.0000000 |
exam3 |
2.546355e-10 |
4359.366 |
5.841114e-14 |
1.0000000 |
# 새로운 데이터에 대한 예측값 계산
new_data <- data.frame(exam1 = c(65, 80), exam2 = c(70, 85), exam3 = c(75, 90))
pred <- predict(model, newdata = new_data, type = "response")
로지스틱 함수와 오즈비
\[P(Y=1|X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}\]
\(P(Y=1|X) = P\)
라고 할때,
\(ln(\frac{p}{1-p}) = \beta_0 + \beta_1 X\)
odds = \(\frac{p}{1-p}\)
# exam1 변수의 계수 출력
coef(model)["exam1"]
exam1: -3.38414654242419e-11
# 오즈비 계산
odds_ratio <- exp(coef(model)["exam1"])
odds_ratio
exam1: 0.999999999966159
위험비
\[HR = \frac{\lambda_1(t)}{\lambda_0(t)} = \exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)\]
\(\lambda_1(t)\): 치료군의 위험도
\(\lambda_0(t)\)는 대조군의 위험도
위험비는 치료군 대 대조군의 위험도 비율
1보다 크면 치료군의 위험이 높겠고,
1보다 작으면 대조군의 위험이 높겠고.
# 데이터 로딩
data(lung)
# Cox 모형 적합
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss, data = lung)
Warning message in data(lung):
“data set ‘lung’ not found”
# Hazard Ratio 계산
exp(coef(fit))
- age
- 1.0107060959455
- sex
- 0.576458374699102
- ph.ecog
- 2.08376570435143
- ph.karno
- 1.02270907640951
- pat.karno
- 0.987660215984953
- meal.cal
- 1.00003329080755
- wt.loss
- 0.985771582300398