비모수적 방법

  • 모형에 대한 사전 설정/가정을 최소화
  • 데이터 자체의 특성만을 이용하여 연관성을 추정하고자 함

“Let the data speak for themselves”

  • 커널 추정법은 비모수적 방법 중 하나로 g에 대해서는 적당히 부드러운(smooth) 성질을 가진다는 가정만 상정
  • 자료 특성의 국소적인 변화를 모수적 방법에 비해 민감하게 추정할 수 있음
library(np)
data("cps71")
attach(cps71)
Nonparametric Kernel Methods for Mixed Datatypes (version 0.60-11)
[vignette("np_faq",package="np") provides answers to frequently asked questions]
[vignette("np",package="np") an overview]
[vignette("entropy_np",package="np") an overview of entropy-based methods]

예시: 임금데이터

X:연령

Y:임금

35세의 평균임금

plot(age,logwage)
abline(v=mean(age),col='red')

특성을 만족하는 관측치만 활용

  • 연령별 임금 추정치
  • 연령에 따른 변화가 지나치게 심하게 나타남
plot(age,logwage)
lines(smooth.spline(age,logwage,spar = 0.1),col='red')

국소 평균

  • 연령별 평균에 의한 추정은 다음과 같은 문제를 가짐
    • 추정된 g가 연속이라는 보장이 없음
    • 특정 연령에서의 관측치가 없거나 부족한 경우 추정이 불안정
  • 연령이 비슷하면 평균임금도 비슷하지 않을까?
    • 특정 연령의 평균연령 추정을 위해 이웃한 관측치를 활용
  • 국소화 (localizing)
reg.npbw <- npregbw(logwage ~ age, regtype = "lc")
reg.np <- npreg(reg.npbw)
                   
reg.npbw
Regression Data (205 observations, 1 variable(s)):

                   age
Bandwidth(s): 1.892157

Regression Type: Local-Constant
Bandwidth Selection Method: Least Squares Cross-Validation
Formula: logwage ~ age
Bandwidth Type: Fixed
Objective Function Value: 0.316055 (achieved on multistart 1)

Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1
reg.np
Regression Data: 205 training points, in 1 variable(s)
                   age
Bandwidth(s): 1.892157

Kernel Regression Estimator: Local-Constant
Bandwidth Type: Fixed

Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1

bws를 2,3,10,0.3.. 바꿔가면서 시도

bws 빼면 bws 자동으로 계산한다.

nw <- npreg(logwage ~ age, regtype = "lc",bws=2)
nw
Regression Data: 205 training points, in 1 variable(s)
              age
Bandwidth(s):   2

Kernel Regression Estimator: Local-Constant
Bandwidth Type: Fixed

Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1

이렇게도 쓸 수 있다. bws = reg.npbw

  • lc: local constant
  • ll: local linear

kernal option: ckertype=?

ll <- npreg(logwage ~ age,regtype="ll",bws=2)
ll
Regression Data: 205 training points, in 1 variable(s)
              age
Bandwidth(s):   2

Kernel Regression Estimator: Local-Linear
Bandwidth Type: Fixed

Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1

국소상수 vs 국소선형

plot(age,logwage,main="Nw vs LL")
lines(age,nw$mean,col="red",lwd=2)
lines(age,ll$mean,col="blue",lwd=2)
legend(20,15,c("NW","LL"),col=c(2,12),lty=c(1,1))
x = seq(0,1,0.01)
y=rnorm(101)
plot(x,y)
nw = npreg(y~x, regtype = "lc",bws=0.5)
nw
Regression Data: 101 training points, in 1 variable(s)
                x
Bandwidth(s): 0.5

Kernel Regression Estimator: Local-Constant
Bandwidth Type: Fixed

Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1
ll <- npreg(y~x,regtype = "ll",bws=0.5)
ll
Regression Data: 101 training points, in 1 variable(s)
                x
Bandwidth(s): 0.5

Kernel Regression Estimator: Local-Linear
Bandwidth Type: Fixed

Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1
plot(x,y,main="NW vs LL")
lines(x,nw$mean,col="red",lwd=2)
lines(x,ll$mean,col="blue",lwd=2)
legend(0,2.5,c("NW","LL"),col=c(2,12),lty=c(1,1))