Big Data Analysis (Week 4), September 30
Training a Regression Model with PyTorch (2)
- Data
- Summary of steps 1–2
- Method 1: declare the model directly + declare the loss function directly
- Method 2: declare the model with torch.nn (bias=False) + declare the loss directly
- Method 3: declare the model with torch.nn (bias=True) + declare the loss directly
- Method 4: declare the model directly + use torch.nn.MSELoss() for the loss
- Method 5: declare the model with torch.nn (bias=False) + use torch.nn.MSELoss() for the loss
- Method 6: declare the model with torch.nn (bias=True) + use torch.nn.MSELoss() for the loss
- step3: derivation
- step4: update
- Repeat steps 1–4.
- Homework
- definition of SGD
import torch
import numpy as np
model: $y_i= w_0+w_1 x_i +\epsilon_i = 2.5 + 4x_i +\epsilon_i, \quad i=1,2,\dots,n$
model: ${\bf y}={\bf X}{\bf W} +\boldsymbol{\epsilon}$
- ${\bf y}=\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n\end{bmatrix}, \quad {\bf X}=\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n\end{bmatrix}, \quad {\bf W}=\begin{bmatrix} 2.5 \\ 4 \end{bmatrix}, \quad \boldsymbol{\epsilon}= \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n\end{bmatrix}$
torch.manual_seed(202150754)
n=100
ones= torch.ones(n)
x,_ = torch.randn(n).sort() ## sorted covariate values
X = torch.vstack([ones,x]).T ## design matrix, shape (100,2)
W = torch.tensor([2.5,4]) ## true parameters (w0,w1)
ϵ = torch.randn(n)*0.5 ## noise
y = X@W + ϵ ## observed responses
ytrue = X@W ## noiseless responses
What1=torch.tensor([-5.0,10.0],requires_grad=True) ## initial guess for (w0,w1)
yhat1=X@What1 ## step1: yhat
loss1=torch.mean((y-yhat1)**2) ## step2: MSE loss written by hand
loss1
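In matrix form, the quantity computed above is the mean squared error:

$\text{loss}({\bf W}) = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \frac{1}{n}({\bf y}-{\bf X}{\bf W})^\top({\bf y}-{\bf X}{\bf W})$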
net2=torch.nn.Linear(in_features=2,out_features=1,bias=False)
net2.weight.data= torch.tensor([[-5.0,10.0]]) ## same initial guess, stored with shape (1,2)
yhat2=net2(X) ## step1: yhat
loss2=torch.mean((y.reshape(100,1)-yhat2)**2) ## reshape y to (100,1) to match yhat2
loss2
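With bias=False, torch.nn.Linear computes exactly the matrix product of Method 1, $\hat{\bf y}={\bf X}{\bf W}^\top$. A minimal check (sketch):

torch.allclose(net2(X), X @ net2.weight.T) ## should be True: same product as Method 1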
net3=torch.nn.Linear(in_features=1,out_features=1,bias=True)
net3.weight.data= torch.tensor([[10.0]]) ## slope w1
net3.bias.data= torch.tensor([-5.0]) ## intercept w0; the bias parameter has shape (1,), not (1,1)
yhat3=net3(x.reshape(100,1)) ## step1: yhat = x*w1 + w0
loss3=torch.mean((y.reshape(100,1)-yhat3)**2)
loss3
What4=torch.tensor([-5.0,10.0],requires_grad=True)
yhat4=X@What4
lossfn=torch.nn.MSELoss() ## built-in MSE loss
loss4=lossfn(y,yhat4)
loss4
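Note that torch.nn.MSELoss takes (input, target), i.e. (yhat, y); the reversed order above still gives the same value because squared error is symmetric. A quick check (sketch):

torch.allclose(lossfn(y,yhat4), lossfn(yhat4,y)) ## True: MSE is symmetric in its arguments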
net5=torch.nn.Linear(in_features=2,out_features=1,bias=False)
net5.weight.data= torch.tensor([[-5.0,10.0]])
yhat5=net5(X)
#lossfn=torch.nn.MSELoss() ## already defined in Method 4
loss5=lossfn(y.reshape(100,1),yhat5)
loss5
net6=torch.nn.Linear(in_features=1,out_features=1,bias=True)
net6.weight.data= torch.tensor([[10.0]])
net6.bias.data= torch.tensor([-5.0]) ## bias has shape (1,)
yhat6=net6(x.reshape(100,1))
loss6=lossfn(y.reshape(100,1),yhat6)
loss6
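All six formulations evaluate the same model at the same initial parameters $(w_0,w_1)=(-5,10)$, so the six losses should coincide. A quick check (sketch):

loss1.item(), loss2.item(), loss3.item(), loss4.item(), loss5.item(), loss6.item() ## all six should print the same value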
loss1.backward()
What1.grad.data
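Autograd's result can be verified against the closed-form gradient of the MSE loss, $\frac{\partial}{\partial {\bf W}}\text{loss} = -\frac{2}{n}{\bf X}^\top({\bf y}-{\bf X}{\bf W})$. A minimal check (sketch):

grad_manual = -2/n * X.T @ (y - X @ What1.data) ## closed-form gradient at the current What1
torch.allclose(What1.grad, grad_manual) ## should be True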
loss2.backward()
net2.weight.grad
loss3.backward()
net3.bias.grad,net3.weight.grad
loss4.backward()
What4.grad.data
loss5.backward()
net5.weight.grad
loss6.backward()
net6.bias.grad,net6.weight.grad
What1.data ## before the update
lr=0.1 # learning rate
What1.data = What1.data - lr*What1.grad.data ## after the update
What1
net2.weight.data
- SGD: Implements stochastic gradient descent (optionally with momentum).
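Without momentum, each call to step() performs the plain gradient-descent update ${\bf W} \leftarrow {\bf W} - \text{lr}\cdot\frac{\partial \text{loss}}{\partial {\bf W}}$, the same rule applied by hand to What1 above.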
optmz2 = torch.optim.SGD(net2.parameters(),lr=0.1)
list(net2.parameters())
optmz2.step() ## update
net2.weight.data ## after the update
net3.bias.data,net3.weight.data
optmz3 = torch.optim.SGD(net3.parameters(),lr=0.1)
optmz3.step()
net3.bias.data,net3.weight.data
list(net3.parameters())
What4.data ## before the update
lr=0.1
What4.data = What4.data - lr*What4.grad.data ## after the update
What4
net5.weight.data
optmz5 = torch.optim.SGD(net5.parameters(),lr=0.1)
optmz5.step() ## update
net5.weight.data ## after the update
net6.bias.data,net6.weight.data
optmz6 = torch.optim.SGD(net6.parameters(),lr=0.1)
optmz6.step()
net6.bias.data,net6.weight.data
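Since all six methods started from the same parameters and produced the same gradient, one update step (manual or via SGD) should leave them holding identical values. A quick comparison (sketch):

What1.data, net2.weight.data.flatten(), What4.data, net5.weight.data.flatten() ## should all agree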
net=torch.nn.Linear(in_features=2,out_features=1,bias=False) ## define the model
optmz=torch.optim.SGD(net.parameters(),lr=0.1)
mseloss=torch.nn.MSELoss()
for epoc in range(100):
    # step1: yhat
    yhat=net(X) ## compute yhat
    # step2: loss
    loss=mseloss(y.reshape(100,1),yhat)
    # step3: derivation
    loss.backward()
    # step4: update
    optmz.step()
    optmz.zero_grad() ## clear gradients so they do not accumulate
list(net.parameters())
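For reference, gradient descent should converge toward the closed-form least-squares solution $\hat{\bf W}=({\bf X}^\top{\bf X})^{-1}{\bf X}^\top{\bf y}$, which is itself close to the true ${\bf W}=(2.5,4)$ up to the simulation noise. A comparison (sketch):

W_ols = torch.inverse(X.T @ X) @ X.T @ y ## closed-form least-squares estimate
W_ols, net.weight.data ## the trained weights should be near W_ols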
Homework: run the code below and observe the result.
net=torch.nn.Linear(in_features=2,out_features=1,bias=False) ## define the model
optmz=torch.optim.SGD(net.parameters(),lr=0.1)
mseloss=torch.nn.MSELoss()
for epoc in range(100):
    # step1: yhat
    yhat=net(X) ## compute yhat
    # step2: loss
    loss=mseloss(y.reshape(100,1),yhat)
    # step3: derivation
    loss.backward()
    # step4: update
    optmz.step()
list(net.parameters())
From the PyTorch docs:
- CLASS torch.optim.SGD(params, lr=<required>, momentum=0, dampening=0, weight_decay=0, nesterov=False)
- step(closure=None): Performs a single optimization step. Parameters: closure (callable, optional) – A closure that reevaluates the model and returns the loss.
- zero_grad(set_to_none=False): Sets the gradients of all optimized torch.Tensors to zero. Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example: 1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently. 2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient. 3. torch.optim optimizers behave differently if the gradient is 0 or None (in one case the step is taken with a gradient of 0, in the other the step is skipped altogether).
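The accumulation behavior behind the homework above can be seen on a toy example: backward() adds to .grad rather than overwriting it, so a loop without zero_grad() steps with the sum of all past gradients. A minimal sketch:

w = torch.tensor([1.0], requires_grad=True)
(w**2).sum().backward()
w.grad ## tensor([2.])
(w**2).sum().backward()
w.grad ## tensor([4.]): the second gradient was added to the first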