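Recommended textbooks for studying deep learning

In Korean: 기계학습 (Machine Learning) by 오일석 (한빛아카데미)
In English: Deep Learning by Ian Goodfellow (Korean translation: 심층학습); read the original rather than the translation.

Items marked (-) are the professor's opinions.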

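PyTorch and TensorFlow: for people who are good with computers (expert tools)

Pros: everything can be implemented from scratch, so any kind of algorithm can be built
Cons: writing all the code up to the final output is laborious
- Tedious chores remain: train/test split, batching, GPU handling, Dropout, and so on
- Even if you know how to do them, you would rather not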
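
To make these chores concrete, here is a minimal sketch of what plain PyTorch leaves to the user: a train/test split, mini-batches, device placement, and Dropout. The toy data, layer sizes, and variable names are assumptions made for illustration only; they are not from the lecture.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)

# Toy regression data (hypothetical): y = 2x + noise
x = torch.randn(200, 1)
y = 2 * x + 0.1 * torch.randn(200, 1)
ds = TensorDataset(x, y)

# 1) train/test split, done by hand
train_ds, test_ds = random_split(ds, [160, 40])

# 2) mini-batches
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=len(test_ds))

# 3) device placement (GPU if one is available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 4) a small network that contains Dropout
net = torch.nn.Sequential(
    torch.nn.Linear(1, 16),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.1),
    torch.nn.Linear(16, 1),
).to(device)

loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

net.train()  # Dropout is active in training mode
for epoch in range(5):
    for xb, yb in train_dl:
        xb, yb = xb.to(device), yb.to(device)
        loss = loss_fn(net(xb), yb)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

net.eval()  # Dropout is switched off for evaluation
with torch.no_grad():
    xt, yt = next(iter(test_dl))
    test_loss = loss_fn(net(xt.to(device)), yt.to(device)).item()
print(len(train_ds), len(test_ds), test_loss)
```

Higher-level libraries such as Fastai and Keras wrap most of these steps, which is the trade-off described here.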

Fastai and Keras: for people who are not comfortable with computers but need to do deep learning (semi-expert tools)

Pros: designed to be easy to use
Cons: the range of supported applications is limited

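TensorFlow

Widely used in industry.
Its market share is ahead of PyTorch's.
Hard to use; the way models are implemented is awkward. (-)
Being backed by Google is an advantage.
Since Google backs it, there is hope that TensorFlow's shortcomings will be overcome one by one. (-)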

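PyTorch

Widely used in academia.
Preferred by experts (i.e., professors) in fields other than computer science, statistics, and industrial engineering. (-)
Easy to write, with highly readable code; in short, it is clean. (-)
The current mainstream; its example code alone seems to overwhelm TensorFlow's. (-)
Backed by Meta (formerly Facebook); that backing is not as stable as Google's. (-)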

Keras

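In the past it stood shoulder to shoulder with TensorFlow and PyTorch; it is now part of TensorFlow.
Officially supported by Google.
At present, the integration between TensorFlow and Keras does not feel smooth. (-)
The official documentation is very well organized.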

Fastai

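Not officially supported by Meta (formerly Facebook).
The official documentation is not well organized.
It covers fewer applications than Keras.
It integrates smoothly with PyTorch, which makes it good for learning by digging through the code. (-)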

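Summary

You must be able to use at least one of TensorFlow + Keras or PyTorch + Fastai.
- Of the two, PyTorch + Fastai looks like the better choice, though that is only a guess.
Taken individually, PyTorch (for research) and Keras (for quick results) are the good ones.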

Strong AI (AGI)?

- https://zdnet.co.kr/view/?no=20160622145838

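Professor Kim said, "AI can be divided into strong AI, which is superior to humans, and weak AI, which is inferior to humans but helpful to them," and explained that "AI, which revived in the late 1990s, has developed toward creating new knowledge through advances in machine learning, the extraction of knowledge from data, fast computers, and diverse data." He added, "People moved to weak AI, which accepts that knowledge can be wrong, and weak AI is developing around probability and statistical theory."

Strong AI refers to AI that exceeds human abilities, like the robots in the films The Terminator and I, Robot. It is AI that, with more physical strength and intelligence than a person, effortlessly does what humans cannot. Weak AI, by contrast, is like the robots in Bicentennial Man or A.I.: it cannot surpass uniquely human traits such as emotion and it sometimes makes errors, but its outstanding computational ability helps people with their work.

In the early days of AI, strong AI was the mainstream. Starting with the Turing test, early natural language processing capabilities appeared, but strong AI failed to meet the inflated expectations and faded away, leaving great disappointment behind. Professor Kim explained, "From the mid-1970s, science and technology investment funds cut off support for AI research, because there were no results."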

That is roughly how the article explains it.

Weak AI: the AI we learn about in class
Strong AI: the AI we see in movies

Timeline

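2012: Hinton and Alex Krizhevsky win the ImageNet challenge (AlexNet)
March 2016: AlphaGo wins
June 2016: Prof. Yongdai Kim (김용대), the article quoted above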
..
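Summer 2020: GPT-3

GPT-3: the arrival of AGI.

AI that writes novels
AI that composes music
GitHub Copilot: automatically continues the code you are writing

- Language models have advanced dazzlingly since GPT-3 $\to$ some people even call this the emergence of AGI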

Recently popular models

Language model hands-on

from fastai.text.all import * 
import numpy as np

path = untar_data(URLs.IMDB)

path

Path('/home/csy/.fastai/data/imdb')

files = get_text_files(path) 
files

(#100002) [Path('/home/csy/.fastai/data/imdb/train/neg/775_4.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/8999_4.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/3814_1.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/834_4.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/5250_4.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/10016_4.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/10241_1.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/1775_4.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/7746_1.txt'),Path('/home/csy/.fastai/data/imdb/train/neg/7721_4.txt')...]

files: a list of all the text files found in every subfolder of path

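is_lm=False

This is the default: given a text, the model predicts whether it is positive or negative.

is_lm=True

We use this setting because we are interested in a model that generates text.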

dls = DataBlock(
    blocks=TextBlock.from_folder(path,is_lm=True), 
    get_items=get_text_files, splitter=RandomSplitter(0.1)
).dataloaders(path,bs=128,seq_len=80)

The length of each displayed text depends on seq_len.
If is_lm (language model) is set to False, only the text column is shown.

dls.show_batch()

xxbos: marks the beginning of a new text
xxmaj: the next word starts with a capital letter (every word is lowercased by default)

lrnr = language_model_learner(dls,AWD_LSTM,metrics=accuracy).to_fp16()

to_fp16()

Trades some numerical precision for lighter, faster computation.
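
As a rough illustration of that trade-off (to_fp16() switches the learner to mixed-precision training, as far as the fastai documentation describes it), the sketch below compares a 32-bit and a 16-bit float in plain PyTorch. The variable names are illustrative assumptions.

```python
import torch

x32 = torch.tensor([1 / 3], dtype=torch.float32)
x16 = x32.to(torch.float16)

# Bytes used per element: float16 needs half the memory of float32
print(x32.element_size(), x16.element_size())  # 4 2

# The float16 value is a coarser approximation of 1/3
print(x32.item(), x16.item())
```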

lrnr.fit_one_cycle(5)

lrnr.predict('I liked this movie because',40)

"i liked this movie because it cells wooden much . It something any other affects the guy a way , although Susan Sarandon does n't dare to discuss this . i went to step up the movie and watch it on the"

Some of it does not make sense, but it looks plausible.
The output is different every time it is run.

Problem design

Understanding through a simple example

text = 'h e l l o '*100 
text

'h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o h e l l o '

tokens = text.split(' ')[:-1]
tokens[:10]

['h', 'e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o']

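- Let's predict the next character from the character immediately before it.

Since the text is hello: h $\to$ e, e $\to$ l, l $\to$ l/o (?), o $\to$ h, ...
The character that follows l is somewhat ambiguous.

- Just as one spots the mapping $X \to y$ in the table below and then predicts $y$ from $X$,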

X	y
1	2
2	4
3	6
1	2
2	4
3	6
1	2
2	4
...	...

the goal here is to spot the rule in the table below.

X	y
h	e
e	l
l	l/o
o	h
h	e
...	...
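
The rule in this table can be checked by simply counting which character follows which. The sketch below is only an illustration of the target mapping (it is not the neural approach used in these notes); the names follow, prev, and nxt are assumptions.

```python
from collections import Counter, defaultdict

text = 'h e l l o ' * 100
tokens = text.split(' ')[:-1]

# For every character, count the characters that come right after it
follow = defaultdict(Counter)
for prev, nxt in zip(tokens[:-1], tokens[1:]):
    follow[prev][nxt] += 1

for prev in 'helo':
    print(prev, dict(follow[prev]))
```

The counts show that h, e, and o each have a single successor, while l is followed by l and by o equally often, which is exactly the ambiguity noted earlier.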

Embedding

- Let's set up X and y.

len(tokens)

500

X= tokens[:(len(tokens)-1)]
y= tokens[1:]

They are set up so that y[i] is the character that comes right after X[i].

X[0],y[0]

('h', 'e')

X[1],y[1]

('e', 'l')

print(X[:10])
print(y[:10])

['h', 'e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o']
['e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o', 'h']

- Now let's convert the characters to numbers, a form the computer can understand and learn from.

dic = {'h':0, 'e':1, 'l':2, 'o':3} 
dic

{'h': 0, 'e': 1, 'l': 2, 'o': 3}

dic['h'],dic['e'],dic['l'],dic['o']

(0, 1, 2, 3)

nums = [dic[i] for i in tokens]

tokens[:10], nums[:10]

(['h', 'e', 'l', 'l', 'o', 'h', 'e', 'l', 'l', 'o'],
 [0, 1, 2, 2, 3, 0, 1, 2, 2, 3])

- (Mapping scheme 1) The characters are mapped to numbers as follows.

Character (tokens)	Number (nums)
'h'	0
'e'	1
'l'	2
'l'	2
'o'	3
'h'	0
'e'	1
'l'	2
'l'	2
'o'	3
...	...

- (Mapping scheme 2) The scheme below is semantically better than the one above. Under the mapping above we get e=1 and l=2, but that does not mean l is an input twice as strong as e.

Character (tokens)	Number (nums)
'h'	1,0,0,0
'e'	0,1,0,0
'l'	0,0,1,0
'l'	0,0,1,0
'o'	0,0,0,1
'h'	1,0,0,0
'e'	0,1,0,0
'l'	0,0,1,0
'l'	0,0,1,0
'o'	0,0,0,1
...	...

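- We would like to use mapping scheme 2, but preprocessing the data that way looks like a lot of work.

This situation comes up very often, though.
Surely someone has already implemented it?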
torch.nn.Embedding
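
For reference, the one-hot vectors of mapping scheme 2 can also be produced directly. The sketch below uses torch.nn.functional.one_hot; the variable names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

nums = torch.tensor([0, 1, 2, 2, 3])  # h, e, l, l, o
onehot = F.one_hot(nums, num_classes=4)
print(onehot)
```

Each row has a single 1 in the column of its character, so no character is "larger" than another.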

- Implementing mapping scheme 1

_x = torch.tensor([[0.0],[1.0],[2.0],[2.0],[3.0],[0.0],[1.0],[2.0],[2.0],[3.0]])
_x

tensor([[0.],
        [1.],
        [2.],
        [2.],
        [3.],
        [0.],
        [1.],
        [2.],
        [2.],
        [3.]])

Each value is wrapped in [] to form a column vector.

_l1 = torch.nn.Linear(in_features=1, out_features=20, bias=False)

_l1(_x).shape

torch.Size([10, 20])

_l1(_x)

tensor([[ 0.0000, -0.0000,  0.0000,  0.0000, -0.0000, -0.0000, -0.0000,  0.0000,
         -0.0000,  0.0000, -0.0000,  0.0000,  0.0000, -0.0000,  0.0000, -0.0000,
          0.0000, -0.0000, -0.0000,  0.0000],
        [ 0.3879, -0.5278,  0.6592,  0.5050, -0.4057, -0.1697, -0.9148,  0.7739,
         -0.0408,  0.9071, -0.9085,  0.7083,  0.0870, -0.0630,  0.0993, -0.9907,
          0.7534, -0.1981, -0.1808,  0.9388],
        [ 0.7759, -1.0555,  1.3183,  1.0099, -0.8113, -0.3395, -1.8297,  1.5477,
         -0.0815,  1.8141, -1.8170,  1.4166,  0.1741, -0.1261,  0.1987, -1.9813,
          1.5068, -0.3961, -0.3617,  1.8776],
        [ 0.7759, -1.0555,  1.3183,  1.0099, -0.8113, -0.3395, -1.8297,  1.5477,
         -0.0815,  1.8141, -1.8170,  1.4166,  0.1741, -0.1261,  0.1987, -1.9813,
          1.5068, -0.3961, -0.3617,  1.8776],
        [ 1.1638, -1.5833,  1.9775,  1.5149, -1.2170, -0.5092, -2.7445,  2.3216,
         -0.1223,  2.7212, -2.7255,  2.1250,  0.2611, -0.1891,  0.2980, -2.9720,
          2.2602, -0.5942, -0.5425,  2.8165],
        [ 0.0000, -0.0000,  0.0000,  0.0000, -0.0000, -0.0000, -0.0000,  0.0000,
         -0.0000,  0.0000, -0.0000,  0.0000,  0.0000, -0.0000,  0.0000, -0.0000,
          0.0000, -0.0000, -0.0000,  0.0000],
        [ 0.3879, -0.5278,  0.6592,  0.5050, -0.4057, -0.1697, -0.9148,  0.7739,
         -0.0408,  0.9071, -0.9085,  0.7083,  0.0870, -0.0630,  0.0993, -0.9907,
          0.7534, -0.1981, -0.1808,  0.9388],
        [ 0.7759, -1.0555,  1.3183,  1.0099, -0.8113, -0.3395, -1.8297,  1.5477,
         -0.0815,  1.8141, -1.8170,  1.4166,  0.1741, -0.1261,  0.1987, -1.9813,
          1.5068, -0.3961, -0.3617,  1.8776],
        [ 0.7759, -1.0555,  1.3183,  1.0099, -0.8113, -0.3395, -1.8297,  1.5477,
         -0.0815,  1.8141, -1.8170,  1.4166,  0.1741, -0.1261,  0.1987, -1.9813,
          1.5068, -0.3961, -0.3617,  1.8776],
        [ 1.1638, -1.5833,  1.9775,  1.5149, -1.2170, -0.5092, -2.7445,  2.3216,
         -0.1223,  2.7212, -2.7255,  2.1250,  0.2611, -0.1891,  0.2980, -2.9720,
          2.2602, -0.5942, -0.5425,  2.8165]], grad_fn=<MmBackward0>)

Input: (10,1)
Output: (10,20)

- Implementing mapping scheme 2

e1= torch.nn.Embedding(num_embeddings=4, embedding_dim=20)

_x = torch.tensor([0,1,2,2,3,0,1,2,2,3])
_x

tensor([0, 1, 2, 2, 3, 0, 1, 2, 2, 3])

e1(_x)

tensor([[ 0.2254, -0.3840,  0.9268,  0.4517,  2.2971, -0.4934, -1.3508, -1.3967,
          0.2507,  1.0178, -1.1175, -0.3891, -1.1492,  0.6222, -0.9820,  0.0778,
          0.5645, -0.7270,  0.6511,  1.3918],
        [ 0.6158,  1.1618,  1.0513, -2.2748,  0.4146,  1.1175,  1.0338,  0.9896,
          0.3465,  0.2706,  2.2675, -0.3080, -0.5355,  0.0450, -0.0675, -0.4521,
         -0.1553,  0.3424,  0.5461,  1.5637],
        [ 0.5781, -0.6058,  0.9321, -1.0924,  0.4152, -0.4331, -1.3501,  2.2705,
         -0.0448, -0.1180,  0.1774,  0.6332, -0.2868,  1.0492, -0.4399,  0.5393,
          1.0011, -0.7080,  0.5177, -0.8671],
        [ 0.5781, -0.6058,  0.9321, -1.0924,  0.4152, -0.4331, -1.3501,  2.2705,
         -0.0448, -0.1180,  0.1774,  0.6332, -0.2868,  1.0492, -0.4399,  0.5393,
          1.0011, -0.7080,  0.5177, -0.8671],
        [-1.9725,  0.1694, -0.9259,  0.8846, -0.7848, -1.3822,  1.0476,  0.5490,
         -0.4070, -0.5807, -0.6074, -0.4921, -0.3262,  1.4773,  1.3528,  0.0930,
          1.1281, -0.0695,  0.3036,  1.9600],
        [ 0.2254, -0.3840,  0.9268,  0.4517,  2.2971, -0.4934, -1.3508, -1.3967,
          0.2507,  1.0178, -1.1175, -0.3891, -1.1492,  0.6222, -0.9820,  0.0778,
          0.5645, -0.7270,  0.6511,  1.3918],
        [ 0.6158,  1.1618,  1.0513, -2.2748,  0.4146,  1.1175,  1.0338,  0.9896,
          0.3465,  0.2706,  2.2675, -0.3080, -0.5355,  0.0450, -0.0675, -0.4521,
         -0.1553,  0.3424,  0.5461,  1.5637],
        [ 0.5781, -0.6058,  0.9321, -1.0924,  0.4152, -0.4331, -1.3501,  2.2705,
         -0.0448, -0.1180,  0.1774,  0.6332, -0.2868,  1.0492, -0.4399,  0.5393,
          1.0011, -0.7080,  0.5177, -0.8671],
        [ 0.5781, -0.6058,  0.9321, -1.0924,  0.4152, -0.4331, -1.3501,  2.2705,
         -0.0448, -0.1180,  0.1774,  0.6332, -0.2868,  1.0492, -0.4399,  0.5393,
          1.0011, -0.7080,  0.5177, -0.8671],
        [-1.9725,  0.1694, -0.9259,  0.8846, -0.7848, -1.3822,  1.0476,  0.5490,
         -0.4070, -0.5807, -0.6074, -0.4921, -0.3262,  1.4773,  1.3528,  0.0930,
          1.1281, -0.0695,  0.3036,  1.9600]], grad_fn=<EmbeddingBackward0>)

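Input: (10,) (a vector of 10 indices; conceptually a (10,1) column)
Output: (10,20)

- torch.nn.Linear() and torch.nn.Embedding() seem to make no difference? $\to$ Inspecting the parameters reveals the difference.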

len(list(_l1.parameters())[0])

20

list(e1.parameters())[0].shape

torch.Size([4, 20])

- In the end, mapping scheme 1 can be understood as follows,

${\bf X}$: (10,1)
${\bf W}$: (1,20)
${\bf XW}$: (10,20)

- and mapping scheme 2 can be understood as follows.

${\bf X}$: (10,1)
$\tilde{\bf X}$: (10,4)
${\bf W}$: (4,20)
$\tilde{\bf X}{\bf W}$: (10,20)

Here the 4 corresponds to the one-hot vectors (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1).

- So even if we want mapping scheme 2, the input can still be passed in as below. PyTorch's torch.nn.Embedding() takes care of the rest.

_x

tensor([0, 1, 2, 2, 3, 0, 1, 2, 2, 3])
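
The claim that torch.nn.Embedding() behaves like one-hot encoding followed by a matrix product can be checked numerically. This is a minimal sketch; x_tilde and manual are names assumed for illustration and correspond to $\tilde{\bf X}$ and $\tilde{\bf X}{\bf W}$.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
e = torch.nn.Embedding(num_embeddings=4, embedding_dim=20)
x = torch.tensor([0, 1, 2, 2, 3, 0, 1, 2, 2, 3])

x_tilde = F.one_hot(x, num_classes=4).float()  # (10, 4) one-hot rows
manual = x_tilde @ e.weight                    # (10, 4) @ (4, 20) -> (10, 20)

print(x_tilde.shape, e.weight.shape, manual.shape)
print(torch.allclose(e(x), manual))  # True
```

In practice the embedding layer just looks up row i of its weight matrix, which avoids building the one-hot matrix at all.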

Building the network

- Now let's declare X and y again, using the numeric data nums.

X = torch.tensor(nums[:499]) 
y = torch.tensor(nums[1:])

X[0],y[0]

(tensor(0), tensor(1))

(Earlier these were h and e.)

X[1],y[1]

(tensor(1), tensor(2))

(Earlier these were e and l.)

- Let's design a simple network.

e1=torch.nn.Embedding(num_embeddings=4, embedding_dim=20) 
l1=torch.nn.Linear(in_features=20,out_features=20)
a1=torch.nn.ReLU()
l2=torch.nn.Linear(in_features=20,out_features=4) 
a2=torch.nn.Softmax()

X.shape, e1(X).shape

(torch.Size([499]), torch.Size([499, 20]))

e1(X).shape, a1(l1(e1(X))).shape

(torch.Size([499, 20]), torch.Size([499, 20]))

a1(l1(e1(X))).shape, l2(a1(l1(e1(X)))).shape

(torch.Size([499, 20]), torch.Size([499, 4]))

a2(l2(a1(l1(e1(X))))).shape

<ipython-input-45-6a5d66616296>:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  a2(l2(a1(l1(e1(X))))).shape

torch.Size([499, 4])

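The warning means that the dimension along which softmax is applied was not specified, so PyTorch chose one implicitly. (The dim=X in the message is a placeholder for that dimension, not our input $X$.)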
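
A minimal sketch of how the warning can be avoided: pass dim explicitly so that each row (one row per observation) is normalized over the 4 classes. The random logits below are a stand-in for l2(a1(l1(e1(X)))) and are an assumption for illustration.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(499, 4)        # stand-in for the (499, 4) network output

a2 = torch.nn.Softmax(dim=1)        # normalize across the 4 classes in each row
probs = a2(logits)

print(probs.shape)
print(probs.sum(dim=1)[:3])         # each row sums to 1
```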

- To confirm that the result is correct despite the warning, compute the softmax by hand.

l2(a1(l1(e1(X))))[0]

tensor([ 0.2303, -0.2560,  0.0649, -0.1178], grad_fn=<SelectBackward0>)

np.exp(0.2303)/(np.exp(0.2303)+np.exp( -0.2560)+np.exp(0.0649)+np.exp(-0.1178))

0.31560872675242624

np.exp(-0.2560)/(np.exp(0.2303)+np.exp(-0.2560)+np.exp(0.0649)+np.exp(-0.1178))

0.1940669572337474

np.exp(0.0649)/(np.exp(0.2303)+np.exp( -0.2560)+np.exp(0.0649)+np.exp(-0.1178))

0.2674956327129962

np.exp(-0.1178)/(np.exp(0.2303)+np.exp( -0.2560)+np.exp(0.0649)+np.exp(-0.1178))

0.22282868330083022

a2(l2(a1(l1(e1(X)))))[0]

<ipython-input-50-17d00ccf79de>:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  a2(l2(a1(l1(e1(X)))))[0]

tensor([0.3156, 0.1941, 0.2675, 0.2228], grad_fn=<SelectBackward0>)

The values match, so the computation is correct.
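
The same hand check can be written in one vectorized step. The sketch below reuses the four logits printed earlier; manual and builtin are names assumed for illustration.

```python
import torch

z = torch.tensor([0.2303, -0.2560, 0.0649, -0.1178])  # logits of the first row

manual = torch.exp(z) / torch.exp(z).sum()  # softmax written out by hand
builtin = torch.softmax(z, dim=0)           # PyTorch's softmax, dim given explicitly

print(manual)
print(builtin)
```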

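If training goes well, the weights will be adjusted so that the output approaches a one-hot vector, such as (0,1,0,0) when the target is e.

- Summary of how the dimensions change in the forward pass

torch.Size([499]) # X
torch.Size([499, 20]) # after e1
torch.Size([499, 20]) # after l1
torch.Size([499, 20]) # after a1
torch.Size([499, 4]) # after l2
torch.Size([499, 4]) # after a2 = yhat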

net = torch.nn.Sequential(
    torch.nn.Embedding(num_embeddings=4,embedding_dim=20),
    torch.nn.Linear(in_features=20,out_features=20), 
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=20,out_features=4))
    #torch.nn.Softmax()

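Softmax is left out of the network and handled inside the loss function instead (torch.nn.CrossEntropyLoss applies log-softmax internally).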
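
To see why the network can output raw scores, the sketch below compares torch.nn.CrossEntropyLoss with a manual computation (log-softmax followed by the mean negative log-probability of the target class). The random logits and the small target vector are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4)              # hypothetical raw network outputs
target = torch.tensor([1, 2, 2, 3, 0])  # e, l, l, o, h

loss_builtin = torch.nn.CrossEntropyLoss()(logits, target)

log_probs = F.log_softmax(logits, dim=1)                   # softmax happens here
loss_manual = -log_probs[torch.arange(5), target].mean()   # mean negative log-likelihood

print(loss_builtin.item(), loss_manual.item())
```

Both numbers agree, so adding a Softmax layer before this loss would apply softmax twice.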

net(X)

tensor([[-0.2585,  0.1341,  0.0437, -0.1027],
        [-0.3168,  0.0997, -0.0775, -0.2730],
        [-0.3455,  0.2284, -0.0816,  0.1940],
        ...,
        [-0.3168,  0.0997, -0.0775, -0.2730],
        [-0.3455,  0.2284, -0.0816,  0.1940],
        [-0.3455,  0.2284, -0.0816,  0.1940]], grad_fn=<AddmmBackward0>)

- Loss function and optimizer

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())

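- Training

for i in range(1000):
    ## 1. forward pass
    yhat = net(X)
    ## 2. compute the loss
    loss = loss_fn(yhat,y)
    ## 3. backpropagate
    loss.backward()
    ## 4. update the parameters, then reset the gradients
    optimizer.step()
    optimizer.zero_grad()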

X[:7]

tensor([0, 1, 2, 2, 3, 0, 1])

net(X)[:7]

tensor([[-3.0674,  5.6732, -1.9852, -3.5369],
        [-4.0728, -3.9873,  4.0940, -3.9972],
        [-4.8514, -3.8910,  3.7314,  3.7311],
        [-4.8514, -3.8910,  3.7314,  3.7311],
        [ 5.3389, -2.6426, -2.3245, -3.6480],
        [-3.0674,  5.6732, -1.9852, -3.5369],
        [-4.0728, -3.9873,  4.0940, -3.9972]], grad_fn=<SliceBackward0>)

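Training went well: in each row the largest score sits at the correct next character, and for input l the scores for l and o are almost tied.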

a2(net(X)[:7])

<ipython-input-62-407a0f11aae6>:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  a2(net(X)[:7])

tensor([[1.5983e-04, 9.9927e-01, 4.7173e-04, 9.9953e-05],
        [2.8370e-04, 3.0900e-04, 9.9910e-01, 3.0596e-04],
        [9.3631e-05, 2.4464e-04, 4.9992e-01, 4.9975e-01],
        [9.3631e-05, 2.4464e-04, 4.9992e-01, 4.9975e-01],
        [9.9906e-01, 3.4141e-04, 4.6926e-04, 1.2492e-04],
        [1.5983e-04, 9.9927e-01, 4.7173e-04, 9.9953e-05],
        [2.8370e-04, 3.0900e-04, 9.9910e-01, 3.0596e-04]],
       grad_fn=<SoftmaxBackward0>)
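
To read these outputs as characters, one can take the argmax of each row and map the index back through the dictionary. The sketch below hard-codes the four distinct logit rows printed earlier; inv and pred are names assumed for illustration.

```python
import torch

dic = {'h': 0, 'e': 1, 'l': 2, 'o': 3}
inv = {v: k for k, v in dic.items()}  # index -> character

logits = torch.tensor([
    [-3.0674,  5.6732, -1.9852, -3.5369],  # input h
    [-4.0728, -3.9873,  4.0940, -3.9972],  # input e
    [-4.8514, -3.8910,  3.7314,  3.7311],  # input l
    [ 5.3389, -2.6426, -2.3245, -3.6480],  # input o
])

pred = [inv[i] for i in logits.argmax(dim=1).tolist()]
print(pred)  # ['e', 'l', 'l', 'h']
```

For input l the argmax lands on l only by a hair (3.7314 versus 3.7311), so a one-character context cannot tell whether l or o should come next.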

Improving net

- Do we have to retype the code below every time the vocabulary size (num_embeddings) changes from 4?

net = torch.nn.Sequential(
    torch.nn.Embedding(num_embeddings=4,embedding_dim=20),
    torch.nn.Linear(in_features=20,out_features=20), 
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=20,out_features=4))
    #torch.nn.Softmax()

- It would be nice to have something that stamps out nets like this. Let's build one!

class BDA(Module): # fastai's Module: like torch.nn.Module, but no super().__init__() call is needed
    def __init__(self, num_embeddings): 
        self.embedding = torch.nn.Embedding(num_embeddings,20)
        self.linear1 = torch.nn.Linear(in_features=20,out_features=20)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(in_features=20,out_features=num_embeddings)
    def forward(self, X): # defines how net(X) is computed
        u=self.linear1(self.embedding(X))
        v=self.relu(u)
        return self.linear2(v) # the result of net(X)

net2 = BDA(4)

net

Sequential(
  (0): Embedding(4, 20)
  (1): Linear(in_features=20, out_features=20, bias=True)
  (2): ReLU()
  (3): Linear(in_features=20, out_features=4, bias=True)
)

net2

BDA(
  (embedding): Embedding(4, 20)
  (linear1): Linear(in_features=20, out_features=20, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=20, out_features=4, bias=True)
)
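
The BDA class subclasses Module, which here comes from the fastai star import. Without fastai, a roughly equivalent class can be written on top of torch.nn.Module, in which case super().__init__() must be called first. The sketch below is an assumption-laden variant: the class name BDAPlain and the hidden argument are hypothetical, not from the lecture.

```python
import torch

class BDAPlain(torch.nn.Module):  # hypothetical name, plain-PyTorch variant
    def __init__(self, num_embeddings, hidden=20):
        super().__init__()  # required when subclassing torch.nn.Module
        self.embedding = torch.nn.Embedding(num_embeddings, hidden)
        self.linear1 = torch.nn.Linear(hidden, hidden)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(hidden, num_embeddings)

    def forward(self, X):
        # embedding -> linear -> ReLU -> linear, returning raw scores
        return self.linear2(self.relu(self.linear1(self.embedding(X))))

net_plain = BDAPlain(4)
X_demo = torch.tensor([0, 1, 2, 2, 3])
print(net_plain(X_demo).shape)  # torch.Size([5, 4])
```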

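- Let's train net2 as well and check that it gives results comparable to net.

loss_fn= torch.nn.CrossEntropyLoss()
optimizer2 = torch.optim.Adam(net2.parameters())

for i in range(1000):
    ## 1. forward pass
    yhat = net2(X)
    ## 2. compute the loss
    loss = loss_fn(yhat,y)
    ## 3. backpropagate
    loss.backward()
    ## 4. update the parameters, then reset the gradients
    optimizer2.step()
    optimizer2.zero_grad()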

net2(X)

tensor([[-3.7872,  5.4197, -2.2008, -3.8995],
        [-4.3267, -5.5556,  4.3692, -3.0839],
        [-5.8338, -5.0290,  2.7784,  2.7775],
        ...,
        [-4.3267, -5.5556,  4.3692, -3.0839],
        [-5.8338, -5.0290,  2.7784,  2.7775],
        [-5.8338, -5.0290,  2.7784,  2.7775]], grad_fn=<AddmmBackward0>)

net(X)

tensor([[-3.0674,  5.6732, -1.9852, -3.5369],
        [-4.0728, -3.9873,  4.0940, -3.9972],
        [-4.8514, -3.8910,  3.7314,  3.7311],
        ...,
        [-4.0728, -3.9873,  4.0940, -3.9972],
        [-4.8514, -3.8910,  3.7314,  3.7311],
        [-4.8514, -3.8910,  3.7314,  3.7311]], grad_fn=<AddmmBackward0>)

- net2 has also trained well.

Predicting the next character from the previous two characters

- Let's set up X and y again.

X = torch.tensor([nums[:498],nums[1:499]]).T
y = torch.tensor(nums[2:])

X[0],y[0] # h,e -> l

(tensor([0, 1]), tensor(2))

X[1],y[1] # e,l -> l

(tensor([1, 2]), tensor(2))

X[2],y[2] # l,l -> o

(tensor([2, 2]), tensor(3))

X[3],y[3] # l,o -> h

(tensor([2, 3]), tensor(0))

- Let's roughly sketch the architecture.

_e1 = torch.nn.Embedding(num_embeddings=4, embedding_dim=20)

X.shape, _e1(X).shape

(torch.Size([498, 2]), torch.Size([498, 2, 20]))

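An extra dimension of size 2 has appeared.

- The previous architecture looked like this:

torch.Size([499]) # X
torch.Size([499, 20]) # after e1
torch.Size([499, 20]) # after l1
torch.Size([499, 20]) # after a1
torch.Size([499, 4]) # after l2
torch.Size([499, 4]) # after a2 = yhat

- It is now unclear how to handle the extra dimension. $\to$ We design a recurrent network.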
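
The difficulty can be made concrete with shapes alone. In the sketch below (layer names emb and lin are assumptions), applying the earlier layers position by position yields two score vectors per target, which no longer lines up with y.

```python
import torch

torch.manual_seed(0)
nums = [0, 1, 2, 2, 3] * 100
X = torch.tensor([nums[:498], nums[1:499]]).T  # (498, 2): two previous characters
y = torch.tensor(nums[2:])                     # (498,): one target per row

emb = torch.nn.Embedding(4, 20)
lin = torch.nn.Linear(20, 4)

out = lin(emb(X))            # Linear acts on the last dimension only
print(out.shape, y.shape)    # (498, 2, 4) versus (498,)
```

The two positions have to be combined into a single vector per row before the final layer, and the recurrent design does this by passing a hidden state from the first character to the second.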

X[:,0]

tensor([0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2])

X[:,1]

tensor([1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3,
        0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2,
        3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2,
        2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1,
        2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0,
        1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2, 3, 0, 1, 2, 2])

class BDA2(Module): 
    def __init__(self, num_embeddings): 
        self.embedding = torch.nn.Embedding(num_embeddings,20)
        self.linear1 = torch.nn.Linear(in_features=20,out_features=20)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(in_features=20,out_features=num_embeddings)
    def forward(self, X): # defines how net(X) is computed
        x1=X[:,0] # first column of X, two steps before y: (x1,x2) -> y, e.g. (h,e) -> l
        x2=X[:,1] # second column of X, one step before y
        h=self.relu(self.linear1(self.embedding(x1))) # part of the network that would predict x2 from x1
        h2=self.relu(self.linear1(h+ self.embedding(x2))) # part of the network that predicts y from x2
        return self.linear2(h2) # the result of net(X)

- The final output self.linear2(h2) is therefore a function of h and x2, and h is in turn a function of x1. So h2 carries x2 and, more weakly, information about x1 as well.
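
The two hand-written steps follow the same pattern, so they can be rolled into a loop over the columns of X. The sketch below is a hypothetical generalization (the class name BDALoop, the hidden argument, and the zero initial state are assumptions, not part of the lecture); with two columns it computes the same forward pass as BDA2.

```python
import torch

class BDALoop(torch.nn.Module):  # hypothetical generalization of BDA2
    def __init__(self, num_embeddings, hidden=20):
        super().__init__()
        self.embedding = torch.nn.Embedding(num_embeddings, hidden)
        self.linear1 = torch.nn.Linear(hidden, hidden)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(hidden, num_embeddings)

    def forward(self, X):
        # Start from a zero hidden state, then fold in one column at a time
        h = torch.zeros(X.shape[0], self.embedding.embedding_dim, device=X.device)
        for t in range(X.shape[1]):
            h = self.relu(self.linear1(h + self.embedding(X[:, t])))
        return self.linear2(h)

nums = [0, 1, 2, 2, 3] * 100
X3 = torch.tensor([nums[:497], nums[1:498], nums[2:499]]).T  # three previous characters
net_loop = BDALoop(4)
out = net_loop(X3)
print(X3.shape, out.shape)
```

Reusing the same linear1 at every step is what makes the network recurrent: the number of parameters does not grow with the length of the context.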

net3=BDA2(4) 
net3

BDA2(
  (embedding): Embedding(4, 20)
  (linear1): Linear(in_features=20, out_features=20, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=20, out_features=4, bias=True)
)

net2

BDA(
  (embedding): Embedding(4, 20)
  (linear1): Linear(in_features=20, out_features=20, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=20, out_features=4, bias=True)
)

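The structure is the same, but the forward pass is computed differently! (So the backward pass must differ too, right?)

- Let's train again.

loss_fn = torch.nn.CrossEntropyLoss()
optimizer3 = torch.optim.Adam(net3.parameters())

for i in range(1000):
    ## 1. forward pass
    yhat = net3(X)
    ## 2. compute the loss
    loss = loss_fn(yhat,y)
    ## 3. backpropagate
    loss.backward()
    ## 4. update the parameters, then reset the gradients
    optimizer3.step()
    optimizer3.zero_grad()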

X[:5]

tensor([[0, 1],
        [1, 2],
        [2, 2],
        [2, 3],
        [3, 0]])

net3(X)[:5]

tensor([[-2.3154, -2.9011,  6.0370, -1.8242],
        [-3.7222, -2.4275,  6.0288, -2.0538],
        [-2.1673, -2.4698, -1.2548,  6.5036],
        [ 7.1155, -3.4118, -1.7391, -0.8178],
        [-4.5102,  5.4290, -3.0714, -4.2645]], grad_fn=<SliceBackward0>)

h,e $\to$ l
e,l $\to$ l
l,l $\to$ o
l,o $\to$ h
o,h $\to$ e

- Training went well.
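
Once such a network is trained, it can generate text by repeatedly feeding its own prediction back in as input. The following is a self-contained sketch under stated assumptions: TwoStep is a hypothetical plain-PyTorch class with the same forward pass as BDA2, and greedy argmax decoding is used for simplicity.

```python
import torch

torch.manual_seed(0)
dic = {'h': 0, 'e': 1, 'l': 2, 'o': 3}
inv = {v: k for k, v in dic.items()}
nums = [dic[c] for c in 'hello' * 100]
X = torch.tensor([nums[:498], nums[1:499]]).T
y = torch.tensor(nums[2:])

class TwoStep(torch.nn.Module):  # hypothetical name; same forward pass as BDA2
    def __init__(self, n, hidden=20):
        super().__init__()
        self.embedding = torch.nn.Embedding(n, hidden)
        self.linear1 = torch.nn.Linear(hidden, hidden)
        self.linear2 = torch.nn.Linear(hidden, n)

    def forward(self, X):
        h = torch.relu(self.linear1(self.embedding(X[:, 0])))
        h2 = torch.relu(self.linear1(h + self.embedding(X[:, 1])))
        return self.linear2(h2)

net = TwoStep(4)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())
for _ in range(1000):
    loss = loss_fn(net(X), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Generation: start from "he" and keep appending the most likely next character
seq = [dic['h'], dic['e']]
with torch.no_grad():
    for _ in range(13):
        ctx = torch.tensor([seq[-2:]])  # shape (1, 2): the last two characters
        seq.append(net(ctx).argmax(dim=1).item())
generated = ''.join(inv[i] for i in seq)
print(generated)
```

If training converges as it did in these notes, the printed string should repeat hello, though that depends on the random initialization.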

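One application is writing news articles automatically.
- For soccer, if you feed in how the first half went, how the second half went, and so on, producing y (the article) is not hard.
However, it requires a lot of parameters.
It started becoming popular again last year.
Time-series modeling and RNNs are similar.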

Output of dls.show_batch():

	text	text_
0	xxbos xxmaj seriously - avoid this movie at any cost . i just saw it in my first " sneak preview " ever and although i paid non - xxunk money for it , i walked out of the cinema after a mere 15 minutes . xxmaj which already includes 2 minutes of discussion among my friends whether or not to leave . xxmaj first time xxup ever i walked out of a movie . xxmaj and i lived through	xxmaj seriously - avoid this movie at any cost . i just saw it in my first " sneak preview " ever and although i paid non - xxunk money for it , i walked out of the cinema after a mere 15 minutes . xxmaj which already includes 2 minutes of discussion among my friends whether or not to leave . xxmaj first time xxup ever i walked out of a movie . xxmaj and i lived through some
1	l. xxmaj jackson , one of the most prolific actors in xxmaj hollywood , as " mace xxmaj windu " and xxmaj jimmy xxmaj smits , another instantly recognizable household name , as xxmaj senator " bail xxmaj xxunk " . xxmaj my xxmaj god , xxmaj lukas could have at least picked actors who have n't reached idol status yet , but no , he had to make his budget xxup larger . xxmaj the majority of people who	xxmaj jackson , one of the most prolific actors in xxmaj hollywood , as " mace xxmaj windu " and xxmaj jimmy xxmaj smits , another instantly recognizable household name , as xxmaj senator " bail xxmaj xxunk " . xxmaj my xxmaj god , xxmaj lukas could have at least picked actors who have n't reached idol status yet , but no , he had to make his budget xxup larger . xxmaj the majority of people who liked
2	will deceive you in this movie . xxmaj stephen xxmaj nichols is mis - cast as a young german student still bending under his father 's orders although the actor obviously looks near 40 years - old . xxmaj this makes his relationship ( a collection of copulation scenes , basically ) to a very young looking girl all the more disturbing . xxmaj the character 's have no dimension and the war depiction serves only as a backdrop for	deceive you in this movie . xxmaj stephen xxmaj nichols is mis - cast as a young german student still bending under his father 's orders although the actor obviously looks near 40 years - old . xxmaj this makes his relationship ( a collection of copulation scenes , basically ) to a very young looking girl all the more disturbing . xxmaj the character 's have no dimension and the war depiction serves only as a backdrop for this
3	germans . xxmaj maybe that s why so few films were made about the xxmaj merchant xxmaj navy , what s a war film without some nasty xxmaj nazis in xxunk with submachine guns running about and a few xxunk xxunk xxunk ? \n\n xxmaj quite possibly the best thing i can say about this film is that after seeing this as a kid i wanted to join the xxmaj merchant xxmaj navy , and i did , and xxmaj	. xxmaj maybe that s why so few films were made about the xxmaj merchant xxmaj navy , what s a war film without some nasty xxmaj nazis in xxunk with submachine guns running about and a few xxunk xxunk xxunk ? \n\n xxmaj quite possibly the best thing i can say about this film is that after seeing this as a kid i wanted to join the xxmaj merchant xxmaj navy , and i did , and xxmaj i
4	for the xxmaj champ to come home when he is out on a drinking binge . xxmaj champ 's ex - wife , socialite xxmaj linda , sees xxmaj andy and xxmaj dink at the racetrack one day and tries to convince xxmaj andy that xxmaj dink would be better off with her . xxmaj at first the xxmaj champ is xxunk . xxmaj however , when he gets a hold of a good sum of money and gambles it	the xxmaj champ to come home when he is out on a drinking binge . xxmaj champ 's ex - wife , socialite xxmaj linda , sees xxmaj andy and xxmaj dink at the racetrack one day and tries to convince xxmaj andy that xxmaj dink would be better off with her . xxmaj at first the xxmaj champ is xxunk . xxmaj however , when he gets a hold of a good sum of money and gambles it away
5	of xxmaj the xxmaj dead . xxbos xxmaj valentine is now one of my favorite slasher films . xxmaj the death scenes are elaborate and the most of the acting is good . xxmaj marley xxmaj shelton did great as the female lead ( much better than xxmaj jennifer xxmaj love xxmaj hewitt in the " i xxmaj know … " films ) . xxmaj david xxmaj boreanaz , whom is the main reason i saw this movie , had	xxmaj the xxmaj dead . xxbos xxmaj valentine is now one of my favorite slasher films . xxmaj the death scenes are elaborate and the most of the acting is good . xxmaj marley xxmaj shelton did great as the female lead ( much better than xxmaj jennifer xxmaj love xxmaj hewitt in the " i xxmaj know … " films ) . xxmaj david xxmaj boreanaz , whom is the main reason i saw this movie , had a
6	big budget disaster movie follows the formula set by any number of xxmaj hollywood films of the late 90 's ( i assume , having seen none of them ) , with the scale of disaster and tragedy bringing out the nobility of the human ( well , xxmaj japanese ) spirit in acts of heroism and sacrifice , and proving the power of love or something like that . i.e. it 's as naive in its psychology as it	budget disaster movie follows the formula set by any number of xxmaj hollywood films of the late 90 's ( i assume , having seen none of them ) , with the scale of disaster and tragedy bringing out the nobility of the human ( well , xxmaj japanese ) spirit in acts of heroism and sacrifice , and proving the power of love or something like that . i.e. it 's as naive in its psychology as it 's
7	reiser does an excellent job , although he is n't a great actor always that does n't mean that this did n't work actually xxmaj peter xxmaj falk and xxmaj paul xxmaj reiser plays the perfect xxmaj father and xxmaj son , the rest of the cast is good enough but you do n't see them as much so just say they do what they shall to get this to shine even more . \n\n xxmaj music : 10 /	does an excellent job , although he is n't a great actor always that does n't mean that this did n't work actually xxmaj peter xxmaj falk and xxmaj paul xxmaj reiser plays the perfect xxmaj father and xxmaj son , the rest of the cast is good enough but you do n't see them as much so just say they do what they shall to get this to shine even more . \n\n xxmaj music : 10 / 10
8	gets work again . xxmaj on top of that i hope the director never gets to make another film , and has his paycheck taken back for this crap . { xxunk out of 10 } xxbos xxmaj it makes me laugh when i read bad reviews of this movie . xxmaj no one claimed it was a classic , no claimed it would win awards or prizes for depth of storyline etc . \n\n xxmaj what it does have	work again . xxmaj on top of that i hope the director never gets to make another film , and has his paycheck taken back for this crap . { xxunk out of 10 } xxbos xxmaj it makes me laugh when i read bad reviews of this movie . xxmaj no one claimed it was a classic , no claimed it would win awards or prizes for depth of storyline etc . \n\n xxmaj what it does have is

Output of lrnr.fit_one_cycle(5):

epoch	train_loss	valid_loss	accuracy	time
0	4.473711	4.139537	0.282725	08:33
1	4.343895	4.037108	0.289659	08:25
2	4.293358	4.003340	0.291987	08:29
3	4.272007	3.990718	0.292921	08:31
4	4.262070	3.988740	0.293041	08:30