RNN (Week 13)

Special Topics in Machine Learning



December 8, 2022

Analysis of the IMDB data (text generation and sentiment classification), side notes


import torch
from fastai.text.all import *

IMDB data: text generation

(1) Creating the dls

path = untar_data(URLs.IMDB)
files = get_text_files(path) 
(#100002) [Path('/home/cgb4/.fastai/data/imdb/unsup/8002_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/18305_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/1440_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/43259_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/391_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/14091_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/49382_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/33008_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/24903_0.txt'),Path('/home/cgb4/.fastai/data/imdb/unsup/43058_0.txt')...]
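# Assumed completion: blocks= and .dataloaders(path) follow the standard fastai language-model setup (defaults kept).
dls = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files, splitter=RandomSplitter(0.1)
).dataloaders(path)
dls.show_batch()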
text text_
0 xxbos xxmaj that is quite an outdated movie which aims to showcase the youth 's yearning for freedom in some dehumanizing xxmaj british school . xxmaj oh yes it 's like in the army , you learn to obey and do what you 're asked to . xxmaj yes the young dream of something else but it breaks their dreams and sweeps away their optimism on the threshold of life . xxmaj great . \n\n xxmaj basically that 's how you could sum up the nice intentions in xxmaj if … xxmaj nice intentions that arouses no cinematographic challenges : the result is a declamatory movie . xxmaj do you see how boring i mean ? \n\n xxmaj at least that oldie helped xxmaj kubrick cast mcdowell in xxmaj that is quite an outdated movie which aims to showcase the youth 's yearning for freedom in some dehumanizing xxmaj british school . xxmaj oh yes it 's like in the army , you learn to obey and do what you 're asked to . xxmaj yes the young dream of something else but it breaks their dreams and sweeps away their optimism on the threshold of life . xxmaj great . \n\n xxmaj basically that 's how you could sum up the nice intentions in xxmaj if … xxmaj nice intentions that arouses no cinematographic challenges : the result is a declamatory movie . xxmaj do you see how boring i mean ? \n\n xxmaj at least that oldie helped xxmaj kubrick cast mcdowell in a
1 true facts is more than annoying . xxmaj hitler was n't anti - semitic in his youth , he even worked for xxmaj jews before world war one . xxmaj it was however during world war one and after that he formed his views about the xxmaj jews . xxmaj his upbringing in this movie is also inaccurate , xxmaj hitler as a child was n't a disturbed little brat . xxmaj he had a more or less normal upbringing . xxmaj nothing is mentioned about his lost brothers and other important pieces that adds to the puzzle that is xxmaj hitler . \n\n xxmaj robert xxmaj carlyle is a great actor but he does n't really fit in the role as xxmaj hitler . xxmaj hitler was facts is more than annoying . xxmaj hitler was n't anti - semitic in his youth , he even worked for xxmaj jews before world war one . xxmaj it was however during world war one and after that he formed his views about the xxmaj jews . xxmaj his upbringing in this movie is also inaccurate , xxmaj hitler as a child was n't a disturbed little brat . xxmaj he had a more or less normal upbringing . xxmaj nothing is mentioned about his lost brothers and other important pieces that adds to the puzzle that is xxmaj hitler . \n\n xxmaj robert xxmaj carlyle is a great actor but he does n't really fit in the role as xxmaj hitler . xxmaj hitler was n't
2 is really great . xxmaj one to hunt down and watch . xxmaj look out for it ! xxmaj ten out of ten . xxbos xxmaj sinatra was xxup ok , but xxmaj sterling xxmaj hayden was just _ so _ shallow and seemed to be rushing thru some of his scenes like he had something more important to get to . xxmaj my wife and i both burst out laughing at some of the scenes . xxmaj for instance , in the opening scene when the the guy asks the deputy what the name of this town is and the deputy says " suddenly . " " suddenly what ? " the guy responds , and the deputy says " no , that 's the name of really great . xxmaj one to hunt down and watch . xxmaj look out for it ! xxmaj ten out of ten . xxbos xxmaj sinatra was xxup ok , but xxmaj sterling xxmaj hayden was just _ so _ shallow and seemed to be rushing thru some of his scenes like he had something more important to get to . xxmaj my wife and i both burst out laughing at some of the scenes . xxmaj for instance , in the opening scene when the the guy asks the deputy what the name of this town is and the deputy says " suddenly . " " suddenly what ? " the guy responds , and the deputy says " no , that 's the name of the
3 anyone watching this film . \n\n i think this is one of xxmaj keira xxmaj knightley 's better films , and i think she 's a brilliant actress , and was excellent for the role . xxmaj parminder xxmaj nagra was brilliant too . xxmaj sadly , i ca n't say this for xxmaj jonathan rhys - meyers , because i do n't think that he was that much of a good actor , and to be honest , his eyes were a little scary . \n\n xxmaj all in all , a brilliant film , and a brilliant story xxbos xxmaj in the very first episode of xxmaj friends , which aired 22 xxmaj sept 1994 " the xxmaj one xxmaj where xxmaj monica xxmaj gets a watching this film . \n\n i think this is one of xxmaj keira xxmaj knightley 's better films , and i think she 's a brilliant actress , and was excellent for the role . xxmaj parminder xxmaj nagra was brilliant too . xxmaj sadly , i ca n't say this for xxmaj jonathan rhys - meyers , because i do n't think that he was that much of a good actor , and to be honest , his eyes were a little scary . \n\n xxmaj all in all , a brilliant film , and a brilliant story xxbos xxmaj in the very first episode of xxmaj friends , which aired 22 xxmaj sept 1994 " the xxmaj one xxmaj where xxmaj monica xxmaj gets a xxmaj
4 tag the xxmaj national xxmaj lampoon name to a bad flick that most likely needed to recoup as much of it 's low budget as possible . i guarantee there will be a lot of xxunk hoping for boobies and hi - jinx to grab it from the shelves , only to be disappointed when nothing funny or exciting happens well into the film . i sat through it , cringing and bored , hoping for a possible " hilarious " catchphrase to use one day . xxmaj all i got was lethargic , disappointed , and an ending to a movie that sucked as bad as the main character 's blue balls . xxbos xxmaj this is truly abysmal . i just got a copy of " the xxmaj national xxmaj lampoon name to a bad flick that most likely needed to recoup as much of it 's low budget as possible . i guarantee there will be a lot of xxunk hoping for boobies and hi - jinx to grab it from the shelves , only to be disappointed when nothing funny or exciting happens well into the film . i sat through it , cringing and bored , hoping for a possible " hilarious " catchphrase to use one day . xxmaj all i got was lethargic , disappointed , and an ending to a movie that sucked as bad as the main character 's blue balls . xxbos xxmaj this is truly abysmal . i just got a copy of " disco
5 xxbos i noticed with some amusement that in the end credits , the xxmaj detroit xxup pd is thanked for their participation . xxmaj the xxmaj chief of xxmaj police even has one speaking line playing himself ( and boy , can you tell he ca n't act ) . xxmaj the reason for the amusement is that in this movie the police shoot first and ask questions later . xxmaj not the kind of xxup pr , i would think a police force would want . xxmaj other than that , this is your standard cops and robbers film dressed up for the ' 70 's with a racial angle . xxmaj alex xxmaj rocco is given a thankless role of a lifer cop that ca n't i noticed with some amusement that in the end credits , the xxmaj detroit xxup pd is thanked for their participation . xxmaj the xxmaj chief of xxmaj police even has one speaking line playing himself ( and boy , can you tell he ca n't act ) . xxmaj the reason for the amusement is that in this movie the police shoot first and ask questions later . xxmaj not the kind of xxup pr , i would think a police force would want . xxmaj other than that , this is your standard cops and robbers film dressed up for the ' 70 's with a racial angle . xxmaj alex xxmaj rocco is given a thankless role of a lifer cop that ca n't get
6 of xxunk is sharply broken as a cow gives birth to a calf with the face of a human whom screams that something horrendous is coming before falling dead like the abomination it is ( it is quite possible that the sheer hideousness of the creature is some bizarre xxmaj xxunk homage ) . \n\n xxmaj following an incredible introduction for main baddie xxmaj kato , and his henchwoman xxmaj agi ( a surprisingly attractive xxmaj chiaki xxmaj xxunk ) , by way of an apocalyptic army raising . xxmaj the story reverts to normal for a while , but it does n't take long before any and all logic goes down the drain and the young boy teams up with a group of xxmaj miyazaki rejects to xxunk is sharply broken as a cow gives birth to a calf with the face of a human whom screams that something horrendous is coming before falling dead like the abomination it is ( it is quite possible that the sheer hideousness of the creature is some bizarre xxmaj xxunk homage ) . \n\n xxmaj following an incredible introduction for main baddie xxmaj kato , and his henchwoman xxmaj agi ( a surprisingly attractive xxmaj chiaki xxmaj xxunk ) , by way of an apocalyptic army raising . xxmaj the story reverts to normal for a while , but it does n't take long before any and all logic goes down the drain and the young boy teams up with a group of xxmaj miyazaki rejects to take
7 power to self heal any sort of wound or illness . xxmaj desperate for his boy to live xxmaj richard agrees but the procedure has unwanted side effects like turning xxmaj eddie into a brain eating zombie which is just not a good thing … \n\n xxmaj executive produced , written & directed by xxmaj steve xxmaj franke xxmaj i 'll be perfectly frank myself & say xxmaj serum is awful , xxmaj serum is one of those no budget horror films which tries to rip - off other any number of other 's & ends up being slightly more fun than having you fingernails pulled out with pliers . xxmaj the script is terrible , it has the whole re - animator ( 1985 ) feel to to self heal any sort of wound or illness . xxmaj desperate for his boy to live xxmaj richard agrees but the procedure has unwanted side effects like turning xxmaj eddie into a brain eating zombie which is just not a good thing … \n\n xxmaj executive produced , written & directed by xxmaj steve xxmaj franke xxmaj i 'll be perfectly frank myself & say xxmaj serum is awful , xxmaj serum is one of those no budget horror films which tries to rip - off other any number of other 's & ends up being slightly more fun than having you fingernails pulled out with pliers . xxmaj the script is terrible , it has the whole re - animator ( 1985 ) feel to it
8 an expert in this field . xxmaj i 'd be very interested to know your opinion . " \n\n xxmaj marco : " i think it 's unwise to use movies as a guide for reality , do n't you inspector ? " \n\n xxmaj inspector xxmaj alan xxmaj santini : " depends what you mean by reality . " \n\n xxmaj being that this is a giallo , stylish murders are a must and xxmaj dario does not disappoint ( the " bullet through the door " scene is quite possibly one of the greatest deaths ever shot , if you 'll forgive the pun ) . xxmaj the black - gloved , deep - voiced , pulsating brained ( cool shots ! ) killer is cold expert in this field . xxmaj i 'd be very interested to know your opinion . " \n\n xxmaj marco : " i think it 's unwise to use movies as a guide for reality , do n't you inspector ? " \n\n xxmaj inspector xxmaj alan xxmaj santini : " depends what you mean by reality . " \n\n xxmaj being that this is a giallo , stylish murders are a must and xxmaj dario does not disappoint ( the " bullet through the door " scene is quite possibly one of the greatest deaths ever shot , if you 'll forgive the pun ) . xxmaj the black - gloved , deep - voiced , pulsating brained ( cool shots ! ) killer is cold and
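
In the batch shown above, each row holds the `text` column followed by the `text_` column. The target is the same token stream shifted one position ahead, so the model is asked to predict the next token at every position. A minimal sketch of that shift in plain Python follows; the helper name `make_lm_pair` is hypothetical and is not part of fastai.

```python
def make_lm_pair(tokens, seq_len):
    """Return (x, y) where y is x shifted one token ahead."""
    x = tokens[:seq_len]
    y = tokens[1:seq_len + 1]
    return x, y

# A few tokens taken from one of the rows above
tokens = "xxbos xxmaj sinatra was xxup ok , but".split()
x, y = make_lm_pair(tokens, seq_len=5)
print(x)  # input tokens
print(y)  # target tokens: the input shifted by one
```

Every target token `y[i]` is simply the token that follows `x[i]` in the original stream, which is why the two columns look almost identical.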

(2) Creating the lrnr object

lrnr = language_model_learner(dls,AWD_LSTM,metrics=[accuracy,Perplexity()]).to_fp16()
  • .to_fp16(): numbers are stored as torch.float16 (this saves GPU memory)
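
A rough illustration of the trade-off behind half precision. To stay dependency-free, this sketch uses Python's standard `struct` module instead of torch (an assumption made only for illustration); a torch.float16 tensor likewise uses 2 bytes per element where torch.float32 uses 4.

```python
import struct

value = 3.14159265

# 'f' is a 32-bit float, 'e' is a 16-bit (half-precision) float
size32 = struct.calcsize("<f")
size16 = struct.calcsize("<e")

# Round-trip the value through each format to see how much precision survives
as32 = struct.unpack("<f", struct.pack("<f", value))[0]
as16 = struct.unpack("<e", struct.pack("<e", value))[0]

print(size32, size16)  # bytes per number
print(as32, as16)      # the half-precision value is visibly coarser
```

Half the bytes per number means roughly half the memory for the stored values, at the cost of fewer significant digits.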

Perplexity: the degree of confusion (how uncertain the model is about the next word)

(3) Training

epoch train_loss valid_loss accuracy perplexity time
0 4.372098 4.055620 0.288855 57.720924 12:10
epoch train_loss valid_loss accuracy perplexity time
0 4.167530 3.924835 0.300119 50.644745 12:43
1 4.085549 3.857475 0.305903 47.345631 12:52
2 4.057736 3.843057 0.307237 46.667915 13:01
  • perplexity: lower is better.
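
Perplexity is the exponential of the average cross-entropy loss, and the numbers in the tables above are consistent with that: the perplexity column equals exp(valid_loss). A small sketch in plain Python follows; the function `perplexity` is written here for illustration and is not the fastai metric itself.

```python
import math

def perplexity(probs):
    """exp of the mean negative log-probability assigned to the true tokens."""
    nll = [-math.log(p) for p in probs]
    return math.exp(sum(nll) / len(nll))

# A model that always gives the true token probability 1/4 is as "confused"
# as someone choosing uniformly among 4 candidates.
print(perplexity([0.25] * 10))

# Compare exp(valid_loss) with the perplexity reported in the tables above.
for valid_loss, reported in [(4.055620, 57.720924), (3.843057, 46.667915)]:
    print(round(math.exp(valid_loss), 3), reported)
```

Read this way, a perplexity of about 47 means the model is, on average, as uncertain as if it were choosing uniformly among about 47 candidate tokens.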

(4) Prediction

lrnr.predict('I liked this movie',40) 
"i liked this movie . Where i stopped in the early 80 's end with repairs and technology and it could have been edited much younger than i had ever seen , the whole story from National Geographic was the inspiration"
lrnr.predict('I hate this movie',40) 
"i hate this movie . i consider it a great movie , but when i saw it , I 'd like to say i got just watching it and wasted my time watching this movie . There was no people laughing at"

The generated reviews differ depending on whether the movie is liked or hated.
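
Text generation of this kind proceeds one token at a time: the model proposes a distribution over the next token, one token is sampled, and it is appended to the input. The toy sketch below shows only that loop. The table `next_token_probs` is a hypothetical stand-in for the trained model, and it looks only at the last token, whereas the actual LSTM carries information about the whole preceding text in its hidden state.

```python
import random

# Hypothetical next-token distributions standing in for a trained language model
next_token_probs = {
    "movie": {".": 0.6, "was": 0.4},
    ".": {"i": 0.7, "the": 0.3},
    "i": {"liked": 0.5, "hate": 0.5},
    "liked": {"this": 1.0},
    "hate": {"this": 1.0},
    "this": {"movie": 1.0},
    "was": {"great": 0.5, "boring": 0.5},
    "great": {".": 1.0},
    "boring": {".": 1.0},
    "the": {"movie": 1.0},
}

def generate(prompt, n_words, seed=0):
    """Append n_words tokens, each sampled given the previous token."""
    rng = random.Random(seed)
    tokens = prompt.lower().split()
    for _ in range(n_words):
        dist = next_token_probs[tokens[-1]]
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("I liked this movie", 10))
print(generate("I hate this movie", 10))
```

Because each token is sampled rather than chosen deterministically, different seeds give different continuations of the same prompt.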

IMDB data: classification

(1) Creating the dls

path = untar_data(URLs.IMDB)
!ls '/home/cgb4/.fastai/data/imdb'
README  imdb.vocab  test  tmp_clas  tmp_lm  train  unsup
dls = TextDataLoaders.from_folder(path, valid='test')
text category
0 xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero pos
1 xxbos xxmaj titanic directed by xxmaj james xxmaj cameron presents a fictional love story on the historical setting of the xxmaj titanic . xxmaj the plot is simple , xxunk , or not for those who love plots that twist and turn and keep you in suspense . xxmaj the end of the movie can be figured out within minutes of the start of the film , but the love story is an interesting one , however . xxmaj kate xxmaj winslett is wonderful as xxmaj rose , an aristocratic young lady betrothed by xxmaj cal ( billy xxmaj zane ) . xxmaj early on the voyage xxmaj rose meets xxmaj jack ( leonardo dicaprio ) , a lower class artist on his way to xxmaj america after winning his ticket aboard xxmaj titanic in a poker game . xxmaj if he wants something , he goes and gets it pos
2 xxbos xxmaj warning : xxmaj does contain spoilers . \n\n xxmaj open xxmaj your xxmaj eyes \n\n xxmaj if you have not seen this film and plan on doing so , just stop reading here and take my word for it . xxmaj you have to see this film . i have seen it four times so far and i still have n't made up my mind as to what exactly happened in the film . xxmaj that is all i am going to say because if you have not seen this film , then stop reading right now . \n\n xxmaj if you are still reading then i am going to pose some questions to you and maybe if anyone has any answers you can email me and let me know what you think . \n\n i remember my xxmaj grade 11 xxmaj english teacher quite well . xxmaj pos
3 xxbos xxmaj this movie was recently released on xxup dvd in the xxup us and i finally got the chance to see this hard - to - find gem . xxmaj it even came with original theatrical previews of other xxmaj italian horror classics like " xxunk " and " beyond xxup the xxup darkness " . xxmaj unfortunately , the previews were the best thing about this movie . \n\n " zombi 3 " in a bizarre way is actually linked to the infamous xxmaj lucio xxmaj fulci " zombie " franchise which began in 1979 . xxmaj similarly compared to " zombie " , " zombi 3 " consists of a threadbare plot and a handful of extremely bad actors that keeps this ' horror ' trash barely afloat . xxmaj the gore is nearly non - existent ( unless one is frightened of people running around with neg
4 xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n\n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , steaming bowl of oatmeal . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no matter how warm and gooey xxmaj raising xxmaj victor xxmaj vargas became i was always aware that something did n't quite feel right . xxmaj victor xxmaj vargas suffers from a certain overconfidence on the director 's part . xxmaj apparently , the director thought that the ethnic backdrop of a xxmaj latino family on the lower east side , and an idyllic storyline would make the film critic proof . xxmaj he was right , but it did n't fool me . xxmaj raising xxmaj victor xxmaj vargas is neg
5 xxbos xxmaj polish film maker xxmaj walerian xxmaj borowczyk 's xxmaj la xxmaj bête ( french , 1975 , aka xxmaj the xxmaj beast ) is among the most controversial and brave films ever made and a very excellent one too . xxmaj this film tells everything that 's generally been hidden and denied about our nature and our sexual nature in particular with the symbolism and silence of its images . xxmaj the images may look wild , perverse , " sick " or exciting , but they are all in relation with the lastly mentioned . xxmaj sex , desire and death are very strong and primary things and dominate all the flesh that has a human soul inside it . xxmaj they interest and xxunk us so powerfully ( and by our nature ) that they are considered scary , unacceptable and something too wild to be pos
6 xxbos xxup anchors xxup aweigh sees two eager young sailors , xxmaj joe xxmaj brady ( gene xxmaj kelly ) and xxmaj clarence xxmaj doolittle / xxmaj brooklyn ( frank xxmaj sinatra ) , get a special four - day shore leave . xxmaj eager to get to the girls , particularly xxmaj joe 's xxmaj lola , neither xxmaj joe nor xxmaj brooklyn figure on the interruption of little xxmaj navy - mad xxmaj donald ( dean xxmaj stockwell ) and his xxmaj aunt xxmaj susie ( kathryn xxmaj grayson ) . xxmaj unexperienced in the ways of females and courting , xxmaj brooklyn quickly enlists xxmaj joe to help him win xxmaj aunt xxmaj susie over . xxmaj along the way , however , xxmaj joe finds himself falling for the gal he thinks belongs to his best friend . xxmaj how is xxmaj brooklyn going to take pos
7 xxbos xxmaj the premise of this movie has been tickling my imagination for quite some time now . xxmaj we 've all heard or read about it in some kind of con - text . xxmaj what would you do if you were all alone in the world ? xxmaj what would you do if the entire world suddenly disappeared in front of your eyes ? xxmaj in fact , the last part is actually what happens to xxmaj dave and xxmaj andrew , two room - mates living in a run - down house in the middle of a freeway system . xxmaj andrew is a nervous wreck to say the least and xxmaj dave is considered being one of the biggest losers of society . xxmaj that alone is the main reason to why these two guys get so well along , because they simply only have each pos
8 xxbos xxmaj i 've rented and watched this movie for the 1st time on xxup dvd without reading any reviews about it . xxmaj so , after 15 minutes of watching xxmaj i 've noticed that something is wrong with this movie ; it 's xxup terrible ! i mean , in the trailers it looked scary and serious ! \n\n i think that xxmaj eli xxmaj roth ( mr . xxmaj director ) thought that if all the characters in this film were stupid , the movie would be funny … ( so stupid , it 's funny … ? xxup wrong ! ) xxmaj he should watch and learn from better horror - comedies such xxunk xxmaj night " , " the xxmaj lost xxmaj boys " and " the xxmaj return xxmaj of the xxmaj living xxmaj dead " ! xxmaj those are funny ! \n\n " neg
  • X is the text, y is positive/negative
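
The labels are not stored in the review files themselves. To my understanding, `TextDataLoaders.from_folder` takes y from the name of the folder each file sits in (`pos` or `neg`). A minimal sketch of that idea with `pathlib` follows; the function `label_from_path` and the example file names are hypothetical.

```python
from pathlib import Path

def label_from_path(p):
    """Use the parent folder name ('pos' or 'neg') as the label y."""
    return Path(p).parent.name

# Hypothetical file names, laid out like the train/ and test/ folders listed above
examples = ["imdb/train/pos/0_9.txt", "imdb/test/neg/12_1.txt"]
for p in examples:
    print(p, "->", label_from_path(p))
```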

(2) Creating the lrnr object

lrnr = text_classifier_learner(dls,AWD_LSTM,metrics=accuracy).to_fp16()

(3) Training

epoch train_loss valid_loss accuracy time
0 0.459112 0.393681 0.825080 00:25
epoch train_loss valid_loss accuracy time
0 0.347508 0.283278 0.881720 00:45

(4) Prediction

lrnr.predict("this film shows incredibly bad writing and is a complete disaster") 
('neg', tensor(0), tensor([0.8853, 0.1147]))
lrnr.predict("this film shows incredible talent and is a complete triumph") 
('pos', tensor(1), tensor([7.1979e-05, 9.9993e-01]))
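
Each result above is a triple: the predicted label, its index, and the class probabilities. The label is the class with the largest probability. The sketch below reproduces that decoding step in plain Python; the class order in `vocab` is an assumption that matches the outputs shown, and `decode` is a hypothetical helper, not a fastai function.

```python
vocab = ["neg", "pos"]  # assumed class order, consistent with the outputs above

def decode(probs):
    """Return (label, index, probs) from a list of class probabilities."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[idx], idx, probs

print(decode([0.8853, 0.1147]))
print(decode([7.1979e-05, 9.9993e-01]))
```

The probabilities also show how confident the model is: about 0.89 for the negative review and nearly 1.0 for the positive one.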


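Side note 1: recurrent neural networks, text mining, time series analysis

- Recurrent neural networks can be used to analyze ordered data (the term is a bit vague, but it is the usual way to put it). The typical examples of ordered data are time series data and text data.

- So at first glance one might expect the content to resemble text mining or time series analysis, but in fact it does not.

  • Topics in text mining: how to turn words into numbers well, topic models // feels detailed and practical; engineering-flavored.
  • Topics in time series analysis: forecasting and confidence intervals, change-point research (detection/test), stationary/nonstationary time series models (ARIMA, GARCH), cointegration tests // feels rather grand; closely tied to economics.
  • Topics in recurrent neural networks (until the year before last): text generation, text classification + forecasting of time series data, turning words into numbers … involved in almost every topic of text mining and time series analysis
  • Topics in recurrent neural networks (since last year?): they have started to be involved in almost every area of deep learning (these days even image analysis is done with recurrent networks)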


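Side note 2: how deeply should one understand recurrent neural network architectures?

- By past standards (when recurrent networks were used only for text generation, text classification, time series forecasting, and the like): knowing just the plain RNN seemed enough at the undergraduate level. LSTM and GRU were master's level?

- By current standards: at the master's level, LSTM is a given, and one should also have a good grasp of concepts such as attention and transformers. (Not sure about undergraduates..)

- My view: architectures are ultimately a matter of fashion, so it is enough to follow along with one once and understand its core idea. In other words, there is no need to memorize the architecture of a particular model such as LSTM; reading the formulas and understanding them is enough. (The ability to understand the formulas is needed, because one has to be able to understand the options when writing code.)

- Wild guess: might recurrent neural networks eventually become the base of almost every deep learning method?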
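
As an example of the "core idea" level of understanding, the recurrence of a plain RNN, h_t = tanh(w_x x_t + w_h h_{t-1} + b), fits in a few lines. The sketch below uses scalars instead of matrices, and arbitrary illustrative weights rather than trained values.

```python
import math

def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Scalar vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    hs = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
    return hs

# The same values in a different order lead to a different final state
early = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.9, b=0.0)
late = rnn_forward([0.0, 0.0, 1.0], w_x=0.5, w_h=0.9, b=0.0)
print(early[-1], late[-1])
```

The hidden state depends on the order of the inputs, which is what makes this family of models suitable for ordered data such as text and time series.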

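Side note 3: fastai, pytorch lightning

- It is true that fastai is somewhat easier to use for people without a computer science background.

- pytorch lightning is harder to use than fastai (it requires a little knowledge of classes, though honestly it is not that hard), but it is closer to pure PyTorch, so it is easier to take the code apart.

- What I used to think

  • Experts: pytorch + fastai // pytorch + pytorch lightning (computer science background)
  • Non-experts: pure fastai

- What I think these days

  • Everyone: pytorch + pytorch lightning
  • Special cases: pure fastai <- if the model is already implemented, fastai really is nice.. but it cannot keep up with the pace at which new models appear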

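Side note 4: what should we do? (at the bachelor's/master's level..)

- Skill 1: code comprehension (= implementation ability = the ability to copy code)

  • Image analysis? Done it. Text data? Done it. Time series? Done it. And so on? Done all of it. Even without knowing exactly how it works, you have tried everything, and so you can do the job!!
  • Collect as much working code as you can: torch, fastai, pytorch lightning, tensorflow, keras, and so on.

- Skill 2: the ability to keep up with the latest trends (= the ability to understand papers)

  • Study, study, study… You have to study persistently, taking apart every formula and every piece of code from A to Z. (As we did with LSTM!) Of course it would be nice to have a course that teaches this step by step, but courses usually explain things loosely and rarely go into detail. (It gets tedious.)
  • If you are unable to read even one of the two, formulas or code, acquiring Skill 2 is impossible to begin with.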