- (1/11): Plotting with axes (1)

- (2/11): Plotting with axes (2)

- (3/11): Plotting with axes (3)

- (4/11): Plotting with axes (4)

- (5/11): Plotting with axes (5)

- (6/11): Setting the title

- (7/11): Setting axis ranges; independence and the correlation coefficient (1)

- (8/11): Independence and the correlation coefficient (2)

- (9/11): matplotlib + seaborn (1)

- (10/11): matplotlib + seaborn (2), qqplot motivation (1)

- (11/11): qqplot motivation (2)

How to draw figures with matplotlib (the really hard way)

Example 1: Plotting with axes

- Goal: draw the figure below without using plt.plot().

import matplotlib.pyplot as plt 
plt.plot([1,2,3],'or')

[<matplotlib.lines.Line2D at 0x7f80e0730f70>]

- Structure: axis $\subset$ axes $\subset$ figure

https://matplotlib.org/stable/gallery/showcase/anatomy.html#sphx-glr-gallery-showcase-anatomy-py

- Strategy: create a figure (prepare the canvas) $\to$ create an axes (make a rectangular frame) $\to$ draw on the axes (using .plot()).

- First, create a figure object.

fig = plt.figure() # prepare the canvas

<Figure size 432x288 with 0 Axes>

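fig # check the current state of the canvas

<Figure size 432x288 with 0 Axes>

Displaying the figure object shows nothing yet, because the canvas is still empty.

fig.add_axes([left, bottom, width, height]) ## adds an axes to fig
fig.axes ## lists the axes currently in fig

fig.axes # check the current frames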

[]

fig.add_axes([0,0,1,1]) # create a frame at position (0,0) on the canvas with width 1 and height 1 (fractions of the canvas)

<Axes:>

fig.axes # check the frames --> there is now one frame

[<Axes:>]

fig # check the canvas --> the (single) frame is in place

axs1=fig.axes[0] ## the first axes

axs1.plot([1,2,3],'ob') # access the first axes and draw on it

[<matplotlib.lines.Line2D at 0x7f80dfe94340>]

fig # check the canvas --> the plot has been drawn
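The steps of Example 1 can be condensed into a single cell. This is only a minimal sketch of the figure $\to$ axes $\to$ plot strategy, not the one correct way to do it. The matplotlib.use("Agg") line is an assumption made here so that the code also runs as a plain script without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()                # prepare the canvas
ax = fig.add_axes([0, 0, 1, 1])   # one frame covering the whole canvas
lines = ax.plot([1, 2, 3], 'or')  # red circles, as in the goal figure

print(len(fig.axes))              # number of frames on the canvas
```

Keeping the return value of add_axes in a variable (ax here) avoids having to look the axes up again through fig.axes[0].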

Example 2: Subplots with axes (method 1)

- Goal: subplots

fig # display the current canvas

- Add an axes

fig.add_axes([1,0,1,1])

<Axes:>

fig.axes

[<Axes:>, <Axes:>]

fig

axs2=fig.axes[1] ## the second axes

- Draw on the second axes

axs2.plot([1,2,3],'ok') ## draw on the second axes

[<matplotlib.lines.Line2D at 0x7f80dfe0cf40>]

fig ## check the canvas

- Add to the plot on the first axes

axs1.plot([1,2,3],'--b') ### add a dashed line to axes 1

[<matplotlib.lines.Line2D at 0x7f80dfdc7af0>]

fig ## check the canvas
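The four numbers passed to add_axes are [left, bottom, width, height], measured as fractions of the canvas, so [1,0,1,1] starts at the right edge of the canvas rather than inside it. The sketch below is one possible alternative that keeps both frames inside the canvas. The specific fractions (frames 0.45 wide with a 0.10 gap) are illustrative assumptions, not values from the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()
left_ax = fig.add_axes([0.00, 0, 0.45, 1])   # left frame
right_ax = fig.add_axes([0.55, 0, 0.45, 1])  # right frame
left_ax.plot([1, 2, 3], 'ob')
right_ax.plot([1, 2, 3], 'ok')

# Each frame reports its own (left, bottom, width, height).
print(left_ax.get_position().bounds)
print(right_ax.get_position().bounds)
```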

Example 3: Subplots with axes (method 2)

- The layout in Example 2 is a bit unsatisfying.

- Let's draw it again.

fig = plt.figure()

<Figure size 432x288 with 0 Axes>

fig.axes

[]

fig.subplots(1,2)

array([<AxesSubplot:>, <AxesSubplot:>], dtype=object)

fig.axes

[<AxesSubplot:>, <AxesSubplot:>]

ax1,ax2 = fig.axes

ax1.plot([1,2,3],'or')
ax2.plot([1,2,3],'ob')

[<matplotlib.lines.Line2D at 0x7f80dfd4a7f0>]

fig

The plots look a bit narrow. (Let's widen the canvas.)

fig.set_figwidth(10)

fig

ax1.plot([1,2,3],'--b')

[<matplotlib.lines.Line2D at 0x7f80dfd0da60>]

fig
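set_figwidth changes the width after the canvas already exists. As an alternative sketch, the size can also be fixed when the canvas is created, through the figsize argument of plt.figure(). The value (10, 4), in inches, is an illustrative assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 4))  # width and height in inches
ax1, ax2 = fig.subplots(1, 2)      # a 1x2 grid comes back as an array of two axes
ax1.plot([1, 2, 3], 'or')
ax2.plot([1, 2, 3], 'ob')

print(fig.get_figwidth(), fig.get_figheight())
```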

Example 4: Drawing 2$\times$2 subplots with axes

fig = plt.figure()
fig.axes

[]

<Figure size 432x288 with 0 Axes>

fig.subplots(2,2) 
fig.axes

[<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>]

ax1,ax2,ax3,ax4=fig.axes

ax1.plot([1,2,3],'ob')
ax2.plot([1,2,3],'or')
ax3.plot([1,2,3],'ok')
ax4.plot([1,2,3],'oy')

[<matplotlib.lines.Line2D at 0x7f80dfc1eac0>]

fig

Example 5: 2$\times$2 subplots with plt.subplots() (review)

x=[1,2,3,4]
y=[1,2,4,3]
_, axs = plt.subplots(2,2) 
axs[0,0].plot(x,y,'o:r') 
axs[0,1].plot(x,y,'Xb') 
axs[1,0].plot(x,y,'xm') 
axs[1,1].plot(x,y,'.--k')

[<matplotlib.lines.Line2D at 0x7f80dfae33d0>]

- When you want to run the code step by step

x=[1,2,3,4]
y=[1,2,4,3]

_, axs = plt.subplots(2,2)

axs[0,0].plot(x,y,'o:r') 
axs[0,1].plot(x,y,'Xb') 
axs[1,0].plot(x,y,'xm') 
axs[1,1].plot(x,y,'.--k')

[<matplotlib.lines.Line2D at 0x7f80df90c490>]

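Hm? How do we display the figure now?

_

Like this: _ holds the figure object, so evaluating it displays the figure.

- When drawing step by step, it is more readable to store the canvas object explicitly in a variable named fig.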

x=[1,2,3,4]
y=[1,2,4,3]

fig, axs = plt.subplots(2,2)

axs[0,0].plot(x,y,'o:r') 
axs[0,1].plot(x,y,'Xb') 
axs[1,0].plot(x,y,'xm') 
axs[1,1].plot(x,y,'.--k')

[<matplotlib.lines.Line2D at 0x7f80df7ef730>]

fig # check the canvas

Example 6: 2$\times$2 subplots with plt.subplots() -- storing each axes in its own variable

x=[1,2,3,4]
y=[1,2,4,3]
fig, axs = plt.subplots(2,2)

ax1,ax2,ax3,ax4 =axs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-50-3b8534556de7> in <module>
----> 1 ax1,ax2,ax3,ax4 =axs

ValueError: not enough values to unpack (expected 4, got 2)

Since axs was created as a 2$\times$2 array, group the names to match its rows and columns when unpacking.

(ax1,ax2), (ax3,ax4) = axs

ax1.plot(x,y,'o:r') 
ax2.plot(x,y,'Xb') 
ax3.plot(x,y,'xm') 
ax4.plot(x,y,'.--k')

[<matplotlib.lines.Line2D at 0x7f80df5c3160>]

fig
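The error in this example comes from the shape of axs: plt.subplots(2,2) returns a 2$\times$2 NumPy array, so plain unpacking yields two rows rather than four axes. The sketch below shows two ways around this, nested unpacking and flattening with ravel(). The names a, b, c, d are placeholders introduced here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)
print(axs.shape)                  # the array has two rows and two columns

(ax1, ax2), (ax3, ax4) = axs      # option 1: unpack row by row
a, b, c, d = axs.ravel()          # option 2: flatten first, then unpack

print(a is ax1, d is ax4)         # both options refer to the same axes objects
```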

Example 7: 2$\times$2 subplots with plt.subplots() -- accessing them through fig.axes!

fig, _ = plt.subplots(2,2)

fig.axes

[<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>]

Here we do not use _ (the 2$\times$2 array) directly. We take the axes from fig.axes, which is a flat list, so no grouping is needed.

ax1, ax2, ax3, ax4= fig.axes

ax1.plot(x,y,'o:r') 
ax2.plot(x,y,'Xb') 
ax3.plot(x,y,'xm') 
ax4.plot(x,y,'.--k')

[<matplotlib.lines.Line2D at 0x7f80df5c7280>]

fig

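- Compare Example 7 with Example 4: they are almost the same.

- matplotlib lets you draw plots the easy way, but also the hard way.

- Because controlling the objects directly is cumbersome, many shorthand versions exist.

For that reason, it is impractical to memorize subplot-drawing approaches as "method 1, 2, 3, ...".

- Once you understand the principle, you can use the various approaches freely. (The degree of freedom is high.)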
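To make the comparison between Example 4 and Example 7 concrete, the sketch below checks that fig.axes lists the same axes objects as the array returned by plt.subplots(), in row-by-row order. This is an observation that the check itself verifies, not a statement taken from the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

# Pair the flat list fig.axes with the flattened 2x2 array and compare identities.
same_order = all(a is b for a, b in zip(fig.axes, axs.ravel()))
print(same_order)
```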

Setting titles

Example 1: plt.plot()

x=[1,2,3]
y=[1,2,2]

plt.plot(x,y)
plt.title('title')

Text(0.5, 1.0, 'title')

Example 2: Using axes

fig = plt.figure()
fig.subplots()

<AxesSubplot:>

ax1=fig.axes[0]

ax1.set_title('title')

Text(0.5, 1.0, 'title')

fig

- If you have understood the syntax, it is easy to see how to set the title of each subplot.

Example 3: Setting a title for each subplot

fig, ax = plt.subplots(2,2)

(ax1,ax2),(ax3,ax4) =ax

ax1.set_title('title1')
ax2.set_title('title2')
ax3.set_title('title3')
ax4.set_title('title4')

Text(0.5, 1.0, 'title4')

fig

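- This looks cluttered $\to$ readjust the subplot layout

fig.tight_layout() # memorize this..

tight_layout() readjusts the subplot layout so that titles and tick labels do not overlap.

Example 4: Axes titles + Figure title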

fig.suptitle('sup title')

Text(0.5, 0.98, 'sup title')

fig

fig.tight_layout()

fig
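The title calls in Examples 3 and 4 can also be written as a loop over the axes. This is a minimal sketch that reuses the title strings from above; the variable name sup is introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

# One title per axes: title1 ... title4, in row-by-row order.
for i, ax in enumerate(axs.ravel(), start=1):
    ax.set_title(f"title{i}")

sup = fig.suptitle("sup title")   # title for the whole figure
fig.tight_layout()                # readjust spacing after adding the titles
```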

Setting axis ranges

Example 1

x=[1,2,3]
y=[4,5,6]

plt.plot(x,y,'o')

[<matplotlib.lines.Line2D at 0x7f80df1f6940>]

plt.plot(x,y,'o')
plt.xlim(-1,5)
plt.ylim(3,7)

(3.0, 7.0)

Example 2

fig = plt.figure()
fig.subplots()

<AxesSubplot:>

ax1=fig.axes[0]

import numpy as np

ax1.plot(np.random.normal(size=100),'o')

[<matplotlib.lines.Line2D at 0x7f80df15bd00>]

fig

ax1.set_xlim(-10,110)
ax1.set_ylim(-5,5)

(-5.0, 5.0)

fig
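The axes-level setters have matching getters, which makes it easy to confirm what a frame is currently showing. The sketch below repeats the example and reads the limits back. The random generator and its seed are assumptions made here for reproducibility.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)    # assumption: arbitrary seed
fig, ax = plt.subplots()
ax.plot(rng.normal(size=100), 'o')

ax.set_xlim(-10, 110)             # axes counterpart of plt.xlim
ax.set_ylim(-5, 5)                # axes counterpart of plt.ylim

print(ax.get_xlim(), ax.get_ylim())
```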

Statistics examples

- Scatter plots and sample correlation coefficients for several cases

Example 1

np.random.seed(202150754)
x1=np.linspace(-1,1,100,endpoint=True)
y1=x1**2+np.random.normal(scale=0.1,size=100)

plt.plot(x1,y1,'o')
plt.title('y=x**2')

Text(0.5, 1.0, 'y=x**2')

np.corrcoef(x1,y1)

array([[ 1.        , -0.01756063],
       [-0.01756063,  1.        ]])

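- A (sample) correlation coefficient close to 0 means that the linear relationship between the two variables is weak. It does not mean that there is no functional relationship between them.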
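The point can be checked numerically with a noise-free version of Example 1. The sketch assumes a symmetric grid of x values and drops the noise term so that the result is deterministic; both are simplifications made here.

```python
import numpy as np

x = np.linspace(-1, 1, 101)       # symmetric grid around 0
y = x**2                          # y is an exact function of x

r_linear = np.corrcoef(x, y)[0, 1]      # correlation between x and y
r_square = np.corrcoef(x**2, y)[0, 1]   # correlation between x**2 and y

print(r_linear, r_square)
```

The first coefficient is essentially 0 although y is fully determined by x, while the second is essentially 1 because y is linear in x**2.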

Example 2

- Consider the following data.

np.random.seed(202150754)
x2=np.random.uniform(low=-1,high=1,size=100000)
y2=np.random.uniform(low=-1,high=1,size=100000)

plt.plot(x2,y2,'.')
plt.title('rect')

Text(0.5, 1.0, 'rect')

np.corrcoef(x2,y2)

array([[ 1.        , -0.00332475],
       [-0.00332475,  1.        ]])

Example 3

np.random.seed(202150754)
_x3=np.random.uniform(low=-1,high=1,size=100000)
_y3=np.random.uniform(low=-1,high=1,size=100000)

plt.plot(_x3,_y3,'.')

[<matplotlib.lines.Line2D at 0x7f80dec03fd0>]

radius = _x3**2+_y3**2

x3=_x3[radius<1]
y3=_y3[radius<1]
plt.plot(_x3,_y3,'.')
plt.plot(x3,y3,'.')

[<matplotlib.lines.Line2D at 0x7f80deb78f10>]

plt.plot(x3,y3,'.')
plt.title('circ')

Text(0.5, 1.0, 'circ')

np.corrcoef(x3,y3)

array([[1.        , 0.00194254],
       [0.00194254, 1.        ]])

Homework 1

- Draw Examples 1, 2, and 3 as subplots in a single figure (arranged like a 1$\times$3 matrix).

fig,(ax1,ax2,ax3)=plt.subplots(1,3)
ax1.plot(x1,y1,'o')
ax2.plot(x2,y2,'o')
ax3.plot(x3,y3,'o')
fig.set_figwidth(10)

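Independence of two variables, explored through Examples 2 and 3

- For Examples 2 and 3, consider the following procedure.

(1) Consider the distribution of $Y$ when $X\in [-h,h]$, and draw its histogram.

(2) Consider the distribution of $Y$ when $X\in [0.9-h,0.9+h]$, and draw its histogram.

(3) Compare (1) and (2).

- Let's look at this graphically.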

h=0.05
plt.hist(y2[(x2> -h )*(x2< h )])

(array([528., 552., 514., 505., 480., 454., 517., 525., 532., 443.]),
 array([-9.99496915e-01, -7.99599650e-01, -5.99702385e-01, -3.99805120e-01,
        -1.99907855e-01, -1.05899721e-05,  1.99886675e-01,  3.99783940e-01,
         5.99681205e-01,  7.99578470e-01,  9.99475735e-01]),
 <BarContainer object of 10 artists>)

h=0.05
_,axs= plt.subplots(2,2) 
axs[0,0].hist(y2[(x2> -h )*(x2< h )])
axs[0,1].hist(y2[(x2> 0.9-h )*(x2< 0.9+h )])
axs[1,0].hist(y3[(x3> -h )*(x3< h )])
axs[1,1].hist(y3[(x3> 0.9-h )*(x3< 0.9+h )])

(array([ 87., 207., 234., 236., 277., 282., 286., 261., 230.,  81.]),
 array([-0.51033572, -0.40740268, -0.30446965, -0.20153661, -0.09860358,
         0.00432945,  0.10726249,  0.21019552,  0.31312855,  0.41606159,
         0.51899462]),
 <BarContainer object of 10 artists>)

- Let's adjust the axis ranges.

h=0.05
_,axs= plt.subplots(2,2) 
axs[0,0].hist(y2[(x2> -h )*(x2< h )])
axs[0,0].set_xlim(-1.1,1.1)
axs[0,1].hist(y2[(x2> 0.9-h )*(x2< 0.9+h )])
axs[0,1].set_xlim(-1.1,1.1)
axs[1,0].hist(y3[(x3> -h )*(x3< h )])
axs[1,0].set_xlim(-1.1,1.1)
axs[1,1].hist(y3[(x3> 0.9-h )*(x3< 0.9+h )])
axs[1,1].set_xlim(-1.1,1.1)

(-1.1, 1.1)
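The four histograms can be backed by one number per panel. The sketch below regenerates data of the same kind as Examples 2 and 3 and reports the largest |y| observed in each slice of x. The helper name max_abs_y, the generator, and the seed are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)            # assumption: arbitrary seed
x = rng.uniform(-1, 1, size=100000)       # square, as in Example 2
y = rng.uniform(-1, 1, size=100000)
inside = x**2 + y**2 < 1                  # keep points in the unit disc, as in Example 3
xc, yc = x[inside], y[inside]

def max_abs_y(x, y, center, h=0.05):
    """Largest |y| among points whose x lies in (center-h, center+h)."""
    mask = (x > center - h) & (x < center + h)
    return np.abs(y[mask]).max()

for name, (xs, ys) in {"rect": (x, y), "circ": (xc, yc)}.items():
    print(name, max_abs_y(xs, ys, 0.0), max_abs_y(xs, ys, 0.9))
```

For the square, the reach of y is about the same in both slices, which is what independence would lead us to expect. For the disc, y cannot exceed $\sqrt{1-x^2}$ in absolute value, so its reach shrinks near x = 0.9: the distribution of y changes with x even though the correlation is close to 0.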

Example 4

np.random.seed(202150754)
x4=np.random.normal(size=10000)
y4=np.random.normal(size=10000)

plt.plot(x4,y4,'o')

[<matplotlib.lines.Line2D at 0x7f80d4566190>]

plt.plot(x4,y4,'.')

[<matplotlib.lines.Line2D at 0x7f80d40932b0>]

- From a design standpoint, this is not a proper visualization. (The plot distorts the density.)

- A plot like the one below is better. (It introduces transparency to represent density.)

plt.scatter(x4,y4,alpha=0.01)

<matplotlib.collections.PathCollection at 0x7f80d47b3250>

np.corrcoef(x4,y4)

array([[1.        , 0.01337901],
       [0.01337901, 1.        ]])

h=0.05
fig, _ = plt.subplots(3,3)

fig.tight_layout()

fig

fig.set_figwidth(10)
fig.set_figheight(10)
fig

fig.axes

[<AxesSubplot:>,
 <AxesSubplot:>,
 <AxesSubplot:>,
 <AxesSubplot:>,
 <AxesSubplot:>,
 <AxesSubplot:>,
 <AxesSubplot:>,
 <AxesSubplot:>,
 <AxesSubplot:>]

k=np.linspace(-2,2,9)
k

array([-2. , -1.5, -1. , -0.5,  0. ,  0.5,  1. ,  1.5,  2. ])

h

0.05

h=0.2
for i in range(9):
    fig.axes[i].hist(y4[(x4>k[i]-h) * (x4<k[i]+h)])

fig
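The 3$\times$3 grid is easier to read when each panel states which slice of x it shows and all panels share the same x-range. The sketch below is one way to do that. The seed, the title format, and the common range (-4.5, 4.5) are assumptions made here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)            # assumption: arbitrary seed
x4 = rng.normal(size=10000)
y4 = rng.normal(size=10000)

h = 0.2
k = np.linspace(-2, 2, 9)                 # slice centers -2, -1.5, ..., 2
fig, axs = plt.subplots(3, 3, figsize=(10, 10))

for ax, center in zip(axs.ravel(), k):
    mask = (x4 > center - h) & (x4 < center + h)   # points with x near this center
    ax.hist(y4[mask])
    ax.set_title(f"x in [{center - h:.1f}, {center + h:.1f}]")
    ax.set_xlim(-4.5, 4.5)                # same range in every panel for comparison

fig.tight_layout()
```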

Homework 2

plt.scatter(x4,y4,alpha=0.01)

<matplotlib.collections.PathCollection at 0x7f80a9d271f0>

- Redraw this plot in red. (Note: the method for this was not covered in class.)

plt.scatter(x4,y4,alpha=0.01,color='red')

<matplotlib.collections.PathCollection at 0x7f80a9c7ffa0>

matplotlib + seaborn

import matplotlib.pyplot as plt 
import numpy as np 
import seaborn as sns

x=[44,48,49,58,62,68,69,70,76,79] # weight
y=[159,160,162,165,167,162,165,175,165,172] # height
g='F','F','F','F','F','M','M','M','M','M'

plt.plot(x,y,'o')

[<matplotlib.lines.Line2D at 0x7f80537913d0>]

sns.scatterplot(x=x,y=y,hue=g)

<AxesSubplot:>

- Can we draw the two plots side by side in one figure?

fig, (ax1,ax2) = plt.subplots(1,2) 
ax1.plot(x,y,'o')

[<matplotlib.lines.Line2D at 0x7f8053663c10>]

sns.scatterplot(x=x,y=y,hue=g,ax=ax2)

<AxesSubplot:>

fig

fig.set_figwidth(8)

fig

ax1.set_title('matplotlib')
ax2.set_title('seaborn')

Text(0.5, 1.0, 'seaborn')

fig

- seaborn can be used as if it were a plugin for matplotlib.

matplotlib vs seaborn

- Picking the package with the prettiest design and studying only that one is not a very good strategy.
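The plugin idea rests on the ax argument: seaborn draws onto whichever matplotlib axes it is handed. The sketch below collects the steps above into one cell. The figsize value and the variable name returned are assumptions introduced here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; remove this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

x = [44, 48, 49, 58, 62, 68, 69, 70, 76, 79]            # weight
y = [159, 160, 162, 165, 167, 162, 165, 175, 165, 172]  # height
g = ['F'] * 5 + ['M'] * 5                               # group labels

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.plot(x, y, 'o')                                     # plain matplotlib on the left
returned = sns.scatterplot(x=x, y=y, hue=g, ax=ax2)     # seaborn draws on the right axes

ax1.set_title('matplotlib')
ax2.set_title('seaborn')
fig.tight_layout()

print(returned is ax2)   # seaborn hands back the axes it drew on
```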

sns.set_theme()

plt.plot([1,2,3],[3,4,5],'or')

[<matplotlib.lines.Line2D at 0x7f80535a25e0>]

Example

- Suppose we have the following data.

np.random.seed(202150754)
x=np.random.normal(size=1000,loc=2,scale=1.5)

- How can we check whether this data follows a normal distribution?

plt.hist(x)

(array([  1.,   4.,  18., 113., 216., 294., 196., 121.,  31.,   6.]),
 array([-3.68558942, -2.64396364, -1.60233786, -0.56071208,  0.48091371,
         1.52253949,  2.56416527,  3.60579105,  4.64741684,  5.68904262,
         6.7306684 ]),
 <BarContainer object of 10 artists>)

- It is bell-shaped, so it appears to be normal.

- A density estimate curve would be nice. (Estimated with KDE) $\to$ let's draw it with seaborn.

sns.histplot(x,kde=True)

<AxesSubplot:ylabel='Count'>

- It looks bell-shaped.

- Then what about the following?

np.random.seed(202150754)
from scipy import stats

y=stats.t.rvs(10,size=1000)

sns.histplot(y,kde=True)

<AxesSubplot:ylabel='Count'>

- Bell-shaped..?

- Comparison

fig, (ax1,ax2) = plt.subplots(1,2) 
sns.histplot(x,kde=True,ax=ax1)
sns.histplot(y,kde=True,ax=ax2)

<AxesSubplot:ylabel='Count'>

xx= (x-np.mean(x)) / np.std(x,ddof=1) # ddof=1: divide by n-1 (sample standard deviation)
yy= (y-np.mean(y)) / np.std(y,ddof=1) 

fig, (ax1,ax2) = plt.subplots(1,2) 
sns.histplot(xx,kde=True,ax=ax1)
sns.histplot(yy,kde=True,ax=ax2)

<AxesSubplot:ylabel='Count'>

xx= (x-np.mean(x)) / np.std(x,ddof=1) 
yy= (y-np.mean(y)) / np.std(y,ddof=1) 

fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2,2) 
ax1.boxplot(xx) 
sns.histplot(xx,kde=True,ax=ax2)
ax3.boxplot(yy)
sns.histplot(yy,kde=True,ax=ax4)

<AxesSubplot:ylabel='Count'>

fig.tight_layout()

fig

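- Caution: the following interpretation is wrong.

The histogram of $y$ is bell-shaped. $\to$ $y$ follows a normal distribution.

- Observation: the boxplot suggests that the tails of $y$ are heavier than those of a normal distribution.

Do not assume normality just because the histogram is bell-shaped.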
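The heavier tails can be made concrete without sampling by comparing theoretical quantiles. The sketch below standardizes the t distribution with 10 degrees of freedom (its variance is $\nu/(\nu-2)$) and sets its quantiles next to those of the standard normal. The chosen probabilities are illustrative assumptions.

```python
import numpy as np
from scipy import stats

df = 10
probs = np.array([0.5, 0.9, 0.99, 0.999])

normal_q = stats.norm.ppf(probs)                        # standard normal quantiles
t_q = stats.t.ppf(probs, df) / np.sqrt(df / (df - 2))   # t(10) quantiles, scaled to variance 1

for p, nq, tq in zip(probs, normal_q, t_q):
    print(p, round(nq, 3), round(tq, 3))
```

Far out in the tail (0.99 and 0.999) the standardized t quantiles are larger than the normal ones, which is what "heavier tails" means. Comparing quantiles against quantiles in this way is the idea that a Q-Q plot turns into a picture.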

Homework 3

sns.set_theme(style="whitegrid", palette="pastel")
plt.plot([1,2,3])

[<matplotlib.lines.Line2D at 0x7f8052df48b0>]

Change the theme as above, draw a plot, and submit a screenshot.

sns.set_style("darkgrid")
sns.set_context("poster")
plt.plot([1,2,3])
sns.despine(left=True, bottom=True)