Choropleth Map

imports

import folium
import pandas as pd
import json
import requests

Getting started with the Choropleth Map: drawing district boundaries

The southkorea organization on GitHub hosts a wide range of data about Korea.

local_distriction_jsonurl='https://raw.githubusercontent.com/southkorea/southkorea-maps/master/kostat/2018/json/skorea-municipalities-2018-geo.json'
global_distriction_jsonurl='https://raw.githubusercontent.com/southkorea/southkorea-maps/master/kostat/2018/json/skorea-provinces-2018-geo.json'

local_dict = json.loads(requests.get(local_distriction_jsonurl).text)
global_dict = json.loads(requests.get(global_distriction_jsonurl).text)

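local_dict holds the data split by municipality (si/gun/gu).
global_dict holds the data split by province.

The file looks quite complicated. It is loaded as a dict, and printing it shows that it contains a great deal of data.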

- Visualize the regions at the global (province) scale.

On the world map, all of Korea is shaded purple.

m = folium.Map([36,128],zoom_start=7,scrollWheelZoom=False)
folium.Choropleth(geo_data=global_dict).add_to(m)
#m

<folium.features.Choropleth at 0x7f49bb47e5b0>

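- Visualize the regions at the local (municipality) scale.

On the world map, Korea is divided into municipalities, all shaded purple.
The areas that look black are places where the boundaries changed but the data was not updated.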

m = folium.Map([36,128],zoom_start=7,scrollWheelZoom=False)
folium.Choropleth(geo_data=local_dict).add_to(m)
#m

<folium.features.Choropleth at 0x7f49bb47ef70>

Taking the JSON file apart

- Let's look at local_dict.

local_dict?

Type:        dict
String form: {'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'geometry': {'type': 'MultiPolygon <...>  'name': 'sgg', 'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}}}
Length:      4
Docstring:  
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
    (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
    in the keyword argument list.  For example:  dict(one=1, two=2)

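A dictionary with four elements.

Polygon: a closed shape given by a sequence of coordinate points. In this file each region's geometry is a MultiPolygon, i.e. one or more such polygons. (In 3D computer graphics, the same word denotes the basic building block of a model.)

- The names of the elements (the keys) can be checked as follows.

keys() is a method of dict (and of other mapping-like objects); a list, for example, does not have it.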

local_dict.keys()

dict_keys(['type', 'features', 'name', 'crs'])

type, features, name, crs

- Let's start with type, name, and crs (features is far too long to print).

local_dict['type'],local_dict['name'],local_dict['crs']

('FeatureCollection',
 'sgg',
 {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}})

- Nothing special in those three. Now let's look at features.

_features=local_dict['features']

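A list of length 250.
Each of the 250 entries appears to correspond to one district (gu, gun, or si).
Whoever prepared the data traced every coordinate one by one. It probably needs updating every year, since boundaries can change.
- Even now, some regions show up shaded black on the map.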

_features?

Type:        list
String form: [{'type': 'Feature', 'geometry': {'type': 'MultiPolygon', 'coordinates': [[[[126.97468086053324,  <...> 'properties': {'name': '서귀포시', 'base_year': '2018', 'name_eng': 'Seogwipo-si', 'code': '39020'}}]
Length:      250
Docstring:  
Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.

_features[0]

Evaluating this returns the polygon coordinates of the first district, 종로구 (Jongno-gu).

The first district is itself a dictionary with three elements.

_features[0]['type']

'Feature'

_features[0]['geometry']

This is a dictionary of length 2 (type and coordinates). Evaluating it prints the polygon coordinates.

_features[0]['properties']

{'name': '종로구', 'base_year': '2018', 'name_eng': 'Jongno-gu', 'code': '11010'}

properties holds the identifying information for the region.

_features[0] is again a dict of length 3,
with keys type, geometry, properties.
Of these, type carries little information.
geometry holds the coordinates of the MultiPolygon geometry.
properties holds information such as the name, the English name, and the code.
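The structure described above can be checked programmatically. The following is a minimal sketch on a hand-made FeatureCollection that only mimics the layout of local_dict; the coordinates are placeholders rather than real boundaries, and the helper name summarize is hypothetical.

```python
from collections import Counter

# Hypothetical, hand-made FeatureCollection with the same layout as local_dict.
# The coordinates are placeholders, not real boundaries.
sample = {
    "type": "FeatureCollection",
    "name": "sgg",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [[[[127.0, 37.5], [127.1, 37.5], [127.1, 37.6], [127.0, 37.5]]]],
            },
            "properties": {"name": "종로구", "name_eng": "Jongno-gu",
                           "code": "11010", "base_year": "2018"},
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [[[[126.5, 33.2], [126.6, 33.2], [126.6, 33.3], [126.5, 33.2]]]],
            },
            "properties": {"name": "서귀포시", "name_eng": "Seogwipo-si",
                           "code": "39020", "base_year": "2018"},
        },
    ],
}


def summarize(geo):
    """Return (number of features, geometry-type counts, sorted property keys)."""
    features = geo["features"]
    geom_types = Counter(f["geometry"]["type"] for f in features)
    prop_keys = sorted({key for f in features for key in f["properties"]})
    return len(features), dict(geom_types), prop_keys


n_features, geom_types, prop_keys = summarize(sample)
print(n_features, geom_types, prop_keys)
```

Calling summarize(local_dict) on the real data should report the 250 features seen above, assuming every feature carries the same keys.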

_features[0]['properties']['name']

'종로구'

- Find 전주시완산구, 임실군, and 남원시.

[_features[i]['properties']['name'] for i in range(250)]

Check whether 전주시완산구, 임실군, and 남원시 appear anywhere in the data.

Regions seem to be labeled gu, gun, or si depending on the size of the city.

_lst=[_features[i]['properties']['name'] for i in range(250)]
_lst[:10]

['종로구', '중구', '용산구', '성동구', '광진구', '동대문구', '중랑구', '성북구', '강북구', '도봉구']

At which positions in _lst do 전주시완산구, 임실군, and 남원시 sit?

(Preliminary) How to find the index of a given element in a list (similar to where)

Like this:

['a','b','c'].index('a')

0

_lst.index('전주시완산구'),_lst.index('임실군'),_lst.index('남원시')
# _lst.index(['전주시완산구','전주시덕진구']) does not work: index() looks up a single element, not a list of them

(165, 176, 170)

_keywords=['전주시완산구','임실군', '남원시']
[_lst.index(_keywords[i]) for i in range(3)]
# old-style approach: loop over positions

[165, 176, 170]

[_lst.index(i) for i in _keywords]
# newer style: loop over the elements directly

[165, 176, 170]

[_lst.index(keyword) for keyword in _keywords]

[165, 176, 170]

[_lst.index(keyword) for keyword in ['전주시완산구','임실군', '남원시']]

[165, 176, 170]
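When many names have to be looked up, building a name-to-position dictionary once avoids scanning the list for every keyword. This is a minimal sketch on a short, hypothetical list standing in for _lst. Names such as 중구 exist in several cities, so the real list may contain duplicates; setdefault keeps the first position, which mirrors what list.index returns.

```python
# Hypothetical short list standing in for _lst (the real one has 250 names).
names = ["종로구", "중구", "전주시완산구", "남원시", "임실군"]

# Build the lookup once: name -> first position in the list.
position = {}
for i, name in enumerate(names):
    position.setdefault(name, i)  # keep the first occurrence, like list.index

keywords = ["전주시완산구", "임실군", "남원시"]
indices = [position[k] for k in keywords]
print(indices)  # [2, 4, 3]

# Unlike list.index, a membership test reports missing names without raising.
# "없는구" is a made-up name that is not in the list.
missing = [k for k in ["전주시완산구", "없는구"] if k not in position]
print(missing)  # ['없는구']
```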

- Let's build local_dict2, which has the same format as local_dict but contains only '전주시완산구', '임실군', and '남원시'.

(Preliminary) VIEW vs COPY

dict1={'a':1,'b':2,'c':3}
dict2=dict1
dict2['b']=5
print(dict1)
print(dict2)

{'a': 1, 'b': 5, 'c': 3}
{'a': 1, 'b': 5, 'c': 3}

dict1 changed as well!
How do we keep the original from being modified?

dict1={'a':1,'b':2,'c':3}
dict2=dict1.copy()
dict2['b']=5
print(dict1)
print(dict2) # dict1 is unchanged this time

{'a': 1, 'b': 2, 'c': 3}
{'a': 1, 'b': 5, 'c': 3}

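Sharing a reference is sometimes convenient, and copying everything would use more memory, so by default Python assignment only binds a new name to the same object. Note that dict.copy() is a shallow copy: the top-level dict is new, but nested objects such as the features list are still shared with the original.

(End of preliminary)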
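The difference between a shallow copy and a fully independent copy matters for nested structures such as a GeoJSON dict. A minimal sketch using the standard copy module, on a small hypothetical dict:

```python
import copy

# Small hypothetical dict with a nested list, shaped loosely like a GeoJSON dict.
original = {"type": "FeatureCollection", "features": [{"name": "a"}, {"name": "b"}]}

shallow = original.copy()        # new outer dict, same inner list object
deep = copy.deepcopy(original)   # fully independent copy

# Rebinding a key on the shallow copy leaves the original untouched...
shallow["type"] = "changed"
# ...but mutating the shared nested list in place changes both.
shallow["features"].append({"name": "c"})

print(original["type"])           # FeatureCollection
print(len(original["features"]))  # 3
print(len(deep["features"]))      # 2
```

Assigning a brand-new list to the 'features' key of a shallow copy, as done for local_dict2, is therefore safe; editing the existing list in place would not be.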

local_dict2=local_dict.copy()

local_dict2.keys()

dict_keys(['type', 'features', 'name', 'crs'])

_=local_dict2['features']
_?

Type:        list
String form: [{'type': 'Feature', 'geometry': {'type': 'MultiPolygon', 'coordinates': [[[[126.97468086053324,  <...> 'properties': {'name': '서귀포시', 'base_year': '2018', 'name_eng': 'Seogwipo-si', 'code': '39020'}}]
Length:      250
Docstring:  
Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.

local_dict2['features']=[local_dict2['features'][165],local_dict2['features'][170],local_dict2['features'][176]]

_keywords= ['전주시완산구','임실군', '남원시']
_jj=[_lst.index(keyword) for keyword in _keywords]
_jj

[165, 176, 170]

local_dict2['features']=[local_dict['features'][j] for j in _jj]
# local_dict2['features']
# This builds a list of length 3 holding the features (with coordinates) for 전주시완산구, 임실군, and 남원시.

local_dict2['features']=[local_dict['features'][j] for j in [_lst.index(i) for i in ['전주시완산구','임실군', '남원시']]]
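The two steps can also be chained into a single line, as above.

Let's visualize only the regions for '전주시완산구', '임실군', and '남원시'.

Visualizing '전주시완산구', '임실군', '남원시'

- On a map like the one below, we want to compare some value across '전주시완산구', '임실군', and '남원시' (for example, Kakao Bike data).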
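The filtering step can be wrapped in a small helper that leaves the input untouched. This is a sketch, not part of the original notebook: the function name subset_by_name and the toy data are hypothetical, and the features are reduced to their properties for brevity.

```python
def subset_by_name(geo, names):
    """Return a new FeatureCollection-like dict keeping only the named features.

    Features come back in the order of `names`; the input dict is not modified.
    """
    by_name = {}
    for feature in geo["features"]:
        # Keep the first feature for each name, as list.index would.
        by_name.setdefault(feature["properties"]["name"], feature)
    missing = [n for n in names if n not in by_name]
    if missing:
        raise KeyError(f"names not found: {missing}")
    result = dict(geo)  # shallow copy of the top level only
    result["features"] = [by_name[n] for n in names]  # new list, original list untouched
    return result


# Toy data (hypothetical): geometry omitted because only the names matter here.
toy = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "전주시완산구"}},
        {"type": "Feature", "properties": {"name": "남원시"}},
        {"type": "Feature", "properties": {"name": "임실군"}},
        {"type": "Feature", "properties": {"name": "종로구"}},
    ],
}

small = subset_by_name(toy, ["전주시완산구", "임실군", "남원시"])
print([f["properties"]["name"] for f in small["features"]])
print(len(toy["features"]))  # the original still has all four features
```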

m = folium.Map([35.48223524067562, 127.32027909816324],zoom_start=9,scrollWheelZoom=False)
folium.Choropleth(geo_data=local_dict2).add_to(m)
#m

<folium.features.Choropleth at 0x7f49b5d54df0>
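The map center above is typed in by hand. One way to derive a reasonable center from the data itself is to take the middle of the bounding box of the selected features. The sketch below assumes standard GeoJSON coordinates, which are ordered [longitude, latitude], while folium.Map expects [latitude, longitude]. The helper names and the toy feature are hypothetical.

```python
def iter_points(coords):
    """Yield [lon, lat] pairs from arbitrarily nested GeoJSON coordinate lists."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords  # reached a single point
    else:
        for part in coords:
            yield from iter_points(part)


def bbox_center(features):
    """Return (lat, lon) of the bounding-box center of the given features."""
    lons, lats = [], []
    for feature in features:
        for lon, lat, *_ in iter_points(feature["geometry"]["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


# Toy feature (hypothetical coordinates): one square in a MultiPolygon.
toy_features = [
    {
        "type": "Feature",
        "geometry": {
            "type": "MultiPolygon",
            "coordinates": [[[[127.0, 35.0], [127.5, 35.0], [127.5, 35.5],
                              [127.0, 35.5], [127.0, 35.0]]]],
        },
        "properties": {"name": "toy"},
    }
]

center = bbox_center(toy_features)
print(center)  # (35.25, 127.25)
```

With the real data, the result of bbox_center(local_dict2['features']) could be passed as the first argument of folium.Map.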

- Practice first, explanation afterwards.

df = pd.DataFrame({'key':['전주시완산구','임실군', '남원시'],'value':[10,40,80]})
df

local_dict2['features'][0]['properties']['name'],local_dict2['features'][1]['properties']['name'],local_dict2['features'][2]['properties']['name']

('전주시완산구', '임실군', '남원시')

- How do we combine the two?

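# Note: the "Stamen Toner" tiles may be unavailable in recent folium releases;
# if the map fails to render, drop the tiles argument or try e.g. "CartoDB positron".
m = folium.Map([35.48223524067562, 127.32027909816324],tiles="Stamen Toner",zoom_start=9,scrollWheelZoom=False)
choro = folium.Choropleth(
    data=df, 
    geo_data=local_dict2,
    columns=['key','value'],
    key_on = 'feature.properties.name',
    line_color='blue',
    legend_name='Three districts in Jeollabuk-do'
)
choro.add_to(m) 
#m
# each district is now colored differently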

<folium.features.Choropleth at 0x7f49b633beb0>

ref: https://python-visualization.github.io/folium/quickstart.html

- How to use folium.Choropleth

(A bit of intuition to aid understanding)

Two datasets have to be connected: one is the data (df), the other is the dict that came from the JSON (local_dict2).
Connecting data requires a shared link (cbind: shares the row index; rbind: shares the column names; merge: both data frames have a particular column with the same name).
That link exists in both df and local_dict2: in df it was stored in the column named key, and in local_dict2 it lives at local_dict2['features'][?]['properties']['name'].

# 전주시완산구
local_dict2['features'][0]['properties']['name_eng']
# 'Jeonjusiwansangu'
# local_dict2['features'][0]['properties']['base_year']
# 2018
# local_dict2['features'][0]['properties']['code']
# 35011

'Jeonjusiwansangu'

(Usage)

choro = folium.Choropleth(
    data=df,  ## data1 
    geo_data=local_dict2, ## data2 (the polygon information is passed to the choro instance at this point) 
    columns=['key','value'], ## the columns of data1 that matter; always given as the pair [key, value]. 
        # here ['key','value'] reads as [link to data2, variable that sets the color level on the map]
    key_on = 'feature.properties.name' ## what matters in data2: the link to data1 
)

Note: the key_on parameter of folium.Choropleth (1) always starts with feature, and (2) the rest is the path under local_dict2['features'][?] that leads to the value matching data1.

# View the signature
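To make the role of key_on concrete, the sketch below follows a dotted path such as 'feature.properties.name' inside a single feature dict. It only illustrates how the path maps onto the nested dict; it is not folium's actual implementation, and resolve_key_on is a hypothetical helper.

```python
def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.name' inside one feature."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on is expected to start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]  # descend one level per path component
    return value


# Toy feature; the properties are taken from the notebook output, geometry omitted.
feature = {
    "type": "Feature",
    "properties": {"name": "종로구", "name_eng": "Jongno-gu", "code": "11010"},
}

print(resolve_key_on(feature, "feature.properties.name"))      # 종로구
print(resolve_key_on(feature, "feature.properties.name_eng"))  # Jongno-gu
print(resolve_key_on(feature, "feature.properties.code"))      # 11010
```

Whatever value the path resolves to is what has to appear in the key column of the DataFrame for that region to be matched.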

folium.Choropleth(
    geo_data,
    data=None,
    columns=None,
    key_on=None,
    bins=6,
    fill_color=None,
    nan_fill_color='black',
    fill_opacity=0.6,
    nan_fill_opacity=None,
    line_color='black',
    line_weight=1,
    line_opacity=1,
    name=None,
    legend_name='',
    overlay=True,
    control=True,
    show=True,
    topojson=None,
    smooth_factor=None,
    highlight=None,
    **kwargs,
)

df = pd.DataFrame({'key':['Jeonjusiwansangu','임실군', '남원시'],'value':[20,30,90]})
df

m = folium.Map([35.48223524067562, 127.32027909816324],zoom_start=9,scrollWheelZoom=False)
choro = folium.Choropleth(
    data=df, 
    geo_data=local_dict2,
    columns=['key','value'],
    key_on = 'feature.properties.name'
)
choro.add_to(m) 
#m
# Matching on name works for 임실군 and 남원시 but not for 완산구,
# because key_on points at the Korean name while the df key for 완산구 is the English one.

<folium.features.Choropleth at 0x7f49b62002e0>

m = folium.Map([35.48223524067562, 127.32027909816324],zoom_start=9,scrollWheelZoom=False)
choro = folium.Choropleth(
    data=df, 
    geo_data=local_dict2,
    columns=['key','value'],
    key_on = 'feature.properties.name_eng'
)
choro.add_to(m) 
#m
# Matching on name_eng works for 완산구 but not for 임실군 or 남원시.

<folium.features.Choropleth at 0x7f49b60f6f10>

- Besides the Korean and English names, code can also be used for key_on.

df = pd.DataFrame({'key':['35011', '35350', '35050'],'value':[20,50,80]})
# '35011' 완산구, '35350' 임실군, '35050' 남원시
df

m = folium.Map([35.48223524067562, 127.32027909816324],zoom_start=9,scrollWheelZoom=False)
choro = folium.Choropleth(
    data=df, 
    geo_data=local_dict2,
    columns=['key','value'],
    key_on = 'feature.properties.code'
)
choro.add_to(m) 
#m
# the result renders correctly

<folium.features.Choropleth at 0x7f49b60f6f40>
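One pitfall when matching on code: in the GeoJSON the codes are strings, so integer codes (for instance after a CSV has been parsed with default settings) will not match. The sketch below shows the mismatch and one way to normalize. The five-character width is an assumption based on the codes shown above ('11010', '35011', '39020'), and raw_codes is hypothetical. With pandas, reading the column as text, e.g. pd.read_csv(..., dtype={'key': str}), avoids the problem at the source.

```python
# Hypothetical: the same codes as above, but parsed as integers.
raw_codes = [35011, 35350, 35050]
# Codes as stored under feature.properties.code (strings).
geo_codes = {"35011", "35350", "35050"}

# Integers never compare equal to strings, so nothing matches.
matched_raw = [c for c in raw_codes if c in geo_codes]
print(matched_raw)  # []

# Normalize to zero-padded strings (assumed width: 5 characters).
normalized = [str(c).zfill(5) for c in raw_codes]
matched_norm = [c for c in normalized if c in geo_codes]
print(matched_norm)  # ['35011', '35350', '35050']
```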

Population of South Korea

Population of South Korea (global scale); sources such as KOSIS can be used

df=pd.read_csv('https://raw.githubusercontent.com/guebin/2021DV/master/_notebooks/2021-11-22-prov.csv')
df

m = folium.Map([35.84195368311022, 127.1155556693179],zoom_start=6,scrollWheelZoom=False)
choro = folium.Choropleth(
    data=df, 
    geo_data=global_dict,
    columns=['행정구역(시군구)별','총인구수 (명)'],
    key_on = 'feature.properties.name',
    line_color='red'
)
choro.add_to(m) 
#m 
# each province is separated and colored

<folium.features.Choropleth at 0x7f49b67e4df0>

df=pd.read_csv('https://raw.githubusercontent.com/guebin/2021DV/master/_notebooks/2021-11-22-muni.csv')
df

m = folium.Map([35.84195368311022, 127.1155556693179],zoom_start=7,scrollWheelZoom=False)
choro= folium.Choropleth(
    data=df, 
    geo_data= local_dict, 
    columns=['행정구역(시군구)별','총인구수 (명)'], 
    key_on='feature.properties.name',
    nan_fill_color='red',
    line_color='blue'
)
choro.add_to(m) 
#m
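# each municipality is separated and colored
# regions whose names did not match (filled red via nan_fill_color) still need handling, since those administrative districts are out of date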

<folium.features.Choropleth at 0x7f49b6236e20>
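Before plotting, it helps to list exactly which names fail to match in each direction. A minimal sketch with set differences; the region names here are placeholders, not real entries from either dataset.

```python
# Placeholder names standing in for feature.properties.name values.
geo_names = ["A구", "B시", "C군"]
# Placeholder names standing in for the DataFrame key column.
data_names = ["A구", "B시", "D시"]

# In the GeoJSON but not in the data: these polygons have no value to color.
geo_only = sorted(set(geo_names) - set(data_names))
# In the data but not in the GeoJSON: these rows have no polygon to draw on.
data_only = sorted(set(data_names) - set(geo_names))

print(geo_only)   # ['C군']
print(data_only)  # ['D시']
```

With the real objects, geo_names would come from [f['properties']['name'] for f in local_dict['features']] and data_names from the '행정구역(시군구)별' column of df.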

A taste of plotly

Example code from the official website

import plotly.express as px
from IPython.display import HTML
df = px.data.election()
geojson = px.data.election_geojson()

fig = px.choropleth_mapbox(df, geojson=geojson, color="Bergeron",
                           locations="district", featureidkey="properties.district",
                           center={"lat": 45.5517, "lon": -73.7073},
                           mapbox_style="carto-positron", zoom=9)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
HTML(fig.to_html(include_plotlyjs='cdn',include_mathjax=False, config=dict({'scrollZoom':False})))
fig.show(config=dict({'scrollZoom':False}))

Understanding the structure

- Let's take the example from the official website apart.

df.head()

# Taking the code apart

Either district under properties or id is used as the key.

- Let's look at the code again.

df = px.data.election() ### an ordinary DataFrame 
geojson = px.data.election_geojson() ### the GeoJSON data 

fig = px.choropleth_mapbox(df,  ### DataFrame 
                           geojson=geojson, ### GeoJSON data 
                           color="Bergeron", ### column in df that sets the choropleth color level 
                           locations="district", ### linking column in df 
                           featureidkey="properties.district", ### matching link inside the GeoJSON
                           center={"lat": 45.5517, "lon": -73.7073}, 
                           mapbox_style="carto-positron", 
                           zoom=9)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show(config=dict({'scrollZoom':False})) # scrollZoom is passed through the config dict

Visualizing the population of Korea with plotly

- Adapted to our case:

df=pd.read_csv('https://raw.githubusercontent.com/guebin/2021DV/master/_notebooks/2021-11-22-prov.csv')
df

fig = px.choropleth_mapbox(df,  ### DataFrame 
                           geojson=global_dict, ### GeoJSON data 
                           color="총인구수 (명)", ### column in df that sets the choropleth color level 
                           locations="행정구역(시군구)별", ### linking column in df 
                           featureidkey="properties.name", ### matching link inside the GeoJSON
                           center={"lat": 35.84195368311022, "lon": 127.1155556693179}, 
                           mapbox_style="carto-positron", 
                           zoom=5)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
HTML(fig.to_html(include_plotlyjs='cdn',include_mathjax=False, config=dict({'scrollZoom':False})))

Pandas Backend

import

import numpy as np
import pandas as pd
import warnings
from IPython.display import HTML

from pandas_datareader import data as pdr

def show(fig): 
    return HTML(fig.to_html(include_plotlyjs='cdn',include_mathjax=False, config=dict({'scrollZoom':False})))

ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html#pandas.DataFrame.plot

The kind of plot to produce:
- ‘line’ : line plot (default)
- ‘bar’ : vertical bar plot
- ‘barh’ : horizontal bar plot
- ‘hist’ : histogram
- ‘box’ : boxplot
- ‘kde’ : Kernel Density Estimation plot
- ‘density’ : same as ‘kde’
- ‘area’ : area plot
- ‘pie’ : pie plot
- ‘scatter’ : scatter plot (DataFrame only)
- ‘hexbin’ : hexbin plot (DataFrame only)

line

Example 1 (matplotlib)

??pdr.get_data_yahoo

Signature: pdr.get_data_yahoo(*args, **kwargs)
Docstring: <no docstring>
Source:   
def get_data_yahoo(*args, **kwargs):
    return YahooDailyReader(*args, **kwargs).read()
File:      ~/anaconda3/envs/csy/lib/python3.8/site-packages/pandas_datareader/data.py
Type:      function

symbols = ['AMZN','AAPL','GOOG','MSFT','NFLX','NVDA','TSLA']
start = '2020-01-01'
end = '2020-11-28'
df = pdr.get_data_yahoo(symbols,start,end)['Adj Close']
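# Markets in different countries are open at different times, which produces missing values, so all symbols are from the same country.
# The data has one column per symbol, so it is not tidy data.
# Practice reshaping it into tidy form with melt or stack.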

The data must be in wide form for DataFrame.plot.line and similar methods to draw it.

df.reset_index().head()

To turn it into tidy data:

df.stack().reset_index()
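The two reshaping routes mentioned above, melt and stack, can be compared on a tiny example. The prices and the symbols AAA and BBB below are made up for illustration; the column name price is also an arbitrary choice.

```python
import pandas as pd

# Hypothetical wide-form prices: one row per date, one column per symbol.
wide = pd.DataFrame(
    {"AAA": [1.0, 2.0], "BBB": [10.0, 20.0]},
    index=pd.Index(["2020-01-02", "2020-01-03"], name="Date"),
)

# melt: keep Date as an identifier and turn the symbol columns into rows.
long_melt = wide.reset_index().melt(id_vars="Date", var_name="Symbols", value_name="price")

# stack: the same reshaping, driven by the index instead of id_vars.
long_stack = wide.stack().rename("price").reset_index()
long_stack.columns = ["Date", "Symbols", "price"]

# pivot goes back from long form to wide form.
back = long_melt.pivot(index="Date", columns="Symbols", values="price")

print(long_melt)
print(long_stack)
print(back)
```

Both routes produce the same rows; only the row order differs (melt goes column by column, stack goes row by row).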

- Plotting a single y

plot.line

DataFrame.plot.line(x=None, y=None, **kwargs)

Plot Series or DataFrame as lines.

This function is useful to plot lines using DataFrame’s values as coordinates.

Parameters
- x: label or position, optional
  - Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.
- y: label or position, optional
  - Allows plotting of one column versus another. If not specified, all numerical columns are used.
- color: str, array-like, or dict, optional
  - The color for each of the DataFrame’s columns. Possible values are:
  - A single color string referred to by name, RGB or RGBA code, for instance ‘red’ or ‘#a98d19’.
  - A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each column recursively. For instance [‘green’,’yellow’] each column’s line will be filled in green or yellow, alternatively. If there is only a single column to be plotted, then only the first color from the color list will be used.
  - A dict of the form {column name: color}, so that each column will be colored accordingly. For example, if your columns are called a and b, then passing {'a': 'green', 'b': 'red'} will color lines for column a in green and lines for column b in red.

df.reset_index().plot.line(x='Date',y='AMZN')

<AxesSubplot:xlabel='Date'>

- Overlaying two y series

df.reset_index().plot.line(x='Date',y=['AMZN','TSLA'])

<AxesSubplot:xlabel='Date'>

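- Overlaying all y series and adjusting the figure size

If x is not specified, the index is used for the x axis; if y is not specified, all numeric columns are plotted.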

df.reset_index().plot.line(x='Date',figsize=(8,8))

<AxesSubplot:xlabel='Date'>

- Subplots, layout, font size, transparency, and removing the legend

df.reset_index().plot.line(x='Date',figsize=(10,8),subplots=True,layout=(4,2),fontsize=6,alpha=0.3, legend=False)

array([[<AxesSubplot:xlabel='Date'>, <AxesSubplot:xlabel='Date'>],
       [<AxesSubplot:xlabel='Date'>, <AxesSubplot:xlabel='Date'>],
       [<AxesSubplot:xlabel='Date'>, <AxesSubplot:xlabel='Date'>],
       [<AxesSubplot:xlabel='Date'>, <AxesSubplot:xlabel='Date'>]],
      dtype=object)

Besides plot.line, there are plot.bar and many other plot types worth looking up.

bar, barh

plot.bar

DataFrame.plot.bar(x=None, y=None, **kwargs)

kwargs is short for keyword arguments.

Vertical bar plot.

A bar plot is a plot that presents categorical data with rectangular bars with lengths proportional to the values that they represent. A bar plot shows comparisons among discrete categories. One axis of the plot shows the specific categories being compared, and the other axis represents a measured value.

Parameters
- x: label or position, optional
  - Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.
  - This is the axis on which the categories are shown. If not specified, the index of the DataFrame is used.
- y: label or position, optional
  - Allows plotting of one column versus another. If not specified, all numerical columns are used.
  - These are the values plotted against the categories. If not specified, all numeric columns of the DataFrame are plotted against the categories.
- color: str, array-like, or dict, optional
  - The color for each of the DataFrame’s columns. Possible values are:
  - A single color string referred to by name, RGB or RGBA code, for instance 'red' or '#a98d19'.
  - A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each column recursively. For instance ['green','yellow'] each column's bar will be filled in green or yellow, alternatively. If there is only a single column to be plotted, then only the first color from the color list will be used.
  - A dict of the form {column name: color}, so that each column will be colored accordingly. For example, if your columns are called a and b, then passing {'a': 'green', 'b': 'red'} will color bars for column a in green and bars for column b in red.

Example 1 (matplotlib)

df = pd.read_csv('https://raw.githubusercontent.com/kalilurrahman/datasets/main/mobilephonemktshare2020.csv')
df

df.plot.bar(x='Date',y=['Samsung','Huawei'],figsize=(10,5))

<AxesSubplot:xlabel='Date'>

df.plot.bar(x='Date',y=['Samsung','Apple'],figsize=(10,5),width=0.8)

<AxesSubplot:xlabel='Date'>

plot.barh

DataFrame.plot.barh(x=None, y=None, **kwargs)

kwargs is short for keyword arguments.

Make a horizontal bar plot.

A horizontal bar plot is a plot that presents quantitative data with rectangular bars with lengths proportional to the values that they represent. A bar plot shows comparisons among discrete categories. One axis of the plot shows the specific categories being compared, and the other axis represents a measured value.

Parameters
- x: label or position, optional
  - Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.
- y: label or position, optional
  - Allows plotting of one column versus another. If not specified, all numerical columns are used.
- color: str, array-like, or dict, optional
  - The color for each of the DataFrame’s columns. Possible values are:
  - A single color string referred to by name, RGB or RGBA code, for instance ‘red’ or ‘#a98d19’.
  - A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each column recursively. For instance [‘green’,’yellow’] each column’s bar will be filled in green or yellow, alternatively. If there is only a single column to be plotted, then only the first color from the color list will be used.
  - A dict of the form {column name: color}, so that each column will be colored accordingly. For example, if your columns are called a and b, then passing {'a': 'green', 'b': 'red'} will color bars for column a in green and bars for column b in red.

df.plot.barh(x='Date',y=['Huawei','Apple'],figsize=(5,10))

<AxesSubplot:ylabel='Date'>

This figure does not look great.

df.plot.bar(x='Date',figsize=(15,10),subplots=True,layout=(4,4),legend=False)

array([[<AxesSubplot:title={'center':'Samsung'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Apple'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Huawei'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Xiaomi'}, xlabel='Date'>],
       [<AxesSubplot:title={'center':'Oppo'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Mobicel'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Motorola'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'LG'}, xlabel='Date'>],
       [<AxesSubplot:title={'center':'Others'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Realme'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Google'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Nokia'}, xlabel='Date'>],
       [<AxesSubplot:title={'center':'Lenovo'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'OnePlus'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Sony'}, xlabel='Date'>,
        <AxesSubplot:title={'center':'Asus'}, xlabel='Date'>]],
      dtype=object)

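This data could just as well be drawn as a line plot.

- Averaging shares is questionable, but for the sake of a visualization example, let's sort the manufacturers by mean market share and plot them.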

# select 'value' explicitly so that the non-numeric Date column is not passed to mean
df.melt(id_vars='Date').groupby('variable')[['value']].mean().sort_values('value',ascending=False).\
plot.bar(legend=False)

<AxesSubplot:xlabel='variable'>
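The ranking step (melt, group, average, sort) can be checked on a small table before plotting. The manufacturers X, Y, Z and their shares below are made up for illustration.

```python
import pandas as pd

# Hypothetical market shares in wide form: one column per manufacturer.
shares = pd.DataFrame({
    "Date": ["2020-01", "2020-02"],
    "X": [30.0, 32.0],
    "Y": [25.0, 27.0],
    "Z": [40.0, 38.0],
})

# Long form -> mean share per manufacturer -> sorted from largest to smallest.
ranking = (
    shares.melt(id_vars="Date")
    .groupby("variable")["value"]
    .mean()
    .sort_values(ascending=False)
)
print(ranking)

# The same ranking can be computed directly from the wide form.
ranking_wide = shares.drop(columns="Date").mean().sort_values(ascending=False)
print(ranking_wide)
```

Calling .plot.bar() on either result draws the sorted bars; the long-form route is the one that carries over to the plotly backend examples.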

Example 1 (plotly)

Use this when the data is in long form (tidy data) rather than wide form.

The Plotly backend supports the following kinds of Pandas plots: scatter, line, area, bar, barh, hist and box, via the call pattern df.plot(kind='scatter') or df.plot.scatter().

# select 'value' explicitly so that the non-numeric Date column is not passed to mean
fig= df.melt(id_vars='Date').groupby('variable')[['value']].mean().sort_values('value',ascending=False).\
plot.barh(backend='plotly')
show(fig)

fig=df.melt(id_vars='Date').\
plot.barh(y='Date',x='value',color='variable',backend='plotly',width=800,height=500)
show(fig)

fig=df.melt(id_vars='Date').query("variable=='Samsung' or variable=='Apple' or variable=='LG'").\
plot.bar(x='Date',y='value',color='variable',backend='plotly',barmode='group')
show(fig)

fig=df.melt(id_vars='Date').query("variable=='Samsung' or variable=='Apple' or variable=='Huawei'" ).\
plot.bar(x='Date',y='value',color='variable',backend='plotly',barmode='group',text='value')
show(fig)

fig=df.melt(id_vars='Date').query("variable=='Samsung' or variable=='Apple' or variable=='Huawei'" ).\
plot.bar(x='Date',y='value',color='variable',backend='plotly',facet_col='variable')
show(fig)

fig=df.melt(id_vars='Date').query("variable=='Samsung' or variable=='Apple' or variable=='Huawei'" ).\
plot.bar(y='Date',x='value',color='variable',backend='plotly',facet_row='variable',height=700)
show(fig)

	행정구역(시군구)별	총인구수 (명)
0	서울특별시	9532428
1	부산광역시	3356311
2	대구광역시	2390721
3	인천광역시	2945009
4	광주광역시	1442454
5	대전광역시	1454228
6	울산광역시	1122566
7	세종특별자치시	368276
8	경기도	13549577
9	강원도	1537717
10	충청북도	1596948
11	충청남도	2118977
12	전라북도	1789770
13	전라남도	1834653
14	경상북도	2627925
15	경상남도	3318161
16	제주특별자치도	676569

	행정구역(시군구)별	총인구수 (명)
0	종로구	145346
1	중구	122781
2	용산구	223713
3	성동구	287174
4	광진구	340814
...	...	...
269	함양군	38475
270	거창군	61242
271	합천군	43029
272	제주시	493225
273	서귀포시	183344

	district	Coderre	Bergeron	Joly	total	winner	result	district_id
0	101-Bois-de-Liesse	2481	1829	3024	7334	Joly	plurality	101
1	102-Cap-Saint-Jacques	2525	1163	2675	6363	Joly	plurality	102
2	11-Sault-au-Récollet	3348	2770	2532	8650	Coderre	plurality	11
3	111-Mile-End	1734	4782	2514	9030	Bergeron	majority	111
4	112-DeLorimier	1770	5933	3044	10747	Bergeron	majority	112

Symbols	Date	AMZN	AAPL	GOOG	MSFT	NFLX	NVDA	TSLA
0	2019-12-31	1847.839966	72.337982	1337.020020	154.749741	323.570007	58.676849	83.666000
1	2020-01-02	1898.010010	73.988464	1367.369995	157.615112	329.809998	59.826439	86.052002
2	2020-01-03	1874.969971	73.269157	1360.660034	155.652527	325.899994	58.868862	88.601997
3	2020-01-06	1902.880005	73.852982	1394.209961	156.054840	335.829987	59.115738	90.307999
4	2020-01-07	1906.859985	73.505653	1393.339966	154.631989	330.750000	59.831429	93.811996

	Date	Samsung	Apple	Huawei	Xiaomi	Oppo	Mobicel	Motorola	LG	Others	Realme	Google	Nokia	Lenovo	OnePlus	Sony	Asus
0	2019-10	31.49	22.09	10.02	7.79	4.10	3.15	2.41	2.40	9.51	0.54	2.35	0.95	0.96	0.70	0.84	0.74
1	2019-11	31.36	22.90	10.18	8.16	4.42	3.41	2.40	2.40	9.10	0.78	0.66	0.97	0.97	0.73	0.83	0.75
2	2019-12	31.37	24.79	9.95	7.73	4.23	3.19	2.50	2.54	8.13	0.84	0.75	0.90	0.87	0.74	0.77	0.70
3	2020-01	31.29	24.76	10.61	8.10	4.25	3.02	2.42	2.40	7.55	0.88	0.69	0.88	0.86	0.79	0.80	0.69
4	2020-02	30.91	25.89	10.98	7.80	4.31	2.89	2.36	2.34	7.06	0.89	0.70	0.81	0.77	0.78	0.80	0.69
5	2020-03	30.80	27.03	10.70	7.70	4.30	2.87	2.35	2.28	6.63	0.93	0.73	0.72	0.74	0.78	0.76	0.66
6	2020-04	30.41	28.79	10.28	7.60	4.20	2.75	2.51	2.28	5.84	0.90	0.75	0.69	0.71	0.80	0.76	0.70
7	2020-05	30.18	26.72	10.39	8.36	4.70	3.12	2.46	2.19	6.31	1.04	0.70	0.73	0.77	0.81	0.78	0.76
8	2020-06	31.06	25.26	10.69	8.55	4.65	3.18	2.57	2.11	6.39	1.04	0.68	0.74	0.75	0.77	0.78	0.75
9	2020-07	30.95	24.82	10.75	8.94	4.69	3.46	2.45	2.03	6.41	1.13	0.65	0.76	0.74	0.76	0.75	0.72
10	2020-08	31.04	25.15	10.73	8.90	4.69	3.38	2.39	1.96	6.31	1.18	0.63	0.74	0.72	0.75	0.73	0.70
11	2020-09	30.57	24.98	10.58	9.49	4.94	3.50	2.27	1.88	6.12	1.45	0.63	0.74	0.67	0.81	0.69	0.67
12	2020-10	30.25	26.53	10.44	9.67	4.83	2.54	2.21	1.79	6.04	1.55	0.63	0.69	0.65	0.85	0.67	0.64

	key	value
0	전주시완산구	10
1	임실군	40
2	남원시	80

	key	value
0	Jeonjusiwansangu	20
1	임실군	30
2	남원시	90

	key	value
0	35011	20
1	35350	50
2	35050	80