• kaggle 에서 fifa22 검색, fifa official data set 들어가서 fifa22 다운

FIFA 선수들 시각화

FIFA data

- FIFA22라는 축구게임이 있음 (굉장히 인기있음)

- 게임에 실제 선수들이 나오면서 선수들의 능력치가 세밀하게 구현되어 있음

- 이 능력치에 대한 데이터셋은 캐글에 공개되어 있음

import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd

Data

fifa22=pd.read_csv('https://raw.githubusercontent.com/guebin/2021DV/master/_notebooks/2021-10-25-FIFA22_official_data.csv')

- Overall을 기준으로 정렬하여 보자.

fifa22=fifa22.sort_values(by='Overall',ascending=False).reset_index().rename(columns={'index':'index_old'})
fifa22.head()
index_old ID Name Age Photo Nationality Flag Overall Potential Club ... SlidingTackle GKDiving GKHandling GKKicking GKPositioning GKReflexes Best Position Best Overall Rating Release Clause DefensiveAwareness
0 29 158023 L. Messi 34 https://cdn.sofifa.com/players/158/023/22_60.png Argentina https://cdn.sofifa.com/flags/ar.png 93 93 Paris Saint-Germain ... 24.0 6.0 11.0 15.0 14.0 8.0 RW 93.0 €144.3M 20.0
1 33 188545 R. Lewandowski 32 https://cdn.sofifa.com/players/188/545/22_60.png Poland https://cdn.sofifa.com/flags/pl.png 92 92 FC Bayern München ... 19.0 15.0 6.0 12.0 8.0 10.0 ST 92.0 €197.2M 35.0
2 14244 200389 J. Oblak 28 https://cdn.sofifa.com/players/200/389/22_60.png Slovenia https://cdn.sofifa.com/flags/si.png 91 93 Atlético de Madrid ... 18.0 87.0 92.0 78.0 90.0 90.0 GK 91.0 €238M 27.0
3 3 192985 K. De Bruyne 30 https://cdn.sofifa.com/players/192/985/22_60.png Belgium https://cdn.sofifa.com/flags/be.png 91 91 Manchester City ... 53.0 15.0 13.0 5.0 10.0 13.0 CM 91.0 €232.2M 68.0
4 64 190871 Neymar Jr 29 https://cdn.sofifa.com/players/190/871/22_60.png Brazil https://cdn.sofifa.com/flags/br.png 91 91 Paris Saint-Germain ... 29.0 9.0 9.0 15.0 15.0 11.0 LW 91.0 €238.7M 35.0

5 rows × 66 columns

Overall vs Potential

from plotnine import *
ggplot(data=fifa22)+geom_point(aes(x='Overall', y='Potential'))
<ggplot: (8764818255391)>

- 뭔가 Potential > Overall 인 관계가 성립하는것 같다. $\to$ Potetial2= Potential - Overall 인 변수를 새로 만들고 시각화해보자.

  • 판다스: 새로운열 추가
fifa22['Potential2'] = fifa22['Potential'] - fifa22['Overall']
ggplot(data=fifa22)+geom_point(aes(x='Overall', y='Potential2'),alpha=0.1)
<ggplot: (8764817714613)>
ggplot(data=fifa22)+geom_point(aes(x='Overall', y='Potential2'),alpha=0.1,position='jitter')
<ggplot: (8764817671816)>

- 포텐셜2가 너무 0근처인 선수들이 있다. (아마 은퇴한 선수가 아닐까?) $\to$ 제외하고 그리자.

ggplot(data=fifa22.query('Potential2>0.1'))+geom_point(aes(x='Overall', y='Potential2'),alpha=0.1,position='jitter')
<ggplot: (8764817501802)>

- 해석

  • 음의 상관관계가 있다.
  • 오버올이 클수록 포텐셜2의 분산이 작아진다. (오버올이 클수록 더 성장할 부분이 없으니까)

- Overall을 구간별로 나누자: 어느정도가 적당한 구간일까?

fifa22.Overall.describe()
count    16710.000000
mean        67.646320
std          6.457695
min         28.000000
25%         63.000000
50%         68.000000
75%         72.000000
max         93.000000
Name: Overall, dtype: float64
import matplotlib.pyplot as plt 
plt.hist(fifa22.Overall)
(array([1.000e+00, 0.000e+00, 1.800e+01, 2.670e+02, 1.813e+03, 5.079e+03,
        6.594e+03, 2.344e+03, 5.430e+02, 5.100e+01]),
 array([28. , 34.5, 41. , 47.5, 54. , 60.5, 67. , 73.5, 80. , 86.5, 93. ]),
 <BarContainer object of 10 artists>)
fifa22.Overall.hist()
<AxesSubplot:>
def f(x): 
    if x>72: y='Q1' 
    elif x>68: y='Q2'
    elif x>63: y='Q3'
    else: y='Q4'
    return y 
fifa22['Q']=list(map(f,fifa22.Overall))
fifa22[['Q','Overall']]
Q Overall
0 Q1 93
1 Q1 92
2 Q1 91
3 Q1 91
4 Q1 91
... ... ...
16705 Q4 46
16706 Q4 46
16707 Q4 44
16708 Q4 44
16709 Q4 28

16710 rows × 2 columns

ggplot(data=fifa22.query('Potential2>0.1'))\
+geom_boxplot(aes(x='Q',y='Potential2'))
<ggplot: (8764811200844)>

- Q1으로 갈수록 분산이 작아짐! $\to$ 헷갈린다...

- 산점도와 박스플랏을 겹쳐서 그린다면 좀더 이해가 쉬울것 같다.

- x축의 위치를 조정하면 될것 같다 $\to$ Q1, Q2, Q3, Q4 각 그룹별로 x축의 위치를 구하자.

fifa22.query('Q=="Q1"').Overall.mean()
76.3506528835691
np.mean(fifa22.query('Q=="Q1"').Overall)
76.3506528835691
  • 이런식으로 해도 되지만
fifa22.groupby(by='Q').mean().Overall
Q
Q1    76.350653
Q2    70.411781
Q3    66.074449
Q4    59.602691
Name: Overall, dtype: float64
l=fifa22.groupby(by='Q').mean().Overall.to_list()
l
[76.3506528835691, 70.4117807472048, 66.07444942506334, 59.60269121813031]

- 이제 박스플랏이 들어갈 x축의 위치를 저장할 컬럼을 추가하고 그 이름을 Qx 라고 하자.

def g(x): 
    if x=='Q1': y=l[0]
    elif x=='Q2': y=l[1]
    elif x=='Q3': y=l[2]
    else: y=l[3]
    return y
fifa22['Qx']=list(map(g,fifa22.Q))
fifa22
index_old ID Name Age Photo Nationality Flag Overall Potential Club ... GKKicking GKPositioning GKReflexes Best Position Best Overall Rating Release Clause DefensiveAwareness Potential2 Q Qx
0 29 158023 L. Messi 34 https://cdn.sofifa.com/players/158/023/22_60.png Argentina https://cdn.sofifa.com/flags/ar.png 93 93 Paris Saint-Germain ... 15.0 14.0 8.0 RW 93.0 €144.3M 20.0 0 Q1 76.350653
1 33 188545 R. Lewandowski 32 https://cdn.sofifa.com/players/188/545/22_60.png Poland https://cdn.sofifa.com/flags/pl.png 92 92 FC Bayern München ... 12.0 8.0 10.0 ST 92.0 €197.2M 35.0 0 Q1 76.350653
2 14244 200389 J. Oblak 28 https://cdn.sofifa.com/players/200/389/22_60.png Slovenia https://cdn.sofifa.com/flags/si.png 91 93 Atlético de Madrid ... 78.0 90.0 90.0 GK 91.0 €238M 27.0 2 Q1 76.350653
3 3 192985 K. De Bruyne 30 https://cdn.sofifa.com/players/192/985/22_60.png Belgium https://cdn.sofifa.com/flags/be.png 91 91 Manchester City ... 5.0 10.0 13.0 CM 91.0 €232.2M 68.0 0 Q1 76.350653
4 64 190871 Neymar Jr 29 https://cdn.sofifa.com/players/190/871/22_60.png Brazil https://cdn.sofifa.com/flags/br.png 91 91 Paris Saint-Germain ... 15.0 15.0 11.0 LW 91.0 €238.7M 35.0 0 Q1 76.350653
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
16705 15593 235352 18 T. Käßemodel 28 https://cdn.sofifa.com/players/235/352/18_60.png Germany https://cdn.sofifa.com/flags/de.png 46 46 FC Erzgebirge Aue ... 6.0 13.0 6.0 CM 45.0 €47K NaN 0 Q4 59.602691
16706 15685 219735 15 T. Fletcher 19 https://cdn.sofifa.com/players/219/735/15_60.png England https://cdn.sofifa.com/flags/gb-eng.png 46 52 Wycombe Wanderers ... 8.0 14.0 11.0 CB 46.0 NaN NaN 6 Q4 59.602691
16707 16572 19334 10 I. Baraclough 38 https://cdn.sofifa.com/players/019/334/10_60.png England https://cdn.sofifa.com/flags/gb-eng.png 44 65 NaN ... 46.0 20.0 20.0 CM 46.0 NaN NaN 21 Q4 59.602691
16708 15999 220806 16 E. Redman 18 https://cdn.sofifa.com/players/220/806/16_60.png Wales https://cdn.sofifa.com/flags/gb-wls.png 44 57 Newport County ... 16.0 9.0 7.0 CB 44.0 NaN NaN 13 Q4 59.602691
16709 16709 178453 07 A. Censori 17 https://cdn.sofifa.com/players/178/453/07_60.png Italy https://cdn.sofifa.com/flags/it.png 28 38 Arezzo ... 36.0 6.0 9.0 ST 36.0 NaN NaN 10 Q4 59.602691

16710 rows × 69 columns

ggplot(data=fifa22.query('Potential2>0.1'))\
+geom_point(aes(x='Overall', y='Potential2',color='Q'),alpha=0.1,size=0.1,position='jitter')\
+geom_boxplot(aes(x='Qx', y='Potential2',color='Q'))
<ggplot: (8764818247900)>

숙제

fifa22 데이터셋에서 Q==Q4이고, Potentail2>20 인 선수들의 이름을 출력하라.

fifa22.query(('Q=="Q4" and Potential2>20'))['Name']
12496            L. Barry
12540         D. Scarlett
12704             L. Bate
13035      C. Chukwuemeka
13205        20 S. Hansen
               ...       
16670         O. Cisneros
16681          F. Emmings
16682             J. Neal
16699           N. Saliba
16707    10 I. Baraclough
Name: Name, Length: 110, dtype: object