Research - Python Data Analysis for Healthcare

1

import pandas as pd
import plotly.express as px

Using plotly.

df_lab = pd.read_csv("../../../delete/Demo Patient Experience Data_NHC.csv")
df_lab.head()

	Unnamed: 0	Entry Code	Feedback Date	Patient Age	Staff Rating	Speed Rating	Wait Time (minutes)	Test Type	Review
0	0	1	2022-10-22	61	5	3	29	Urine Test	I couldn't have asked for better service from ...
1	1	2	2022-01-12	26	3	4	171	Biopsy	I'm indifferent about the service provided by ...
2	2	3	2022-04-06	53	3	5	176	Urine Test	My experience with the staff was neither good ...
3	3	4	2022-12-11	40	4	5	94	CT Scan	I might consider returning for future tests, b...
4	4	5	2022-10-31	96	2	1	165	Biopsy	The service was poor, there are serious improv...

df_lab = df_lab.drop("Unnamed: 0", axis=1)
df_lab.head()

	Entry Code	Feedback Date	Patient Age	Staff Rating	Speed Rating	Wait Time (minutes)	Test Type	Review
0	1	2022-10-22	61	5	3	29	Urine Test	I couldn't have asked for better service from ...
1	2	2022-01-12	26	3	4	171	Biopsy	I'm indifferent about the service provided by ...
2	3	2022-04-06	53	3	5	176	Urine Test	My experience with the staff was neither good ...
3	4	2022-12-11	40	4	5	94	CT Scan	I might consider returning for future tests, b...
4	5	2022-10-31	96	2	1	165	Biopsy	The service was poor, there are serious improv...
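The `Unnamed: 0` column is the row index that was written out when the CSV was saved. As an alternative to dropping it afterwards, it can be read as the index directly. A minimal sketch, using a small made-up in-memory CSV in place of the real file (the two rows and the name `df_demo` are illustrative, not the actual data):

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real CSV; the empty first header
# field is what pandas displays as "Unnamed: 0".
csv_text = (
    ",Entry Code,Test Type\n"
    "0,1,Urine Test\n"
    "1,2,Biopsy\n"
)

# index_col=0 treats the first column as the row index instead of data.
df_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_demo.columns.tolist())
```

With the real file, the same `index_col=0` argument would be passed to the existing `pd.read_csv` call.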

df_lab.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Entry Code           20000 non-null  int64 
 1   Feedback Date        20000 non-null  object
 2   Patient Age          20000 non-null  int64 
 3   Staff Rating         20000 non-null  int64 
 4   Speed Rating         20000 non-null  int64 
 5   Wait Time (minutes)  20000 non-null  int64 
 6   Test Type            20000 non-null  object
 7   Review               20000 non-null  object
dtypes: int64(5), object(3)
memory usage: 1.2+ MB
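In the `info()` output, `Feedback Date` has dtype `object`, which means the dates are still plain strings. If date-based sorting or grouping is needed, one option is to parse the column with `pd.to_datetime`. A sketch on three date strings copied from the preview (`df_dates` is a stand-in name, not part of the notebook):

```python
import pandas as pd

# Stand-in frame with three dates taken from the head() preview.
df_dates = pd.DataFrame({"Feedback Date": ["2022-10-22", "2022-01-12", "2022-04-06"]})

# Parse the strings into datetime64 values; the format is given explicitly
# so that unexpected strings raise an error instead of being guessed at.
df_dates["Feedback Date"] = pd.to_datetime(df_dates["Feedback Date"], format="%Y-%m-%d")

# The .dt accessor now works, for example to pull out the month number.
print(df_dates["Feedback Date"].dt.month.tolist())
```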

px.bar(df_lab, x="Test Type", title="Test Type Distribution", color="Test Type")

Look at the distribution of test types with a bar plot.

However, the differences between test types are hard to judge by eye.
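When bars are too close to compare visually, a table of counts and percentages makes the differences explicit. A sketch on a small made-up sample of test types (the values are illustrative only); on the real data the same calls would be applied to `df_lab["Test Type"]`:

```python
import pandas as pd

# Illustrative sample, not the real 20,000-row column.
test_types = pd.Series(
    ["Urine Test", "Biopsy", "Urine Test", "CT Scan", "Biopsy", "Urine Test"],
    name="Test Type",
)

counts = test_types.value_counts()                      # absolute frequency per category
shares = test_types.value_counts(normalize=True) * 100  # same, as a percentage

# Both Series are indexed by category, so they align row by row.
summary = pd.DataFrame({"count": counts, "percent": shares.round(1)})
print(summary)
```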

fig = px.box(df_lab, x="Staff Rating", y="Speed Rating", title="Staff vs Speed Ratings")
fig.show()

Box plot: the speed rating does not appear to differ much across staff ratings.
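The visual impression can be checked numerically by summarizing `Speed Rating` within each `Staff Rating` group. A sketch using only the five preview rows shown earlier (far too few to draw conclusions; they only demonstrate the call, and `df_ratings` is a stand-in name):

```python
import pandas as pd

# The five (Staff Rating, Speed Rating) pairs from the head() preview.
df_ratings = pd.DataFrame(
    {"Staff Rating": [5, 3, 3, 4, 2], "Speed Rating": [3, 4, 5, 5, 1]}
)

# Count, median, and mean of Speed Rating for each Staff Rating level.
by_staff = (
    df_ratings.groupby("Staff Rating")["Speed Rating"]
    .agg(["count", "median", "mean"])
)
print(by_staff)
```

On the full data, similar medians across the groups would support the reading of the box plot.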

px.violin(df_lab, x="Wait Time (minutes)", title="Wait Time distribution", box=True, points="all")

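Violin plots are not used very often, but like a histogram they are useful for looking at a distribution.

Setting box=True draws a box plot inside the violin; the default is False.

points='all' shows every individual observation, while points='outliers' shows only the outliers.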

2

import pandas as pd
import matplotlib.pyplot as plt
from textblob import TextBlob

df_experience = pd.read_csv("../../../delete/Demo Patient Experience Data_NHC.csv")
df_experience.head()

	Unnamed: 0	Entry Code	Feedback Date	Patient Age	Staff Rating	Speed Rating	Wait Time (minutes)	Test Type	Review
0	0	1	2022-10-22	61	5	3	29	Urine Test	I couldn't have asked for better service from ...
1	1	2	2022-01-12	26	3	4	171	Biopsy	I'm indifferent about the service provided by ...
2	2	3	2022-04-06	53	3	5	176	Urine Test	My experience with the staff was neither good ...
3	3	4	2022-12-11	40	4	5	94	CT Scan	I might consider returning for future tests, b...
4	4	5	2022-10-31	96	2	1	165	Biopsy	The service was poor, there are serious improv...

df_experience['Review'][0]

"I couldn't have asked for better service from the staff, it was top-notch.. I had a fantastic experience, everything went smoothly and quickly.. The staff's expertise and dedication made my visit truly exceptional.. The staff was outstanding, I couldn't be happier with their service.. The staff was outstanding, I couldn't be happier with their service.. I couldn't have asked for better service from the staff, it was top-notch."

TextBlob(df_experience['Review'][0])

TextBlob("I couldn't have asked for better service from the staff, it was top-notch.. I had a fantastic experience, everything went smoothly and quickly.. The staff's expertise and dedication made my visit truly exceptional.. The staff was outstanding, I couldn't be happier with their service.. The staff was outstanding, I couldn't be happier with their service.. I couldn't have asked for better service from the staff, it was top-notch.")

Simply wrapping the text in TextBlob turns it into an object that is convenient to analyze with TextBlob.

TextBlob(df_experience['Review'][0]).sentiment

Sentiment(polarity=0.58, subjectivity=0.765)

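Returns the sentiment analysis result for the review text as numeric scores.

polarity: is the sentiment positive or negative? Ranges from -1.0 (negative) to 1.0 (positive).

subjectivity: is the text objective or subjective? Ranges from 0.0 (objective) to 1.0 (subjective).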

df_experience['Polarity'] = df_experience['Review'].apply(lambda x: TextBlob(x).polarity)
df_experience

	Unnamed: 0	Entry Code	Feedback Date	Patient Age	Staff Rating	Speed Rating	Wait Time (minutes)	Test Type	Review	Polarity
0	0	1	2022-10-22	61	5	3	29	Urine Test	I couldn't have asked for better service from ...	0.580000
1	1	2	2022-01-12	26	3	4	171	Biopsy	I'm indifferent about the service provided by ...	0.262121
2	2	3	2022-04-06	53	3	5	176	Urine Test	My experience with the staff was neither good ...	0.031506
3	3	4	2022-12-11	40	4	5	94	CT Scan	I might consider returning for future tests, b...	0.171875
4	4	5	2022-10-31	96	2	1	165	Biopsy	The service was poor, there are serious improv...	-0.421429
...	...	...	...	...	...	...	...	...	...	...
19995	19995	19996	2022-09-01	34	3	2	128	CT Scan	It was okay, the staff's performance was nothi...	0.173214
19996	19996	19997	2022-05-26	80	2	4	84	Blood Test	I was quite disappointed, and there is a lot o...	-0.194444
19997	19997	19998	2022-11-01	60	2	1	76	Ultrasound	I was quite disappointed, and there is a lot o...	0.125000
19998	19998	19999	2022-10-09	62	1	5	16	Biopsy	There are no redeeming qualities about the ser...	-0.434091
19999	19999	20000	2022-02-16	13	5	1	86	Ultrasound	The staff's expertise and dedication made my v...	0.604167

20000 rows × 10 columns

df_experience['Subjectivity'] = df_experience['Review'].apply(lambda x: TextBlob(x).subjectivity)
df_experience

	Unnamed: 0	Entry Code	Feedback Date	Patient Age	Staff Rating	Speed Rating	Wait Time (minutes)	Test Type	Review	Polarity	Subjectivity
0	0	1	2022-10-22	61	5	3	29	Urine Test	I couldn't have asked for better service from ...	0.580000	0.765000
1	1	2	2022-01-12	26	3	4	171	Biopsy	I'm indifferent about the service provided by ...	0.262121	0.650000
2	2	3	2022-04-06	53	3	5	176	Urine Test	My experience with the staff was neither good ...	0.031506	0.548752
3	3	4	2022-12-11	40	4	5	94	CT Scan	I might consider returning for future tests, b...	0.171875	0.446875
4	4	5	2022-10-31	96	2	1	165	Biopsy	The service was poor, there are serious improv...	-0.421429	0.650000
...	...	...	...	...	...	...	...	...	...	...	...
19995	19995	19996	2022-09-01	34	3	2	128	CT Scan	It was okay, the staff's performance was nothi...	0.173214	0.638095
19996	19996	19997	2022-05-26	80	2	4	84	Blood Test	I was quite disappointed, and there is a lot o...	-0.194444	0.483333
19997	19997	19998	2022-11-01	60	2	1	76	Ultrasound	I was quite disappointed, and there is a lot o...	0.125000	0.515000
19998	19998	19999	2022-10-09	62	1	5	16	Biopsy	There are no redeeming qualities about the ser...	-0.434091	0.740909
19999	19999	20000	2022-02-16	13	5	1	86	Ultrasound	The staff's expertise and dedication made my v...	0.604167	0.796875

20000 rows × 11 columns

Use apply to store the polarity (positive/negative) and subjectivity (objective/subjective) scores as new columns.

df_experience['Sentiment'] = df_experience['Polarity'].apply(lambda x: "Positive" if x > 0 else "Neutral" if x == 0 else "Negative")
df_experience

	Unnamed: 0	Entry Code	Feedback Date	Patient Age	Staff Rating	Speed Rating	Wait Time (minutes)	Test Type	Review	Polarity	Subjectivity	Sentiment
0	0	1	2022-10-22	61	5	3	29	Urine Test	I couldn't have asked for better service from ...	0.580000	0.765000	Positive
1	1	2	2022-01-12	26	3	4	171	Biopsy	I'm indifferent about the service provided by ...	0.262121	0.650000	Positive
2	2	3	2022-04-06	53	3	5	176	Urine Test	My experience with the staff was neither good ...	0.031506	0.548752	Positive
3	3	4	2022-12-11	40	4	5	94	CT Scan	I might consider returning for future tests, b...	0.171875	0.446875	Positive
4	4	5	2022-10-31	96	2	1	165	Biopsy	The service was poor, there are serious improv...	-0.421429	0.650000	Negative
...	...	...	...	...	...	...	...	...	...	...	...	...
19995	19995	19996	2022-09-01	34	3	2	128	CT Scan	It was okay, the staff's performance was nothi...	0.173214	0.638095	Positive
19996	19996	19997	2022-05-26	80	2	4	84	Blood Test	I was quite disappointed, and there is a lot o...	-0.194444	0.483333	Negative
19997	19997	19998	2022-11-01	60	2	1	76	Ultrasound	I was quite disappointed, and there is a lot o...	0.125000	0.515000	Positive
19998	19998	19999	2022-10-09	62	1	5	16	Biopsy	There are no redeeming qualities about the ser...	-0.434091	0.740909	Negative
19999	19999	20000	2022-02-16	13	5	1	86	Ultrasound	The staff's expertise and dedication made my v...	0.604167	0.796875	Positive

20000 rows × 12 columns

Convert the numeric score into a text label.
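Comparing exactly with 0 leaves very little room for the neutral class. For example, row 2 ("My experience with the staff was neither good ...") scores 0.031506 and is labeled Positive. One possible variant, assuming a small neutral band around zero is acceptable (the 0.05 threshold and the name `label_sentiment` are illustrative choices, not part of TextBlob):

```python
import pandas as pd

NEUTRAL_BAND = 0.05  # hypothetical tolerance, not a value defined by TextBlob


def label_sentiment(polarity: float, band: float = NEUTRAL_BAND) -> str:
    """Map a polarity score to a label, treating |polarity| <= band as neutral."""
    if polarity > band:
        return "Positive"
    if polarity < -band:
        return "Negative"
    return "Neutral"


# Polarity values copied from the first five rows of the output.
polarity = pd.Series([0.580000, 0.262121, 0.031506, 0.171875, -0.421429])
print(polarity.apply(label_sentiment).tolist())
```

The width of the band is a judgment call and would need to be checked against a sample of reviews read by hand.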

sentiment_counts = df_experience['Sentiment'].value_counts()
sentiment_counts

Positive    12740
Negative     7242
Neutral        18
Name: Sentiment, dtype: int64

sentiment_counts.plot(kind='bar')
plt.title('Sentiment Analysis')
plt.xlabel("Sentiment")
plt.ylabel("Count")
plt.show()

There are almost no neutral results: only 18 of the 20,000 reviews.