PySpark

Author

SEOYEON CHOI

Published

November 9, 2025

PySpark vs Pandas

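| Aspect | Pandas | PySpark (Spark DataFrame) |
|---|---|---|
| Primary purpose | Small-scale data analysis | Distributed processing of large data (several GB to TB) |
| Where data is processed | In memory (RAM) on a single machine | Distributed across multiple servers (a cluster) |
| Speed | Single-CPU based (fast on small data) | Parallel processing (efficient on large data) |
| Data size limit | Only what fits in memory | Nearly unlimited (bounded by disk and cluster capacity) |
| API style | Pythonic (intuitive) | SQL style plus method chaining |
| Typical uses | EDA, statistical analysis, ML preprocessing | Big-data analysis, log processing, ETL pipelines |
| Representative functions | df.groupby(), df.apply() | df.groupBy(), df.selectExpr() |
| Related libraries | NumPy, scikit-learn | Hadoop, Hive, Spark MLlib |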
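  • pandas
    • Data with thousands to tens of thousands of rows can be computed directly in RAM
    • Once the data no longer fits in available RAM (for example, 10 GB or more on a typical machine), a “MemoryError” can occur ⚠️
  • pyspark
    • Whether the data is 100 GB or 1 TB, Spark splits it across multiple servers (nodes) and processes it in parallel
    • It can also run on a single machine in local mode, which mimics a small cluster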
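
The split-then-combine idea behind the PySpark bullets above can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark's actual implementation: it splits a list into partitions, computes a partial (sum, count) per partition in a thread pool, and combines the partials into a mean. The function name `partitioned_mean` and the partition count are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partitioned_mean(values, num_partitions=4):
    """Mean computed from per-partition partial results (illustrative only)."""
    if not values:
        raise ValueError("values must not be empty")
    # Split the data into roughly equal, contiguous partitions.
    size = -(-len(values) // num_partitions)  # ceiling division
    partitions = [values[i:i + size] for i in range(0, len(values), size)]

    # "Map" step: each worker sees only its own partition.
    def partial(chunk):
        return sum(chunk), len(chunk)

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial, partitions))

    # "Reduce" step: combine the small partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(partitioned_mean([1, 2, 3, 4, 5, 6], num_partitions=3))  # 3.5
```

Spark follows a similar pattern across executors, which is why no single machine has to hold the whole dataset for an aggregation like this.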

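A pandas DataFrame can be converted to a Spark DataFrame, and the reverse is also possible. Note that toPandas() collects every row into the driver's memory, so it is only safe for data that fits in RAM.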

pandas_df = df.toPandas()   # Spark → Pandas
spark_df = spark.createDataFrame(pandas_df)   # Pandas → Spark
  • PySpark runs on a Java-based engine, so OpenJDK must be installed first
!apt-get install -y openjdk-11-jdk-headless -qq
!pip install -q pyspark
from pyspark.sql import SparkSession

Set the environment variables

import os

os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
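
Before starting a session it can help to confirm that a `java` executable is actually reachable, since a missing or wrong JAVA_HOME is a common reason Spark fails to start. The sketch below uses only the standard library; the helper name `find_java` is hypothetical and not part of PySpark. It first looks under JAVA_HOME/bin and then falls back to a PATH lookup.

```python
import os
import shutil


def find_java(env=None):
    """Return the path of a usable `java` executable, or None if not found."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        # Prefer the JVM that JAVA_HOME points at.
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Otherwise fall back to whatever `java` is on PATH (None if absent).
    return shutil.which("java", path=env.get("PATH", ""))


java_path = find_java()
print("java:", java_path if java_path else "not found (check JAVA_HOME / PATH)")
```

If this prints "not found", the JAVA_HOME path set above probably does not match where the JDK was installed on that machine.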

Create the Spark session

spark = SparkSession.builder \
    .appName("Colab EHR Demo") \
    .getOrCreate()

spark

SparkSession - in-memory
SparkContext
  Version: v3.5.1
  Master: local[*]
  AppName: Colab EHR Demo
import glob

glob.glob("/content/sample_data/*")
['/content/sample_data/anscombe.json',
 '/content/sample_data/README.md',
 '/content/sample_data/mnist_test.csv',
 '/content/sample_data/california_housing_test.csv',
 '/content/sample_data/california_housing_train.csv',
 '/content/sample_data/mnist_train_small.csv']
path = "/content/sample_data/california_housing_train.csv"
df = spark.read.csv(path, header=True, inferSchema=True)
df.show(5)
+---------+--------+------------------+-----------+--------------+----------+----------+-------------+------------------+
|longitude|latitude|housing_median_age|total_rooms|total_bedrooms|population|households|median_income|median_house_value|
+---------+--------+------------------+-----------+--------------+----------+----------+-------------+------------------+
|  -114.31|   34.19|              15.0|     5612.0|        1283.0|    1015.0|     472.0|       1.4936|           66900.0|
|  -114.47|    34.4|              19.0|     7650.0|        1901.0|    1129.0|     463.0|         1.82|           80100.0|
|  -114.56|   33.69|              17.0|      720.0|         174.0|     333.0|     117.0|       1.6509|           85700.0|
|  -114.57|   33.64|              14.0|     1501.0|         337.0|     515.0|     226.0|       3.1917|           73400.0|
|  -114.57|   33.57|              20.0|     1454.0|         326.0|     624.0|     262.0|        1.925|           65500.0|
+---------+--------+------------------+-----------+--------------+----------+----------+-------------+------------------+
only showing top 5 rows
df.printSchema()
root
 |-- longitude: double (nullable = true)
 |-- latitude: double (nullable = true)
 |-- housing_median_age: double (nullable = true)
 |-- total_rooms: double (nullable = true)
 |-- total_bedrooms: double (nullable = true)
 |-- population: double (nullable = true)
 |-- households: double (nullable = true)
 |-- median_income: double (nullable = true)
 |-- median_house_value: double (nullable = true)
df.columns
['longitude',
 'latitude',
 'housing_median_age',
 'total_rooms',
 'total_bedrooms',
 'population',
 'households',
 'median_income',
 'median_house_value']
print("Rows:", df.count())
print("Cols:", len(df.columns))
Rows: 17000
Cols: 9
df.describe().show()
+-------+-------------------+------------------+------------------+-----------------+-----------------+------------------+-----------------+------------------+------------------+
|summary|          longitude|          latitude|housing_median_age|      total_rooms|   total_bedrooms|        population|       households|     median_income|median_house_value|
+-------+-------------------+------------------+------------------+-----------------+-----------------+------------------+-----------------+------------------+------------------+
|  count|              17000|             17000|             17000|            17000|            17000|             17000|            17000|             17000|             17000|
|   mean|-119.56210823529375|  35.6252247058827| 28.58935294117647|2643.664411764706|539.4108235294118|1429.5739411764705|501.2219411764706| 3.883578100000021|207300.91235294117|
| stddev| 2.0051664084260357|2.1373397946570867|12.586936981660406|2179.947071452777|421.4994515798648| 1147.852959159527|384.5208408559016|1.9081565183791036|115983.76438720895|
|    min|            -124.35|             32.54|               1.0|              2.0|              1.0|               3.0|              1.0|            0.4999|           14999.0|
|    max|            -114.31|             41.95|              52.0|          37937.0|           6445.0|           35682.0|           6082.0|           15.0001|          500001.0|
+-------+-------------------+------------------+------------------+-----------------+-----------------+------------------+-----------------+------------------+------------------+
df.select("median_income", "median_house_value").show(5)
+-------------+------------------+
|median_income|median_house_value|
+-------------+------------------+
|       1.4936|           66900.0|
|         1.82|           80100.0|
|       1.6509|           85700.0|
|       3.1917|           73400.0|
|        1.925|           65500.0|
+-------------+------------------+
only showing top 5 rows
df.selectExpr("avg(median_house_value) as avg_house_value").show()
+------------------+
|   avg_house_value|
+------------------+
|207300.91235294117|
+------------------+
df.groupBy("median_income").avg("median_house_value") \
  .orderBy("avg(median_house_value)", ascending=False) \
  .show(10)
+-------------+-----------------------+
|median_income|avg(median_house_value)|
+-------------+-----------------------+
|      11.2866|               500001.0|
|      14.9009|               500001.0|
|       0.7025|               500001.0|
|       7.8647|               500001.0|
|      10.7582|               500001.0|
|       7.1669|               500001.0|
|       5.0222|               500001.0|
|      12.3804|               500001.0|
|       7.8521|               500001.0|
|       4.8482|               500001.0|
+-------------+-----------------------+
only showing top 10 rows
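
One caveat about the result above: median_income is a continuous value, so most groups contain only one or a few rows, and the top of the ranking is filled with groups sitting at the dataset maximum of 500001.0 (see the describe() output). Grouping on a binned income is usually easier to interpret. The plain-Python sketch below shows the idea on the five rows printed by df.show(5); the helper name `average_by_income_bin` and the choice of integer bins are assumptions for illustration. In PySpark the same idea could be expressed by grouping on `floor("median_income")` from `pyspark.sql.functions`.

```python
import math
from collections import defaultdict

# (median_income, median_house_value) pairs copied from the df.show(5) output.
rows = [
    (1.4936, 66900.0),
    (1.82, 80100.0),
    (1.6509, 85700.0),
    (3.1917, 73400.0),
    (1.925, 65500.0),
]


def average_by_income_bin(rows):
    """Average house value per integer income bin (floor of median_income)."""
    grouped = defaultdict(list)
    for income, value in rows:
        grouped[math.floor(income)].append(value)
    return {b: sum(v) / len(v) for b, v in sorted(grouped.items())}


print(average_by_income_bin(rows))  # {1: 74550.0, 3: 73400.0}
```

Four of the five rows fall into bin 1, so its average summarizes several districts instead of echoing a single row.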