Reading and writing CSV files with PySpark
2020-01-13
_Rango_
Reading a CSV file
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlsc = SQLContext(sc)  # legacy entry point; SparkSession is preferred since Spark 2.0

# Read a tab-delimited file and name the columns explicitly
df = sqlsc.read.format('csv')\
    .option('delimiter', '\t')\
    .load('/path/to/file.csv')\
    .toDF('col1', 'col2', 'col3')
Writing a CSV file
# Write the DataFrame as CSV with a header row;
# note that the path names an output directory of part files
df.write.format('csv')\
    .option('header', 'true')\
    .save('/path/to/file1.csv')
Parameters supported by option()
- path: path to the CSV file. Glob patterns are supported.
- header: whether the first line is a header row. Default: false.
- delimiter: field delimiter. Default: ','.
- quote: quote character. Default: '"'.
- mode: parsing mode. Supported values:
  - PERMISSIVE: nulls are inserted for missing tokens and extra tokens are ignored.
  - DROPMALFORMED: drops lines that have fewer or more tokens than expected.
  - FAILFAST: aborts with a RuntimeException on encountering any malformed line.