<p>另一种方法是,如果有多个字段需要修改,则使用StructField。</p>
<p>例如:</p>
<pre><code>from pyspark.sql.types import StructField,IntegerType, StructType,StringType
newDF=[StructField('CLICK_FLG',IntegerType(),True),
StructField('OPEN_FLG',IntegerType(),True),
StructField('I1_GNDR_CODE',StringType(),True),
StructField('TRW_INCOME_CD_V4',StringType(),True),
StructField('ASIAN_CD',IntegerType(),True),
StructField('I1_INDIV_HHLD_STATUS_CODE',IntegerType(),True)
]
finalStruct=StructType(fields=newDF)
df=spark.read.csv('ctor.csv',schema=finalStruct)
</code></pre>
<p>输出:<br/></p>
<p>以前</p>
<pre><code>root
|-- CLICK_FLG: string (nullable = true)
|-- OPEN_FLG: string (nullable = true)
|-- I1_GNDR_CODE: string (nullable = true)
|-- TRW_INCOME_CD_V4: string (nullable = true)
|-- ASIAN_CD: integer (nullable = true)
|-- I1_INDIV_HHLD_STATUS_CODE: string (nullable = true)
</code></pre>
<p>之后:</p>
<pre><code>root
|-- CLICK_FLG: integer (nullable = true)
|-- OPEN_FLG: integer (nullable = true)
|-- I1_GNDR_CODE: string (nullable = true)
|-- TRW_INCOME_CD_V4: string (nullable = true)
|-- ASIAN_CD: integer (nullable = true)
|-- I1_INDIV_HHLD_STATUS_CODE: integer (nullable = true)
</code></pre>
<p>这是一个稍长的转换过程,但其优点是可以完成所有必需的字段。</p>
<p>需要注意的是,如果只为必需字段分配了数据类型,那么结果数据帧将只包含那些已更改的字段。</p>