<p>I was able to solve this without creating a UDF. I followed a similar Stack Overflow post (<a href="https://stackoverflow.com/questions/41526705/pyspark-substring-and-aggregation">pyspark substring and aggregation</a>), and it worked perfectly.</p>
<pre><code>from pyspark.sql.functions import *
format = 'mmddyy'
col = unix_timestamp(df1['DATE_OPENED'], format).cast('timestamp')
df1 = df1.withColumn("DATE_OPENED", col)
df2 = df.withColumn('open_dt', df['DATE_OPENED'].substr(1, 11))
</code></pre>