擅长:python、mysql、java
<p>您应该使用pyspark内置函数<code>date_trunc</code>来截断为<code>hour</code>。您还可以截断为天/月/年等</p>
<pre><code>from pyspark.sql import functions as F
df.withColumn("hour", F.date_trunc('hour',F.to_timestamp("timestamp","yyyy-MM-dd HH:mm:ss 'UTC'")))\
.show(truncate=False)
+ + -+ -+
|identifier |timestamp |hour |
+ + -+ -+
|86311425-0890-40a5-8950-54cbaaa60815|2020-03-18 14:41:55 UTC|2020-03-18 14:00:00|
|38e121a8-f21f-4d10-bb69-26eb045175b5|2020-03-13 15:19:21 UTC|2020-03-13 15:00:00|
|1a69c9b0-283b-4b6d-89ac-66f987280c66|2020-03-16 12:59:51 UTC|2020-03-16 12:00:00|
|c7b5c53f-bf40-498f-8302-4b3329322bc9|2020-03-18 22:05:06 UTC|2020-03-18 22:00:00|
|0d3d807b-9b3a-466e-907c-c22402240730|2020-03-17 18:40:03 UTC|2020-03-17 18:00:00|
+ + -+ -+
</code></pre>