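<p>We should use aliases in the filter condition, because the two frames have columns with the same names and an unqualified column reference would be ambiguous.</p>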
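<pre><code>from pyspark.sql import functions as sf

# Alias each frame so that same-named columns can be referenced unambiguously.
input_frame = input_frame.alias('input_frame')
unique_frame = unique_frame.alias('unique_frame')

# joined_data is assumed to be the join of the two aliased frames above.
# Keep the input rows whose timestamp differs from the matching unique row.
duplicate_data = joined_data.filter(sf.col("input_frame.timestamp") != sf.col("unique_frame.timestamp")).select("input_frame.*")
duplicate_data.show()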
+----------+---------+---------+
|student_id|name     |timestamp|
+----------+---------+---------+
|        s1|testuser |       t1|
|        s2|test123  |       t1|
+----------+---------+---------+
</code></pre>