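<p>You can pass the columns as an array to a UDF (user-defined function), check whether all of the values are zero, and then apply the filter:</p>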
<pre><code>from pyspark.sql import SparkSession
from pyspark.sql.types import BooleanType
from pyspark.sql.functions import udf, array, col

spark = SparkSession.builder.getOrCreate()

# UDF returning True when every element of the array is 0
all_zeros_udf = udf(lambda arr: arr.count(0) == len(arr), BooleanType())

df = spark.createDataFrame([(0, 1, 1, 2, 1), (0, 0, 1, 0, 1), (1, 0, 1, 1, 1)], ['a', 'b', 'c', 'd', 'e'])

(df
    .withColumn('all_zeros', all_zeros_udf(array('a', 'b', 'd')))  # pass the columns as an array
    .filter(~col('all_zeros'))  # keep only rows where a, b, d are NOT all zero
    .drop('all_zeros')  # drop the helper column
    .show())
</code></pre>
<p>Result:</p>
<pre><code>+---+---+---+---+---+
|  a|  b|  c|  d|  e|
+---+---+---+---+---+
|  0|  1|  1|  2|  1|
|  1|  0|  1|  1|  1|
+---+---+---+---+---+
</code></pre>
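<p>The UDF body is ordinary Python applied to each row's array, so the filtering logic can be checked without a Spark session. A minimal sketch, assuming the same three sample rows: <code>all_zeros</code> is a hypothetical helper mirroring the lambda, and picking indices 0, 1 and 3 from each tuple stands in for <code>array('a', 'b', 'd')</code>:</p>
<pre><code># Hypothetical helper: same predicate as the UDF body (True when every element is 0)
def all_zeros(arr):
    return arr.count(0) == len(arr)

# The sample rows from the example above, as plain tuples (a, b, c, d, e)
rows = [(0, 1, 1, 2, 1), (0, 0, 1, 0, 1), (1, 0, 1, 1, 1)]

# Mimic array('a', 'b', 'd') by taking indices 0, 1, 3, then drop all-zero rows
kept = [row for row in rows if not all_zeros([row[0], row[1], row[3]])]

for row in kept:
    print(row)
# prints (0, 1, 1, 2, 1) and then (1, 0, 1, 1, 1)
</code></pre>
<p>The two surviving rows match the Spark output: only the second row, where <code>a</code>, <code>b</code> and <code>d</code> are all 0, is filtered out.</p>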