<p><a href="https://stackoverflow.com/users/15452601/2e0byo">2e0byo</a>'s <a href="https://stackoverflow.com/a/69361681/9057615">answer</a> is exactly right. I am adding another approach: how to achieve this in PySpark.</p>
<p>If our conditions are SQL conditional expressions given as strings (such as <code>col_1 == 'ABC101'</code>), we can combine all of those strings into one and pass the combined string as the condition to <code>where()</code> (or <code>filter()</code>):</p>
<pre><code>df = spark.createDataFrame([(1, "a"),
                            (2, "b"),
                            (3, "c"),
                            (4, "d"),
                            (5, "e"),
                            (6, "f"),
                            (7, "g")], schema="id int, name string")
condition1 = "id == 1"
condition2 = "id == 4"
condition3 = "id == 6"
conditions = [condition1, condition2, condition3]
combined_or_condition = " or ".join(conditions) # Combine the conditions: condition1 or condition2 or condition3
df.where(combined_or_condition).show()
</code></pre>
<p><code>" or ".join(conditions)</code> builds one string by joining all the strings in <code>conditions</code>, using <code>or</code> as the separator. Here, <code>combined_or_condition</code> becomes <code>id == 1 or id == 4 or id == 6</code>.</p>
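<p>As a side note, the same "combine conditions with OR" idea can be sketched in plain Python without Spark, which may help clarify what the joined string expresses. This is only an illustrative analogy (the <code>rows</code> list and <code>predicates</code> are stand-ins, not part of the PySpark example above):</p>

```python
# Plain-Python sketch: rows mirror the DataFrame, predicates mirror the conditions.
rows = [{"id": i, "name": n} for i, n in zip(range(1, 8), "abcdefg")]

# Each predicate corresponds to one string condition from the PySpark example.
predicates = [
    lambda row: row["id"] == 1,
    lambda row: row["id"] == 4,
    lambda row: row["id"] == 6,
]

# any() applies the predicates with OR semantics, just like the joined string.
matched = [row["id"] for row in rows if any(p(row) for p in predicates)]
print(matched)  # [1, 4, 6]
```

<p>PySpark's <code>where()</code> does essentially the same filtering, except the combined condition is parsed and evaluated by Spark's SQL engine rather than by Python.</p>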