<p>You can write a <code>udf</code> that truncates or zero-pads each array to a fixed length of 5:</p>
<pre><code>from pyspark.sql.types import ArrayType, IntegerType
import pyspark.sql.functions as F
# Keep at most the first 5 elements, then right-pad with zeros up to length 5.
pad_fix_length = F.udf(
    lambda arr: arr[:5] + [0] * (5 - len(arr[:5])),
    ArrayType(IntegerType())
)
df.withColumn('feature_indices', pad_fix_length(df.feature_indices)).show()
+-----------------+
|  feature_indices|
+-----------------+
|  [0, 0, 0, 0, 0]|
|[0, 1, 4, 10, 11]|
|  [0, 1, 2, 0, 0]|
|  [1, 0, 0, 0, 0]|
|  [0, 0, 0, 0, 0]|
+-----------------+
</code></pre>
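<p>To see what the lambda does without starting Spark, the same logic can be sketched as a plain Python function. This is only an illustration: the <code>length</code> and <code>fill</code> parameters and the <code>None</code> handling are assumptions added here, not part of the answer above.</p>

```python
def pad_fix_length(arr, length=5, fill=0):
    """Truncate or right-pad a list so it has exactly `length` items.

    `length`, `fill`, and the None check are illustrative assumptions;
    the UDF above hard-codes a length of 5 and a fill value of 0.
    """
    if arr is None:
        # Assumption: treat a null array as empty instead of raising.
        arr = []
    head = list(arr[:length])  # keep at most `length` items
    return head + [fill] * (length - len(head))


print(pad_fix_length([0, 1, 2]))                # [0, 1, 2, 0, 0]
print(pad_fix_length([0, 1, 4, 10, 11, 12]))    # [0, 1, 4, 10, 11]
print(pad_fix_length([]))                       # [0, 0, 0, 0, 0]
```

<p>A named function like this could probably be passed to <code>F.udf</code> in place of the lambda, with the same <code>ArrayType(IntegerType())</code> return type, which also makes the padding rule easy to unit-test on its own.</p>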