PySpark: pad an Array[Int] column with zeros

+--------------------+
|     feature_indices|
+--------------------+
|                 [0]|
|[0, 1, 4, 10, 11,...|
|           [0, 1, 2]|
|                 [1]|
|                 [0]|
+--------------------+

2 answers

User

Answer 1 · edited 2024-09-29 19:01:01

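I recently used the pad_sequences function in Keras to do something similar. I'm not sure about your use case, so this may be an unnecessarily large dependency.

Anyway, here is the link to the function's documentation: https://keras.io/preprocessing/sequence/#pad_sequences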

from keras.preprocessing.sequence import pad_sequences

input_sequence = [[1, 2, 3], [1, 2], [1, 4]]

# Pad (and, if needed, truncate) each sequence at the end to length 3, filling with 0
padded_sequence = pad_sequences(input_sequence, maxlen=3, padding='post', truncating='post', value=0.0)

print(padded_sequence)

Output:

[[1 2 3]
 [1 2 0]
 [1 4 0]]
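The padding and truncating options decide which end of each sequence gets filled or cut. A minimal pure-Python sketch of those semantics, assuming plain lists of integers, follows; pad_list is a hypothetical helper written for illustration, not Keras's implementation, and it ignores other pad_sequences options such as dtype. Like pad_sequences, it defaults to padding='pre' and truncating='pre' and pads to the longest sequence when maxlen is None.

```python
from typing import List, Optional


def pad_list(
    sequences: List[List[int]],
    maxlen: Optional[int] = None,
    padding: str = "pre",
    truncating: str = "pre",
    value: int = 0,
) -> List[List[int]]:
    """Hypothetical sketch of Keras-style pre/post padding and truncation."""
    if padding not in ("pre", "post") or truncating not in ("pre", "post"):
        raise ValueError("padding and truncating must be 'pre' or 'post'")
    if maxlen is None:
        # Default to the length of the longest sequence.
        maxlen = max((len(s) for s in sequences), default=0)
    result = []
    for seq in sequences:
        # Truncate: 'pre' drops elements from the front, 'post' from the end.
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # Pad: 'pre' puts the fill in front, 'post' appends it.
        result.append(fill + list(seq) if padding == "pre" else list(seq) + fill)
    return result


print(pad_list([[1, 2, 3], [1, 2], [1, 4]], maxlen=3, padding="post", truncating="post"))
# [[1, 2, 3], [1, 2, 0], [1, 4, 0]]
```

With padding='post' and value 0, this is the same "append zeros at the end" behavior the question asks for on the Spark column.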

User

Answer 2 · edited 2024-09-29 19:01:01

You can write a UDF to do this:

from pyspark.sql.types import ArrayType, IntegerType
import pyspark.sql.functions as F

# Keep at most 5 elements, then right-pad with zeros to length 5 (null arrays stay null)
pad_fix_length = F.udf(
    lambda arr: None if arr is None else arr[:5] + [0] * (5 - len(arr[:5])),
    ArrayType(IntegerType())
)

df.withColumn('feature_indices', pad_fix_length(df.feature_indices)).show()
+-----------------+
|  feature_indices|
+-----------------+
|  [0, 0, 0, 0, 0]|
|[0, 1, 4, 10, 11]|
|  [0, 1, 2, 0, 0]|
|  [1, 0, 0, 0, 0]|
|  [0, 0, 0, 0, 0]|
+-----------------+
