Unclear behavior when adding a new column to a StructType

def add_ids(X):
    schema_new = X.schema.add("id_col", LongType(), False)
    _X = X.rdd.zipWithIndex().map(lambda l: list(l[0]) + [l[1]]).toDF(schema_new)
    cols_arranged = [_X.columns[-1]] + _X.columns[0:len(_X.columns) - 1]
    return _X.select(*cols_arranged)

>>> X.show(4)
+-----------+-------+-------------+-------------+-------+----+------------------------+---+-------+
|Pregnancies|Glucose|BloodPressure|SkinThickness|Insulin| BMI|DiabetesPedigreeFunction|Age|Outcome|
+-----------+-------+-------------+-------------+-------+----+------------------------+---+-------+
|          6|    148|           72|           35|      0|33.6|                   0.627| 50|      1|
|          1|     85|           66|           29|      0|26.6|                   0.351| 31|      0|
|          8|    183|           64|            0|      0|23.3|                   0.672| 32|      1|
|          1|     89|           66|           23|     94|28.1|                   0.167| 21|      0|
+-----------+-------+-------------+-------------+-------+----+------------------------+---+-------+
only showing top 4 rows

>>> X.columns
['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome', 'id_col']

1 answer

User

#1 · Posted 2024-09-22 16:35:17

The error is here:

schema_new = X.schema.add("id_col", LongType(), False)

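If you check the source, you will see that the add method modifies the schema in place and returns the same object, so X.schema itself gains the new field.

A simple example makes this easier to see: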

old_schema = StructType()
new_schema = old_schema.add(StructField("foo", IntegerType()))

old_schema

StructType(List(StructField(foo,IntegerType,true)))

As you can see, the schema object has been modified.

You should rebuild the schema instead of using the add method:

schema_new = StructType(X.schema.fields + [StructField("id_col", LongType(), False)])

Alternatively, you can create a deep copy of the object:

import copy

old_schema = StructType()
new_schema = copy.deepcopy(old_schema).add(StructField("foo", IntegerType()))

old_schema

StructType(List())

new_schema

StructType(List(StructField(foo,IntegerType,true)))
