Python: writing a PySpark DataFrame to JSON without the header

SomeJson
=================
[{ "Number": "1234", "Color": "blue", "size": "Medium" }, { "Number": "2222", "Color": "red", "size": "Small" } ]

{ "SomeJson": [{ "Number": "1234", "Color": "blue", "size": "Medium" }, { "Number": "2222", "Color": "red", "size": "Small" } ] }

1 answer

User

#1 · Posted 2024-10-02 02:23:58

Based on this answer: Convert pyspark dataframe into list of python dictionaries

You can do it like this:

df0.rdd.map(lambda x: [ele.asDict() for ele in x["SomeJson"]]).saveAsTextFile("data/output.json")

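It produces output like the following (this is the Python repr of a list of dicts, so the strings use single quotes rather than JSON's double quotes):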

[{'Color': 'blue', 'Number': '1234', 'size': 'Medium'}, {'Color': 'red', 'Number': '2222', 'size': 'Small'}]
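If a strict JSON parser has to read the result, one option is to serialize each row with the standard-library `json.dumps` instead of relying on the list's repr. The following is a minimal sketch, not part of the original answer: `row_to_json` is a hypothetical helper name, and plain dicts stand in for the Spark rows so the snippet runs without a cluster.

```python
import json

def row_to_json(structs):
    """Serialize an iterable of dict-like structs as one JSON array string."""
    return json.dumps([dict(s) for s in structs])

# Plain dicts stand in for the Row objects found in x["SomeJson"] (assumption).
sample = [
    {"Color": "blue", "Number": "1234", "size": "Medium"},
    {"Color": "red", "Number": "2222", "size": "Small"},
]

line = row_to_json(sample)
print(line)

# With Spark, the same helper could be applied per row, for example:
# rows = df0.rdd.map(lambda x: row_to_json(e.asDict() for e in x["SomeJson"]))
# rows.saveAsTextFile("data/output.json")
```

Note that `saveAsTextFile` writes a directory named `data/output.json` containing part files, one line per row, rather than a single file.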

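Edit:

Spark does not preserve the key order when it reads JSON, but we can reorder the dictionaries we get back. Since dictionaries in Python 3 (3.7 and later) keep their insertion order, we only need to build a new dictionary and insert the keys in the order we want. The rest is just string manipulation. This is how I would do it: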

required_order = ["Number", "Color", "size"]

def change_order(row_dict, order):
    # Build a new dict whose keys are inserted in the requested order.
    return {name: row_dict[name] for name in order}

def format_row(row):
    # Reorder every struct in the "SomeJson" array, then join their string forms.
    dicts = [change_order(ele.asDict(), required_order) for ele in row["SomeJson"]]
    return "{" + ",".join(str(d) for d in dicts) + "}"

df0.rdd.map(format_row).saveAsTextFile("data/output.json")

It produces the following output:

{{'Number': '1234', 'Color': 'blue', 'size': 'Medium'},{'Number': '2222', 'Color': 'red', 'size': 'Small'}}
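The line above keeps the requested key order, but it is still not valid JSON: the strings are single-quoted and the outer braces wrap bare objects. If valid JSON is needed, the same reordering idea can be combined with `json.dumps`, which emits keys in the dict's insertion order by default. This is a sketch under that assumption, not the original answer's method; `to_ordered_json` is a hypothetical name, and plain dicts again stand in for Spark rows.

```python
import json

REQUIRED_ORDER = ["Number", "Color", "size"]

def to_ordered_json(structs, order=REQUIRED_ORDER):
    """Reorder each struct's keys, then serialize the list as a JSON array."""
    ordered = [{name: s[name] for name in order} for s in structs]
    return json.dumps(ordered)

# Keys arrive in a different order, as in the Spark output shown earlier.
sample = [
    {"Color": "blue", "Number": "1234", "size": "Medium"},
    {"Color": "red", "Number": "2222", "size": "Small"},
]

line = to_ordered_json(sample)
print(line)
# [{"Number": "1234", "Color": "blue", "size": "Medium"}, {"Number": "2222", "Color": "red", "size": "Small"}]
```

Each output line is then a JSON array that `json.loads` can parse back with the keys in the requested order.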
