MongoDB aggregation: pivoting an embedded list

pipeline = [
    {'$unwind': '$saved_alloys'},
    {
        '$project': {
            '_id': 0,
            'name': '$saved_alloys.name',
            'compositions': '$saved_alloys.compositions'
        }
    }
]
res = db['alloys'].aggregate(pipeline)
for e in res:
    print(e)

{
    'name': 'alloy-1',
    'compositions': [
        {'symbol': 'C', 'weight': 0.36},
        {'symbol': 'Mn', 'weight': 1.41},
        {'symbol': 'Si', 'weight': 1.03},
        {'symbol': 'Ni', 'weight': 1.7}
    ]
}
{
    'name': 'alloy-2',
    'compositions': [
        {'symbol': 'C', 'weight': 0.21},
        {'symbol': 'Mn', 'weight': 0.23},
        {'symbol': 'Si', 'weight': 0.86},
        {'symbol': 'Ni', 'weight': 0.67},
        {'symbol': 'Cr', 'weight': 0.12},
    ]
}
...

1 answer

User

#1 · Posted 2024-09-25 00:32:51

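This can undoubtedly be optimized, but as a simple way to get started: build a pandas Series for each document the pipeline returns and add it to a DataFrame as a row; finally, replace any "missing" values with 0.0.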

from pymongo import MongoClient
import pandas as pd
import numpy as np

db = MongoClient()["mydatabase"]

db.alloys.insert_one({
    'saved_alloys': [{
        'name': 'alloy-1',
        'compositions': [
            {'symbol': 'C', 'weight': 0.36},
            {'symbol': 'Mn', 'weight': 1.41},
            {'symbol': 'Si', 'weight': 1.03},
            {'symbol': 'Ni', 'weight': 1.7}
        ]
    },
        {
            'name': 'alloy-2',
            'compositions': [
                {'symbol': 'C', 'weight': 0.21},
                {'symbol': 'Mn', 'weight': 0.23},
                {'symbol': 'Si', 'weight': 0.86},
                {'symbol': 'Ni', 'weight': 0.67},
                {'symbol': 'Cr', 'weight': 0.12},
            ]
        }]
}
)

pipeline = [
    {'$unwind': '$saved_alloys'},
    {
        '$project': {
            '_id': 0,
            'name': '$saved_alloys.name',
            'compositions': '$saved_alloys.compositions'
        }
    }
]

res = db['alloys'].aggregate(pipeline)
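rows = []

for alloy in res:
    # Build one Series per alloy: index = element symbols, values = weights
    ser = pd.Series(
        {c['symbol']: c['weight'] for c in alloy['compositions']},
        name=alloy['name'],  # the alloy name becomes the row label
        dtype=float,
    )
    rows.append(ser)

# DataFrame.append was removed in pandas 2.0, so build the frame in one step.
# Symbols missing from an alloy come out as NaN; column order may differ
# between pandas versions.
df = pd.DataFrame(rows)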

# Once we have our DataFrame, replace any missing values with 0.0
df = df.replace(np.nan, 0.0)
print(df)

Result:

            C    Mn    Ni    Si    Cr
alloy-1  0.36  1.41  1.70  1.03  0.00
alloy-2  0.21  0.23  0.67  0.86  0.12
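
If you want to see the pivot logic on its own, without pandas or a running MongoDB instance, here is a minimal sketch in plain Python. It assumes the documents have the same shape as the pipeline output above (`name` plus a `compositions` list). The helper name `pivot_compositions`, its `fill` parameter, and the in-memory `sample` list are made up for illustration and are not part of pymongo or pandas.

```python
def pivot_compositions(alloys, fill=0.0):
    """Pivot pipeline-style documents into {alloy name: {symbol: weight}}.

    Hypothetical helper for illustration; `alloys` is any iterable of
    dicts with 'name' and 'compositions' keys.
    """
    alloys = list(alloys)  # a cursor can only be iterated once

    # Collect every symbol in first-seen order so all rows share the same columns
    symbols = []
    for alloy in alloys:
        for comp in alloy['compositions']:
            if comp['symbol'] not in symbols:
                symbols.append(comp['symbol'])

    # Build one row per alloy, filling symbols it does not contain with `fill`
    rows = {}
    for alloy in alloys:
        weights = {c['symbol']: c['weight'] for c in alloy['compositions']}
        rows[alloy['name']] = {s: weights.get(s, fill) for s in symbols}
    return symbols, rows


# In-memory stand-in for the documents returned by the aggregation
sample = [
    {'name': 'alloy-1', 'compositions': [
        {'symbol': 'C', 'weight': 0.36},
        {'symbol': 'Mn', 'weight': 1.41},
        {'symbol': 'Si', 'weight': 1.03},
        {'symbol': 'Ni', 'weight': 1.7},
    ]},
    {'name': 'alloy-2', 'compositions': [
        {'symbol': 'C', 'weight': 0.21},
        {'symbol': 'Mn', 'weight': 0.23},
        {'symbol': 'Si', 'weight': 0.86},
        {'symbol': 'Ni', 'weight': 0.67},
        {'symbol': 'Cr', 'weight': 0.12},
    ]},
]

symbols, rows = pivot_compositions(sample)
for name, row in rows.items():
    print(name, row)
```

The same function should accept the `res` cursor directly, and the resulting mapping can be handed to `pd.DataFrame.from_dict(rows, orient='index')` if a DataFrame is still needed.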
