Python中带有JSON或CSV的VLOOKUP

[ { "Number": "AA23LA13-14", "Status": "S" }, { "Number": "AA23LA14-18", "Status": "F" }, { "Number": "AA23LA14-22", "Status": "F" }, { "Number": "AA23LA14-9", "Status": "F" }, { "Number": "AA23LA18S-30", "Status": "F" }, { "Number": "AA23LB-11", "Status": "F" },` ...

[ { "Number": "#901", "State": "Released" }, { "Number": "07559-03046", "State": "Released" }, { "Number": "07559-03046", "State": "Released" }, { "Number": "07612-02001", "State": "Obsolete" }, { "Number": "07612-02001", "State": "Obsolete" }, { "Number": "07612-02001", "State": "Obsolete" }, ...

1条回答

网友

1楼 · 发布于 2024-05-17 03:19:12

由于您正在处理大量的行，我建议使用^{}。然后您可以执行以下操作：

将每个数据源（作为^{}或^{}文件）导入到单个^{}
假设Number是唯一的，您可以将其设置为索引（为了确保这一点，您可以事先调用^{}）
使用第二个数据帧根据编号（两个数据帧中的编号相同）执行查找。您可以用不同的方法来实现这一点，其中之一是执行^{} on the index
将查找结果作为新列分配给第一个数据帧

这听起来可能有点复杂，但只需几行即可完成：

import random
import string

import numpy as np
import pandas as pd

# generate some sample data
numbers = np.random.randint(500, size=20)
states = random.choices(string.ascii_uppercase, k=4)
status = random.choices(string.ascii_uppercase, k=4)

# combine a Number and a State to create a dummy of your first data souce
first_source = pd.DataFrame(
    [{'Number': n, 'State': random.choice(states)} for n in numbers]
)

# combine a Number and a Status to create a dummy of your second data souce
second_source = pd.DataFrame(
    [{'Number': n, 'Status': random.choice(status)} for n in numbers]
)

# you would need to import your data sources into pandas dataframes
# Since I generated some sample data using the code above, I do not need to do this, but
# using csv files would basically look like this (adopt to your needs)
# first_source = pd.read_csv('first_soruce.csv')
# second_source = pd.read_csv('first_soruce.csv')

# as Number is considered being unique, we can drop all duplicate entries (checking `Number` column only) ans set the index to this column
first_source = first_source.drop_duplicates(subset='Number').set_index('Number')
second_source = second_source.drop_duplicates(subset='Number').set_index('Number')

# we can now use the index of `first_source` to lookup the according `Status` value from `second_source` and assign in to a new column `Status`
# `first_source` will then contain all information (from both data sources) and can be used for further processing or to export the information
first_source['Status'] = first_source.index.map(lambda x: second_source.loc[x].iloc[0])
print(first_source)

相关问题更多 >

编程相关推荐

热门问题

热门文章