Python中带有JSON或CSV的VLOOKUP

2024-05-17 03:19:12 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两个可用的JSON和CSV数据源。 第一个JSON看起来像:

[
 {
   "Number": "AA23LA13-14",
   "Status": "S"
 },
 {
   "Number": "AA23LA14-18",
   "Status": "F"
 },
 {
   "Number": "AA23LA14-22",
   "Status": "F"
 },
 {
   "Number": "AA23LA14-9",
   "Status": "F"
 },
 {
   "Number": "AA23LA18S-30",
   "Status": "F"
 },
 {
   "Number": "AA23LB-11",
   "Status": "F"
 },`
 ...

CSV等价物为:

Number;Status
AA23LA13-14;S
AA23LA14-18;F
AA23LA14-22;F
AA23LA14-9;F
AA23LA18S-30;F
AA23LB-11;F
...

CSV文件中有151.815行。 JSON中的第二个排序如下所示:

[
 {
   "Number": "#901",
   "State": "Released"
 },
 {
   "Number": "07559-03046",
   "State": "Released"
 },
 {
   "Number": "07559-03046",
   "State": "Released"
 },
 {
   "Number": "07612-02001",
   "State": "Obsolete"
 },
 {
   "Number": "07612-02001",
   "State": "Obsolete"
 },
 {
   "Number": "07612-02001",
   "State": "Obsolete"
 },
 ...

和CSV格式:

Number;State
#901;Released
07559-03046;Released
07559-03046;Released
07612-02001;Obsolete
...

第二个CSV有537.123行

这两个来源之间的关系如下:

  • 数字=数字
  • 状态=状态

如何以最有效的方式使用python,比较文件(JSON或CSV)并在第二个源中搜索第一个源的编号,如果存在匹配项,则将第一个源的状态添加到第二个源的对象

例如:

[
 {
   "Number": "#901",
   "State": "Released"
   "Status": "F"
 },
 {
   "Number": "07559-03046",
   "State": "Released"
   "Status": "F"
 },
 ...

Tags: 文件csvjsonnumber状态status数字数据源
1条回答
网友
1楼 · 发布于 2024-05-17 03:19:12

由于您正在处理大量的行,我建议使用^{}。然后您可以执行以下操作:

  1. 将每个数据源(作为^{}^{}文件)导入到单个^{}
  2. 假设Number是唯一的,您可以将其设置为索引(为了确保这一点,您可以事先调用^{}
  3. 使用第二个数据帧根据编号(两个数据帧中的编号相同)执行查找。您可以用不同的方法来实现这一点,其中之一是执行^{} on the index
  4. 将查找结果作为新列分配给第一个数据帧

这听起来可能有点复杂,但只需几行即可完成:

import random
import string

import numpy as np
import pandas as pd

# generate some sample data
numbers = np.random.randint(500, size=20)
states = random.choices(string.ascii_uppercase, k=4)
status = random.choices(string.ascii_uppercase, k=4)

# combine a Number and a State to create a dummy of your first data souce
first_source = pd.DataFrame(
    [{'Number': n, 'State': random.choice(states)} for n in numbers]
)

# combine a Number and a Status to create a dummy of your second data souce
second_source = pd.DataFrame(
    [{'Number': n, 'Status': random.choice(status)} for n in numbers]
)

# you would need to import your data sources into pandas dataframes
# Since I generated some sample data using the code above, I do not need to do this, but
# using csv files would basically look like this (adopt to your needs)
# first_source = pd.read_csv('first_soruce.csv')
# second_source = pd.read_csv('first_soruce.csv')

# as Number is considered being unique, we can drop all duplicate entries (checking `Number` column only) ans set the index to this column
first_source = first_source.drop_duplicates(subset='Number').set_index('Number')
second_source = second_source.drop_duplicates(subset='Number').set_index('Number')

# we can now use the index of `first_source` to lookup the according `Status` value from `second_source` and assign in to a new column `Status`
# `first_source` will then contain all information (from both data sources) and can be used for further processing or to export the information
first_source['Status'] = first_source.index.map(lambda x: second_source.loc[x].iloc[0])
print(first_source)

相关问题 更多 >