将两个CSV文件合并到JSON中

1条回答

网友

1楼 · 发布于 2024-09-27 07:30:41

通过检查您的代码和所需的输出，猜猜您所说的“它不起作用”是什么意思，我想我已经能够编写一些起作用的代码了

首先，这里是我使用的数据，其中»右guillemet字符用于表示tab'\t'字符-换句话说，您有以tab分隔的CSV文件，在使用csv.DictReader读取它们时必须指定这些文件，因为默认情况下，值之间要有逗号

输入文件

gpo_full.csv：

Date»hearing_sub_type»Old_ID»witnesses
January,1997»Oversight»22»Babbitt
June,2000»General»21»Blitzer
January,1997»General»23»Alldridge
April,2001»Oversight»25»Michael
June,2000»Oversight»24»Bruce

CAP_cols.csv：

majortopic»id»Chamber»subtopic»Date
21»79846»1»2103»January,1997
4»79847»2»705»June,2000
13»79848»1»1802»May,2001
7»79849»2»201»June,2000
21»79850»1»401»January,1997

Code:

最重要的变化是使output成为一个字典子类，其键是日期（参见下面关于日期的注意事项），每个都与一个嵌套的字典关联，该字典包含自己的一对"CAP"和"GPO"列表。这样做可以很容易地检测到以前是否遇到过日期，并为其初始化嵌套字典-这会在另一个CSV文件中搜索匹配的日期（这就是为什么定义并使用字典子类）

✶我认为值得一提的是，通过file.seek(0)来“倒带”CSV文件以重新读取其中的数据行是行不通的。相反，必须关闭、重新打开它们，并重新创建它们的csv.DictReader。这至少在一定程度上是因为在本例中，文件的开头有一个特殊的标题行，用于定义其字段的名称

一个警告我想提到的是，由于日期字段是“月，年”格式，不能用于按时间顺序对它们进行排序，因为它们将按字典顺序作为字符串进行比较，而不是按数字进行比较，即June,2021将被视为在December,2001之前出现-这是我禁用排序的主要原因。这是可以解决的，但这不是你问题的主要主题

import csv
import json
from pprint import pprint


class OutputDict(dict):
    """ Dictionary subclass to create initial entry for each date. """
    def __init__(self, cap_filename):
        super().__init__()  # Base class initialization.
        self.cap_filename = cap_filename

    def __missing__(self, date):
        """ Initialize entry for a date when it's first encountered. """
        self[date] = {'Date': date, 'GPO': [], 'CAP': []}

        # Scan cap csv file for matching dates - occurs once per unique date added.
        with open(self.cap_filename) as cap_file:  # Add matches from other CSV file.
            cap_reader = csv.DictReader(cap_file, delimiter='\t')
            for cap_row in cap_reader:
                if date == cap_row['Date']:
                    del cap_row['Date']
                    self[date]['CAP'].append(cap_row)

        # If there weren't any matching dates, create and add an empty row.
        if not self[date]['CAP']:  # No matching CAP dates?
            cap_row = {field: None for field in cap_reader.fieldnames if field != 'Date'}
            self[date]['CAP'].append(cap_row)

        return self[date]


# Add all the dates in the gpo_full.csv file to an output dictionary.
output = OutputDict('CAP_cols.csv')  # Automatically add matching dates in CAP_cols file.
with open('gpo_full.csv') as gpo_file:
    gpo_reader = csv.DictReader(gpo_file, delimiter='\t')
    for gpo_row in gpo_reader:
        date = gpo_row.pop('Date')
        output[date]['GPO'].append(gpo_row)

# Add any dates in CAP_cols.csv that weren't in gpo_full.csv file.
with open('CAP_cols.csv') as cap_file:
    cap_reader = csv.DictReader(cap_file, delimiter='\t')
    for cap_row in cap_reader:
        date = cap_row.pop('Date')
        if date not in output:  # Create entry for it with a null gpo row.
            gpo_row = {field: None for field in gpo_reader.fieldnames if field != 'Date'}
            output[date]['GPO'].append(gpo_row)  # Automatically adds cap_row.

output = list(output.values())  # Only want a list of the values.
#pprint(output, sort_dicts=0)  # Show what's going into JSON file.

with open('datamerge.json', 'w') as json_file:
    json.dump(output, json_file, indent=4, sort_keys=False)

print('-Fini-')

Results

下面是它放入datamerge.json文件的内容：

[
    {
        "Date": "January,1997",
        "GPO": [
            {
                "hearing_sub_type": "Oversight",
                "Old_ID": "22",
                "witnesses": "Babbitt"
            },
            {
                "hearing_sub_type": "General",
                "Old_ID": "23",
                "witnesses": "Alldridge"
            }
        ],
        "CAP": [
            {
                "majortopic": "21",
                "id": "79846",
                "Chamber": "1",
                "subtopic": "2103"
            },
            {
                "majortopic": "21",
                "id": "79850",
                "Chamber": "1",
                "subtopic": "401"
            }
        ]
    },
    {
        "Date": "June,2000",
        "GPO": [
            {
                "hearing_sub_type": "General",
                "Old_ID": "21",
                "witnesses": "Blitzer"
            },
            {
                "hearing_sub_type": "Oversight",
                "Old_ID": "24",
                "witnesses": "Bruce"
            }
        ],
        "CAP": [
            {
                "majortopic": "4",
                "id": "79847",
                "Chamber": "2",
                "subtopic": "705"
            },
            {
                "majortopic": "7",
                "id": "79849",
                "Chamber": "2",
                "subtopic": "201"
            }
        ]
    },
    {
        "Date": "April,2001",
        "GPO": [
            {
                "hearing_sub_type": "Oversight",
                "Old_ID": "25",
                "witnesses": "Michael"
            }
        ],
        "CAP": [
            {
                "majortopic": null,
                "id": null,
                "Chamber": null,
                "subtopic": null
            }
        ]
    },
    {
        "Date": "May,2001",
        "GPO": [
            {
                "hearing_sub_type": null,
                "Old_ID": null,
                "witnesses": null
            }
        ],
        "CAP": [
            {
                "majortopic": "13",
                "id": "79848",
                "Chamber": "1",
                "subtopic": "1802"
            }
        ]
    }
]

输入文件

Code:

Results

相关问题更多 >

编程相关推荐

热门问题

热门文章