How do I fill empty values in the columns of a CSV file?

Posted 2024-05-20 17:59:43

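I need to read a CSV file, fill in the empty/null values in the Phone and Email columns based on each person's address, and write the result to a new CSV file. For example: if a person "Jonas Kahnwald" has no phone number or email address but shares an address with a person above or below, such as "Hannah Kahnwald", then the empty/null values should be filled with that person's details.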

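I can't use pandas, because the rest of the code/program is based entirely on Python 2.7 (unfortunately), so I just need to write a function or some logic that captures this information on its own.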

The input format/table looks like this, with empty cells (CSV file):

FirstName,LastName,Phone,Email,Address
Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,,145han street
Jonas,Kahnwald,,,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,,,887neil ave
katharina,Nielsen,,,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,,,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,,,828alex street

The output should look like this:

Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,hannkahn@gmail.com,145han street
Jonas,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,4509213887,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
katharina,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,5439001211,elsisop@amaz.com,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,6263172828,tiedman@skype.com,828alex street
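
import csv

def get_info(file_path):
    data = []
    # Python 2.7: the csv module expects files opened in binary mode.
    with open(file_path, 'rb') as fin:
        csv_reader = csv.reader(fin)
        next(csv_reader)  # Skip the header row.
        for each in csv_reader:
            FirstName = each[0]
            LN = each[1]
            Phone = "some function or logic"  # Placeholder: fill from same address.
            email = "some function or logic"  # Placeholder: fill from same address.
            Address = each[4]
            login = ""
            logout = ""

            data.append([FirstName, LN, Phone, email, Address, login, logout])

    # The with block closes the file, so no explicit close() is needed.
    return data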

1 answer
User
#1 · Posted 2024-05-20 17:59:43

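Here is a significantly updated version that tries to fill in missing data from other entries in the file, but only if they have the same Address field. To speed up the search, it builds a dictionary for internal use named addr_dict, which maps each address to the records at that address that have a phone number or an email. It also uses namedtuple internally to make the code more readable.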

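Note that when retrieving missing information, it uses the data from the first entry it finds in the internal dictionary for that Address. Also, I don't think the sample data you provided covers all possible cases, so additional testing is needed.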

import csv
from collections import namedtuple


def get_info(file_path):

    # Read data from file and convert to list of namedtuples, also create address
    # dictionary to use to fill in missing information from others at same address.
    with open(file_path, 'rb') as fin:
        csv_reader =  csv.reader(fin, skipinitialspace=True)

        header = next(csv_reader)
        Record = namedtuple('Record', header)

        newheader = header + ['Login', 'Logout'] # Add names of new columns.
        NewRecord = namedtuple('NewRecord', newheader)

        addr_dict = {}
        data = [newheader]

        for rec in (Record._make(row) for row in csv_reader):
            if rec.Email or rec.Phone:  # Worth saving?
                addr_dict.setdefault(rec.Address, []).append(rec)  # Remember it.

            login, logout = "",  ""  # Values for new columns.
            data.append(NewRecord._make(rec + (login, logout)))

    # Try to fill in missing data from any other records with same Address.
    for i, row in enumerate(data[1:], 1):
        if not (row.Phone and row.Email):  # Info missing?
            # Try to copy it from others at same address.
            updated = False
            for other in addr_dict.get(row.Address, []):
                if not row.Phone and other.Phone:
                    row = row._replace(Phone=other.Phone)
                    updated = True
                if not row.Email and other.Email:
                    row = row._replace(Email=other.Email)
                    updated = True
                if row.Phone and row.Email:  # Info now filled in?
                    break

            if updated:
                data[i] = row

    return data


INPUT_FILE = 'null_cols.csv'
OUTPUT_FILE = 'fill_cols.csv'

data = get_info(INPUT_FILE)

with open(OUTPUT_FILE, 'wb') as fout:
    writer = csv.DictWriter(fout, data[0])  # First elem has column names.
    writer.writeheader()
    for row in data[1:]:
        writer.writerow(row._asdict())

print('Done')
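
To check the fill logic without touching any files, the same idea can be sketched on in-memory rows. This is a simplified variant, not the code above: it uses plain dicts instead of namedtuples, and the names fill_missing and first_seen are hypothetical, introduced only for this illustration. It assumes that "first entry wins" is the desired rule, as described in the note above, and it should run unchanged on Python 2.7 and Python 3.

```python
def fill_missing(rows, key="Address", fields=("Phone", "Email")):
    """Return copies of rows with empty fields filled from rows sharing `key`."""
    # First pass: per address, remember the first non-empty value of each field.
    first_seen = {}
    for row in rows:
        known = first_seen.setdefault(row[key], {})
        for field in fields:
            if row[field] and field not in known:
                known[field] = row[field]

    # Second pass: copy each row and fill only the fields that are empty.
    filled = []
    for row in rows:
        new_row = dict(row)  # Copy so the input rows are left untouched.
        known = first_seen.get(row[key], {})
        for field in fields:
            if not new_row[field]:
                new_row[field] = known.get(field, "")
        filled.append(new_row)
    return filled


# A few rows taken from the sample data in the question.
sample = [
    {"FirstName": "Hannah", "Phone": "1457871452",
     "Email": "hannkahn@gmail.com", "Address": "145han street"},
    {"FirstName": "Micheal", "Phone": "6231897383",
     "Email": "", "Address": "145han street"},
    {"FirstName": "Jonas", "Phone": "",
     "Email": "", "Address": "145han street"},
    {"FirstName": "Alexander", "Phone": "",
     "Email": "", "Address": "321notsame street"},
]
result = fill_missing(sample)
```

With these rows, Micheal keeps his own phone number and receives Hannah's email, Jonas receives both of Hannah's values, and Alexander stays empty because nobody else shares his address. Phone and Email are tracked separately, so they may come from different people at the same address.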

[Figure: screenshot of the results in Excel]
