Reading specific columns of a dataset with Python

redfile = open(file_path,'r')
import csv
reader=csv.reader(redfile)
names=next(reader)
for elem in names:
    if elem.startswith("W")==True:
        names.remove(elem)
for elem in names:
    if elem.startswith("P")==True:
        names.remove(elem)
for elem in names:
    if elem.startswith("X")==True:
        names.remove(elem)
names.remove("SCH_ID")
names.remove("STRAT_ID")
names.remove("STU_ID")
nameind = []
line0 = ''
wfile = open('reduced.csv','w')
for i, line in enumerate(redfile):
    redarray = [x for x in line.split(",")]
    line1 = ''
    if i == 0:
        for ii in range(0,len(redarray)):
            if redarray[ii] in names:
                nameind.append(ii)
                line0 = line0+redarray[ii]+','
        line0 = line0[:-1]
        print(line0)
        wfile.write(line0)
        wfile.write('\n')
        nameindarray = np.array(nameind)
    elif i < 25000:
        for ii in nameind:
            line1 = line1+redarray[ii]+','
        line1 = line1[:-1]
        wfile.write(line1)
        wfile.write('\n')
    else:
        break
redfile.close()
wfile.close()
print(i)

1 answer

User

#1 · Posted 2024-09-30 00:31:47

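I take it you just want to copy the contents of the file file_path to reduced.csv, dropping every column whose name starts with one of the characters X, P, or W, as well as the columns SCH_ID, STRAT_ID, and STU_ID.

If so, you can do it with pandas like this: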

import pandas as pd

# read the first row only to get the column names
df= pd.read_csv(file_path, sep=',', dtype='str', nrows=1)
use_cols= [col for col in df.columns if not col.startswith(('X', 'P', 'W')) and col not in ['SCH_ID', 'STRAT_ID', 'STU_ID']]

df= pd.read_csv(file_path, sep=',', dtype='str', usecols=use_cols)
df.to_csv('reduced.csv', index=False, sep=',')

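Please treat this as pseudocode: I had no data to test it with, but I'm fairly confident it works. If the quoting doesn't come out the way you like, you can try adding the quotechar keyword to the two calls that read and write the CSV (pandas' read_csv and to_csv both accept it).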
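Since the snippet above is untested, the column filter at least can be checked on its own without the data file. This is only a minimal sketch: the column names in sample_columns are made up for illustration, and keep_column, EXCLUDED_PREFIXES and EXCLUDED_NAMES are hypothetical helper names, not anything from your code.

```python
# Hypothetical column names, invented purely to exercise the filter.
sample_columns = ["STU_ID", "SCH_ID", "STRAT_ID", "X1SEX", "P1EDU",
                  "W1STUDENT", "S1MATH", "AGE"]

EXCLUDED_PREFIXES = ("X", "P", "W")
EXCLUDED_NAMES = {"SCH_ID", "STRAT_ID", "STU_ID"}


def keep_column(name):
    """Return True if the column should be copied to the reduced file."""
    return not name.startswith(EXCLUDED_PREFIXES) and name not in EXCLUDED_NAMES


# Build a new list instead of removing items from the list being iterated.
use_cols = [col for col in sample_columns if keep_column(col)]
print(use_cols)
```

Building a new list this way also sidesteps a pitfall in the original loops: calling names.remove(elem) while iterating over names can skip the element that follows each removed one.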

By the way, if you want to simplify your code and use with to make sure the file is closed in every case, you could rewrite your last for loop like this:

with open('reduced.csv','w') as wfile:
    for i, line in enumerate(redfile):
        # drop the trailing newline so the last token compares and writes cleanly
        redarray = line.rstrip('\n').split(',')
        line1 = ''
        if i == 0:
            for ii, token in enumerate(redarray):
                if token in names:
                    nameind.append(ii)
                    line0 = line0 + token + ','
            line0 = line0[:-1]
            print(line0)
            wfile.write(line0)
            wfile.write('\n')
            nameindarray = np.array(nameind)
        elif i < 25000:
            # use ii here so the loop counter i is not visually shadowed
            line1 = ','.join([redarray[ii] for ii in nameind])
            wfile.write(line1)
            wfile.write('\n')
        else:
            break

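If you go with this second approach, you will probably also want to open the input file in a with clause. With with, you don't need to close the files explicitly; that is done for you automatically when the with block ends.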
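To illustrate opening both files in one with statement, here is a rough sketch of how the whole copy could look using only the standard csv module. It is not a drop-in replacement for your code: the function name reduce_csv and its parameters are hypothetical, the 25000 limit is applied to data rows (header excluded), and it assumes every row has at least as many fields as the header.

```python
import csv

EXCLUDED_PREFIXES = ("X", "P", "W")
EXCLUDED_NAMES = {"SCH_ID", "STRAT_ID", "STU_ID"}


def reduce_csv(src_path, dst_path, max_rows=25000):
    """Copy src_path to dst_path, keeping only the wanted columns.

    Returns the number of data rows written (the header is not counted).
    """
    # Both files are closed automatically when the with block ends,
    # even if an exception is raised part-way through.
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader)
        # Indices of the columns that survive the filter.
        keep = [i for i, name in enumerate(header)
                if not name.startswith(EXCLUDED_PREFIXES)
                and name not in EXCLUDED_NAMES]
        writer.writerow([header[i] for i in keep])

        written = 0
        for row in reader:
            if written >= max_rows:
                break
            writer.writerow([row[i] for i in keep])
            written += 1
    return written
```

Calling reduce_csv(file_path, 'reduced.csv') would then do the read, filter and write in one pass. Letting csv.reader and csv.writer handle the splitting and joining also copes with quoted fields that contain commas, which a plain line.split(',') does not.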
