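I have 190 CSVs, all with the same column names. A sample CSV is shared below.
From each CSV I need only the first row of these columns: Item, Predicted_BelRd(D2), Predicted_Ulsoor(D2), Predicted_ChrchStrt(D2), Predicted_BlrClub(D2), Predicted_Indrangr(D1), Predicted_Krmngl(D1), Predicted_KrmnglBkry(D1), Predicted_HSR(D1).
I need to store all of those rows in a separate CSV, so the final CSV should have 190 rows.
I wrote this code for it: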
import glob

import pandas as pd

path = '/home/hp/products1'
all_files = glob.glob(path + "/*.csv")
#print(all_files)
columns = ['Item', 'Predicted_BelRd(D2)', 'Predicted_Ulsoor(D2)', 'Predicted_ChrchStrt(D2)', 'Predicted_BlrClub(D2)', 'Predicted_Indrangr(D1)', 'Predicted_Krmngl(D1)', 'Predicted_KrmnglBkry(D1)', 'Predicted_HSR(D1)']
#columns = []
#df.iloc[:, np.r_[1:10, 15, 17, 50:100]]
rows_list = []
for filename in all_files:
    origin_data = pd.read_csv(filename)
    my_data = origin_data[columns]
    rows_list.append(my_data.head(1))
output = pd.DataFrame(rows_list)
#output.to_csv(file_name, sep='\t', encoding='utf-8')
output.to_csv('smallys_final.csv', encoding='utf-8', index=False)
It gives an error (the traceback is not preserved here).
The contents of one of the DataFrames:
prod = pd.read_csv('/home/hp/products1/' + 'prod[' + str(0) + '].csv', engine='python')
print(prod)
Output:
      Category                         Item  UOM  BelRd(D2)  Ulsoor(D2)  \
0  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
1  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
2  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
3  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0
4  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0

   ChrchStrt(D2)  BlrClub(D2)  Indrangr(D1)  Krmngl(D1)  KrmnglBkry(D1)  \
0              0            0             0           0               1
1              0            0             0           0               0
2              0            0             0           0               0
3              0            0             0           0               0
4              0            0             0           0               1

   HSR(D1)         date  Predicted_BelRd(D2)  Predicted_Ulsoor(D2)  \
0        0    10 FEB 19                  0.0                   0.0
1        0    17 FEB 19                  NaN                   NaN
2        0    24 FEB 19                  NaN                   NaN
3        0   4 MARCH 19                  NaN                   NaN
4        0  11 MARCH 19                  NaN                   NaN

   Predicted_ChrchStrt(D2)  Predicted_BlrClub(D2)  Predicted_Indrangr(D1)  \
0                      0.0                    0.0                     0.0
1                      NaN                    NaN                     NaN
2                      NaN                    NaN                     NaN
3                      NaN                    NaN                     NaN
4                      NaN                    NaN                     NaN

   Predicted_Krmngl(D1)  Predicted_KrmnglBkry(D1)  Predicted_HSR(D1)
0                   0.0                       0.0                0.0
1                   NaN                       NaN                NaN
2                   NaN                       NaN                NaN
3                   NaN                       NaN                NaN
4                   NaN                       NaN                NaN
Edit:
prod = pd.read_csv('/home/hp/products1/' + 'prod[' + str(0) + '].csv', engine='python')
print(list(prod))
Output:
['Category', 'Item', 'UOM', 'BelRd(D2)', 'Ulsoor(D2)', 'ChrchStrt(D2)', 'BlrClub(D2)', 'Indrangr(D1)', 'Krmngl(D1)', 'KrmnglBkry(D1)', 'HSR(D1)', 'date', 'Predicted_BelRd(D2)', 'Predicted_Ulsoor(D2)', 'Predicted_ChrchStrt(D2)', 'Predicted_BlrClub(D2)', 'Predicted_Indrangr(D1)', 'Predicted_Krmngl(D1)', 'Predicted_KrmnglBkry(D1)', 'Predicted_HSR(D1)']
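I think you need to remove the stray tab characters and spaces and rerun the code. It will probably work fine then. I think that, because of the unwanted tab whitespace, the DataFrame cannot match the actual values and raises a "KeyError".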
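A minimal sketch of that suggestion, assuming the stray whitespace sits in the header names of some of the files. The function name `collect_first_rows` and its parameters are hypothetical, chosen only for illustration. It strips whitespace from each file's column names before selecting by name, and it stacks the one-row frames with `pd.concat` rather than passing the list to `pd.DataFrame`, which is a substitution on my part and not something the answer above proposes.

```python
import pandas as pd


def collect_first_rows(csv_paths, columns):
    """Return one DataFrame holding the first row of `columns` from each CSV.

    `collect_first_rows` is an illustrative helper, not part of pandas.
    """
    first_rows = []
    for csv_path in csv_paths:
        frame = pd.read_csv(csv_path)
        # Strip stray tabs and spaces around header names so that
        # selecting columns by their clean names can succeed.
        frame.columns = frame.columns.str.strip()
        missing = [name for name in columns if name not in frame.columns]
        if missing:
            # Report which file is the odd one out instead of a bare KeyError.
            raise KeyError(f"{csv_path} is missing columns: {missing}")
        first_rows.append(frame.loc[:, columns].head(1))
    # Stack the one-row frames into a single frame, one row per input file.
    return pd.concat(first_rows, ignore_index=True)
```

With the layout from the question, this could be called as `collect_first_rows(sorted(glob.glob('/home/hp/products1/*.csv')), columns)` and the result written out with `to_csv('smallys_final.csv', index=False)`. Note that `pd.concat` raises a `ValueError` when the list of paths is empty.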