使用python修改csv文件中的某些行

2024-09-26 18:16:04 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个csv文件,如下所示:

name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80

我试图做的是,每当列字段为0时,将字段长度添加到列中,行应该减1。另外,如果后续行中的列再次为0,则将第一个字段长度添加到列中,将第一次出现的行减少1,并将字段长度一起添加。然后删除旧的行

因此,在上述csv中,“AB000M,10,0,80”将变为“AB000M,9,80,80”,并且 “AB000M,21,0,80
AB000M,22,0,80”--这两条线应替换为“AB000M,20,80160”

我试图通过这段代码实现这一点,但它不起作用:

df = pd.read_csv("file.csv")
for ind in df.index:
    if ind >= len(df)-1:
        break
    if df['column'][ind] == 0 and df['column'][ind + 1] != 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
    elif df['column'][ind] == 0 and df['column'][ind + 1] == 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
        df['length_of_field'][ind] += df['length_of_field'][ind + 1]
        df.drop([df.index[ind + 1]], axis=0)

Tags: and文件ofcsvnamefielddfindex
2条回答

这是一个可能对你有用的例子

import pandas as pd

df = pd.read_csv('test.csv')
newRows = []
last_val_is_zero = False
tempRow = None
for row in df.iterrows():
    vals = row[1]
    if vals['column'] == 0:
        if not last_val_is_zero:
            vals['row'] = vals['row'] - 1
            vals['column'] = vals['length_of_field']
            tempRow = vals
            last_val_is_zero = True
        else:
            tempRow['length_of_field'] = tempRow['length_of_field'] + vals['length_of_field']
    else:
        if tempRow is not None:
            newRows.append(tempRow)
        newRows.append(vals)
        tempRow = None
        last_val_is_zero = False

if tempRow is not None:
    newRows.append(tempRow)
    
newData = [[val for val in row] for row in newRows]
newDf = pd.DataFrame(newData, columns=[x for x in newRows[0].keys()])

样本数据:

df_str = '''
name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
AB000M,23,11,1
AB000M,24,11,1
AB000M,25,0,80
AB000M,26,0,80
AB000M,27,0,80
AB000M,28,11,1
AB000M,29,0,80
'''
df = pd.read_csv(io.StringIO(df_str.strip()), sep=',', index_col=False)

解决方案:

# split the row which to update or left
cond = df['column'] == 0
df_to_update = df[cond].copy()
df_left = df[~cond].copy()

# modify the update rows
df_to_update['column'] = df_to_update['length_of_field']
df_to_update['row'] -= 1

# create tag for which is diff 1 with the previous row
cond = df_to_update['row'].diff() != 1
df_to_update['tag'] = np.where(cond, 1, 0)

# cumsum tag to creat group
df_to_update['label'] = df_to_update['tag'].cumsum()

print(df_to_update)

#       name  row  column  length_of_field  tag  label
# 2   AB000M    9      80               80    1      1
# 5   AB000M   20      80               80    1      2
# 6   AB000M   21      80               80    0      2
# 9   AB000M   24      80               80    1      3
# 10  AB000M   25      80               80    0      3
# 11  AB000M   26      80               80    0      3
# 13  AB000M   28      80               80    1      4


# agg groupy left first row, and sum(length_of_field)
obj_list = []
for tag, group in df_to_update.groupby('label'):
    obj = group.iloc[0].copy()
    obj['length_of_field'] = group['length_of_field'].sum()
    obj_list.append(obj)
dfn_to_update = pd.concat(obj_list,axis=1).T[df.columns]    

# merge final result
dfn = df_left.append(dfn_to_update).sort_index()

结果:

print(dfn)

      name row column length_of_field
0   AB000M   8     12               1
1   AB000M   9     12               1
2   AB000M   9     80              80
3   AB000M  10     12               1
4   AB000M  11      1               1
5   AB000M  20     80             160
7   AB000M  23     11               1
8   AB000M  24     11               1
9   AB000M  24     80             240
12  AB000M  28     11               1
13  AB000M  28     80              80

print(df)

      name  row  column  length_of_field
0   AB000M    8      12                1
1   AB000M    9      12                1
2   AB000M   10       0               80
3   AB000M   10      12                1
4   AB000M   11       1                1
5   AB000M   21       0               80
6   AB000M   22       0               80
7   AB000M   23      11                1
8   AB000M   24      11                1
9   AB000M   25       0               80
10  AB000M   26       0               80
11  AB000M   27       0               80
12  AB000M   28      11                1
13  AB000M   29       0               80

相关问题 更多 >

    热门问题