是否可以根据单元格的前五个字符匹配数据帧的行?

2024-07-05 14:03:19 发布

您现在位置:Python中文网/ 问答频道 /正文

如果有人拥有此数据帧:

    0           1           2
0   RC100_1     RC100_1     RC100_1
1   RC101_1     RC101_1     RC101_1
2   RC101_3     RC102_1     RC102_1
3   RC102_1     RC103_3     RC103_3
4   RC102_3     RC104_1     RC104_1
5   RC103_3     RC109_2     RC109_2
6   RC104_1     RC111_1     RC111_1
7   RC109_2     RC114_2     RC114_2
8   RC111_1     RC115_2     RC115_2
9   RC114_1     RC116_1     RC116_1
10  RC115_4     RC116_2     RC116_2
11  RC116_1     RC117_2     RC117_2
12  RC117_4     RC117_4     RC117_4
13  RC117_4     RC118_2     RC118_2
14  RC118_1     RC119_2     RC119_2
15  RC119_4     RC120_2     RC120_2
16  RC120_4     RC121_2     RC121_2
17  RC121_4     RC122_2     RC122_2
18  RC122_4     RC125_2     RC125_2
19  RC125_2     RC126_3     RC126_3
20  RC126_3     RC129_2     RC129_2
21  RC129_4     RC12_24     RC12_24

是否可以将其转换为每行仅包含前五个字符匹配的内容,否则将对其进行排序?我的意思是不需要手动更改每个单元格。以下是我想将其转换为的内容

    0           1           2
0   RC100_1     RC100_1     RC100_1
1   RC101_1     RC101_1     RC101_1
2   RC101_3     NaN         NaN
3   RC102_1     RC102_1     RC102_1
4   RC102_3     NaN         NaN
5   RC103_3     RC103_3     RC103_3
6   RC104_1     RC104_1     RC104_1
7   RC109_2     RC109_2     RC109_2
8   RC111_1     RC111_1     RC111_1
9   RC114_1     RC114_2     RC114_2
10  RC115_4     RC115_2     RC115_2
11  RC116_1     RC116_1     RC116_1
12  NaN         RC116_2     RC116_2
13  RC117_4     RC117_2     RC117_2
14  RC117_4     RC117_4     RC117_4
15  RC118_1     RC118_2     RC118_2
16  RC119_4     RC119_2     RC119_2
17  RC120_4     RC120_2     RC120_2
18  RC121_4     RC121_2     RC121_2
19  RC122_4     RC122_2     RC122_2
20  RC125_2     RC125_2     RC125_2
21  RC126_3     RC126_3     RC126_3
22  RC129_4     RC129_2     RC129_2
23  NaN         RC12_24     RC12_24

Tags: rc117rc115rc109rc118rc100rc111rc116rc114
1条回答
网友
1楼 · 发布于 2024-07-05 14:03:19

只需列出所有文件名,并使用它对所有文件名进行索引和对齐。下面是您发布的数据的工作示例

import pandas as pd
import numpy as np

# in your case do something like names_idx = excel_names + csv_names + txt_names
names_idx = ['RC100_1', 'RC100_1', 'RC100_1',
             'RC101_1', 'RC101_1', 'RC101_1',
             'RC101_3', 'RC102_1', 'RC102_1',
             'RC102_1', 'RC103_3', 'RC103_3',
             'RC102_3', 'RC104_1', 'RC104_1',
             'RC103_3', 'RC109_2', 'RC109_2',
             'RC104_1', 'RC111_1', 'RC111_1',
             'RC109_2', 'RC114_2', 'RC114_2',
             'RC111_1', 'RC115_2', 'RC115_2',
             'RC114_1', 'RC116_1', 'RC116_1',
             'RC115_4', 'RC116_2', 'RC116_2',
             'RC116_1', 'RC117_2', 'RC117_2',
             'RC117_4', 'RC117_4', 'RC117_4',
             'RC117_4', 'RC118_2', 'RC118_2',
             'RC118_1', 'RC119_2', 'RC119_2',
             'RC119_4', 'RC120_2', 'RC120_2',
             'RC120_4', 'RC121_2', 'RC121_2',
             'RC121_4', 'RC122_2', 'RC122_2',
             'RC122_4', 'RC125_2', 'RC125_2',
             'RC125_2', 'RC126_3', 'RC126_3',
             'RC126_3', 'RC129_2', 'RC129_2',
             'RC129_4', 'RC12_24', 'RC12_24']

# This is not needed if you already have them separately from globbing earlier
csv_names = names_idx[::3]
excel_names = names_idx[1::3]
txt_names = names_idx[2::3]

# remove duplicates
names_idx = set(names_idx)

# create an empty dataframe with index as unique file names
df = pd.DataFrame(index=names_idx)

# create empty columns and position the file names 
df['csv'] = np.nan
df.csv[csv_names] = csv_names

df['excel'] = np.nan
df.excel[excel_names] = excel_names

df['txt'] = np.nan
df.txt[txt_names] = txt_names

print(df)

输出

             csv    excel      txt
RC111_1  RC111_1  RC111_1  RC111_1
RC125_2  RC125_2  RC125_2  RC125_2
RC114_2      NaN  RC114_2  RC114_2
RC116_2      NaN  RC116_2  RC116_2
RC129_4  RC129_4      NaN      NaN
RC118_1  RC118_1      NaN      NaN
RC12_24      NaN  RC12_24  RC12_24
RC121_4  RC121_4      NaN      NaN
RC126_3  RC126_3  RC126_3  RC126_3
RC129_2      NaN  RC129_2  RC129_2
RC102_1  RC102_1  RC102_1  RC102_1
RC101_1  RC101_1  RC101_1  RC101_1
RC109_2  RC109_2  RC109_2  RC109_2
RC119_4  RC119_4      NaN      NaN
RC100_1  RC100_1  RC100_1  RC100_1
RC120_2      NaN  RC120_2  RC120_2
RC122_2      NaN  RC122_2  RC122_2
RC121_2      NaN  RC121_2  RC121_2
RC117_4  RC117_4  RC117_4  RC117_4
RC118_2      NaN  RC118_2  RC118_2
RC103_3  RC103_3  RC103_3  RC103_3
RC117_2      NaN  RC117_2  RC117_2
RC102_3  RC102_3      NaN      NaN
RC119_2      NaN  RC119_2  RC119_2
RC114_1  RC114_1      NaN      NaN
RC116_1  RC116_1  RC116_1  RC116_1
RC101_3  RC101_3      NaN      NaN
RC115_2      NaN  RC115_2  RC115_2
RC122_4  RC122_4      NaN      NaN
RC104_1  RC104_1  RC104_1  RC104_1
RC115_4  RC115_4      NaN      NaN
RC120_4  RC120_4      NaN      NaN

相关问题 更多 >