New column based on rows in a specific column of a table

Posted 2024-10-02 16:22:29


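I have data like that shown in the picture. I need to take the sequences of type "secstr" and put them into a new column next to the Sequence column, on the row that has the same PDB_ID and Chain. Finally, I want to drop the rows that hold the "secstr" sequences.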

So far, this is what I have come up with:

["Secstr"] = sequences.Sequence[
    (sequences['PDB_ID'] == sequences['PDB_ID']) & 
    (sequences['Chain'] == sequences['Chain']) & 
    (sequences['Type'] == 'secstr')]

Image with table

The data I need looks like this:

    PDB_ID  Chain          Sequence                  Secstr
0   101M     A       MVLSEGEWQLVLHVWAKVEA       HHHH  HHHHGGHH HHHH
1   102L     A       MVLSEGEWQLVLHVWAKVEA    HHHH  HHHHHHHGGHH   HH
2   102M     A       MVLSEGEWQLVLHVWAKVEA    HHHHHHHHHGGHH HHH     
3   103L     A       MVLSEGEWQLVLHVWAKVEA       HHHHH HHHHHH HHGGH 
4   103L     B       MVLSEGEWQLVLHVWAKVEA       HHHHH HHHHHH HHHHH 

2 Answers

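Split the original DF into the "secstr" rows and the remaining rows, combine the two side by side, then delete the unnecessary columns. Does this match the intent of the question?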

import pandas as pd

# Rows of type 'secstr', indexed by (PDB_ID, Chain)
df2 = df[df['Type'] == 'secstr'].set_index(['PDB_ID', 'Chain'])
# All other rows (the 'sequence' rows), indexed the same way
df = df[df['Type'] != 'secstr'].set_index(['PDB_ID', 'Chain'])

# Combine DF and DF2 column-wise, aligned on the (PDB_ID, Chain) index
new_df = pd.concat([df, df2], axis=1).reset_index()

# Rename the columns (assumes the original column order
# PDB_ID, Chain, Type, Sequence)
new_cols = ['PDB_ID', 'Chain', 'Type', 'Sequence', 'Type1', 'Secstr']
new_df.columns = new_cols

# Delete the two Type columns, which are no longer needed
new_df = new_df.drop(columns=['Type', 'Type1'])

    new_df
    PDB_ID  Chain   Sequence    Secstr
0   101M    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
1   102L    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
2   102M    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
3   103L    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
4   103M    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
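
As an alternative to concatenating on the index, the same reshaping can be sketched with a key-based merge. This is only a minimal illustration, not the method used in the answer above: the sample frame, the "sequence" type label, and the helper name attach_secstr are all assumptions made for the example, since the real input is only shown in the picture.

```python
import pandas as pd

# Hypothetical sample input in long format: one row per (PDB_ID, Chain, Type).
# The values and the 'sequence' label are made up for illustration.
df = pd.DataFrame({
    "PDB_ID": ["101M", "101M", "103L", "103L", "103L", "103L"],
    "Chain": ["A", "A", "A", "A", "B", "B"],
    "Type": ["sequence", "secstr", "sequence", "secstr", "sequence", "secstr"],
    "Sequence": ["MVLSEGEW", "  HHHH  ", "MNIFEMLR", " HHHHH  ", "MNIFEMLR", "HHHHHH  "],
})


def attach_secstr(frame: pd.DataFrame) -> pd.DataFrame:
    """Move each 'secstr' row into a Secstr column beside its sequence row."""
    keys = ["PDB_ID", "Chain"]
    is_secstr = frame["Type"] == "secstr"

    # The 'secstr' rows, with their Sequence column renamed to Secstr.
    secstr = (
        frame.loc[is_secstr, keys + ["Sequence"]]
        .rename(columns={"Sequence": "Secstr"})
    )
    # Every other row; only the key columns and the sequence are kept.
    sequences = frame.loc[~is_secstr, keys + ["Sequence"]]

    # A left merge keeps every sequence row; a chain with no 'secstr' row
    # ends up with NaN in Secstr instead of being dropped.
    return sequences.merge(secstr, on=keys, how="left")


result = attach_secstr(df)
print(result)
```

Merging on the column names avoids relying on the column order, and the 'secstr' rows disappear from the result because only the non-'secstr' rows are used as the left side of the merge.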
    PDB_ID  Chain          Sequence                  Secstr
0   101M     A       MVLSEGEWQLVLHVWAKVEA       HHHH  HHHHGGHH HHHH
1   102L     A       MVLSEGEWQLVLHVWAKVEA    HHHH  HHHHHHHGGHH   HH
2   102M     A       MVLSEGEWQLVLHVWAKVEA    HHHHHHHHHGGHH HHH     
3   103L     A       MVLSEGEWQLVLHVWAKVEA       HHHHH HHHHHH HHGGH 
4   103L     B       MVLSEGEWQLVLHVWAKVEA       HHHHH HHHHHH HHHHH 

This is the data I need.
