Delete rows from a table based on certain criteria

Input:

Cluster1    NP_075076
Cluster1    AMN16433
Cluster1    YP_063711
Cluster1    KQ976470.1:66008-66163(-):Cattus_sylvestris
Cluster1    AJP07295
Cluster1    AMN15329
Cluster2    YP_00999
Cluster2    YP_00989
Cluster2    YP_00971
Cluster2    YP_00988
Cluster2    AJP07295
Cluster3    KI976478.1:66021-66123(-):Canis_lupus
Cluster3    AJP07232
Cluster3    AJP07212
Cluster3    AZ976430.1:66045-66190(+):Cavia_porsellus
Cluster4    AHHYUIIY
Cluster5    AZ976490:66042-66190(-):Felis_porsellus
Cluster5    AA976490:66021-66130(+):Felis_porsellus

Desired output:

Cluster1    NP_075076
Cluster1    AMN16433
Cluster1    YP_063711
Cluster1    KQ976470.1:66008-66163(-):Cattus_sylvestris
Cluster1    AJP07295
Cluster1    AMN15329
Cluster3    KI976478.1:66021-66123(-):Canis_lupus
Cluster3    AJP07232
Cluster3    AJP07212
Cluster3    AZ976430.1:66045-66190(+):Cavia_porsellus

2 answers

User

Answer 1 · edited 2024-10-02 00:24:28

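Based on your description, let's call the two columns name and value. First, find the names whose values contain a + or - sign. Next, find the names whose values contain neither sign. Then take the intersection of these two lists, i.e. the names that appear in both, which are the clusters that mix signed coordinate entries with plain accessions. Finally, filter the original DataFrame down to the rows whose name is in that final list.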

import pandas as pd
import io

data = """Cluster1    NP_075076
Cluster1    AMN16433
Cluster1    YP_063711
Cluster1    KQ976470.1:66008-66163(-):Cattus_sylvestris
Cluster1    AJP07295
Cluster1    AMN15329
Cluster2    YP_00999
Cluster2    YP_00989
Cluster2    YP_00971
Cluster2    YP_00988
Cluster2    AJP07295
Cluster3    KI976478.1:66021-66123(-):Canis_lupus
Cluster3    AJP07232
Cluster3    AJP07212
Cluster3    AZ976430.1:66045-66190(+):Cavia_porsellus
Cluster4    AHHYUIIY
Cluster5    AZ976490:66042-66190(-):Felis_porsellus
Cluster5    AA976490:66021-66130(+):Felis_porsellus"""

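df = pd.read_csv(io.StringIO(data), sep=r"\s+", header=None)

df.columns = ["name", "value"]

# Names (clusters) with at least one value containing "+" or "-"
list1 = df.loc[df["value"].str.contains("[+-]"), "name"].unique()
# Names with at least one value containing neither sign
list2 = df.loc[~df["value"].str.contains("[+-]"), "name"].unique()

# Clusters present in both lists, i.e. clusters that mix both kinds of values
final_list = set(list1).intersection(list2)

print(df.loc[df["name"].isin(final_list)])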
        name                                        value
0   Cluster1                                    NP_075076
1   Cluster1                                     AMN16433
2   Cluster1                                    YP_063711
3   Cluster1  KQ976470.1:66008-66163(-):Cattus_sylvestris
4   Cluster1                                     AJP07295
5   Cluster1                                     AMN15329
11  Cluster3        KI976478.1:66021-66123(-):Canis_lupus
12  Cluster3                                     AJP07232
13  Cluster3                                     AJP07212
14  Cluster3    AZ976430.1:66045-66190(+):Cavia_porsellus
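An alternative sketch of the same rule (a swapped-in technique, not the answerer's code): compute a "contains + or -" flag once, then use groupby(...).transform so every row learns whether its cluster has some, but not all, signed values. It assumes the same two-column name/value layout and uses a shortened sample of the question's data.

```python
import io

import pandas as pd

# Shortened sample of the question's data, one "cluster value" pair per line.
data = """Cluster1    NP_075076
Cluster1    KQ976470.1:66008-66163(-):Cattus_sylvestris
Cluster2    YP_00999
Cluster2    AJP07295
Cluster3    KI976478.1:66021-66123(-):Canis_lupus
Cluster3    AJP07232
Cluster4    AHHYUIIY
Cluster5    AZ976490:66042-66190(-):Felis_porsellus
Cluster5    AA976490:66021-66130(+):Felis_porsellus"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+", header=None, names=["name", "value"])

# True for values containing a '+' or '-' character.
has_sign = df["value"].str.contains("[+-]")

# Broadcast per-cluster results back to every row: the cluster has at least
# one signed value, but not only signed values.
grouped = has_sign.groupby(df["name"])
mixed = grouped.transform("any") & ~grouped.transform("all")

result = df[mixed]
print(result)
```

This avoids building intermediate name lists; the boolean mask keeps the original row order and index.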

User

Answer 2 · edited 2024-10-02 00:24:28

You can also do it in a single line with a regex, as follows:

df.columns = ["Cluster", "Name"]
df[df['Cluster'].isin(set(df[df['Name'].str.contains(r'\+|-')]['Cluster'].unique()).intersection(set(df[~df['Name'].str.contains(r'\+|-')]['Cluster'].unique())))]

The result is:

    Cluster     Name
0   Cluster1    NP_075076
1   Cluster1    AMN16433
2   Cluster1    YP_063711
3   Cluster1    KQ976470.1:66008-66163(-):Cattus_sylvestris
4   Cluster1    AJP07295
5   Cluster1    AMN15329
11  Cluster3    KI976478.1:66021-66123(-):Canis_lupus
12  Cluster3    AJP07232
13  Cluster3    AJP07212
14  Cluster3    AZ976430.1:66045-66190(+):Cavia_porsellus
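If the one-liner is hard to read, a similar result can be sketched with groupby(...).filter, which keeps a whole group when the supplied function returns True. This is a swapped-in technique, not the answerer's; it assumes the columns are named Cluster and Name as in the one-liner, and mixes_signed_and_plain is a hypothetical helper name. The character class [+-] matches the same characters as the pattern \+|-; writing the latter as a raw string (r'\+|-') avoids Python's invalid-escape warning for "\+".

```python
import io

import pandas as pd

# Shortened sample of the question's data.
data = """Cluster1    NP_075076
Cluster1    KQ976470.1:66008-66163(-):Cattus_sylvestris
Cluster2    YP_00999
Cluster3    AZ976430.1:66045-66190(+):Cavia_porsellus
Cluster3    AJP07212
Cluster4    AHHYUIIY
Cluster5    AA976490:66021-66130(+):Felis_porsellus"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+", header=None, names=["Cluster", "Name"])


def mixes_signed_and_plain(group: pd.DataFrame) -> bool:
    """Hypothetical helper: keep a cluster only if some, but not all, names contain '+' or '-'."""
    signed = group["Name"].str.contains("[+-]")
    return bool(signed.any() and not signed.all())


# filter() drops every group for which the function returns False.
result = df.groupby("Cluster").filter(mixes_signed_and_plain)
print(result)
```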
