Merging multiple dataframes while keeping only one set of column names

Posted 2024-10-02 08:26:29

I am using a package that, for each element in a list, prints the following lines to a file:

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A20CSC4   A0A20CSC4_1PHYC unreviewed  Uncharacterized protein OlL7_200    Ostreococcus lucimarinus virus 7

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A0P0DZ8   A0A0PCDZ8_9PLYC unreviewed  Uncharacterized protein OlL7_159    Ostreococcus lucimarinus virus 7

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A1P0BY71  A0A1P0BY71_9PHYC    unreviewed  Uncharacterized protein OlL7_111c   Ostreococcus lucimarinus virus 7

... x1000

So when I open this file with pandas, I get a dataframe like this:

>>> blast
        Entry        Entry name      Status            Protein names  Gene names
0   A0A20CSC4   A0A20CSC4_1PHYC  unreviewed  Uncharacterized protein    OlL7_200
1         NaN               NaN         NaN                      NaN         NaN
2   A0A0P0DZ8   A0A0PCDZ8_9PLYC  unreviewed  Uncharacterized protein    OlL7_159
3         NaN               NaN         NaN                      NaN         NaN
4       Entry        Entry name      Status            Protein names  Gene names
5  A0A1P0BY71  A0A1P0BY71_9PHYC  unreviewed  Uncharacterized protein   OlL7_111c

I would like a dataframe that keeps the column names only once:

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A20CSC4   A0A20CSC4_1PHYC unreviewed  Uncharacterized protein OlL7_200    Ostreococcus lucimarinus virus 7
A0A0P0DZ8   A0A0PCDZ8_9PLYC unreviewed  Uncharacterized protein OlL7_159    Ostreococcus lucimarinus virus 7
A0A1P0BY71  A0A1P0BY71_9PHYC    unreviewed  Uncharacterized protein OlL7_111c   Ostreococcus lucimarinus virus 7

Do you know a way to do this with pandas in Python 3?

Updated dataframe:

        Entry        Entry name      Status            Protein names  Gene names
0   A0A20CSC4   A0A20CSC4_1PHYC  unreviewed  Uncharacterized protein    OlL7_200
2   A0A0P0DZ8   A0A0PCDZ8_9PLYC  unreviewed  Uncharacterized protein    OlL7_159
4       Entry        Entry name      Status            Protein names  Gene names
5  A0A1P0BY71  A0A1P0BY71_9PHYC  unreviewed  Uncharacterized protein   OlL7_111c

Row 4 still contains the column names.


1 answer
User
#1 · Posted 2024-10-02 08:26:29

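One way to get this kind of output is to drop the NaN rows first:

blast.dropna(inplace=True)

Then remove the rows where the header line was read in again as data, i.e. where the Entry column holds the literal string 'Entry':

blast.drop(blast[blast['Entry'] == 'Entry'].index, inplace=True)

This should work.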
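The two steps can be sketched end to end as follows. This is a minimal illustration, not the asker's actual pipeline: the dataframe is built by hand to mimic the `blast` output shown in the question (the real one would come from reading the file), and it makes two small changes to the answer's code, both assumptions on my part. It uses `dropna(how="all")` so that only fully empty rows are removed, and it reassigns instead of using `inplace=True`. The final `reset_index` is an optional extra that renumbers the rows.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the `blast` dataframe from the question
# (an assumption: the real one is produced by reading the file).
columns = ["Entry", "Entry name", "Status", "Protein names", "Gene names"]
rows = [
    ["A0A20CSC4", "A0A20CSC4_1PHYC", "unreviewed", "Uncharacterized protein", "OlL7_200"],
    [np.nan] * 5,   # blank line read as an all-NaN row
    ["A0A0P0DZ8", "A0A0PCDZ8_9PLYC", "unreviewed", "Uncharacterized protein", "OlL7_159"],
    [np.nan] * 5,   # blank line read as an all-NaN row
    columns,        # repeated header line read as a data row
    ["A0A1P0BY71", "A0A1P0BY71_9PHYC", "unreviewed", "Uncharacterized protein", "OlL7_111c"],
]
blast = pd.DataFrame(rows, columns=columns)

# Step 1: drop rows in which every value is NaN.
# how="all" (my choice) keeps rows that are only partially empty.
blast = blast.dropna(how="all")

# Step 2: drop rows that repeat the header, identified by the literal
# string "Entry" appearing in the "Entry" column.
blast = blast[blast["Entry"] != "Entry"]

# Optional: renumber the remaining rows from 0.
blast = blast.reset_index(drop=True)

print(blast)
```

If some real records legitimately have empty fields, `how="all"` avoids discarding them, whereas a plain `dropna()` would remove any row with at least one missing value.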
