我有一个文本文件,其结构如下:
>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled
MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL
KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY
>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled
MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL
KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY
我需要按以下表格结构加载和转换此文件:
--------------------------------------------------------------
|>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled |
|MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL|
|KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY|
--------------------------------------------------------------
|>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled |
|MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL|
|KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY|
--------------------------------------------------------------
我试过以下代码:
dataset = pd.read_csv(path, sep = ">")
但它没有像我预期的那样工作
我怎样才能得到准确的格式
可以使用str.split(“>;”)所以每个值都有一个数组。 除非'>;'可能出现在哈希表中
相关问题 更多 >
编程相关推荐