用两个函数制作字典(Python)

2024-10-04 05:29:26 发布

您现在位置:Python中文网/ 问答频道 /正文

我需要从两个函数(ZoekAccesieCode+zoekorganime)中创建一个字典。函数ZoekAccesieCode返回类似于“Q6GZX2”的行和类似于“frogvirus3(isolategoorha)”的zoekorgisme。ZoekAccesieCode需要是关键,zoekorganime需要是价值。这是我的密码:

import re
file = open("ploop.txt")
text = file.read()
file.close()

def main():
    hits = VindHits()
    accesie = ZoekAccesieCode(hits)
    organisme = ZoekOrganisme(hits, accesie)
    MaakDict(accesie, organisme)

def VindHits():
    eiwitten = text.split("\n\n")[1:]
    eiwitHits = []

    for eiwit in eiwitten:
        if re.search(r"[AG].{4}GK[ST]", eiwit):
            eiwitHits.append(eiwit)
    return(eiwitHits)

def ZoekAccesieCode(hits):
    for eiwit in hits:
        accesieCode = re.findall(r">sp\|(.{6})", eiwit)[0]
    return accesieCode

def ZoekOrganisme(hits, accesie):
    for eiwit in hits:
        organisme = re.findall(r"\n.+?\[(.+?)\]", eiwit)[0]
    return organisme


def MaakDict(accesie, organisme):

main()

文件中的一些示例数据:

    Hits for PS00017|ATP_GTP_A (pattern) ATP/GTP-binding site motif A (P-loop) :  [occurs frequently]
   Pattern: [AG]-x(4)-G-K-[ST]
   Approximate number of expected random matches in ~ 100'000 sequences (50'000'000 residues): 3371


>sp|Q6GZX2|003R_FRG3G  (438 aa)
Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MARPLLGKTSSVRRRLESLSACSIFFFLRKFCQKMASLVFLNSPVYQMSNILLTERRQVDRAMGGSDDDGVMVVALSPSD
FKTVLGSALLAVERDMVHVVPKYLQTPGILHDMLVLLTPIFGEALSVDMSGATDVMVQQIATAGFVDVDPLHSSVSWKDN
VSCPVALLAVSNAVRTMMGQPCQVTLIIDVGTQNILRDLVNLPVEMSGDLQVMAYTKDPLGKVPAVGVSVFDSGSVQKGD
AHSVGAPDGLVSFHTHPVSSAVELNYHAGWPSNVDMSSLLTMKNLMHVVVAEEGLWTMARTLSMQRLTKVLTDAEKDVMR
AAAFNLFLPLNELRVMGTKDSNNKSLKTYFEVFETFTIGALMKHSGVTPTAFVDRRWLDNTIYHMGFIPWGRDMRFVVEY
DLDGTNPFLNTVPTLMSVKRKAKIQEMFDNMVSRMVTS
      2 - 9:          ArpllGKT


>sp|Q6GZX1|004R_FRG3G  (60 aa)
Uncharacterized protein 004R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
      33 - 40:        GyyydGKT


>sp|Q6GZW0|015R_FRG3G  (322 aa)
Uncharacterized protein 015R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MEQVPIKEMRLSDLRPNNKSIDTDLGGTKLVVIGKPGSGKSTLIKALLDSKRHIIPCAVVISGSEEANGFYKGVVPDLFI
YHQFSPSIIDRIHRRQVKAKAEMGSKKSWLLVVIDDCMDNAKMFNDKEVRALFKNGRHWNVLVVIANQYVMDLTPDLRSS
VDGVFLFRENNVTYRDKTYANFASVVPKKLYPTVMETVCQNYRCMFIDNTKATDNWHDSVFWYKAPYSKSAVAPFGARSY
WKYACSKTGEEMPAVFDNVKILGDLLLKELPEAGEALVTYGGKDGPSDNEDGPSDDEDGPSDDEEGLSKDGVSEYYQSDL
DD
      34 - 41:        GkpgsGKS


>sp|P32234|128UP_DROME  (368 aa)
GTP-binding protein 128up.  [Drosophila melanogaster (Fruit fly)]
MSTILEKISAIESEMARTQKNKATSAHLGLLKAKLAKLRRELISPKGGGGGTGEAGFEVAKTGDARVGFVGFPSVGKSTL
LSNLAGVYSEVAAYEFTTLTTVPGCIKYKGAKIQLLDLPGIIEGAKDGKGRGRQVIAVARTCNLIFMVLDCLKPLGHKKL
LEHELEGFGIRLNKKPPNIYYKRKDKGGINLNSMVPQSELDTDLVKTILSEYKIHNADITLRYDATSDDLIDVIEGNRIY
IPCIYLLNKIDQISIEELDVIYKIPHCVPISAHHHWNFDDLLELMWEYLRLQRIYTKPKGQLPDYNSPVVLHNERTSIED
FCNKLHRSIAKEFKYALVWGSSVKHQPQKVGIEHVLNDEDVVQIVKKV
      71 - 78:        GfpsvGKS


>sp|P05080|194K_TRVSY  (1707 aa)
Replicase large subunit.  [Tobacco rattle virus (strain SYM)]
MANGNFKLSQLLNVDEMSAEQRSHFFDLMLTKPDCEIGQMMQRVVVDKVDDMIRERKTKDPVIVHEVLSQKEQNKLMEIY
PEFNIVFKDDKNMVHGFAAAERKLQALLLLDRVPALQEVDDIGGQWSFWVTRGEKRIHSCCPNLDIRDDQREISRQIFLT
AIGDQARSGKRQMSENELWMYDQFRKNIAAPNAVRCNNTYQGCTCRGFSDGKKKGAQYAIALHSLYDFKLKDLMATMVEK
KTKVVHAAMLFAPESMLVDEGPLPSVDGYYMKKNGKIYFGFEKDPSFSYIHDWEEYKKYLLGKPVSYQGNVFYFEPWQVR
GDTMLFSIYRIAGVPRRSLSSQEYYRRIYISRWENMVVVPIFDLVESTRELVKKDLFVEKQFMDKCLDYIARLSDQQLTI
SNVKSYLSSNNWVLFINGAAVKNKQSVDSRDLQLLAQTLLVKEQVARPVMRELREAILTETKPITSLTDVLGLISRKLWK
QFANKIAVGGFVGMVGTLIGFYPKKVLTWAKDTPNGPELCYENSHKTKVIVFLSVVYAIGGITLMRRDIRDGLVKKLCDM
FDIKRGAHVLDVENPCRYYEINDFFSSLYSASESGETVLPDLSEVKAKSDKLLQQKKEIADEFLSAKFSNYSGSSVRTSP
PSVVGSSRSGLGLLLEDSNVLTQARVGVSRKVDDEEIMEQFLSGLIDTEAEIDEVVSAFSAECERGETSGTKVLCKPLTP
PGFENVLPAVKPLVSKGKTVKRVDYFQVMGGERLPKRPVVSGDNSVDARREFLYYLDAERVAQNDEIMSLYRDYSRGVIR
TGGQNYPHGLGVWDVEMKNWCIRPVVTEHAYVFQPDKRMDDWSGYLEVAVWERGMLVNDFAVERMSDYVIVCDQTYLCNN
RLILDNLSALDLGPVNCSFELVDGVPGCGKSTMIVNSANPCVDVVLSTGRAATDDLIERFASKGFPCKLKRRVKTVDSFL
MHCVDGSLTGDVLHFDEALMAHAGMVYFCAQIAGAKRCICQGDQNQISFKPRVSQVDLRFSSLVGKFDIVTEKRETYRSP
ADVAAVLNKYYTGDVRTHNATANSMTVRKIVSKEQVSLKPGAQYITFLQSEKKELVNLLALRKVAAKVSTVHESQGETFK
DVVLVRTKPTDDSIARGREYLIVALSRHTQSLVYETVKEDDVSKEIRESAALTKAALARFFVTETVLXRFRSRFDVFRHH
EGPCAVPDSGTITDLEMWYDALFPGNSLRDSSLDGYLVATTDCNLRLDNVTIKSGNWKDKFAEKETFLKPVIRTAMPDKR
KTTQLESLLALQKRNQAAPDLQENVHATVLIEETMKKLKSVVYDVGKIRADPIVNRAQMERWWRNQSTAVQAKVVADVRE
LHEIDYSSYMYMIKSDVKPKTDLTPQFEYSALQTVVYHEKLINSLFGPIFKEINERKLDAMQPHFVFNTRMTSSDLNDRV
KFLNTEAAYDFVEIDMSKFDKSANRFHLQLQLEIYRLFGLDEWAAFLWEVSHTQTTVRDIQNGMMAHIWYQQKSGDADTY
NANSDRTLCALLSELPLEKAVMVTYGGDDSLIAFPRGTQFVDPCPKLATKWNFECKIFKYDVPMFCGKFLLKTSSCYEFV
PDPVKVLTKLGKKSIKDVQHLAEIYISLNDSNRALGNYMVVSKLSESVSDRYLYKGDSVHALCALWKHIKSFTALCTLFR
DENDKELNPAKVDWKKAQRAVSNFYDW
      904 - 911:      GvpgcGKS


>sp|P03589|1A_AMVLE  (1126 aa)
Replication protein 1a.  [Alfalfa mosaic virus (strain 425 / isolate Leiden)]
MNADAQSTDASLSMREPLSHASIQEMLRRVVEKQAADDTTAIGKVFSEAGRAYAQDALPSDKGEVLKISFSLDATQQNIL
RANFPGRRTVFSNSSSSSHCFAAAHRLLETDFVYRCFGNTVDSIIDLGGNFVSHMKVKRHNVHCCCPILDARDGARLTER
ILSLKSYVRKHPEIVGEADYCMDTFQKCSRRADYAFAIHSTSDLDVGELACSLDQKGVMKFICTMMVDADMLIHNEGEIP
NFNVRWEIDRKKDLIHFDFIDEPNLGYSHRFSLLKHYLTYNAVDLGHAAYRIERKQDFGGVMVIDLTYSLGFVPKMPHSN
GRSCAWYNRVKGQMVVHTVNEGYYHHSYQTAVRRKVLVDKKVLTRVTEVAFRQFRPNADAHSAIQSIATMLSSSTNHTII
GGVTLISGKPLSPDDYIPVATTIYYRVKKLYNAIPEMLSLLDKGERLSTDAVLKGSEGPMWYSGPTFLSALDKVNVPGDF
VAKALLSLPKRDLKSLFSRSATSHSERTPVRDESPIRCTDGVFYPIRMLLKCLGSDKFESVTITDPRSNTETTVDLYQSF
QKKIETVFSFILGKIDGPSPLISDPVYFQSLEDVYYAEWHQGNAIDASNYARTLLDDIRKQKEESLKAKAKEVEDAQKLN
RAILQVHAYLEAHPDGGKIEGLGLSSQFIAKIPELAIPTPKPLPEFEKNAETGEILRINPHSDAILEAIDYLKSTSANSI
ITLNKLGDHCQWTTKGLDVVWAGDDKRRAFIPKKNTWVGPTARSYPLAKYERAMSKDGYVTLRWDGEVLDANCVRSLSQY
EIVFVDQSCVFASAEAIIPSLEKALGLEAHFSVTIVDGVAGCGKTTNIKQIARSSGRDVDLILTSNRSSADELKETIDCS
PLTKLHYIRTCDSYLMSASAVKAQRLIFDECFLQHAGLVYAAATLAGCSEVIGFGDTEQIPFVSRNPSFVFRHHKLTGKV
ERKLITWRSPADATYCLEKYFYKNKKPVKTNSRVLRSIEVVPINSPVSVERNTNALYLCHTQAEKAVLKAQTHLKGCDNI
FTTHEAQGKTFDNVYFCRLTRTSTSLATGRDPINGPCNGLVALSRHKKTFKYFTIAHDSDDVIYNACRDAGNTDDSILAR
SYNHNF
      838 - 845:      GvagcGKT


>sp|Q9AT00|TGD3_ARATH  (345 aa)
Protein TRIGALACTOSYLDIACYLGLYCEROL 3, chloroplastic.  [Arabidopsis thaliana (Mouse-ear cress)]
MLSLSCSSSSSSLLPPSLHYHGSSSVQSIVVPRRSLISFRRKVSCCCIAPPQNLDNDATKFDSLTKSGGGMCKERGLEND
SDVLIECRDVYKSFGEKHILKGVSFKIRHGEAVGVIGPSGTGKSTILKIMAGLLAPDKGEVYIRGKKRAGLISDEEISGL
RIGLVFQSAALFDSLSVRENVGFLLYERSKMSENQISELVTQTLAAVGLKGVENRLPSELSGGMKKRVALARSLIFDTTK
EVIEPEVLLYDEPTAGLDPIASTVVEDLIRSVHMTDEDAVGKPGKIASYLVVTHQHSTIQRAVDRLLFLYEGKIVWQGMT
HEFTTSTNPIVQQFATGSLDGPIRY
      117 - 124:      GpsgtGKS

有人能帮我找到正确的密码吗?你知道吗


Tags: inrefordefspfileaahits
3条回答

这段代码的前提条件是accessie中的值可以是键(它们是字符串或unicode),accessie和organime是长度相同的列表,否则必须使用切片使它们具有相同的长度

dict(zip(accessie, organisme))
dictionary_name = {accesie:organisme}

你的代码几乎不可读。你知道吗

def make_dict(a, b):
    return {a:b}

相关问题 更多 >