How do I use only part of a file in Python?

Published 2024-04-27 21:49:26


I've been trying to use conditions to print only part of a file, but for some reason, when I run the code in IPython, it just keeps running and never stops.

The file I'm running it on is:

Use the -noinfo option to turn off this help.
Use the -help option to get a list of command line options.

pilercr v1.06
By Robert C. Edgar

Temp1.None.fasta: 523 putative CRISPR arrays found.



DETAIL REPORT



Array 1
>contig-856000000 902 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                      Spacer
==========  ======  ======  ======  ==========    ========================================    ======
        28      40    95.0      26  TGCTTCCCCG    -.....................................T.    CTTGGTCTTGCTGGTTCTCACCGACT
        94      40    95.0      25  CTCACCGACT    .T....................................C.    GTCAGCGTGTAGCGACTGTATCTGG
       159      40   100.0          CTGTATCTGG    ........................................    TTGCTCGAA
==========  ======  ======  ======  ==========    ========================================
         3      40              25                TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG


Array 2
>contig-2277000000 590 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                   Spacer
==========  ======  ======  ======  ==========    =====================================    ======
        19      37   100.0      37  GAGGGTGAGG    .....................................    ACTTTAGGTTCAAATCCGTAGAGCTGATCTGTAATAG
        93      37   100.0      37  TCTGTAATAG    .....................................    ATTCCGTTGTTGAAATAAAGTATGAATAATATTTGGT
       167      37   100.0      35  AATATTTGGT    .....................................    TTCTCGAACGTTCCATGCTTCATAATATACCTCCT
       239      37   100.0      39  TATACCTCCT    .....................................    CTGATGAATCTTACCTCGTACAGTGATGTAGCCAGGTAA
       315      37   100.0          AGCCAGGTAA    .....................................    CGTCAGTCATG
==========  ======  ======  ======  ==========    =====================================
         5      37              37                GTAGAAATGAGACGTCCGCTGTAAAGGACATTGATAC


Array 3
>contig-2766000000 540 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                   Spacer
==========  ======  ======  ======  ==========    =====================================    ======
       172      37   100.0      29  GTTTTAGATG    .....................................    TATCGTAGCATCCCACTCCCCTGGTGTAA
       238      37   100.0      29  CCTGGTGTAA    .....................................    GTTGGACGCGCTGCTGGACGATAGGCTGC
       304      37    97.3      29  GATAGGCTGC    T....................................    ACGCCTTACAAGCTGACCCGCGCCCAATT
       370      37   100.0          GCGCCCAATT    .....................................    GTACCTTGTTC
==========  ======  ======  ======  ==========    =====================================
         4      37              29                GGCTGTAAAAAGCCACCAAAATGATGGTAATTACAAG


SUMMARY BY SIMILARITY



Array          Sequence    Position      Length  # Copies  Repeat  Spacer  +  Consensus
=====  ================  ==========  ==========  ========  ======  ======  =  =========
    5  contig-504300000          18         364         6      33      33  +  --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
    8  contig-974700000          15         229         4      32      33  -  --------------------------GTCGCC-C---CCCATGCG-GGGGCG--T-GGATTGAAAC-----
   12  contig-759000001         464         503         8      33      34  +  --------------------------GTCGCT-C---CCTTTACGGGGAGCG--T-GGATTGAAAT-----
   16  contig-293000000          77         406         6      37      36  -  -----------------------GTAGAAATGAG---TTCCCCGATGAGAAG--G-GGATTGACAC-----
   17  contig-457600000          28         416         6      37      38  -  -----------------------GTAGAAATGGG---TGTCCCGATAGATAG--G-GGATTGACAC-----
   18  contig-527300000           1         351         6      33      32  +  -----------------------ATCGCG----C---CCCCACGGGGGCGTG--T-GAATTGAAAC-----
   27  contig-132220000          21         234         4      33      34  +  --------------------------GTCGCT-C---CCTTCACGGGGAGCG--T-GGATTGAAAT-----
   36  contig-602400000          35         304         5      33      34  -  --------------------------GTCGCC-C---CCCACGTGGGGGGCG--T-GGATTGAAAC-----
   38  contig-124860000         131         232         4      32      34  +  --------------------------GTCGCA-C---CCCTCGC-GGGTGCG--T-GGATTGAAAC-----
   54  contig-979400000         138         231         4      32      34  -  --------------------------GTCGCC-C---CTCTTGCA-GGGGCG--T-GGATTGAAAC-----
   61  contig-992000005         149         693        11      30      36  -  --------------------GTTAAAATCA--GA---CC---ATTTTG--------GGATTGAAAT-----
   68  contig-103110000          37         238         4      34      34  +  -----------------------GTCGTC----C---CCCACACGGGGGACG--T-GGATTGAAATA----
   73  contig-372900000        1627        1013        16      30      35  +  ----------------------------ATTAGAATCGTACTT--ATGTAGAATTGAAAT-----------

My code so far is:

fname = 'crispr_pilrcr_1.out'
start=False
end=False
counter = 0
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Remove empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Removes separation lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            start=True
            print 'Starting'
        if s[0] == 'SUMMARY': # Only end once this section has ended
            end=True
            print 'Ending'
        while start==True or end==False: # Whilst in the section of the PILER-CR output which provides spacer sequences 
            try:
                int(s[0])
                print s[7]
            except ValueError:
                continue
    except ValueError:
        continue

I think the while loop may be the problem, but when I use and instead of or, the same endless run happens.

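As I said, I want to select the part of the file between 'DETAIL REPORT' and 'SUMMARY BY SIMILARITY', so I set up the conditions to start processing once those headings have been found.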

Any help you can offer would be great.

Thanks, Tom


2 Answers

Consider this:

fname = 'crispr_pilrcr_1.out'
counter = 0
printing = False
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Skip empty lines, which would otherwise cause errors
    if '==' in s[0]: continue # Skip separator lines, which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Start at the section that begins with 'DETAIL'
            printing = True
            print('Starting')
        elif s[0] == 'SUMMARY': # Stop once that section has ended
            printing = False
            print('Ending')
        elif printing:
            try:
                # Anything you put here will only be called for the lines
                #   between DETAIL... and SUMMARY...
                pass # Placeholder so the empty try block is valid Python
            except ValueError:
                continue
    except ValueError:
        continue

Basically, you use a variable printing, which is initialized to False, set to True when the for loop encounters 'DETAIL...', and reset to False when the for loop encounters 'SUMMARY...'.

For lines that match neither 'DETAIL...' nor 'SUMMARY...', the try block is executed if printing is True (that is, for the lines between the two headings).
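As a rough illustration of the flag approach described above, here is a minimal Python 3 sketch that runs on a shortened copy of the sample output. The function name iter_spacers, the second flag in_rows, and the SAMPLE text are made up for this example and are not part of the original answer. It assumes the value wanted is the last whitespace-separated field of each row (the Spacer column in the sample), and it uses the '====' separator lines to skip the consensus row that follows each array.

```python
# Shortened, hypothetical excerpt of the PILER-CR output shown in the question.
SAMPLE = """\
pilercr v1.06

DETAIL REPORT

Array 1
>contig-856000000 902 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat     Spacer
==========  ======  ======  ======  ==========    =======    ======
        28      40    95.0      26  TGCTTCCCCG    -....T.    CTTGGTCTTGCTGGTTCTCACCGACT
        94      40    95.0      25  CTCACCGACT    .T...C.    GTCAGCGTGTAGCGACTGTATCTGG
       159      40   100.0          CTGTATCTGG    .......    TTGCTCGAA
==========  ======  ======  ======  ==========    =======
         3      40              25                TAGTTGT

SUMMARY BY SIMILARITY

Array          Sequence    Position
=====  ================  ==========
    5  contig-504300000          18
"""


def iter_spacers(lines):
    """Yield the last field of every row between DETAIL and SUMMARY."""
    in_detail = False  # True between the 'DETAIL' and 'SUMMARY' headings
    in_rows = False    # True between the two '====' lines of one array
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == 'DETAIL':
            in_detail = True
        elif fields[0] == 'SUMMARY':
            in_detail = False
            in_rows = False
        elif in_detail and fields[0].startswith('=='):
            in_rows = not in_rows  # each separator opens or closes a table body
        elif in_detail and in_rows:
            yield fields[-1]  # assumption: the wanted value is the last column


for spacer in iter_spacers(SAMPLE.splitlines()):
    print(spacer)
```

To read from the real file instead, pass an open file object, for example `with open(fname) as f: spacers = list(iter_spacers(f))`. The flags change only when a heading or separator line is seen, and the for loop itself moves to the next line, so no inner while loop is needed.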

The problem is that you never change the values of start and end inside the while loop. So whatever values let you enter the loop are the same on every iteration.

Without completely changing your logic, I'd guess you might want to do something like this:

while start or not end:
    try:
        int(s[0])
        print(s[7])
    except ValueError:
        # Change the flags so the loop condition can become false
        end = True
        start = False