Print the previous line in Python using enumerate

3 answers

User

Answer 1 · edited 2024-10-06 13:33:22

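Here is a possible implementation, parse_file, which uses the following variables:

this_info: a dictionary with the information extracted from the current line
previous_info: the this_info from the previous iteration
start_info: the this_info from the most recent line that began a new operon ID

The desired output is not entirely clear, but you can adjust the main program (at the end) to write out the extracted fields in whatever form you need.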

def parse_file(input_file):
    """
    reads an opr file, returns a list of dictionaries with info about the operon ids
    """
    results = []
    start_info = previous_info = {}
    with open(input_file) as f:
        next(f)  # ignore first line
        for line in f:
            bits = line.split()
            if not bits:
                continue  # skip blank lines

            # dictionary containing information extracted from a
            # particular line
            this_info = {'operon_id': int(bits[0]),
                         'start': int(bits[3]),
                         'end': int(bits[4]),
                         'strand': bits[5]}

            if not previous_info:
                # first line of file
                start_info = this_info

            elif previous_info['operon_id'] != this_info['operon_id']:
                # this is the first line with NEW Operon ID,
                # so add result for previous Operon ID,  
                # of which the end line was the PREVIOUS line
                _add_result(results, start_info, previous_info)
                start_info = this_info  # start line for this ID

            # also adding a sanity check here - the strand
            # should be the same for every line of a given
            # operon ID
            if start_info["strand"] != this_info["strand"]:
                print("warning, strand info inconsistent")

            previous_info = this_info  # ready for next iteration

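        # last ID; skipped if the file had no data rows, which would
        # otherwise raise a NameError because this_info was never set
        if previous_info:
            _add_result(results, start_info, previous_info)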

    return results


def _add_result(results, start_info, end_info):
    """
    add to the results a dictionary based on start line info
    but with end line info used for the 'end' field
    """
    info = start_info.copy()
    info['end'] = end_info['end']
    results.append(info)


for result in parse_file('operonmap.opr'):
    # write out some info
    print(result['operon_id'],
          result['start'],
          result['end'],
          result['strand'])

This gives:

1132034 2052 4997 +
1132035 5123 9818 +
1132036 11421 11692 -
1132037 14089 14877 +
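If all rows for an operon ID are contiguous (as in the sample), the same start/previous bookkeeping can also be written with itertools.groupby, which hands you each run of same-ID lines at once. This is a swapped-in technique, not the code above; the helper name summarize_operons is hypothetical, and it assumes whitespace-separated columns in the order OperonID, GI, Synonym, Start, End, Strand, Length, with the header row already skipped.

```python
from itertools import groupby


def summarize_operons(lines):
    """Hypothetical helper: yield (operon_id, start, end, strand) per operon ID.

    Assumes whitespace-separated rows (header already skipped) and that all
    rows for one operon ID are contiguous, as in the sample data.
    """
    rows = (line.split() for line in lines if line.strip())  # skip blank lines
    for operon_id, group in groupby(rows, key=lambda bits: bits[0]):
        group = list(group)
        first, last = group[0], group[-1]
        # same sanity check as above: one strand per operon ID
        if len({bits[5] for bits in group}) > 1:
            print(f"warning, strand info inconsistent for {operon_id}")
        # start comes from the first row of the run, end from the last row
        yield int(operon_id), int(first[3]), int(last[4]), first[5]


sample = """\
1132034 397671780 RVBD_0002 2052 3260 + 402
1132034 397671781 RVBD_0003 3280 4437 + 385
1132034 397671782 RVBD_0004 4434 4997 + 187
1132035 397671783 RVBD_0005 5123 7267 + 714
1132035 397671784 RVBD_0006 7302 9818 + 838
1132036 397671786 RVBD_0007Ac 11421 11528 - 35
1132036 397671787 RVBD_0007Bc 11555 11692 - 45
1132037 397671792 RVBD_0012 14089 14877 + 262
"""

for row in summarize_operons(sample.splitlines()):
    print(*row)
```

With a real file you would call next(f) to skip the header and then pass f itself. Note that groupby only merges adjacent rows, so if the same ID can reappear later in the file, sort the rows by ID first.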

User

Answer 2 · edited 2024-10-06 13:33:22

Maybe try this logic? It just keeps a temporary variable tracking the last Operon ID you saw, and switches start/end when it changes:

In [21]: lines = open("test.csv").read().splitlines()

In [22]: lines
Out[22]:
['OperonID,GI,Synonym,Start,End,Strand,Length',
 '1132034,397671780,RVBD_0002,2052,3260,+,402',
 '1132034,397671781,RVBD_0003,3280,4437,+,385',
 '1132034,397671782,RVBD_0004,4434,4997,+,187',
 '1132035,397671783,RVBD_0005,5123,7267,+,714',
 '1132035,397671784,RVBD_0006,7302,9818,+,838',
 '1132036,397671786,RVBD_0007Ac,11421,11528,-,35',
 '1132036,397671787,RVBD_0007Bc,11555,11692,-,45',
 '1132037,397671792,RVBD_0012,14089,14877,+,262']

In [23]: cur_operonid = None
    ...: cur_start = cur_end = None
    ...: for line in lines[1:]:
    ...:     cols = line.split(',')  # or line.split('\t') for tab-delimited
    ...:     if cur_operonid != cols[0]:  # new OperonID reached
    ...:         if cur_operonid is not None:  # report the ID that just ended
    ...:             print(f"{cur_operonid} went from {cur_start} to {cur_end}")
    ...:         cur_operonid = cols[0]
    ...:         cur_start = cols[3]
    ...:     cur_end = cols[4]  # update on every row, so single-row IDs get an end too
    ...: if cur_operonid is not None:  # report the last OperonID after the loop
    ...:     print(f"{cur_operonid} went from {cur_start} to {cur_end}")
    ...:
1132034 went from 2052 to 4997
1132035 went from 5123 to 9818
1132036 went from 11421 to 11692
1132037 went from 14089 to 14877
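Since the question title asks about enumerate, here is a sketch of the same idea that uses the index from enumerate to look back at the previous row, instead of keeping a separate "last ID" variable. A group ends on the previous row whenever the current row's ID differs from it, and the final group ends on the last row. The helper name operon_ranges is hypothetical, and it assumes comma-separated rows with the header already removed, as in the lines list above.

```python
def operon_ranges(rows):
    """Hypothetical helper: return (operon_id, start, end) for each run of
    rows sharing an OperonID, using enumerate to look at the previous row.

    Assumes comma-separated rows without the header line.
    """
    cols = [row.split(',') for row in rows]
    ranges = []
    start = None
    for i, c in enumerate(cols):
        if i == 0 or c[0] != cols[i - 1][0]:  # first row of a new OperonID
            if i > 0:
                prev = cols[i - 1]  # previous row = last row of the old OperonID
                ranges.append((prev[0], start, prev[4]))
            start = c[3]
    if cols:  # the last row closes the final OperonID
        ranges.append((cols[-1][0], start, cols[-1][4]))
    return ranges


lines = [
    'OperonID,GI,Synonym,Start,End,Strand,Length',
    '1132034,397671780,RVBD_0002,2052,3260,+,402',
    '1132034,397671781,RVBD_0003,3280,4437,+,385',
    '1132034,397671782,RVBD_0004,4434,4997,+,187',
    '1132035,397671783,RVBD_0005,5123,7267,+,714',
    '1132035,397671784,RVBD_0006,7302,9818,+,838',
    '1132036,397671786,RVBD_0007Ac,11421,11528,-,35',
    '1132036,397671787,RVBD_0007Bc,11555,11692,-,45',
    '1132037,397671792,RVBD_0012,14089,14877,+,262',
]

for operon_id, start, end in operon_ranges(lines[1:]):
    print(f"{operon_id} went from {start} to {end}")
```

The values stay as strings here, as in the session above; wrap them in int() if you need numbers.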

User

Answer 3 · edited 2024-10-06 13:33:22

If you're willing to use pandas, this is quite easy.

I read your data into a pandas DataFrame and then dropped the other columns:

   Start    End Strand OperonID
0   2052   3260      +  1132034
1   3280   4437      +  1132034
2   4434   4997      +  1132034
3   5123   7267      +  1132035
4   7302   9818      +  1132035
5  11421  11528      -  1132036
6  11555  11692      -  1132036
7  14089  14877      +  1132037

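Then I grouped by OperonID, collected the Start, End and Strand values into lists, and created a new column holding the first Start and the last End per OperonID, plus the set of unique Strand values. You can reorganize this as needed: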

df2 = df.groupby('OperonID')[['Start', 'End', 'Strand']].agg(list)
df2['result'] = df2.apply(lambda x: (x['Start'][0], x['End'][-1], set(x['Strand'])), axis=1)

df2['result']:

OperonID
1132034      (2052, 4997, {'+'})
1132035      (5123, 9818, {'+'})
1132036    (11421, 11692, {'-'})
1132037    (14089, 14877, {'+'})
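If you don't need the intermediate lists, pandas named aggregation with the built-in 'first', 'last' and 'unique' reducers gives one column per field directly. This is a sketch that rebuilds the four-column DataFrame shown above from the sample values, and it assumes the rows within each OperonID are already in file order, since 'first' and 'last' follow row order.

```python
import pandas as pd

# Rebuild the four-column DataFrame shown above from the sample values.
df = pd.DataFrame({
    'Start': [2052, 3280, 4434, 5123, 7302, 11421, 11555, 14089],
    'End': [3260, 4437, 4997, 7267, 9818, 11528, 11692, 14877],
    'Strand': ['+', '+', '+', '+', '+', '-', '-', '+'],
    'OperonID': [1132034, 1132034, 1132034, 1132035, 1132035,
                 1132036, 1132036, 1132037],
})

# Named aggregation: new column name = (input column, reducer)
summary = df.groupby('OperonID').agg(
    Start=('Start', 'first'),     # Start of the first row in each group
    End=('End', 'last'),          # End of the last row in each group
    Strand=('Strand', 'unique'),  # array of the distinct strand values
)
print(summary)
```

Each Strand cell is an array of distinct values, so any entry with more than one element flags an operon whose strand is inconsistent.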
