Parsing a CSV file and modifying columns

name,time,Operations
Cassandra,2015-10-06T15:07:22.333662984Z,INSERT
Cassandra,2015-10-06T15:07:24.334536781Z,INSERT
Cassandra,2015-10-06T15:07:27.339662984Z,READ
Cassandra,2015-10-06T15:07:28.344493608Z,READ
Cassandra,2015-10-06T15:07:28.345221189Z,READ
Cassandra,2015-10-06T15:07:29.345623750Z,READ
Cassandra,2015-10-06T15:07:31.352725607Z,UPDATE
Cassandra,2015-10-06T15:07:33.360272493Z,UPDATE
Cassandra,2015-10-06T15:07:38.366408708Z,UPDATE

1 answer

User

#1 · Posted on 2024-05-08 13:40:07

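From your example, it looks like you can (roughly) guarantee that the first and last entries of a given kind in the Operations column are the start and stop times. If you can't guarantee that, it gets slightly more complicated, so let's assume you can't and write something more robust.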

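We can assume that the data in the CSV is complete. If an entry for a particular operation is missing, there is nothing we can do about it. We also need to read the timestamps, which can be done with the dateutil.parser module.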

So we can first set up a small dictionary to keep track of our values, and then write a function that populates it, taking one row at a time.

import dateutil.parser

ops = dict()

def update_ops(opsdict, row):

    # first get the timestamp and op name in a useable format
    timestamp = dateutil.parser.parse(row[1])
    op_name = row[2]

    ## now populate, or update the dictionary
    if op_name not in opsdict:
        # sets a new dict entry with the operation's timestamp.
        # since we don't know what the start time and end time 
        # is yet, for the moment set them both.
        opsdict[op_name] = { 'start_time': timestamp,
                            'end_time': timestamp }
    else:
        # now evaluate the current timestamp against each start_time
        # and end_time value. Update as needed.
        if opsdict[op_name]['start_time'] > timestamp:
            opsdict[op_name]['start_time'] = timestamp
        if opsdict[op_name]['end_time'] < timestamp:
            opsdict[op_name]['end_time'] = timestamp

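Now that we have a function that tracks the earliest and latest timestamp for each operation, run a CSV file reader over the input and populate ops. Once that is done, we can generate a new CSV file from the contents of the dictionary.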
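The read-and-write step described above could look roughly like the sketch below. It is an illustration under several assumptions, not the answer's original code: the output layout (Operations, start_time, end_time) and the names parse_timestamp and summarize are hypothetical, and dateutil.parser.parse is swapped for a small standard-library helper so the block runs without third-party packages. With dateutil installed, dateutil.parser.parse(row[1]) could be used in its place. The helper truncates the nanosecond timestamps to microseconds, since datetime stores nothing finer.

```python
import csv
import io
from datetime import datetime, timezone


def parse_timestamp(text):
    """Stdlib stand-in for dateutil.parser.parse.

    Assumes the 'YYYY-MM-DDTHH:MM:SS.fffffffffZ' layout from the sample data.
    """
    head, _, fraction = text.rstrip("Z").partition(".")
    micros = (fraction + "000000")[:6]  # datetime keeps at most microseconds
    parsed = datetime.strptime(head + "." + micros, "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def update_ops(opsdict, row):
    """Track the earliest and latest timestamp seen for each operation."""
    timestamp = parse_timestamp(row[1])
    entry = opsdict.setdefault(
        row[2], {"start_time": timestamp, "end_time": timestamp}
    )
    entry["start_time"] = min(entry["start_time"], timestamp)
    entry["end_time"] = max(entry["end_time"], timestamp)


def summarize(infile, outfile):
    """Read name,time,Operations rows; write one start/end row per operation."""
    reader = csv.reader(infile)
    next(reader, None)  # skip the header row
    ops = {}
    for row in reader:
        if row:  # ignore blank lines
            update_ops(ops, row)

    writer = csv.writer(outfile)
    writer.writerow(["Operations", "start_time", "end_time"])  # hypothetical layout
    for op_name, times in ops.items():
        writer.writerow(
            [op_name, times["start_time"].isoformat(), times["end_time"].isoformat()]
        )
    return ops


# In-memory demo using a few rows from the question's sample data.
sample = """name,time,Operations
Cassandra,2015-10-06T15:07:22.333662984Z,INSERT
Cassandra,2015-10-06T15:07:24.334536781Z,INSERT
Cassandra,2015-10-06T15:07:27.339662984Z,READ
Cassandra,2015-10-06T15:07:29.345623750Z,READ
"""
result_file = io.StringIO()
ops = summarize(io.StringIO(sample), result_file)
print(result_file.getvalue())
```

For files on disk, the same function can be given two handles opened with newline="" (as the csv module recommends), one for reading and one for writing; the file names are up to you.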


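And that's it! I've tried to make this as clear as possible. You could probably collapse much of it into fewer, smarter steps (for example, reading from one CSV and writing straight out to another). But if you follow the KISS principle, it will be easier to come back to this later, read it, and learn from it.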
