Tracking progress through a namedtuple CSV loop in Python

Posted 2024-09-24 22:20:17


Using collections.namedtuple, the Python code below works through a CSV file of record identifiers from a database (integers in the column ContentItemId). An example record is https://api.aucklandmuseum.com/id/library/ephemera/21291.

The aim is to check the HTTP status for each given id and write it to disk:

import requests
from collections import namedtuple
import csv

with open('in.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    #Create output file
    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        count = 1
        for r in records:
            id   = r.ContentItemId
            url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
            req  = requests.get(url, allow_redirects=False)
            code = req.status_code
            w.writerow([id, code])

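How can I print the code's progress through the second loop to the console (ideally at the 25%, 50% and 75% marks)? Also, if I add an unindented print("Complete") at the bottom, will that line be reached?

Thanks in advance.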


Edit: Thanks for the help. My (working!) code now looks like this:

import csv
import requests
import pandas
import time
from collections import namedtuple
from tqdm import tqdm

with open('active_true_pub_no.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        num = len(records)
        print("Checking {} records...\n".format(num))

        with tqdm(total=num, bar_format="{percentage:3.0f}% {bar} [{n_fmt}/{total_fmt}]  ", ncols=64) as pbar:
            for r in records:
                pbar.update(1)
                id   = r.ContentItemId
                url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
                req  = requests.get(url, allow_redirects=False)
                code = req.status_code
                w.writerow([id, code])
                # time.sleep(.25)

print ('\nSummary: ')
df = pandas.read_csv("out.csv")
print(df['code'].value_counts())

I use pandas' value_counts to summarise the final results.
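
If pandas is not available, a similar tally can be sketched with the standard library. This is a stand-in rather than the code used above: it swaps in collections.Counter, reads an in-memory CSV with the same ContentItemId,code header, and the sample rows and the name count_codes are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for out.csv; the rows are invented.
sample = io.StringIO("ContentItemId,code\n21200,200\n21201,404\n21202,200\n")


def count_codes(handle):
    """Tally the 'code' column of a CSV file-like object."""
    reader = csv.DictReader(handle)
    return Counter(row["code"] for row in reader)


counts = count_codes(sample)
# most_common() lists the most frequent codes first
for code, n in counts.most_common():
    print(code, n)
```

To use it on the real output, open out.csv and pass the file handle to count_codes instead of the sample.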


3 Answers

To get a progress bar, use tqdm:

Data (from in.csv):

ContentItemId
21200
21201
21202
21203
21204
21205
21206
...
21296
21297
21298
21299
21300

Code:

from collections import namedtuple
import csv
import requests
from tqdm import tqdm


with open('in.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    #Create output file
    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        count = 1

        with tqdm(total=len(records)) as pbar:
            for r in records:
                pbar.update(1)
                id   = r.ContentItemId
                url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
                req  = requests.get(url, allow_redirects=False)
                code = req.status_code
                w.writerow([id, code])
    print('Complete!')
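  • Note the with tqdm(total=len(records)) as pbar: line added before the for loop.
  • When run from the console, a progress bar appears showing the percentage complete.
  • The bar also shows a count such as 21/101: records processed so far out of the length of the records list.
    • tqdm provides both a percentage progress bar and a complete/total count.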
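
As a variation on the code above, tqdm can also wrap the iterable directly, so no manual pbar.update(1) call is needed. The sketch below assumes tqdm is installed and falls back to plain iteration if it is not; check_all and the check callable are hypothetical names, with check standing in for the HTTP request.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: no bar, same loop.
    def tqdm(iterable, **kwargs):
        return iterable


def check_all(ids, check):
    """Call check(item_id) for every id, showing a progress bar if possible."""
    results = []
    for item_id in tqdm(ids, total=len(ids)):
        results.append((item_id, check(item_id)))
    return results


# Stand-in for requests.get(...).status_code; always reports 200 here.
results = check_all(["21200", "21201", "21202"], check=lambda item_id: 200)
print(results)
```
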
# sudo pip3 install tqdm

import time
import tqdm

records = ['a', 'b', 'c', 'd', 'e']

with tqdm.tqdm(smoothing=0.1, total=len(records)) as pbar:
    for k, record in enumerate(records):
        time.sleep(1)
        pbar.update()



It's all relative, so let's do some general math. :)

# sudo pip3 install tqdm

import time
import tqdm

total = 5000
_number_left = 5000
with tqdm.tqdm(smoothing=0.1, total=total) as pbar:
    relatively_done = 0
    relatively_done_sum = 0
    for k in range(0, 5000, 2):  # 0, 2, 4, ... 4998
        time.sleep(0.0005)
        _number_left -= 2  # input from some worker process for example
        absolutely_done = total - _number_left
        relatively_done = absolutely_done - relatively_done_sum
        relatively_done_sum += relatively_done
        pbar.update(relatively_done)
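
The bookkeeping above boils down to turning a running total into per-step increments, because pbar.update() takes the amount to add rather than the absolute position. Below is a minimal sketch of just that arithmetic, with no tqdm required; the function name increments and the sample totals are made up for illustration.

```python
def increments(done_counts):
    """Yield the increase between successive cumulative 'done' counts."""
    previous = 0
    for done in done_counts:
        yield done - previous
        previous = done


# Cumulative totals as they might be reported by a worker process
reported = [2, 4, 4, 10]
steps = list(increments(reported))
# Each value in steps is what would be passed to pbar.update()
print(steps)
```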


I assume you mean the percentage of records processed so far. You can also do the print("Complete") inside the loop.

count = 0
for r in records:
    id   = r.ContentItemId
    url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
    req  = requests.get(url, allow_redirects=False)
    code = req.status_code
    w.writerow([id, code])
    count += 1
    if count == len(records):
        print("Complete")
    # round() handles lists not divisible by 4; max(1, ...) avoids a
    # zero step (and a ZeroDivisionError) for very short lists
    elif count % max(1, round(len(records) / 4)) == 0:
        # Integer percentage, rounded to the nearest whole percent
        progress = round(100 * count / len(records))
        print("{}%".format(progress))
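
For the 25%, 50% and 75% marks specifically, one option is to precompute the record counts at which to print, which avoids the modulo arithmetic. This is a minimal sketch under that assumption; milestone_counts, run and process are hypothetical names, and the lambda stands in for the HTTP request.

```python
import math


def milestone_counts(total, fractions=(0.25, 0.5, 0.75)):
    """Map each record count at which to report to its percentage label."""
    return {math.ceil(total * f): int(f * 100) for f in fractions}


def run(records, process):
    """Process every record and return the progress messages printed."""
    marks = milestone_counts(len(records))
    messages = []
    for count, record in enumerate(records, start=1):
        process(record)
        if count in marks:
            messages.append("{}%".format(marks[count]))
            print(messages[-1])
    # Reached once the loop finishes without raising an exception
    messages.append("Complete")
    print("Complete")
    return messages


log = run(list(range(101)), process=lambda record: None)
```

Placing print("Complete") after the loop rather than inside it also covers the second part of the question: the line runs as soon as the loop ends, provided nothing in the loop raises.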
