Writing print output to CSV with NumPy

Posted 2024-06-28 19:08:18


I want to write this output to a CSV file:

['https://www.lendingclub.com/loans/personal-loans' '6.16% to 35.89%']
['https://www.lendingclub.com/loans/personal-loans' '1% to 6%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.discover.com/personal-loans/' '6.99% to 24.99%']

But when I run the code to write the output to CSV, only the last line ends up in the CSV file:

['https://www.discover.com/personal-loans/' '6.99% to 24.99%']

Could it be because my printed output isn't comma-separated? I tried to avoid introducing commas into the data by using a space as the delimiter. Let me know what you think. I'd appreciate any help here, because I'm having a very hard time reshaping this scraped data.
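(As an aside, avoiding commas isn't necessary: `csv.writer` automatically quotes any field that contains the delimiter. A minimal sketch, using made-up rate data for illustration:)

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # default delimiter is ','

# a field that contains the delimiter is quoted automatically
writer.writerow(['https://www.discover.com/personal-loans/',
                 '6.99% to 24.99%, variable'])
print(buf.getvalue())
# the second field comes out as "6.99% to 24.99%, variable"
```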

import datetime
import re
import csv
import numpy as np
import requests as r
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                'https://www.marcus.com/us/en/personal-loans',
                'https://www.discover.com/personal-loans/']

#cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    #captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        try:
            irate = str(matches[0])
            array2 = np.append(link, irate)
            print(array2)
            #with open('test.csv', "w") as csv_file:
            #    writer = csv.writer(csv_file, delimiter=' ')
            #    for line in test:
            #        writer.writerow(line)
        except IndexError:
            pass

2 Answers

pandas comes in handy when working with CSV files:

import datetime
import requests as r
from bs4 import BeautifulSoup as bs
import numpy as np
import re
import pandas as pd

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                'https://www.marcus.com/us/en/personal-loans',
                'https://www.discover.com/personal-loans/']

df = pd.DataFrame({'Link':[],'APR Rate':[]})
#cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    #captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        irate = ''
        try:
            irate = str(matches[0])
            df2 = pd.DataFrame({'Link':[link],'APR Rate':[irate]})
            df = pd.concat([df,df2],join="inner")
        except IndexError:
            pass
df.to_csv('CSV_File.csv', index=False)

I store each link and its irate value in a DataFrame df2 and concatenate it onto the parent DataFrame df. Finally, I write the parent DataFrame df to a CSV file.
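(An alternative sketch, assuming the same `link` and `irate` variables from the loop above: appending plain dicts to a list and building the DataFrame once at the end avoids calling `pd.concat` on every iteration, which copies the whole frame each time. The example rows below are illustrative, not scraped:)

```python
import pandas as pd

rows = []
# inside the scraping loop, instead of concatenating per match, you would do:
#     rows.append({'Link': link, 'APR Rate': irate})
rows.append({'Link': 'https://www.marcus.com/us/en/personal-loans',
             'APR Rate': '6.99% to 24.99%'})
rows.append({'Link': 'https://www.discover.com/personal-loans/',
             'APR Rate': '6.99% to 24.99%'})

# build the DataFrame once, after the loop, then write it out
df = pd.DataFrame(rows, columns=['Link', 'APR Rate'])
df.to_csv('CSV_File.csv', index=False)
```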

I think the problem is that you are opening the file in write mode (the "w" in open('test.csv', "w")), which means Python overwrites whatever is already in the file on each pass through the loop. I think you are looking for append mode:

# open the file before the loop, and close it after
csv_file = open("test.csv", 'a')             # change the 'w' to an 'a'
csv_file.truncate(0)                         # clear the contents of the file
writer = csv.writer(csv_file, delimiter=' ') # make the writer beforehand for efficiency

for n in paragraph:
    matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
    try:
        irate = str(matches[0])
        array2 = np.append(link, irate)
        print(array2)

        # write one row per match instead of rewriting the whole file
        writer.writerow(array2)

    except IndexError:
        pass

# close the file
csv_file.close()

Let me know if that doesn't work!
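(For completeness, the same idea can be sketched with a context manager, which closes the file automatically even if an exception is raised; `newline=''` is what the `csv` docs recommend to avoid blank lines on Windows. The `results` list below stands in for the scraped (link, rate) pairs:)

```python
import csv

# placeholder data standing in for the scraped (link, rate) pairs
results = [
    ('https://www.lendingclub.com/loans/personal-loans', '6.16% to 35.89%'),
    ('https://www.discover.com/personal-loans/', '6.99% to 24.99%'),
]

# open the file once, outside the loop, and let the context manager close it
with open('test.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=' ')
    for link, irate in results:
        writer.writerow([link, irate])
```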
