Writing scraped data to a JSON file in Python

Posted on 2024-10-03 02:46:28


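I wrote a web scraping script and the scraping itself works well. I am now trying to write the scraped data to a JSON file, but the result is not what I expect.

Here is my snippet: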

def scrape_post_info(url):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url

    json_job = json.dumps(job_dict)
    with open('data.json', 'a') as f:
        json.dump(json_job, f)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    for url in urls:
        scrape_post_info(url)

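Ignore the two functions I call inside the function; they work fine.

My problem is only with writing the JSON.

At the moment I get data like the following, which is in the wrong format.

data.json looks like this: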

{
    "title": "this is title",
    "Description": " Fendi is an Italian luxury labelarin. ",
    "url": "https:/~"
}

{
    "title": " - Furrocious Elegant Style", 
    "Description": " the Italian luxare vast. ", 
    "url": "https://www.s"
}
    
{
    "title": "Rome, Fountains and Fendi Sunglasses",
    "Description": " Fendi started off as a store. ",
    "url": "https://www.~"
}
    
{
    "title": "Tipsnglasses",
    "Description": "Whether irregular orn season.", 
    "url": "https://www.sooic"
}

But it should look like this:

[
{
    "title": "this is title",
    "Description": " Fendi is an Italian luxury labelarin. ",
    "url": "https:/~"
},

{
    "title": " - Furrocious Elegant Style", 
    "Description": " the Italian luxare vast. ", 
    "url": "https://www.s"
},
    
{
    "title": "Rome, Fountains and Fendi Sunglasses",
    "Description": " Fendi started off as a store. ",
    "url": "https://www.~"
},
    
{
    "title": "Tipsnglasses",
    "Description": "Whether irregular orn season.", 
    "url": "https://www.sooic"
}

]

I don't understand why the data in the JSON file is not formatted correctly.

Can anyone help me?


1 answer
User
#1 · Posted on 2024-10-03 02:46:28

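You can try the code below to solve your problem. It writes all the records into a single JSON array, as in the expected output above (though compactly, on one line rather than indented). Here is the code: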

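import json

def scrape_post_info(url, f):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url

    # Serialize the record once; f.write() below stores the string as-is
    json_job = json.dumps(job_dict)
    # Re-read what has been written so far to decide whether a comma is needed
    f.seek(0)
    txt = f.read()
    # read() leaves the file position at the end, so the writes below append
    if txt.endswith("}"):
        f.write(",")
    f.write(json_job)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    # 'w+' creates or truncates data.json and still allows reading it back;
    # 'r+' raises FileNotFoundError when the file does not exist yet
    with open('data.json', 'w+') as f:
        f.write("[")
        for url in urls:
            scrape_post_info(url, f)
        f.write("]")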
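
As an alternative that is not part of the answer above, a common pattern is to collect every record in a Python list and write the whole list with a single json.dump call, which also gives the indented layout shown in the question. Note that the question's snippet passes the string returned by json.dumps to json.dump, so the record is encoded twice; the sketch below passes the Python objects directly. The names save_posts, collect_posts and fake_scrape are hypothetical and only for illustration; fake_scrape stands in for get_page_content and get_post_details, which are not shown in the question.

```python
import json

def save_posts(posts, path):
    """Write a list of dicts to `path` as one JSON array."""
    with open(path, 'w', encoding='utf-8') as f:
        # Pass the Python objects directly; running json.dumps first
        # and then json.dump would encode the data twice
        json.dump(posts, f, indent=4, ensure_ascii=False)

def collect_posts(urls, scrape):
    """Call scrape(url) for each URL and gather the returned dicts."""
    return [scrape(url) for url in urls]

def fake_scrape(url):
    # Hypothetical stand-in for the real scraping helpers
    return {'title': 'title for ' + url, 'Description': 'text', 'url': url}
```

If scrape_post_info were changed to return job_dict instead of writing to the file, the main block could reduce to save_posts(collect_posts(urls, scrape_post_info), 'data.json'). The trade-off is that all records are held in memory until the end, which is usually fine for a handful of URLs.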
