How to extract data from a JavaScript-based web page using Python requests and JSON?

Posted 2024-06-25 05:46:48

I'm new to Python.

I'm trying to extract general information from this page: https://www.sunnxt.com/tamil-movie/detail/8168, such as the movie name, year, and language shown on the main page.

I tried the following code, but it didn't work, because the HTML I get back is not the complete, fully generated page:

import requests
from bs4 import BeautifulSoup

url = 'https://www.sunnxt.com/telugu-movie/detail/31257'

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
}

# Download the raw HTML (requests does not execute any JavaScript on the page)
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())

1 answer

Answer 1 (anonymous user) · posted 2024-06-25 05:46:48

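The page source contains a <script type="application/ld+json"> element, and that element holds the data you are looking for.
All you have to do is read the data from there.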
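A minimal sketch of reading that element, assuming (as the answer says) that it is present in the raw HTML that requests downloads. It swaps BeautifulSoup for the standard library's html.parser and json modules so it has no third-party dependencies; the names LdJsonExtractor and extract_ld_json are hypothetical, not part of any library.

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Only start collecting for script tags whose type is application/ld+json
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def extract_ld_json(html_text):
    """Return the parsed JSON object from each ld+json script tag in html_text."""
    parser = LdJsonExtractor()
    parser.feed(html_text)
    parser.close()
    return [json.loads(block) for block in parser.blocks]
```

With requests you would pass the downloaded text, for example extract_ld_json(requests.get(url, headers=headers).text), and then pick the entry whose "@type" is "VideoObject".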

<script type="application/ld+json">
        {
            "@context": "https://schema.org",
            "@type": "VideoObject",
            "name" : "Yaaradi Nee Mohini",
            "description": "Vasu falls in love with Keerthi.When he expresses his feelings to her,she rejects him saying that her marriage has been fixed.Later,he learns that she is about to marry his close friend Cheenu. Vasu&#039;s father meets Keerthi to talk about his son, but she insults him in front of her coworkers. The very same night, Vasu&#039;s father dies and Vasu is all alone except for his close friends Cheenu and Ganesh.  Cheenu forces Vasu that he should come with him to native for his wedding with Keerthi. Will Vasu be able to hold himself together while Keerthi gets married to Cheenu?",
            "url": "https://www.sunnxt.com/tamil-movie/detail/8168",
            "embedUrl": "https://www.sunnxt.com/tamil-movie/detail/8168",
            "contentUrl": "https://www.sunnxt.com/tamil-movie/detail/8168",
            "uploadDate" : "2017-03-31T00:00:00.000Z",
            "image": ["/images/logo.png", "https://sund-images.sunnxt.com/8168/500x750_ccd804fa-f7df-4ddf-8be7-c7752ea75bf0.jpg"],
            "thumbnailUrl": "https://sund-images.sunnxt.com/8168/500x750_ccd804fa-f7df-4ddf-8be7-c7752ea75bf0.jpg",
            "duration":"P0DT2H38M0S",
            "requiresSubscription": {
                "@type": "MediaSubscription",
                "name": "SUNNXT guest user",
                "authenticator": {
                    "@type": "Organization",
                    "name": "SUNNXT"
                }
            }
        }
    </script>
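Once the script text is parsed with json.loads, the fields the question asks for can be read from the resulting dict. This is a sketch under a few assumptions: summarize_video_object is a hypothetical helper; the language is not part of this JSON-LD, so it is guessed from the first URL path segment (tamil-movie gives tamil); and uploadDate is the date the title was added to the site, which may not match the film's release year.

```python
import html
import re
from urllib.parse import urlparse


def summarize_video_object(obj):
    """Pick a few fields out of a schema.org VideoObject dict parsed from JSON-LD."""
    # Assumption: language is not in the JSON-LD, so take it from the URL path,
    # e.g. "/tamil-movie/detail/8168" -> "tamil".
    first_segment = urlparse(obj.get("url") or "").path.strip("/").split("/")[0]
    language = first_segment[: -len("-movie")] if first_segment.endswith("-movie") else None

    # uploadDate is when the title was put on the site, not necessarily the release year.
    upload_date = obj.get("uploadDate")
    upload_year = int(upload_date[:4]) if upload_date else None

    # duration is an ISO 8601 duration such as "P0DT2H38M0S"; only days/hours/minutes/seconds are handled.
    minutes = None
    match = re.fullmatch(
        r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?", obj.get("duration") or ""
    )
    if match:
        days, hours, mins, _secs = (int(g) if g else 0 for g in match.groups())
        minutes = days * 24 * 60 + hours * 60 + mins

    return {
        "name": obj.get("name"),
        "language": language,
        "upload_year": upload_year,
        "duration_minutes": minutes,
        # The description contains HTML entities such as &#039;, so unescape them.
        "description": html.unescape(obj.get("description") or ""),
    }
```

For the object shown above this yields the name "Yaaradi Nee Mohini", language "tamil", upload year 2017 and a running time of 158 minutes.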
