How to parse the tags inside a specific div tag in Python

Posted 2024-10-04 11:31:01


I have tried other solutions from this site but still cannot solve my problem. Here is the markup I am working with:

<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt

I want to extract the title of every div inside this "ds" class. Thanks for your help.


1 answer
User
#1 · Posted 2024-10-04 11:31:01

Use BeautifulSoup, lxml, or a similar module instead of regex.


BeautifulSoup

from bs4 import BeautifulSoup

text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'

soup = BeautifulSoup(text, 'html.parser')

for item in soup.select('.ds div[title]'):
    print(item['title'])

# or as a list comprehension

titles = [item['title'] for item in soup.select('.ds div[title]')]
print(titles)

lxml

import lxml.html  # note: cssselect() also needs the separate cssselect package installed

text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'

soup = lxml.html.fromstring(text)

for item in soup.cssselect('.ds div[title]'):
    print(item.attrib['title'])

# or as a list comprehension

titles = [item.attrib['title'] for item in soup.cssselect('.ds div[title]')]
print(titles)

PyQuery

import pyquery

text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'

soup = pyquery.PyQuery(text)

for item in soup('.ds div[title]'):
    print(item.attrib['title'])

# or as a list comprehension

titles = [item.attrib['title'] for item in soup('.ds div[title]')]
print(titles)

parsel (used by Scrapy's Selectors)

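import parsel

text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'

sel = parsel.Selector(text)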

for item in sel.css('.ds div[title]'):
    print(item.attrib['title'])

titles = [item.attrib['title'] for item in sel.css('.ds div[title]')]
print(titles)
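
html.parser (standard library)

If installing a third-party package is not an option, the same idea can probably be sketched with the standard library's html.parser module. This is a minimal sketch and not part of the original answer: the DivTitleCollector class and its container_class parameter are hypothetical names chosen for illustration, and it assumes the container is itself a div and tracks only div nesting rather than implementing real CSS selectors.

```python
from html.parser import HTMLParser


class DivTitleCollector(HTMLParser):
    """Collect title attributes of <div> tags nested inside a container <div>.

    Hypothetical helper for illustration; it only understands <div> nesting.
    """

    def __init__(self, container_class="ds"):
        super().__init__()
        self.container_class = container_class
        self.depth = 0  # open <div> count inside the container; 0 means outside
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if self.depth > 0:
            # A <div> nested somewhere inside the container.
            self.depth += 1
            if attrs.get("title") is not None:
                self.titles.append(attrs["title"])
        elif self.container_class in (attrs.get("class") or "").split():
            # Entering the container itself; its own title is not collected.
            self.depth = 1

    def handle_endtag(self, tag):
        # Leave one nesting level for every closing </div> seen inside the container.
        if tag == "div" and self.depth > 0:
            self.depth -= 1


text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'

collector = DivTitleCollector()
collector.feed(text)
collector.close()
print(collector.titles)  # ['Today', 'Pazartesi']
```

Because the sample markup never closes its div tags, the depth counter simply stays open until the end of the input, which is enough for this fragment. For real pages with mixed or broken nesting, the libraries above are likely the more robust choice.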
