Scraping with Selenium and BeautifulSoup

Posted 2024-10-08 18:21:49

I'm looking for some help scraping with Selenium in Python. You need a paid account to view this page, so I can't provide a reproducible example.

The page I'm trying to scrape

I'm trying to extract the data behind the blue dots and black arrows. The data is in this HTML:

<svg viewBox="0 0 105 68" class="video-summaries__field-arrows" preserveAspectRatio="none" xmlns="http://www.w3.org/2000/svg">
   <defs>
      <marker fill="#000" id="default_arrow" markerWidth="5" markerHeight="4" orient="auto" refX="5" refY="2" stroke="none">
         <polygon points="0 0, 5 2, 0 4"></polygon>
      </marker>
      <marker fill="#0033ff" id="hover_arrow" markerWidth="2.9" markerHeight="2.4" orient="auto" refX="2.5" refY="1.2" stroke="none">
         <polygon points="0 0, 2.9 1.2, 0 2.4"></polygon>
      </marker>
   </defs>
   <path class="videosummaries-arrows" d="M52.5 35.1 37.6 33.3" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_0)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_0" x1="52.5" x2="37.6" y1="35.1" y2="33.3">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>
   <path class="videosummaries-arrows" d="M38.2 34.7 76.6 62" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_1)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_1" x1="38.2" x2="76.6" y1="34.7" y2="62">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>
   <path class="videosummaries-arrows" d="M61.6 67.8 36.3 63.9" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_2)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_2" x1="61.6" x2="36.3" y1="67.8" y2="63.9">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>
   <path class="videosummaries-arrows" d="M36.3 63.9 36.5 26.700000000000003" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_3)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_3" x1="36.3" x2="36.5" y1="63.9" y2="26.700000000000003">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>

Specifically, I want to scrape the x1, x2, y1 and y2 values from the linearGradient tags.

I get the page source by running:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("start-maximized")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Users\James\OneDrive\Desktop\webdriver\chromedriver.exe')
driver.get('https://football.instatscout.com/teams/9487/video')
print("Page Title is : %s" %driver.title)
driver.find_element_by_name('email').send_keys('')
driver.find_element_by_name('pass').send_keys('')
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "hRAqIl", " " ))]').click() 
driver.implicitly_wait(10)
#driver.find_element_by_css_selector('.dropdown-btn:nth-child(12) .video-summaries__checkbox_red ').click()
driver.find_element_by_css_selector('.dropdown-btn:nth-child(12) > .video-summaries__checkbox').click()
driver.implicitly_wait(10)
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "ixmoFk", " " ))]').click()
driver.implicitly_wait(10)
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox-column-inner", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox-column-row", " " )) and (((count(preceding-sibling::*) + 1) = 10) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox", " " ))]').click()
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "dropdown-btn", " " )) and (((count(preceding-sibling::*) + 1) = 12) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox_red", " " ))]').click()
html = driver.page_source

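That is what I have with Selenium, but I don't know where to go from there.

Ultimately, I'd like to scrape this into a DataFrame with the columns "Name", "X1", "Y1", "X2" and "Y2".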


1 answer
User
#1 · Posted 2024-10-08 18:21:49

You can scrape the data as follows:

from bs4 import BeautifulSoup as bs

html="""
<svg viewBox="0 0 105 68" class="video-summaries__field-arrows" preserveAspectRatio="none" xmlns="http://www.w3.org/2000/svg">
   <defs>
      <marker fill="#000" id="default_arrow" markerWidth="5" markerHeight="4" orient="auto" refX="5" refY="2" stroke="none">
         <polygon points="0 0, 5 2, 0 4"></polygon>
      </marker>
      <marker fill="#0033ff" id="hover_arrow" markerWidth="2.9" markerHeight="2.4" orient="auto" refX="2.5" refY="1.2" stroke="none">
         <polygon points="0 0, 2.9 1.2, 0 2.4"></polygon>
      </marker>
   </defs>
   <path class="videosummaries-arrows" d="M52.5 35.1 37.6 33.3" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_0)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_0" x1="52.5" x2="37.6" y1="35.1" y2="33.3">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>
   <path class="videosummaries-arrows" d="M38.2 34.7 76.6 62" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_1)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_1" x1="38.2" x2="76.6" y1="34.7" y2="62">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>
   <path class="videosummaries-arrows" d="M61.6 67.8 36.3 63.9" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_2)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_2" x1="61.6" x2="36.3" y1="67.8" y2="63.9">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>
   <path class="videosummaries-arrows" d="M36.3 63.9 36.5 26.700000000000003" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_3)" style="stroke-width: 0.25;"></path>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_3" x1="36.3" x2="36.5" y1="63.9" y2="26.700000000000003">
      <stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
      <stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
   </linearGradient>
   </svg>
"""

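# The "xml" parser (which requires lxml to be installed) keeps the case of SVG tag and attribute names.
soup = bs(html, "xml")
# Select every <linearGradient> whose gradientUnits attribute is "userSpaceOnUse".
for lg in soup.find_all("linearGradient", attrs={"gradientUnits": "userSpaceOnUse"}):
    print(lg["x1"], lg["y1"], lg["x2"], lg["y2"])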

"""
52.5 35.1 37.6 33.3
38.2 34.7 76.6 62
61.6 67.8 36.3 63.9
36.3 63.9 36.5 26.700000000000003
"""

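We use the xml parser to extract the data from the SVG. I also tested the other parsers, including lxml, but they did not work, most likely because HTML parsers lowercase tag and attribute names, so a search for linearGradient finds nothing. The rest is basic: find the elements by tag name and the gradientUnits attribute, then read the x1, y1, x2 and y2 attributes from each element.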
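
To get from the printed values to the table the question asks for, the attributes can be collected into one record per gradient. The following is a minimal sketch, not the answer's own method: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without extra packages, and it assumes the input is only the well-formed svg element (for example its outerHTML) rather than the full page source. The function name gradient_rows is hypothetical, and using the gradient id as "Name" is an assumption, since the question does not say where the name should come from.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes every tag with its namespace, so the SVG namespace
# from the xmlns attribute has to be part of the tag name we search for.
SVG_NS = "{http://www.w3.org/2000/svg}"


def gradient_rows(svg_markup):
    """Return one dict per <linearGradient>, with the coordinates as floats."""
    root = ET.fromstring(svg_markup)
    rows = []
    for lg in root.iter(SVG_NS + "linearGradient"):
        rows.append({
            "Name": lg.get("id"),  # assumption: the gradient id stands in for "Name"
            "X1": float(lg.get("x1")),
            "Y1": float(lg.get("y1")),
            "X2": float(lg.get("x2")),
            "Y2": float(lg.get("y2")),
        })
    return rows


# Shortened copy of the SVG from the question, with the closing </svg> added.
sample = """<svg viewBox="0 0 105 68" xmlns="http://www.w3.org/2000/svg">
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_0" x1="52.5" x2="37.6" y1="35.1" y2="33.3"></linearGradient>
   <linearGradient gradientUnits="userSpaceOnUse" id="gradient_3" x1="36.3" x2="36.5" y1="63.9" y2="26.700000000000003"></linearGradient>
</svg>"""

rows = gradient_rows(sample)
for row in rows:
    print(row)
```

If pandas is installed, passing the list to pandas.DataFrame(rows) should give a frame with the columns "Name", "X1", "Y1", "X2" and "Y2". Converting to float at this point also means the coordinates arrive as numbers rather than strings.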
