Python/Selenium: how do I extract text from page content?

Posted 2024-10-03 04:40:14


I want to extract bios from the list of people on this website: https://blueprint.connectiv.com/speakers/

I want to extract each person's title, company, and bio. However, the bio only becomes available when you click that person's photo on the site.

Here is my code for extracting the title and company:

driver.find_element_by_xpath("//*[@id='speakers']/div/div/div/div/div/div/div").text.split('\n')

Can anyone help me extract each person's bio? Any suggestions would be greatly appreciated.


2 Answers

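If all the information you are looking for is inside a paragraph tag <p> that has the class bio (so <p class='bio'>), and all the modals are already present in the page source, then you can simply select them all with: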

bios = driver.find_elements_by_xpath('//p[@class="bio"]') 

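This selects every <p> element whose class attribute is exactly 'bio' and returns the matches in a list. If some of the p tags carry additional classes (e.g. <p class='bio someotherclass'>), you need to use the contains() function in the XPath, like this: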

bios = driver.find_elements_by_xpath('//p[contains(@class, "bio")]') 
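One caveat with contains(@class, "bio"): it is a substring test, so it would also match a class such as "biography". The sketch below illustrates the difference in plain Python and gives a commonly used XPath idiom for matching a whole class token. The helper name has_class and the constant TOKEN_XPATH are made up for this illustration; whether the target page actually has such look-alike classes is not known.

```python
def has_class(class_attr, name):
    """Return True if `name` is one of the space-separated class tokens."""
    return name in class_attr.split()


# XPath 1.0 idiom that matches "bio" as a whole token rather than a substring.
# (Hypothetical constant name; the expression uses only standard XPath functions.)
TOKEN_XPATH = '//p[contains(concat(" ", normalize-space(@class), " "), " bio ")]'

for attr in ("bio", "bio someotherclass", "biography"):
    substring_match = "bio" in attr        # what contains(@class, "bio") tests
    token_match = has_class(attr, "bio")   # what the token idiom tests
    print(attr, substring_match, token_match)
```

A CSS selector such as p.bio also matches on whole class tokens, so it is a shorter alternative where CSS selectors are available.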

You can then loop over the matched elements like this:

for bio in bios:
    print(bio.text)
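Because the bios sit in modals that are hidden until a photo is clicked, bio.text may come back as an empty string: Selenium's .text generally reports only rendered text. A minimal sketch of a fallback, assuming the element exposes the usual .text property and get_attribute method; the helper name element_text and the stand-in class are hypothetical and exist only so the snippet runs without a browser.

```python
def element_text(element):
    """Return the visible text, or the raw textContent if nothing is rendered."""
    text = element.text
    if text:
        return text
    raw = element.get_attribute("textContent")
    return raw.strip() if raw else ""


class FakeElement:
    """Stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text, raw):
        self.text = text
        self._raw = raw

    def get_attribute(self, name):
        return self._raw if name == "textContent" else None


hidden = FakeElement("", "  Bio stored in a hidden modal.\n")
print(element_text(hidden))
```

With a real driver, the loop would become: for bio in bios: print(element_text(bio)).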

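You do not have to click the images, because the modals for every speaker are already fully populated in the page source. You can use driver.execute_script to extract the content from those modals: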

from selenium import webdriver
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://blueprint.connectiv.com/speakers/')
results = d.execute_script("""
  var people = [];
  for (var i of document.querySelectorAll('.modal.speakerCard')){
     people.push({
        name:i.querySelector('.description h4').textContent,
        title:i.querySelector('p.title').textContent,
        company:i.querySelector('p.company').textContent,
        bio:i.querySelector('p.bio').textContent,
     });
  }
  return people;
""")

Output (first 20 results):

[{'bio': 'Andrew is a recovering consultant turned serial entrepreneur, startup mentor and angel investor. He is the Managing Director of Dreamit Urbantech, investing in Proptech and Construction Tech. Andrew has written for Fortune, Forbes, Propmodo, CREtech, Builders Online, Architect Magazine, Multifamily Executive, AlleyWatch, Edsurge, The 74 Million, et. al. Andrew founded two companies and has a keen appreciation for how hard it is to build a successful startup, even under the best of circumstances.', 'company': 'Dreamit Ventures', 'name': 'Andrew Ackerman', 'title': 'Venture Partner'}, {'bio': 'Salman Ahmad is the CEO and co-founder of Mosaic, a construction technology company focused on making homebuilding scalable. By standardizing the process (homebuilding) and not the product (homes), Mosaic is delivering places people love and creating better communities. Salman holds a PhD in Electrical Engineering and Computer Science from MIT, focusing on Programming Language Design for Service-Oriented Systems, an MS in Computer Science from Stanford University focusing on Human Computer Interaction, and a BSE in Computer Systems Engineering from Arizona State University.\u2028 He also has 20 technical publications and patents in the areas of software systems, programming languages, machine learning, human-computer interaction, and sensor hardware. With a passion for construction, software, and computer science, Salman co-founded Mosaic to build places people love and make them widely available. ', 'company': 'Mosaic', 'name': 'Salman Ahmad', 'title': 'CEO and Co-Founder '}, {'bio': 'Dafna Akiva is a 10+ year veteran in the real estate investment, development, management and construction industries. Before assuming the role of Chief Revenue Officer at Veev, Dafna oversaw day-to-day operations and drove a number of company-scaling initiatives as Chief Operating Officer. 
Now, as Chief Revenue Officer, Dafna leads the development of new Veev projects that redefine customers’ living experiences, and drive revenue growth for the company’s bottom line. She oversees all real estate acquisitions and operation strategies, the real estate developments and account management, as well as sales, marketing, legal and HR.', 'company': 'Veev', 'name': 'Dafna Akiva', 'title': 'CRO & Co-Founder'}, {'bio': 'Min Alexander serves as CEO of PunchListUSA, the real estate platform digitizing home inspections for online ordering of repairs and lifecycle services. For the past decade, Min has been driving digital disruption to democratize real estate. She has led two national B2B2C platforms, field operations and created a top-10 U.S. brokerage, transforming the industry to increase access, quality and transparency.\n\nPrior to joining PunchListUSA, Min served as COO for Auction.com, as CEO and President of REALHome Services and Solutions and as SVP of Real Estate Services at Altisource. Min holds a BA from Duke and MBA from MIT. ', 'company': 'PunchlistUSA', 'name': 'Min Alexander', 'title': 'CEO & Co-Founder'}, {'bio': 'Nora Apsel is the Co-founder and CEO of Morty, the online mortgage marketplace. Morty provides homebuyers a place to evaluate competitive offers from multiple lenders, then lock and close their loans through an automated platform. Founded and led by engineers, Morty uses technology to forge a new path in mortgage: fully digital, free of legacy infrastructure, and backed by the flexible, scalable capital base of traditional lenders. As CEO, Nora is leading the Morty team through rapid, product-driven growth and nationwide expansion. Morty is a venture-backed company whose investors include Thrive Capital, Lerer Hippeau, MetaProp, March Capital, Prudence Holdings, FJ Labs and Rethink Impact. Trained as a software engineer before becoming an operator, Nora holds a M.S. in Computer Science from the University of Pennsylvania and a B.S. 
from Emory University.', 'company': 'Morty', 'name': 'Nora Apsel', 'title': 'CEO & Co-Founder'}, {'bio': 'Carey Armstrong is the co-founder and chief revenue officer of Tomo, a fintech startup that will provide the most customer-centric way to buy a home. Tomo was founded in the fall of 2020, raising an initial seed round of $40 million led by Ribbit Capital, NFX and Zigg Capital.\n\nCarey’s focus is on defining and delivering a delightful home buying experience for Tomo customers. She leads the development of our core transactional product offering as well as the growth and evolution of the business units that support it, including mortgage and brokerage. \n\nBefore co-founding Tomo, Carey was Vice President, Premier Agent, at Zillow Group, where she led business strategy, product strategy, and core operations for the $1B buyer services business. In this capacity, she was responsible for major leaps forward with initiatives including Connections, Home Tours, and Flex Select teams. \n\nPrior to Zillow, Carey was a strategy consultant and industry analyst with Boston Consulting Group and Forrester Research, respectively. Carey has a B.A. from Harvard University and an M.B.A.  from the Tuck School of Business at Dartmouth. She and her family reside in Seattle.', 'company': 'Tomo', 'name': 'Carey Armstrong', 'title': 'CRO & Co-Founder'}, {'bio': 'Arie is the founder and CEO of WiredScore, the pioneer behind the international WiredScore certification system that evaluates and distinguishes best-in-class Internet connectivity in commercial buildings. Prior to founding WiredScore, Arie worked as a consultant with the Boston Consulting Group in New York City where he focused on the technology and media industries. 
Arie holds an MBA from the Wharton School and a BA and BS in Business and Political Science from the University of California, Berkeley.', 'company': 'WiredScore', 'name': 'Arie Barendrecht', 'title': 'CEO & Founder'}, {'bio': 'Demetrios Barnes is the Chief Operating Officer of SmartRent, where he leads the client engagement, supply chain and field operations teams. With over a decade of experience in property management operations, he is passionate about helping owners and operators understand the innovations technology can produce, while forging strong interpersonal relationships and participating in thought leadership discussions. Prior to co-founding SmartRent, he was Vice President of Technology for Colony Starwood Homes, Previously, Mr. Barnes was Director of Property Management and Technology with Beazer Pre-Owned Rental Homes, and a Regional Manager for several multifamily companies. Mr. Barnes holds a Bachelor of Science in Business Administration from Arizona State University.', 'company': 'SmartRent', 'name': 'Demetrios Barnes', 'title': 'COO & Co-Founder'}, {'bio': "Ryan J. S. Baxter is PropTech Advisor to the New York State Energy Research and Development Authority (NYSERDA), Cofounder of the PropTech Challenge, NYC Community Growth Lead for MetaProp NYC, and the founder of PASSNYC. Previously, Ryan served as a Vice President at the Real Estate Board of New York (REBNY). He is a native New Yorker who works passionately to make the City's built environment more educational.\n", 'company': 'Proptech Challenge', 'name': 'Ryan Baxter', 'title': 'Co-Founder'}, {'bio': 'Gary is CEO of Roofstock, a leading real estate investment marketplace which he co-founded in 2015. Gary has spent most of his career building businesses in the real estate, hospitality and tech sectors. After earning his BA in economics from Northwestern, Gary ventured west to earn his MBA from Stanford, where he caught the entrepreneurial bug and still serves as a regular guest lecturer. 
Previously Gary was instrumental in acquiring and integrating more than $800 million of resort properties for KSL Resorts, and spent five years as CFO of online brokerage pioneer ZipRealty, which he led through its successful IPO in 2004. Gary also served as CEO of Joie de Vivre Hospitality, then the second largest boutique hotel management company in the country. Immediately before starting Roofstock, Gary led one of the largest single-family rental platforms in the U.S. through its IPO as co-CEO of Starwood Waypoint Residential Trust, now part of Invitation Homes.', 'company': 'Roofstock', 'name': 'Gary Beasley', 'title': 'CEO & Co-Founder '}, {'bio': "Robyn has a track record of taking sophisticated climate and clean energy-related technical concepts and transforming them into commercially-oriented strategies that lead to impact, scale and results. She began her career in 2004 at Google in Mt View, CA, reporting directly to the co-founders working on strategic initiatives as they took the company public. Robyn went on to found Google's first business unit focused on incorporating clean energy generation across the company's global operations. In this capacity, she oversaw and catalyzed Google’s first clean energy initiatives, including large-scale clean energy procurement for data centers and the development and installation of a 1.7MW rooftop solar installation at the Mountain View HQ. Since then she has built, invested in, and raised $50M+ for new ventures and programs for Vestas Wind A/S in Copenhagen, Dean Kamen at DEKA R&D, and NRG Energy. Most recently she was an executive at Lennar Corp, where she built the firm’s first corporate venture platform while incubating Blueprint Power Technologies. Today, Robyn Beavers is the CEO and co-founder of Blueprint Power, a NYC-based real estate tech company that turns buildings into revenue-generating clean power plants. Robyn was named EY’s NY Entrepreneur of the Year in 2020. Robyn holds both a B.S. 
in Civil Engineering and an MBA from Stanford University.", 'company': 'Blueprint Power', 'name': 'Robyn Beavers', 'title': 'CEO  & Co-Founder'}, {'bio': 'Liza Benson is a Partner with Moderne Ventures and helps lead and manage investment activity with particular focus on high-growth technology companies that can achieve rapid adoption and scale. Moderne Ventures is an early stage investment fund and industry immersion program which is focused on investing in technology companies in and around the multi-trillion dollar industries of real estate, mortgage, finance, insurance and home services.\n\nPrior to Moderne, Liza was a Partner with StarVest Partners, a $400M venture fund focused on expansion stage B2B SaaS investments. Previously, Liza was a Managing Director in the growth equity group at Highbridge Principals Strategies, a multi-billion asset manager. Before her experience at Highbridge, Liza was a Managing Director with Bear Stearns’ Constellation Growth Capital and an investment banker at Patricof & Co and First Union where she started her career.', 'company': 'Moderne Ventures', 'name': 'Liza Benson', 'title': 'Partner'}, {'bio': 'Jeremy Bernard is the CEO, North America at essensys, the world’s leading provider of software and technology to the flexible real estate industry. He has over 25 years of experience in the real estate and technology sectors. Most recently, Jeremy was the Global Head of Real Estate for Knotel where he grew and oversaw a portfolio of 5.5MM sq ft of flexible office space around the world. In previous roles, he has held C-level positions at real estate investment firms and launched several proptech companies. Jeremy resides in Westport, CT with his wife Jamie, daughter Morgan and son Brody.', 'company': 'essensys', 'name': 'Jeremy Bernard', 'title': 'CEO, North America'}, {'bio': "Benjamin Birnbaum is a Partner at Keyframe – a NYC based investment firm.  
His focus is primarily on how technology is causing market change across a number of physical infrastructure categories, like transportation and energy, inspired by earlier career experiences as an operating leader for one of the world's largest passenger transportation companies.  Ben is also a co-founder of TeraWatt Infrastructure, a specialized owner of electric vehicle charging infrastructure focused on fleet electrification. ", 'company': 'Keyframe Capital', 'name': 'Ben Birnbaum', 'title': 'Partner'}, {'bio': 'Sean is the Co-Founder & CEO of BLACK, a tech-powered and cloud based CRE brokerage platform based in NYC. Prior to founding BLACK, Sean served as EVP of Real Estate and Enterprise Sales at WeWork, He has been involved in millions of square feet of commercial real estate leasing transactions over his 20 year tenure, and has worked at many of the world’s largest commercial brokerage firms including Cushman & Wakefield, JLL, Newmark, and Grubb & Ellis.  ', 'company': 'BlackRE', 'name': 'Sean Black', 'title': 'CEO & Co-Founder'}, {'bio': 'As chief operating officer of CA Student Living, Steve Boyack is responsible for driving the performance and growth of CASL’s property management platform, as well as overseeing its corporate operational functions including technology, human resources, communications and culture. Steve leverages his decades of experience in the industry to develop and advance the people, processes and technologies that form the foundation of the business.\n\nBoyack previously served as global head of property management for CA Ventures, a parent company of CA Student Living, where he laid the foundation for the firm’s European student operating platform (Novel Student), global sustainability initiative, wellness program and innovation department. 
Prior to joining CA, Steve was a senior managing director at Greystar where he was responsible for overseeing real estate operations and leading the expansion of the company’s footprint in key Midwest markets. In addition, he oversaw Greystar’s national construction and maintenance operations and worked with their global innovation team.\n\nSteve earned a BS in Economics from the University of Iowa and a CPM® designation from the Institute of Real Estate Management. As a\xa0member of several industry advisory boards and associations, Steve is a\xa0recognized subject matter expert and thought leader, with particular focus on integrated property technology.', 'company': 'CA Ventures', 'name': 'Steve Boyack', 'title': 'COO, Student Living'}, {'bio': 'Laura Cain is the CEO and co-founder of Willow Servicing, a technology company focused on streamlining mortgage servicing. Willow’s platform automates core workflows, enabling lenders to provide digital-first borrower experiences while reducing operational costs and ensuring compliance with industry policies & regulations. Prior to Willow, Laura was a product manager at Snapdocs, where she built out their initial eClose product offering to lenders, and a venture investor at Thomvest, where she focused on early stage fintech investments.', 'company': 'Willow Servicing', 'name': 'Laura Cain', 'title': 'CEO & Co-Founder'}, {'bio': 'Madhu Chamarty is the co-founder and CEO of BeyondHQ, a startup that helps companies plan and scale distributed teams. An engineer and math nerd at heart, he has 15+ yrs of startup experience in Silicon Valley, as an early employee and co-founder at 3 high-growth B2B startups in digital media (Adify - Cox acq. @ $300MM), employee communities (Dynamic Signal), and geospatial analytics (Descartes Labs). He has scaled sales & support teams globally, in both colocated and remote formats. 
He grew up in a fully distributed family across 4 countries, so believes he was destined to build BeyondHQ even before he knew it.', 'company': 'BeyondHQ', 'name': 'Madhu Chamarty', 'title': 'CEO & Co-Founder'}, {'bio': 'Alex Chatzielftheriou is a Greek entrepreneur and CEO and co-founder of Blueground — a real estate tech company founded in 2013. Blueground provides a network of fully-furnished, move-in ready apartments in 14 cities across the globe for stays of a month, a year, or longer. Having lived and worked in more than 15 cities around the world, Alex sought to provide business and leisure travelers with a hassle-free way to find places that feel like home — to show up and start living from day one. Along the way, Alex disrupted the traditional lease model, enabling flexible living to encourage travel and exploration of the world and its cultures while providing a place to feel "grounded" and call home. ', 'company': 'Blueground', 'name': 'Alex Chatzieleftheriou', 'title': 'CEO & Co-Founder'}, {'bio': 'Jit Kee Chin is the Chief Data & Innovation Officer and Executive Vice President at Suffolk. Ms. Chin is responsible for leveraging big data and advanced analytics to improve the organization’s core business. Ms. Chin is also responsible for helping to position Suffolk to achieve its vision of transforming the construction experience while working closely with the company’s Innovation and Strategy teams to fundamentally reinvent the future of construction in the digital age. \n\nPrior to her role at Suffolk, Ms. Chin spent 10 years with management consulting firm McKinsey and Company where she counseled senior executives on strategic, commercial and advanced analytics topics. Most recently, she was a Senior Expert in Analytics in McKinsey’s Boston office where she specialized in the design and implementation of end- to-end analytics transformations. Prior to that role, Ms. 
Chin was an Associate Principal in McKinsey’s London office where she helped organizations drive multi-year business transformations and change programs and developing strategies for profitable growth.', 'company': 'Suffolk Construction', 'name': 'Jit Kee Chin', 'title': 'Chief Data & Innovation Officer'}]

In pandas:

import pandas as pd
df = pd.DataFrame(results)
print(df)

Output:

                                                   bio           company             name                        title
0    Andrew is a recovering consultant turned seria...  Dreamit Ventures  Andrew Ackerman              Venture Partner
1    Salman Ahmad is the CEO and co-founder of Mosa...            Mosaic     Salman Ahmad          CEO and Co-Founder 
2    Dafna Akiva is a 10+ year veteran in the real ...              Veev      Dafna Akiva             CRO & Co-Founder
3    Min Alexander serves as CEO of PunchListUSA, t...      PunchlistUSA    Min Alexander             CEO & Co-Founder
4    Nora Apsel is the Co-founder and CEO of Morty,...             Morty       Nora Apsel             CEO & Co-Founder
..                                                 ...               ...              ...                          ...
128  Ms. Wong joined Tishman Speyer in 2015. Jenny ...    Tishman Speyer       Jenny Wong            Managing Director
129  Joseph is the Founder and CEO of Neighbor.com,...          Neighbor  Joseph Woodbury                CEO & Founder
130  Based in Palo Alto, Michael Yang is a Managing...    OMERS Ventures     Michael Yang             Managing Partner
131  Since joining RET Ventures as Partner in 2019,...      RET Ventures  Christopher Yip  Partner & Managing Director
132  Chris Zlocki, Global Head of Client Experience...          Colliers     Chris Zlocki       EVP, Occupier Services

[133 rows x 4 columns]
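The raw values above still carry stray whitespace: trailing spaces in titles ('CEO and Co-Founder '), doubled spaces ('CEO  & Co-Founder'), and characters such as \xa0 and \u2028 inside bios. A small sketch of one way to normalise them before building the DataFrame; clean_text and clean_record are hypothetical helper names, not part of the original answer.

```python
def clean_text(value):
    # str.split() with no arguments splits on any Unicode whitespace,
    # which includes the no-break space and the line separator.
    return " ".join(value.split())


def clean_record(record):
    """Apply clean_text to every string value in one scraped record."""
    return {
        key: clean_text(val) if isinstance(val, str) else val
        for key, val in record.items()
    }


sample = {
    "name": "Robyn Beavers",
    "title": "CEO  & Co-Founder",
    "bio": "As a\xa0member of several boards.\u2028 ",
}
print(clean_record(sample))
```

Applied to the scraped list, this would be results = [clean_record(r) for r in results]. Note that it also flattens the '\n\n' paragraph breaks inside bios into single spaces, which may or may not be wanted.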

Alternatively, you can use BeautifulSoup instead of driver.execute_script:

from bs4 import BeautifulSoup as soup
from selenium import webdriver
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://blueprint.connectiv.com/speakers/')
s = soup(d.page_source, 'html.parser').select('.modal.speakerCard')
r = [dict(zip(['name', 'title', 'company', 'bio'], 
    [b.text for b in i.select(':is(h4, p.title, p.company, p.bio)')])) for i in s]
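The zip-based comprehension above relies on every card containing all four elements in the order name, title, company, bio; if a card lacked, say, p.title, the remaining values would shift into the wrong keys. A more defensive sketch looks up each field separately. FIELDS and card_to_record are hypothetical names, and the function assumes each card offers BeautifulSoup's select_one method and that the returned node offers get_text.

```python
# Field name -> CSS selector, taken from the selectors used in the answer above.
FIELDS = {
    "name": "h4",
    "title": "p.title",
    "company": "p.company",
    "bio": "p.bio",
}


def card_to_record(card):
    """Build one record from a speaker card, using None for missing fields."""
    record = {}
    for field, selector in FIELDS.items():
        node = card.select_one(selector)
        record[field] = node.get_text(strip=True) if node is not None else None
    return record
```

With the list s from the snippet above, the call would be r = [card_to_record(i) for i in s].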
