Problem extracting specific HTML data when web scraping

Posted 2024-10-08 20:19:47


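I am very new to Python, but I am trying to collect some data from https://emma.msrb.org/ for a project.

I am using Selenium to automate a search for a CUSIP, and I then want to pull specific details about that CUSIP.

So far I have been able to automate navigating from the EMMA homepage to the details page for a specific CUSIP, but I am having trouble finding and extracting the HTML that holds the CUSIP details.

Here is my code so far: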

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
import requests
# Enter CUSIP you want to search for
CUSIP = "266818BT9"

#EMMA Website Homepage URL
URL = "https://emma.msrb.org/"

#User-Agent
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36'
}

#Uses Chrome webdriver to navigate to the EMMA website stored in the variable URL
driver = webdriver.Chrome()
driver.get(URL)

driver.implicitly_wait(10)

#Waits until the cookie pop-up appears and then clicks accept
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH,'//*[@id="acceptId"]'))).click()

#Enters the defined CUSIP in the search bar on EMMA homepage
SearchBox = driver.find_element(By.XPATH, '//*[@id="quickSearchText"]')
SearchBox.send_keys(CUSIP)

#Clicks Search
SearchButton = driver.find_element(By.XPATH, '//*[@id="quickSearchButton"]')
SearchButton.click()

driver.implicitly_wait(10)

#Clicks Accept to EMMA terms
AcceptButton2 = driver.find_element(By.XPATH, '//*[@id="ctl00_mainContentArea_disclaimerContent_yesButton"]')
AcceptButton2.click()

driver.implicitly_wait(10)

#Pulls in the new URL for the specific CUSIP
NewURL = driver.current_url

#Loads the webpage contents
r=requests.get(NewURL, headers=headers)

#Convert to beautiful soup object
soup = bs(r.content, 'lxml')

print(soup)

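The code runs, but when I print(soup), the HTML that is printed does not seem to contain the HTML with the information I need.

The HTML I am trying to pull into Python is shown below. This block contains the information I want to scrape. Specifically, I am trying to scrape the "Dated Date" label and the text inside <span class="float-right">08/12/2021</span> (that is, 08/12/2021), as well as all the other information under <ul class="info-focus-2">.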

 <ul class="info-focus-2">
            <li>
                <span class="label genericQtipHelp" help="Date from which interest begins to accrue" data-hasqtip="162" aria-describedby="qtip-162">Dated Date:</span>
                <span class="float-right">08/12/2021</span>
            </li>
                <li>
                    <span class="label genericQtipHelp" help="Price / Yield at which a new issue of municipal securities is offered to the public" data-hasqtip="163" aria-describedby="qtip-163">Initial Offering Price/Yield:</span>
                    <span class="float-right">102.829% / 0.08%</span>
                </li>
            <li>
                <span class="label genericQtipHelp" help="Face value of a new issue of municipal securities offered to the public" data-hasqtip="164" aria-describedby="qtip-164">Principal Amount at Issuance:</span>
                <span class="float-right">$4,525,000</span>
            </li>
                <li>
                    <span class="label genericQtipHelp" help="Time at which the issuer and underwriter enter into a contract for a new issuance" data-hasqtip="165" aria-describedby="qtip-165">Time of Formal Award:</span>
                    <span class="float-right">07/30/2021 09:14 AM</span>
                </li>
                            <li>
                    <span class="label genericQtipHelp" help="Time at which underwriter executes the first trade" data-hasqtip="166">Time of First Execution:</span>
                    <span class="float-right">07/30/2021 01:30 PM</span>
                </li>
            <li>
                <span class="label genericQtipHelp" help="Date the issuer delivered the securities to the underwriter" data-hasqtip="167" aria-describedby="qtip-167">Closing Date:</span>
                <span class="float-right">08/12/2021</span>
            </li>
                <li>
                    <span class="label genericQtipHelp" help="Date provided by the continuing disclosure submitter to indicate the period covered by an annual and/or audited financial disclosure" data-hasqtip="168">Fiscal Year End Date:</span>
                    <span class="float-right">-</span>
                </li>
        </ul>

This is what I get when I print(soup):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head id="ctl00_Head1">
<!-- MainMaster.Master -->
<title>
    Municipal Securities Rulemaking Board::EMMA
</title>
<!--meta http-equiv="X-UA-Compatible" content="IE=9"-->
<!--for xhtml validation-->
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="en-us" http-equiv="Content-Language"/>
<!--prevent MS Office toolbar from changing layout-->
<meta content="false" http-equiv="imagetoolbar"/><meta content="true" name="MSSmartTagsPreventParsing"/>
<link href="/Content/Images/favicon.ico?v=1.0.9200-139-P1" rel="shortcut icon" type="image/ico"/>
<!-- style sheets -->
<script src="/js/utility.js?v=1.0.9200-139-P1" type="text/javascript"></script>
<link href="/css/qtip.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/base.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/tabs.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/gridStyle.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/popUp.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/jquery-ui.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/jquery.ui.theme.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/acct.css?v=1.0.9200-139-P1" media="all" rel="stylesheet" type="text/css"/>
<link href="../../images/apple-touch-icon.png" rel="apple-touch-icon"/><link href="../../images/apple-touch-icon-precomposed.png" rel="apple-touch-icon-precomposed"/><link href="../../images/apple-touch-icon-76x76.png" rel="apple-touch-icon" sizes="76x76"/><link href="../../images/apple-touch-icon-120x120.png" rel="apple-touch-icon" sizes="120x120"/><link href="../../images/apple-touch-icon-152x152.png" rel="apple-touch-icon" sizes="152x152"/><link href="../../images/apple-touch-icon-76x76-precomposed.png" rel="apple-touch-icon-precomposed" sizes="76x76"/><link href="../../images/apple-touch-icon-120x120-precomposed.png" rel="apple-touch-icon-precomposed" sizes="120x120"/><link href="../../images/apple-touch-icon-152x152-precomposed.png" rel="apple-touch-icon-precomposed" sizes="152x152"/><link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,300italic,400italic,600,600italic,700,700italic,800,800italic" rel="stylesheet" type="text/css"/>
<script src="/js/jquery-1.7.2.min.js" type="text/javascript"></script>
<script src="/js/jquery-ui-1.8.22.min.js" type="text/javascript"></script>
<script src="/Content/Css/Themes/bootstrap/js/bootstrap.min.js" type="text/javascript"></script>
<script src="/js/jquery.corner.js" type="text/javascript"></script>
<script src="/js/jquery.jshowoff.js" type="text/javascript"></script>
<script src="/js/jquery.qtip.min.js" type="text/javascript"></script>
<script src="/js/jquery.validate.min.js" type="text/javascript"></script>
<script src="/js/jquery.validate.unobtrusive.min.js" type="text/javascript"></script>
<script type="text/javascript">
        var dlf_Gateway_Url = 'https://gw.msrb.org/Gateway/Login?origin=emma&url=pIMod4sOqR%2BV9SOs82ged0h9Dx5lG%2Fp%2F%2BwrgeNRcL%2BK2OSgZV%2FCaWTj%2BWdaH29WtUUh1gN9hThieJT3GnTKsYrDN5Za8pQYXa%2FjGTqIv10o%3D';
        var dlf_EmmaSiteUrl = 'https://emma.msrb.org/subPath';
        var dlf_MyEmmaSignedIn = 'False';
    </script>
<script src="/js/myemmauseracct.js?v=1.0.9200-139-P1" type="text/javascript"></script>
<script src="/app/searchAhead.js?v=1.0.9200-139-P1" type="text/javascript"></script>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!-- MainMaster.Master -->
<script type="text/javascript">
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-36137316-1']);
    

    (function () {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
    })();

    _gaq.push(['_setCustomVar', 1, 'UserRole', 'Undefined', 3]);
       
    
       
    _gaq.push(['_setCustomVar', 5, 'UserState', 'Undefined', 3]);       
    _gaq.push(['_trackPageview']);

    $(document).ready(function () {
        $("a[href*='.pdf']").click(function () {
            var docType = $(this).data('doctype');
            _gaq.push(['_trackPageview', location.pathname + '/ViewDocument/' + docType]);
        });
    });
        
    function SendToGA(code) {
        _gaq.push(['_trackPageview', location.pathname + '/' + code]);
    }
</script>
</head>
<body class="not-mvc-layout" onclick="showHide('','divHome');">
<form action="../../Disclaimer.aspx" id="aspnetForm" method="post" name="aspnetForm" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_Masthead_quickSearchButton')">
<div>
<input id="__EVENTTARGET" name="__EVENTTARGET" type="hidden" value=""/>
<input id="__EVENTARGUMENT" name="__EVENTARGUMENT" type="hidden" value=""/>
<input id="__VIEWSTATE" name="__VIEWSTATE" type="hidden" value="/wEPDwULLTIwNzMxNTM2MDgPZBYCZg9kFgICAw9kFgYCAQ9kFggCAQ8WAh4HVmlzaWJsZWgWAmYPZBYIZg8WAh4FVmFsdWUFAi0xZAIBDxYCHwEFAi0xZAICDxYCHwEFATBkAgMPZBYCZg9kFgICBQ8WAh4Fc3R5bGUFDWRpc3BsYXk6bm9uZTsWBAIBDw8WAh4EVGV4dGVkZAIDDxYCHwEFBUZhbHNlZAIDDw8WAh4NT25DbGllbnRDbGljawVFbG9jYXRpb24uaHJlZj0naHR0cHM6Ly9lbW1hLm1zcmIub3JnL0lzc3VlckhvbWVQYWdlL01hcCc7cmV0dXJuIGZhbHNlZGQCBQ8PFgQfAwUOVHJhZGUgQWN0aXZpdHkeC05hdmlnYXRlVXJsBR5+L1RyYWRlRGF0YS9Nb3N0QWN0aXZlbHlUcmFkZWRkZAIGDw8WAh8FBSR+L1Rvb2xzQW5kUmVzb3VyY2VzL01hcmtldEluZGljYXRvcnNkZAICD2QWAgIBD2QWAgIBD2QWAgIBDxYCHwMFDDIwMi04MzgtMTMzMGQCBA9kFgJmD2QWEmYPDxYCHwMFBDIwMjFkZAIBDw8WAh8DBQQyMDIxZGQCAg8PFgIfAwUEMjAyMWRkAgMPDxYCHwMFBDIwMjFkZAIEDw8WAh8DBQQyMDIxZGQCBQ8PFgIfAwUEMjAyMWRkAgYPDxYCHwMFBDIwMjFkZAIHDw8WAh8DBQQyMDIxZGQCCA8PFgIfAwUEMjAyMWRkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBSBjdGwwMCRNYXN0aGVhZCRxdWlja1NlYXJjaEJ1dHRvbj6VkiYwnZxc6J6xowZ46VbpatT0"/>
</div>
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
</script>
<script src="/WebResource.axd?d=G6NlKtB8LDRnBHsRDEklfGVua8PwEM6JG5i8zEQ8wH6EHWSkk8J32v4U1ZSwF4N9_iiW3flWMWdkGM7t7QzjKndK8981&amp;t=637453888754849868" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=gALOrUfhP_6_cKKdD4oFCBl_C1Y1YQq_lVs19J4DWwL9_BBxJ__-ZhKEDdoznjNwb46h3AtnmE08XTdssiSlNnZybGJ4Gg7TRLZqj8uiti21vnByfCYWfGsbhp4Hlr2NoBSeZ6UsCDxWR2_GtCGpyl4BwpM1&amp;t=363be08" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=LWi5yEgfUL58JGBCsev0oTYhc5x06B1azbvXYe9_in0PtFP0IiiVwYHd8T9zZhfGufOPfLKrCB9MB5ft6N4WpdMd7yfocoH_VFgjxeqSs0IfKk7XCgQmNkJeOM78v9ypfdcUjppKYK54YFTApA23jTU6_Usm9JZL795CGqhg-y8Jlekq0&amp;t=363be08" type="text/javascript"></script>
<div>
<input id="__VIEWSTATEGENERATOR" name="__VIEWSTATEGENERATOR" type="hidden" value="758A299B"/>
<input id="__EVENTVALIDATION" name="__EVENTVALIDATION" type="hidden" value="/wEdAAmSZfEGQnCBbE1/HHnWBNZrgJCcgKViQbVYcjwMgfW4OLoCKfLmGA+6UGP4rrv4mTdU2B9tTvPX5tFd7c/VSBbG1/d303CuLLgYjLL6ZwsaEd/wo/9qo5HSom6rHzi+3woSKegRS2vFThOzsPCSJN7hETZ3iJIV3Ry6zeTmucoKCcs8Z9UJYMXY8sFD7l7dBeyvX2cbna3wFNuzYd2OLbQvNgI6Zw=="/>
</div>
<script type="text/javascript">
//<![CDATA[
Sys.WebForms.PageRequestManager._initialize('ctl00$scriptManager', 'aspnetForm', [], [], [], 90, 'ctl00');
//]]>
</script>
<!--BEGIN Header-->
<!--BEGIN Header-->
<div class="header">
<div class="col-lg-4">
<a class="brand" href="https://emma.msrb.org/Home/Index">
<img alt="Electronic Municipal Market Access" src="/Content/Images/EMMA_Logo.png"/>
</a>
</div>
<div class="utilityMenu">
<ul class="nav nav-pills pull-right sec-nav" id="ctl00_Masthead_dlfUtilityMenu">
<script src="/JS/SessionTimeoutWarningModule.js?v=1.0.9200-139-P1" type="text/javascript"></script>
<div id="sessionExpiryPopup" style="display: none;">
<a aria-hidden="true" class="popup-close" id="popup1close">Close</a>
<div class="popup-content">
<p>Your session is about to expire due to inactivity.</p>
<p>
<button aria-hidden="true" class="grn-button-small" id="keep_session" type="button">Keep me signed in</button>
</p>
</div>
</div>
<input id="sessionExpireTimeInMilliSecond" name="ctl00$Masthead$dlfLogOnInfo$sessionExpireTimeInMilliSecond" type="hidden" value="-1"/>
<input id="sessionWarningTimeInMilliSecond" name="ctl00$Masthead$dlfLogOnInfo$sessionWarningTimeInMilliSecond" type="hidden" value="-1"/>
<input id="userSignedOut" name="ctl00$Masthead$dlfLogOnInfo$userSignedOut" type="hidden" value="0"/>
<li id="welcomeText"></li>
<li>
<a href="https://emma.msrb.org/EmmaHelp/EmmaHelp.aspx" id="emmaHelpLink">EMMA Help</a>
</li>
<li class="last-child">
<a href="https://emma.msrb.org/AboutEMMA/ContactUs.aspx" id="contactUsLink">Contact Us</a>
</li>
<div id="logOutDlg" style="display: none;">
<label>Are you sure you want to log out?</label>
</div>
<script type="text/javascript">
    $(document).ready(function () {
        $(function () {
            $("#logOutDlg").dialog({
                autoOpen: false,
                modal: true,
                title: false,
                draggable: false,
                closeOnEscape: true,
                resizable: false,
                position: { my: "center", at: "top", of: "body" },
                buttons: [
                    {
                        text: "Yes",
                        id: "logOutDlgYes",
                        click: function () {
                            $.removeCookie('meacctae', { path: '/' });
                            $.removeCookie('meacctae', { path: '/', domain: '.msrb.org' });
                            window.location = $("#logOutLink").attr("href");
                        }
                    },
                    {
                        text: "No",
                        id: "logOutDlgNo",
                        click: function () {
                            $(this).dialog("close");
                        }
                    }
                ]
            });
        });


        $("#logOutLink").click(function () {

            $("#logOutDlg").dialog("open");
            return false;
        });

    });


</script>
</ul>
<div id="ctl00_Masthead_headerSearchPanel" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_Masthead_quickSearchButton')">
<div class="rt w">
<span class="quickSearchErrorSpan" style="display: none">
<label class="alert alert-danger clearfix rt" id="quickSearchError"></label>
</span>
<ul class="nav pull-right clearfix">
<li class="clearBoth">
<input class="searchTextBox" id="ctl00_Masthead_quickSearchText" maxlength="80" name="ctl00$Masthead$quickSearchText" placeholder="Search by CUSIP, Description, State, etc." type="text"/>
<input class="qk-search-btn" id="ctl00_Masthead_quickSearchButton" name="ctl00$Masthead$quickSearchButton" onclick="return validateQuickSearch();" src="../../Content/Images/qk_search_btn.png" style="border-width:0px;" type="image"/></li>
</ul>
<div class="pull-right clearBoth"><a href="https://emma.msrb.org/Search/Search.aspx">Advanced Search</a></div>
</div>
</div>
</div>
</div>
<div class="mainNav">
<div class="container">
<ul class="nav nav-justified">
<li>
<a href="javascript:__doPostBack('ctl00$Masthead$browseIssuers','')" id="ctl00_Masthead_browseIssuers" onclick="location.href='https://emma.msrb.org/IssuerHomePage/Map';return false;">Browse Issuers</a></li>
<li class="dropdown" id="toolsAndResDropdown">
<a class="dropdown-toggle" href="#" id="toolsAndResLink">Tools and Resources <b class="caret hideCaret"></b></a>
<ul class="dropdown-menu pr ddmenu-fix">
<li>
<span id="overview"><a href="https://emma.msrb.org/ToolsAndResources/">Overview</a></span>
</li>
<li>
<span id="yieldCurveInd"><a href="https://emma.msrb.org/ToolsAndResources/MarketIndicators">Yield Curves and Indices</a></span>
</li>
<li>
<span><a href="https://emma.msrb.org/Main/GotoMyEmmaDashBoard" id="toolsMyEmmaLink">MyEMMA<sup>®</sup></a></span>
</li>
<li>
<span id="priceDis"><a href="https://emma.msrb.org/TradeData/PriceDiscovery">Compare Municipal Bonds</a></span>
</li>
<li>
<span id="newIssueCal"><a href="https://emma.msrb.org/ToolsAndResources/NewIssueCalendar">New Issue Calendar</a></span>
</li>
<li>
<span id="marketStat"><a href="https://emma.msrb.org/MarketActivity/ViewStatistics.aspx">Market Statistics</a></span>
</li>
<li>
<span class="end" id="ecoCal"><a href="#" id="fredEco">Economic Calendar</a></span>
</li>
</ul>
</li>
<li class="dropdown" id="marketActivityDropdown">
<a class="dropdown-toggle" href="../../UserControls/#" id="ctl00_Masthead_marketActivityLink">Market Activity <b class="caret hideCaret"></b></a>
<ul class="dropdown-menu pr">
<li>
<span>
<a href="../../TradeData/MostActivelyTraded" id="ctl00_Masthead_tradeDataLink">Trade Activity</a></span>
</li>
<li>
<span>
<a href="../../ToolsAndResources/MarketIndicators" id="ctl00_Masthead_YieldCurveIndicesLink">Yield Curves and Indices</a></span>
</li>
<li>
<span><a href="https://emma.msrb.org/MarketActivity/RecentOfficialstatements" id="officialStatements">Official Statements</a></span>
</li>
<li>
<span><a href="https://emma.msrb.org/MarketActivity/ViewStatistics.aspx" id="viewStatistics">Market Statistics</a></span>
</li>
<li>
<span><a href="https://emma.msrb.org/MarketActivity/RecentPS" id="recentPS_New">Pre-Sale Documents</a></span><br/>
</li>
<li>
<span><a href="https://emma.msrb.org/MarketActivity/RecentARS.aspx" id="recentARS">Auction Rate Securities</a></span>
</li>
<li>
<span>
<a href="https://emma.msrb.org/MarketActivity/RecentCD" id="recentCD">Continuing Disclosure</a>
</span>
</li>
<li>
<span>
<a href="https://emma.msrb.org/MarketActivity/RecentVRDO" id="recentVRDO">Variable Rate Demand Obligations</a>
</span>
</li>
<li>
<span>
<a href="https://emma.msrb.org/MarketActivity/RecentAR" id="recentAR">Refunding Information</a>
</span>
</li>
<li>
<span><a href="https://emma.msrb.org/MarketActivity/PoliticalContributions.aspx" id="politicalContributions">Political Contributions</a></span>
</li>
<li>
<span class="end">
<a href="https://emma.msrb.org/Search/Plan529.aspx" id="savingsPlan">529 Savings Plan/ABLE Program Disclosures</a>
</span>
</li>
</ul>
</li>
<li class="dropdown" id="myEMMADropdown">
<a class="dropdown-toggle" data-toggle="dropdown" href="../../UserControls/#" id="myEmmaLink">MyEMMA<sup>®</sup> <b class="caret hideCaret"></b></a>
<ul class="dropdown-menu me">
<li>
<span><a href="https://emma.msrb.org/Main/GotoMyEmmaDashBoard" id="ssoMyEmmaAlerts">MyEMMA Alerts</a></span>
</li>
<li>
<span class="end"><a href="https://emma.msrb.org/Main/GotoShowSavedSearch" id="ssoSavedSearches">Saved Searches</a></span>
</li>
</ul>
</li>
<li><a href="https://emma.msrb.org/Main/GotoEmmaDataportFromMyEmma" id="ssoDataportLink">EMMA Dataport</a></li>
<li class="zoomTool">
<a class="disabled" href="javascript: void(0)" id="zoomOut">A-</a>
<span class="zoomCounter" id="zoomResult">100%</span>
<a href="javascript: void(0)" id="zoomIn">A+</a>
</li>
</ul>
</div>
</div>
<div class="alert-warning systemsAlert" style="display:none">
<div class="container systemsAlert">
<p>
<label>Systems event. </label>
            Check the <a href="http://www.msrb.org/MSRB-System-Status.aspx" target="_blank">systems status</a> page.
        </p>
</div>
</div>
<div style="display:display">
<div class="yellow-alert"> <div class="container"> <p><a href="http://www.msrb.org/News-and-Events/COVID-19-Information.aspx" target="_blank">Click here for information on the MSRB response to COVID-19</a></p> </div> </div>
</div>
<!--[if lt IE 10]>
<div class="alert-warning systemsAlert" id="oldIEAlert" style="display: none; overflow: hidden;">
    
     <div class="container "><a href="javascript:void(0)" class="r closeIEAlert">Close</a>
      <p><label>Your web browser may not display all features of the EMMA website. Please consider upgrading to the latest version or using another browser for a better experience.</label></p>
     </div>
</div>
<![endif]-->
<div id="popupDlgEco" style="display: none;">
<div class="text">
<h3>Economic Calendar</h3>
<p>
            An economic release calendar provided by the Federal Reserve Bank of St. Louis is available at https://fred.stlouisfed.org/releases/calendar, which is a third-party website.
        </p>
<p>
            The MSRB is not affiliated with the Federal Reserve Bank of St. Louis. There is no MSRB sponsorship, approval or endorsement of the third-party website or its services
                                        (nor, vice versa, is there any by the third party of EMMA or its services).
        </p>
<p class="buttons">
            Do you wish to continue?
                                        <a autofocus="" class="grn-button" href="https://fred.stlouisfed.org/releases/calendar" target="_blank">Yes</a>
<a href="javascript: void(0);">No</a>
</p>
</div>
</div>
<div class="modal-popup" id="errorPopup" style="display: none;">
<div class="text">
<h3>An Error Has Occurred</h3>
<p class="error-text"></p>
<p class="error-detail"></p>
<p>If this error persists, you may provide feedback below. Please include details of what you were attempting to accomplish.</p>
<div class="buttons">
<div class="pull-right">
<input class="error-close-button grn-button-small" type="button" value="Close"/>
<a class="feedback-link" href="https://emma.msrb.org/AboutEMMA/ContactUs.aspx" target="_blank">Feedback</a>
</div>
</div>
</div>
</div>
<!--END Header-->
<script src="/js/jquery.cookie.js" type="text/javascript"></script>
<script type="text/javascript">
    var gaAcctNum = 'UA-36137316-1';
    $('#signOutLink').click(function() {
        $.ajax({
            type: "POST",
            url: GetCrossDomainUrl('https://emma.msrb.org//Account/SignOut'),
            success: function (data) {
                if (data.status === 0) {
                    console.log("Sign out failed: " + data.error);
                    return;
                }
                location.reload();
            },
            error: function (xhr) { alert(xhr.responseText); }
        });
    });

    function validateQuickSearch() {
        var status = false;
        var quickSearchText = $('#ctl00_Masthead_quickSearchText').val();
        var url = 'https://emma.msrb.org/QuickSearch/ValidateMastHeadInput' + '?searchTerms=' + encodeURIComponent(quickSearchText);
        $.ajax({
            url: url,
            type: "GET",
            async: false,
            success: function(data) {
                if (data.IsValid) {
                    status = true;
                    $('#quickSearchError').text('');
                    $('.quickSearchErrorSpan').hide();
                } else {
                    $('#quickSearchError').text(data.Message);
                    $('.quickSearchErrorSpan').show();
                }
            },
            error: function(jqXhr, textStatus, errorThrown) {
                window.__showAjaxError('Quick Search Validation: ajax call returned error', jqXhr, textStatus, errorThrown);
            }
        });
        return status;
    }

    $(document).ready(function () {
        window.onpaint = pageZoom.preload();
        
        if ($("#oldIEAlert").length > 0 && !$.cookie("old-ie")) {
            $("#oldIEAlert").show();
        }
        $(".closeIEAlert").click(function () {
            $("#oldIEAlert").hide();
            $.cookie("old-ie", "1", { expires: 1, path: '/' });
        });
        var isIPad = navigator.userAgent.match(/iPad/i) != null;
        if (isIPad) {
            $(".zoomTool").hide();
        }

        setupSearchAhead('#ctl00_Masthead_quickSearchText', 'https://emma.msrb.org/QuickSearch/SearchAhead', 'Undefined', 'UA-36137316-1');

        $("#zoomIn").click(function () {
            pageZoom.zoomIn();
        });

        $("#zoomOut").click(function () {
            pageZoom.zoomOut();
        });

        $('.hasDatepicker').on('click', function () {
            $(".calendar").position({
                my: 'left top',
                at: 'left bottom',
                of: $(this)
            });
        });

        $("#fredEco")
            .click(function () {
                $('#popupDlgEco')
                    .dialog({
                        modal: true,
                        width: 550,
                        resizable: false,
                        dialogClass: 'extLinkPopup'
                    });
                $('#popupDlgEco a')
                    .click(function () {
                        $("#popupDlgEco").dialog('close');
                    });
            });
        $("#myEmmaLink").attr("data-toggle", "dropdown");
        $("#toolsAndResLink").attr("data-toggle", "dropdown");


        $("#ctl00_Masthead_marketActivityLink").attr("data-toggle", "dropdown");
    });
</script>
<!--BEGIN Wrapper-->
<div class="wrapper" id="mainContentDiv">
<div id="pleaseWaitDiv" style="display: none;">
<div style="text-align: center; padding-top: 45px">
<img alt="Searching..." src="/images/ajax-loader-big.gif"/>
<div id="pleaseWaitTextDiv">Please wait...</div>
</div>
</div>
<div id="confirmSearchDiv" style="display: none;">
<div id="confirmSearchTextDiv"></div>
</div>
<noscript>
<p class="noscript">
<span>To properly display the EMMA website, you must have JavaScript enabled on your browser.</span>
</p>
</noscript>
<!--Begin ContentArea-->
<div class="contentArea">
<!--START breadCrumb-->
<div class="breadCrumb" id="ctl00_breadCrumbDiv">
<a href="/Home">Home</a><span class="divider">&gt;</span>
<span class="selected">Municipal Securities Rulemaking Board's Website Terms of Use</span>
</div>
<!--END breadCrumb-->
<div class="innerContentMainMaster">
<!--BEGIN SectionHeader-->
<h2 class="sectionHeader">
</html>

Process finished with exit code 0

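I ran out of lines, so I did not include all of the HTML my code returned.

The HTML my code returns does not contain <ul class="info-focus-2">.

My question is: how can I extract the HTML data I want under <ul class="info-focus-2">, or is there any way to do so?

Apologies for any formatting issues. This is my first time using Stack Overflow. Thanks for your help.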


1 Answer

Answer #1 · posted 2024-10-08 20:19:47

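Why not just use Chrome for this? The requests.get(NewURL) call most likely starts a fresh session that has never accepted the EMMA terms, so the server answers with the Terms of Use page (your output shows a form posting to Disclaimer.aspx and a "Website Terms of Use" breadcrumb) rather than the page the browser is displaying. Read the HTML from the driver instead: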

import time  # needed for time.sleep() below

# Clicks Accept to EMMA terms
AcceptButton2 = driver.find_element(By.XPATH, '//*[@id="ctl00_mainContentArea_disclaimerContent_yesButton"]')
AcceptButton2.click()

driver.implicitly_wait(10)

# # Pulls in the new URL for the specific CUSIP
# NewURL = driver.current_url

# # Loads the webpage contents
# r = requests.get(NewURL, headers=headers)

# You may need to wait here using wait.until(), time.sleep() or similar so the browser can fully load the content
time.sleep(3)
# Take the HTML from the browser session that has already accepted the terms
content = driver.page_source
# Convert to a BeautifulSoup object
soup = bs(content, 'lxml')

print(soup)
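
Once the rendered HTML comes from the driver, the label/value pairs under <ul class="info-focus-2"> can be pulled out with BeautifulSoup. The following is a minimal sketch, not a tested EMMA scraper: the helper names extract_info and wait_for_details are made up for illustration, the selectors are based only on the HTML fragment quoted in the question, and it is an assumption that the list appears in the DOM within the timeout. The explicit wait is one possible replacement for the fixed time.sleep(3).

```python
from bs4 import BeautifulSoup


def extract_info(html):
    """Return {label: value} for each <li> under <ul class="info-focus-2">.

    Hypothetical helper; the class names come from the fragment in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    info = {}
    for li in soup.select("ul.info-focus-2 > li"):
        label = li.find("span", class_="label")
        value = li.find("span", class_="float-right")
        if label and value:
            # Labels on the page end with a colon, e.g. "Dated Date:"
            key = label.get_text(strip=True).rstrip(":")
            info[key] = value.get_text(strip=True)
    return info


def wait_for_details(driver, timeout=10):
    """Wait until the details list is present, then return the rendered HTML.

    Hypothetical helper; assumes `driver` is the Selenium session that has
    already accepted the EMMA terms.
    """
    # Imported lazily so extract_info() also works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "ul.info-focus-2"))
    )
    return driver.page_source
```

With the driver from the question, usage would look like info = extract_info(wait_for_details(driver)), after which info["Dated Date"] holds the date string. If the wait times out, the list is probably not on the page the browser ended up on, which is worth checking in the browser window first.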
