使用beautifulsoup解析特定<div>中没有id/类的表

2024-10-06 11:29:50 发布

您现在位置:Python中文网/ 问答频道 /正文

我正试图从一个HTML页面的一部分解析数据,我用beautifulsoup抓取了这个页面。在

我有两个问题。网站上有很多表,但是它们没有唯一的标识符来方便地导航。在

我试过以下几点作为一个测试,但没有成功:

for tag in soup.find_all('div'):
    print tag.find('span')

这甚至找不到我抓取的页面中的div。在

我做错什么了?在

编辑 已更新源文件。在

我的代码返回一个错误TypeError:expected string or buffer是:

^{pr2}$

显然,这里的data是包含上述内容的文本文件。在


编辑:我要刮的页面。在

<!-- Copyright (c) 2001 TrakHealth Pty Limited. ALL RIGHTS RESERVED. -->
<!-- This is a generic page used to display single simple components  -->

<HTML XMLNS=TRAK>
<HEAD>
<TITLE></TITLE>

 <SCRIPT SRC="/csp/broker/cspbroker.js"></SCRIPT><SCRIPT SRC="/csp/broker/cspxmlhttp.js"></SCRIPT>
<SCRIPT SRC="../scripts/websys.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/websys.js"></SCRIPT>
<LINK REL="stylesheet" TYPE="text/css" HREF="../styles/modern/websys.css">
<LINK REL="stylesheet" TYPE="text/css" HREF="../custom/NHLS-LABTRAK/scripts/websys.css">

<SCRIPT language='javascript'>
var TRELOADPAGE='websys.csp';
var TRELOADID='sRi1272LJScu4';
var tkKeepOpen=0;
function treload(csppage) {
 tkKeepOpen=1;
 window.location.href= "websys.csp?TRELOADID=sRi1272LJScu4&TRELOAD=1";
}
var TRELOADPATLIST='';
</SCRIPT>

<script language=javascript>
var t=new Array();
var tsc=new Array();
var session=new Array();
t['XMISSING']='is a required field but has not been entered';
t['XINVALID']='does not have a valid entry';
t['XDATE']='is a date field but does not have a valid date entered';
t['XTIME']='is a time field but does not have a valid time entered';
t['XNUMBER']='is a numeric field but does not have a valid number entered';
t['XLAYOUTERR']='The TrakCare Layout Editor is not functioning or has not been installed.\\n\\n 1. Please check that your browser security settings allow you to initialize and script activeX controls.\\n 2. Please check that the TRAK Layout Editor has been installed.';
t['XLOCKED']='Record is locked by another user.';
t['XLOCKEDCT']='Code table updates are currently disabled.';
t['XNOTCT']='You must connect to the code table server to update code tables.';
t['XLOCKEDMT']='Record is locked by MEDTRAK.';
t['XDAYS']='Sun,Mon,Tue,Wed,Thu,Fri,Sat';
t['XMONTHS']='January,February,March,April,May,June,July,August,September,October,November,December';
t['XUNSAVED']='There are unsaved changes on this page.';
t['XSORTMAXROWS']='Rows retrieved for sorting exceeds maximum specified in system configuration. Sort terminated.';
t['XDATERANGE']='Date From must be before Date To';
t['XMAXCHARS']='Maximum number of characters exceeded. Text will be truncated to the mamimum allowable length.';
t['XNOTAVAIL']='This functionality is not available.';
t['XUNIQUE']='Code or description is not unique.';
t['XLOADING']='Loading...';
session['LOGON.USERID']='IN0546623';
session['LOGON.USERCODE']='IN0546623';
session['LOGON.USERNAME']='Dr Kate de Villiers';
session['LOGON.GROUPID']='17';
session['LOGON.GROUPDESC']='L0A0';
session['LOGON.CTLOCID']='';
session['XMONTHSSHORT']='Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec';
session['CONTEXT']='W5';
window.status=session['LOGON.USERCODE'];
session['LOGON.SITECODE']='NHLS-LABTRAK';
session['LOGON.REGION']='';
session['LOGON.LANGID']='2';
session['REMOTE_ADDR']='41.13.90.246';
session['SESSIONID']='sRi12LJScu';

function tkMakeServerCall(tkclass,tkmethod) {
 if ((tkclass=='')||(tkmethod=='')) return '';
 var args=new Array('6$Q4nJfQIibx6KRykS2G7fwIO_0CynE1PGbXHXH99mj48F9stJAfA0mxb2lAcJz4',tkclass,tkmethod);
 for (var i=2; i<tkMakeServerCall.arguments.length; i++) {
  args[i+1]=tkMakeServerCall.arguments[i];
 }
 var retval=cspHttpServerMethod.apply(this,args);
 return retval;
}
var tkTUIDP="28"
var tkTUIDG="R$J3Um0CXoe6SMG0APDutHiKdaJienpZcmL2DLLWSok-"
var tkTUIDS="jJnWSu4mVW_TjQfKn6qV91tHsBeG9DAKkdbZxw3gyrk-"
var tkOverlayMethod="b9XqGIkzXAsNiiMNrAW4qI2Pw8JI3a9uQfBtHGd5VbOkMojLvsvmef1ULxa3sbuS"
var tkLongTextMethod="LU9BZtom$6x_UyWrWuudlFjUitSkkO9trUE_sYUqAlCekHzkMYqX0A9AQuZIEkaJ"
</script>

</HEAD>
<BODY><DIV id="PageContent">
<INPUT TYPE="Button" value="<<" onClick="history.back()"><INPUT TYPE="Button" value=">>" onClick="history.forward()">
<DIV id='cmp_DEBDebtor_Banner'><!-- COMPONENT 
Routine                  GCOM3.1
Page Name                websys.default.csp
Component ID             69
Component Name           DEBDebtor.Banner
websys.Component Version .51.1531 at 2015-07-14 03:14:18PM
Component Version        L2010.1.1 on 2015-07-14 03:14:52PM
Layout for               SYS.SYS 
-->
<DIV STYLE="LEFT: 0px; TOP: 0px" id='dDEBDebtor_Banner' onclick="websys_sckeys[String.fromCharCode(113)]='websys_help(\'69\',\'-100000000000000\',\'\');'"><FORM ACTION='websys.csp' method=post name='fDEBDebtor_Banner' id='fDEBDebtor_Banner' autocomplete='off'>
<INPUT TYPE='HIDDEN' ID='TFORM' NAME='TFORM' VALUE='DEBDebtor.Banner'>
<INPUT TYPE='HIDDEN' ID='TPAGID' NAME='TPAGID' VALUE='sRi5912LJScu3'>
<INPUT TYPE='HIDDEN' ID='TEVENT' NAME='TEVENT' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TXREFID' NAME='TXREFID' VALUE='3'>
<INPUT TYPE='HIDDEN' ID='TOVERRIDE' NAME='TOVERRIDE' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TDIRTY' NAME='TDIRTY' VALUE='1'>
<INPUT TYPE='HIDDEN' ID='TWKFL' NAME='TWKFL' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLI' NAME='TWKFLI' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TFRAME' NAME='TFRAME' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLL' NAME='TWKFLL' VALUE="">
<INPUT TYPE='HIDDEN' ID='TWKFLJ' NAME='TWKFLJ' VALUE="">
<INPUT TYPE='HIDDEN' ID='TREPORT' NAME='TREPORT' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADCMP' NAME='TRELOADCMP' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADID' NAME='TRELOADID' VALUE="sRi1272LJScu4">
<INPUT TYPE='HIDDEN' ID='TOVERLAY' NAME='TOVERLAY' VALUE=''>
<input id="UseSameWin" name="UseSameWin" type="hidden" value="">
<TABLE><TBODY><TR><TD></TD><TD>
<table border='1px'><tr><td><table border='1px' cellspacing='0px' width='940px'><tr><td width='105px' ><span style='font-family:Arial; font-size:10pt; color:#000000;'> Episode No.</span></td><td width='139px' style='background-color:#0000FF;'><span style='font-weight:bold; color:#FFFF00;'>  PK01150438</span></td><td width='48px' ><span style='color:#000000;'> MRN</span></td><td width='192px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#00FFFF;'>  MRN46913203</span></td><td width='39px' ><span style='color:#000000;'> Lab</span></td><td width='398px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  Nelspruit Laboratory</span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='495px' ><span style='font-size:14pt; font-weight:bold; color:#0000FF;'> Unknown MALE</span></td><td width='37px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  M</span></td><td width='88px' style='background-color:#0A246A;'><span style='font-size:9pt; font-weight:bold; color:#D4D0C8;'>  27 y</span></td><td width='130px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  01/01/1988</span></td><td width='176px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  </span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='418px' ><span style='font-size:8pt; color:#000000;'>  Clotted blood;EDTA blood</span></td><td width='339px' ><span style='font-size:8pt; color:#000000;'>  1</span></td><td width='178px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  Routine</span></td><td width='3px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> F.N.</span></td><td width='154px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  08/120456</span></td><td width='58px' ><span style='color:#000000;'> Ref No</span></td><td width='402px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  AAWC0287NOF</span></td><td width='91px' ><span style='font-size:10pt; color:#000000;'> Collection</span></td><td width='115px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  02/07/2015</span></td><td width='54px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  00:30</span></td><td width='5px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> Hosp</span></td><td width='349px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  Rob Ferreira Hospital</span></td><td width='26px' ><span style='color:#000000;'> </span></td><td width='240px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  013 741 3031</span></td><td width='91px' ><span style='color:#000000;'> Received</span></td><td width='115px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  02/07/2015</span></td><td width='54px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  02:45</span></td><td width='6px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'>  </span></td><td width='349px' style='background-color:#0A246A;'><span style='font-size:10pt; font-weight:bold; color:#D4D0C8;'>  Ward 4</span></td><td width='26px' ><span style='color:#000000;'> </span></td><td width='158px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  </span></td><td width='362px'></td></tr></table><table border='1px' cellspacing='0px' width='940px'><tr><td width='40px' ><span style='font-size:10pt; color:#000000;'> Doc</span></td><td width='349px' style='background-color:#0A246A;'><span style='font-weight:bold; color:#D4D0C8;'>  DR IN CHARGE </span></td><td width='23px' ><span style='color:#000000;'> </span></td><td width='525px'></td></tr></table></td></tr></table></TD></TR></TBODY></TABLE>
</FORM>
<SCRIPT SRC="../scripts_gen/debdebtor.banner.js"></SCRIPT>
<SCRIPT SRC="../scripts/debdebtor.banner.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/debdebtor.banner.js"></SCRIPT>
<SCRIPT>
t['DemographicPanel']='Demographic Panel';
websys_sckeys[String.fromCharCode(113)]='websys_help(\'69\',\'-100000000000000\',\'\');';
websys_sckeys[String.fromCharCode(220)]='if (top.frames[\'eprmenu\']) top.frames[\'eprmenu\'].ToggleMenu(null);';
</SCRIPT>
</DIV>
<!-- COMPONENT END DEBDebtor.Banner -->
<SCRIPT language=javascript>
  try { InitMe(); } catch(e) {};
</SCRIPT>

</DIV>


<DIV id='cmp_web_EPVisitTestSet_FullLabPreview'><!-- COMPONENT 
Routine                  GCOM66.1
Page Name                websys.default.csp
Component ID             79
Component Name           web.EPVisitTestSet.FullLabPreview
websys.Component Version .51.1531 at 2015-07-14 03:14:18PM
Component Version        L2010.1.1 on 2015-07-14 03:14:52PM
Layout for               SYS.SYS 
-->
<DIV STYLE="LEFT: 0px; TOP: 0px" id='dweb_EPVisitTestSet_FullLabPreview' onclick="websys_sckeys[String.fromCharCode(113)]='websys_help(\'79\',\'-100000000000000\',\'\');'"><FORM ACTION='websys.csp' method=post name='fweb_EPVisitTestSet_FullLabPreview' id='fweb_EPVisitTestSet_FullLabPreview' autocomplete='off'>
<INPUT TYPE='HIDDEN' ID='TFORM' NAME='TFORM' VALUE='web.EPVisitTestSet.FullLabPreview'>
<INPUT TYPE='HIDDEN' ID='TPAGID' NAME='TPAGID' VALUE='sRi12LJS61cu8'>
<INPUT TYPE='HIDDEN' ID='TEVENT' NAME='TEVENT' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TXREFID' NAME='TXREFID' VALUE='66'>
<INPUT TYPE='HIDDEN' ID='TOVERRIDE' NAME='TOVERRIDE' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TDIRTY' NAME='TDIRTY' VALUE='1'>
<INPUT TYPE='HIDDEN' ID='TWKFL' NAME='TWKFL' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLI' NAME='TWKFLI' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TFRAME' NAME='TFRAME' VALUE=''>
<INPUT TYPE='HIDDEN' ID='TWKFLL' NAME='TWKFLL' VALUE="">
<INPUT TYPE='HIDDEN' ID='TWKFLJ' NAME='TWKFLJ' VALUE="">
<INPUT TYPE='HIDDEN' ID='TREPORT' NAME='TREPORT' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADCMP' NAME='TRELOADCMP' VALUE="">
<INPUT TYPE='HIDDEN' ID='TRELOADID' NAME='TRELOADID' VALUE="sRi1272LJScu4">
<INPUT TYPE='HIDDEN' ID='TOVERLAY' NAME='TOVERLAY' VALUE=''>
<TABLE><TBODY><TR><TD></TD><TD colSpan=3>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Sodium</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Potassium</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Chloride</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Bicarbonate</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Urea</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Creatinine</span><span style='font-weight:bold; font-family:Courier New,monospace; font-size:9pt; '>                       </span><span style='font-weight:bold; color:RED;font-family:Courier New,monospace; font-size:9pt; '>       116 H</span><span style='font-family:Courier New,monospace; font-size:9pt; '>    umol/L                    64 - 104</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       eGFR (MDRD formula)                     >60      mL/min/1.73 m2</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            MDRD-derived estimation of GFR may significantly underestimate true GFR</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            in patients with GFR > 60 mL/min/1.73m^2.  It may also be unreliable in</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            the case of: age <18 years or >70 years; pregnancy; serious co-morbid</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            conditions; acute renal failure; extremes of body habitus/unusual diet,</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            gross oedema. The MDRD-eGFR used here does not employ an ethnic factor</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            for race.</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            </span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Calcium                                2.44      mmol/L                  2.15 - 2.55</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Magnesium                              0.88      mmol/L                  0.63 - 1.05</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Inorganic phosphate</span><span style='font-weight:bold; font-family:Courier New,monospace; font-size:9pt; '>              </span><span style='font-weight:bold; color:RED;font-family:Courier New,monospace; font-size:9pt; '>      1.47 H</span><span style='font-family:Courier New,monospace; font-size:9pt; '>    mmol/L                  0.78 - 1.42</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Total protein                            77      g/L                       60 - 78</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Albumin                                  48      g/L                       35 - 52</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Total bilirubin                           8      umol/L                     5 - 21</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Conjugated bilirubin (DBil)               1      umol/L                     0 - 3</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Alanine transaminase (ALT)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Aspartate transaminase (AST)             26      U/L                       15 - 40</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Alkaline phosphatase (ALP)               61      U/L                       53 - 128</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 04/07/2015  at 17:02</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Gamma-glutamyl transferase (GGT)         18      U/L               <68</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Thyroid stimulating hormone (TSH)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Thyroxine (free T4)</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>            Specimen insufficient for test(s)</span>
</pre>
<pre>
</pre>
<pre>
<span style='color:ORANGE;font-family:Courier New,monospace; font-size:9pt; '>            Authorised by xx  on 02/07/2015  at 08:23</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       White Cell Count             9.75   x 109/L                              3.92 - 10.40</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Red Cell Count               5.29   x 1012/L                             4.19 - 5.85</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Haemoglobin                  15.9   g/dL                                 13.4 - 17.5</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Haematocrit                 0.474   L/L                                 0.390 - 0.510</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MCV                          89.6   fL                                   83.1 - 101.6</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MCH                          30.0   pg                                   27.8 - 34.8</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MCHC                         33.5   g/dL                                 33.0 - 35.0</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       RDW                          13.6   %                                    12.1 - 16.3</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Platelet Count                217   x 109/L                               171 - 388</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       MPV                          10.0   fL                                    7.1 - 11.0</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Neutrophils                 65.60   %              6.40   x 109/L        1.60 - 6.98                                                                32.00 - 76.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Lymphocytes                 25.90   %              2.53   x 109/L        1.40 - 4.20                                                                   18.00 - 56.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Monocytes                    5.60   %              0.55   x 109/L        0.30 - 0.80                                                                    4.00 - 12.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Eosinophils                  0.50   %              0.05   x 109/L        0.00 - 0.95                                                                    0.00 - 8.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       Basophils                    0.20   %              0.02   x 109/L        0.00 - 0.10                                                                    0.00 - 2.00</span>
<span style='font-family:Courier New,monospace; font-size:9pt; '>       "Other" Cells                2.10   %              0.20   x 109/L</span>
</pre>
<INPUT TYPE="HIDDEN" NAME="UnreadTSList" VALUE="PK01150438||C012||1,PK01150438||C013||1,PK01150438||C014||1,PK01150438||C015||1,PK01150438||C017||1,PK01150438||C002||1,PK01150438||C051||1,PK01150438||C053||1,PK01150438||C054||1,PK01150438||C056||1,PK01150438||C057||1,PK01150438||C058||1,PK01150438||C059||1,PK01150438||C060||1,PK01150438||C061||1,PK01150438||C062||1,PK01150438||C063||1,PK01150438||C150||1,PK01150438||C151||1,PK01150438||H002||1"><INPUT TYPE="HIDDEN" NAME="UnviewedTSList" VALUE="PK01150438||C012||1,PK01150438||C013||1,PK01150438||C014||1,PK01150438||C015||1,PK01150438||C017||1,PK01150438||C002||1,PK01150438||C051||1,PK01150438||C053||1,PK01150438||C054||1,PK01150438||C056||1,PK01150438||C057||1,PK01150438||C058||1,PK01150438||C059||1,PK01150438||C060||1,PK01150438||C061||1,PK01150438||C062||1,PK01150438||C063||1,PK01150438||C150||1,PK01150438||C151||1,PK01150438||H002||1"></TD></TR><TR><TD></TD><TD>
<a href="#" id="MarkAllReadLink" name="MarkAllReadLink" tabIndex="1">Mark all unread as read</A>
</TD><TD></TD><TD>
<a href="#" id="MarkAllViewedLink" name="MarkAllViewedLink" tabIndex="1">Mark all unviewed as viewed</A>
</TD></TR><TR><TD></TD><TD>
<A id="PrintReport" name="PrintReport" href='#' onclick="websys_lu('websys.csp?TUID=63&TUID=28',false,'top=30,left=20,width=800,height=600');return false;" TVARS="d79iPrintReport^sRi12L63JScu6" TUID='63'  tabIndex=1>Print Report</A>
</TD><TD>&nbsp;&nbsp; &nbsp;&nbsp; </TD><TD></TD></TR></TBODY></TABLE>
</FORM>
<SCRIPT language=javascript id='websysMoveJS64'>
websys_move(eval(screen.availWidth-760)/2,eval(screen.availHeight-540)/2,760,540);
</SCRIPT>
<SCRIPT SRC="../scripts_gen/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT SRC="../scripts/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT SRC="../custom/NHLS-LABTRAK/scripts/web.epvisittestset.fulllabpreview.js"></SCRIPT>
<SCRIPT>
t['result']='result';
t['MarkAllReadLink']='Mark all unread as read';
t['MarkAllViewedLink']='Mark all unviewed as viewed';
t['PrintReport']='Print Report';
websys_sckeys[String.fromCharCode(113)]='websys_help(\'79\',\'-100000000000000\',\'\');';
websys_sckeys[String.fromCharCode(220)]='if (top.frames[\'eprmenu\']) top.frames[\'eprmenu\'].ToggleMenu(null);';
</SCRIPT>
</DIV>
<!-- COMPONENT END web.EPVisitTestSet.FullLabPreview -->
<SCRIPT language=javascript>
  try { InitMe(); } catch(e) {};
</SCRIPT>

</DIV>

</DIV></BODY>
</HTML>

Tags: newinputsizestyletypewidthfamilypre
2条回答

我不知道你为什么会有问题,但你没有显示所有的代码。您需要做的一件事是将tag.find_all('span')替换为标记。以下是工作代码:

bs = """ (your HTML code from above)  """
soup = BeautifulSoup(bs)

for tag in soup.find_all('div'):
    for span in tag.find_all('span'):
        print span.text

生成以下内容:

^{pr2}$

修好了。在

我把一个大的html文件放到BeautifulSoup中。这非常有效,其中div name是我真正想要从文档中获取的信息。在

## create new beautiful soup page with last page source, element containing results
html_source = driver.find_element_by_id("dweb_EPVisitTestSet_FullLabPreview").text
soup = BeautifulSoup(html_source,'html.parser')
driver.quit()
print soup

它产生这样的输出:

^{pr2}$

这正是我想要的。现在只需格式化所有文本并将其放入字符串数组中。。。。在

相关问题 更多 >