Replacing XML content that contains special characters during parsing

<NDSL_Articles> <Article><DC.Identifier><controlNumber>73113660</controlNumber> <controlNumber.source.BL>RN602387147</controlNumber.source.BL><controlNumber.source>JAKO201857968658354</controlNumber.source> <journal scheme="URL">http://society.kisti.re.kr/journal/kj_view.jsp?kj=HJTODO&soc=etri&ndsl=y</journal> <article scheme="URL">http://society.kisti.re.kr/journal/view.jsp?soc=etri&kj=HJTODO&py=2018&vol=40&iss=2&sp=283&ndsl=y</article> <article.source scheme="KOI">KISTI1.1003/JNL.JAKO201857968658354</article.source> <article.source scheme="URL">http://koix.kisti.re.kr/KISTI1.1003/JNL.JAKO201857968658354</article.source> <article scheme="DOI">http://dx.doi.org/10.4218/etrij.15.0114.0065</article> <article.source scheme="ACMS_CN2">etri/HJTODO_2018_v40n2_283</article.source> <paper scheme="ISSN">1225-6463</paper> <publicationID.source>HJTODO</publicationID.source> </DC.Identifier> <DC.Relation><isPartOf> <title>ETRI Journal</title> <volume>v.40 no.2</volume> <sourcePage>283-283</sourcePage> <startPage>283</startPage> <lastPage>283</lastPage> <type>Journal</type> </isPartOf></DC.Relation> <DC.Description><reference.count>0</reference.count></DC.Description> <DC.Format><Pages>1</Pages></DC.Format> <DC.Language><text scheme="USMARC">eng</text></DC.Language> <DC.Creator><personal><main>Hong, Kang Woon</main><affiliation>Department of Information and Communications Engineering, KAIST, Broadcasting & Telecommunications Media Research Laboratory, ETRI</affiliation><email>kangwoon@kaist.ac.kr, gwhong@etri.re.kr</email></personal><personal><main>Ryu, Won</main><affiliation>Broadcasting & Telecommunications Media Research Laboratory, ETRI</affiliation></personal></DC.Creator> <DC.Title><main>Corrigendum</main> </DC.Title> <DC.Publisher><main>Electronics and Telecommunications Research Institute</main><alternative>한국전자통신연구원</alternative></DC.Publisher> <DC.Date><created scheme="ISO 8601">2018-04-01</created></DC.Date> <DC.Type>Article</DC.Type> <NDSL.Usage 
scheme="freetext">eletronic</NDSL.Usage> <NDSL.Cataloging> <instituion scheme="Internal">BL</instituion> <source.version>KISTI XML기반의 학술정보 및 협회기술정보 가공 지침서 v.1.0</source.version> <date scheme="ISO 8601">2015-09-25T13:48:09</date> <name>BL</name> <instituion.lastUpdate scheme="Internal">NDSL 센터</instituion.lastUpdate> <date.lastUpdate scheme="ISO 8601">2018-07-12T11:17:45</date.lastUpdate> <name.lastUpdate>김순영</name.lastUpdate> </NDSL.Cataloging> </Article> <DC.DOI> <doi>http://dx.doi.org/10.4218/etrij.15.0114.0065</doi> </DC.DOI> </NDSL_Articles>

1 answer

User

#1 · Posted 2024-10-16 17:16:03

Based on the limited information you provided, I am assuming that all of your data follows a similar format. I will use BeautifulSoup to extract the data.

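from bs4 import BeautifulSoup

# html.parser is lenient: it treats the stray <acid> as a tag instead of raising an error
a = BeautifulSoup("<DC.Title><main>Characteristics of the interaction mechanism between tannic <acid> and sodium caseinate using multispectroscopic and thermodynamics methods</main></DC.Title>", "html.parser")
# Print the first <main> element, markup included
print(a.main)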

For example, the code above prints the <main> element:

<main>Characteristics of the interaction mechanism between tannic <acid> and sodium caseinate using multispectroscopic and thermodynamics methods</acid></main>

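Pseudo-tags like <acid> keep us from simply using a.text: the parser treats <acid> as an element, so the literal text "<acid>" would be dropped from the extracted string. I don't have enough information to go further, so you will have to handle such cases yourself.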
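One possible way to handle this is to escape the offending characters before parsing. The sketch below is not part of the original answer and swaps BeautifulSoup for the standard library's xml.etree.ElementTree. It assumes that the set of real element names is known in advance. The names BARE_AMP, KNOWN_TAGS, escape_unknown_tags, and sanitize are hypothetical, as is the <record> wrapper used to put a shortened title and one journal URL from the question into a single fragment. It escapes every "&" that does not already start an entity, then escapes any tag whose name is not whitelisted.

```python
import re
import xml.etree.ElementTree as ET

# "&" that does not already begin an entity such as &amp; or &#38;
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")

# Anything that looks like an opening or closing tag
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)([^<>]*)>")

# Assumption: the element names expected in this fragment (hypothetical whitelist)
KNOWN_TAGS = {"record", "main", "journal"}


def escape_unknown_tags(raw):
    """Turn tags whose name is not whitelisted into literal text."""
    def repl(match):
        if match.group(2) in KNOWN_TAGS:
            return match.group(0)  # real markup, keep as-is
        return "&lt;" + match.group(1) + match.group(2) + match.group(3) + "&gt;"
    return TAG.sub(repl, raw)


def sanitize(raw):
    """Escape bare ampersands first, then unknown pseudo-tags."""
    return escape_unknown_tags(BARE_AMP.sub("&amp;", raw))


raw = (
    "<record>"
    "<main>tannic <acid> and sodium caseinate</main>"
    '<journal scheme="URL">http://society.kisti.re.kr/journal/'
    "kj_view.jsp?kj=HJTODO&soc=etri&ndsl=y</journal>"
    "</record>"
)

root = ET.fromstring(sanitize(raw))
print(root.find("main").text)     # title text with the literal <acid> preserved
print(root.find("journal").text)  # URL with its original ampersands
```

The whitelist is the fragile part of this approach: a real element missing from KNOWN_TAGS would be turned into text, so it should be built from the actual schema of the data rather than guessed.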
