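<p>Learn to embrace Unicode... the world isn't ASCII anymore.</p>
<p>Assuming you are on Windows and viewing the .csv with Excel or Notepad, use the following line on Python 3. With just this change (and with the indentation of your post fixed), you will even be able to view non-ASCII characters correctly. Notepad and Excel look for a UTF-8 BOM signature at the start of the file, and <code>utf-8-sig</code> writes that signature.</p>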
<pre><code>with open('usnwr_schools.csv', 'w', newline='', encoding='utf-8-sig') as f:
</code></pre>
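<p>As a minimal sketch of what <code>utf-8-sig</code> changes on disk, the snippet below writes the same text with both encodings and inspects the first three bytes. The helper name <code>first_bytes</code> and the temporary file are illustrative assumptions, not part of the original script.</p>

```python
import os
import tempfile


def first_bytes(encoding, text="University of Michigan\u2014Ann Arbor\n"):
    """Write text with the given encoding and return the file's first three bytes.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="", encoding=encoding) as f:
            f.write(text)
        # Read back in binary mode to see the raw bytes on disk.
        with open(path, "rb") as f:
            return f.read(3)
    finally:
        os.remove(path)


print(first_bytes("utf-8-sig"))  # b'\xef\xbb\xbf'  (the UTF-8 BOM)
print(first_bytes("utf-8"))      # b'Uni'           (no signature)
```

<p>The three bytes <code>EF BB BF</code> are the signature that Notepad and Excel use to recognize the file as UTF-8.</p>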
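<p>If you read the file in another Python script, make sure to open it as shown below. The example you quoted, <code>b'University of Michigan\xe2\x80\x94\xe2\x80\x8bAnn Arbor'</code>, was read in binary mode (<code>'rb'</code>), which is why it shows raw UTF-8 bytes instead of characters.</p>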
<pre><code>with open('usnwr_schools.csv', encoding='utf-8-sig') as f:
</code></pre>
<p>On Linux, you can use <code>utf8</code> instead of <code>utf-8-sig</code>.</p>
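<p>To make the binary-mode point concrete, here is a small sketch that decodes the byte string quoted above and names the non-ASCII characters it contains. It also shows, on an in-memory sample rather than the real file, what happens when data written with <code>utf-8-sig</code> is decoded with plain <code>utf-8</code>. The sample string is an assumption for illustration.</p>

```python
import unicodedata

# The bytes from the question, as returned by reading in 'rb' mode.
raw = b'University of Michigan\xe2\x80\x94\xe2\x80\x8bAnn Arbor'

# Decoding as UTF-8 turns the byte sequences back into characters.
text = raw.decode('utf-8')
non_ascii = [unicodedata.name(ch) for ch in text if ord(ch) > 127]
print(non_ascii)  # ['EM DASH', 'ZERO WIDTH SPACE']

# Illustrative sample: data encoded with a BOM, decoded two ways.
data = 'Stanford University\n'.encode('utf-8-sig')
print(repr(data.decode('utf-8-sig')))  # 'Stanford University\n'
print(repr(data.decode('utf-8')))      # '\ufeffStanford University\n'
```

<p>Decoding with <code>utf-8-sig</code> strips the signature, while plain <code>utf-8</code> leaves a leading <code>U+FEFF</code> in the text, so it is worth matching the encoding used when the file was written.</p>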
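<p>By the way, you can replace your loop with:</p>
<pre><code>with open('usnwr_schools.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    # Note: find_all searches the whole document, so this outer loop
    # repeats the same search once per top-level node of reqSoup.
    for school in reqSoup:
        x = reqSoup.find_all("a", {"class" : "school-name"})
        for item in x:
            y = item.get_text()
            writer.writerow([y])
</code></pre>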
<p>Reading it back:</p>
<pre><code>with open('usnwr_schools.csv', encoding='utf-8-sig') as f:
    print(f.read())
</code></pre>
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&amp;M University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&amp;M University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&amp;M University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&amp;M University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&amp;M University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
</code></pre>
<p>If you still want ASCII only, you could do the following:</p>
<pre><code>import requests
import bs4
import csv

results = requests.get('http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-engineering-schools/eng-rankings?int=a74509')

# Translation table: dashes become hyphens, zero-width spaces are dropped.
replacements = {ord('\N{EN DASH}'): '-',
                ord('\N{EM DASH}'): '-',
                ord('\N{ZERO WIDTH SPACE}'): None}

reqSoup = bs4.BeautifulSoup(results.text, "html.parser")

with open('usnwr_schools.csv', 'w', newline='', encoding='ascii') as f:
    writer = csv.writer(f)
    for school in reqSoup:
        x = reqSoup.find_all("a", {"class" : "school-name"})
        for item in x:
            y = item.get_text()
            writer.writerow([y.translate(replacements)])

with open('usnwr_schools.csv', encoding='ascii') as f:
    print(f.read())
</code></pre>
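<p>The translation step can be tried without any network access. The sketch below applies the same <code>replacements</code> table to a sample string; the helper name <code>to_ascii</code> and the second sample containing an accented letter are hypothetical, added only to show that characters missing from the table still fail to encode as ASCII.</p>

```python
# Same mapping as above: dashes become hyphens, zero-width spaces are removed.
replacements = {ord('\N{EN DASH}'): '-',
                ord('\N{EM DASH}'): '-',
                ord('\N{ZERO WIDTH SPACE}'): None}


def to_ascii(name):
    """Apply the replacement table to one school name (hypothetical helper)."""
    return name.translate(replacements)


sample = 'University of Michigan\N{EM DASH}\N{ZERO WIDTH SPACE}Ann Arbor'
cleaned = to_ascii(sample)
print(cleaned)  # University of Michigan-Ann Arbor
cleaned.encode('ascii')  # succeeds: only ASCII characters remain

# A character that is not in the table passes through unchanged,
# so encoding to ASCII still raises for it.
try:
    to_ascii('Universit\N{LATIN SMALL LETTER E WITH ACUTE}').encode('ascii')
except UnicodeEncodeError as exc:
    print('still not ASCII:', exc.reason)
```

<p>If the page ever contains other non-ASCII characters, the table would need more entries, or the file could be written with <code>utf-8-sig</code> as in the first approach.</p>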