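<p>Here is a solution based on the comments from machine yearning and waffle paradox.
Use <a href="http://docs.python.org/library/glob.html" rel="nofollow">glob</a> to build the list of article files, then pass that list to XMLCorpusReader:</p>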
<pre><code>import os
from glob import glob
from nltk.corpus.reader import XMLCorpusReader

root = 'nltk_data/corpora/nytimes_test'

# Descend through the directory levels (year, month, day) one glob at a time.
years = glob(root + '/*')
year_months = []
for year in years:
    year_months += glob(year + '/*')
print(year_months)
days = []
for year_month in year_months:
    days += glob(year_month + '/*')
articles = []
for day in days:
    articles += glob(day + '/*.xml')

# File IDs must be relative to the corpus root, with no leading slash.
file_ids = []
for article in articles:
    file_ids.append(os.path.relpath(article, root).replace(os.sep, '/'))
reader = XMLCorpusReader(root, file_ids)
</code></pre>
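<p>As a possible alternative, the nested glob loops could be collapsed into a single os.walk pass that collects every .xml file under the corpus root, however deep it sits. This is only a minimal sketch: the helper name collect_xml_file_ids is hypothetical and not part of the original answer or of NLTK, and it assumes the file IDs should be '/'-separated paths relative to the root.</p>

```python
import os
from fnmatch import fnmatch


def collect_xml_file_ids(root):
    """Return the sorted paths of all .xml files under root, relative to root.

    Hypothetical helper for illustration; not part of NLTK.
    """
    file_ids = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch(name, '*.xml'):
                full_path = os.path.join(dirpath, name)
                # Make the path relative to the root and use '/' separators.
                rel_path = os.path.relpath(full_path, root)
                file_ids.append(rel_path.replace(os.sep, '/'))
    return sorted(file_ids)
```

<p>The returned list could then be passed as the second argument to XMLCorpusReader, with the same root directory as the first argument.</p>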