<blockquote>
<p>My script is comparing lines but only finds a match when the entire line (including the frequencies and relative frequencies) matches exactly. I realize that that is because I'm finding the intersection between two entire sets but I have no idea how to do this differently.</p>
</blockquote>
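<p>This is exactly what dictionaries are for: cases where you have a separate key and value (or where only part of the value is the key). So:</p>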
<pre><code>a_dict = {row[0]: row for row in alist}
b_dict = {row[0]: row for row in blist}
</code></pre>
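<p>To make that concrete, here is a minimal sketch with made-up rows. The sample data is hypothetical; it assumes each row has the shape (n-gram, frequency, relative frequency), which is how I read your description.</p>

```python
# Hypothetical sample rows: (ngram, frequency, relative frequency).
alist = [["the cat", "12", "0.004"], ["a dog", "7", "0.002"]]

# Key each row by its first column; the whole row stays available as the value.
a_dict = {row[0]: row for row in alist}

print(a_dict["the cat"])    # the full row stored under that n-gram
print("a bird" in a_dict)   # membership tests look only at the key
```

<p>Note that if the same n-gram appears twice in one file, the later row silently replaces the earlier one.</p>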
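<p>Now, you can't use set methods directly on dictionaries. Python 3 offers some help here, but you're using 2.7. So you have to write it out explicitly, for example by building sets from the keys:</p>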
<pre><code>matches = set(a_dict) & set(b_dict)
</code></pre>
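<p>For comparison, this is roughly what the Python 3 help looks like. The sample dictionaries are hypothetical, and the second form only works on Python 3, where <code>keys()</code> returns a set-like view (on 2.7 the set-like view is <code>viewkeys()</code> instead).</p>

```python
# Hypothetical sample data keyed by n-gram.
a_dict = {"the cat": ["the cat", "12", "0.004"], "a dog": ["a dog", "7", "0.002"]}
b_dict = {"the cat": ["the cat", "30", "0.010"], "a bird": ["a bird", "3", "0.001"]}

# Explicit conversion: works on both Python 2.7 and Python 3.
matches = set(a_dict) & set(b_dict)

# Python 3 only: keys() is a set-like view, so no conversion is needed.
matches_py3 = a_dict.keys() & b_dict.keys()

print(matches)
```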
<p>But you don't actually need a set here; you just need to iterate over the keys. So:</p>
<pre><code>for key in a_dict:
    if key in b_dict:
        a_values = a_dict[key]
        b_values = b_dict[key]
        do_stuff_with(a_values[2], b_values[2])
</code></pre>
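<p>As a sketch of what <code>do_stuff_with</code> might turn into, the hypothetical helper below collects the relative frequencies of every n-gram that appears in both files. It assumes column 2 holds the relative frequency as a string, and the sample rows are made up.</p>

```python
def shared_relative_frequencies(a_dict, b_dict):
    """Return {ngram: (rel_freq_a, rel_freq_b)} for n-grams present in both dicts."""
    shared = {}
    for key in a_dict:
        if key in b_dict:
            # Assumption: index 2 is the relative frequency, stored as text.
            shared[key] = (float(a_dict[key][2]), float(b_dict[key][2]))
    return shared


# Hypothetical sample rows: (ngram, frequency, relative frequency).
alist = [["the cat", "12", "0.004"], ["a dog", "7", "0.002"]]
blist = [["the cat", "30", "0.010"], ["a bird", "3", "0.001"]]

a_dict = {row[0]: row for row in alist}
b_dict = {row[0]: row for row in blist}

result = shared_relative_frequencies(a_dict, b_dict)
print(result)
```

<p>The match is made on the n-gram alone, so differing frequencies no longer prevent two lines from pairing up.</p>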
<hr/>
<p>As a side note, you really don't need to build lists in the first place just to turn them into sets or dicts. Just build the sets or dicts directly:</p>
<pre><code>import csv

# Set version: each row becomes a hashable tuple.
a_set = set()
with open("ngrams.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        a_set.add(tuple(row))

# Dict version: key each row by its first column.
a_dict = {}
with open("ngrams.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        a_dict[row[0]] = row
</code></pre>
<p>Also, if you know about comprehensions, all three versions are crying out to be converted:</p>
<pre><code>with open("ngrams.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    # Now any ONE of these (the reader can only be consumed once)
    a_list = list(reader)
    a_set = {tuple(row) for row in reader}
    a_dict = {row[0]: row for row in reader}
</code></pre>
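<p>One thing to watch with the comprehension versions: a <code>csv.reader</code> is a one-shot iterator, so only the first construct that consumes it gets any rows. A small sketch, using hypothetical in-memory lines in place of <code>ngrams.csv</code>:</p>

```python
import csv

# Hypothetical lines standing in for the contents of ngrams.csv.
lines = ["the cat,12,0.004", "a dog,7,0.002"]

reader = csv.reader(lines, delimiter=',')
a_dict = {row[0]: row for row in reader}   # consumes every row
leftover = list(reader)                    # the reader is now exhausted

print(len(a_dict), len(leftover))
```

<p>If you need more than one of the structures, either read into a list once and derive the others from it, or reopen the file.</p>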