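<p>To make this easier to follow, I have broken the problem into smaller, more manageable tasks:</p>
<ul>
<li>Read the phone numbers from the first column of the two sorted CSV files.</li>
<li>Find the duplicate numbers that appear in both lists of phone numbers.</li>
</ul>
<p>Reading the phone numbers is a reusable piece of functionality, so we split it out into its own function:</p>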
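<pre class="lang-py prettyprint-override"><code>import csv


def read_phone_numbers(file_path):
    phone_numbers = []
    # newline='' is what the csv module recommends when opening CSV files.
    with open(file_path, newline='') as file_obj:
        for row in csv.reader(file_obj):
            if row:  # skip blank lines, which csv.reader yields as empty lists
                phone_numbers.append(row[0])
    return phone_numbers
</code></pre>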
<p>For the task of finding duplicates, a <a href="https://docs.python.org/3/tutorial/datastructures.html#sets" rel="nofollow noreferrer"><code>set</code></a> is a useful tool. <em>From the Python documentation:</em></p>
<blockquote>
<p>A set is an unordered collection with no duplicate elements.</p>
</blockquote>
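<p>The <code>main</code> function in this answer calls a <code>find_duplicates</code> helper and writes its result with the field names <code>phone_number</code>, <code>credit_count</code> and <code>purchase_count</code>. A minimal sketch of such a helper follows. The body is an assumption inferred from those field names, not necessarily the original code, and the sorted output order is an illustrative choice.</p>

```python
def find_duplicates(credit_nums, purchase_nums):
    """Return one dict per phone number that appears in both lists.

    Assumed shape: keys match the fieldnames used by the CSV writer in main().
    """
    # Set intersection keeps only the numbers present in both inputs.
    shared = set(credit_nums).intersection(purchase_nums)
    duplicates = []
    for phone_number in sorted(shared):
        duplicates.append({
            'phone_number': phone_number,
            # list.count rescans the whole list for every number,
            # which is acceptable for small files.
            'credit_count': credit_nums.count(phone_number),
            'purchase_count': purchase_nums.count(phone_number),
        })
    return duplicates
```

<p>Each returned dict maps directly onto one output row, so the list can be passed to <code>csv.DictWriter.writerows</code> unchanged.</p>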
<p>Putting it all together:</p>
<pre class="lang-py prettyprint-override"><code>def main(credit_csv_path, purchase_csv_path, out_csv_path):
    credit_nums = read_phone_numbers(credit_csv_path)
    purchase_nums = read_phone_numbers(purchase_csv_path)
    duplicates = find_duplicates(credit_nums, purchase_nums)
    # newline='' prevents blank lines between rows on Windows.
    with open(out_csv_path, 'w', newline='') as file_obj:
        writer = csv.DictWriter(
            file_obj,
            fieldnames=['phone_number', 'credit_count', 'purchase_count'],
        )
        # Each item in duplicates is a dict keyed by the fieldnames above.
        writer.writerows(duplicates)
</code></pre>
<p>If you need to handle files hundreds of times larger, take a look at <a href="https://docs.python.org/3/library/collections.html?highlight=collections#collections.Counter" rel="nofollow noreferrer"><code>collections.Counter</code></a>.</p>
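<p>As a rough illustration of that suggestion, the sketch below counts each list once with <code>Counter</code> instead of rescanning it for every number. The function name <code>find_duplicates_with_counter</code> is hypothetical, and the output shape is assumed to match the field names used in <code>main</code>.</p>

```python
from collections import Counter


def find_duplicates_with_counter(credit_nums, purchase_nums):
    """Hypothetical Counter-based variant: one pass over each input list."""
    credit_counts = Counter(credit_nums)
    purchase_counts = Counter(purchase_nums)
    # Numbers that occur at least once in both inputs.
    shared = set(credit_counts).intersection(purchase_counts)
    return [
        {
            'phone_number': phone_number,
            'credit_count': credit_counts[phone_number],
            'purchase_count': purchase_counts[phone_number],
        }
        for phone_number in sorted(shared)
    ]
```

<p>Building the two counters takes time proportional to the length of the inputs, after which every lookup is a dictionary access rather than a full scan of the list.</p>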