<pre><code>import pandas as pd

def collect_to_set(grp):  # 3
    # Union of the words from every address in this group
    return set.union(*[set(row.split()) for row in grp['Address']])

data = pd.read_table('data', sep=r'\s{2,}', engine='python')  # 1
result = data.groupby(['Area']).apply(collect_to_set)  # 2
print(result)  # 4
# Area
# mahadevapura set([ballet, outer, road, ring, d1001, akme])
# vasanth nagar set([cant, station, railway, cantonment])
# whitefield set([hotel,room, sap, fortune, villa, no, oppo...
# dtype: object
print(result.to_dict()) # 5
# {'vasanth nagar': set(['cant', 'station', 'railway', 'cantonment']),
# 'mahadevapura': set(['ballet', 'outer', 'road', 'ring', 'd1001', 'akme']),
# 'whitefield': set(['hotel,room', 'sap', 'fortune', 'villa', 'no', 'opposite',
# 'palm', 'labs,', '4112', 'medose', '106/107'])}
</code></pre>
<ol>
<li>I use <code>read_table</code> to load your data snippet into a DataFrame.
Since you already have <code>data</code> as a DataFrame, you of course
don't need this line.</li>
<li>This is the main line. It groups <code>data</code> by <code>Area</code>, then calls
the <code>collect_to_set</code> function on each group <code>grp</code>.</li>
<li>In <code>collect_to_set</code>, <code>grp</code> is a sub-DataFrame of <code>data</code> (all
the rows that share the same <code>Area</code>). It returns a <code>set</code> of all the words
in the rows of <code>grp['Address']</code>.</li>
<li><code>result</code> is a <code>Series</code> indexed by <code>Area</code>, with one <code>set</code> per group.</li>
<li>If you want a dict, just use <code>result.to_dict()</code>.</li>
</ol>
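<p>If you want to try the steps above without a data file, here is a minimal self-contained sketch. The sample rows and the helper name <code>words_in</code> are made up for illustration and are not your data. It also selects the <code>Address</code> column before calling <code>apply</code>, which is a small variation on the code above, so the function receives a Series of addresses rather than a sub-DataFrame.</p>

```python
import pandas as pd

# Hypothetical sample rows, standing in for the real data
data = pd.DataFrame({
    'Area': ['north', 'north', 'south'],
    'Address': ['ring road', 'outer ring road', 'railway station'],
})

def words_in(addresses):
    """Return the set of all whitespace-separated words in a Series of strings."""
    return set.union(*(set(addr.split()) for addr in addresses))

# Group the Address column by Area and build one set of words per area
result = data.groupby('Area')['Address'].apply(words_in)
print(result)

# Convert the Series of sets to a plain dict keyed by area
as_dict = result.to_dict()
print(as_dict)
```

<p>Selecting the column first keeps the function independent of the other columns in the frame, so it should behave the same regardless of whether your pandas version passes the grouping column to <code>apply</code>.</p>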