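<p>Create a list of short team names that omit the city, then scan each title for those short names. You should find two short team names in each title, and you can then use them to group the titles into unique games.</p>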
<pre><code>team_long_names = ['Arizona Cardinals', 'Atlanta Falcons', 'Carolina Panthers', 'Chicago Bears',
'Dallas Cowboys', 'Detroit Lions','Green Bay Packers','Los Angeles Rams',
'Minnesota Vikings','New Orleans Saints','New York Giants', 'Philadelphia Eagles',
'San Francisco 49ers','Seattle Seahawks','Washington Redskins','Baltimore Ravens',
'Buffalo Bills','Cincinnati Bengals','Cleveland Browns','Denver Broncos',
'Houston Texans','Indianapolis Colts','Jacksonville Jaguars','Kansas City Chiefs',
'Las Vegas Raiders','Los Angeles Chargers','Miami Dolphins','New England Patriots',
'New York Jets','Pittsburgh Steelers','Tennessee Titans']
team_short_names = [n.lower().split(' ')[-1] for n in team_long_names]
game_titles = ['Atlanta Falcons vs New York Jets', 'ATL Falcons vs NY Jets', 'Falcons v Jets',
'SF 49ers vs PIT Steelers', 'San Fransico 49ers vs Pittsburg Steelers', '49ers vs Steelers',
'Dallas Cowboys vs LA Chargers', 'DAL Cowboys vs Los Angles Chargers', 'Cowboys v Chargers',
'Blah blah Falcons and Foo bar Jets']
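titles_by_key = []
for title in game_titles:
    # Build a key from the short team names found in the title, in order of appearance
    game_key = '-'.join([word for word in title.lower().split(' ') if word in team_short_names])
    titles_by_key.append(game_key + ": " + title)
# Sorting places titles that share a key next to each other
print(sorted(titles_by_key))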
</code></pre>
<p>Output:</p>
<pre><code>['49ers-steelers: 49ers vs Steelers',
'49ers-steelers: SF 49ers vs PIT Steelers',
'49ers-steelers: San Fransico 49ers vs Pittsburg Steelers',
'cowboys-chargers: Cowboys v Chargers',
'cowboys-chargers: DAL Cowboys vs Los Angles Chargers',
'cowboys-chargers: Dallas Cowboys vs LA Chargers',
'falcons-jets: ATL Falcons vs NY Jets',
'falcons-jets: Atlanta Falcons vs New York Jets',
'falcons-jets: Blah blah Falcons and Foo bar Jets',
'falcons-jets: Falcons v Jets']
</code></pre>
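<p>The sorted list only places related titles next to each other. To collect them into actual groups, one option is a dictionary keyed by game. The sketch below is a minimal illustration, not part of the original approach: the shortened <code>team_short_names</code> and <code>game_titles</code> are hypothetical sample data, <code>game_key</code> and <code>titles_by_game</code> are names introduced here, and sorting the team names inside the key assumes that "Falcons vs Jets" and "Jets vs Falcons" should count as the same game.</p>

```python
from collections import defaultdict

# Hypothetical, shortened inputs for illustration; in practice reuse the
# full team_short_names and game_titles lists built earlier.
team_short_names = {'falcons', 'jets', '49ers', 'steelers'}
game_titles = ['Atlanta Falcons vs New York Jets', 'NY Jets v ATL Falcons',
               'SF 49ers vs PIT Steelers', '49ers vs Steelers']

def game_key(title):
    # Collect the short team names found in the title and sort them, so the
    # key does not depend on the order in which the teams are listed
    # (an assumption: home/away order is treated as irrelevant).
    found = [word for word in title.lower().split() if word in team_short_names]
    return '-'.join(sorted(found))

# Map each game key to the list of titles that refer to that game.
titles_by_game = defaultdict(list)
for title in game_titles:
    titles_by_game[game_key(title)].append(title)

for key, titles in sorted(titles_by_game.items()):
    print(key, titles)
```

<p>Using a set for the short names keeps each membership check cheap, and a title that yields fewer than two team names ends up under a short or empty key, which makes it easy to spot for manual review.</p>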
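<p>This does not handle team names that collide across different leagues, but I suspect there are simpler strategies for detecting the league as a preprocessing step.</p>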