Comparing a list of objects against another list with Python and Pandas

Posted 2024-06-01 21:46:31

Asked by a user

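I may have gone about this the wrong way. I've loaded the company names from two CSV files into two lists, and I'm trying to compare the lists to find names that are similar.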

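Case and punctuation have already been stripped from the names, but the people who entered the data sometimes abbreviate a company's legal designation (e.g. "Corp" vs. "Corporation") or misspell a word, so I'm trying to find a way to score pairs of names by how similar they are.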

The raw data might look like this (not the actual data):

Walgreens Boots Alliance
CARDINAL HEALTH
EXPRESS SCRIPTS HOLDING
j.p. morgan chase
Bank of America Corp
wells fargo
Home Depot
STATE FARM INSURANCE COS.
Johnson & Johnson
archer daniels midland

I then lowercase the names, remove stop words and punctuation, and split them into tokens:

[walgreens, boots, alliance]
[cardinal, health]
[express, scripts, holding]
[jp, morgan, chase]
[bank, america, corp]
[wells, fargo]
[home, depot]
[state, farm, insurance, cos]
[johnson, johnson]
[archer, daniels, midland]
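The cleaning step above might look something like the following minimal sketch. It assumes a regular-expression approach; the `tokenize` helper and the `STOP_WORDS` set are hypothetical, since the question does not show the code that produced these lists (only that "of" disappears from "Bank of America Corp").

```python
import re

# Hypothetical stop-word list: the question does not say which words were
# removed, only that "of" disappears from "Bank of America Corp".
STOP_WORDS = {"of", "the", "and"}


def tokenize(name: str) -> list[str]:
    """Lowercase a company name, strip punctuation, split, drop stop words."""
    lowered = name.lower()
    # Delete anything that is not a word character or whitespace, so
    # "j.p." becomes "jp" and a lone "&" vanishes.
    cleaned = re.sub(r"[^\w\s]", "", lowered)
    return [token for token in cleaned.split() if token not in STOP_WORDS]


raw_names = [
    "Walgreens Boots Alliance",
    "j.p. morgan chase",
    "Bank of America Corp",
    "STATE FARM INSURANCE COS.",
    "Johnson & Johnson",
]
for name in raw_names:
    print(tokenize(name))
```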

...and a similar second list looks like this:

[cardinal, health]
[expres, scripts, holding]
[bank, america, corporation]
[wells, fargo]
[home, depot]
[state, farm, insurance, companies]
[archer, daniels]
[ford, motor, company]
[general, motors]
[john, deere]
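One way to turn "how similar are these two names" into a score is sketched below. It swaps in Python's standard-library `difflib.SequenceMatcher` for word-level fuzzy matching, so a typo such as "expres" still counts as "express". This is not the asker's method; the `name_score` helper and the 0.8 threshold are hypothetical choices to tune against real data.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Character-level similarity of two tokens, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def name_score(tokens_a: list[str], tokens_b: list[str], threshold: float = 0.8) -> float:
    """Fraction of words in tokens_a with a close counterpart in tokens_b.

    A word counts as matched when some word in tokens_b is at least
    `threshold` similar to it, so "expres" still matches "express".
    Dividing by the longer name's length penalizes extra or missing words.
    Words in tokens_b may be reused (there is no one-to-one pairing).
    """
    if not tokens_a or not tokens_b:
        return 0.0
    matched = sum(
        1
        for ta in tokens_a
        if max(token_similarity(ta, tb) for tb in tokens_b) >= threshold
    )
    return matched / max(len(tokens_a), len(tokens_b))


list_a = [
    ["express", "scripts", "holding"],
    ["bank", "america", "corp"],
    ["archer", "daniels", "midland"],
]
list_b = [
    ["expres", "scripts", "holding"],
    ["bank", "america", "corporation"],
    ["archer", "daniels"],
    ["ford", "motor", "company"],
]

# For each name in list_a, report the best-scoring name in list_b.
for tokens_a in list_a:
    best = max(list_b, key=lambda tokens_b: name_score(tokens_a, tokens_b))
    print(" ".join(tokens_a), "->", " ".join(best), round(name_score(tokens_a, best), 2))
```

Note that "corp" vs. "corporation" scores only about 0.53 at the character level, so abbreviations like these are better handled by an explicit expansion table than by lowering the threshold.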

I wrote a convoluted Pandas loop to test whether each word in one list also appears in the other list:

for index, row in df1[['Company Name Tokens']].iterrows():
    for content in row:
        for x in content:
            df1.iloc[index]['Test'] = 0
            df1.iloc[index]['Count'] = len(content)
            for idx, rw in entities[['Company Name Tokens']]:
                for r in rw:
                    if x in r:
                        df1.iloc[index]['Test'] = df1.iloc[index]['Test'] + 1

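I know this is probably very slow, but I'm not after efficiency. Either way, I suspect this approach is too much for the Python interpreter to handle, because I get an error: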

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-192-adea96d8cb82> in <module>()
      4             df1.iloc[index]['Test'] = 0
      5             df1.iloc[index]['Count'] = len(content)
----> 6             for idx, rw in entities[['Company Name Tokens']]:
      7                 for r in rw:
      8                     if x in r:

ValueError: too many values to unpack (expected 2)
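Why the error occurs: iterating directly over a DataFrame yields its column labels, not its rows. `entities[['Company Name Tokens']]` has a single column, so the loop receives the string `'Company Name Tokens'` and Python tries to unpack that string into `idx, rw`, hence `too many values to unpack`. Two further problems: `df1.iloc[index]['Test'] = ...` is chained indexing, which may write into a temporary copy rather than into `df1`, and `Test` is reset to 0 for every token instead of once per row. A corrected sketch follows; the small sample frames are hypothetical stand-ins for the real CSV data.

```python
import pandas as pd

# Hypothetical sample frames shaped like the question's data:
# one column holding a list of tokens per company.
df1 = pd.DataFrame({"Company Name Tokens": [
    ["cardinal", "health"],
    ["bank", "america", "corp"],
    ["walgreens", "boots", "alliance"],
]})
entities = pd.DataFrame({"Company Name Tokens": [
    ["cardinal", "health"],
    ["bank", "america", "corporation"],
    ["ford", "motor", "company"],
]})

df1["Test"] = 0
df1["Count"] = df1["Company Name Tokens"].apply(len)

# Series.items() yields (index, value) pairs, i.e. one token list per row.
# Iterating over a DataFrame itself would yield column labels instead.
for index, tokens in df1["Company Name Tokens"].items():
    hits = 0  # reset once per row, not once per token
    for token in tokens:
        for other_tokens in entities["Company Name Tokens"]:
            if token in other_tokens:
                hits += 1
    # .at writes into df1 itself, unlike the chained df1.iloc[index]['Test'].
    df1.at[index, "Test"] = hits

print(df1)
```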

Am I overcomplicating this? Is there a better way?

1 answer

Answered by a user

Reply #1 · Posted 2024-06-01 21:46:31

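If you're just starting from two lists, you can build a set from each list and take their intersection to get the names they share:

shared = set(list_a).intersection(set(list_b))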

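You just need to make both lists 1-D (one name string per element) rather than 2-D like a DataFrame. Of course, this only works if the data was entered in a consistent way; you'd have to clean up both lists first to deal with the typos you mentioned.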
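A sketch of the set approach on the question's sample data, assuming the goal is the names present in both lists (an intersection, `&`). Each token list is joined into one string, because lists are not hashable and cannot go into a set. The `ABBREVIATIONS` table and the `normalize` helper are hypothetical examples of the cleanup the answer mentions.

```python
# Tokenized names from the question.
tokens_a = [
    ["walgreens", "boots", "alliance"], ["cardinal", "health"],
    ["express", "scripts", "holding"], ["jp", "morgan", "chase"],
    ["bank", "america", "corp"], ["wells", "fargo"], ["home", "depot"],
    ["state", "farm", "insurance", "cos"], ["johnson", "johnson"],
    ["archer", "daniels", "midland"],
]
tokens_b = [
    ["cardinal", "health"], ["expres", "scripts", "holding"],
    ["bank", "america", "corporation"], ["wells", "fargo"], ["home", "depot"],
    ["state", "farm", "insurance", "companies"], ["archer", "daniels"],
    ["ford", "motor", "company"], ["general", "motors"], ["john", "deere"],
]

# Hypothetical abbreviation table; extend it to suit the real data.
ABBREVIATIONS = {"corp": "corporation", "cos": "companies"}


def normalize(tokens: list[str]) -> str:
    """Expand known abbreviations and join the tokens into one hashable string."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


# Exact comparison: join tokens as-is, one string per company.
exact = {" ".join(t) for t in tokens_a} & {" ".join(t) for t in tokens_b}

# Comparison after expanding abbreviations.
list_a = [normalize(t) for t in tokens_a]
list_b = [normalize(t) for t in tokens_b]
shared = set(list_a) & set(list_b)

print(sorted(exact))
print(sorted(shared))
```

On this sample the exact comparison finds only cardinal health, home depot and wells fargo; expanding the abbreviations adds bank america corporation and state farm insurance companies, while the misspelled "expres" and the shortened "archer daniels" still need some form of fuzzy matching.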
