Below is a sample DataFrame. For each Bus's DESCRIPTION, I want to find all other Buses whose DESCRIPTION shares at least one word with it.
Bus #   DESCRIPTION
Bus1    RICE MILLS MANUFACTURER
Bus2    LICORICE CANDY RETAIL
Bus3    LICORICE CANDY WHOLESALE
Bus4    RICE RETAIL
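For example, the output for Bus1 (RICE MILLS MANUFACTURER) should include Bus4 (RICE RETAIL), because both contain the word RICE.

The code below does this almost correctly: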
import pandas as pd

df = pd.DataFrame({
    'Bus #': ['Bus1', 'Bus2', 'Bus3', 'Bus4'],
    'DESCRIPTION': ['RICE MILLS MANUFACTURER', 'LICORICE CANDY RETAIL',
                    'LICORICE CANDY WHOLESALE', 'RICE RETAIL'],
})

# Each expression keeps the rows whose DESCRIPTION contains word k of row i
# (evaluate them one at a time in a REPL or notebook to see each result)
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][0].split()[0])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][0].split()[1])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][0].split()[2])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][1].split()[0])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][1].split()[1])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][1].split()[2])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][2].split()[0])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][2].split()[1])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][2].split()[2])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][3].split()[0])]
df[df['DESCRIPTION'].str.contains(df['DESCRIPTION'][3].split()[1])]
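The problem is that "LICORICE" contains "RICE" as a substring, so the output for RICE MILLS MANUFACTURER also includes LICORICE CANDY RETAIL, which I don't want.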
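A minimal sketch of one fix that keeps the question's `str.contains` approach: wrap the words in the regex word-boundary anchor `\b` (escaping each with `re.escape`) so that RICE only matches as a whole word and no longer matches inside LICORICE. The helper name `rows_sharing_word` is hypothetical, and the sketch assumes words are separated by spaces.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Bus #': ['Bus1', 'Bus2', 'Bus3', 'Bus4'],
    'DESCRIPTION': ['RICE MILLS MANUFACTURER', 'LICORICE CANDY RETAIL',
                    'LICORICE CANDY WHOLESALE', 'RICE RETAIL'],
})


def rows_sharing_word(df, i):
    """Hypothetical helper: rows (other than row i) sharing a whole word with row i."""
    words = df['DESCRIPTION'].iloc[i].split()
    # \b(?:W1|W2|...)\b matches any of the words only at word boundaries;
    # the non-capturing group avoids pandas' "match groups" warning.
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    mask = df['DESCRIPTION'].str.contains(pattern, regex=True)
    mask.iloc[i] = False  # exclude the row itself ("all other buses")
    return df[mask]


for i in range(len(df)):
    print(df['Bus #'].iloc[i], '->', rows_sharing_word(df, i)['Bus #'].tolist())
```

With the sample data, Bus1 now matches only Bus4, because `\bRICE\b` does not match inside LICORICE.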
This is still O(n^2), but it is highly vectorized.
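As an illustration of what a vectorized O(n^2) solution can look like, here is a sketch that builds a row-by-word indicator matrix with `Series.str.get_dummies` and multiplies it by its transpose, giving an n×n matrix of shared-word counts. This is one possible technique, not necessarily the code this answer used; the `RELATED` column name is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Bus #': ['Bus1', 'Bus2', 'Bus3', 'Bus4'],
    'DESCRIPTION': ['RICE MILLS MANUFACTURER', 'LICORICE CANDY RETAIL',
                    'LICORICE CANDY WHOLESALE', 'RICE RETAIL'],
})

# One 0/1 column per distinct word. Splitting on whole tokens keeps
# RICE and LICORICE as separate columns, so no substring false positives.
words = df['DESCRIPTION'].str.get_dummies(sep=' ')
m = words.to_numpy(dtype=int)

# (n x w) @ (w x n) -> n x n matrix: entry [i, j] = number of words rows i and j share.
# This matrix is the O(n^2) part (in both time and memory).
shared = m @ m.T

# Related means at least one shared word; a row is not related to itself.
related = shared > 0
np.fill_diagonal(related, False)

buses = df['Bus #'].to_numpy()
df['RELATED'] = [buses[row].tolist() for row in related]
print(df)
```

All pairwise comparisons happen inside a single NumPy matrix product rather than a Python loop, which is where the speed comes from; the trade-off is that the n×n matrix grows quadratically with the number of rows.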
Result

Comparing timings

Timings
Now you can call the function above as follows:

Note that this is not the most optimized algorithm, but it is a quick-and-dirty approach. It is O(n^2).
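A minimal quick-and-dirty sketch in the same spirit: turn each DESCRIPTION into a set of whole words and compare every pair of rows with set intersection. The function name `find_related_buses` and its parameters are hypothetical, not taken from the answer; comparing sets of split words avoids the RICE/LICORICE substring problem.

```python
import pandas as pd


def find_related_buses(df, id_col='Bus #', text_col='DESCRIPTION'):
    """Hypothetical helper: map each id to the other ids sharing at least one whole word.

    Every pair of rows is compared, so this is O(n^2) in the number of rows.
    """
    word_sets = [set(text.split()) for text in df[text_col]]
    ids = df[id_col].tolist()
    result = {}
    for i, words_i in enumerate(word_sets):
        result[ids[i]] = [
            ids[j]
            for j, words_j in enumerate(word_sets)
            if j != i and words_i & words_j  # non-empty intersection = shared word
        ]
    return result


df = pd.DataFrame({
    'Bus #': ['Bus1', 'Bus2', 'Bus3', 'Bus4'],
    'DESCRIPTION': ['RICE MILLS MANUFACTURER', 'LICORICE CANDY RETAIL',
                    'LICORICE CANDY WHOLESALE', 'RICE RETAIL'],
})
print(find_related_buses(df))
# {'Bus1': ['Bus4'], 'Bus2': ['Bus3', 'Bus4'], 'Bus3': ['Bus2'], 'Bus4': ['Bus1', 'Bus2']}
```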