Python: return a list of strings if any substring exists in another list

Posted 2024-10-02 02:36:32


Suppose you have company information like this:

companies = [['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]

Suppose you want to exclude certain businesses whenever any string/substring from the list below appears in the information in the list above:

no_interest = ['museum', 'cinema', 'car']

This is what I did (we only look at the second column of each entry):

# KEEPING ONLY RESULTS WHERE WE DO NOT FIND THE SUBSTRINGS
[x for x in companies if (no_interest[0] not in x[1]) & (no_interest[1] not in x[1]) & (no_interest[2] not in x[1])]

# RETURN
[['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
 ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]

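It seems to work, even though I expected it to need an 'OR' statement rather than an 'AND' (&) statement. To me, AND is cumulative: the filter should only apply when all the conditions are met together ('museum', 'cinema' and 'car' in the same string).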

So I have two questions:

  • Why is the 'AND' statement acting like an 'OR'?
  • How can we make this code more pythonic and more efficient?

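We are only checking 3 substrings here, but in practice there would be thousands of them, so it would be better not to repeat the conditions and instead have a statement more like all() or any() that returns the results rather than a boolean.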


2 Answers

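Here is another example, using a regular expression. However (as with Henry Ecker's answer), it assumes that none of the no_interest elements contains special characters that would interfere with the regular expression.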

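import re  # standard-library regex module
# Alternation of all substrings; assumes none contains regex metacharacters
pattern = re.compile("|".join(no_interest))
# Keep a company only if neither of its two fields matches the pattern
out = [c for c in companies if pattern.search(c[0]) is None and pattern.search(c[1]) is None]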
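If the substrings might contain regex metacharacters, one possible workaround is to pass each of them through re.escape before joining. The sketch below is only an illustration: the entry 'tea.stop' is a hypothetical addition (not in the question) chosen to show the difference, and only the second column is searched, as the question describes.

```python
import re

companies = [['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
             ['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
             ['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
             ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]

# 'tea.stop' is a hypothetical entry: unescaped, '.' matches any character
no_interest = ['museum', 'cinema', 'car', 'tea.stop']

# Unescaped: 'tea.stop' also matches 'tea-stop'
unescaped = re.compile("|".join(no_interest))
out_unescaped = [c for c in companies if unescaped.search(c[1]) is None]

# Escaped: every substring is matched literally
escaped = re.compile("|".join(re.escape(s) for s in no_interest))
out = [c for c in companies if escaped.search(c[1]) is None]

print(out_unescaped)  # only 'meat-and-bread' survives
print(out)            # 'meat-and-bread' and 'boston-tea-stop' survive
```

With escaping, the regex version behaves like the plain `in` checks from the question.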

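Why is the 'AND' statement acting like an 'OR'?

See: De Morgan's laws

Requiring that none of the substrings is present, (not A) and (not B) and (not C), is equivalent to not (A or B or C). The filter therefore drops an entry as soon as any one substring matches.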
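As a quick sanity check, the equivalence can be verified by brute force over every combination of truth values. This is just an illustrative sketch; the variable names keep_and and keep_or are made up for the example.

```python
from itertools import product

# Try all 8 combinations of truth values for three conditions
for a, b, c in product([True, False], repeat=3):
    assert ((not a) and (not b) and (not c)) == (not (a or b or c))

# The same idea applied to one entry from the question
name = 'palms-car-wash'
no_interest = ['museum', 'cinema', 'car']

keep_and = all(s not in name for s in no_interest)  # chained "not in ... and ..."
keep_or = not any(s in name for s in no_interest)   # "not (... or ...)"
print(keep_and, keep_or)  # False False
```

Note that the question uses & rather than and. On plain bool values both give the same result, but & does not short-circuit and binds more tightly than not in, which is why the question's parentheses are required.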

How can we make this code more pythonic and more efficient?

More Pythonic:

One option is to use all(), with the substrings kept in a separate list:

companies = [['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
             ['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
             ['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
             ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]

no_interest = ['museum', 'cinema', 'car']

out = [x for x in companies if all([ni not in x[1] for ni in no_interest])]
print(out)

Or with not any():

out = [x for x in companies if not any([ni in x[1] for ni in no_interest])]
[['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
 ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
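For a long list of substrings, a small variation may help: passing a generator expression (no square brackets) lets any() stop at the first match instead of building the whole list first. The helper name drop_matching and its column parameter below are hypothetical, introduced only for this sketch.

```python
def drop_matching(rows, substrings, column=1):
    """Return the rows whose `column` contains none of `substrings`."""
    # Generator expression: any() stops at the first substring that matches
    return [row for row in rows
            if not any(s in row[column] for s in substrings)]


companies = [['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
             ['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
             ['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
             ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
no_interest = ['museum', 'cinema', 'car']

kept = drop_matching(companies, no_interest)
print(kept)
```

The result is the same as in the two list-based versions above; only the amount of work per row changes.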

More efficient:

Use a library such as pandas:

import pandas as pd

companies = [['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
             ['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
             ['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
             ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]

df = pd.DataFrame(data=companies, columns=['id', 'val'])

no_interest = ['museum', 'cinema', 'car']

out = df[~df['val'].str.contains('|'.join(no_interest))]
print(out)

Output as a DataFrame:

                       id              val
1  5T0vKfIJWP1xTnxA7fJ17w   meat-and-bread
3  ch1ercqwoNLpQLxpTb90KQ  boston-tea-stop

Output as a list:

print(out.to_numpy().tolist())
[['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
 ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
