Pandas: iterating over rows to match patterns

import re

df["valid"] = 0

def author_check(x, y):
    if str(x) == y:
        return 1
    else:
        return 0

author_list = ["Andi", "Tomasius"]
regex_list = [".*nd*"]

for i in range(len(author_list)):
    for x in range(len(regex_list)):
        r = re.compile(regex_list[x])
        newlist = filter(r.match, author_list)
        x = len(list(newlist))
        if x > 0:
            df['brand'] = df.apply(lambda row: author_check(row['Author'], author_list[i]), axis=1)

1 Answer

User

#1 · Posted 2024-09-29 18:57:58

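There are a couple of problems here. First, your regular expression matches both Andi and Anke, because .*nd* essentially says "match zero or more non-newline characters, then one n, then zero or more d characters", so any name containing an n matches. Second, looping does not take advantage of what pandas offers. Instead, I suggest using np.where and str.contains, as in the code below, to do the same job faster and more concisely.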
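To make the first point concrete, here is a small sketch that uses only the standard re module. The names list and the tighter pattern nd are illustrative assumptions about the intent, not part of the original question.

```python
import re

# Illustrative names; "Anke" is included to show the unintended match.
names = ["Andi", "Tomasius", "Anke"]

# The pattern from the question: any characters, one "n", then zero or more "d".
loose = re.compile(r".*nd*")
# A tighter alternative (an assumption about the intent): require a literal "nd".
strict = re.compile(r"nd")

# match() anchors at the start of the string; search() scans the whole string.
loose_hits = [name for name in names if loose.match(name)]
strict_hits = [name for name in names if strict.search(name)]

print(loose_hits)   # names accepted by the original pattern
print(strict_hits)  # names that really contain "nd"
```

The loose pattern accepts every name with an n in it, while the strict one accepts only names containing the two-character sequence nd.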

Using an example DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Author': ['Andi', 'Tomasius', 'Anke'], 'valid': [0, 0, 0]})

The following code gives you what you need:

df['valid'] = np.where(df.Author.str.contains('nd'), 1, 0)
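As a hedged aside, str.contains returns a boolean mask, so the 1/0 column can also be built by casting that mask. The sketch below assumes the same example frame plus one missing author (added purely for illustration) and uses na=False so that missing values count as non-matches.

```python
import numpy as np
import pandas as pd

# Same example authors as above, plus a missing value added for illustration.
df = pd.DataFrame({"Author": ["Andi", "Tomasius", "Anke", None]})

# Boolean mask; na=False treats missing authors as "no match".
mask = df["Author"].str.contains("nd", na=False)

# Two equivalent ways to turn the mask into 1/0 flags.
df["valid"] = np.where(mask, 1, 0)
df["valid_alt"] = mask.astype(int)

print(df)
```

Both columns hold the same flags; np.where is handy when the two output values are something other than 1 and 0.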

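If your query is more complex and needs a regular expression (as your comment suggests), str.contains accepts a compiled pattern as well: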

import re

p = re.compile(r'(?:as)|(?:nd)')
df['valid'] = np.where(df.Author.str.contains(p), 1, 0)
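If the patterns live in a list, as regex_list does in the question, one possible approach is to join them into a single alternation before calling str.contains. This is a minimal sketch; the contents of regex_list here are hypothetical.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"Author": ["Andi", "Tomasius", "Anke"]})

# Hypothetical pattern list, mirroring regex_list from the question.
regex_list = [r"as", r"nd"]

# Wrap each pattern in a non-capturing group and join them with "|",
# so one vectorized call checks every pattern at once.
combined = re.compile("|".join(f"(?:{pattern})" for pattern in regex_list))

df["valid"] = np.where(df["Author"].str.contains(combined), 1, 0)
print(df)
```

Non-capturing groups keep each alternative self-contained and avoid the warning pandas raises when a pattern passed to str.contains has capturing groups.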
