Create a new column in a DataFrame if a column contains a string from a column of another DataFrame

Input strings (one column, 30 rows):

WXYnineZAB
EFGsixHIJ
QRSeightTUV
GHItwoJKL
YZAfiveBCD
EFGsixHIJ
MNOthreePQR
ABConeDEF
MNOthreePQR
MNOthreePQR
YZAfiveBCD
WXYnineZAB
GHItwoJKL
KLMsevenNOP
EFGsixHIJ
ABConeDEF
KLMsevenNOP
QRSeightTUV
STUfourVWX
STUfourVWX
KLMsevenNOP
WXYnineZAB
CDEtenFGH
YZAfiveBCD
CDEtenFGH
QRSeightTUV
ABConeDEF
STUfourVWX
CDEtenFGH
GHItwoJKL

Desired output (each string, followed by the word from the other DataFrame that it contains, or '***' if it contains none of them):

WXYnineZAB,nine
EFGsixHIJ,***
QRSeightTUV,***
GHItwoJKL,***
YZAfiveBCD,five
EFGsixHIJ,***
MNOthreePQR,three
ABConeDEF,one
MNOthreePQR,three
MNOthreePQR,three
YZAfiveBCD,five
WXYnineZAB,nine
GHItwoJKL,***
KLMsevenNOP,seven
EFGsixHIJ,***
ABConeDEF,one
KLMsevenNOP,seven
QRSeightTUV,***
STUfourVWX,***
STUfourVWX,***
KLMsevenNOP,seven
WXYnineZAB,nine
CDEtenFGH,***
YZAfiveBCD,five
CDEtenFGH,***
QRSeightTUV,***
ABConeDEF,one
STUfourVWX,***
CDEtenFGH,***
GHItwoJKL,***

2 answers

User

Answer 1 · Edited 2024-09-30 12:35:10

Source DFs:

In [172]: d1
Out[172]:
            txt
0    WXYnineZAB
1     EFGsixHIJ
2   QRSeightTUV
3     GHItwoJKL
4    YZAfiveBCD
..          ...
25  QRSeightTUV
26    ABConeDEF
27   STUfourVWX
28    CDEtenFGH
29    GHItwoJKL

[30 rows x 1 columns]

In [173]: d2
Out[173]:
    word
0    one
1  three
2   five
3  seven
4   nine

Generate a RegEx pattern from the second DF:

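The code for this step is not preserved in this copy of the answer. A minimal sketch of one way to build such a pattern, assuming the words are joined with '|' and wrapped in a single capture group (Series.str.extract needs at least one capture group). The re.escape call is an extra precaution, not something the answer is known to use:

```python
import re

import pandas as pd

# The words to look for (the second DataFrame shown above).
d2 = pd.DataFrame({'word': ['one', 'three', 'five', 'seven', 'nine']})

# Join the words into one alternation inside a single capture group.
# Assumption: re.escape guards against words containing regex
# metacharacters such as '.' or '+'; plain letters are left unchanged.
pat = '({})'.format('|'.join(re.escape(w) for w in d2['word']))
print(pat)  # (one|three|five|seven|nine)
```

If one word can be a prefix of another (say "one" and "oneself"), sorting the words longest-first before joining makes the longer word win when both match at the same position.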

Extract the word that matches the RegEx pattern and assign it to a new column:

In [176]: d1['new'] = d1['txt'].str.extract(pat, expand=False)

In [177]: d1
Out[177]:
            txt   new
0    WXYnineZAB  nine
1     EFGsixHIJ   NaN
2   QRSeightTUV   NaN
3     GHItwoJKL   NaN
4    YZAfiveBCD  five
..          ...   ...
25  QRSeightTUV   NaN
26    ABConeDEF   one
27   STUfourVWX   NaN
28    CDEtenFGH   NaN
29    GHItwoJKL   NaN

[30 rows x 2 columns]

If needed, you can also fill the NaNs in the same step:

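The exact code for this step is not preserved here. A sketch that chains fillna onto str.extract, assuming the fill value is the '***' placeholder from the question's desired output and that the pattern is the words joined with '|' inside one capture group (the answer's own pattern code is not shown):

```python
import re

import pandas as pd

# Question data: the 30 strings to search and the words to look for.
txt = ('WXYnineZAB EFGsixHIJ QRSeightTUV GHItwoJKL YZAfiveBCD EFGsixHIJ '
       'MNOthreePQR ABConeDEF MNOthreePQR MNOthreePQR YZAfiveBCD WXYnineZAB '
       'GHItwoJKL KLMsevenNOP EFGsixHIJ ABConeDEF KLMsevenNOP QRSeightTUV '
       'STUfourVWX STUfourVWX KLMsevenNOP WXYnineZAB CDEtenFGH YZAfiveBCD '
       'CDEtenFGH QRSeightTUV ABConeDEF STUfourVWX CDEtenFGH GHItwoJKL').split()
d1 = pd.DataFrame({'txt': txt})
d2 = pd.DataFrame({'word': ['one', 'three', 'five', 'seven', 'nine']})

# Assumed pattern: one capture group around an alternation of the words.
pat = '({})'.format('|'.join(re.escape(w) for w in d2['word']))

# Extract the matching word; rows with no match (NaN) get '***' (assumed fill value).
d1['new'] = d1['txt'].str.extract(pat, expand=False).fillna('***')
print(d1['new'].tolist())
```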

User

Answer 2 · Edited 2024-09-30 12:35:10

If you want to avoid RegEx, here is a purely list-based solution:

import pandas as pd

# Sample DataFrames (structure is borrowed from MaxU)
d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'QRSeightTUV', 'GHItwoJKL']})
d2 = pd.DataFrame({'word': ['two', 'six']})
# For each txt, find the word from d2 it contains (1-liner): the word is used only
# if exactly one word matches; no match or several matches give '***'.
exists = [list(d2.word[[word in txt for word in d2.word]])[0] if sum([word in txt for word in d2.word]) == 1 else '***' for txt in d1.txt]
# Resulting output
res = pd.DataFrame(list(zip(d1.txt, exists)), columns=['text', 'word'])

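Result: EFGsixHIJ gets six and GHItwoJKL gets two; WXYnineZAB and QRSeightTUV contain neither word and get '***'.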
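The two answers do not behave the same when a string contains more than one of the words. The sketch below compares them on hypothetical sample data (not from the question), using a hypothetical helper single_match as a readable equivalent of answer 2's one-liner and an assumed pattern for answer 1 (words joined with '|' in one capture group):

```python
import re

import pandas as pd

# Hypothetical data: the second string contains two of the words.
d1 = pd.DataFrame({'txt': ['ABConeDEF', 'XoneYthreeZ', 'EFGsixHIJ']})
d2 = pd.DataFrame({'word': ['one', 'three']})

# Answer 1 style: str.extract searches each string and returns the leftmost match.
pat = '({})'.format('|'.join(re.escape(w) for w in d2['word']))
regex_result = d1['txt'].str.extract(pat, expand=False).fillna('***').tolist()

# Answer 2 style (hypothetical helper): use the word only if exactly one matches.
def single_match(txt):
    matches = [w for w in d2['word'] if w in txt]
    return matches[0] if len(matches) == 1 else '***'

list_result = [single_match(t) for t in d1['txt']]

print(regex_result)  # ['one', 'one', '***']
print(list_result)   # ['one', '***', '***']
```

So for strings with several candidate words, the RegEx version picks the first occurrence, while the list-based version falls back to '***'; which one is wanted depends on the data.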
