Create a new column in a DataFrame if a column contains a string from a column of another DataFrame

Posted 2024-09-30 12:35:10


I want to create a new column in a DataFrame if a column contains any of the values from a column of a second DataFrame.

First DataFrame

WXYnineZAB
EFGsixHIJ
QRSeightTUV
GHItwoJKL
YZAfiveBCD
EFGsixHIJ
MNOthreePQR
ABConeDEF
MNOthreePQR
MNOthreePQR
YZAfiveBCD
WXYnineZAB
GHItwoJKL
KLMsevenNOP
EFGsixHIJ
ABConeDEF
KLMsevenNOP
QRSeightTUV
STUfourVWX
STUfourVWX
KLMsevenNOP
WXYnineZAB
CDEtenFGH
YZAfiveBCD
CDEtenFGH
QRSeightTUV
ABConeDEF
STUfourVWX
CDEtenFGH
GHItwoJKL

Second DataFrame

one
three
five
seven
nine

Output DataFrame

WXYnineZAB,nine
EFGsixHIJ,***
QRSeightTUV,***
GHItwoJKL,***
YZAfiveBCD,five
EFGsixHIJ,***
MNOthreePQR,three
ABConeDEF,one
MNOthreePQR,three
MNOthreePQR,three
YZAfiveBCD,five
WXYnineZAB,nine
GHItwoJKL,***
KLMsevenNOP,seven
EFGsixHIJ,***
ABConeDEF,one
KLMsevenNOP,seven
QRSeightTUV,***
STUfourVWX,***
STUfourVWX,***
KLMsevenNOP,seven
WXYnineZAB,nine
CDEtenFGH,***
YZAfiveBCD,five
CDEtenFGH,***
QRSeightTUV,***
ABConeDEF,one
STUfourVWX,***
CDEtenFGH,***
GHItwoJKL,***

For ease of explanation I built the first DataFrame as 3 chars + search string + 3 chars, but my actual file has no such consistency.


2 Answers

Source DFs:

In [172]: d1
Out[172]:
            txt
0    WXYnineZAB
1     EFGsixHIJ
2   QRSeightTUV
3     GHItwoJKL
4    YZAfiveBCD
..          ...
25  QRSeightTUV
26    ABConeDEF
27   STUfourVWX
28    CDEtenFGH
29    GHItwoJKL

[30 rows x 1 columns]

In [173]: d2
Out[173]:
    word
0    one
1  three
2   five
3  seven
4   nine

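Generate a RegEx pattern from the second DF: an alternation of its words wrapped in a capture group (which `str.extract` requires), stored as `pat`.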
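A minimal sketch of how such a pattern could be built, assuming the words are joined with `|` inside a single capture group. The small `d1`/`d2` frames are sample data, and the `re.escape` call is an added precaution for words containing RegEx metacharacters, not something taken from the answer itself:

```python
import re
import pandas as pd

# Small sample frames mirroring the structure used in this answer.
d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'YZAfiveBCD']})
d2 = pd.DataFrame({'word': ['one', 'three', 'five', 'seven', 'nine']})

# Join the escaped words with '|' inside one capture group,
# because Series.str.extract needs at least one group.
pat = '({})'.format('|'.join(re.escape(w) for w in d2['word']))

# Rows without any matching word come back as NaN.
new = d1['txt'].str.extract(pat, expand=False)
```

With plain alphabetic words the escaping changes nothing; it only matters if a search string contains characters such as `.` or `+`.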

Extract the word that matches the RegEx pattern and assign it as a new column:

In [176]: d1['new'] = d1['txt'].str.extract(pat, expand=False)

In [177]: d1
Out[177]:
            txt   new
0    WXYnineZAB  nine
1     EFGsixHIJ   NaN
2   QRSeightTUV   NaN
3     GHItwoJKL   NaN
4    YZAfiveBCD  five
..          ...   ...
25  QRSeightTUV   NaN
26    ABConeDEF   one
27   STUfourVWX   NaN
28    CDEtenFGH   NaN
29    GHItwoJKL   NaN

[30 rows x 2 columns]

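If needed, the NaNs can also be filled in the same step, for example by chaining `.fillna('***')` onto the `str.extract` call.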
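As a sketch of that one-step variant, assuming `pat` is an alternation of the search words inside one capture group and that `'***'` is the desired placeholder (as in the question's expected output):

```python
import pandas as pd

# Sample data; the pattern is an assumed stand-in for the one built from d2.
d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'ABConeDEF']})
pat = '(one|three|five|seven|nine)'

# Extract the matching word and replace missing matches with '***' in one expression.
d1['new'] = d1['txt'].str.extract(pat, expand=False).fillna('***')
```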

If you want to avoid RegEx, here is a purely list-based solution:

import pandas as pd

# Sample DataFrames (structure is borrowed from MaxU)
d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'QRSeightTUV', 'GHItwoJKL']})
d2 = pd.DataFrame({'word': ['two', 'six']})
# For each txt, collect the words it contains.
matches = [[word for word in d2.word if word in txt] for txt in d1.txt]
# Keep the word if exactly one matched, otherwise use '***'.
exists = [m[0] if len(m) == 1 else '***' for m in matches]
# Resulting output
res = pd.DataFrame(list(zip(d1.txt, exists)), columns=['text', 'word'])

Result:

          text  word
0   WXYnineZAB   ***
1    EFGsixHIJ   six
2  QRSeightTUV   ***
3    GHItwoJKL   two
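One behavioural difference from the RegEx answer is worth noting: the list-based one-liner yields `***` when a text contains more than one of the words, whereas `str.extract` still returns one of them. A hedged sketch of a variant that instead returns the first word, in `d2` order, found in the text; `first_match` is a hypothetical helper name and the third sample row is made up to show the multi-match case:

```python
import pandas as pd

def first_match(txt, words, default='***'):
    """Return the first word in `words` that occurs in `txt`, else `default`."""
    return next((w for w in words if w in txt), default)

# Sample data; the last row deliberately contains both search words.
d1 = pd.DataFrame({'txt': ['WXYnineZAB', 'EFGsixHIJ', 'GHItwosixJKL']})
d2 = pd.DataFrame({'word': ['two', 'six']})

# Plain substring checks, no RegEx involved.
d1['word'] = [first_match(txt, d2['word']) for txt in d1['txt']]
```

Which behaviour is right depends on the data: if a text should never contain two search words, flagging that case with `***` may be the safer choice.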
