Standardizing values in a DataFrame column

Posted 2024-06-13 09:27:50


I have a DataFrame, df, that looks like this:

id colour  response
 1   blue    curent 
 2    red   loaning
 3 yellow   current
 4  green      loan 
 5    red   currret
 6  green      loan

As you can see, the values in the response column are inconsistent, and I want to snap them to a set of standardized responses.

I also have a validation list, val:

validate
 current
    loan
transfer

I want to standardize the response column in df based on the first three characters of the entries in the validation list.

So the final output would look like this:

id colour  response
 1   blue   current
 2    red      loan
 3 yellow   current
 4  green      loan 
 5    red   current
 6  green      loan

I tried using fnmatch:

pattern = 'cur*'
fnmatch.filter(df, pattern) = 'current'

but I could not get it to change the values in df.

If anyone could help, it would be greatly appreciated.

Thanks


2 Answers

You can use map:

In [3664]: mapping = dict(zip(s.str[:3], s))

In [3665]: df.response.str[:3].map(mapping)
Out[3665]:
0    current
1       loan
2    current
3       loan
4    current
5       loan
Name: response, dtype: object

In [3666]: df['response2'] = df.response.str[:3].map(mapping)

In [3667]: df
Out[3667]:
   id  colour response response2
0   1    blue   curent   current
1   2     red  loaning      loan
2   3  yellow  current   current
3   4   green     loan      loan
4   5     red  currret   current
5   6   green     loan      loan

where s is a Series of the validation values: current, loan and transfer.
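The session above can be reproduced end to end with the sketch below. It rebuilds the sample data from the question and adds two things that are assumptions, not part of the answer: stripping whitespace before taking the prefix, and a fillna fallback so that responses whose prefix is not in the validation list keep their (stripped) original value instead of becoming NaN.

```python
import pandas as pd

# Sample data reconstructed from the question; note the stray trailing spaces.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "colour": ["blue", "red", "yellow", "green", "red", "green"],
    "response": ["curent ", "loaning", "current", "loan ", "currret", "loan"],
})
s = pd.Series(["current", "loan", "transfer"])  # validation values

# Key each validation value by its first three characters.
mapping = dict(zip(s.str[:3], s))

# Assumption: strip whitespace first so a leading space cannot shift the prefix.
stripped = df["response"].str.strip()
prefix = stripped.str[:3]

# map() gives NaN for unknown prefixes; fall back to the stripped original.
df["response2"] = prefix.map(mapping).fillna(stripped)
print(df)
```

Note that two validation values sharing the same first three characters would collide in mapping, and the later one would silently win.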

Details

In [3652]: mapping
Out[3652]: {'cur': 'current', 'loa': 'loan', 'tra': 'transfer'}

mapping can also be a Series, indexed by the three-character prefix so that map can look the prefixes up:

In [3678]: pd.Series(s.values, index=s.str[:3].values)
Out[3678]:
cur     current
loa        loan
tra    transfer
dtype: object
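The fnmatch idea from the question can also be made to work, although it is more roundabout than map. The sketch below is one possible way, not the answerer's method: it builds a pattern such as 'cur*' from each validation value, tests every response against it, and overwrites the matches. The names validate and standardized are hypothetical, and fnmatchcase is used so that matching is case-sensitive on every platform.

```python
import fnmatch

import pandas as pd

# Sample responses from the question (with their stray trailing spaces).
df = pd.DataFrame(
    {"response": ["curent ", "loaning", "current", "loan ", "currret", "loan"]}
)
validate = ["current", "loan", "transfer"]  # hypothetical name for the validation list

standardized = df["response"].str.strip()
for target in validate:
    pattern = target[:3] + "*"  # e.g. 'cur*'
    # Boolean Series: True where the response matches the shell-style pattern.
    hits = standardized.map(lambda v: fnmatch.fnmatchcase(v, pattern))
    # Replace the matching rows with the standardized value.
    standardized = standardized.mask(hits, target)

df["response2"] = standardized
print(df)
```

The original attempt failed because fnmatch.filter only returns the matching names as a new list; it cannot be assigned to, and it never modifies df.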

Fuzzy matching?

from fuzzywuzzy import process

# For each response, keep the best-scoring entry from the validation list.
df['response2'] = [process.extractOne(x, val.validate)[0] for x in df.response]
df
Out[867]: 
   id  colour response response2
0   1    blue   curent   current
1   2     red  loaning      loan
2   3  yellow  current   current
3   4   green     loan      loan
4   5     red  currret   current
5   6   green     loan      loan
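If installing fuzzywuzzy is not an option, a similar result can probably be obtained with the standard library's difflib. This is a swapped-in technique, not what the answer above uses, and its similarity scores differ from fuzzywuzzy's. The helper closest, the name validate, and the 0.6 cutoff (difflib's default) are assumptions made for illustration.

```python
import difflib

import pandas as pd

# Sample responses from the question (with their stray trailing spaces).
df = pd.DataFrame(
    {"response": ["curent ", "loaning", "current", "loan ", "currret", "loan"]}
)
validate = ["current", "loan", "transfer"]  # hypothetical name for the validation list


def closest(value, choices, cutoff=0.6):
    """Return the best fuzzy match for value, or the stripped value if none is close enough."""
    cleaned = value.strip()
    matches = difflib.get_close_matches(cleaned, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


df["response2"] = df["response"].map(lambda v: closest(v, validate))
print(df)
```

Unlike the prefix approach, fuzzy matching also catches typos in the first three characters, but a cutoff that is too low can map unrelated values onto the validation list.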
