Replace unique values in a DataFrame with new values?

Posted 2024-09-26 22:49:39


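I have a DataFrame like the one below, and I want to make it less sensitive by replacing the unique values of a column. That is, I want to replace the values in the last-name column with fake last names generated by the "faker" library.

The code snippet is shown below: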

import pandas as pd
from faker import Faker

fake = Faker()
print(fake.first_name())
print(fake.last_name())

# first names are used as the index of the DataFrame
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')

df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'],
                  index=first)

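The output I want has the last-name column replaced with fake names, but each name must be replaced consistently: Meyer, for example, should always get the same fake last name.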


1 Answer
User
#1 · Posted 2024-09-26 22:49:39

Collect all unique names, build a dictionary that maps each unique name -> fake name, and map it onto your column:

import pandas as pd
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist', 
      'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')

df = pd.DataFrame(list(zip(last, job, language)), 
                  columns =['last', 'job', 'language'],
                  index=first) 
print(df)

# get all unique names - this can easily handle tens of thousands of names
all_names = set(df["last"])

# create the mapper: in practice you would use fake.last_name() instead of 42 + i
# (this needs `from faker import Faker` and `fake = Faker()`)
# mapper = {k: fake.last_name() for k in all_names}
mapper = {k: 42 + i for i, k in enumerate(all_names)}

# apply it
df["last"] = df["last"].map(mapper)
print(df)

Output:

# before
          last                 job   language
Mike      Meyer        data analyst     Python
Dorothee  Maier          programmer       Perl
Tom       Meyer  computer scientist       Java
Bill      Mayer      data scientist       Java
Pete       Meyr          accountant      Cobol
Kate       Mair        psychiatrist  Brainfuck

# after (five unique names get 42..46; which name gets which number depends on set iteration order)
          last                 job   language
Mike        43        data analyst     Python
Dorothee    42          programmer       Perl
Tom         43  computer scientist       Java
Bill        44      data scientist       Java
Pete        45          accountant      Cobol
Kate        46        psychiatrist  Brainfuck
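
One caveat with the dictionary approach is that a random name generator can return the same fake name for two different real names, which would merge distinct people. The sketch below is one possible way to guard against that. `pseudonymize` and `placeholder_name` are hypothetical helpers introduced here for illustration (they are not part of pandas or faker), and the counter-based generator only stands in for `fake.last_name` so that the result is reproducible.

```python
import itertools

import pandas as pd


def pseudonymize(series, make_fake):
    """Map every distinct value in `series` to its own distinct fake value.

    `make_fake` is any zero-argument callable that returns a replacement,
    for example `Faker().last_name` (an assumption; a stand-in is used below).
    """
    mapper = {}
    used = set()
    # Series.unique() returns the distinct values in order of first appearance
    for original in series.dropna().unique():
        candidate = make_fake()
        # redraw on collision; this assumes the generator can produce
        # at least as many distinct values as there are unique originals
        while candidate in used:
            candidate = make_fake()
        used.add(candidate)
        mapper[original] = candidate
    return series.map(mapper), mapper


# deterministic stand-in for a fake-name generator
_counter = itertools.count(1)


def placeholder_name():
    return f"Name{next(_counter)}"


first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
df = pd.DataFrame({'last': last}, index=first)

df['last'], mapper = pseudonymize(df['last'], placeholder_name)
print(df)
print(mapper)
```

Both rows that held Meyer end up with the same replacement, and no two original names share one. Returning the mapper alongside the column lets you reuse it on other tables that contain the same names, or discard it if the replacement should not be reversible.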
