How to modify multiple values in a column but skip others in Python

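Goal

My main goal is to skip certain values in the ID column. The code below removes the dashes "-" and keeps only the first 9 digits. However, I need to skip certain IDs because they are unique.

After that, I will start comparing multiple worksheets.

The IDs in the main DataFrame are formatted as 000-000-000-000.

The other DataFrames I will compare against have no dashes "-", as in 000000000, and drop the trailing three 000, for nine digits in total.

The unique IDs I need to skip are the same in both DataFrames but use a completely different format, ranging over forms such as 000-000-000#12, 000-000-000#35, or 000-000-000#z.

The code I will use on every ID except the unique ones: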

dfSS["ID"] = dfSS["ID"].str.replace("-", "").str[:9]

But I would like to use an if statement, something like this (this does not work):

lst = ["000-000-000_#69B", "000-000-000_a", "etc.. random IDs", ]
if ~dfSS["ID"].isin(lst ).any()
    dfSS["ID"] = dfSS["ID"].str.replace("-", "").str[:9]
else:
    pass

My input DataFrame looks like this:

   ID                   Street #  Street Name
0  004-330-002-000      2272      Narnia
1  021-521-410-000_128  2311      Narnia
2  001-243-313-000      2235      Narnia
3  002-730-032-000      2149      Narnia
4  000-000-000_a        1234      Narnia

I would like this as the output:

   ID                   Street #  Street Name
0  004330002            2272      Narnia
1  021-521-410-000_128  2311      Narnia
2  001243313000         2235      Narnia
3  002730032000         2149      Narnia
4  000-000-000_a        1234      Narnia

2 Answers

Anonymous user

Answer 1 · edited 2024-10-01 00:24:31

There are many ways to do this. The first approach here does not involve writing a function.

# Create a placeholder column with all transformed IDs
dfSS["ID_trans"] = dfSS["ID"].str.replace("-", "").str[:9]
# Conditional indexing: overwrite only the rows whose ID is not in the skip list
mask = ~dfSS["ID"].isin(lst)
dfSS.loc[mask, "ID"] = dfSS.loc[mask, "ID_trans"]
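As a rough end-to-end sketch of the first approach, the snippet below rebuilds the question's sample table and applies the mask directly, without the placeholder column. The contents of `lst` are an assumption (the two special-format IDs from the sample), and `regex=False` is added only to make the literal replacement explicit.

```python
import pandas as pd

# Sample rows copied from the question's input table
dfSS = pd.DataFrame({
    "ID": ["004-330-002-000", "021-521-410-000_128", "001-243-313-000",
           "002-730-032-000", "000-000-000_a"],
    "Street #": [2272, 2311, 2235, 2149, 1234],
    "Street Name": ["Narnia"] * 5,
})

# Assumed skip list: the two special-format IDs in the sample
lst = ["021-521-410-000_128", "000-000-000_a"]

# True for every row whose ID should be transformed
mask = ~dfSS["ID"].isin(lst)

# Strip the dashes and keep the first nine characters, on the masked rows only
dfSS.loc[mask, "ID"] = (
    dfSS.loc[mask, "ID"].str.replace("-", "", regex=False).str[:9]
)

print(dfSS["ID"].tolist())
# ['004330002', '021-521-410-000_128', '001243313', '002730032', '000-000-000_a']
```

Every transformed ID comes out nine characters long, following the `.str[:9]` rule, while the IDs in the skip list are left untouched.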

The second approach is to write a function that transforms the ID conditionally, but it is not as fast as the first.

def transform_ID(ID_val):
    if ID_val not in lst:
        return ID_val.replace("-", "")[:9]
    # IDs in the skip list are returned unchanged (without this, apply yields None)
    return ID_val

dfSS['ID_trans'] = dfSS['ID'].apply(transform_ID)
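If you want the conditional logic without a row-by-row function, one possible alternative (not part of the original answer) is `Series.where`, which keeps a value where the condition is True and takes the replacement elsewhere. This is only a sketch; the sample IDs and the one-entry `lst` are assumptions for illustration.

```python
import pandas as pd

# Small illustrative frame; IDs taken from the question's sample
dfSS = pd.DataFrame({"ID": ["004-330-002-000", "000-000-000_a", "002-730-032-000"]})
lst = ["000-000-000_a"]  # assumed skip list

# Vectorized transformation of every ID
transformed = dfSS["ID"].str.replace("-", "", regex=False).str[:9]

# where() keeps the original ID when it is in the skip list,
# and takes the transformed value for all other rows
dfSS["ID"] = dfSS["ID"].where(dfSS["ID"].isin(lst), transformed)

print(dfSS["ID"].tolist())
# ['004330002', '000-000-000_a', '002730032']
```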

Anonymous user

Answer 2 · edited 2024-10-01 00:24:31

This builds on @xyzxyzjayne's answer, but I have two issues I cannot resolve.

Issue 1

I am getting this warning (see the edit):

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Documentation for this warning

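As you will see in the code below, I tried to put in .loc, but I cannot figure out how to get rid of this warning by using .loc correctly. Still learning. And no, I will not ignore it even though the code works. I see this as a learning opportunity.

Issue 2

I do not understand this part of the code. I know the left side is supposed to be the rows and the right side the columns. That said, why does this work? When this code is run, ID is a column, not a row. The line in question is:

df.loc[~df["ID "].isin(uniqueID ), "ID "] = df.loc[~df["ID "].isin(uniqueID ), "Place Holder"]

The area I do not understand yet is the left side of the comma (,) in this part: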

df.loc[~df["ID "].isin(uniqueID), "ID "]
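For what it is worth, here is a small sketch of what the left side of the comma evaluates to. The expression `~df["ID "].isin(uniqueID)` does not pass the ID column itself to `.loc`; it produces a boolean Series with one True/False per row, and `.loc` accepts such a boolean mask as a row selector. The three rows below are taken from the question's sample, and the one-entry `uniqueID` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "ID ": ["004-330-002-000", "021-521-410-000_128", "001-243-313-000"],
    "Street #": [2272, 2311, 2235],
})
uniqueID = ["021-521-410-000_128"]  # assumed skip list

# One boolean per row: True means "this row's ID is NOT in the skip list"
row_mask = ~df["ID "].isin(uniqueID)
print(row_mask.tolist())        # [True, False, True]

# The mask picks the rows; "ID " after the comma picks the column
selected = df.loc[row_mask, "ID "]
print(selected.index.tolist())  # [0, 2]
```

So the left side still selects rows; the ID column is only used to compute which rows qualify.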

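Here is the end result. Basically, as I said, XZY's help got me here, but I am adding more .loc and working through the documentation until I can get rid of the warning.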

import time  # needed for the date stamp in the output file name

# IDs to skip. The full list has 1000+ entries that I had to enter manually;
# only one example entry is shown here.
uniqueID = ["032-234-987_#4256"]

# Keep only the columns I need, to make the DataFrame smaller
df = df[['ID ', 'Street #', 'Street Name', 'Debris Finish', 'Number of Vehicles',
         'Number of Vehicles Removed', 'County']]

# Placeholder column holding the transformed IDs
df.loc[:, "Place Holder"] = df.loc[:, "ID "].str.replace("-", "").str[:9]

# Filter: overwrite every ID that is not in the skip list. Work in progress to fully understand.
df.loc[~df["ID "].isin(uniqueID), "ID "] = df.loc[~df["ID "].isin(uniqueID), "Place Holder"]

# Make the ID our index
df = df.set_index("ID ")

# Add today's date to the file name
todaysDate = time.strftime("%m-%d-%y")

# Write it out as an Excel file
df.to_excel("ID TEXT " + todaysDate + ".xlsx")

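Once I have got rid of the warning, I will edit this post and figure out the left side, so that I can explain it to everyone who needs or sees this post.

Edit: SettingWithCopyWarning:

Fixed this chained-indexing problem by copying the original DataFrame before filtering and by using .loc everywhere, which XYZ helped me work out. Before you start filtering, use DataFrame.copy(), where DataFrame is the name of your own DataFrame.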
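A minimal sketch of that fix, under the assumption that the warning came from assigning into a column subset of the original frame: calling `.copy()` right after selecting the columns gives an independent DataFrame, so the later `.loc` assignments are unambiguous. The name `raw` and the sample rows are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the original data loaded from the worksheet
raw = pd.DataFrame({
    "ID ": ["004-330-002-000", "021-521-410-000_128", "001-243-313-000"],
    "Street #": [2272, 2311, 2235],
    "Street Name": ["Narnia", "Narnia", "Narnia"],
})
uniqueID = ["021-521-410-000_128"]  # assumed skip list

# .copy() makes df independent of raw before any assignment happens
df = raw[["ID ", "Street #"]].copy()

# Placeholder column with the transformed IDs
df["Place Holder"] = df["ID "].str.replace("-", "", regex=False).str[:9]

# Overwrite only the rows that are not in the skip list
mask = ~df["ID "].isin(uniqueID)
df.loc[mask, "ID "] = df.loc[mask, "Place Holder"]

print(df["ID "].tolist())
# ['004330002', '021-521-410-000_128', '001243313']
```

The original `raw` frame is left unchanged, which also makes it easier to re-run the transformation while experimenting.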
