Pandas Series and DataFrame: replace and str.replace functions

Posted 2024-10-04 03:29:17


I have the following pandas DataFrame (pandas 0.20.2, Python 3.6.2):

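import re
import pandas as pd

# Earlier version of the frame (plain string cells):
#    df=pd.DataFrame([['abc00010                    Pathway'],['abc00020                    Pathway']], columns=["ENTRY"])
# Current version: each cell holds a one-element list
df=pd.DataFrame(columns=["ENTRY"])
df.loc[:,"ENTRY"]=[list(['abc00010                    Pathway']),list(['abc00020                    Pathway'])]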


df["ENTRY2"]=df.loc[:,"ENTRY"]  
df["ENTRY3"]=df.loc[:,"ENTRY"]  
df["ENTRY4"]=df.loc[:,"ENTRY"]  
df["ENTRY5"]=df.loc[:,"ENTRY"]  
df["ENTRY6"]=df.loc[:,"ENTRY"]  


dfcleaner=re.compile(r"\W+?Pathway")  
df.loc[:,"ENTRY"]=df.loc[:,"ENTRY"].apply(str)
df.loc[:,"ENTRY"].replace(dfcleaner,"", inplace=True, regex=True)  

df.loc[:,"ENTRY2"]=df.loc[:,"ENTRY2"].apply(str)
df.loc[:,"ENTRY2"].replace(dfcleaner,"")

df.loc[:,"ENTRY3"].replace(dfcleaner,"", inplace=True, regex=True)
df["ENTRY4"]=df.loc[:,"ENTRY4"].str.replace(dfcleaner,"")  # -> NaN, NaN

df.loc[:,"ENTRY5"]=df.loc[:,"ENTRY5"].replace(dfcleaner,"", inplace=True, regex=True)
df.loc[:,"ENTRY6"]=df.loc[:,"ENTRY6"].replace(dfcleaner,"", regex=True)

   ENTRY         ENTRY2                                   ENTRY3                                   ENTRY4  ENTRY5  ENTRY6
0  ['abc00010']  ['abc00010                    Pathway']  ['abc00010                    Pathway']  nan     None    ['abc00010                    Pathway']
1  ['abc00020']  ['abc00020                    Pathway']  ['abc00020                    Pathway']  nan     None    ['abc00020                    Pathway']

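I expected ENTRY2 not to be changed, and likewise ENTRY3 and ENTRY6, since they are not strings and were never converted to strings. I also expected the result for ENTRY5, since replacing in place returns None.

What I did not expect was the ENTRY4 behavior with the string accessor. Can you explain it to me? I cannot decide whether it is a bug, and if it is one, whether it has already been reported...

I edited the code above because the first version did not produce a df that matched what I wanted, or the results shown here.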


1 answer
User
Answer 1 · Posted 2024-10-04 03:29:17

I expected ENTRY2 not to be changed, as well as ENTRY3 and ENTRY6 since they are not strings nor converted to it

All of the columns have the object dtype (the dtype pandas uses for strings and for other Python objects):

In [11]: df.dtypes
Out[11]:
ENTRY     object
ENTRY2    object
ENTRY3    object
ENTRY4    object
ENTRY5    object
ENTRY6    object
dtype: object
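
As a side note, the object dtype alone does not tell you whether the cells are strings, because an object column can hold any Python object, including lists. The following is only a minimal sketch, assuming the cells are one-element lists as in the edited question code; the frame `demo` and its column names are made up for illustration:

```python
import pandas as pd

# Hypothetical frame: one column holds lists, the other holds plain strings
demo = pd.DataFrame({
    "as_list": [["abc00010    Pathway"], ["abc00020    Pathway"]],
    "as_str": ["abc00010    Pathway", "abc00020    Pathway"],
})

# The list column is reported with the generic object dtype
print(demo.dtypes["as_list"])

# Looking at the Python type of each cell tells lists and strings apart
list_cell_types = set(demo["as_list"].map(type))
str_cell_types = set(demo["as_str"].map(type))
print(list_cell_types, str_cell_types)
```

Checking the cell types this way is a quick test before reaching for the .str accessor, which only processes cells that are real strings.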

ENTRY5 as replacing in place will return none

That is how inplace=True works. With inplace=False (the default), you assign the returned object back:

df.loc[:,"ENTRY5"]=df.loc[:,"ENTRY5"].replace(dfcleaner,"", regex=True)

Or update in place. In that case None is returned, so we should NOT assign it back:

df.loc[:,"ENTRY5"].replace(dfcleaner,"", inplace=True, regex=True)
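
The two options can be compared side by side. This is a hedged sketch rather than the exact code from the question: it assumes plain string cells, uses a made-up frame `demo`, and applies the in-place variant to a standalone copy so that the result does not depend on how a given pandas version treats `df.loc[...]` selections:

```python
import pandas as pd

pattern = r"\W+?Pathway"  # same regex as dfcleaner in the question
demo = pd.DataFrame({"ENTRY5": ["abc00010    Pathway", "abc00020    Pathway"]})

# Option 1: no inplace; replace() returns a new Series that is assigned back
demo["assigned"] = demo["ENTRY5"].replace(pattern, "", regex=True)

# Option 2: inplace=True on a standalone Series; the return value is
# deliberately NOT assigned (in the versions discussed here it is None)
standalone = demo["ENTRY5"].copy()
standalone.replace(pattern, "", regex=True, inplace=True)

print(demo["assigned"].tolist())
print(standalone.tolist())
```

Assigning the result of an inplace=True call, as in the ENTRY5 line of the question, is what fills the column with None.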

What I did not expect was the ENTRY4 behavior with the string accessor.

I could not reproduce the ENTRY4 "problem" using your code (Pandas 0.20.1):

In [16]: df
Out[16]:
      ENTRY                               ENTRY2    ENTRY3    ENTRY4 ENTRY5    ENTRY6
0  abc00010  abc00010                    Pathway  abc00010  abc00010   None  abc00010
1  abc00020  abc00020                    Pathway  abc00020  abc00020   None  abc00020
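
One plausible explanation for the difference, offered as an assumption and not as a confirmed diagnosis: the output above has plain string cells, while the edited question code stores one-element lists. The .str accessor returns NaN for cells that are not strings. A minimal sketch with made-up Series names:

```python
import pandas as pd

pattern = r"\W+?Pathway"  # same regex as dfcleaner in the question

as_str = pd.Series(["abc00010    Pathway", "abc00020    Pathway"])
as_list = pd.Series([["abc00010    Pathway"], ["abc00020    Pathway"]])

# String cells: the pattern is removed as expected
cleaned_str = as_str.str.replace(pattern, "", regex=True)

# List cells: .str.replace cannot process them and yields NaN for each row
cleaned_list = as_list.str.replace(pattern, "", regex=True)

# Possible workaround: pull the string out of each list first
fixed = as_list.str[0].str.replace(pattern, "", regex=True)

print(cleaned_str.tolist())
print(cleaned_list.isna().all())
print(fixed.tolist())
```

If the cells really are lists, extracting the inner string with `.str[0]` keeps the value clean, whereas `apply(str)` leaves the brackets and quotes in the result, as in the ENTRY column of the question.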
