Is there a better way to replace column values based on another DataFrame?



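Note: for simplicity I'm using a toy example, because copying and pasting DataFrames into Stack Overflow is difficult (please let me know if there's an easy way to do this).

Is there a way to merge the values from one DataFrame into another without getting the _X, _Y columns? I'd like the values in one column to replace all the zero values in another column.

<pre><code>df1:

Name   Nonprofit   Business   Education
X      1           1          0
Y      0           1          0   <- Y and Z have zero values for Nonprofit and Educ
Z      0           0          0
Y      0           1          0

df2:

Name   Nonprofit   Education
Y      1           1   <- this df has the correct values.
Z      1           1

pd.merge(df1, df2, on='Name', how='outer')

Name   Nonprofit_X   Business   Education_X   Nonprofit_Y   Education_Y
Y      1             1          1             1             1
Y      1             1          1             1             1
X      1             1          0             nan           nan
Z      1             1          1             1             1
</code></pre>

In a previous post I tried combine_first and dropna(), but neither of them does the job.

I want to replace the zeros in df1 with the values from df2. Furthermore, I want all rows with the same Name to be changed according to df2.

<pre><code>Name   Nonprofit   Business   Education
Y      1           1          1
Y      1           1          1
X      1           1          0
Z      1           0          1
</code></pre>

(To clarify: the value in the 'Business' column where Name = Z should be 0.)

My existing solution does the following: I take a subset based on the names that exist in df2, then overwrite those values with the correct ones. However, I'd like a less hacky way to do this.

<pre><code># str_to_regex and searchnamesre are my own helper functions (not shown here).
pubunis_df = df2
sdf = df1

# Build one regex from all the organisation names in df2.
regex = str_to_regex(', '.join(pubunis_df.ORGS))

# Rows of sdf whose 'ORGS' value matches the regex.
pubunis = searchnamesre(sdf, 'ORGS', regex)

# .ix was removed in pandas 1.0; .loc does the same label-based assignment.
sdf.loc[pubunis.index, ['Education', 'Public']] = 1
searchnamesre(sdf, 'ORGS', regex)
</code></pre>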


1 Answer

Anonymous, 1 day ago

<h2>Note: in the latest version of pandas, the two answers above no longer work:</h2>

KSD's answer raises an error:

<pre><code>import pandas as pd

df1 = pd.DataFrame([["X",1,1,0],
                    ["Y",0,1,0],
                    ["Z",0,0,0],
                    ["Y",0,0,0]], columns=["Name","Nonprofit","Business","Education"])

df2 = pd.DataFrame([["Y",1,1],
                    ["Z",1,1]], columns=["Name","Nonprofit","Education"])

# Three rows of df1 match (Y, Z, Y) but df2 only supplies two rows of values.
df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2.loc[df2.Name.isin(df1.Name), ['Nonprofit', 'Education']].values

df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']].values

Out[851]:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (3,)
</code></pre>

EdChum's answer gives the wrong result:

<pre><code>df1.loc[df1.Name.isin(df2.Name), ['Nonprofit', 'Education']] = df2[['Nonprofit', 'Education']]

df1
Out[852]:
  Name  Nonprofit  Business  Education
0    X        1.0         1        0.0
1    Y        1.0         1        1.0
2    Z        NaN         0        NaN
3    Y        NaN         1        NaN
</code></pre>

So that approach only works safely when the values in the 'Name' column are unique and sorted the same way in both DataFrames. The assignment aligns on the row index, not on 'Name', so rows of df1 whose index label has no counterpart in df2 end up as NaN.

Here is my answer:

<h2>Way 1:</h2>

<pre><code># df2 shares the columns 'Nonprofit' and 'Education' with df1, so those two
# columns get the _x / _y suffixes after the merge; 'Business' is left as is.
df1 = df1.merge(df2, on='Name', how="left")
# Prefer the value from df2 (_y); fall back to df1 (_x) where df2 has no match.
df1['Nonprofit_y'] = df1['Nonprofit_y'].fillna(df1['Nonprofit_x'])
df1['Education_y'] = df1['Education_y'].fillna(df1['Education_x'])
df1.drop(["Education_x","Nonprofit_x"], inplace=True, axis=1)
df1.rename(columns={'Education_y':'Education','Nonprofit_y':'Nonprofit'}, inplace=True)
</code></pre>

<h2>Way 2:</h2>

<pre><code>df1 = df1.set_index('Name')
df2 = df2.set_index('Name')
# update() modifies df1 in place, aligning on the index (here: Name).
df1.update(df2)
df1.reset_index(inplace=True)
</code></pre>

<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html" rel="noreferrer">More guide about update.</a> The columns used as the index in the two DataFrames do not need to have the same name before calling update; you can try 'Name1' and 'Name2'. It also works when df2 contains extra rows that match nothing in df1; those rows simply do not update df1. In other words, df2 does not need to be a superset of df1.

For example:

<pre><code>df1 = pd.DataFrame([["X",1,1,0],
                    ["Y",0,1,0],
                    ["Z",0,0,0],
                    ["Y",0,1,0]], columns=["Name1","Nonprofit","Business","Education"])

df2 = pd.DataFrame([["Y",1,1],
                    ["Z",1,1],
                    ['U',1,3]], columns=["Name2","Nonprofit","Education"])

df1 = df1.set_index('Name1')
df2 = df2.set_index('Name2')

df1.update(df2)
</code></pre>
Result:

<pre><code>       Nonprofit  Business  Education
Name1
X            1.0         1        0.0
Y            1.0         1        1.0
Z            1.0         0        1.0
Y            1.0         1        1.0
</code></pre>
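For comparison, here is a minimal sketch of a third option that avoids both the suffix columns and the index juggling by looking values up with `Series.map`. It is not taken from the answers above. The helper name `replace_from` is hypothetical, and it assumes the key values in the source frame are unique (they may repeat in the target frame) and that the replaced columns hold whole numbers.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 1, 0]],
    columns=["Name", "Nonprofit", "Business", "Education"],
)
df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1], ["U", 1, 3]],
    columns=["Name", "Nonprofit", "Education"],
)


def replace_from(target, source, key, columns):
    """Hypothetical helper: return a copy of `target` in which `columns`
    are overwritten with the values from `source` wherever `key` matches."""
    result = target.copy()
    # Assumption: key values are unique in `source`; map() needs a unique index.
    lookup = source.set_index(key)
    for col in columns:
        # NaN wherever the key is absent from `source` (here: X).
        mapped = result[key].map(lookup[col])
        # Fall back to the original value, then restore the original dtype
        # (safe here because every value is a whole number).
        result[col] = mapped.fillna(result[col]).astype(result[col].dtype)
    return result


fixed = replace_from(df1, df2, "Name", ["Nonprofit", "Education"])
print(fixed)
```

Both Y rows are updated, X keeps its own values, Business is untouched, and the extra U row in df2 is ignored. Unlike `update`, this returns a new DataFrame and leaves df1 as it was.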
