Pandas: adding columns to the filter messes up the data structure


1 Answer

Anonymous, 1 day ago

This looks like a bug. In the source file <code>pandas/io/stata.py</code>, the <code>_do_select_columns()</code> method contains this loop:

<pre><code>dtyplist = []
typlist = []
fmtlist = []
lbllist = []
matched = set()
# Walks the columns in file order, not in the requested order
for i, col in enumerate(data.columns):
    if col in column_set:
        matched.update([col])
        dtyplist.append(self.dtyplist[i])
        typlist.append(self.typlist[i])
        fmtlist.append(self.fmtlist[i])
        lbllist.append(self.lbllist[i])
</code></pre>

Because it walks <code>data.columns</code> in file order, it scrambles the <code>dtypes</code>: they no longer match the order in which the columns were requested.

Compare the <code>dtypes</code> of <code>df2</code> and <code>df3</code> in this example:

<pre><code>In [1]: import zipfile
   ...: import pandas as pd
   ...: z = zipfile.ZipFile('/Users/q6600sl/Downloads/cepr_org_2014.zip')
   ...: df = pd.read_stata(z.open('cepr_org_2014.dta'), convert_categoricals=False)

In [2]: columns = ['wbho', 'age', 'female', 'wage4', 'ind_nber']
   ...: columns2 = ['year', 'month', 'minsamp', 'hhid', 'hhid2', 'fnlwgt']

In [3]: df2 = pd.read_stata(z.open('cepr_org_2014.dta'),
   ...:                     convert_categoricals=False,
   ...:                     columns=columns + columns2)

In [4]: df2.dtypes
Out[4]:
wbho          int16
age            int8
female         int8
wage4        object
ind_nber     object
year        float32
month          int8
minsamp        int8
hhid        float64
hhid2       float64
fnlwgt      float32
dtype: object

In [5]: df3 = df[columns + columns2]

In [6]: df3.dtypes
Out[6]:
wbho           int8
age            int8
female         int8
wage4       float32
ind_nber    float64
year          int16
month          int8
minsamp        int8
hhid         object
hhid2        object
fnlwgt      float32
dtype: object
</code></pre>

Changing the loop to:

<pre><code>dtyplist = []
typlist = []
fmtlist = []
lbllist = []
#matched = set()
# Look up each requested column's position, in the requested order
for i in np.hstack([np.argwhere(data.columns==col) for col in columns]).ravel():
#    if col in column_set:
#        matched.update([col])
    dtyplist.append(self.dtyplist[i])
    typlist.append(self.typlist[i])
    fmtlist.append(self.fmtlist[i])
    lbllist.append(self.lbllist[i])
</code></pre>

fixes the problem.

(I'm not sure what <code>matched</code> is for here; it doesn't seem to be used afterwards.)
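To illustrate the mismatch without needing the Stata file, here is a minimal sketch in plain Python. The column names and dtypes in `file_columns` and `file_dtypes` are made-up stand-ins for the reader's internal lists, and both helper functions are hypothetical; this is not the pandas implementation, only the same ordering logic in isolation.

```python
# Hypothetical stand-ins: the order of columns in the file and their types.
file_columns = ["year", "month", "wbho", "age"]
file_dtypes = ["int16", "int8", "int8", "int8"]

# The caller asks for the columns in a different order than the file stores them.
requested = ["wbho", "age", "year", "month"]


def select_in_file_order(requested):
    """Mimics the original loop: collects types while walking the file order."""
    column_set = set(requested)
    return [
        file_dtypes[i]
        for i, col in enumerate(file_columns)
        if col in column_set
    ]


def select_in_requested_order(requested):
    """Mimics the fix: looks up each requested column's position in turn."""
    position = {col: i for i, col in enumerate(file_columns)}
    return [file_dtypes[position[col]] for col in requested]


# Pair each requested column with the type each strategy assigns to it.
buggy = dict(zip(requested, select_in_file_order(requested)))
fixed = dict(zip(requested, select_in_requested_order(requested)))

print("file order:     ", buggy)
print("requested order:", fixed)
```

With the file-order walk, `wbho` is paired with the type that belongs to `year`, which is the same kind of shift visible between `df2` and `df3` above. Looking up positions in the requested order keeps each column paired with its own type.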
