MultiIndex filtering

import numpy as np
import pandas as pd

np.random.seed(1234)
midx = pd.MultiIndex.from_product([['a', 'b', 'c'],
                                   pd.date_range('20130101', periods=6)],
                                  names=['letter', 'date'])
df = pd.DataFrame(np.random.randn(len(midx), 1), index=midx)

3 answers

User

Answer 1 · edited 2024-06-24 13:17:12

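This is a clumsy approach, but you can use the fact that "passing a list of labels or tuples works similar to reindexing" [source], together with pd.Index.slice_indexer(start, end), which lets you restrict each index to the dates between the given bounds.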

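>>> dictionary = {"a": ("20130102", "20130105"),
...               "b": "20130103",
...               "c": ("20130103", "20130105")}
>>> def get_idx_pairs():
...     # Index.groupby maps each letter to the part of the MultiIndex under it
...     for lvl0, lvl1 in df.index.groupby(df.index.get_level_values(0)).items():
...         dates = lvl1.get_level_values(1)  # only the dates present under this letter
...         dt = dictionary[lvl0]
...         if isinstance(dt, (tuple, list)):
...             # slice_indexer turns the inclusive date bounds into a positional slice
...             for s in dates[dates.slice_indexer(dt[0], dt[1])]:
...                 yield (lvl0, s)
...         else:
...             # a single date: convert to Timestamp so it matches the datetime level
...             yield (lvl0, pd.Timestamp(dt))
...
>>> df.loc[list(get_idx_pairs())]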
                        0
letter date              
a      2013-01-02 -1.1910
       2013-01-03  1.4327
       2013-01-04 -0.3127
       2013-01-05 -0.7206
b      2013-01-03  0.0157
c      2013-01-03 -0.3341
       2013-01-04  0.0021
       2013-01-05  0.4055

For each of the smaller per-letter DatetimeIndex objects in date, constrain it to the specified range, then build (letter, date) tuples and index on them explicitly.
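To make the slice_indexer step concrete, here is a small standalone check on the same six daily dates used in the question. It shows that slice_indexer takes label bounds (here date strings) and returns a plain positional slice, with both endpoints included.

```python
import pandas as pd

# Same dates as the question's second index level
dates = pd.date_range('20130101', periods=6)

# Label bounds in, positional slice out; both endpoints are kept
s = dates.slice_indexer('20130102', '20130105')
print(s)
print(dates[s])  # 2013-01-02 through 2013-01-05
```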

Alternatively, if you can specify every entry as a tuple (for a single date, just repeat it), the helper function can be condensed slightly:

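The condensed helper itself was not preserved, so what follows is only a minimal sketch of that idea, not the answerer's code. It assumes every dictionary value is a (start, end) pair, and the helper name get_idx_pairs(frame, bounds) and its parameters are hypothetical. It also swaps Index.groupby for a direct frame.loc[letter].index lookup per letter.

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
midx = pd.MultiIndex.from_product([['a', 'b', 'c'],
                                   pd.date_range('20130101', periods=6)],
                                  names=['letter', 'date'])
df = pd.DataFrame(np.random.randn(len(midx), 1), index=midx)

# Assumption: every value is a (start, end) pair; a single date is written twice
dictionary = {"a": ("20130102", "20130105"),
              "b": ("20130103", "20130103"),
              "c": ("20130103", "20130105")}


def get_idx_pairs(frame, bounds):
    """Hypothetical helper: yield (letter, date) tuples that fall inside each letter's bounds."""
    for letter, (start, end) in bounds.items():
        dates = frame.loc[letter].index  # DatetimeIndex of the dates under this letter
        for d in dates[dates.slice_indexer(start, end)]:
            yield (letter, d)


result = df.loc[list(get_idx_pairs(df, dictionary))]
print(result)
```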

User

Answer 2 · edited 2024-06-24 13:17:12

You can use query, which is designed for exactly this kind of selection criteria.

If you modify dictionary slightly, you can generate the required query string with a comprehension:

In : dictionary
Out:
{'a': ('20130102', '20130105'),
 'b': ('20130103', '20130103'),
 'c': ('20130103', '20130105')}

In : df.query(
          ' or '.join("('{}' <= date <= '{}' and letter == '{}')".format(*(v + (k,))) 
          for k, v in dictionary.items())
         )
Out:
                          0
letter date
a      2013-01-02 -1.190976
       2013-01-03  1.432707
       2013-01-04 -0.312652
       2013-01-05 -0.720589
b      2013-01-03  0.015696
c      2013-01-03 -0.334077
       2013-01-04  0.002118
       2013-01-05  0.405453

To see what query string is actually being executed, evaluate the comprehension on its own:

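A minimal sketch of that check, using the modified dictionary from this answer: the generator expression is joined into one string and printed, so you can see the chained comparisons and the or-combined clauses that df.query receives.

```python
dictionary = {'a': ('20130102', '20130105'),
              'b': ('20130103', '20130103'),
              'c': ('20130103', '20130105')}

# One "(start <= date <= end and letter == key)" clause per letter, joined with "or"
query = ' or '.join("('{}' <= date <= '{}' and letter == '{}')".format(*(v + (k,)))
                    for k, v in dictionary.items())
print(query)
```

Each clause compares against the date index level and pins the letter level, so a row survives only if it matches its own letter's range.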

User

Answer 3 · edited 2024-06-24 13:17:12

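With a small change to the original dictionary, this can be written more concisely: build pd.IndexSlice objects in a list comprehension, then pd.concat the selections.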

from collections import OrderedDict

# dates written with `-` separators;
# "b" uses a one-day slice so every .loc lookup returns a DataFrame
dictionary = {"a": slice("2013-01-02", "2013-01-05"),
              "b": slice("2013-01-03", "2013-01-03"),
              "c": slice("2013-01-03", "2013-01-05")}

# sort by letter so the concatenated result comes out in a, b, c order
dictionary = OrderedDict(sorted(dictionary.items()))

idx_slices = [pd.IndexSlice[k, v] for k, v in dictionary.items()]

pd.concat([df.loc[idx, :] for idx in idx_slices])

Out[1]:
                          0
letter date
a      2013-01-02 -1.190976
       2013-01-03  1.432707
       2013-01-04 -0.312652
       2013-01-05 -0.720589
b      2013-01-03  0.015696
c      2013-01-03 -0.334077
       2013-01-04  0.002118
       2013-01-05  0.405453
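pd.IndexSlice does no filtering of its own: it simply returns the tuple written inside the brackets, which lets you use `start:stop` slice syntax there. The short check below, on the question's data, shows the tuple it builds and that passing it to .loc selects the four 'a' rows.

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
midx = pd.MultiIndex.from_product([['a', 'b', 'c'],
                                   pd.date_range('20130101', periods=6)],
                                  names=['letter', 'date'])
df = pd.DataFrame(np.random.randn(len(midx), 1), index=midx)

# Colon syntax inside [] becomes an ordinary slice object in a plain tuple
idx = pd.IndexSlice['a', '2013-01-02':'2013-01-05']
print(idx)  # ('a', slice('2013-01-02', '2013-01-05', None))

sub = df.loc[idx, :]
print(len(sub))  # 4
```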

If you would rather have the `-` added automatically, you can use datetime:

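The datetime code for this step was not preserved; the following is a minimal sketch of one way to do it, not the answerer's version. It assumes the compact dates are given as (start, end) pairs, and the helper name to_dashed and the raw dictionary are hypothetical. Each date is parsed with datetime.strptime and re-emitted with strftime, then wrapped in a slice.

```python
from datetime import datetime


def to_dashed(s):
    """Hypothetical helper: '20130102' -> '2013-01-02'."""
    return datetime.strptime(s, "%Y%m%d").strftime("%Y-%m-%d")


# Assumption: compact dates, one (start, end) pair per letter
raw = {"a": ("20130102", "20130105"),
       "b": ("20130103", "20130103"),
       "c": ("20130103", "20130105")}

dictionary = {k: slice(to_dashed(start), to_dashed(end))
              for k, (start, end) in sorted(raw.items())}
print(dictionary["a"])  # slice('2013-01-02', '2013-01-05', None)
```

The resulting dictionary can then feed the same pd.IndexSlice / pd.concat comprehension.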
