Pandas: create a conditional column that returns a value based on two columns of another DataFrame, with groupby

Posted 2024-06-26 00:11:17


This question is an extension of another question, but takes a different approach. I have the following two DataFrames:

(If someone can show me a more efficient way of creating the df below, instead of writing it out by hand, that would be great.)

import pandas as pd

yrs = pd.DataFrame({'years': [1950, 1951, 1952, 1953, 1954, 1955,
    1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1967, 1968, 1969,
    1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
    1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995,
    1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
    2009, 2010, 2011, 2012, 2013, 2014]}, index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
    35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57,
    58, 59, 60, 61, 62, 63, 64, 65])

yrs

    years
1   1950
2   1951
3   1952
4   1953
5   1954
........
58  2007
59  2008
60  2009
61  2010
62  2011
63  2012
64  2013
65  2014
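A minimal sketch of building this column without typing every year, assuming a contiguous run from 1950 to 2014 is what's wanted: Python's built-in range can fill both the column and the 1-based index. Unlike the hand-typed list, this version includes 1966 and labels the rows 1 to 65 with no gap at 50.

```python
import pandas as pd

# range() excludes its stop value, so range(1950, 2015) covers 1950..2014 (65 years).
# index=range(1, 66) reproduces 1-based row labels like the frame above.
yrs = pd.DataFrame({'years': range(1950, 2015)}, index=range(1, 66))
print(yrs.head())
```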

dfyears.head(30).to_dict()
{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007, 7: 2013,
  8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989, 14: 1991, 15: 1993,
  16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003, 21: 2005, 22: 2007, 23: 2013,
  24: 2014, 25: 1985, 26: 1987, 27: 1989, 28: 1991, 29: 1993},
 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136, 4: 136, 5: 136, 6: 136, 7: 136, 8: 136,
  9: 172, 10: 172, 11: 172, 12: 172, 13: 174, 14: 174, 15: 174, 16: 174, 17: 174,
  18: 174, 19: 174, 20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179,
  27: 179, 28: 179, 29: 179},
 'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007,
  8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991,
  16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007,
  24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991}}

dfyears.head(30)
    end     start   idthomas
0   1995    1993    136
1   1997    1995    136
2   1999    1997    136
3   2001    1999    136
4   2003    2001    136
5   2005    2003    136
6   2007    2005    136
7   2013    2007    136
8   2014    2013    136
9   1995    1993    172
10  2007    2001    172
11  2013    2007    172
12  2014    2013    172
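To rebuild dfyears from the printed dict (for example, to reproduce the question locally), the DataFrame constructor reads the default to_dict() layout, {column: {index: value}}, directly. This sketch uses a trimmed copy of that dict (rows 0 to 2 and 9 to 10 only) to keep it short.

```python
import pandas as pd

# Trimmed copy of the dict printed above: rows 0-2 and 9-10 only.
data = {
    'end': {0: 1995, 1: 1997, 2: 1999, 9: 1995, 10: 2007},
    'idthomas': {0: 136, 1: 136, 2: 136, 9: 172, 10: 172},
    'start': {0: 1993, 1: 1995, 2: 1997, 9: 1993, 10: 2001},
}

# Outer keys become columns, inner keys become the row index labels.
dfyears = pd.DataFrame(data)
print(dfyears)
```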

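In yrs, I want to create a column 'served' that returns 1 if the value in the 'years' column is >= start and <= end for one of the rows in dfyears, and 0 otherwise. At the same time, I want a column 'idthomas' that returns the idthomas value from the row the condition was applied to. Here is an example of what I want (one block of years per idthomas):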

  years  served idthomas
1   1950    0   136
2   1951    0   136
3   1952    0   136
4   1953    0   136
5   1954    0   136
...................
43  1993    1   136
44  1994    1   136
45  1995    1   136
46  1996    1   136
47  1997    1   136
48  1998    1   136
49  1999    1   136
51  2000    1   136
52  2001    1   136
53  2002    1   136
54  2003    1   136
55  2004    1   136
56  2005    1   136
57  2006    1   136
58  2007    1   136
59  2008    1   136
60  2009    1   136
61  2010    1   136
62  2011    1   136
63  2012    1   136
64  2013    1   136
65  2014    1   136
66  1950    0   172
67  1951    0   172
68  1952    0   172
69  1953    0   172
70  1954    0   172
...................
72  1993    1   172
73  1994    1   172
74  1995    1   172
75  1996    0   172
76  1997    0   172
77  1998    0   172
78  1999    0   172
79  2000    0   172
80  2001    1   172
81  2002    1   172
82  2003    1   172
83  2004    1   172
84  2005    1   172
85  2006    1   172
86  2007    1   172
87  2008    1   172
88  2009    1   172
89  2010    1   172
90  2011    1   172
91  2012    1   172
92  2013    1   172
93  2014    1   172
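One way to get this shape, sketched under the assumption that pandas 1.2 or later is available (for how='cross'): build every (idthomas, year) pair, attach each ID's service periods, flag the periods that contain the year, and keep the maximum flag per pair. The dfyears below holds only the 13 rows shown above; it has not been tested against the full data.

```python
import pandas as pd

# The 13 dfyears rows shown above (idthomas 136 and 172 only).
dfyears = pd.DataFrame({
    'end':   [1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013, 2014, 1995, 2007, 2013, 2014],
    'start': [1993, 1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013, 1993, 2001, 2007, 2013],
    'idthomas': [136] * 9 + [172] * 4,
})
yrs = pd.DataFrame({'years': range(1950, 2015)})

# Every ID paired with every year: this is what creates the extra rows.
grid = dfyears[['idthomas']].drop_duplicates().merge(yrs, how='cross')

# Attach all of that ID's service periods to each (idthomas, year) row.
pairs = grid.merge(dfyears, on='idthomas')
pairs['served'] = ((pairs['years'] >= pairs['start'])
                   & (pairs['years'] <= pairs['end'])).astype(int)

# A year counts as served if any period covers it; reorder columns as in the example.
result = (pairs.groupby(['idthomas', 'years'], as_index=False)['served'].max()
          [['years', 'served', 'idthomas']])
print(result[result['idthomas'] == 172].iloc[43:52])
```

Taking the maximum of the per-period flags is what collapses "inside at least one period" into a single 0/1, so ID 172's gap from 1996 to 2000 comes out as 0.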

I've typed up "something" to try to code this. It is embarrassingly rough:

uu=dfyears.groupby('idthomas')

yrs['did_service'] == 1 if:
# somewhere in the next line I think that I need to do some sort of
# tuple so that I can grab the value in the 'idthomas' column that 
# is associated with the comparison that I am doing.
    x in years >= uu.start | x in years <= uu.end 
    else == 0

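If this isn't possible, I'll do the work by hand. I only ask that if someone tries but can't get it to work, you let me know, so that I have an idea of whether the approach is viable.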
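The groupby idea in the rough attempt can be made to work too. A minimal sketch, assuming NumPy broadcasting is acceptable: for each idthomas group, compare a column of years against all of that group's start and end values at once, and mark a year as served if any period contains it. The helper name served_by_id is illustrative, and dfyears again holds only the 13 rows shown in the question.

```python
import numpy as np
import pandas as pd

# The 13 dfyears rows shown in the question (idthomas 136 and 172 only).
dfyears = pd.DataFrame({
    'end':   [1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013, 2014, 1995, 2007, 2013, 2014],
    'start': [1993, 1995, 1997, 1999, 2001, 2003, 2005, 2007, 2013, 1993, 2001, 2007, 2013],
    'idthomas': [136] * 9 + [172] * 4,
})
years = np.arange(1950, 2015)

def served_by_id(group):
    """One row per year; served = 1 if any of the group's periods covers that year."""
    # years[:, None] is a column vector, so both comparisons broadcast to (n_years, n_periods).
    inside = ((years[:, None] >= group['start'].to_numpy())
              & (years[:, None] <= group['end'].to_numpy()))
    return pd.DataFrame({'years': years, 'served': inside.any(axis=1).astype(int)})

uu = dfyears.groupby('idthomas')
result = pd.concat(
    [served_by_id(group).assign(idthomas=idthomas) for idthomas, group in uu],
    ignore_index=True,
)
print(result.head())
```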


1 answer
User
Answer 1 · Posted 2024-06-26 00:11:17

I can help you with the time series: you don't need to type the data in by hand. Here is what you can do:

import pandas as pd

pd.DataFrame({'years': pd.date_range(start='1950', end='2014', freq='YS').year})

If you want months, days and full dates instead, drop the .year.

To implement the logic you described, I think np.where might work, something like this (untested):

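import numpy as np

# Each comparison needs its own parentheses, because & binds more tightly than >= and <=.
# pandas compares the two Series row by row by index label, so this only runs if yrs and
# dfyears have identical indexes; otherwise it raises a ValueError.
yrs['served'] = np.where((yrs['years'] >= dfyears['start']) & (yrs['years'] <= dfyears['end']), 1, 0)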

However, at least based on your example, this doesn't address your need to add new rows to yrs (one set of years per idthomas).

I know this isn't a complete answer, but I hope it helps in some way.
