Pandas:根据其他两列的值计算列

2024-09-27 21:34:31 发布

您现在位置:Python中文网/ 问答频道 /正文

我有以下节选的dataframe

{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007, 7: 2013, 8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989,
  14: 1991, 15: 1993, 16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003,
  21: 2005, 22: 2007, 23: 2013, 24: 2014, 25: 1985, 26: 1987, 27: 1989, 28: 1991, 29: 1993}, 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136, 4:136,
  5: 136, 6: 136, 7: 136, 8: 136, 9: 172, 10: 172, 11: 172, 12: 172,  13: 174, 14: 174, 15: 174, 16: 174, 17: 174, 18: 174, 19: 174, 20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179, 27: 179, 28: 179,
  29: 179}, 'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007, 8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991, 16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007, 24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991}}


df_oddyears.head()
    end     start   idthomas
0   1995    1993    136
1   1997    1995    136
2   1999    1997    136
3   2001    1999    136
4   2003    2001    136
5   2005    2003    136
6   2007    2005    136
7   2013    2007    136
8   2014    2013    136
9   1995    1993    172
10  2007    2001    172
11  2013    2007    172
12  2014    2013    172
13  1989    1987    174
14  1991    1989    174

它代表了美国立法者和国会的条款。有一些不方便的违规行为:startend日期表示任期的开始和结束,并将有2年或6年的差异,这取决于立法者是在众议院还是参议院任职。所有的立法者都有一个独特的idthomas,如果他们愿意的话,可以从众议院转到参议院。有时立法委员不能连任,这就造成了他们的服务差距。看看idthomas == 172你可以看到end == 1995start == 2001之间的差距。你知道吗

我需要计算从立法会议员服务期开始到立法会议员服务期结束的每一年的积极累积公共服务年数,甚至包括年数。在下一步中,我将合并这个df与另一个df沿年,因此我需要偶数年和奇数年的现役服务。你知道吗

这是我在深入了解问题之前所发现的:

df_oddyears['end']=df_oddyears['end'].map(lambda x: str(x)[:-6])
df_oddyears['start']=df_oddyears['start'].map(lambda x: str(x[:-6]))
df_oddyears['end'] = df_oddyears['end'].astype('int')
df_oddyears['start'] = df_oddyears['start'].astype('int')
df_oddyears['end'] = df_oddyears['end'].clip_upper(2014)
df_oddyears['term'] = df_oddyears.end - df_oddyears.start
df_oddyears['years_exp']=df_oddyears.groupby(['id.thomas']).term.cumsum()
df_oddyears.rename(columns={'id.thomas':'idthomas'},inplace=True)

df_oddyears.head()

    end     start   idthomas    term    years_exp
0   1995    1993    136           2     2
1   1997    1995    136           2     4
2   1999    1997    136           2     6
3   2001    1999    136           2     8
4   2003    2001    136           2     10
5   2005    2003    136           2     12
6   2007    2005    136           2     14
7   2013    2007    136           6     20
8   2014    2013    136           1     21
9   1995    1993    172           2     2
10  2007    2001    172           6     8
11  2013    2007    172           6     14
12  2014    2013    172           1     15

{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007,
  7: 2013, 8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989,
  14: 1991, 15: 1993, 16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003,
  21: 2005, 22: 2007, 23: 2013, 24: 2014, 25: 1985, 26: 1987, 27: 1989,
  28: 1991, 29: 1993}, 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136,
  4: 136, 5: 136, 6: 136, 7: 136, 8: 136, 9: 172, 10: 172, 11: 172,  12: 172, 13: 174, 14: 174, 15: 174, 16: 174, 17: 174, 18: 174, 19: 174,
  20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179, 27: 179, 28: 179, 29: 179},'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999,  4: 2001, 5: 2003, 6: 2005, 7: 2007, 8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991, 16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007, 24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991},'term': {0: 2, 1: 2, 2: 2,  3: 2, 4: 2, 5: 2, 6: 2, 7: 6, 8: 1, 9: 2, 10: 6, 11: 6, 12: 1, 13: 2, 14: 2, 15: 2, 16: 2, 17: 2, 18: 2, 19: 2, 20: 2, 21: 2, 22: 2, 23: 6,
  24: 1, 25: 2, 26: 2, 27: 2, 28: 2, 29: 2},'years_exp': {0: 2, 1: 4,
  2: 6, 3: 8, 4: 10, 5: 12, 6: 14, 7: 20, 8: 21, 9: 2, 10: 8, 11: 14,
  12: 15, 13: 2, 14: 4, 15: 6, 16: 8, 17: 10, 18: 12, 19: 14, 20: 16,
  21: 18, 22: 20, 23: 26, 24: 27, 25: 2, 26: 4, 27: 6, 28: 8, 29: 10}}

然后我df=df_oddyears.drop(['start', 'term'], axis=1, inplace=False)并从here实现以下代码

    final_year = 2014
df= pd.DataFrame([(year, id_, n) 
                  for id_, end, years_exp in df.groupby('idthomas').first().itertuples() 
                  for n, year in enumerate(range(end, final_year + 1), years_exp)], 
                 columns=['end', 'idthomas', 'years_exp'])

df.head()

        end     idthomas    years_exp
673     1995    172     2
674     1996    172     3
675     1997    172     4
676     1998    172     5
677     1999    172     6
678     2000    172     7
679     2001    172     8
680     2002    172     9
681     2003    172     10

这与我想要的非常接近,因为它使我能够连接到end上的另一个df,同时保持总的years_exp。不幸的是,我在发布我最初的问题时没有意识到间歇性服务的问题;因此,years_exp没有考虑到公共服务方面的差距。这是今天的(第一个)项目。如果任何人有问题、建议或批评,他们都是受欢迎的。你知道吗

我期望的最终结果如下:

    end idthomas    years_exp
0   1994    136      1
1   1995    136      2
2   1996    136      3
3   1997    136      4
4   1998    136      5
5   1999    136      6
6   2000    136      7
7   2001    136      8
8   2002    136      9
9   2003    136      10
10  2004    136      11
11  2005    136      12
12  2006    136      13
13  2007    136      14
14  2008    136      15
15  2009    136      16
16  2010    136      17
17  2011    136      18
18  2012    136      19
19  2013    136      20
20  2014    136      21
21  1994    172      1
22  1995    172      2
23  2001    172      2
24  2002    172      3
25  2003    172      4
26  2004    172      5
27  2005    172      6
28  2006    172      7 
29  2007    172      8
30  2008    172      9
31  2009    172      10
32  2010    172      11
33  2011    172      12
34  2012    172      13
35  2013    172      14
36  2014    172      15

Tags: iddfyearstartheadendterm差距

热门问题