我有以下节选的dataframe:
{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007, 7: 2013, 8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989,
14: 1991, 15: 1993, 16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003,
21: 2005, 22: 2007, 23: 2013, 24: 2014, 25: 1985, 26: 1987, 27: 1989, 28: 1991, 29: 1993}, 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136, 4:136,
5: 136, 6: 136, 7: 136, 8: 136, 9: 172, 10: 172, 11: 172, 12: 172, 13: 174, 14: 174, 15: 174, 16: 174, 17: 174, 18: 174, 19: 174, 20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179, 27: 179, 28: 179,
29: 179}, 'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007, 8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991, 16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007, 24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991}}
df_oddyears.head()
end start idthomas
0 1995 1993 136
1 1997 1995 136
2 1999 1997 136
3 2001 1999 136
4 2003 2001 136
5 2005 2003 136
6 2007 2005 136
7 2013 2007 136
8 2014 2013 136
9 1995 1993 172
10 2007 2001 172
11 2013 2007 172
12 2014 2013 172
13 1989 1987 174
14 1991 1989 174
它代表了美国立法者和国会的条款。有一些不方便的违规行为:start
和end
日期表示任期的开始和结束,并将有2年或6年的差异,这取决于立法者是在众议院还是参议院任职。所有的立法者都有一个独特的idthomas
,如果他们愿意的话,可以从众议院转到参议院。有时立法委员不能连任,这就造成了他们的服务差距。看看idthomas == 172
你可以看到end == 1995
和start == 2001
之间的差距。你知道吗
我需要计算从立法会议员服务期开始到立法会议员服务期结束的每一年的积极累积公共服务年数,甚至包括年数。在下一步中,我将合并这个df与另一个df沿年,因此我需要偶数年和奇数年的现役服务。你知道吗
这是我在深入了解问题之前所发现的:
df_oddyears['end']=df_oddyears['end'].map(lambda x: str(x)[:-6])
df_oddyears['start']=df_oddyears['start'].map(lambda x: str(x[:-6]))
df_oddyears['end'] = df_oddyears['end'].astype('int')
df_oddyears['start'] = df_oddyears['start'].astype('int')
df_oddyears['end'] = df_oddyears['end'].clip_upper(2014)
df_oddyears['term'] = df_oddyears.end - df_oddyears.start
df_oddyears['years_exp']=df_oddyears.groupby(['id.thomas']).term.cumsum()
df_oddyears.rename(columns={'id.thomas':'idthomas'},inplace=True)
df_oddyears.head()
end start idthomas term years_exp
0 1995 1993 136 2 2
1 1997 1995 136 2 4
2 1999 1997 136 2 6
3 2001 1999 136 2 8
4 2003 2001 136 2 10
5 2005 2003 136 2 12
6 2007 2005 136 2 14
7 2013 2007 136 6 20
8 2014 2013 136 1 21
9 1995 1993 172 2 2
10 2007 2001 172 6 8
11 2013 2007 172 6 14
12 2014 2013 172 1 15
{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007,
7: 2013, 8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989,
14: 1991, 15: 1993, 16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003,
21: 2005, 22: 2007, 23: 2013, 24: 2014, 25: 1985, 26: 1987, 27: 1989,
28: 1991, 29: 1993}, 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136,
4: 136, 5: 136, 6: 136, 7: 136, 8: 136, 9: 172, 10: 172, 11: 172, 12: 172, 13: 174, 14: 174, 15: 174, 16: 174, 17: 174, 18: 174, 19: 174,
20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179, 27: 179, 28: 179, 29: 179},'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007, 8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991, 16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007, 24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991},'term': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 6, 8: 1, 9: 2, 10: 6, 11: 6, 12: 1, 13: 2, 14: 2, 15: 2, 16: 2, 17: 2, 18: 2, 19: 2, 20: 2, 21: 2, 22: 2, 23: 6,
24: 1, 25: 2, 26: 2, 27: 2, 28: 2, 29: 2},'years_exp': {0: 2, 1: 4,
2: 6, 3: 8, 4: 10, 5: 12, 6: 14, 7: 20, 8: 21, 9: 2, 10: 8, 11: 14,
12: 15, 13: 2, 14: 4, 15: 6, 16: 8, 17: 10, 18: 12, 19: 14, 20: 16,
21: 18, 22: 20, 23: 26, 24: 27, 25: 2, 26: 4, 27: 6, 28: 8, 29: 10}}
然后我df=df_oddyears.drop(['start', 'term'], axis=1, inplace=False)
并从here实现以下代码
final_year = 2014
df= pd.DataFrame([(year, id_, n)
for id_, end, years_exp in df.groupby('idthomas').first().itertuples()
for n, year in enumerate(range(end, final_year + 1), years_exp)],
columns=['end', 'idthomas', 'years_exp'])
df.head()
end idthomas years_exp
673 1995 172 2
674 1996 172 3
675 1997 172 4
676 1998 172 5
677 1999 172 6
678 2000 172 7
679 2001 172 8
680 2002 172 9
681 2003 172 10
这与我想要的非常接近,因为它使我能够连接到end
上的另一个df,同时保持总的years_exp
。不幸的是,我在发布我最初的问题时没有意识到间歇性服务的问题;因此,years_exp
没有考虑到公共服务方面的差距。这是今天的(第一个)项目。如果任何人有问题、建议或批评,他们都是受欢迎的。你知道吗
我期望的最终结果如下:
end idthomas years_exp
0 1994 136 1
1 1995 136 2
2 1996 136 3
3 1997 136 4
4 1998 136 5
5 1999 136 6
6 2000 136 7
7 2001 136 8
8 2002 136 9
9 2003 136 10
10 2004 136 11
11 2005 136 12
12 2006 136 13
13 2007 136 14
14 2008 136 15
15 2009 136 16
16 2010 136 17
17 2011 136 18
18 2012 136 19
19 2013 136 20
20 2014 136 21
21 1994 172 1
22 1995 172 2
23 2001 172 2
24 2002 172 3
25 2003 172 4
26 2004 172 5
27 2005 172 6
28 2006 172 7
29 2007 172 8
30 2008 172 9
31 2009 172 10
32 2010 172 11
33 2011 172 12
34 2012 172 13
35 2013 172 14
36 2014 172 15
目前没有回答
相关问题 更多 >
编程相关推荐