This question is an extension of another question, but takes a different approach. I have the following two DataFrames:
(If someone can show me a more efficient way of creating the df below, instead of writing it out by hand, that would be great.)
import pandas as pd

yrs = pd.DataFrame({'years': [1950, 1951, 1952, 1953, 1954, 1955,
1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1967, 1968, 1969,
1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995,
1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012, 2013, 2014]}, index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65])
yrs
years
1 1950
2 1951
3 1952
4 1953
5 1954
........
58 2007
59 2008
60 2009
61 2010
62 2011
63 2012
64 2013
65 2014
dfyears.head(30).to_dict()
{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007, 7: 2013,
8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989, 14: 1991, 15: 1993,
16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003, 21: 2005, 22: 2007, 23: 2013,
24: 2014, 25: 1985, 26: 1987, 27: 1989, 28: 1991, 29: 1993},
'idthomas': {0: 136, 1: 136, 2: 136, 3: 136, 4: 136, 5: 136, 6: 136, 7: 136, 8: 136,
9: 172, 10: 172, 11: 172, 12: 172, 13: 174, 14: 174, 15: 174, 16: 174, 17: 174,
18: 174, 19: 174, 20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179,
27: 179, 28: 179, 29: 179},
'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007,
8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991,
16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007,
24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991}}
dfyears.head(30)
end start idthomas
0 1995 1993 136
1 1997 1995 136
2 1999 1997 136
3 2001 1999 136
4 2003 2001 136
5 2005 2003 136
6 2007 2005 136
7 2013 2007 136
8 2014 2013 136
9 1995 1993 172
10 2007 2001 172
11 2013 2007 172
12 2014 2013 172
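I would like to create a column `served` in yrs that returns 1 or 0 depending on whether the corresponding value in the `years` column falls within a term, that is, whether it is >= `start` and <= `end`. I would also like to create a column `idthomas` that returns the `idthomas` value from the row the condition was applied to. Here is an example of what I want: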
years served idthomas
1 1950 0 136
2 1951 0 136
3 1952 0 136
4 1953 0 136
5 1954 0 136
...................
43 1993 1 136
44 1994 1 136
45 1995 1 136
46 1996 1 136
47 1997 1 136
48 1998 1 136
49 1999 1 136
51 2000 1 136
52 2001 1 136
53 2002 1 136
54 2003 1 136
55 2004 1 136
56 2005 1 136
57 2006 1 136
58 2007 1 136
59 2008 1 136
60 2009 1 136
61 2010 1 136
62 2011 1 136
63 2012 1 136
64 2013 1 136
65 2014 1 136
66 1950 0 172
67 1951 0 172
68 1952 0 172
69 1953 0 172
70 1954 0 172
...................
72 1993 1 172
73 1994 1 172
74 1995 1 172
75 1996 0 172
76 1997 0 172
77 1998 0 172
78 1999 0 172
79 2000 0 172
80 2001 1 172
81 2002 1 172
82 2003 1 172
83 2004 1 172
84 2005 1 172
85 2006 1 172
86 2007 1 172
87 2008 1 172
88 2009 1 172
89 2010 1 172
90 2011 1 172
91 2012 1 172
92 2013 1 172
93 2014 1 172
I typed up "something" to code this. It is embarrassingly crude:
uu=dfyears.groupby('idthomas')
yrs['did_service'] == 1 if:
# somewhere in the next line I think that I need to do some sort of
# tuple so that I can grab the value in the 'idthomas' column that
# is associated with the comparison that I am doing.
x in years >= uu.start | x in years <= uu.end
else == 0
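If this isn't workable, I will do the work by hand. I only ask that if someone tries and can't get it to work, they let me know, so that I can get a sense of how viable the idea is.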
I can help you with the time series: you don't need to type the data in by hand. Here is what you can do.
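A minimal sketch of one way to generate the years instead of typing them, assuming `pd.date_range` with a year-start frequency and `.strftime("%Y")` to keep only the year. This is an illustration, not necessarily the exact code the answer had in mind. Note that it produces every year from 1950 to 2014 with a gap-free index starting at 1.

```python
import pandas as pd

# One timestamp per year start, 1950 through 2014 (assumed range, taken from the question)
dates = pd.date_range(start="1950-01-01", end="2014-01-01", freq="YS")

# strftime("%Y") keeps only the year as a string; convert it to an integer
years = dates.strftime("%Y").astype(int).to_numpy()

# Index runs 1..N, like the hand-written frame in the question
yrs = pd.DataFrame({"years": years}, index=range(1, len(years) + 1))

print(yrs.head())
print(yrs.tail())
```

If only integer years are needed, `pd.DataFrame({"years": range(1950, 2015)})` would likely be simpler still.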
Or, if you want the month, day, and full date, you can also drop `.strftime()`. To run the logic you describe, I think `np.where` might work (untested).
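A minimal sketch of how `np.where` might be applied for a single `idthomas`, assuming the four terms listed for id 172 in the question and a shortened, hypothetical `yrs` frame. A year counts as served when it lies inside at least one term (>= `start` and <= `end`).

```python
import numpy as np
import pandas as pd

# Shortened years frame for illustration (hypothetical range)
yrs = pd.DataFrame({"years": range(1990, 2015)})

# Terms for idthomas 172, copied from the question's dfyears
terms = pd.DataFrame({"start": [1993, 2001, 2007, 2013],
                      "end": [1995, 2007, 2013, 2014]})

# Accumulate a boolean mask: True where the year falls inside any term
in_any_term = np.zeros(len(yrs), dtype=bool)
for start, end in zip(terms["start"], terms["end"]):
    in_any_term |= ((yrs["years"] >= start) & (yrs["years"] <= end)).to_numpy()

# np.where turns the mask into the 1/0 flag
yrs["served"] = np.where(in_any_term, 1, 0)
yrs["idthomas"] = 172

print(yrs)
```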
However, at least going by your example, this does not solve the problem that you want to add new rows to yrs.
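One way the extra rows might be produced is a cross join: pair every year with every term, test the range condition, then collapse to one row per (`idthomas`, year). The sketch below is only an illustration under assumptions: it uses a small subset of the question's `dfyears`, a shortened hypothetical `yrs`, and `merge(how="cross")`, which requires pandas 1.2 or later.

```python
import pandas as pd

# Shortened years frame for illustration (hypothetical range)
yrs = pd.DataFrame({"years": range(1990, 2015)})

# Subset of the question's dfyears: two terms for id 136, four for id 172
dfyears = pd.DataFrame({
    "start":    [1993, 1995, 1993, 2001, 2007, 2013],
    "end":      [1995, 1997, 1995, 2007, 2013, 2014],
    "idthomas": [136, 136, 172, 172, 172, 172],
})

# Pair every year with every term (requires pandas >= 1.2)
pairs = yrs.merge(dfyears, how="cross")

# True where the year lies inside that particular term
pairs["served"] = (pairs["years"] >= pairs["start"]) & (pairs["years"] <= pairs["end"])

# A year is served if it falls inside any term of that id
result = pairs.groupby(["idthomas", "years"], as_index=False)["served"].max()
result["served"] = result["served"].astype(int)
result = result[["years", "served", "idthomas"]]

print(result)
```

The result has one row per year for each `idthomas`, which matches the shape of the desired output in the question. On large inputs the cross join grows as years times terms, so memory may become a concern.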
I know this is not a complete answer, but I hope it helps in some way.