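<p>I would like to "expand" date ranges based on the values in the <code>starttime</code> and <code>endtime</code> columns.</p>
<p>If any part of a record overlaps a previous record, I want to return a starttime that is the minimum of the two records' starttime values and an endtime that is the maximum of their endtime values.</p>
<p>These will be grouped by order id.</p>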
<pre><code>Order  starttime                endtime                  RollingStart             RollingEnd
1      2015-07-01 10:24:43.047  2015-07-01 10:24:43.150  2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1      2015-07-01 10:24:43.137  2015-07-01 10:24:43.200  2015-07-01 10:24:43.047  2015-07-01 10:24:43.200
1      2015-07-01 10:24:43.197  2015-07-01 10:24:57.257  2015-07-01 10:24:43.047  2015-07-01 10:24:57.257
1      2015-07-01 10:24:57.465  2015-07-01 10:25:13.470  2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1      2015-07-01 10:24:57.730  2015-07-01 10:25:13.485  2015-07-01 10:24:57.465  2015-07-01 10:25:13.485
2      2015-07-01 10:48:57.465  2015-07-01 10:48:13.485  2015-07-01 10:48:57.465  2015-07-01 10:48:13.485
</code></pre>
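<p>So in the example above, order 1 has an initial range of 2015-07-01 10:24:43.047 to 2015-07-01 10:24:57.257, followed by a second range of 2015-07-01 10:24:57.465 to 2015-07-01 10:25:13.485.</p>
<p>Note that although the starttimes are in order, the end times are not necessarily in order, because of the nature of the data (there are short-lived and long-lived events).</p>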
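<p>One way the rolling columns described above could be computed is sketched below. This is only a sketch under assumptions: the sample data is re-typed as comma-separated text for convenience, the helper names <code>running_end</code>, <code>prev_end</code>, <code>new_range</code> and <code>range_id</code> are made up for illustration, and a row that starts exactly when the earlier rows end is treated as overlapping. A running maximum of <code>endtime</code> is used because the end times are not guaranteed to be in order.</p>

```python
from io import StringIO

import pandas as pd

# Sample data from the question, re-typed as CSV (an assumption for convenience).
csv_text = """Order,starttime,endtime
1,2015-07-01 10:24:43.047,2015-07-01 10:24:43.150
1,2015-07-01 10:24:43.137,2015-07-01 10:24:43.200
1,2015-07-01 10:24:43.197,2015-07-01 10:24:57.257
1,2015-07-01 10:24:57.465,2015-07-01 10:25:13.470
1,2015-07-01 10:24:57.730,2015-07-01 10:25:13.485
2,2015-07-01 10:48:57.465,2015-07-01 10:48:13.485
"""
df = pd.read_csv(StringIO(csv_text), parse_dates=["starttime", "endtime"])
df = df.sort_values(["Order", "starttime"]).reset_index(drop=True)

# Latest endtime seen so far within each order, shifted down one row so that
# every row is compared only with the rows that come before it.
running_end = df.groupby("Order")["endtime"].cummax()
prev_end = running_end.groupby(df["Order"]).shift()

# A new range begins on the first row of an order (no predecessor, so NaT)
# or where a row starts after everything before it has already ended.
new_range = prev_end.isna() | (df["starttime"] > prev_end)
range_id = new_range.cumsum()

# Within each range: the earliest start, and the running latest end.
df["RollingStart"] = df.groupby(range_id)["starttime"].transform("min")
df["RollingEnd"] = df.groupby(range_id)["endtime"].cummax()

print(df)
```

<p>Because every order's first row opens a new range, the cumulative sum never reuses a range label across orders, so no explicit loop over orders is needed.</p>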
<p>Finally, I only need the last record for each orderid and rolling start combination (so in this example, the last two records).</p>
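<p>Once the rolling columns exist, keeping the last record per order id and rolling start could look like the sketch below. It assumes a frame shaped like the expected output in the table (the name <code>rolled</code> and the hard-coded values are for illustration only) and that the rows are already sorted by starttime within each order.</p>

```python
import pandas as pd

# Rolling columns copied from the expected output in the question.
rolled = pd.DataFrame({
    "Order": [1, 1, 1, 1, 1, 2],
    "RollingStart": pd.to_datetime([
        "2015-07-01 10:24:43.047", "2015-07-01 10:24:43.047",
        "2015-07-01 10:24:43.047", "2015-07-01 10:24:57.465",
        "2015-07-01 10:24:57.465", "2015-07-01 10:48:57.465",
    ]),
    "RollingEnd": pd.to_datetime([
        "2015-07-01 10:24:43.150", "2015-07-01 10:24:43.200",
        "2015-07-01 10:24:57.257", "2015-07-01 10:25:13.470",
        "2015-07-01 10:25:13.485", "2015-07-01 10:48:13.485",
    ]),
})

# Keep only the final row of every (Order, RollingStart) combination.
final = rolled.drop_duplicates(subset=["Order", "RollingStart"], keep="last")
print(final)
```

<p>Since <code>RollingEnd</code> only grows within a range, the last row of each combination carries the full extent of that range.</p>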
<p>I tried:</p>
^{pr2}$
<p>(this obviously does not take the order id into account)</p>
<p>but the error I get is</p>
<pre><code>ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
</code></pre>
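<p>For context, this message typically appears when Python's built-in <code>min()</code> is handed two Series: it compares them with <code>&lt;</code>, gets a boolean Series back, and then tries to reduce that to a single <code>True</code>/<code>False</code>. A small sketch with two made-up Series (<code>a</code> and <code>b</code> are illustrative names, not from the code in question) shows the failure and one element-wise alternative using <code>Series.where</code>:</p>

```python
import pandas as pd

a = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.047", "2015-07-01 10:24:57.730"]))
b = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.137", "2015-07-01 10:24:57.465"]))

# Built-in min() needs one boolean answer, but comparing two Series yields
# one boolean per row, so pandas refuses to pick a truth value.
try:
    min(a, b)
    raised = False
except ValueError:
    raised = True

# Element-wise minimum: keep a where it is the smaller value, otherwise take b.
elementwise = a.where(a <= b, b)
print(raised)
print(elementwise)
```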
<p>Any ideas would be much appreciated.</p>
<p>Code to reproduce the problem is below:</p>
<pre><code>from io import StringIO

import numpy as np
import pandas as pd

text = """Order  starttime  endtime
1  2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1  2015-07-01 10:24:43.137  2015-07-01 10:24:43.200
1  2015-07-01 10:24:43.197  2015-07-01 10:24:57.257
1  2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1  2015-07-01 10:24:57.730  2015-07-01 10:25:13.485
2  2015-07-01 10:48:57.465  2015-07-01 10:48:13.485"""

# Columns are separated by two or more spaces, so the single space inside each timestamp is preserved.
df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', parse_dates=[1, 2])
df['RollingStart'] = df['starttime']
df['RollingEnd'] = df['endtime']
# The statement below raises the ValueError: the built-in min() is called on two Series.
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift() >= df['starttime']),
                              min(df['starttime'], df['RollingStart']), df['starttime'])
</code></pre>
<p>The error is:</p>
<pre><code>Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
</code></pre>
<p>Thanks</p>