<p><a href="https://stackoverflow.com/questions/53500872/how-to-clean-up-datetime-strings-in-dataframe-after-export-from-excel-sheet/53502032#comment93925937_53501894">Andrew observed</a> that the DataFrame can be fixed by flipping the month and day of <em>every</em> date for which doing so produces a valid date.</p>
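<p>Here is a quick way to "flip" all of the dates. Invalid dates are coerced to NaT (Not-a-Time) values and then dropped. The remaining flipped dates can then be assigned back to <code>df</code>:</p>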
<pre><code>import pandas as pd
df = pd.read_excel('2016_Bike_Share_Toronto_Ridership_Q4.xlsx')
for col in ['trip_start_time', 'trip_stop_time']:
    df[col] = pd.to_datetime(df[col])
    # rebuild each timestamp with month and day exchanged;
    # impossible dates (e.g. month 13) become NaT
    swapped = pd.to_datetime({'year':df[col].dt.year,
                              'month':df[col].dt.day,
                              'day':df[col].dt.month,
                              'hour':df[col].dt.hour,
                              'minute':df[col].dt.minute,
                              'second':df[col].dt.second,}, errors='coerce')
    swapped = swapped.dropna()
    # overwrite only the rows whose flipped date is valid
    mask = swapped.index
    df.loc[mask, col] = swapped
# check that now all dates are in 2016Q4
for col in ['trip_start_time', 'trip_stop_time']:
    mask = (pd.PeriodIndex(df[col], freq='Q') == '2016Q4')
    assert mask.all()
# check that `trip_start_times` are in chronological order
assert (df['trip_start_time'].diff().dropna() >= pd.Timedelta(0)).all()
# check that `trip_stop_times` are always greater than `trip_start_times`
assert ((df['trip_stop_time']-df['trip_start_time']).dropna() >= pd.Timedelta(0)).all()
</code></pre>
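<p>The assert statements above verify that the resulting dates all fall in 2016Q4, that the <code>trip_start_times</code> are in chronological order, and that the <code>trip_stop_times</code> are never earlier than their associated <code>trip_start_times</code>.</p>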
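<p>To see the flip in isolation, here is a minimal sketch on a made-up three-row Series. The values and the names <code>s</code> and <code>fixed</code> are illustrative assumptions, not taken from the ridership file. It uses the same <code>pd.to_datetime</code> call with <code>errors='coerce'</code> as the code above:</p>
<pre><code>import pandas as pd

# Hypothetical toy data: day-first dates, two of which were misparsed as month-first
s = pd.Series(pd.to_datetime(['2016-01-10 08:00:00',    # really 1 Oct
                              '2016-10-13 09:30:00',    # already correct
                              '2016-02-11 10:15:00']))  # really 2 Nov

# Exchange month and day; a row whose swapped month is invalid becomes NaT
swapped = pd.to_datetime({'year': s.dt.year,
                          'month': s.dt.day,
                          'day': s.dt.month,
                          'hour': s.dt.hour,
                          'minute': s.dt.minute,
                          'second': s.dt.second}, errors='coerce')
swapped = swapped.dropna()

# Keep the original value wherever the flip was not a valid date
fixed = s.copy()
fixed.loc[swapped.index] = swapped
print(fixed)
</code></pre>
<p>On this toy input the first and third rows become 1 October and 2 November, while the second row is left alone because a month of 13 is not valid. Note that the flip is applied to every row where it yields a valid date, so it is only appropriate when, as Andrew observed for this dataset, all such rows really were misparsed.</p>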