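<p>I think you need to change this to <code>header=0</code> so that the first row is treated as the header, and then replace the column names with the list <code>cols</code>.</p>
<p>If the problem persists, you need <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html" rel="noreferrer"><code>to_numeric</code></a>, because some values in <code>StartTime</code> and <code>StopTime</code> are strings. With <code>errors='coerce'</code> they are parsed as <code>NaN</code>, which is then replaced with <code>0</code>, and finally the columns are converted to <code>int</code>:</p>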
<pre><code>import numpy as np
import pandas as pd

cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv('canada_mini_unixtime.csv', header=0, names=cols)
#print (df)
df['StartTime'] = pd.to_numeric(df['StartTime'], errors='coerce').fillna(0).astype(int)
df['StopTime'] = pd.to_numeric(df['StopTime'], errors='coerce').fillna(0).astype(int)
</code></pre>
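<p>As a minimal sketch of what the coercion step does, here is the same chain applied to a small hypothetical <code>Series</code> (the sample values are made up for illustration and are not from the original CSV):</p>

```python
import pandas as pd

# Hypothetical sample: unix timestamps read as strings, plus one stray header value
raw = pd.Series(['1357000000', 'StartTime', '1357003600'])

# errors='coerce' turns entries that cannot be parsed into NaN instead of raising
numeric = pd.to_numeric(raw, errors='coerce')

# NaN cannot be stored in an integer column, so fill first and cast afterwards
clean = numeric.fillna(0).astype(int)
print(clean.tolist())
```

<p>Note that filling with <code>0</code> places the bad rows at the epoch (1970-01-01), which stretches the date range built from <code>StartTime.min()</code>, so dropping those rows instead may be preferable depending on the data.</p>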
<p>This part is unchanged:</p>
<pre><code>df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)
freq = '60min' # 1 hour frequency (the 'H' alias is deprecated in recent pandas)
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
# pd.datetime was removed from pandas, so use pd.Timestamp for the epoch
r['start'] = (r.index - pd.Timestamp('1970-01-01')).total_seconds().astype(np.int64)
# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1
r['LogCount'] = 0
r['UniqueIDCount'] = 0
</code></pre>
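<p>To make the hourly grid easier to inspect, here is a small sketch that builds one day of bins and converts the bin starts to epoch seconds. The dates are hypothetical, and integer division by a one-second <code>Timedelta</code> is used here as an alternative to <code>total_seconds()</code>; it is not what the code above does:</p>

```python
import pandas as pd

# Hypothetical one-day range; '60min' is used because it is accepted
# by both older and newer pandas versions
idx = pd.date_range('2013-01-01', '2013-01-02', freq='60min')

# Subtracting the epoch gives a TimedeltaIndex; floor-dividing by one second
# yields integer epoch seconds for the start of each hourly bin
epoch = (idx - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

print(len(idx))                 # both endpoints are included
print(epoch[:3].tolist())
```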
<p><code>ix</code> is deprecated in recent versions of pandas (and was removed in pandas 1.0), so use <code>loc</code> instead, with the column name inside the <code>[]</code>:</p>
<pre><code>for i, row in r.iterrows():
# intervals overlap test
# https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
# i've slightly simplified the calculations of m and d
# by getting rid of division by 2,
# because it can be done eliminating common terms
u = df.loc[np.abs(df.m - 2*row.start - interval) < df.d + interval, 'UserId']
r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]
r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
# weekday_name was removed in pandas 1.0; day_name() is the replacement
r['Day'] = pd.to_datetime(r.start, unit='s').dt.day_name().str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time
print (r)
</code></pre>
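<p>The overlap test inside the loop can be checked in isolation. The sketch below, using hypothetical helper names and made-up session times, compares the centered form from the loop (with <code>m = stop + start</code> and <code>d = stop - start</code>) against a plain endpoint comparison; under these assumptions the two should agree for every session:</p>

```python
def overlaps_centered(start, stop, bin_start, interval):
    """Overlap test in the form used in the loop (doubled midpoint and width)."""
    m = stop + start          # twice the session midpoint
    d = stop - start          # twice the session half-width
    return abs(m - 2 * bin_start - interval) < d + interval


def overlaps_direct(start, stop, bin_start, interval):
    """The same condition written as plain endpoint comparisons."""
    return stop > bin_start and start < bin_start + interval


interval = 60 * 60 - 1        # same bin length as in the answer
bin_start = 7200              # hypothetical bin start, in epoch seconds

# Hypothetical (start, stop) sessions placed around the bin edges
sessions = [(0, 7200), (0, 7201), (7300, 7400), (10798, 10900), (10799, 10900)]

results = [overlaps_centered(s, e, bin_start, interval) for s, e in sessions]
print(results)
```

<p>Because the comparison is strict, a session that only touches a bin edge (it stops exactly at <code>bin_start</code>, or starts exactly at <code>bin_start + interval</code>) is not counted in that bin.</p>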