TypeError: unsupported operand type(s) for -: 'str' and 'str' in Python 3.x (Anaconda)


I am trying to count some instances per hour in a large dataset. The code below seemed to work fine on Python 2.7, but I had to upgrade to the latest Python 3.x and update all packages in Anaconda. When I try to run the program, I get the <code>str</code> error shown below. Code: <pre><code>import pandas as pd
from datetime import datetime,time
import numpy as np

fn = r'00_input.csv'
cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv(fn, header=None, names=cols)

df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime

# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

# building reporting DF: `r`
freq = '1H'  # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)

# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1

r['LogCount'] = 0
r['UniqueIDCount'] = 0

for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.ix[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

#r.to_csv('results.csv', index=False)
#print(r[r.LogCount > 0])
#print (r['StartTime'], r['EndTime'], r['Day'], r['LogCount'], r['UniqueIDCount'])

rout = r[['Date', 'StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount'] ]
#print rout
rout.to_csv('o_1_hour.csv', index=False, header=False)
</code></pre> Where do I need to make changes so that it runs without errors? Error:
<pre><code>File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\ops.py", line 686, in <lambda>
    lambda x: op(x, rvalues))
TypeError: unsupported operand type(s) for -: 'str' and 'str'
</code></pre> Thanks in advance for your help.
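One plausible way to end up with this message can be reproduced in a few lines. The sketch below assumes the CSV has a header row (the actual contents of 00_input.csv are not shown in the question, so the sample text here is hypothetical). With header=None that header row is read as data, the timestamp columns stop being numeric, `+` silently concatenates strings, and `-` fails.

```python
import io

import pandas as pd

# Hypothetical one-row sample; the real file is not shown in the question.
csv_text = "UserId,UserMAC,HotspotID,StartTime,StopTime\nu1,aa,h1,100,200\n"
cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']

# header=None: the header line becomes the first data row, so every value is text.
bad = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)
# header=0: the header line is consumed and replaced by `cols`.
good = pd.read_csv(io.StringIO(csv_text), header=0, names=cols)

bad_is_numeric = pd.api.types.is_numeric_dtype(bad['StartTime'])
good_is_numeric = pd.api.types.is_numeric_dtype(good['StartTime'])

# Adding two text columns concatenates them, which is why the `m` line passes.
concatenated = (bad['StopTime'] + bad['StartTime']).iloc[1]

# Subtracting two text columns is not defined, which is where the script stops.
try:
    bad['StopTime'] - bad['StartTime']
    raised = False
except TypeError:
    raised = True

print(bad_is_numeric, good_is_numeric, concatenated, raised)
```

Checking `df.dtypes` right after `read_csv` is a quick way to see whether this is what happens with the real file.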


1 answer


I think you need to change the call to use <code>header=0</code>, so that the first row is selected as the header; the column names are then replaced by the list <code>cols</code>. With <code>header=None</code>, a header row in the file is presumably read as data, which would turn <code>StartTime</code> and <code>StopTime</code> into string columns. If the problem persists, you need <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html" rel="noreferrer"><code>to_numeric</code></a>, because some values in <code>StartTime</code> and <code>StopTime</code> are strings. With <code>errors='coerce'</code> they are parsed to <code>NaN</code>, which is replaced by <code>0</code>, and finally the columns are converted to <code>int</code>: <pre><code>import numpy as np
import pandas as pd

cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv('canada_mini_unixtime.csv', header=0, names=cols)
#print (df)

# non-numeric values become NaN, then 0, then the column is cast to int
df['StartTime'] = pd.to_numeric(df['StartTime'], errors='coerce').fillna(0).astype(int)
df['StopTime'] = pd.to_numeric(df['StopTime'], errors='coerce').fillna(0).astype(int)
</code></pre> This part is unchanged: <pre><code>df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime

start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

freq = '1H'  # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)

# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1

r['LogCount'] = 0
r['UniqueIDCount'] = 0
</code></pre> <code>ix</code> is deprecated in the latest version of pandas, so use <code>loc</code>, with the column name inside the <code>[]</code> (note also that the column is named <code>UserId</code>, not <code>UserID</code>): <pre><code>for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df.loc[np.abs(df.m - 2*row.start - interval) < df.d + interval, 'UserId']
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

print(r)
</code></pre>
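For readers on a more recent pandas, here is a hedged, self-contained sketch of the same hourly count. As far as I know, <code>.ix</code> and <code>Series.dt.weekday_name</code> were removed in pandas 1.0 and <code>pd.datetime</code> in 2.0, so the sketch swaps in <code>.loc</code>, <code>dt.day_name()</code> and <code>pd.Timestamp</code>. It also passes a <code>pd.Timedelta</code> as the frequency instead of the <code>'1H'</code> alias and uses <code>normalize()</code> instead of <code>.date()</code>; both are my substitutions, not part of the answer. The sample rows are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data (seconds since the epoch), standing in for the real CSV.
csv_text = """UserId,UserMAC,HotspotID,StartTime,StopTime
u1,aa,h1,0,1800
u1,aa,h1,100,200
u2,bb,h1,1800,5400
u1,cc,h2,7200,7300
"""
cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv(io.StringIO(csv_text), header=0, names=cols)
for c in ['StartTime', 'StopTime']:
    df[c] = pd.to_numeric(df[c], errors='coerce').fillna(0).astype(np.int64)

df['m'] = df['StopTime'] + df['StartTime']
df['d'] = df['StopTime'] - df['StartTime']

# Reporting frame: one row per hour, from the first day to the day after the last.
start = pd.to_datetime(df['StartTime'].min(), unit='s').normalize()
end = pd.to_datetime(df['StopTime'].max(), unit='s').normalize() + pd.Timedelta(days=1)
idx = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.Timestamp(1970, 1, 1)).total_seconds().astype(np.int64)

interval = 60 * 60 - 1  # one hour minus one second, so hours do not overlap
r['LogCount'] = 0
r['UniqueIDCount'] = 0

for i, row in r.iterrows():
    # Same overlap test as in the answer, written with explicit column access.
    mask = np.abs(df['m'] - 2 * row['start'] - interval) < df['d'] + interval
    u = df.loc[mask, 'UserId']
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

stamps = pd.to_datetime(r['start'], unit='s')
r['Date'] = stamps.dt.date
r['Day'] = stamps.dt.day_name().str[:3]  # replaces dt.weekday_name
r['StartTime'] = stamps.dt.time
r['EndTime'] = pd.to_datetime(r['start'] + interval + 1, unit='s').dt.time

print(r[r['LogCount'] > 0])
```

The row-by-row loop is kept to stay close to the answer; on a large dataset a vectorized approach would likely be faster, but that is outside what the answer covers.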
