Python int comparison not working properly in pandas

for identifier in ls:
    citing = data.set_index('citing')  # save data indexed by 'citing' column to local variable
    try:  # handle KeyError exception
        creation = citing.loc[identifier]['creation']  # this can either be a str or a pandas series
        if type(creation) == pandas.core.series.Series:
            if int(creation.iloc[0][:4]) == (int(year))-1 or int(creation.iloc[0][:4]) == (int(year))-2:
                print('DEBUG: ', creation.iloc[0][:4], 'is == to either {} or {}'.format(str(int(year)-1), str(int(year)-2)))
                pub.add(identifier)
        elif type(creation) == str:
            # NOTE: `or (int(year))-2` is a bare integer, not a comparison, so this test is truthy for any year other than 2
            if int(creation[:4]) == (int(year))-1 or (int(year))-2:
                print('DEBUG: ', creation[:4], 'is == to either {} or {}'.format(str(int(year)-1), str(int(year)-2)))
                pub.add(identifier)
    except KeyError:
        pass

1 answer

User

Reply #1 · Posted on 2024-05-09 02:19:07

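You can find all the rows that match your criteria in (almost) one shot. This is also more efficient, because the criterion is evaluated against all rows in a single operation instead of looping over each value.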

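# df is the DataFrame called `data` in the question
year = int(year)  # the question casts year with int(), so it may arrive as a str
ix = df[
    df.creation.astype(str).str[:4].astype(int).isin({year-1, year-2})
].index  # index labels of the rows whose creation year is year-1 or year-2
identifiers = set(df.loc[ix, 'citing'])
pub |= identifiers  # add every matching identifier to pub at once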
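To see the one-shot filter work end to end, here is a minimal sketch on made-up data. The sample rows, the value of `year`, and the empty `pub` set are assumptions for illustration; only the column names `citing` and `creation` come from the question. It also passes the boolean mask straight to `.loc` instead of going through `.index`, which is a small variation on the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "citing": ["a", "b", "c", "d"],
    "creation": ["2019-05-01", "2018-11", "2015", "2019-01-20"],
})
year = 2020   # assumed target year
pub = set()   # assumed to start empty

# True for rows whose creation year is year-1 or year-2.
mask = df["creation"].astype(str).str[:4].astype(int).isin({year - 1, year - 2})

# Collect the matching identifiers and merge them into pub.
pub |= set(df.loc[mask, "citing"])
print(sorted(pub))  # ['a', 'b', 'd']
```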

More explanation:

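.astype(str) -> makes sure every value is of type str, even if the years are stored as numbers (just in case)

.str -> pandas' string accessor, which lets you use string methods on every value (more info here)

[:4] -> string slicing, which keeps the first 4 characters of each value

.astype(int) -> casts the whole result to int (note that this can fail if some rows have missing values; see the workaround below)

.isin(...) -> checks, row by row, whether the value is one of the values in (...)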
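The chain described above can be broken into its intermediate steps. This is only an illustrative sketch: the three sample values and `year = 2020` are assumptions, chosen so that one value is an integer rather than a string.

```python
import pandas as pd

# Hypothetical values: two date strings and one year stored as an integer.
creation = pd.Series(["2019-05-01", "2018-11", 2015])
year = 2020  # assumed target year

as_text = creation.astype(str)              # every value becomes a str, e.g. 2015 -> "2015"
first_four = as_text.str[:4]                # first four characters of each string
years = first_four.astype(int)              # "2019" -> 2019
matches = years.isin({year - 1, year - 2})  # row-wise membership test

print(years.tolist())    # [2019, 2018, 2015]
print(matches.tolist())  # [True, True, False]
```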

You end up with an "index" that can be used to filter the dataframe in a single operation.

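If there are missing values, you can start with df['creation'] = df['creation'].fillna("1000") (assigning the result back is more reliable than calling fillna(..., inplace=True) on a selected column, which may not modify df in recent pandas versions).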
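As a rough sketch of the missing-value case, the example below compares the placeholder approach with one alternative. The sample rows and `year` are assumptions. The second option, `pd.to_numeric(..., errors="coerce")`, is not part of the answer; it is swapped in here as another way to keep missing rows from raising an error.

```python
import pandas as pd

# Hypothetical data with one missing creation date.
df = pd.DataFrame({
    "citing": ["a", "b", "c"],
    "creation": ["2019-05-01", None, "2018-11"],
})
year = 2020  # assumed target year

# Option 1: placeholder year "1000", as suggested above (kept in a separate Series here).
filled = df["creation"].fillna("1000")
mask_fill = filled.astype(str).str[:4].astype(int).isin({year - 1, year - 2})

# Option 2 (swapped-in alternative, not from the answer): values that cannot be
# parsed become NaN, and NaN never matches in isin().
years = pd.to_numeric(df["creation"].str[:4], errors="coerce")
mask_coerce = years.isin({year - 1, year - 2})

print(df.loc[mask_fill, "citing"].tolist())
print(df.loc[mask_coerce, "citing"].tolist())
```

Both masks select the same rows here. The placeholder approach keeps the column as strings, while the coercion approach avoids inventing a fake year.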
