Python 2.7: fastest way to apply a function to 2 columns of a pandas DataFrame

    texto textito  veo1
0     mma       m     1
1    sdas       f     0
2  asdsad       n     0
3     mma       m     1
4    sdas       f     0
5  asdsad       n     0
6     mma       m     1
7    sdas       f     0
8  asdsad       n     0
9     mma       m     1

2 answers

User

#1 · edited 2024-09-30 20:33:53

Use a comprehension and zip

t2['veo1'] = [int(a in b) for a, b in zip(t2.textito, t2.texto)]

Better answer, per @Ninja Puppy

[code block missing in the source]

Even better answer, per @Ninja Puppy

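from operator import contains
# contains(b, a) is the same test as `a in b`; list() keeps this working on Python 3, where map returns an iterator
t2['veo1'] = pd.Series(list(map(contains, t2.texto, t2.textito)), dtype=int)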

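As suggested by @Ninja Puppy: using set and checking for a subset works in this specific case, where the strings being searched for are single characters. However, it also returns True for 'www' in 'word', which may not be what you want.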

set('www') <= set('word')

True

And

set('not') <= set('stone')

True

whereas

'not' in 'stone'

False

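Timing

Special note

Thanks to @Ninja Puppy

Note that we can save some time if we collect the bool values from the comprehension into a pd.Series and let a vectorized operation handle the conversion to int.

We get even more efficiency if we import the contains operator and use Python's map.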
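The note above can be checked with a small timing harness. This is a minimal sketch, assuming a frame shaped like the question's data; the function names and the 1000x repetition are made up for illustration, and the middle variant is only a guess at what "assign the bool values to a pd.Series" refers to, not the original code. No timings are quoted here because they depend on the machine and on the Python and pandas versions.

```python
import timeit
from operator import contains

import pandas as pd

# Sample frame in the shape of the question's data, repeated to get a
# measurable workload (the factor 1000 is an arbitrary choice).
t2 = pd.DataFrame({
    "texto": ["mma", "sdas", "asdsad"] * 1000,
    "textito": ["m", "f", "n"] * 1000,
})


def with_int_in_comprehension():
    # Convert each bool to int inside the comprehension.
    return pd.Series([int(a in b) for a, b in zip(t2.textito, t2.texto)])


def with_vectorized_cast():
    # Assumed variant: collect bools, then cast the whole Series at once.
    return pd.Series([a in b for a, b in zip(t2.textito, t2.texto)]).astype(int)


def with_map_contains():
    # contains(b, a) is equivalent to `a in b`; list() is needed on Python 3.
    return pd.Series(list(map(contains, t2.texto, t2.textito))).astype(int)


variants = [with_int_in_comprehension, with_vectorized_cast, with_map_contains]

# All three variants should produce the same values.
baseline = variants[0]().tolist()
for fn in variants[1:]:
    assert fn().tolist() == baseline

# Time each variant over the same number of runs.
timings = {fn.__name__: timeit.timeit(fn, number=50) for fn in variants}
for name, seconds in sorted(timings.items()):
    print("{:<28} {:.4f} s for 50 runs".format(name, seconds))
```

Comparing via tolist() rather than Series.equals sidesteps platform differences in the default integer dtype.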

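User

#2 · edited 2024-09-30 20:33:53

If space is plentiful, you can create a new DataFrame by applying set to the original one. The membership test will then be much faster than using in on strings.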

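import pandas as pd

# setup: 3000 rows, strings in column 'a' and single characters in column 'b'
aa = ['mma', 'sdas', 'asdsad'] * 1000
t = pd.DataFrame(aa)
a = ['m', 'f', 'n'] * 1000
t1 = pd.DataFrame(a)
df = pd.concat([t, t1], axis=1)
df.columns = ['a', 'b']

# new DataFrame holding the set of characters of every cell
# (applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there)
df2 = df.applymap(set)
# new column based on the membership (subset) test
df['v'] = df2.b <= df2.a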

>>> df[:10]
        a  b      v
0     mma  m   True
1    sdas  f  False
2  asdsad  n  False
3     mma  m   True
4    sdas  f  False
5  asdsad  n  False
6     mma  m   True
7    sdas  f  False
8  asdsad  n  False
9     mma  m   True
>>>

Timing
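A rough way to measure this on your own machine is sketched below. It assumes the same 3000-row setup as the answer; the helper names subset_test and substring_test are hypothetical, and the sets are built with Series.map per column rather than applymap. It also checks that both tests agree on this data, which holds only because column b contains single characters.

```python
import timeit

import pandas as pd

# Same data as in the answer: 3000 rows with columns 'a' and 'b'.
df = pd.DataFrame({
    "a": ["mma", "sdas", "asdsad"] * 1000,
    "b": ["m", "f", "n"] * 1000,
})

# Build the per-cell character sets once, up front; this is the extra memory cost.
df2 = pd.DataFrame({col: df[col].map(set) for col in df.columns})


def subset_test():
    # Element-wise subset comparison on the precomputed sets.
    return df2.b <= df2.a


def substring_test():
    # Plain `in` on the original strings, for comparison.
    return pd.Series([b in a for a, b in zip(df.a, df.b)])


# On this data both tests give the same answer for every row.
assert subset_test().tolist() == substring_test().tolist()

timings = {
    "subset_test": timeit.timeit(subset_test, number=50),
    "substring_test": timeit.timeit(substring_test, number=50),
}
for name, seconds in sorted(timings.items()):
    print("{:<16} {:.4f} s for 50 runs".format(name, seconds))
```

The cost of building df2 is deliberately left out of the timed functions, so add it back in if the sets would only be used once.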
