Just loop over the values of the dataframe column. If the current element is in the list, remove it from the list. When the list becomes empty, the current position is the answer (returned 1-based, as i + 1).
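def idxall(series, elements):
    remaining = list(elements)  # work on a copy so the caller's list is not emptied
    for i, e in enumerate(series.to_numpy()):  # faster than series.items()
        if e in remaining:
            remaining.remove(e)
        if not remaining:
            return i + 1  # 1-based position at which the last element was found
    return np.nan  # not every element occurs in the series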
import numpy as np
import pandas as pd
elements = ['a', 'b', 'c', 'd']
df = pd.DataFrame({
    'mycol': ['a', 'x', 'y', 'e', 'b', 'c', 'o', 'l', 's', 'd', 'g']
})
# Find the first location where each unique value is found
a, b = np.unique(df['mycol'], return_index=True)
# Compare unique values to values we're looking for
m = (a == np.array(elements)[:, None])
# If we have a location for all elements
if m.any(axis=1).all():
    # Find the highest index value
    max_index = b[m.any(axis=0)].max()
    # Offset index by one to match expected output
    print('All values found by', max_index + 1)
else:
    # We couldn't find all elements
    print('Not all elements found.')
x = df['mycol'].drop_duplicates().isin(elements).cumsum().eq(len(elements))
if x.any():
    print(x.idxmax())
else:
    print("Not all values are in the dataframe")
This is worth considering. I couldn't get the fancier indexing answers to work with larger test data, but Barmar's loop should be reliable:
Timings
Given
import string
df = pd.DataFrame({'mycol': np.random.choice(list(string.ascii_lowercase), size=1000)})
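The timing figures themselves did not survive on this page, so here is only a rough sketch of how the loop could be timed on a dataframe like the one above. The fixed seed, the number of repetitions and the use of timeit are my own assumptions, not part of the original answer, and no particular result is implied.

```python
import string
import timeit

import numpy as np
import pandas as pd


def idxall(series, elements):
    remaining = list(elements)  # copy, so repeated timing runs start from the full list
    for i, e in enumerate(series.to_numpy()):
        if e in remaining:
            remaining.remove(e)
        if not remaining:
            return i + 1
    return np.nan


# Assumption: a fixed seed is used here only to make the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({'mycol': rng.choice(list(string.ascii_lowercase), size=1000)})
elements = ['a', 'b', 'c', 'd']

result = idxall(df['mycol'], elements)

# Assumption: 1000 repetitions is an arbitrary choice for illustration.
runs = 1000
elapsed = timeit.timeit(lambda: idxall(df['mycol'], elements), number=runs)
print(f"result={result}, about {elapsed / runs * 1e6:.1f} microseconds per call")
```

Because the loop returns as soon as the last element is found, the time per call depends on how early the elements happen to appear in the random data.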
Verification
Using OP's sample:
Using Henry's sample #1 (mixed order and duplicates):
Using Henry's sample #2 (not all elements found):
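The outputs for these three cases are missing from the page, so the following is a small sketch of what such a check might look like. OP's sample is the column from the question; the other two series are hypothetical stand-ins that I made up to match the descriptions (mixed order with duplicates, and a missing element), since Henry's actual samples are not shown here.

```python
import numpy as np
import pandas as pd


def idxall(series, elements):
    remaining = list(elements)  # copy, so the same list can be reused for every sample
    for i, e in enumerate(series.to_numpy()):
        if e in remaining:
            remaining.remove(e)
        if not remaining:
            return i + 1
    return np.nan


wanted = ['a', 'b', 'c', 'd']

# OP's sample, as given in the question.
op_sample = pd.Series(['a', 'x', 'y', 'e', 'b', 'c', 'o', 'l', 's', 'd', 'g'])
# Hypothetical sample: mixed order and duplicates.
mixed_sample = pd.Series(['d', 'a', 'a', 'x', 'c', 'b', 'c', 'g'])
# Hypothetical sample: 'd' never appears.
missing_sample = pd.Series(['a', 'x', 'b', 'c'])

print(idxall(op_sample, wanted))       # 10
print(idxall(mixed_sample, wanted))    # 6
print(idxall(missing_sample, wanted))  # nan
```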
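We can use np.unique with return_index=True to find the first instance of each unique value:
Example with mixed order and duplicates:
Example where not all elements are found: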
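The example data for these two cases is not preserved, so here is a minimal sketch that wraps the np.unique approach in a function and runs it on made-up inputs. The function name first_position_all_found and both sample series are hypothetical; the conversion to a string array is an assumption I added so the comparison works on plain string data.

```python
import numpy as np
import pandas as pd


def first_position_all_found(series, elements):
    """Hypothetical helper: 1-based position by which every element has appeared, else None."""
    values = series.to_numpy(dtype=str)  # assumption: the column holds strings
    # uniques is sorted; first_idx[k] is the first position of uniques[k] in values
    uniques, first_idx = np.unique(values, return_index=True)
    wanted = np.array(elements, dtype=str)
    # Row r, column k is True where wanted[r] equals uniques[k]
    m = uniques == wanted[:, None]
    if not m.any(axis=1).all():
        return None  # at least one wanted element never occurs
    # Latest of the first occurrences, offset by one as in the answer
    return int(first_idx[m.any(axis=0)].max()) + 1


elements = ['a', 'b', 'c', 'd']

# Hypothetical sample: mixed order and duplicates.
mixed = pd.Series(['d', 'a', 'a', 'x', 'c', 'b', 'c', 'g'])
print(first_position_all_found(mixed, elements))    # 6

# Hypothetical sample: 'd' never appears.
missing = pd.Series(['a', 'x', 'b', 'c'])
print(first_position_all_found(missing, elements))  # None
```

In the mixed sample the first occurrences are d at 0, a at 1, c at 4 and b at 5, so the last element to show up is b and the reported position is 5 + 1.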
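Try idxmax:
Edit:
To make sure that all values of elements are in the dataframe, try:
For the current dataframe:
For a dataframe where not all values are present: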
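The outputs for both cases are missing, so the sketch below reruns the idxmax chain on the question's column and on a made-up column that lacks one value. The helper name label_all_found and the second series are hypothetical. Note that this variant returns the index label itself (no + 1 offset), and, like the original chain, it assumes elements contains no duplicates because it compares against len(elements).

```python
import pandas as pd


def label_all_found(series, elements):
    """Hypothetical helper: index label at which the last wanted value first appears, else None."""
    # Keep only the first occurrence of each value, then flag the wanted ones
    hits = series.drop_duplicates().isin(elements)
    # Running count of wanted values seen so far; done once it reaches len(elements)
    done = hits.cumsum().eq(len(elements))
    if not done.any():
        return None
    return done.idxmax()  # label of the first True


elements = ['a', 'b', 'c', 'd']

# The current dataframe from the question.
df = pd.DataFrame({'mycol': ['a', 'x', 'y', 'e', 'b', 'c', 'o', 'l', 's', 'd', 'g']})
print(label_all_found(df['mycol'], elements))   # 9

# Hypothetical dataframe in which 'd' is missing.
df2 = pd.DataFrame({'mycol': ['a', 'x', 'b', 'c']})
print(label_all_found(df2['mycol'], elements))  # None
```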