Find how far down a DataFrame all elements of a list have been found

3 Answers

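User

Answer 1 · edited 2024-09-26 22:51:19

This is worth considering. I couldn't get the fancier indexing answers to work with larger test data, but Barmar's loop should be reliable: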

Just loop over the dataframe indexes. If the current df element is in the list, remove it from the list. When the list becomes empty, the current index is the answer.

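import numpy as np

def idxall(series, elements):
    remaining = list(elements)  # work on a copy so the caller's list is not emptied
    for i, e in enumerate(series.to_numpy()):  # faster than series.items()
        if e in remaining:
            remaining.remove(e)
            if not remaining:
                return i + 1  # 1-based count of rows scanned so far
    return np.nan  # not every element occurs in the series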

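Timings

Given df = pd.DataFrame({'mycol': np.random.choice(list(string.ascii_lowercase), size=1000)}), with each function prefixed by its author's name (tdy_idxall is presumably the idxall loop above):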

%timeit tdy_idxall(df.mycol, list(string.ascii_lowercase))
# 21.4 µs ± 7.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit henry_ecker_np_unique(df.mycol, list(string.ascii_lowercase))
# 379 µs ± 48.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit u12_forward_idxmax(df.mycol, list(string.ascii_lowercase))
# 538 µs ± 61.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit corralien_idxall(df.mycol, list(string.ascii_lowercase))
# 1.28 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
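
To reproduce a measurement like this outside IPython, a minimal sketch using the standard timeit module could look like the following. Only the loop-based function is timed, because the wrapper functions for the other answers are not shown in the thread. The fixed seed and the number of calls are arbitrary assumptions, so the figures will not match those above.

```python
import string
import timeit

import numpy as np
import pandas as pd


def idxall(series, elements):
    """Return the 1-based row count at which every element has been seen, else NaN."""
    remaining = list(elements)  # copy so repeated calls see the full list
    for i, e in enumerate(series.to_numpy()):
        if e in remaining:
            remaining.remove(e)
            if not remaining:
                return i + 1
    return np.nan


rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability
letters = list(string.ascii_lowercase)
df = pd.DataFrame({'mycol': rng.choice(letters, size=1000)})

# Time n calls and report the mean per call in microseconds
n = 1000
total = timeit.timeit(lambda: idxall(df.mycol, letters), number=n)
print(f'{total / n * 1e6:.1f} us per call')
```

Absolute numbers depend on the machine and on how early the last missing letter happens to appear in the random column, so comparisons are only meaningful when every candidate is run on the same df.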

Verification

Using the OP's sample:

df = pd.DataFrame({'mycol': list('axyebcolsdg')})
elements = list('abcd')

idxall(df.mycol, elements)
# 10

Using Henry's sample #1 (mixed order and duplicates):

df = pd.DataFrame({'mycol': list('dxcabcodsdg')})
elements = list('abcd')

idxall(df.mycol, elements)
# 5

Using Henry's sample #2 (not all elements found):

df = pd.DataFrame({'mycol': list('dxcabcodsdg')})
elements = list('abcz')

idxall(df.mycol, elements)
# nan

User

Answer 2 · edited 2024-09-26 22:51:19

We can use np.unique with return_index=True to find the first instance of each unique value:

import numpy as np
import pandas as pd

elements = ['a', 'b', 'c', 'd']
df = pd.DataFrame({
    'mycol': ['a', 'x', 'y', 'e', 'b', 'c', 'o', 'l', 's', 'd', 'g']
})

# Find the first location where each unique value is found
a, b = np.unique(df['mycol'], return_index=True)
# Compare unique values to values we're looking for
m = (a == np.array(elements)[:, None])
# If we have a location for all elements
if m.any(axis=1).all():
    # Find the highest index value
    max_index = b[m.any(axis=0)].max()
    # Offset index by one to match expected output
    print('All values found by', max_index + 1)
else:
    # We couldn't find all elements
    print('Not all elements found.')

All values found by 10

Example with mixed order and duplicates:

elements = ['a', 'b', 'c', 'd']
df = pd.DataFrame({
    'mycol': ['d', 'x', 'c', 'a', 'b', 'c', 'o', 'd', 's', 'd', 'g']
})

   mycol
0      d
1      x
2      c
3      a
4      b
5      c
6      o
7      d
8      s
9      d
10     g
All values found by 5

Example where not all elements are found:

elements = ['a', 'b', 'c', 'z']
df = pd.DataFrame({
    'mycol': ['d', 'x', 'c', 'a', 'b', 'c', 'o', 'd', 's', 'd', 'g']
})

   mycol
0      d
1      x
2      c
3      a
4      b
5      c
6      o
7      d
8      s
9      d
10     g
Not all elements found.  # (No z)
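
For reuse, the same np.unique logic can be wrapped in a function that returns a value instead of printing. This is only a sketch: the name unique_idxall is hypothetical (the thread does not show the function used in the timings), and returning NaN for the "not found" case is an assumption chosen to match Answer 1.

```python
import numpy as np
import pandas as pd


def unique_idxall(series, elements):
    """1-based row count by which all `elements` have appeared, else NaN.

    Hypothetical wrapper name; the logic mirrors the np.unique script above.
    """
    # Sorted unique values and the position of each value's first occurrence
    values, first_pos = np.unique(series.to_numpy(), return_index=True)
    # Rows: wanted elements; columns: unique values present in the series
    match = values == np.array(elements)[:, None]
    if not match.any(axis=1).all():
        return np.nan  # at least one wanted element never occurs
    # Latest "first occurrence" among the wanted values, shifted to 1-based
    return int(first_pos[match.any(axis=0)].max()) + 1


print(unique_idxall(pd.Series(list('axyebcolsdg')), list('abcd')))  # 10
print(unique_idxall(pd.Series(list('dxcabcodsdg')), list('abcd')))  # 5
print(unique_idxall(pd.Series(list('dxcabcodsdg')), list('abcz')))  # nan
```

Because np.unique sorts the whole column, this version always scans every row, whereas the loop in Answer 1 can stop as soon as the last element is found.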

User

Answer 3 · edited 2024-09-26 22:51:19

Try idxmax:

>>> df['mycol'].isin(elements)[::-1].idxmax()
9
>>>
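
One point to keep in mind with this one-liner: it returns the label of the last row whose value is in elements, which coincides with the wanted answer only when the final match is also the moment the list is completed. A small check on Henry's sample #1 from Answer 2 illustrates the difference (the variable names here are illustrative):

```python
import pandas as pd

elements = list('abcd')
s = pd.Series(list('dxcabcodsdg'))  # Henry's sample #1 from the thread

# Reverse the boolean mask, then take the label of the first True,
# i.e. the last row in the original order whose value is in `elements`
last_match = s.isin(elements)[::-1].idxmax()
print(last_match)  # 9
```

Here all four letters have already appeared by label 4, but the trailing 'd' at label 9 is what gets reported. It also returns a label when some elements are missing altogether.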

Edit:

To require that every value in elements is present in the DataFrame, try:

x = df['mycol'].drop_duplicates().isin(elements).cumsum().eq(len(elements))
if x.any():
    print(x.idxmax())
else:
    print("Not all values are in the dataframe")

For the current DataFrame:

For a DataFrame in which not all of the values are present:

Not all values are in the dataframe
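
Note that the snippet above prints a 0-based index label, while the other answers report a 1-based count. A function form that aligns with them might look like the sketch below. The name cumsum_idxall is hypothetical, and two details are assumptions rather than part of the answer: a duplicated() mask is used in place of drop_duplicates(), and the index is reset so that labels equal positions.

```python
import numpy as np
import pandas as pd


def cumsum_idxall(series, elements):
    """Hypothetical wrapper: 1-based row count where all `elements` have appeared."""
    wanted = list(set(elements))           # ignore repeats in the search list
    s = series.reset_index(drop=True)      # labels now equal 0-based positions
    first_seen = ~s.duplicated()           # True only at first occurrences
    # Running count of distinct wanted values seen so far
    found = (s.isin(wanted) & first_seen).cumsum()
    done = found.eq(len(wanted))
    return int(done.idxmax()) + 1 if done.any() else np.nan


print(cumsum_idxall(pd.Series(list('axyebcolsdg')), list('abcd')))  # 10
print(cumsum_idxall(pd.Series(list('dxcabcodsdg')), list('abcd')))  # 5
print(cumsum_idxall(pd.Series(list('dxcabcodsdg')), list('abcz')))  # nan
```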
