Randomly select one element from each row

Posted 2024-09-28 17:29:18

Suppose I have a DataFrame with 3 columns, like this:

     col1     col2     col3
0  banana1  banana2  banana2
1   apple1   apple2   apple3
2  monkey1  monkey2  monkey3
3  iphone1  iphone2  iphone3
4  runner1  runner2  runner3
5     pig1     pig2     pig3
6    wifi1    wifi2    wifi3
7    girl1    girl2    girl3
8     boy1     boy2     boy3
9  couple1  couple2  couple3

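How can I randomly pick 1 of the 3 elements in each row and append it to a new DataFrame? I want this pass over the rows to loop N times, then move on to a new row of the new DataFrame and again add 1 of the 3 elements per source row, looping N times.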

This is a bit hard to explain, so here is an example:

import pandas as pd

data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2' , 'col3'])

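What I want is to randomly select item1, item2, or item3 from each row and append it to a row of the new DataFrame. Once the 10th item has been selected, the process should start over, N times in total, and then move to a new row of the new DataFrame and loop N times again. The end result would look something like this (values are random):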

    1       2      3       4       5       6    7     8     9    10       11      12     13      14      15      16   17    18    19   20
    banana3 apple2 monkey1 iphone2 runner2 pig1 wifi2 girl3 boy1 couple1  banana1 apple2 monkey2 iphone3 runner3 pig3 wifi2 girl1 boy1 couple3
    ........................................................................................................................................... 
    ...........................................................................................................................................
    ...........................................................................................................................................
    banana1 apple2 monkey2 iphone3 runner1 pig2 wifi3 girl1 boy3 couple2  banana2 apple1 monkey2 iphone2 runner2 pig1 wifi2 girl3 boy1 couple2

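In this output, I picked 1 of the 3 items from each row, looped over the rows 2 times within each row of the new DataFrame, and repeated that for N rows.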

My attempt:

I would like to do this with a function that produces the desired output for given n and N.

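new_df = []

def rand_element_selection(n, N):
    # n and N are not used yet; this only makes a single pass over the rows
    for _, row in df.iterrows():
        element_holder = row.sample(1).iloc[0]  # one random element of this row
        new_df.append(element_holder)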

n and N are not used above because I am stuck on how to move forward.


2 Answers

The concatenation part is mostly taken from EdChum's answer.

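import numpy as np

n = 3  # number of rows in the result
N = 2  # number of passes over df per result row
df_list = []
for i in range(n):
    # each pass picks one random element per row of df; the N passes are chained end to end
    df_list.append(pd.concat([df.apply(np.random.choice, axis=1) for _ in range(N)], ignore_index=True))
# each Series becomes a column here, so transpose to get n rows
new_df = pd.concat(df_list, axis=1, ignore_index=True).T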
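
For larger frames, the same per-row selection could probably be done in one vectorized NumPy step instead of calling `apply` once per pass. The following is only a sketch and not part of either answer: the function name `sample_rows` and its `seed` parameter are hypothetical, and it assumes every cell of `df` is a valid candidate.

```python
import numpy as np
import pandas as pd


def sample_rows(df, n, N, seed=None):
    """Return an n x (len(df) * N) frame; each cell is a random pick from one source row.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy()
    n_rows, n_cols = values.shape
    # One random column index per (result row, pass, source row).
    col_idx = rng.integers(0, n_cols, size=(n, N, n_rows))
    # Broadcasting pairs source row k with its own random column index.
    picked = values[np.arange(n_rows), col_idx]
    # Lay the N passes side by side: source rows 0..len(df)-1, then again, and so on.
    return pd.DataFrame(picked.reshape(n, N * n_rows))


# Small example frame (made up for this sketch).
example = pd.DataFrame({'col1': ['a1', 'b1'],
                        'col2': ['a2', 'b2'],
                        'col3': ['a3', 'b3']})
result = sample_rows(example, n=3, N=2, seed=0)
print(result)
```

Column `j` of the result always holds an element of source row `j % len(df)`, which matches the layout asked for in the question.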

IIUC, you can do this by calling sample with axis=1 and transposing:

In [172]:
n=3
N=2
df_list=[]
for i in range(n):
    df_list.append(pd.concat([df.sample(1, axis=1).T.reset_index(drop=True) for j in range(N)], axis=1, ignore_index=True))
pd.concat(df_list, ignore_index=True)    

Out[172]:
        0       1        2        3        4     5      6      7     8   \
0  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3  boy3   
1  banana2  apple2  monkey2  iphone2  runner2  pig2  wifi2  girl2  boy2   
2  banana2  apple2  monkey2  iphone2  runner2  pig2  wifi2  girl2  boy2   

        9        10      11       12       13       14    15     16     17  \
0  couple3  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3   
1  couple2  banana1  apple1  monkey1  iphone1  runner1  pig1  wifi1  girl1   
2  couple2  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3   

     18       19  
0  boy3  couple3  
1  boy1  couple1  
2  boy3  couple3  
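
Note that in the output above each block of 10 values comes from a single column (row 0 is all col3, for example), because `df.sample(1, axis=1)` draws one column for the whole frame rather than one element per row. If that column-wise behaviour is what you want and you also need repeatable results, a seeded variant might look like the sketch below. The function name `sample_column_blocks` and the `seed` parameter are hypothetical; `random_state` is an existing parameter of `DataFrame.sample`.

```python
import numpy as np
import pandas as pd


def sample_column_blocks(df, n, N, seed=None):
    """Build n rows; each row is N blocks, each block one randomly chosen column of df.

    Hypothetical helper for illustration only.
    """
    # A single shared RandomState, so successive draws differ but the run is repeatable.
    rs = np.random.RandomState(seed)
    rows = []
    for _ in range(n):
        blocks = [
            # Pick one column, turn it into a single row, and drop the column name.
            df.sample(1, axis=1, random_state=rs).T.reset_index(drop=True)
            for _ in range(N)
        ]
        rows.append(pd.concat(blocks, axis=1, ignore_index=True))
    return pd.concat(rows, ignore_index=True)


# Small example frame (made up for this sketch).
example = pd.DataFrame({'col1': ['a1', 'b1'],
                        'col2': ['a2', 'b2'],
                        'col3': ['a3', 'b3']})
print(sample_column_blocks(example, n=3, N=2, seed=1))
```

Calling the function twice with the same `seed` should give identical frames. For an independent pick per row, the `np.random.choice` approach from the other answer is the better fit.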
