How do I fill the empty rows of a column with unique data?

Posted 2024-09-29 23:16:30


I want to fill the rows of a column that have no data with random values.

853                           None
854                   cheese empty
855                   cheese other
856                   yogurt empty
857                   yogurt other
858                   yogurt empty
859                   yogurt other
860                   butter empty
861                   butter other
862                           None
863                           None

I want to get something like this:

853                           ASDFGHJAS
854                         cheese empty
855                         cheese other
856                         yogurt empty
857                         yogurt other
858                         yogurt empty
859                         yogurt other
860                         butter empty
861                         butter other
862                           DFGHJRTYT
863                           ERTYUIOIO
864                           TYUIOPPWE
865                           QWERTYUUI
866                           CBNMTYUIO

I tried this:

df1 = df[['english_name']].fillna(''.join(choice(ascii_uppercase) for i in range(12)), axis=1)

But this is what I get:



853                          ASDFGHJAS
854                         cheese empty
855                         cheese other
856                         yogurt empty
857                         yogurt other
858                         yogurt empty
859                         yogurt other
860                         butter empty
861                         butter other
862                           ASDFGHJAS
863                           ASDFGHJAS
864                           ASDFGHJAS
865                           ASDFGHJAS
866                           ASDFGHJAS

The problem is that every empty row gets the same value, whereas each one needs its own unique random value.


3 answers

Use apply with a lambda so that each NaN value is filled with its own random string.

In [243]: df[['english_name']].apply(lambda x: x.fillna(''.join(choice(ascii_uppercase) for i in range(12))), axis=1)
Out[243]:
     english_name
853  BIZLLWLFGUSD
854  cheese empty
855  cheese other
856  yogurt empty
857  yogurt other
858  yogurt empty
859  yogurt other
860  butter empty
861  butter other
862  NMHDRQMTWZXF
863  EGPCZFWEDOFR
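
A possible explanation of why the per-row call helps, as a minimal sketch: in Python a function argument is evaluated once, before the call, so a plain fillna receives a single finished string. The helper random_name and the small sample column below are assumptions made for illustration; they are not part of the original post.

```python
from random import choice
from string import ascii_uppercase

import pandas as pd


def random_name(length=12):
    """Hypothetical helper: one random uppercase string."""
    return "".join(choice(ascii_uppercase) for _ in range(length))


# Assumed sample data resembling the question's column.
col = pd.Series([None, "cheese empty", None, None], name="english_name")

# random_name() runs once here, so every gap receives the same string.
same = col.fillna(random_name())

# Here random_name() runs again for each missing element.
distinct = col.map(lambda v: random_name() if pd.isna(v) else v)

print(same)
print(distinct)
```

Series.map is used here instead of a row-wise DataFrame.apply only to keep the sketch short; the idea is the same, since the random string is generated inside the function that runs per element.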

Alternatively, pre-build a Series of random names with the same length and index as the DataFrame, then use df.english_name.fillna(s):

In [259]: s = pd.Series([''.join(choice(ascii_uppercase) for i in range(12))
     ...:                for _ in range(len(df))], index=df.index)

In [260]: df.english_name.fillna(s)
Out[260]:
853    BRFERJPGVDXP
854    cheese empty
855    cheese other
856    yogurt empty
857    yogurt other
858    yogurt empty
859    yogurt other
860    butter empty
861    butter other
862    NYYTRCSSCPWT
863    ZYBNJQIPIWEF
Name: english_name, dtype: object
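
For reference, the pre-built Series approach can be written as a self-contained script. This is only a sketch: the sample DataFrame and the helper name random_name are assumptions made so the snippet runs on its own.

```python
from random import choice
from string import ascii_uppercase

import pandas as pd

# Assumed sample data with the same index labels as in the question.
df = pd.DataFrame(
    {"english_name": [None, "cheese empty", "cheese other", None, None]},
    index=[853, 854, 855, 862, 863],
)


def random_name(length=12):
    """Hypothetical helper: one random uppercase string."""
    return "".join(choice(ascii_uppercase) for _ in range(length))


# One candidate string per row, sharing df's index.
s = pd.Series([random_name() for _ in range(len(df))], index=df.index)

# fillna aligns on the index, so each missing row takes the value
# stored under its own label; rows that already have data are kept.
filled = df["english_name"].fillna(s)
print(filled)
```

Note that this generates a string for every row, including rows that already have data, so on a large frame with few gaps most of the generated strings go unused.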

Based on this answer, you can define a function that returns a random string of a given length:

import random
import string

def random_string(N=9):
    # SystemRandom draws from the operating system's randomness source
    return ''.join(random.SystemRandom().choice(string.ascii_uppercase) for _ in range(N))


df[['english_name']].apply(lambda x: x.fillna(random_string()),axis=1)
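
Random strings are very unlikely to collide, but they are not guaranteed to be unique. If strict uniqueness matters, one option is to collect the strings in a set and assign them only to the missing rows. The sketch below assumes a small sample frame; unique_random_strings is a hypothetical helper, not something from the original answer.

```python
import random
import string

import pandas as pd


def random_string(N=9):
    return "".join(
        random.SystemRandom().choice(string.ascii_uppercase) for _ in range(N)
    )


def unique_random_strings(count, taken=(), N=9):
    """Hypothetical helper: `count` distinct strings, none found in `taken`."""
    taken = set(taken)
    result = set()
    while len(result) < count:
        candidate = random_string(N)
        if candidate not in taken:
            result.add(candidate)  # a set silently drops duplicates
    return list(result)


# Assumed sample data.
df = pd.DataFrame(
    {"english_name": [None, "cheese empty", "yogurt other", None, None]}
)

# Boolean mask of the rows that need a value.
missing = df["english_name"].isna()

# Generate exactly as many strings as there are gaps and assign them.
df.loc[missing, "english_name"] = unique_random_strings(
    int(missing.sum()), taken=df["english_name"].dropna()
)
print(df)
```

The loop assumes the requested count is far smaller than the number of possible strings (26**N); otherwise it could run for a long time.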

A general solution for DataFrames with multiple columns

import numpy as np
import pandas as pd
from string import ascii_uppercase

df = pd.DataFrame([
        ['a', np.nan, 'b'],
        [np.nan, 'c', np.nan],
        ['d', np.nan, 'e'],
        [np.nan, 'f', np.nan]
    ])

     0    1    2
0    a  NaN    b
1  NaN    c  NaN
2    d  NaN    e
3  NaN    f  NaN

  • Stack df to get a Series
  • Count the null values

dfs = df.stack(dropna=False)
wherenull = dfs.isnull().values
n = wherenull.sum()

Generate the fill values

np.random.seed([3,1415])
fills = pd.DataFrame(
    np.random.choice(
        list(ascii_uppercase),
        (n, 12)
    )).sum(1).values

Fill in the missing values

dfs.loc[wherenull] = fills
dfs.unstack()

              0             1             2
0             a  QLCKPXNLNTIX             b
1  AWYMWACAUZHT             c  NSMEDTNWHXNU
2             d  FDXFZLYHMGEH             e
3  WSOGGOVSIXKF             f  PYEPNHGRMMPO
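
A variation on the same idea that avoids stack/unstack, offered only as a sketch: build a second DataFrame of random strings with the same shape, then take values from it wherever the original is null. The seed value and the names rng, letters and random_cells are assumptions made for illustration.

```python
from string import ascii_uppercase

import numpy as np
import pandas as pd

df = pd.DataFrame([
    ["a", np.nan, "b"],
    [np.nan, "c", np.nan],
    ["d", np.nan, "e"],
    [np.nan, "f", np.nan],
])

rng = np.random.default_rng(0)  # arbitrary seed, for repeatable runs only

# 12 random letters for every cell: shape (rows, columns, 12).
letters = rng.choice(list(ascii_uppercase), size=(*df.shape, 12))

# Join the 12 letters of each cell into a single string.
random_cells = pd.DataFrame(
    [["".join(cell) for cell in row] for row in letters],
    index=df.index,
    columns=df.columns,
)

# Keep df where it has data, otherwise take the random string.
filled = df.where(df.notna(), random_cells)
print(filled)
```

This is simpler to read but generates a string for every cell, not just the missing ones, so the stack-based version above does less work when gaps are sparse.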
