Getting DataFrames from an existing DataFrame after some filtering

Posted 2024-10-05 10:12:03


I have a DataFrame like this:

Aman Aggarwal      Amar Jannela   Vipin Kumar       Roshan Pati
BlackBuck          DJ CHETAS      WOW Editions      MensXP
Transport/Freight  Musician/Band  Furniture         News/Media Website
Like               Like           Like              Like
NaN                NaN            NaN               NaN   
GiveMeSport        NaN            500 Startups      No Abuse KG
News/Media Website Celina Jaitly  Internet/Software Community
Like               Actor/Director Like              Liked
NaN                Like           NaN               NaN
NaN                NaN            Jitendra Kumar    Monogatari Series
Anushka Sharma     Durjoy Datta   Actor/Director    TV Show
Actor/Director     Author         Liked             Like
Like               Like           NaN               NaN
NaN                NaN            NaN               NaN

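Clearly, the original CSV file contains blank rows. I need to extract two DataFrames from it. In the first, the column name (the person's name) should be the first element of each row, and the page_name elements (such as BlackBuck) should be the following elements of that row. Like this: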

Aman Aggarwal     BlackBuck        GiveMeSport    Anushka Sharma 
Amar Jannela      DJ CHETAS        Celina Jaitly  Durjoy Datta 
Vipin Kumar       WOW Editions     500 Startups   Jitendra Kumar
Roshan Pati       MensXP           No Abuse KGP   Monogatari Series

The second DataFrame is similar, but holds the categories:

Aman Aggarwal   Transport/Freight  News/Media Website  Actor/Director
Amar Jannela       Musician/Band      Actor/Director          Author
Vipin Kumar           Furniture   Internet/Software  Actor/Director
Roshan Pati  News/Media Website           Community         TV Show

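The real problem is that there are arbitrary NaN values, and in some places there may be a blank instead of Like. The one thing I can rely on is that a page name (BlackBuck) and its category (Transport/Freight) always appear together. My code cannot tell which value is a page name and which is a category, so I probably have to remove the NaN values and the Like/Liked entries from each column separately first, then align and transpose accordingly. How can I do this efficiently in Python 2.7?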


1 answer
User
Answer 1 · posted 2024-10-05 10:12:03

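Clearly, you have to work column by column, because the names and categories are not aligned across columns. I use apply to process each column, filtering out null values and any value found in a list of strings to avoid: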

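# Values to discard in addition to NaN (the name "filter" would shadow the built-in)
to_drop = ['Like', 'Liked']

# Per column: drop nulls and unwanted strings, then renumber the rows from 0
df.apply(lambda column:
    column[~(column.isnull() | column.isin(to_drop))].reset_index(drop=True)
)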

Note that this would also work, but I trust it less:

import numpy as np

# NaN is listed alongside the strings, so a single isin() test covers both
to_drop = [np.nan, 'Like', 'Liked']

df.apply(lambda column: column[~column.isin(to_drop)].reset_index(drop=True))

Output:

        Aman Aggarwal    Amar Jannela        Vipin Kumar         Roshan Pati
0           BlackBuck       DJ CHETAS       WOW Editions              MensXP
1   Transport/Freight   Musician/Band          Furniture  News/Media Website
2         GiveMeSport   Celina Jaitly       500 Startups         No Abuse KG
3  News/Media Website  Actor/Director  Internet/Software           Community
4      Anushka Sharma    Durjoy Datta     Jitendra Kumar   Monogatari Series
5      Actor/Director          Author     Actor/Director             TV Show
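
The output above still interleaves page names and categories, while the question asks for two separate frames. The following is only a sketch of one way to finish the job, not part of the original answer. It assumes that after filtering every column alternates strictly between page name and category, and that all columns end up the same length. The names people, compact, pages and categories are made up for this illustration, and the sample data is typed in from the question.

```python
import numpy as np
import pandas as pd

nan = np.nan
people = ['Aman Aggarwal', 'Amar Jannela', 'Vipin Kumar', 'Roshan Pati']

# Sample data copied from the question; NaN marks the blank CSV cells.
df = pd.DataFrame({
    'Aman Aggarwal': ['BlackBuck', 'Transport/Freight', 'Like', nan,
                      'GiveMeSport', 'News/Media Website', 'Like', nan, nan,
                      'Anushka Sharma', 'Actor/Director', 'Like', nan],
    'Amar Jannela': ['DJ CHETAS', 'Musician/Band', 'Like', nan, nan,
                     'Celina Jaitly', 'Actor/Director', 'Like', nan,
                     'Durjoy Datta', 'Author', 'Like', nan],
    'Vipin Kumar': ['WOW Editions', 'Furniture', 'Like', nan,
                    '500 Startups', 'Internet/Software', 'Like', nan,
                    'Jitendra Kumar', 'Actor/Director', 'Liked', nan, nan],
    'Roshan Pati': ['MensXP', 'News/Media Website', 'Like', nan,
                    'No Abuse KG', 'Community', 'Liked', nan,
                    'Monogatari Series', 'TV Show', 'Like', nan, nan],
}, columns=people)

to_drop = ['Like', 'Liked']

# Same filtering as in the answer: each column becomes a gap-free
# sequence name, category, name, category, ...
compact = df.apply(
    lambda column: column[~(column.isnull() | column.isin(to_drop))]
    .reset_index(drop=True)
)

# Even rows hold page names, odd rows hold categories (assumption stated above).
pages = compact.iloc[0::2].T
categories = compact.iloc[1::2].T

# After transposing, the column labels are 0, 2, 4 and 1, 3, 5; renumber them.
pages.columns = range(pages.shape[1])
categories.columns = range(categories.shape[1])

print(pages)
print(categories)
```

Each person's name ends up as the row index. Calling pages.reset_index() would turn it into an ordinary first column, if that is the layout you need.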

Notes

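  • If you use column.str.contains('Like') in place of isin, it is important to test column.isnull() before it, otherwise the latter fails on null values.
  • You need to reset the index, otherwise the result is aligned back to the original index, which is exactly what you do not want.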
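
To make the note on index resetting concrete, here is a small sketch on a made-up two-column frame. The data and the helper name keep are hypothetical and chosen only for illustration. It compares the filtered result with and without reset_index(drop=True).

```python
import numpy as np
import pandas as pd

# Tiny made-up frame: the unwanted entries sit at different rows in each column.
df = pd.DataFrame({
    'a': ['x', 'Like', np.nan, 'y'],
    'b': [np.nan, 'p', 'Like', 'q'],
}, columns=['a', 'b'])

to_drop = ['Like', 'Liked']

def keep(column):
    # Drop nulls and unwanted strings, keeping the original row labels.
    return column[~(column.isnull() | column.isin(to_drop))]

# Without reset_index: the surviving values are realigned on their original
# row labels, so the gaps come back as NaN.
aligned = df.apply(keep)

# With reset_index: every column is renumbered from 0, so values pack upward.
compact = df.apply(lambda column: keep(column).reset_index(drop=True))

print(aligned)
print(compact)
```

In the first result the values stay at their original row labels and NaN fills the holes. In the second, each column is packed from row 0, which is the layout shown in the answer's output.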
