How to duplicate or drop rows based on a condition in Python

Posted 2024-09-30 20:37:39


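I have an unbalanced dataframe. I want to balance the data first and then reshape it (unstack or transpose). The key point is that the number of rows where df.Question == "Q007_C02" is the number of rows of the new data. So if any df.Question level has more rows than that, I only keep its first rows up to that count; if a df.Question level has fewer rows than that, I need to duplicate rows. After balancing, I unstack or transpose the data.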

df = pd.DataFrame({"Question":["Q007_A00","Q007_B00","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C02","Q007_C02","Q007_C02","Q007_C02","Q007_C02"],
               "Key": ["Y","N",1,4,5,2,8,9,3,"Text 1","Text 2","Text 3","Text 4","Text 5"]})
df

    Key Question
0   Y   Q007_A00
1   N   Q007_B00
2   1   Q007_C01
3   4   Q007_C01
4   5   Q007_C01
5   2   Q007_C01
6   8   Q007_C01
7   9   Q007_C01
8   3   Q007_C01
9   Text 1  Q007_C02
10  Text 2  Q007_C02
11  Text 3  Q007_C02
12  Text 4  Q007_C02
13  Text 5  Q007_C02

As you can see, there are 5 rows where df.Question == "Q007_C02", so 5 is used as the number of rows in the output. This is my desired output:

  Q007_A00  Q007_B00    Q007_C01    Q007_C02
0   Y          N            1        Text 1
1   Y          N            4        Text 2
2   Y          N            5        Text 3
3   Y          N            2        Text 4
4   Y          N            8        Text 5
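A quick check on the row count used above: `len(df.Question == "Q007_C02")` returns the length of the whole boolean mask (14 for this data), while `.sum()` counts only the matching rows (5). The sketch below also applies the balancing rule to each group directly. It assumes that "duplicate" means repeating a group's last value until the group has enough rows, and `balance` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({
    "Question": ["Q007_A00", "Q007_B00"] + ["Q007_C01"] * 7 + ["Q007_C02"] * 5,
    "Key": ["Y", "N", 1, 4, 5, 2, 8, 9, 3,
            "Text 1", "Text 2", "Text 3", "Text 4", "Text 5"],
})

mask = df.Question == "Q007_C02"
print(len(mask))      # 14: len() counts every element of the boolean mask
n_rows = int(mask.sum())
print(n_rows)         # 5: sum() counts only the True values


def balance(values, n):
    """Hypothetical helper: keep the first n values, or pad by repeating
    the last value (assumed meaning of "duplicate")."""
    values = list(values)[:n]
    return values + [values[-1]] * (n - len(values))


# Apply the rule to each Question group (groupby yields groups in sorted order)
balanced = {q: balance(g["Key"], n_rows) for q, g in df.groupby("Question")}
print(pd.DataFrame(balanced))
```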

1 answer
Anonymous user
Reply #1 · posted 2024-09-30 20:37:39

Here is a solution that works for the example data.

import pandas as pd

df = pd.DataFrame({"Question":["Q007_A00","Q007_B00","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C01","Q007_C02","Q007_C02","Q007_C02","Q007_C02","Q007_C02"],
               "Key": ["Y","N",1,4,5,2,8,9,3,"Text 1","Text 2","Text 3","Text 4","Text 5"]})

# Build a new index that gives, for each item, the row it should occupy in the balanced table
# The dataframe must be sorted for this to work; mergesort is stable, so the
# original Key order within each Question is preserved
df = df.sort_values('Question', kind='mergesort')
new_index = []
for c in df.groupby('Question').size():   # number of rows per Question, in sorted order
    new_index.extend(range(c))
# for the example data, new_index is [0, 0, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]

balanced = df.set_index([new_index, 'Question'])  # two-level index: (row position, Question)
balanced = balanced.unstack()                     # move the last index level (Question) into the columns
balanced.columns = balanced.columns.droplevel(0)  # columns are a MultiIndex of (Key, Question); drop the top level
balanced = balanced.dropna(subset=['Q007_C02'])   # keep only as many rows as Q007_C02 has
balanced = balanced.ffill()                       # fill missing values with the last valid value above them

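The key to using unstack() is to create an index that holds, for each entry, the row it should occupy in the balanced dataframe. The for loop builds this new index from the number of df.Key values for each df.Question. Once you have this index, all that is left is to reshape the dataframe into the required structure.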
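To illustrate what `unstack()` does with such a two-level index, here is a minimal sketch on a hand-built Series whose values are a small made-up subset of the example data. The row level stays as the index, the Question level becomes the columns, and missing (row, Question) combinations become NaN, which the `dropna` and `ffill` steps then trim and fill.

```python
import pandas as pd

# Hand-built subset: each value is indexed by (row position, Question)
s = pd.Series(
    ["Y", 1, 4, 5, "Text 1", "Text 2"],
    index=pd.MultiIndex.from_tuples(
        [(0, "Q007_A00"),
         (0, "Q007_C01"), (1, "Q007_C01"), (2, "Q007_C01"),
         (0, "Q007_C02"), (1, "Q007_C02")],
        names=["row", "Question"],
    ),
)

# unstack() moves the last level (Question) into the columns; the row level
# stays as the index and absent (row, Question) pairs become NaN
wide = s.unstack()
print(wide)

# Keep only rows where Q007_C02 has a value, then forward-fill the gaps
result = wide.dropna(subset=["Q007_C02"]).ffill()
print(result)
```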

I have a feeling there may be a better way to get the index, but I can't think of one right now.
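One possibly simpler way to get that index is `groupby(...).cumcount()`, which numbers the rows within each group in their existing order, so neither sorting nor a loop is needed. The sketch below is an alternative, not the answerer's code: it pivots on that counter, keeps the first n rows with `iloc` (n being the Q007_C02 row count) instead of calling `dropna`, and forward-fills the rest. Like the answer, it assumes forward-filling is what "duplicate" should mean.

```python
import pandas as pd

df = pd.DataFrame({
    "Question": ["Q007_A00", "Q007_B00"] + ["Q007_C01"] * 7 + ["Q007_C02"] * 5,
    "Key": ["Y", "N", 1, 4, 5, 2, 8, 9, 3,
            "Text 1", "Text 2", "Text 3", "Text 4", "Text 5"],
})

# Position of each row within its Question group: 0, 1, 2, ... in original order
row = df.groupby("Question").cumcount()

# One column per Question, one row per position (missing positions become NaN)
wide = df.assign(row=row).pivot(index="row", columns="Question", values="Key")

# Target row count = number of Q007_C02 rows
n_rows = int((df["Question"] == "Q007_C02").sum())

# Truncate longer groups, then repeat the last value for shorter ones
balanced = wide.iloc[:n_rows].ffill()
balanced.index.name = None
balanced.columns.name = None
print(balanced)
```

Because `cumcount()` always starts each group at 0, the first n rows of the pivot are exactly the rows where Q007_C02 has a value, so `iloc[:n_rows]` and the answer's `dropna(subset=['Q007_C02'])` select the same rows here.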
