如何将特定行拖放到DataFrame以生成嵌套的JSON

2024-09-30 06:18:04 发布

您现在位置:Python中文网/ 问答频道 /正文

我目前正在开发一个d3树映射,它需要一个嵌套的json作为条目,我成功地组织了我的df并生成了json,但是我的一些树映射矩形比其他的大30倍,所以我决定删除生成这个矩形的行。你知道吗

我的函数dropSmall()在我的列和行中迭代,以验证每个groupby的和是否比最大和小30倍 我正在努力更新df,要么使用drop,要么影响匹配的值 这是我的密码:

def dropSmall(df):
    list = []
    for i in df.columns: #b, c, z ..
        if i != 'valeur' and i!='unite':
            list.append(i)
            # iterating on rows
            for j in range(df.groupby(list).sum().shape[0]): 
                myMax = df.groupby(list).sum().iloc[:, 0].max() / 30
                myJ = df.groupby(list).sum().iloc[:, 0][j]
                myDf = df.groupby(list).sum().iloc[:, 0]
                if myJ <= myMax:
                    df = df[myDf['value']>=  myMax]

我的团员是这样的


          name          b   c   z   l   sL  value       unit
3099    Myindicator     1   1   3   NA  NA  129.74      kg
3100                                    1   44929.74    kg
3101                                    2   5174.74     kg
3110                    3   1   3   1   NA  2497.66     kg
3156                                2   NA  29.43       kg
3222                                3   NA  304.81      kg


对于第一行的例子,当b=1 c=1 z=3 l=NA时,我想在迭代3 sL时验证sL的值是该和的最大值的30倍,对于这种情况,当value=129时删除该行

我的函数验证条件,但我不知道如何从初始df notdf.groupby('list').sum()中删除行

第一行的未分组df示例

        name        Continent  Region   Country   State   City    Borough  Value       Unit
1000    Myindicator     1        1        3        1      1         1      53.86      kg

[从此处编辑]

我的截止乘数是2 每个层次都有一个最大值

                                            Value
name        Continent Region Country State       
Myindicator 1         1      1       7         50[MAX]
                                     8         30 
                             2       5         70[MAX]
                                     6         30 *
                             3       1         50[MAX]
                             4       5        200[MAX]
                                     6        150 
                             5       1        300[MAX]
                                     6        160
                                     7        100*
                                     8         50*
                                     9         50*
                      2      4       9        100[MAX]
                                     10        40 *
                             5       3         80[MAX]
                                     11        20 *
                             6       2         10[MAX]
                      3      7       12       100[MAX]


在本例中,您不会删除region 2 country 6 state 2,因为它是此region>;country>;state的唯一行,同时也是最大值

希望这更清楚


Tags: namejsondfvaluemaxlistsumgroupby
1条回答
网友
1楼 · 发布于 2024-09-30 06:18:04

所以我不是100%清楚你的输入是什么样子的,或者你想要什么回来,但是如果我理解正确的话,我认为下面的方法是可行的。你知道吗

从这里编辑

EDIT2:添加星号(*)以指示要删除的行。你知道吗

EDIT3:由于赋值和副本处理pandas.DataFrame的方式而更改了函数

执行此过程的函数:

def drop_small(dfcop, cutoff_multiplier):
    # Create copy of dataframe so we don't alter the original
    df=dfcop.copy(deep=True)
    # Group on all columns except 'Value' and 'Unit'
    grp_cols = [i for i in df.columns if i not in ['Value', 'Unit']]
    groupers = [grp_cols[:i+1] for i in range(len(grp_cols))]
    print(groupers)
    #loop through all hierarchical groupings
    for grp in groupers:
        print(f"Grouping on {grp}")
        # Add a column with the group sums to the dataframe
        df['gsum'] = df.groupby(grp)['Value'].transform('sum')
        # Compute the max of the parent group - don't do this if we are grouping by a single field
        if len(grp) > 1:
            df['gmax'] = df.groupby(grp[:-1])['gsum'].transform(lambda x: max(x)/cutoff_multiplier)
        else:
            df['gmax'] = df.gsum.max()/cutoff_multiplier
        print("Grouped sums and cutoffs for this hierarchy:")
        print(df)
        # Drop all rows where the group sum is less than the cutoff mulitplier of the max
        idexs = df[df.gsum < df.gmax].index
        df = df[df.gsum >= df.gmax]
        print('Indexes dropped:')
        print(','.join([str(i) for i in idexs]))
        # Remove the group sum column
        df.drop(['gsum', 'gmax'], axis=1, inplace=True)
    return df

下面是示例表的工作方式。你知道吗

           name  Continent  Region  Country  State  Value Unit
0   Myindicator          1       1        3      1     50   kg
1   Myindicator          1       1        3      4     50   kg
2   Myindicator          1       1        2      5     20   kg
3   Myindicator          1       1        2      5     50   kg
4   Myindicator          1       1        2      6     30   kg
5   Myindicator          1       1        1      7     50   kg
6   Myindicator          1       1        1      8     20   kg
7   Myindicator          1       2        4      9     50   kg
8   Myindicator          1       2        4      9     50   kg
9   Myindicator          1       2        4     10     40   kg
10  Myindicator          1       2        5     11     20   kg
11  Myindicator          1       2        5      3     40   kg
12  Myindicator          1       2        5      3     40   kg
13  Myindicator          1       2        6      2     10   kg
14  Myindicator          1       3        7     12     50   kg
15  Myindicator          1       3        7     12     50   kg
16  Myindicator          1       3        8     14     15   kg
17  Myindicator          1       3        8     14     15   kg
18  Myindicator          1       3        8     13     15   kg
19  Myindicator          1       3        8     13      1   kg
20  Myindicator          1       4        9     15     10   kg
21  Myindicator          1       4        9     16     10   kg

分组在['name'] 此层次结构的分组总和和截止值:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   686   343
1   Myindicator          1       1        3      4     50   kg   686   343
2   Myindicator          1       1        2      5     20   kg   686   343
3   Myindicator          1       1        2      5     50   kg   686   343
4   Myindicator          1       1        2      6     30   kg   686   343
5   Myindicator          1       1        1      7     50   kg   686   343
6   Myindicator          1       1        1      8     20   kg   686   343
7   Myindicator          1       2        4      9     50   kg   686   343
8   Myindicator          1       2        4      9     50   kg   686   343
9   Myindicator          1       2        4     10     40   kg   686   343
10  Myindicator          1       2        5     11     20   kg   686   343
11  Myindicator          1       2        5      3     40   kg   686   343
12  Myindicator          1       2        5      3     40   kg   686   343
13  Myindicator          1       2        6      2     10   kg   686   343
14  Myindicator          1       3        7     12     50   kg   686   343
15  Myindicator          1       3        7     12     50   kg   686   343
16  Myindicator          1       3        8     14     15   kg   686   343
17  Myindicator          1       3        8     14     15   kg   686   343
18  Myindicator          1       3        8     13     15   kg   686   343
19  Myindicator          1       3        8     13      1   kg   686   343
20  Myindicator          1       4        9     15     10   kg   686   343
21  Myindicator          1       4        9     16     10   kg   686   343

删除的索引: 没有

分组在['name', 'Continent'] 此层次结构的分组总和和截止值:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   686   343
1   Myindicator          1       1        3      4     50   kg   686   343
2   Myindicator          1       1        2      5     20   kg   686   343
3   Myindicator          1       1        2      5     50   kg   686   343
4   Myindicator          1       1        2      6     30   kg   686   343
5   Myindicator          1       1        1      7     50   kg   686   343
6   Myindicator          1       1        1      8     20   kg   686   343
7   Myindicator          1       2        4      9     50   kg   686   343
8   Myindicator          1       2        4      9     50   kg   686   343
9   Myindicator          1       2        4     10     40   kg   686   343
10  Myindicator          1       2        5     11     20   kg   686   343
11  Myindicator          1       2        5      3     40   kg   686   343
12  Myindicator          1       2        5      3     40   kg   686   343
13  Myindicator          1       2        6      2     10   kg   686   343
14  Myindicator          1       3        7     12     50   kg   686   343
15  Myindicator          1       3        7     12     50   kg   686   343
16  Myindicator          1       3        8     14     15   kg   686   343
17  Myindicator          1       3        8     14     15   kg   686   343
18  Myindicator          1       3        8     13     15   kg   686   343
19  Myindicator          1       3        8     13      1   kg   686   343
20  Myindicator          1       4        9     15     10   kg   686   343
21  Myindicator          1       4        9     16     10   kg   686   343

删除的索引: 没有

分组在['name', 'Continent', 'Region'] 此层次结构的分组总和和截止值:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   270   135
1   Myindicator          1       1        3      4     50   kg   270   135
2   Myindicator          1       1        2      5     20   kg   270   135
3   Myindicator          1       1        2      5     50   kg   270   135
4   Myindicator          1       1        2      6     30   kg   270   135
5   Myindicator          1       1        1      7     50   kg   270   135
6   Myindicator          1       1        1      8     20   kg   270   135
7   Myindicator          1       2        4      9     50   kg   250   135
8   Myindicator          1       2        4      9     50   kg   250   135
9   Myindicator          1       2        4     10     40   kg   250   135
10  Myindicator          1       2        5     11     20   kg   250   135
11  Myindicator          1       2        5      3     40   kg   250   135
12  Myindicator          1       2        5      3     40   kg   250   135
13  Myindicator          1       2        6      2     10   kg   250   135
14  Myindicator          1       3        7     12     50   kg   146   135
15  Myindicator          1       3        7     12     50   kg   146   135
16  Myindicator          1       3        8     14     15   kg   146   135
17  Myindicator          1       3        8     14     15   kg   146   135
18  Myindicator          1       3        8     13     15   kg   146   135
19  Myindicator          1       3        8     13      1   kg   146   135
20  Myindicator          1       4        9     15     10   kg    20   135 *
21  Myindicator          1       4        9     16     10   kg    20   135 *

删除的索引: 20,21岁

分组在['name', 'Continent', 'Region', 'Country'] 此层次结构的分组总和和截止值:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   100    50
1   Myindicator          1       1        3      4     50   kg   100    50
2   Myindicator          1       1        2      5     20   kg   100    50
3   Myindicator          1       1        2      5     50   kg   100    50
4   Myindicator          1       1        2      6     30   kg   100    50
5   Myindicator          1       1        1      7     50   kg    70    50
6   Myindicator          1       1        1      8     20   kg    70    50
7   Myindicator          1       2        4      9     50   kg   140    70
8   Myindicator          1       2        4      9     50   kg   140    70
9   Myindicator          1       2        4     10     40   kg   140    70
10  Myindicator          1       2        5     11     20   kg   100    70
11  Myindicator          1       2        5      3     40   kg   100    70
12  Myindicator          1       2        5      3     40   kg   100    70
13  Myindicator          1       2        6      2     10   kg    10    70 *
14  Myindicator          1       3        7     12     50   kg   100    50
15  Myindicator          1       3        7     12     50   kg   100    50
16  Myindicator          1       3        8     14     15   kg    46    50 *
17  Myindicator          1       3        8     14     15   kg    46    50 *
18  Myindicator          1       3        8     13     15   kg    46    50 *
19  Myindicator          1       3        8     13      1   kg    46    50 *

删除的索引: 13、16、17、18、19岁

分组在['name', 'Continent', 'Region', 'Country', 'State'] 此层次结构的分组总和和截止值:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg    50    25
1   Myindicator          1       1        3      4     50   kg    50    25
2   Myindicator          1       1        2      5     20   kg    70    35
3   Myindicator          1       1        2      5     50   kg    70    35
4   Myindicator          1       1        2      6     30   kg    30    35 *
5   Myindicator          1       1        1      7     50   kg    50    25
6   Myindicator          1       1        1      8     20   kg    20    25 *
7   Myindicator          1       2        4      9     50   kg   100    50
8   Myindicator          1       2        4      9     50   kg   100    50
9   Myindicator          1       2        4     10     40   kg    40    50 *
10  Myindicator          1       2        5     11     20   kg    20    40 *
11  Myindicator          1       2        5      3     40   kg    80    40
12  Myindicator          1       2        5      3     40   kg    80    40
14  Myindicator          1       3        7     12     50   kg   100    50
15  Myindicator          1       3        7     12     50   kg   100    50

删除的索引: 4、6、9、10

最终表格:

           name  Continent  Region  Country  State  Value Unit
0   Myindicator          1       1        3      1     50   kg
1   Myindicator          1       1        3      4     50   kg
2   Myindicator          1       1        2      5     20   kg
3   Myindicator          1       1        2      5     50   kg
5   Myindicator          1       1        1      7     50   kg
7   Myindicator          1       2        4      9     50   kg
8   Myindicator          1       2        4      9     50   kg
11  Myindicator          1       2        5      3     40   kg
12  Myindicator          1       2        5      3     40   kg
14  Myindicator          1       3        7     12     50   kg
15  Myindicator          1       3        7     12     50   kg

相关问题 更多 >

    热门问题