How do I use a DataFrame object from another function?

import pandas as pd

def process_DrugCount(drugcount):
    dc = pd.read_csv("DrugCount.csv")
    sub_map = {'1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7+': 7}
    dc['DrugCount'] = dc.DrugCount.map(sub_map)
    dc['DrugCount'] = dc.DrugCount.astype(int)
    dc_grouped = dc.groupby(dc.Year, as_index=False)
    # Drop 'Year' by reassignment; an in-place drop on a group slice triggers SettingWithCopyWarning.
    DrugCount_Y1 = dc_grouped.get_group('Y1').drop('Year', axis=1)
    DrugCount_Y2 = dc_grouped.get_group('Y2').drop('Year', axis=1)
    DrugCount_Y3 = dc_grouped.get_group('Y3').drop('Year', axis=1)
    return (DrugCount_Y1, DrugCount_Y2, DrugCount_Y3)

def replaceMonth(string):
    replace_map = {'0- 1 month': "0_1", "1- 2 months": "1_2", "2- 3 months": "2_3",
                   "4- 5 months": "4_5", "5- 6 months": "5_6", "6- 7 months": "6_7",
                   "7- 8 months": "7_8", "8- 9 months": "8_9", "9-10 months": "9_10",
                   "10-11 months": "10_11", "11-12 months": "11_12"}
    a_new_string = string.map(replace_map)
    return a_new_string

def process_yearly_DrugCount(aframe):
    processed_frame = None
    dc = pd.read_csv("DrugCount.csv")
    sub_map = {'1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7+': 7}
    dc['DrugCount'] = dc.DrugCount.map(sub_map)
    dc['DrugCount'] = dc.DrugCount.astype(int)
    dc_grouped = dc.groupby(dc.Year, as_index=False)
    DrugCount_Y1 = dc_grouped.get_group('Y1').drop('Year', axis=1)
    # print(DrugCount_Y1['DSFS'].unique())
    return processed_frame

1 answer

User

#1 · Posted 2024-09-30 18:16:41

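Your example isn't entirely clear to me, but here is a slightly different example, based on the pandas documentation, that shows some useful techniques.

It sounds like, rather than using groupby, you would be better off using a pivot table to reshape the data into a MultiIndex.

For example, try: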

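import numpy as np
import pandas as pd

def unpivot(frame):
    # Stack the wide frame column by column into long (date, variable, value) rows.
    N, K = frame.shape
    data = {'value': frame.values.ravel('F'),
            'variable': np.asarray(frame.columns).repeat(N),
            'date': np.tile(np.asarray(frame.index), K)}
    return pd.DataFrame(data, columns=['date', 'variable', 'value'])

# The original used tm.makeTimeDataFrame() from pandas.util.testing (with tm.N = 3),
# which recent pandas releases no longer provide; this builds an equivalent 3x4 frame.
frame = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'),
                     index=pd.date_range('2000-01-03', periods=3, freq='B'))
df = unpivot(frame)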

Inspect df; df.head() gives something like:

        date variable     value
0 2000-01-03        A -0.357495
1 2000-01-04        A  0.367520
2 2000-01-05        A  2.216699
3 2000-01-03        B -0.417521
4 2000-01-04        B -1.163966
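
As a side note, the same wide-to-long reshape can probably be done without a hand-written helper. The sketch below uses `DataFrame.melt`; the sample frame and its values are made up for illustration and are not the data shown above.

```python
import numpy as np
import pandas as pd

# Illustrative wide frame: 3 business days x 4 columns (values are arbitrary).
frame = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    index=pd.date_range("2000-01-03", periods=3, freq="B", name="date"),
    columns=list("ABCD"),
)

# Move the index into a 'date' column, then melt the remaining columns
# into (variable, value) pairs, one column at a time.
long_df = frame.reset_index().melt(
    id_vars="date", var_name="variable", value_name="value"
)
print(long_df.head())
```

This yields the same three columns (date, variable, value) as unpivot, with all rows for A first, then B, and so on.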

Then build df_pivot = df.pivot_table(index=['variable', 'date']) and print it:

                        value
variable date                
A        2000-01-03 -0.357495
         2000-01-04  0.367520
         2000-01-05  2.216699
B        2000-01-03 -0.417521
         2000-01-04 -1.163966
         2000-01-05 -0.774422
C        2000-01-03  0.560017
         2000-01-04  0.174880
         2000-01-05  0.625167
D        2000-01-03 -1.673194
         2000-01-04 -0.075789
         2000-01-05 -2.041236

Then df_pivot.loc['A'] gives you:

            value
date                
2000-01-03 -0.357495
2000-01-04  0.367520
2000-01-05  2.216699
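
Selecting on the outer level with .loc is only one way to slice a MultiIndex. As a rough sketch, assuming a frame shaped like the pivot above (the values here are invented), `xs` can select on the inner date level instead:

```python
import numpy as np
import pandas as pd

# Illustrative two-level frame shaped like the pivot above (values are arbitrary).
idx = pd.MultiIndex.from_product(
    [list("AB"), pd.date_range("2000-01-03", periods=3, freq="B")],
    names=["variable", "date"],
)
df_pivot = pd.DataFrame({"value": np.arange(6, dtype=float)}, index=idx)

# Outer level: all dates for variable 'A'.
by_variable = df_pivot.loc["A"]

# Inner level: all variables on one date.
by_date = df_pivot.xs(pd.Timestamp("2000-01-04"), level="date")
print(by_variable)
print(by_date)
```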

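You can easily adapt this to your example, using the years as the outer index level. For this kind of operation it is easier than using groupby, and it keeps all the data in a single DataFrame (view).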
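
Here is a minimal sketch of what that adaptation might look like. The column name MemberID, the function names, the sample rows, and the choice to sum DrugCount per member and year are all assumptions for illustration; only Year, DrugCount, and the mapping come from the question. It also shows the usual way to use a DataFrame from another function: return it from one function and pass it as an argument to the next.

```python
import pandas as pd

SUB_MAP = {'1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7+': 7}

def process_drug_count(dc):
    """Return one frame indexed by (Year, MemberID) instead of three per-year frames."""
    dc = dc.copy()  # leave the caller's frame untouched
    dc['DrugCount'] = dc['DrugCount'].map(SUB_MAP).astype(int)
    # Assumption: duplicate (Year, MemberID) rows are summed.
    return dc.pivot_table(index=['Year', 'MemberID'], values='DrugCount', aggfunc='sum')

def yearly_drug_count(pivoted, year):
    """Take the frame produced by process_drug_count and select a single year."""
    return pivoted.loc[year]

# Made-up rows standing in for DrugCount.csv.
raw = pd.DataFrame({
    'MemberID': [1, 1, 2, 2, 3],
    'Year': ['Y1', 'Y1', 'Y1', 'Y2', 'Y3'],
    'DrugCount': ['1', '7+', '3', '2', '7+'],
})

pivoted = process_drug_count(raw)
y1 = yearly_drug_count(pivoted, 'Y1')
print(y1)
```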

You can also use value_counts to find all the values and their frequencies. So in my example:

df['variable'].value_counts()

will return a Series:

D    3
B    3
C    3
A    3
Name: variable, dtype: int64

If I understand correctly, the index of this Series is your list of unique values. So

list(df['variable'].value_counts().index)

should give you what you want.
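
For comparison, here is a small sketch with made-up data that puts value_counts next to Series.unique(). Note that unique is a method and needs parentheses to be called. value_counts orders the values by frequency, while unique() keeps the order of first appearance.

```python
import pandas as pd

# Made-up column with repeated values.
s = pd.Series(["A", "B", "A", "C", "B", "A"], name="variable")

counts = s.value_counts()                 # frequencies, most common first
unique_from_counts = list(counts.index)   # unique values, ordered by frequency
unique_direct = list(s.unique())          # unique values, in order of first appearance

print(counts)
print(unique_from_counts, unique_direct)
```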
