Group a DataFrame by a column and list the value counts for each group

ga:country ga:hostname ga:pagePathLevel1 ga:pagePathLevel2 ga:keyword ga:adMatchedQuery ga:operatingSystem ga:hour ga:exitPagePath ga:sessions
0 (not set) de.google.com /beste-sms/ / +sms sms Germany best for Android 09 /beste-sms/ 1
1 (not set) de.google.com /beste-sms/ / +sms sms argentinien Macintosh 14 /beste-sms/ 1
2 (not set) de.google.com /beste-sms/ / +sms sms skandinav Android 18 /beste-sms/ 1
3 (not set) de.google.com /beste-sms/ / +sms sms skandinav Macintosh 20 /beste-sms/ 1
4 (not set) de.google.com /beste-sms/ / sms sms iOS 22 /beste-sms/ 1
... ... ... ... ... ... ... ... ... ... ...
85977 Yemen google.com /reviews/ /iphone/ 45to54 (not set) Android 23 /reviews/iphone/ 1
85978 Yemen google.com /tr/ /best-sms/ sms sms Windows 10 /tr/best-sms/ 1
85979 Zambia google.com /best-sms/ /iphone/ +best +sms (not set) Android 16 /best-sms/iphone/ 1
85980 Zimbabwe google.com /reviews/ /testsms/ test test Windows 22 /reviews/testsms/ 1
85981 Zimbabwe google.com /reviews/ /testsms/ testsms testsms Windows 23 /reviews/testsms/ 1

def f(x):
    x = x.value_counts()
    y = x.index.astype(str) + ' (' + x.astype(str) + ')'
    return y.reset_index(drop=True)

df = df.groupby(['ga:adMatchedQuery']).apply(lambda x: x.apply(f))
print(df)
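For reference, this is what `f` does to a single column: it counts each distinct value and turns every (value, count) pair into a "value(count)" label on a fresh 0..n-1 index, so columns with different numbers of distinct values can be lined up row by row. The sketch below is an assumption-laden illustration: the label step is an equivalent f-string rewrite (not the asker's exact expression), and `os_col` is a hypothetical column, not taken from the real export.

```python
import pandas as pd


def f(x: pd.Series) -> pd.Series:
    # Count each distinct value; value_counts() sorts by count, highest first
    counts = x.value_counts()
    # Turn each (value, count) pair into a "value(count)" label
    labels = [f"{value}({count})" for value, count in counts.items()]
    # A fresh 0..n-1 index lets columns of different lengths line up row by row
    return pd.Series(labels, dtype=object)


# Hypothetical sample column, not taken from the real export
os_col = pd.Series(["Windows", "Android", "Windows", "iOS", "Windows", "Android"])
print(f(os_col).tolist())  # ['Windows(3)', 'Android(2)', 'iOS(1)']
```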

ga:country ga:hostname ga:pagePathLevel1 ga:pagePathLevel2 ga:keyword ga:adMatchedQuery ga:operatingSystem ga:hour ga:exitPagePath ga:sessions
United States(5683) google.com(14924) /us/(4187) /best-sms/(4565) Undetermined(1855) (not set)(15327) Windows(7616) 18(806) /reviews/testsms/(1880) 1(14005)
United Kingdom(1691) zh.google.com(170) /reviews/(4093) /testsms/(3561) free sms(1729) Android(4291) 20(805) /reviews/scandina/(1307) 2(815)
Canada(1201) t.google.com(80) /best-sms/(2169) /free-sms/(2344) +sms(1414) iOS(2136) 19(804) /best-sms/(1291) 3(231)
Indonesia(445) es.google.com(33) /coupons/(1264) /scandina/(1751) +free +sms(1008) Macintosh(978) 17(787) /coupons/testsms/holiday-deal/(760) 4(92)
Hong Kong(443) pl.google.com(33) /uk/(1172) /(1508) 25to34(988) Linux(160) 21(779) /coupons/scandina/holiday-deal/(239) 6(40)
Australia(353) fr.google.com(27) /ca/(886) /windows/(365) best sms(803) Chrome OS(73) 16(766) (not set)(112) 5(38)

ga:adMatchedQuery   ga:country           ga:hostname
Undetermined(1855)  United States(100)   google.com(1000)
                    United Kingdom(200)  zh.google.com(12)
free sms(1855)      United States(100)   google.com(1000)
                    United Kingdom(200)  zh.google.com(12)
...

1 answer

User

#1 · Posted 2024-06-18 13:18:43

I think you can use my solution only after changing the column order so that the grouping column comes first:

# all columns except the grouping column, in their original order
cols = df.columns.difference(['ga:adMatchedQuery'], sort=False).tolist()
# put the grouping column first
df = df[['ga:adMatchedQuery'] + cols]
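A runnable sketch of the reordering step on a tiny hypothetical frame built from sample rows 4 and 85978 of the question (only three of the ten columns, since only the order matters here). `Index.difference(..., sort=False)` keeps the remaining columns in their original left-to-right order instead of sorting them alphabetically.

```python
import pandas as pd

# Hypothetical three-column miniature of the export (rows 4 and 85978 above)
df = pd.DataFrame(
    {
        "ga:country": ["(not set)", "Yemen"],
        "ga:adMatchedQuery": ["sms", "sms"],
        "ga:hostname": ["de.google.com", "google.com"],
    }
)

key = "ga:adMatchedQuery"
# Every column except the key, without re-sorting the remaining names
cols = df.columns.difference([key], sort=False).tolist()
df = df[[key] + cols]
print(df.columns.tolist())
# ['ga:adMatchedQuery', 'ga:country', 'ga:hostname']
```

An equivalent in-place alternative is `df.insert(0, key, df.pop(key))`, which removes the column and re-inserts it at position 0.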

Sample data from the previous answer:

Here is the data grouped by column F, with the column order left unchanged (F stays last):

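def f(x):
    # count each distinct value in the column, most frequent first
    x = x.value_counts()
    # build "value(count)" labels
    y = x.index.astype(str) + '(' + x.astype(str) + ')'
    # drop the value labels so that all columns share a 0..n-1 index
    return y.reset_index(drop=True)

df1 = df.groupby(['F']).apply(lambda x: x.apply(f)).reset_index(drop=True)
print(df1)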
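A self-contained sketch that rebuilds the seven-row sample frame from the printout later in this answer (in its original B to F column order) and produces the same grouped table. Two deliberate differences, both assumptions rather than the answer's exact code: `f` uses an equivalent f-string formulation, and the groups are looped over and concatenated instead of calling `groupby(...).apply`, because pandas 2.2 deprecates letting `apply` operate on the grouping column. Values tied in `value_counts` (for example `C` in the "New" group) may come out in a different order than printed above, depending on the pandas version.

```python
import pandas as pd

# Seven-row sample rebuilt from the printout in this answer (index 1..7)
df = pd.DataFrame(
    {
        "B": ["Honda", "Honda", "Ford", "Toyota", "Honda", "Honda", "Ford"],
        "C": ["USA", "USA", "Canada", "USA", "USA", "Canada", "Italy"],
        "D": [2000, 2001, 2005, 2010, 2001, 2011, 2014],
        "E": ["Washington", "Salt Lake", "Washington", "Ney York",
              "Salt Lake", "Salt Lake", "Rome"],
        "F": ["New", "Used", "New", "Used", "Used", "Crashed", "New"],
    },
    index=range(1, 8),
)


def f(x: pd.Series) -> pd.Series:
    # "value(count)" labels, most frequent first, on a fresh 0..n-1 index
    counts = x.value_counts()
    return pd.Series([f"{v}({n})" for v, n in counts.items()], dtype=object)


# Each sub-frame g still contains F, so F's counts appear in the output too.
# DataFrame.apply runs f per column and pads shorter columns with NaN.
parts = [g.apply(f) for _, g in df.groupby("F")]
df1 = pd.concat(parts, ignore_index=True)
print(df1)
```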
          B          C        D              E           F
0   Honda(1)  Canada(1)  2011(1)   Salt Lake(1)  Crashed(1)
1    Ford(2)   Italy(1)  2014(1)  Washington(2)      New(3)
2   Honda(1)  Canada(1)  2005(1)        Rome(1)         NaN
3        NaN     USA(1)  2000(1)            NaN         NaN
4   Honda(2)     USA(3)  2001(2)   Salt Lake(2)     Used(3)
5  Toyota(1)        NaN  2010(1)    Ney York(1)         NaN

With the column order changed (F moved to the front):

cols = df.columns.difference(['F'], sort=False).tolist()
df = df[['F'] + cols]
print(df)
         F       B       C     D           E
1      New   Honda     USA  2000  Washington
2     Used   Honda     USA  2001   Salt Lake
3      New    Ford  Canada  2005  Washington
4     Used  Toyota     USA  2010    Ney York
5     Used   Honda     USA  2001   Salt Lake
6  Crashed   Honda  Canada  2011   Salt Lake
7      New    Ford   Italy  2014        Rome

def f(x):
    x = x.value_counts()
    y = x.index.astype(str) + '(' + x.astype(str) + ')'
    return y.reset_index(drop=True)
df1 = df.groupby(['F']).apply(lambda x: x.apply(f)).reset_index(drop=True)
print(df1)
            F          B          C        D              E
0  Crashed(1)   Honda(1)  Canada(1)  2011(1)   Salt Lake(1)
1      New(3)    Ford(2)   Italy(1)  2014(1)  Washington(2)
2         NaN   Honda(1)  Canada(1)  2005(1)        Rome(1)
3         NaN        NaN     USA(1)  2000(1)            NaN
4     Used(3)   Honda(2)     USA(3)  2001(2)   Salt Lake(2)
5         NaN  Toyota(1)        NaN  2010(1)    Ney York(1)
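Because `f` is applied column by column, the same key-first layout could also be obtained by grouping the frame in its original order and reordering the result instead of the input. A minimal sketch on a hypothetical two-column subset of the sample (columns B and F only), using the same assumed f-string version of `f` and the same group loop in place of `groupby(...).apply`:

```python
import pandas as pd

# Hypothetical two-column subset of the sample data (B and F only)
df = pd.DataFrame(
    {
        "B": ["Honda", "Honda", "Ford", "Toyota", "Honda", "Honda", "Ford"],
        "F": ["New", "Used", "New", "Used", "Used", "Crashed", "New"],
    }
)


def f(x: pd.Series) -> pd.Series:
    # "value(count)" labels, most frequent first, on a fresh 0..n-1 index
    counts = x.value_counts()
    return pd.Series([f"{v}({n})" for v, n in counts.items()], dtype=object)


key = "F"
out = pd.concat([g.apply(f) for _, g in df.groupby(key)], ignore_index=True)
# Move the key column to the front of the result; the input keeps its order
out = out[[key] + out.columns.drop(key).tolist()]
print(out)
```

This leaves `df` untouched, which can be convenient when the original column order is needed elsewhere.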
