gaapi4py
Google Analytics Reporting API v4 for Python 3
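Prerequisites
To use this library, you need a project in Google Cloud Platform and a service account key that has access to the Google Analytics account you want to get data from.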
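Before handing the key file to the client, it can be handy to confirm that it really is a service account key and to see which e-mail address has to be granted access in Google Analytics. The sketch below assumes the standard JSON key format downloaded from Google Cloud (fields `type` and `client_email`); the helper name `service_account_email` is hypothetical and is not part of gaapi4py.

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a service account JSON key file.

    Hypothetical helper for illustration; not part of gaapi4py.
    """
    with open(key_path, encoding='utf-8') as f:
        key = json.load(f)
    # Service account keys carry "type": "service_account"
    if key.get('type') != 'service_account':
        raise ValueError(f'{key_path} is not a service account key')
    return key['client_email']
```

The returned address is the one that would typically be added as a user with read access in Google Analytics.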
Quickstart
from gaapi4py import GAClient

c = GAClient('path/to/service_account.json')

request_body = {
    'view_id': '123456789',
    'start_date': '2019-01-01',
    'end_date': '2019-01-31',
    'dimensions': {
        'ga:sourceMedium',
        'ga:date'
    },
    'metrics': {
        'ga:sessions'
    },
    'filter': 'ga:sourceMedium==google / organic'  # optional filter clause
}

response = c.get_all_data(request_body)

response['info']  # sampling and "golden" metadata
response['data']  # Pandas dataframe that contains data from GA
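If you want to make multiple requests to a specific view, or to a view with a specific date range, you can set the view ID and the date range once for all subsequent requests: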
c.set_view_id('123456789')
c.set_dateranges('2019-01-01', '2019-01-31')

request_body_1 = {
    'dimensions': {
        'ga:sourceMedium',
        'ga:date'
    },
    'metrics': {
        'ga:sessions'
    }
}

request_body_2 = {
    'dimensions': {
        'ga:deviceCategory',
        'ga:date'
    },
    'metrics': {
        'ga:sessions'
    }
}

response_1 = c.get_all_data(request_body_1)
response_2 = c.get_all_data(request_body_2)
Avoiding sampling by fetching data one day at a time
Important! The Google Analytics Reporting API has a limit of 100 requests per 100 seconds. If you iterate over a long period of days, consider adding time.sleep(1) at the end of the loop to avoid reaching this limit.
import time
from datetime import date, timedelta

import pandas as pd
from gaapi4py import GAClient

c = GAClient('gaapi4py.json')
c.set_view_id('123456789')

start_date = date(2019, 7, 1)
end_date = date(2019, 7, 14)

df_list = []
iter_date = start_date
while iter_date <= end_date:
    # Query a single day per request so each request covers less data
    c.set_dateranges(iter_date, iter_date)
    response = c.get_all_data({
        'dimensions': {
            'ga:sourceMedium',
            'ga:deviceCategory'
        },
        'metrics': {
            'ga:sessions'
        }
    })
    df = response['data']
    df['date'] = iter_date  # tag the rows with the day they belong to
    df_list.append(df)
    iter_date = iter_date + timedelta(days=1)
    time.sleep(1)  # stay under 100 requests per 100 seconds

all_data = pd.concat(df_list, ignore_index=True)
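The day-by-day loop and the pause between iterations can also be factored into two small generators, which keeps the request code itself short. This is only a sketch: `daily_dates` and `paced` are hypothetical helpers, not part of gaapi4py, and the one-second default assumes that each iteration sends roughly one request.

```python
import time
from datetime import date, timedelta


def daily_dates(start_date, end_date):
    """Yield every date from start_date to end_date, inclusive."""
    current = start_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)


def paced(items, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Yield items so that consecutive items start at least
    min_interval seconds apart (hypothetical helper)."""
    last = None
    for item in items:
        if last is not None:
            # Time already spent in the loop body counts toward the interval
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

With these in place, the loop could read `for day in paced(daily_dates(start_date, end_date)):` followed by the `set_dateranges` and `get_all_data` calls. Unlike a fixed sleep, the pause shrinks when a request itself already took part of the interval.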
Avoiding the "7 dimensions" limit
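If you store the session ID and/or hit ID as custom dimensions (example implementation on Simo Ahava's blog), you can work around the limit on the number of dimensions and metrics in a single report: split the fields across several requests that all include the ID dimension, then join the results on that ID. If sampling starts to appear, try breaking the set of dimensions into smaller parts and running the queries on those. An example follows: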
# Continues from the previous examples: `c` and `date` are already defined
one_day = date(2019, 7, 1)
c.set_dateranges(one_day, one_day)

# Example indexes of the custom dimensions holding the session ID and hit ID
SESSION_ID_CD_INDEX = '2'
HIT_ID_CD_INDEX = '5'

session_id = 'dimension' + SESSION_ID_CD_INDEX
hit_id = 'dimension' + HIT_ID_CD_INDEX

# Get session scope data
response_1 = c.get_all_data({
    'dimensions': {
        'ga:' + session_id,
        'ga:sourceMedium',
        'ga:campaign',
        'ga:keyword',
        'ga:adContent',
        'ga:userType',
        'ga:deviceCategory'
    },
    'metrics': {
        'ga:sessions'
    }
})

response_2 = c.get_all_data({
    'dimensions': {
        'ga:' + session_id,
        'ga:landingPagePath',
        'ga:secondPagePath',
        'ga:exitPagePath',
        'ga:pageDepth',
        'ga:daysSinceLastSession',
        'ga:sessionCount'
    },
    'metrics': {
        'ga:hits',
        'ga:totalEvents',
        'ga:bounces',
        'ga:sessionDuration'
    }
})

# Join both session-level results on the session ID
all_data = response_1['data'].merge(response_2['data'], on=session_id, how='left')
all_data.rename(index=str, columns={
    session_id: 'session_id'
}, inplace=True)
all_data.head()

# Get hit scope data
hits_response_1 = c.get_all_data({
    'dimensions': {
        'ga:' + session_id,
        'ga:' + hit_id,
        'ga:pagePath',
        'ga:previousPagePath',
        'ga:dateHourMinute'
    },
    'metrics': {
        'ga:hits',
        'ga:totalEvents',
        'ga:pageviews'
    }
})

hits_response_2 = c.get_all_data({
    'dimensions': {
        'ga:' + session_id,
        'ga:' + hit_id,
        'ga:eventCategory',
        'ga:eventAction',
        'ga:eventLabel'
    },
    'metrics': {
        'ga:totalEvents'
    }
})

# Join both hit-level results on the session ID and hit ID
all_hits_data = hits_response_1['data'].merge(hits_response_2['data'],
                                              on=[session_id, hit_id],
                                              how='left')
all_hits_data.rename(index=str, columns={
    session_id: 'session_id',
    hit_id: 'hit_id'
}, inplace=True)
all_hits_data.head()
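When the list of wanted dimensions grows, the grouping done by hand above can be generated instead. The sketch below is an illustration only: `split_dimensions` is a hypothetical helper, not part of gaapi4py, and the default of 7 dimensions per request is taken from the section title rather than from the API documentation. It does not check that the metrics sent with each group are compatible with its dimensions.

```python
def split_dimensions(key_dimensions, dimensions, max_per_request=7):
    """Split dimensions into groups that each begin with the key
    dimensions and hold at most max_per_request dimensions in total.

    Hypothetical helper for illustration; not part of gaapi4py.
    """
    key_dimensions = list(key_dimensions)
    dimensions = list(dimensions)
    # Slots left in each request once the join keys are included
    room = max_per_request - len(key_dimensions)
    if room < 1:
        raise ValueError('no room left for non-key dimensions')
    return [
        key_dimensions + dimensions[i:i + room]
        for i in range(0, len(dimensions), room)
    ]
```

Each group can be converted with `set(group)` and passed as the 'dimensions' value of a request, and the resulting dataframes merged on the key columns as in the example above.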