Python Pandas: calculating the percentage of returning visitors for each category

Number of visits per year

| user id | 2013 | 2014 | 2015 | 2016 |
|---------|------|------|------|------|
| A       | 4    | 3    | 6    | 0    |
| B       | 3    | 0    | 7    | 3    |
| C       | 10   | 6    | 3    | 0    |

3 answers

User

Answer 1 · edited 2024-10-02 16:28:11

Consider a sample visits DataFrame df:

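import numpy as np
import pandas as pd

# 100 users with random visit counts between 1 and 9 for each of 5 years
df = pd.DataFrame(
    np.random.randint(1, 10, (100, 5)),
    pd.Index(['user_{}'.format(i) for i in range(1, 101)], name='user id'),
    # two-level column index: a shared title above the individual years
    [
        ['Number of visits per year'] * 5,
        [2012, 2013, 2014, 2015, 2016]
    ]
)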

df.head()

An entry of 8 represents 8 separate visits, so it should be counted 8 times. I will use repeat to do this before calling value_counts.

[The code block for this step was not preserved in the source.]
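As a minimal sketch of the weighting idea described above (this is not the answerer's original code), each user's "returned next year" flag can be repeated once per visit and then tallied with value_counts. The small `visits` frame copies the table from the question, and `visit_weighted_return_share` is a hypothetical helper name introduced only for illustration.

```python
import pandas as pd

# Small frame copied from the question's table (assumed layout: one row per user).
visits = pd.DataFrame(
    {2013: [4, 3, 10], 2014: [3, 0, 6], 2015: [6, 7, 3], 2016: [0, 3, 0]},
    index=pd.Index(["A", "B", "C"], name="user id"),
)


def visit_weighted_return_share(frame, this_year, next_year):
    """Hypothetical helper: share of this year's visits whose user came back next year."""
    # One True/False flag per user: did the user visit at all in the following year?
    returned = frame[next_year] > 0
    # Repeat each flag once per visit, so a user with 8 visits counts 8 times.
    weighted = returned.repeat(frame[this_year].to_numpy())
    # Relative frequency of True/False among the repeated flags.
    return weighted.value_counts(normalize=True)


share_2013 = visit_weighted_return_share(visits, 2013, 2014)
print(share_2013)
```

For 2013 and 2014, users A and C (4 and 10 visits) came back while user B (3 visits) did not, so 14 of the 17 visits in 2013 belong to returning users.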

User

Answer 2 · edited 2024-10-02 16:28:11

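I used each visitor's index value and checked whether the value at the same index (that is, for the same Visitor_ID) was greater than 0 in the following year. The result is then added to a dictionary as True or False, which you can use for a bar chart. I also built two lists (times_returned and returned_at_all) for additional data manipulation.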

import pandas as pd

# Part 1, Building the dataframe.

df = pd.DataFrame({
                   'Visitor_ID':[1,2,3],
                   '2010'      :[4,3,10],
                   '2011'      :[3,0,6],
                   '2012'      :[6,7,3],
                   '2013'      :[0,3,0]    
                   })

df.set_index("Visitor_ID", inplace=True)

# Part 2, preparing the required variables.

def make_dictionary(max_visits):
    # One empty list per possible visit count: number_0 ... number_{max_visits - 1}.
    return {"number_{}".format(x): [] for x in range(max_visits)}

# Part 3, Figuring out if the customer returned.

def compare_yearly_visits(current_year, next_year):
    # Walk over every visitor and compare this year's visits with next year's.
    for visitor_id, how_many_visits in df[current_year].items():
        did_he_return = df.loc[visitor_id, next_year]

        if did_he_return > 0:
            # If the visitor returned, add to a bunch of formats:
            returned_at_all.append([how_many_visits, True])
            times_returned.append([how_many_visits, did_he_return])
            dictionary["number_{}".format(how_many_visits)].append(True)
        else:
            # If the visitor did not return, add to a bunch of formats:
            returned_at_all.append([how_many_visits, False])
            dictionary["number_{}".format(how_many_visits)].append(False)

# Part 4, The actual program:
highest_amount_of_visits = int(df.to_numpy().max()) + 1  # keys number_0 .. number_<max visits>
relevant_years = len(df.columns) - 1  # the last year has no following year to compare with
times_returned = []
returned_at_all = []

dictionary = make_dictionary(highest_amount_of_visits)
for column in range(relevant_years):
    this_year = df.columns[column]
    next_year = df.columns[column + 1]
    compare_yearly_visits(this_year, next_year)
    print("cumulative dictionary up to:", this_year, "\n", dictionary)
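The same year-over-year comparison can probably be expressed without explicit loops. The following is only a sketch under the assumption that "category" means the number of visits in a given year and that "returned" means at least one visit in the following year; the names `pairs` and `return_rate` are made up for this example.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame(
    {"2010": [4, 3, 10], "2011": [3, 0, 6], "2012": [6, 7, 3], "2013": [0, 3, 0]},
    index=pd.Index([1, 2, 3], name="Visitor_ID"),
)

# Every year except the last one, paired with the year that follows it.
this_year = df.iloc[:, :-1]
next_year = df.iloc[:, 1:]

# One row per (visitor, year) pair: visits in that year and whether the visitor came back.
# Note that pairs with 0 visits in the current year are kept as their own category.
pairs = pd.DataFrame({
    "visits": this_year.to_numpy().ravel(),
    "returned": (next_year.to_numpy() > 0).ravel(),
})

# The mean of a boolean column is the fraction of True values, i.e. the return rate.
return_rate = pairs.groupby("visits")["returned"].mean()
print(return_rate)
```

The resulting Series is indexed by visit count, so it can be passed to `return_rate.plot.bar()` if a bar chart is wanted.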

User

Answer 3 · edited 2024-10-02 16:28:11

Please find my solution below. As a note, I am fairly sure it can be improved.


import pandas as pd

# step 0: create data frame
df = pd.DataFrame({'2013':[4, 3, 10], '2014':[3, 0, 6], '2015':[6, 7, 3], '2016':[0, 3, 0]}, index=['A', 'B', 'C'])

# container list of per-year frequency tables to be concatenated
frames = []

# iterate through the dataframe one column at a time and determine its value_counts (frequency table)
for name, series in df.items():  # iteritems() was removed in pandas 2.0
    # rename so each frequency table stays labelled with its year
    frames.append(series.value_counts().rename(name))

# Merge frequency table for all columns into a dataframe (rows: years, columns: visit counts)
temp_df = pd.concat(frames, axis=1).transpose().fillna(0)

# Find the range of visit counts and add a zero column for every count that never occurs
cols = temp_df.columns
lowest = cols.min()
highest = cols.max()
for i in range(lowest, highest):
    if i not in cols:
        temp_df[i] = 0
temp_df = temp_df.sort_index(axis=1)

# Calculate percentage
final_df = temp_df.div(temp_df.sum(axis=1), axis=0)
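
One possible way to shorten the steps above is to let pandas build the per-year frequency table in a single pass. This is a sketch of an alternative, not part of the original answer; the names `freq` and `all_counts` are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"2013": [4, 3, 10], "2014": [3, 0, 6], "2015": [6, 7, 3], "2016": [0, 3, 0]},
    index=["A", "B", "C"],
)

# Relative frequency of each visit count within every year.
# After the transpose, rows are years and columns are visit counts.
freq = df.apply(lambda s: s.value_counts(normalize=True)).T

# Make every visit count between the smallest and largest observed value a column,
# filling counts that never occur in a given year with 0.
all_counts = range(int(freq.columns.min()), int(freq.columns.max()) + 1)
freq = freq.reindex(columns=all_counts).fillna(0)
print(freq)
```

Because normalize=True is applied per year, each row of `freq` sums to 1, which matches the division by the row totals in the last step above.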
