How do I append a list of elements to a single feature (column) of a DataFrame?

print(actors[['primaryName', 'knownForTitles']].head())

         primaryName                           knownForTitles
0     Rowan Atkinson  tt0109831,tt0118689,tt0110357,tt0274166
1        Bill Paxton  tt0112384,tt0117998,tt0264616,tt0090605
2   Juliette Binoche  tt1219827,tt0108394,tt0116209,tt0241303
3   Linda Fiorentino  tt0110308,tt0119654,tt0088680,tt0120655
4  Richard Linklater  tt0243017,tt1065073,tt2209418,tt0405296

print(movies[['tconst', 'primaryTitle']].head())

      tconst                 primaryTitle
0  tt0001604            The Fatal Wedding
1  tt0002467          Romani, the Brigand
2  tt0003037   Fantomas: The Man in Black
3  tt0003593  Across America by Motor Car
4  tt0003830       Detective Craig's Coup

def add_cast(movie_df, actor_df):
    results = movie_df.copy()
    length = len(results)
    #create an empty feature
    results['cast'] = ""
    #iterate through the movie identifiers
    for index, value in results['tconst'].iteritems():
        #create a new dataframe containing all the cast associated with the movie id
        cast = actor_df[actor_df['knownForTitles'].str.contains(value)]
        #check to see if the 'primaryName' list is empty
        if len(list(cast['primaryName'].values)) != 0:
            #set the new movie 'cast' feature equal to a list of the cast names
            results.loc[index]['cast'] = list(cast['primaryName'].values)
        #logging
        if index % 1000 == 0:
            logging.warning(f'Results location: {index} out of {length}')
        #delete cast df to free up memory
        del cast
    return results

def actors_loop(movie_df, actor_df):
    results = movie_df.copy()
    length = len(actor_df)
    #create an empty feature
    results['cast'] = ""
    #iterate through all actors
    for index, value in actor_df['knownForTitles'].iteritems():
        #skip empties
        if str(value) == r"\N":
            logging.warning(f'skipping: {index} with a value of {value}')
            continue
        #generate a list of movies that this actor has been in
        cinemetography = [x.strip() for x in value.split(',')]
        #iterate through every movie the actor has been in
        for movie in cinemetography:
            #pull out the movie info if it exists
            movie_info = results[results['tconst'] == movie]
            #continue if empty
            if len(movie_info) == 0:
                continue
            #set the cast variable equal to the actor name
            results[results['tconst'] == movie]['cast'] = (actor_df['primaryName'].loc[index])
            #delete the df to save space ?maybe
            del movie_info
        #logging
        if index % 1000 == 0:
            logging.warning(f'Results location: {index} out of {length}')
    return results

1 Answer

User

#1 · posted 2024-10-06 14:28:44

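I found the problem with the def actors_loop(movie_df, actor_df) function. The problem is that

results[results['tconst'] == movie]['cast'] = (actor_df['primaryName'].loc[index])

sets the value on a copy of the results DataFrame, so the original is never changed. It is better to use df.at[] (or df.loc[] with a mask). I originally also considered df.set_value(), but that method was deprecated in pandas 0.21 and removed in 1.0.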
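A minimal sketch of the difference, using a tiny made-up frame (the data and the name `mask` are illustrative only, not from the question). Chained indexing writes to a temporary copy, while a single `.loc` call writes to the original frame:

```python
import warnings
import pandas as pd

# Toy data, made up for illustration only
results = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002"],
    "cast": ["", ""],
})
mask = results["tconst"] == "tt0000002"

# Chained indexing: results[mask] builds a new frame, so the assignment
# lands on that temporary copy (pandas normally warns about this).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    results[mask]["cast"] = "Fred Astaire"
chained_value = results.loc[1, "cast"]  # the original frame is unchanged

# Single .loc call: rows and column are selected in one step,
# so the original frame is updated.
results.loc[mask, "cast"] = "Fred Astaire"
loc_value = results.loc[1, "cast"]
```

When the row label is already known, `results.at[label, 'cast']` does the same job for a single cell.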

I also found a faster solution: instead of looping over both DataFrames in a nested loop, iterate only once. So I created a list of tuples:

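def actor_tuples(actor_df):
    #build a flat list of (actor name, movie id) pairs
    tuples = []
    #.items() replaces .iteritems(), which was removed in pandas 2.0
    for index, value in actor_df['knownForTitles'].items():
        #split the comma-separated movie ids for this actor
        cinemetography = [x.strip() for x in value.split(',')]
        for movie in cinemetography:
            tuples.append((actor_df['primaryName'].loc[index], movie))
    return tuples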

This creates a list of tuples of the following form:

[('Fred Astaire', 'tt0043044'),
 ('Lauren Bacall', 'tt0117057')]

Then I created a dictionary that maps each movie identifier to its index position in the movies DataFrame, of the following form:

{'tt0000009': 0,
 'tt0000147': 1,
 'tt0000335': 2,
 'tt0000502': 3,
 'tt0000574': 4,
 'tt0000615': 5,
 'tt0000630': 6,
 'tt0000675': 7,
 'tt0000676': 8,
 'tt0000679': 9}
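The answer does not show how this lookup was built. One possible way, assuming the movies frame has a `tconst` column with unique ids (the helper name `movie_index_map` is hypothetical):

```python
import pandas as pd

def movie_index_map(movie_df):
    """Map each movie id in 'tconst' to its index label in movie_df."""
    # Series.items() yields (index label, value) pairs
    return {tconst: idx for idx, tconst in movie_df["tconst"].items()}

# Toy frame reusing a few ids from the listing above
movies = pd.DataFrame({"tconst": ["tt0000009", "tt0000147", "tt0000335"]})
lookup = movie_index_map(movies)
```

A dictionary gives constant-time lookups by movie id, which is what makes the single pass over the actor tuples cheap.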

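I then used the function below to iterate over the actor tuples, using the movie identifier as the key into the movie dictionary. That returns the correct movie index, which I use to append the actor's name to the target DataFrame: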

import logging

def add_cast(movie_df, Atuples, Mtuples):
    #Atuples: list of (actor name, movie id); Mtuples: dict of movie id -> index
    results_df = movie_df.copy()
    results_df['cast'] = ''
    counter = 0
    total = len(Atuples)

    for tup in Atuples:
        #look up the movie ID in the movie dict to get its index
        try:
            movie_index = Mtuples[tup[1]]
            if results_df.at[movie_index, 'cast'] == '':
                results_df.at[movie_index, 'cast'] += tup[0]
            else:
                results_df.at[movie_index, 'cast'] += ',' + tup[0]
        except KeyError:
            #the actor's movie is not in movie_df, so skip it
            pass

        #logging
        counter += 1
        if counter % 1000000 == 0:
            logging.warning(f'Index {counter} out of {total}, {counter/total:.1%} finished')

    return results_df

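It ran through 16.5 million actor tuples in 10 minutes (this covers generating both tuple structures and then running the add function). The result looks like this: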

0  tt0000009                     Miss Jerry      1894                 Romance   
1  tt0000147  The Corbett-Fitzsimmons Fight      1897  Documentary,News,Sport   
2  tt0000335          Soldiers of the Cross      1900         Biography,Drama   
3  tt0000502                       Bohemios      1905                      \N   
4  tt0000574    The Story of the Kelly Gang      1906   Biography,Crime,Drama   

                                                cast  
0  Blanche Bayliss,Alexander Black,William Courte...  
1  Bob Fitzsimmons,Enoch J. Rector,John L. Sulliv...  
2  Herbert Booth,Joseph Perry,Orrie Perry,Reg Per...  
3  Antonio del Pozo,El Mochuelo,Guillermo Perrín,...  
4  Bella Cola,Sam Crewes,W.A. Gibson,Millard John...
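For comparison, the same result can probably be had without an explicit Python loop. The sketch below is not the approach used above; it swaps in `str.split`, `DataFrame.explode` (available from pandas 0.25) and `groupby`. The function name `add_cast_vectorized` and the toy data are made up for illustration, and it assumes `primaryName` has no missing values:

```python
import pandas as pd

def add_cast_vectorized(movie_df, actor_df):
    """Attach a comma-separated 'cast' column without an explicit loop."""
    pairs = actor_df[["primaryName", "knownForTitles"]].copy()
    # one list of ids per actor, then one row per (actor, movie id)
    pairs["knownForTitles"] = pairs["knownForTitles"].str.split(",")
    pairs = pairs.explode("knownForTitles")
    pairs["knownForTitles"] = pairs["knownForTitles"].str.strip()
    # join the names of everyone who shares a movie id
    cast = pairs.groupby("knownForTitles")["primaryName"].agg(",".join)
    out = movie_df.copy()
    # look up each movie id; movies with no known actors get ''
    out["cast"] = out["tconst"].map(cast).fillna("")
    return out

# Toy data, made up for illustration only
actors = pd.DataFrame({
    "primaryName": ["Fred Astaire", "Lauren Bacall", "Made-up Actor"],
    "knownForTitles": ["tt0043044,tt0000009", "tt0117057", "tt0000009"],
})
movies = pd.DataFrame({"tconst": ["tt0000009", "tt0117057", "tt0000147"]})
result = add_cast_vectorized(movies, actors)
```

I have not timed this against the loop on the full dataset, so treat any speed-up as something to measure rather than assume.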

Thank you
