I have a DataFrame that outputs the table below. Note that "Search term" is the index.
Search term                 Impressions  Clicks  Cost     Converted clicks
american brewing            286446       104862  8034.18  6831
american brewing supplies   165235       64764   3916.48  4106
brewing supplies            123598       8131    6941.87  278
wine bottles                272969       7438    4944.7   194
www americanbrewing com     2782         1163    227.17   120
home brewing                216138       3744    3468.24  110
wine making                 147985       6602    5024.54  108
If "Search term" (the index) contains 'american brewing' or 'americanbrewing', I want to apply the label 'Brand'; otherwise I want to apply 'Non-brand', in a column titled Label.
I have seen many examples like this on StackOverflow:
df['Label'] = df[df['SomeColumn'].str.contains('american brewing|americanbrewing')]
But this doesn't work because my 'SomeColumn' is df.index, and when I try something similar:
df['Label'] = df[df.index.str.contains('american brewing|americanbrewing')]
I get the error AttributeError: 'Index' object has no attribute 'str'
I have also seen examples using np.where, which looks promising, but I still run into the same problem because 'Search term' is not a column but the index:
df['Label'] = np.where(df['Search term'].str.contains('american brewing|americanbrewing'), 'Brand', 'Non-brand')
Here is my full code:
import pandas as pd
import numpy as np
brand_terms = ['american brewing', 'americanbrewing']
data = pd.read_csv(r'sqr.csv', encoding='cp1252')
df = pd.DataFrame(data)
df['Search term'] = df['Search term'].replace(r'[^\w&\' ]', '', regex=True)
df['Cost'] = df['Cost'].replace(r'[^\d\.]', '', regex=True).astype('float')
#print(df.dtypes)
grouped = df.groupby('Search term')
result = grouped[['Impressions', 'Clicks', 'Cost', 'Converted clicks']].sum()
result = result.sort(['Converted clicks','Cost'], ascending=False)
#This doesn't work
result['Label'] = result.where(result['Search term'].str.contains('|'.join(brand_terms), 'Brand', 'Non-brand'))
result.to_csv('sqr_aggregate.csv')
How can I output the Label column based on whether Search term (the index) contains any of several possible string values? Where True, apply Brand; otherwise, apply Non-brand to the Label column.
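If you don't want to reset the index, here is one approach.
You can convert the index to a Series and apply the transformation.
Alternatively, try changing your code to use: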
df.groupby('Search term', as_index = False)
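
Both suggestions above can be sketched as follows. This is a minimal illustration, not the answerers' exact code: the small stand-in DataFrames, the `pattern` variable, and the use of `np.where` to pick the label are assumptions chosen to mirror the question.

```python
import numpy as np
import pandas as pd

brand_terms = ['american brewing', 'americanbrewing']
pattern = '|'.join(brand_terms)  # regex alternation: matches either term

# Hypothetical stand-in for the aggregated result, with 'Search term' as the index.
result = pd.DataFrame(
    {'Clicks': [104862, 8131, 1163, 3744]},
    index=pd.Index(
        ['american brewing', 'brewing supplies',
         'www americanbrewing com', 'home brewing'],
        name='Search term'),
)

# Approach 1: keep the index, but convert it to a Series so the
# .str accessor is available, then map True/False to the two labels.
is_brand = result.index.to_series().str.contains(pattern)
result['Label'] = np.where(is_brand, 'Brand', 'Non-brand')

# Approach 2: group with as_index=False so 'Search term' stays a regular
# column, which makes the usual column-based .str.contains work.
raw = pd.DataFrame({
    'Search term': ['home brewing', 'american brewing', 'home brewing'],
    'Clicks': [1, 2, 3],
})
flat = raw.groupby('Search term', as_index=False)[['Clicks']].sum()
flat['Label'] = np.where(
    flat['Search term'].str.contains(pattern), 'Brand', 'Non-brand')

print(result)
print(flat)
```

The first approach leaves 'Search term' as the index, so the CSV written by to_csv keeps the same layout as before. The second turns it into an ordinary column, in which case to_csv would likely need index=False to avoid writing the default integer index.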