Is there a better or alternative way than applying a function on a large pandas DataFrame?

Posted 2024-10-01 00:25:23


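I have clickstream data. I am using the URL column to detect special events. For example, if the URL contains the keyword Dealer, a new column "Is Dealer" is created that holds a boolean value.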

**df example:**

Dictionary: I have a dictionary where each key is a domain and each value is a list of keywords (the keywords must be checked against the URL).

brand_dict = {'volkswagen': ['haendlersuche'], 'mercedes-benz': ['dealer-locator'], 'skoda-auto': ['dealers']}

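I need to check two conditions across columns first: if the Domain column equals, for example, "BMW" and the URL contains any keyword from that domain's list in the dictionary, then the new column gets a boolean value.

The problem is that I have to create 3 columns from 3 dictionaries. Is there a better way to do this?

So far, I am doing this: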

# dict_config, dict_dealer and dict_brand_keywords are module-level
# dictionaries mapping a domain to a list of keywords.
def conv_attribution(domain, url):
    """Return [config match, dealer match, matched brand keyword or "Nan"]."""
    list_output = []

    if domain in dict_config:
        # True if any configured keyword for this domain occurs in the URL
        list_output.append(any(k in url for k in dict_config[domain]))

        # .get() avoids a KeyError when the domain is missing from dict_dealer
        list_output.append(any(k in url for k in dict_dealer.get(domain, [])))

        # Keep the keyword that actually matched; the original appended the
        # loop variable, which is always the last keyword in the list
        matched = next(
            (k for k in dict_brand_keywords.get(domain, []) if k in url), "Nan"
        )
        list_output.append(matched)

    return list_output

Please help.

Desired output

The desired output looks like this, but in modelname I want to add the model name extracted from the URL.

[Image: desired output]
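The model-name part of the requirement could be sketched roughly as follows. The keyword lists and sample URLs are hypothetical, since the question does not show the real ones. The helper name `extract_model` is also made up for illustration. It returns the first configured keyword that appears in the URL, or `None` when nothing matches.

```python
import pandas as pd

# Hypothetical model keywords per domain (not taken from the question)
dict_brand_keywords = {"bmw": ["x5", "i3"], "fiat": ["panda", "500"]}


def extract_model(domain, url):
    """Return the first configured keyword found in the URL, or None."""
    for keyword in dict_brand_keywords.get(domain, []):
        if keyword in url.lower():
            return keyword
    return None


# Hypothetical sample rows
df = pd.DataFrame({
    "domain": ["bmw", "fiat", "bmw"],
    "url": [
        "https://bmw.com/models/x5",
        "https://fiat.com/panda/offers",
        "https://bmw.com/dealer",
    ],
})

# Pair each domain with its URL and look up the model keyword
df["modelname"] = [extract_model(d, u) for d, u in zip(df["domain"], df["url"])]
print(df)
```

Returning the matched keyword itself, rather than a flag, is what lets the same lookup fill a text column such as modelname.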


1 answer
User
#1 · Posted 2024-10-01 00:25:23

Here is a minimal example

import pandas as pd
domains = ['bmw','smart','smart','fiat','bmw']
urls = ['https://bmw.com/hello','https://smart.com/world','https://smart.com/hello','https://fiat.com/hello','https://bmw.com/hello']
df = pd.DataFrame({'domain':domains,'urls':urls})
# your config dict
brand_dict = {'bmw': ['hello'], 'smart': ['world'],'fiat':['hello']} 

Sample df

    domain  urls
0   bmw     https://bmw.com/hello
1   smart   https://smart.com/world
2   smart   https://smart.com/hello
3   fiat    https://fiat.com/hello
4   bmw     https://bmw.com/hello

Create the new columns

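# Row-wise check: does the URL contain any keyword configured for the row's domain?
# .get() returns an empty list (so False) for domains missing from brand_dict.
df['col_1'] = df.apply(lambda x: any(substring in x.urls for substring in brand_dict.get(x.domain, [])), axis=1)
# col_2 reuses brand_dict here only for illustration; substitute a second dictionary as needed.
df['col_2'] = df.apply(lambda x: any(substring in x.urls for substring in brand_dict.get(x.domain, [])), axis=1)
df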

New df

    domain  urls                     col_1  col_2
0   bmw     https://bmw.com/hello    True   True
1   smart   https://smart.com/world  True   True
2   smart   https://smart.com/hello  False  False
3   fiat    https://fiat.com/hello   True   True
4   bmw     https://bmw.com/hello    True   True
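To extend this to several dictionaries without repeating the line for each column, one option is to map each output column name to its own keyword dictionary and loop over that mapping. The sketch below is only an illustration. The column names `is_config` and `is_dealer`, the second dictionary's contents, and the helper `has_keyword` are assumptions, not part of the question. It uses a list comprehension over `zip` instead of `df.apply(..., axis=1)`, which is usually faster on large frames because it avoids building a Series for every row.

```python
import pandas as pd

df = pd.DataFrame({
    "domain": ["bmw", "smart", "smart", "fiat", "bmw"],
    "urls": [
        "https://bmw.com/hello",
        "https://smart.com/world",
        "https://smart.com/hello",
        "https://fiat.com/hello",
        "https://bmw.com/hello",
    ],
})

# Hypothetical: one keyword dictionary per output column
column_dicts = {
    "is_config": {"bmw": ["hello"], "smart": ["world"], "fiat": ["hello"]},
    "is_dealer": {"bmw": ["dealer"], "smart": ["hello"]},
}


def has_keyword(domain, url, keyword_dict):
    """True if the URL contains any keyword listed for the domain."""
    # Domains missing from the dictionary simply yield False
    return any(keyword in url for keyword in keyword_dict.get(domain, []))


# Build one boolean column per dictionary
for column, keyword_dict in column_dicts.items():
    df[column] = [
        has_keyword(d, u, keyword_dict) for d, u in zip(df["domain"], df["urls"])
    ]

print(df)
```

Adding a further check then only requires a new entry in `column_dicts`.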
