<p>Your data frame is still a bit hard to read, but consider the following example:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd

# Two URLs that carry a click id (gclid / click_id) plus one row with no match
df = pd.DataFrame({
    '_source.request_url': [
        'https://google.com/au/?gclid=CjwKCAiAlO7uBRANEiwA_vXQ 5YOAD-mFNQFuM0dbd7lHsRBZSfOvhQynhZMhNHkEX-m7gosL23ABoCyS4QAvD_BwE',
        'https://google.com/au/?click_id=CjwKCAiAlO7uBRANEiwA_vXQ 5YOAD-mFNQFuM0dbd7lHsRBZSfOvhQynhZMhNHkEX-m7gosL23ABoCyS4QAvD_BwE',
        'no match example'],
    '_source.cookie': [
        '__cfduid=d118f225fac35345d9e1d87e533b596ec1574680126; gclid=EAIaIQobChMIhNSMxZyF5gIVjMjeCh3V2A-pEAAYASABEgJQBPD_BwE;',
        '__cfduid=d118f225fac35345d9e1d87e533b596ec1574680126; gclid=EAIaIQobChMIhNSMxZyF5gIVjMjeCh3V2A-pEAAYASABEgJQBPD_BwE;',
        None]})
</code></pre>
<p>To extract the string between <code>=</code> and <code>;</code>, you can use the regex pattern <code>r'=(.+?);'</code>.</p>
<pre class="lang-py prettyprint-override"><code>import re

def get_glid_from_source(pattern, data):
    # str() turns None into the text 'None', so re.search always receives a string
    result = re.search(pattern, str(data))
    if result is not None:
        return result.group(1)
    return None

# (?:gclid|click_id) is a non-capturing alternation; square brackets would define a character class instead
df['glid_from_url'] = df.apply(lambda x: get_glid_from_source(r'(?:gclid|click_id)=(.+?)$', x['_source.request_url']), axis=1)
df['gclid_from_cookie'] = df.apply(lambda x: get_glid_from_source(r'gclid=(.+?)[;%&]', x['_source.cookie']), axis=1)
</code></pre>
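<p>If there is no match in the data, <code>re.search</code> returns <code>None</code>, so you have to catch that case with <code>if result is not None</code> before calling <code>group</code>.</p>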
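<p>A note on matching either <code>gclid</code> or <code>click_id</code> in the URL: alternation between whole words needs a group such as <code>(?:gclid|click_id)</code>, whereas square brackets define a character class that matches a single character. The sketch below illustrates the difference; the URL and its <code>xid</code> parameter are made up for the demonstration.</p>

```python
import re

# Hypothetical URL with an unrelated parameter whose name happens to end in "d"
url = "https://example.com/?xid=abc123"

# Character class: matches ONE character from the set g, c, l, i, d, |, k, _
# followed by "=", so the "d" of "xid" is enough to produce a match.
char_class_match = re.search(r"[gclid|click_id]=(.+?)$", url)

# Non-capturing group: requires the whole word "gclid" or "click_id" before "=".
group_match = re.search(r"(?:gclid|click_id)=(.+?)$", url)

print(char_class_match.group(1))  # abc123 (a false positive)
print(group_match)                # None

# The group form still matches the parameters it is meant to match.
wanted = re.search(r"(?:gclid|click_id)=(.+?)$", "https://example.com/?click_id=abc123")
print(wanted.group(1))            # abc123
```

<p>On the sample data both forms give the same result, because every parameter name there ends in a character that belongs to the class; the group form only matters once other parameters show up.</p>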
<p>The resulting data frame is:</p>
<pre><code>                                 _source.request_url                                     _source.cookie                                      glid_from_url                                  gclid_from_cookie
0  https://google.com/au/?gclid=CjwKCAiAlO7uBRANE...  __cfduid=d118f225fac35345d9e1d87e533b596ec1574...   CjwKCAiAlO7uBRANEiwA_vXQ 5YOAD-mFNQFuM0dbd7lH...  EAIaIQobChMIhNSMxZyF5gIVjMjeCh3V2A-pEAAYASABEg...
1  https://google.com/au/?click_id=CjwKCAiAlO7uBR...  __cfduid=d118f225fac35345d9e1d87e533b596ec1574...   CjwKCAiAlO7uBRANEiwA_vXQ 5YOAD-mFNQFuM0dbd7lH...  EAIaIQobChMIhNSMxZyF5gIVjMjeCh3V2A-pEAAYASABEg...
2                                   no match example                                               None                                               None                                               None
</code></pre>
<p>This example works if there is only one match per value. If a value can contain several matches and you want to capture all of them, use <code>re.findall(pattern, data)</code> instead.</p>
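<p>As a minimal sketch of the difference between the two calls, assuming a made-up cookie string that contains two <code>gclid</code> entries:</p>

```python
import re

# Made-up cookie string with two gclid entries (illustration only)
cookie = "gclid=first111; other=zzz; gclid=second222;"
pattern = r"gclid=(.+?)[;%&]"

# re.search stops at the first match
first = re.search(pattern, cookie).group(1)

# re.findall returns every match; with one capture group it yields a list of strings
all_ids = re.findall(pattern, cookie)

# With no match, findall returns an empty list rather than None
none_found = re.findall(pattern, "no match example")

print(first)       # first111
print(all_ids)     # ['first111', 'second222']
print(none_found)  # []
```

<p>Because <code>re.findall</code> returns an empty list when nothing matches, no <code>None</code> check is needed, but the resulting column then holds lists rather than strings.</p>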