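<p>I can't get a regular expression function (namely <code>REGEXP_EXTRACT</code>) to work in a query run through the <code>read_gbq</code> function.</p>
<p><code>read_gbq</code> comes from the <code>pandas_gbq</code> module.</p>
<p>The import statement in my Python program is: <code>from pandas_gbq import read_gbq</code>.</p>
<p>The pandas-gbq version in my environment is 0.8.0.</p>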
<p>I believe the regular expression fails because the backslash escape characters in front of the double quotes are not being recognized.</p>
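<p>You can test this in plain Python without BigQuery. The snippet copies the <code>REGEXP_EXTRACT</code> fragment from the query below into an ordinary (non-raw) Python string. In that kind of string, <code>\"</code> is an escape sequence for a bare <code>"</code>, so Python strips the backslashes before the query is ever sent. The printed text also shows that the pattern is not enclosed in SQL quotes at all. BigQuery would therefore read <code>customerOrderId</code> as an identifier followed by the string literal <code>":"</code>, which is consistent with the syntax error reported below. For comparison, the snippet also shows a raw string (<code>r'...'</code>), which keeps the backslashes. This is a sketch that only inspects strings and does not contact BigQuery.</p>

```python
# Fragment copied from the question's query, in an ordinary Python string literal.
# Python turns each \" into a plain " before the text reaches BigQuery.
cooked = 'REGEXP_EXTRACT(co.jsonPayload.response, customerOrderId\":\"([^\"]*)\".*) as CustomerOrderID'

# The same text as a raw string literal keeps every backslash exactly as typed.
raw = r'REGEXP_EXTRACT(co.jsonPayload.response, customerOrderId\":\"([^\"]*)\".*) as CustomerOrderID'

print(cooked)
# REGEXP_EXTRACT(co.jsonPayload.response, customerOrderId":"([^"]*)".*) as CustomerOrderID

# Number of backslashes that survive in each version.
print(cooked.count('\\'), raw.count('\\'))
# 0 4
```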
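<p>The same regular expression works fine in BigQuery itself and in an online regex tester using the Python flavor (see the second code section below).</p>
<p>Thank you for your time and attention.</p>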
<pre><code>from pandas_gbq import read_gbq

component = 'CO_ORDER_SUMMARY'

def Read_CO_Order_Summary():
    # projectid (the GCP project ID) is defined elsewhere in the program (not shown).
    query = ('select co.timestamp, co.jsonPayload._userid_ as co_SVOC, co.jsonPayload.response, \
        REGEXP_EXTRACT(co.jsonPayload.response, customerOrderId\":\"([^\"]*)\".*) as CustomerOrderID \
        from `exported_logs_v2.mcc_checkout_service_servicelog_20190623` co '
             'where co.jsonPayload.component = ' '"' + component + '"'
             'order by co.timestamp, co.jsonPayload._userid_ '
             'limit 1'
             )
    co_agg = read_gbq(query, projectid, dialect='standard')
    return co_agg

co_agg = Read_CO_Order_Summary()

# ERROR MESSAGE:
# GenericGBQException: Reason: 400 Syntax error: Expected ")" but got string literal ":" at [1:253]
</code></pre>
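<p>Here is a minimal sketch of one possible fix. It assumes the same table, columns, and pattern as the query above and changes two things. First, it encloses the pattern in a BigQuery string literal; in the query above it is not quoted at all. Second, it delimits that literal with single quotes (<code>r'...'</code>). The pattern contains only double quotes, so it then needs no backslash escapes at all, and Python's escape processing no longer matters. The helper name <code>build_co_order_summary_query</code> is hypothetical. The sketch only builds the SQL text and does not call BigQuery.</p>

```python
def build_co_order_summary_query(component: str) -> str:
    """Build the CO_ORDER_SUMMARY query text (hypothetical helper, not part of pandas-gbq)."""
    # The regex is written as a BigQuery raw string literal r'...'. Because the
    # pattern contains only double quotes, single-quote delimiters mean no
    # backslash escapes are needed, so nothing is lost to Python's escape handling.
    # The pattern contains no { or } characters, so the f-string only fills in
    # {component}.
    return f"""
SELECT co.timestamp,
       co.jsonPayload._userid_ AS co_SVOC,
       co.jsonPayload.response,
       REGEXP_EXTRACT(co.jsonPayload.response, r'customerOrderId":"([^"]*)".*') AS CustomerOrderID
FROM `exported_logs_v2.mcc_checkout_service_servicelog_20190623` co
WHERE co.jsonPayload.component = '{component}'
ORDER BY co.timestamp, co.jsonPayload._userid_
LIMIT 1
"""


query = build_co_order_summary_query('CO_ORDER_SUMMARY')
print(query)
```

<p>You would pass the resulting string to <code>read_gbq</code> the same way as in the question, e.g. <code>read_gbq(build_co_order_summary_query(component), projectid, dialect='standard')</code>. The sketch interpolates <code>component</code> directly. That is reasonable here only because <code>component</code> is a fixed constant, not user input.</p>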
<pre><code>****************************************
IN REGEX101.com TESTER (using the Python "flavor" setting)
REGEX
customerOrderId\":\"([^\"]*)\".*
STRING
{"lastModifiedDate":"2019-06-23 16:50:18.212","localStoreId":1515,"cartId":"HC100006597310","customerOrderId":"W838207358","
RESULT
Match 1
Full match customerOrderId":"W838207358","
Group 1. W838207358
******************* BigQuery ******************
SELECT timestamp, jsonpayload._userid_, jsonpayload.response,
-- REGEXP_EXTRACT(jsonPayload.response, r'\"customerOrderId\":\"(.*?)\","') as CustomerOrderID, ## All 3 of these work
-- REGEXP_EXTRACT(jsonPayload.response, r"customerOrderId\":\"(.*?)\",") as CustomerOrderID
REGEXP_EXTRACT(jsonPayload.response, r"customerOrderId\":\"([^\"]*)\".*") as CustomerOrderID
FROM `exported_logs_v2.mcc_checkout_service_servicelog_201906*`
where jsonpayload.component like '%CO_ORDER_SUMMARY%' ##'%CO_ORDER_SUMMARY%' or '%CO_SECURE_LOGON%'
and ( _TABLE_SUFFIX between "23" and "23" )
and jsonPayload._userid_ = "0516CFC3D4B001FB0S"
order by timestamp asc
</code></pre>
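<p>You can reproduce the regex101 result locally with Python's <code>re</code> module. The sketch uses the sample string exactly as shown above (it is truncated in the post). The three patterns are the ones from the BigQuery query above, written as Python raw strings so that the backslashes are kept. Note that BigQuery's <code>REGEXP_EXTRACT</code> uses the RE2 engine rather than Python's <code>re</code>. The two agree on simple patterns like these, but a match in Python is not a guarantee of identical behavior in BigQuery.</p>

```python
import re

# Sample response text exactly as shown in the regex101 test (truncated in the post).
sample = '{"lastModifiedDate":"2019-06-23 16:50:18.212","localStoreId":1515,"cartId":"HC100006597310","customerOrderId":"W838207358","'

# The three patterns from the BigQuery query, as Python raw strings.
patterns = [
    r'\"customerOrderId\":\"(.*?)\","',
    r"customerOrderId\":\"(.*?)\",",
    r"customerOrderId\":\"([^\"]*)\".*",
]

# group(1) is the capture group holding the order ID.
order_ids = [re.search(p, sample).group(1) for p in patterns]
print(order_ids)
# ['W838207358', 'W838207358', 'W838207358']

# Full match of the third pattern, for comparison with the regex101 "Full match" line.
full_match = re.search(patterns[2], sample).group(0)
print(full_match)
# customerOrderId":"W838207358","
```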