How to get a string out of a BS4 scrape

Posted 2024-10-01 19:31:36


I recently scraped a text/javascript script that contains the following code:

var spConfigDisabledProducts = [-1
        , '294653', '294655', '294656', '294657', '294658', '294659', '294660', '294661', '294662', '294663', '294664', '294666', '294667', '294668', '294669', '294670', '294671', '294672', '294673'        ];
        {"attributes":{"959":{"id":"959","code":"aw_taglia","label":"Taglia","options":[{"id":"1717","label":"15","price":"0","oldPrice":"0"...

I only want the numbers in var spConfigDisabledProducts, excluding the -1, so I tried the following:

js = soup.find_all('script')[25].text.replace(',}', '}').replace(',]', ']').strip()

js = json.dumps(js)
obj = json.loads(js)

data_oos = obj.split('var spConfigDisabledProducts = [-1,')
data_oos = data_oos[1].split("];")

But it returns the entire JavaScript, not just the contents of var spConfigDisabledProducts.

How can I solve this? Thanks in advance.


1 answer
User
#1 · Posted 2024-10-01 19:31:36

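You can use a regular expression to pull out the string representation of the list, convert it into an actual list, and then slice off the first element: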

import ast
import re

s = '''var spConfigDisabledProducts = [-1
        , '294653', '294655', '294656', '294657', '294658', '294659', '294660', '294661', '294662', '294663', '294664', '294666', '294667', '294668', '294669', '294670', '294671', '294672', '294673'        ];
        {"attributes":{"959":{"id":"959","code":"aw_taglia","label":"Taglia","options":[{"id":"1717","label":"15","price":"0","oldPrice":"0"'''

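# Lazily capture the bracketed list that follows the variable name.
p = re.compile(r'spConfigDisabledProducts = (\[[\s\S]*?\])')
# Drop newlines and runs of whitespace, then parse the list literal safely.
data = ast.literal_eval(p.findall(re.sub(r'\n|\s{2,}', '', s))[0])
print(data[1:])  # slice off the leading -1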
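
As a follow-up, here is a minimal sketch of the same idea wrapped in a reusable function, which may be handier when the text comes straight from a script tag. The names `disabled_product_ids` and `LIST_RE` are hypothetical, and the sample text is shortened for illustration. It assumes the array is also a valid Python literal, which holds for the snippet in the question. The strings are single-quoted, so it is not valid JSON, hence `ast.literal_eval` rather than `json.loads`. The sketch also shows why the attempt in the question still held the whole script: `json.dumps` followed by `json.loads` just hands back the same string.

```python
import ast
import json
import re

# Hypothetical helper; the pattern mirrors the one used above, but tolerates
# variable whitespace around the "=" sign.
LIST_RE = re.compile(r'spConfigDisabledProducts\s*=\s*(\[[\s\S]*?\])')


def disabled_product_ids(script_text):
    """Return the IDs in spConfigDisabledProducts, without the -1 sentinel."""
    match = LIST_RE.search(script_text)
    if match is None:
        return []  # the variable is not present in this script
    # literal_eval accepts newlines inside the brackets, so no cleanup is needed.
    values = ast.literal_eval(match.group(1))
    return [v for v in values if v != -1]


# Shortened sample text, for illustration only.
script_text = """var spConfigDisabledProducts = [-1
        , '294653', '294655'        ];
        {"attributes":{"959":{"id":"959"}}}"""

# dumps followed by loads only round-trips the string unchanged.
assert json.loads(json.dumps(script_text)) == script_text

ids = disabled_product_ids(script_text)
print(ids)  # ['294653', '294655']
```

In the question's code you would pass in the `.text` of the script tag. Note that picking the tag by a fixed index such as `[25]` is fragile if the page layout changes, so searching every script's text for the variable name may be more robust.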
