Regular expression to extract data from a script tag with Python

Posted 2024-06-29 01:03:36


I wrote the following code in Python:

import sys, os, requests, datetime, time
from bs4 import BeautifulSoup
import urllib.request
import re
import json

def get_html(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
    r = requests.get(url, headers=headers)
    return r.content

link = 'https://www.clubx.com.au/products/womanizer-pro?variant=37834367948'
soup = BeautifulSoup(get_html(link), 'html.parser')
obj = soup.find_all('script')[18]
m = re.search(r"\"variants\":\[(.*?)\]", obj.string)
if m:
    data = json.loads(m.group(1))
    print(data)

It fails with:

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 495 (char 494)

If I print obj.string, the result is:

var BOLD = BOLD || {}; BOLD.products = BOLD.products || {}; BOLD.variant_lookup = BOLD.variant_lookup || {};BOLD.variant_lookup[37834367948] ="womanizer-pro";BOLD.variant_lookup[37834368076] ="womanizer-pro";BOLD.variant_lookup[37834368140] ="womanizer-pro";BOLD.variant_lookup[37834368204] ="womanizer-pro";BOLD.variant_lookup[37834368268] ="womanizer-pro";BOLD.products["womanizer-pro"] ={"id":10346996748,"title":"Womanizer Pro","handle":"womanizer-pro","description":"\u003cdiv style=\"text-align: center;\"\u003e\u003cstrong\u003eThis is a web only offer and excludes all retail stores. \u003c/strong\u003e\u003c/div\u003e\n\u003cdiv style=\"text-align: center;\"\u003e\u003cstrong\u003eLimited Time Only\u003c/strong\u003e\u003c/div\u003e\n\u003cp\u003e \u003c/p\u003e\n\u003cdiv style=\"text-align: center;\"\u003e\u003cstrong\u003e\u003c/strong\u003e\u003c/div\u003e\n\u003cp\u003eThe world's most advanced clitoral stimulator - The Womanizer® is in the big world surely only a small novelty in the world of sex toys and erotic products the Womanizer® is a real revolution.\u003c/p\u003e\n\u003cp\u003eThe revolutionary Womanizer® technology makes it possible to stimulate the clitoris without contact for the first time. An unprecedented experience of lust, crowned with powerful orgasms, can be experienced.\u003c/p\u003e\n\u003cp\u003eAn over-stimulation of the clitoris becomes a thing of the past. Orgasms in a previously unexperienced manner and strength - any number of times. \u003cbr\u003e\u003cbr\u003eMany women experience an absolutely new orgasm feeling. 
The womanizer® technology and mode of operation has nothing to do with a vibrator, the gentlest and yet most powerful stimulator on the world market.\u003cbr\u003e\u003cbr\u003eOur delivery includes the womanizer®, a pouch, 1 spare head, 1 USB charging cable and operating manual.\u003c/p\u003e\n\u003cp\u003e\u003cspan\u003eSee \u003c/span\u003e\u003ca href=\"https://www.clubx.com.au/blogs/product-reviews/is-the-womanizer-right-for-you\" target=\"_blank\" title=\"Womanizer Product Review\" rel=\"noopener noreferrer\"\u003eour blog post\u003c/a\u003e\u003cspan\u003e to get more information on the Womanizer range.\u003c/span\u003e\u003cbr\u003e\u003cbr\u003eFeatures\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eNon-contact stimulation of the clitoris\u003c/li\u003e\n\u003cli\u003ePrecisely adjustable control of the intensity (8 levels)\u003c/li\u003e\n\u003cli\u003eRechargeable lithium-ion battery\u003c/li\u003e\n\u003cli\u003eTop hygiene treatment by removable head (medical silicone)\u003c/li\u003e\n\u003cli\u003eXL stimulation head for large clitoris\u003c/li\u003e\n\u003cli\u003eGerman top design\u003c/li\u003e\n\u003cli\u003eSplashproof\u003c/li\u003e\n\u003cli\u003eNo habituation effect by new Pleasure Air Technology\u003c/li\u003e\n\u003cli\u003eMade with Swarovski elements\u003c/li\u003e\n\u003cli\u003eMaterial: ABS, phthalate free\u003c/li\u003e\n\u003cli\u003eMaterial stimulation head: hypoallergenic medical silicone\u003c/li\u003e\n\u003cli\u003eAdvanced rechargeable lithium-ion battery \u003c/li\u003e\n\u003cli\u003e1-year warranty\u003c/li\u003e\n\u003c/ul\u003e","published_at":"2017-04-18T12:47:49","created_at":"2017-04-18T13:19:07","vendor":"Womanizer","type":"Vibrators","tags":["color-lavender","color-leopard","color-magenta","color-mint","color-rose","color-tattoo","color-white","Deluxe","Discounted","For 
Her","Stimulators","vibrators"],"price":24999,"price_min":24999,"price_max":24999,"price_varies":false,"compare_at_price":30999,"compare_at_price_min":30999,"compare_at_price_max":30999,"compare_at_price_varies":false,"all_variant_ids":[37834367948,37834368076,37834368140,37834368204,37834368268],"variants":[{"id":37834367948,"title":"Black","option1":"Black","option2":null,"option3":null,"sku":"1725205212","requires_shipping":true,"taxable":true,"featured_image":{"id":2584573542512,"product_id":10346996748,"position":13,"created_at":"2018-05-07T14:01:05+10:00","updated_at":"2018-06-27T16:41:12+10:00","alt":"Womanizer Pro Black - Club X","width":700,"height":700,"src":"https://cdn.shopify.com/s/files/1/0682/0289/products/Womanizer-Pro-Black.png?v=1530081672","variant_ids":[37834367948]},"available":true,"name":"Womanizer Pro - Black","public_title":"Black","options":["Black"],"price":24999,"weight":0,"compare_at_price":30999,"inventory_quantity":2,"inventory_management":"shopify","inventory_policy":"deny","barcode":"703255205212"},{"id":37834368076,"title":"Magenta","option1":"Magenta","option2":null,"option3":null,"sku":"1725205229","requires_shipping":true,"taxable":true,"featured_image":{"id":1186546876441,"product_id":10346996748,"position":11,"created_at":"2018-03-13T21:31:40+11:00","updated_at":"2018-05-22T12:06:24+10:00","alt":"Womanizer Pro Magenta - Club X","width":600,"height":600,"src":"https://cdn.shopify.com/s/files/1/0682/0289/products/246979818968d47fc7413ea41f7d5158_1459914993_grande_ae47b27d-43fb-4e5c-a85f-b9a35036e0f4.jpg?v=1526954784","variant_ids":[37834368076]},"available":true,"name":"Womanizer Pro - 
Magenta","public_title":"Magenta","options":["Magenta"],"price":24999,"weight":0,"compare_at_price":30999,"inventory_quantity":1,"inventory_management":"shopify","inventory_policy":"deny","barcode":"703255205229"},{"id":37834368140,"title":"Mint","option1":"Mint","option2":null,"option3":null,"sku":"1725205243","requires_shipping":true,"taxable":true,"featured_image":{"id":1186554576921,"product_id":10346996748,"position":14,"created_at":"2018-03-13T21:34:08+11:00","updated_at":"2018-05-22T12:06:25+10:00","alt":"Womanizer Pro Mint - Club X","width":450,"height":488,"src":"https://cdn.shopify.com/s/files/1/0682/0289/products/048d21569da3a18c2c82a335c0e6e17a_1459916999_grande_151de093-60c4-45a1-aace-2ae3bf6cc164.png?v=1526954785","variant_ids":[37834368140]},"available":true,"name":"Womanizer Pro - Mint","public_title":"Mint","options":["Mint"],"price":24999,"weight":0,"compare_at_price":30999,"inventory_quantity":6,"inventory_management":"shopify","inventory_policy":"deny","barcode":"703255205243"},{"id":37834368204,"title":"Rose","option1":"Rose","option2":null,"option3":null,"sku":"172205250","requires_shipping":true,"taxable":true,"featured_image":{"id":1186542157849,"product_id":10346996748,"position":10,"created_at":"2018-03-13T21:30:24+11:00","updated_at":"2018-05-22T12:06:23+10:00","alt":"Womanizer Pro Rose - Club X","width":600,"height":600,"src":"https://cdn.shopify.com/s/files/1/0682/0289/products/7b2cc1561424b59f315dc9832fd4c834_1460014564_grande_e8738aaa-5a85-4f83-a384-fd1b4e1f0b89.jpg?v=1526954783","variant_ids":[37834368204]},"available":true,"name":"Womanizer Pro - 
Rose","public_title":"Rose","options":["Rose"],"price":24999,"weight":0,"compare_at_price":30999,"inventory_quantity":1,"inventory_management":"shopify","inventory_policy":"deny","barcode":"703255205250"},{"id":37834368268,"title":"White","option1":"White","option2":null,"option3":null,"sku":"1725205236","requires_shipping":true,"taxable":true,"featured_image":{"id":1186555330585,"product_id":10346996748,"position":15,"created_at":"2018-03-13T21:34:23+11:00","updated_at":"2018-05-22T12:06:26+10:00","alt":"Womanizer Pro White - Club X","width":600,"height":600,"src":"https://cdn.shopify.com/s/files/1/0682/0289/products/7892c10587439f32cc3cec90a960b059_1459918683_grande_bf23ffca-e3c1-498a-b5a2-70da92c22591.jpg?v=1526954786","variant_ids":[37834368268]},"available":true,"name":"Womanizer Pro - White","public_title":"White","options":["White"],"price":24999,"weight":0,"compare_at_price":30999,"inventory_quantity":5,"inventory_management":"shopify","inventory_policy":"deny","barcode":"703255205236"}],"available":false,"images":["//cdn.shopify.com/s/files/1/0682/0289/products/e003edb648fd35ff2fa272881176b81b_1460018691.jpg?v=1526954778","//cdn.shopify.com/s/files/1/0682/0289/products/womanizer-pro-purple_1.png?v=1526954779","//cdn.shopify.com/s/files/1/0682/0289/products/womanizer-pro-gold.png?v=1526954779","//cdn.shopify.com/s/files/1/0682/0289/products/8f66fcbb15839f44872691a6f6baa8e1_1459914986.jpg?v=1526954780","//cdn.shopify.com/s/files/1/0682/0289/products/fa5f253c5726b7bfe1eee169eb5df3cc_1459916829.png?v=1526954781","//cdn.shopify.com/s/files/1/0682/0289/products/ace7842e9a0769b9b12d393f14635d85_1459919036.jpg?v=1526954781","//cdn.shopify.com/s/files/1/0682/0289/products/d00d67d74f47e3c8e9414235fa3ebf3b_1460022454.png?v=1526954782","//cdn.shopify.com/s/files/1/0682/0289/products/6557e39ded6ea133945aff0d0a2a7a77_1459917861.jpg?v=1526954782","//cdn.shopify.com/s/files/1/0682/0289/products/5778230f1e1ea39ac8f7f6b9dc1bb623_1459917950_grande_71c126da-c86c-4593-a154-27c2
e0dbf133.jpg?v=1528092578","//cdn.shopify.com/s/files/1/0682/0289/products/7b2cc1561424b59f315dc9832fd4c834_1460014564_grande_e8738aaa-5a85-4f83-a384-fd1b4e1f0b89.jpg?v=1526954783","//cdn.shopify.com/s/files/1/0682/0289/products/246979818968d47fc7413ea41f7d5158_1459914993_grande_ae47b27d-43fb-4e5c-a85f-b9a35036e0f4.jpg?v=1526954784","//cdn.shopify.com/s/files/1/0682/0289/products/e8e5cd8a56d792956774acdfb49060b2_1459914573_grande_5fc1db06-b57a-49b1-8461-f4a8f1fc3f9c.jpg?v=1530081671","//cdn.shopify.com/s/files/1/0682/0289/products/Womanizer-Pro-Black.png?v=1530081672","//cdn.shopify.com/s/files/1/0682/0289/products/048d21569da3a18c2c82a335c0e6e17a_1459916999_grande_151de093-60c4-45a1-aace-2ae3bf6cc164.png?v=1526954785","//cdn.shopify.com/s/files/1/0682/0289/products/7892c10587439f32cc3cec90a960b059_1459918683_grande_bf23ffca-e3c1-498a-b5a2-70da92c22591.jpg?v=1526954786"],"featured_image":"//cdn.shopify.com/s/files/1/0682/0289/products/e003edb648fd35ff2fa272881176b81b_1460018691.jpg?v=1526954778","options":["Color"],"url":"/products/womanizer-pro"}

I need to extract the sku and title of each entry in variants.


1 answer
User
#1 · Posted 2024-06-29 01:03:36

Use the regex pattern r"\"variants\":\[(.*?)\]"

Demo:

from bs4 import BeautifulSoup
import json
import re

s = """<script>var BOLD = BOLD || {};
    BOLD.products = BOLD.products || {};
    BOLD.variant_lookup = BOLD.variant_lookup || {};BOLD.variant_lookup[31066737740] ="womanizer";BOLD.variant_lookup[31066737804] ="womanizer";BOLD.variant_lookup[31066737868] ="womanizer";BOLD.variant_lookup[31066737996] ="womanizer";BOLD.variant_lookup[1509908217881] ="womanizer";BOLD.products["womanizer"] ={"id":8993669708,"title":"Womanizer","variants":[{"id":37834367948,"title":"Black","option1":"Black","option2":null,"option3":null,"sku":"1725205212"}]}
    </script>
"""

soup = BeautifulSoup(s, "html.parser")
src = soup.find("script")
m = re.search(r"\"variants\":\[(.*?)\]", src.string)
if m:
    data = json.loads(m.group(1))
    print(data)

Output (as printed under Python 2):

{u'sku': u'1725205212', u'title': u'Black', u'id': 37834367948L, u'option2': None, u'option3': None, u'option1': u'Black'}
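One caveat about the pattern above: the non-greedy group stops at the first "]" it meets. In the full obj.string from the question, each variant contains a nested "variant_ids":[...] array, so the capture is cut off partway through the first variant, which would explain the "Expecting ',' delimiter" error. A possible workaround, shown here only as a minimal sketch, is to use the regex just to locate the start of the array and let json.JSONDecoder.raw_decode parse the complete value from there. The script_text string is a shortened, hypothetical stand-in for obj.string, and extract_variants is a helper name invented for this example.

```python
import json
import re

# Shortened, hypothetical stand-in for obj.string. Each variant keeps a nested
# "variant_ids" array, which is what trips up the non-greedy \[(.*?)\] pattern.
script_text = (
    'BOLD.products["womanizer-pro"] ={"id":10346996748,"variants":['
    '{"id":37834367948,"title":"Black","sku":"1725205212",'
    '"featured_image":{"variant_ids":[37834367948]}},'
    '{"id":37834368076,"title":"Magenta","sku":"1725205229",'
    '"featured_image":{"variant_ids":[37834368076]}}'
    '],"available":false}'
)


def extract_variants(text):
    """Return the list stored under "variants", or [] if the key is absent."""
    # Match up to (but not including) the opening bracket of the array.
    m = re.search(r'"variants"\s*:\s*(?=\[)', text)
    if m is None:
        return []
    # raw_decode parses one complete JSON value starting at the given index
    # and ignores whatever text follows it.
    variants, _end = json.JSONDecoder().raw_decode(text, m.end())
    return variants


pairs = [(v["sku"], v["title"]) for v in extract_variants(script_text)]
for sku, title in pairs:
    print(sku, title)
```

With the sample string above this prints "1725205212 Black" and "1725205229 Magenta". Because the JSON decoder tracks brackets and quoted strings itself, nested arrays inside a variant no longer matter. Against the real page, text would be obj.string, and this assumes the script still embeds the product as valid JSON.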
