How to get a specific link

Posted 2024-05-18 23:07:31


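Hi guys, I'm scraping a website that has 3 movie links, and each movie page has 3 links of its own. I have code that gets all 3 links, but I want to select just one and print only that one, in this example the openload one. Right now it also prints the whole iframe; I would like it to print the clean link, like this: 'https://openload.co/embed/cosxf9mWZlg/'. I'm also including the printed output so you can see how far I've gotten.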

import urllib2
import urllib
import re
import requests
from bs4 import BeautifulSoup
from lxml import html
url= ('http://goldfilmesonline.com/goldstone-legendado-online/','http://goldfilmesonline.com/sob-a-sombra-legendado-online/','http://goldfilmesonline.com/fora-do-rumo-dublado-online/')
b=0

while b < len(url):
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
    a = r = requests.get(url[b], headers=headers)
    soup = BeautifulSoup(a.text,'html.parser')
    x = soup.findAll({'iframe' : 'src'})
    print x
    b+=1

Here is the printed output:

[output not preserved in this copy]

2 Answers

OK guys, I found my own answer. It doesn't look very clean, but it works. If someone knows a simpler or better way using the same modules, please help. Thanks.

import urllib2
import urllib
import re
import requests
from bs4 import BeautifulSoup
from lxml import html
url= ('http://goldfilmesonline.com/goldstone-legendado-online/','http://goldfilmesonline.com/sob-a-sombra-legendado-online/','http://goldfilmesonline.com/fora-do-rumo-dublado-online/')
b=0

while b < len(url):
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
    a = r = requests.get(url[b], headers=headers)
    soup = BeautifulSoup(a.text,'html.parser')
    x = soup.findAll({'iframe' : 'src'})
    c = x[1]
    a = re.compile('src="(.+?)"').findall(str(c))
    print a
    b+=1
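
One possibly simpler variant with the same modules: a BeautifulSoup tag gives dictionary-style access to its attributes, so the regex step may not be needed. The sketch below is written in Python 3 syntax and runs on an inline sample string instead of the live site. `SAMPLE_HTML` and `iframe_sources` are hypothetical names, and the markup is only an assumption about what a movie page looks like.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one movie page; the real page may differ.
SAMPLE_HTML = """
<div>
  <iframe src="https://example.com/embed/aaa/"></iframe>
  <iframe src="https://openload.co/embed/cosxf9mWZlg/"></iframe>
  <iframe src="https://example.org/embed/bbb/"></iframe>
</div>
"""


def iframe_sources(html_text):
    """Return the src attribute of every <iframe> that has one."""
    soup = BeautifulSoup(html_text, "html.parser")
    # src=True keeps only iframes that actually carry a src attribute.
    return [frame["src"] for frame in soup.find_all("iframe", src=True)]


sources = iframe_sources(SAMPLE_HTML)
# Second iframe, the same position as x[1] in the answer above.
print(sources[1])
```

Indexing by position still assumes the openload frame is always the second one, which is not guaranteed by anything in the question.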

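If I understand your request, you only want to print the iframe whose src contains openload. If that is the case, you can simply loop over x and check whether openload appears in the src value of each frame. If it does, print that frame.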

while b < len(url):
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
    a = r = requests.get(url[b], headers=headers)
    soup = BeautifulSoup(a.text,'html.parser')
    x = soup.findAll({'iframe' : 'src'})
    # print x
    for eachFrame in x:
        currentSRC = eachFrame['src']
        if "openload" in currentSRC.lower():  # lowercased here just in case
            print(currentSRC)  # prints only the src link
            # print(eachFrame)  # use this instead to print the whole iframe
    b+=1
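
The substring check above would also match a URL that merely mentions openload somewhere in its path or query string. A slightly stricter sketch, assuming Python 3 and src values that have already been collected from the iframes, compares against the host name only. `pick_by_host` and `example_sources` are hypothetical names used for illustration.

```python
from urllib.parse import urlparse


def pick_by_host(sources, wanted="openload"):
    """Return the first URL whose host name contains `wanted`, else None."""
    for src in sources:
        # netloc is the host part of the URL, e.g. "openload.co".
        host = urlparse(src).netloc.lower()
        if wanted in host:
            return src
    return None


# Hypothetical src values, as they might be collected from the iframes.
example_sources = [
    "https://example.com/embed/aaa/",
    "https://openload.co/embed/cosxf9mWZlg/",
    "https://example.org/player?ref=openload",
]
print(pick_by_host(example_sources))
```

The last sample URL contains "openload" only in its query string, so it is skipped by the host check even though a plain substring test would accept it.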
