<p>I prefer <code>requests</code> (<a href="https://requests.readthedocs.io/en/master/" rel="nofollow noreferrer">https://requests.readthedocs.io/en/master/</a>) for fetching text or files from the web. I quickly tried <code>wget</code> and got the same error, which may be related to the User-Agent HTTP header that <code>wget</code> sends.</p>
<ul>
<li><code>wget</code> and the HTTP header problem: <a href="https://stackoverflow.com/questions/34692009/download-image-from-url-using-python-urllib-but-receiving-http-error-403-forbid">download image from url using python urllib but receiving HTTP Error 403: Forbidden</a></li>
<li>HTTP header documentation: <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent" rel="nofollow noreferrer">https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent</a></li>
</ul>
<p>The advantage of <code>requests</code> is that it lets you modify the HTTP headers however you like (<a href="https://requests.readthedocs.io/en/master/user/quickstart/#custom-headers" rel="nofollow noreferrer">https://requests.readthedocs.io/en/master/user/quickstart/#custom-headers</a>).</p>
<pre class="lang-py prettyprint-override"><code>import requests
r = requests.get("https://s3.amazonaws.com/os_uploads/2169504_DFA%20train%20pass.PNG?AWSAccessKeyId=AKIAIPCTK7BDMEW7SP4Q&Expires=1909634500&Signature=aQlQXVR8UuYLtkzjvcKJ5tiVrZQ=&response-content-disposition=attachment;%20filename*=utf-8''DFA%2520train%2520pass.PNG")
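r.raise_for_status()  # stop here on an HTTP error such as 403 instead of saving the error body
with open("myfile.png", "wb") as file:
    file.write(r.content)  # r.content holds the raw bytes of the response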
</code></pre>
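<p>As a rough sketch of the custom-header approach, the snippet below sends a browser-like <code>User-Agent</code> with the request. The helper names <code>build_headers</code> and <code>download</code>, the user-agent string, and the timeout value are assumptions made for illustration. Whether a different user agent actually avoids the 403 depends on the server.</p>

```python
def build_headers(user_agent="Mozilla/5.0 (compatible; example-downloader/0.1)"):
    """Return request headers with a custom User-Agent (illustrative value)."""
    return {"User-Agent": user_agent}


def download(url, path, headers=None, timeout=10):
    """Download url to path, sending custom headers (hypothetical helper)."""
    import requests  # third-party package: pip install requests

    if headers is None:
        headers = build_headers()
    r = requests.get(url, headers=headers, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    with open(path, "wb") as file:
        file.write(r.content)
    return path
```

<p>Calling <code>download(url, "myfile.png")</code> would then save the response body, or raise an exception if the server still refuses the request.</p>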
<p>I'm not sure I understand what you are trying to do, but you may want to build the URL with a format string (<a href="https://docs.python.org/3/library/stdtypes.html?highlight=format#str.format" rel="nofollow noreferrer">https://docs.python.org/3/library/stdtypes.html?highlight=format#str.format</a>).</p>
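<p>If that is what you are after, a minimal sketch with <code>str.format</code> could look like this. The template, the <code>example.com</code> host, and the <code>build_url</code> helper are hypothetical and only stand in for whatever pattern your real URLs follow.</p>

```python
from urllib.parse import quote

# Hypothetical URL pattern; replace it with the real one.
URL_TEMPLATE = "https://example.com/files/{file_id}_{name}"


def build_url(file_id, name):
    """Fill the template, percent-encoding the file name (spaces become %20)."""
    return URL_TEMPLATE.format(file_id=file_id, name=quote(name))
```

<p>For example, <code>build_url(2169504, "DFA train pass.PNG")</code> fills in both placeholders and encodes the spaces in the file name.</p>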
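<p>In your example (<code>if x[0:4] == "http":</code>), checking a string slice may be fine, but I think you should look at Python's <code>re</code> package to capture the elements you need from the document with regular expressions (<a href="https://docs.python.org/3/library/re.html" rel="nofollow noreferrer">https://docs.python.org/3/library/re.html</a>).</p>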
<pre class="lang-py prettyprint-override"><code>import re
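regex = re.compile(r"^https?://")  # matches both http:// and https://
mydocument = "http://example.com/file.png"  # placeholder input for illustration
if regex.match(mydocument):
    print("starts with a URL scheme")  # do something with the URL here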
</code></pre>
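<p>To pull every link out of a larger text rather than test a single string, something along these lines might work. The pattern is deliberately simple and the helper names <code>find_urls</code> and <code>starts_with_url</code> are made up for this sketch.</p>

```python
import re

# Simple pattern: "http://" or "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def find_urls(text):
    """Return every http(s) URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


def starts_with_url(line):
    """Return True if line begins with http:// or https://."""
    return re.match(r"https?://", line) is not None
```

<p>Note that <code>\S+</code> also swallows trailing punctuation such as a closing parenthesis or a full stop, so the pattern may need tightening for real documents.</p>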