<p>You can use the following regular expression to extract the Instagram link from the text:</p>
<pre><code><(.+)\|\(Instagram\)>
</code></pre>
<p><a href="https://regex101.com/r/WpL2dw/1" rel="nofollow noreferrer">See here</a></p>
<p>It searches for any text wrapped in <code><</code> and <code>|(Instagram)></code> and stores it in a capture group.</p>
<hr/>
<p>You can use it like this:</p>
<pre class="lang-py prettyprint-override"><code>import json
import re

# `data` is assumed to be the already-parsed JSON payload
INSTA_LINK_RE = re.compile(r'<(.+)\|\(Instagram\)>')
match = INSTA_LINK_RE.search(json.dumps(data["event"]["attachments"][0]["text"]))
if match:
    url = match[1]  # gets the first capturing group
</code></pre>
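<p>As a self-contained check, here is a minimal sketch that wraps the same pattern in a helper. The function name <code>extract_instagram_link</code> and the sample text are made up for illustration; only the regex comes from the answer above.</p>
<pre class="lang-py prettyprint-override"><code>import re

INSTA_LINK_RE = re.compile(r'<(.+)\|\(Instagram\)>')


def extract_instagram_link(text):
    """Return the captured URL, or None when the pattern does not match."""
    match = INSTA_LINK_RE.search(text)
    return match[1] if match else None


# Hypothetical sample text, not taken from a real payload
sample = "New post: <https://www.instagram.com/p/ABC123/|(Instagram)>"
url = extract_instagram_link(sample)
missing = extract_instagram_link("no link in this text")
</code></pre>
<p>Returning <code>None</code> for a miss keeps the "no link found" case explicit for the caller.</p>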
<hr/>
<p>If you only want the shortcode, use <a href="https://regex101.com/r/WpL2dw/2" rel="nofollow noreferrer">this regex</a>:</p>
<pre><code><https://www.instagram.com/p/(.+)/\|\(Instagram\)>
</code></pre>
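<p>Because <code>.+</code> is greedy, the shortcode pattern can over-capture when a text contains more than one link. The sketch below compares it with a stricter variant; <code>STRICT_SHORTCODE_RE</code>, its escaped dots, and its character class are my own assumptions rather than part of the regex above, and the sample text is invented.</p>
<pre class="lang-py prettyprint-override"><code>import re

# Pattern from the answer above
SHORTCODE_RE = re.compile(r'<https://www.instagram.com/p/(.+)/\|\(Instagram\)>')

# Hypothetical stricter variant: literal dots, and a shortcode that
# cannot contain '/', '|' or '>'
STRICT_SHORTCODE_RE = re.compile(
    r'<https://www\.instagram\.com/p/([^/|>]+)/\|\(Instagram\)>'
)

# Invented sample text containing two links
text = (
    "a <https://www.instagram.com/p/AAA111/|(Instagram)> "
    "b <https://www.instagram.com/p/BBB222/|(Instagram)>"
)

# The greedy group runs from the first shortcode to the last one
greedy_capture = SHORTCODE_RE.search(text)[1]

# With a single group, findall returns the list of captured shortcodes
shortcodes = STRICT_SHORTCODE_RE.findall(text)
</code></pre>
<p>For a text with a single link, both patterns capture the same shortcode.</p>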
<hr/>
<p>This approach works if you have a <code>str</code> object to analyze with a <code>str</code> regex.</p>
<p>If the text is a <code>bytes</code> object, you need to decode it first:</p>
<pre class="lang-py prettyprint-override"><code># JSON files are normally encoded with UTF-8
text = data["event"]["attachments"][0]["text"]
if isinstance(text, bytes):
    text = text.decode('utf8')
</code></pre>
<p>...or use a <code>bytes</code> regex:</p>
<pre class="lang-py prettyprint-override"><code># note the `b` prefix for the regex pattern
INSTA_LINK_RE = re.compile(br'<(.+)\|\(Instagram\)>')
</code></pre>
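<p>To illustrate the <code>str</code> versus <code>bytes</code> distinction, here is a small sketch with an invented sample value. It shows that a <code>str</code> pattern raises <code>TypeError</code> on <code>bytes</code> input, while a <code>bytes</code> pattern matches and returns <code>bytes</code> groups, which can be decoded afterwards.</p>
<pre class="lang-py prettyprint-override"><code>import re

STR_RE = re.compile(r'<(.+)\|\(Instagram\)>')
BYTES_RE = re.compile(br'<(.+)\|\(Instagram\)>')

# Hypothetical raw value, as it might look before decoding
raw = b"<https://www.instagram.com/p/ABC123/|(Instagram)>"

# A str pattern cannot be applied to bytes input
try:
    STR_RE.search(raw)
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False

# A bytes pattern works, and its groups are bytes as well
match = BYTES_RE.search(raw)
url_bytes = match[1] if match else None
url = url_bytes.decode('utf-8') if url_bytes is not None else None
</code></pre>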
<hr/>
<p>To get a dict containing <code>str</code> objects directly, you can also pass the encoding to the <code>open</code> function:</p>
<pre class="lang-py prettyprint-override"><code>f = open('json-test-file-for-insta-url-snippet.json', encoding='utf-8')
</code></pre>
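<p>A fuller sketch of that round trip, assuming the file holds a payload shaped like the one used above. The temporary file, its name, and the sample payload are invented so the example runs on its own.</p>
<pre class="lang-py prettyprint-override"><code>import json
import os
import tempfile

# Hypothetical payload mirroring the structure accessed above
payload = {
    "event": {
        "attachments": [
            {"text": "<https://www.instagram.com/p/ABC123/|(Instagram)>"}
        ]
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")

    # Write the sample file as UTF-8
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)

    # Read it back in text mode; json.load then yields str values
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

text = data["event"]["attachments"][0]["text"]
</code></pre>
<p>Using <code>with</code> closes the file even if parsing fails, which a bare <code>open</code> call does not do by itself.</p>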
<hr/>
<p>See the Python documentation for more information:</p>
<ul>
<li><a href="https://docs.python.org/3/library/re.html" rel="nofollow noreferrer">regex</a></li>
<li><a href="https://docs.python.org/3.6/library/stdtypes.html?highlight=bytes#bytes.decode" rel="nofollow noreferrer">bytes.decode</a></li>
<li><a href="https://docs.python.org/fr/3/library/functions.html?highlight=open#open" rel="nofollow noreferrer">open</a></li>
<li><a href="https://docs.python.org/3/howto/unicode.html" rel="nofollow noreferrer">encoding</a></li>
</ul>