<blockquote>
<p>How to limit regex results?</p>
</blockquote>
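<p>Before answering the question itself, I should explain why the current expression produces an unwanted result: in the subexpression <code>(?:"contributors": .*?, "truncated": .*?, "text": ")</code>, the last <code>.*?</code>, although it is non-greedy, matches all of this input:</p>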
<pre><code>false, "text": "RT @BelloPromotions: Myke Towers Ft. Mariah - Desaparecemos\n@myketowers #myketowers #mariah @mariah #Desaparecemos #music #musica #musicanu\u2026", "is_quote_status": false, "in_reply_to_status_id": null, "id": 1099558111000506369, "favorite_count": 0, "entities": {"symbols": [], "user_mentions": [{"id": 943461023293542400, "indices": [3, 19], "id_str": "943461023293542400", "screen_name": "BelloPromotions", "name": "Bello Promotions \ud83d\udcc8\ud83d\udcb0"}, {"id": 729572008909000704, "indices": [60, 71], "id_str": "729572008909000704", "screen_name": "MykeTowers", "name": "Towers Myke"}, {"id": 775866464, "indices": [92, 99], "id_str": "775866464", "screen_name": "mariah", "name": "Kenzie peretti"}], "hashtags": [{"indices": [72, 83]
</code></pre>
<p>In other words, it consumes everything from the first <code>"truncated":</code> up to the next <code>, "text":</code> that is not followed by <code>"RT…"</code>, which is the one just before the unwanted <code>"myketowers"</code>.</p>
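<p>So, to keep the expression from consuming all of that input, we must not allow just any character (<code>.</code>) between <code>"truncated":</code> and <code>, "text":</code>. Instead, we allow only the characters that make up the possible values <code>false</code> and <code>true</code>, or, for simplicity, only <em>word characters</em> (<code>\w</code>). Changing the subexpression above to <code>(?:"contributors": .*?, "truncated": \w*, "text": ")</code> is therefore sufficient.</p>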
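<p>To see the difference, here is a small Python sketch. The sample data is a hypothetical, heavily shortened tweet dump kept on a single line. The rest of the pattern around the subexpression is my assumption about what the question's regex looks like, not a copy of it: a negative lookahead <code>(?!RT)</code> after <code>"text": "</code> and a capture group <code>([^"]*)</code>. It compares the original <code>.*?</code>, the <code>\w*</code> fix, and the stricter <code>(?:true|false)</code> variant.</p>
<pre><code>import re

# Hypothetical sample: a retweet whose hashtag entity also has a "text" key,
# followed by an ordinary tweet, all on one line (as in a JSON dump).
data = (
    '[{"contributors": null, "truncated": false, "text": "RT @a: hi #music", '
    '"entities": {"hashtags": [{"indices": [10, 16], "text": "music"}]}}, '
    '{"contributors": null, "truncated": false, "text": "plain tweet", "entities": {}}]'
)

# Assumed full pattern: (?!RT) rejects retweets, ([^"]*) captures the text.
# After (?!RT) rejects the retweet, the lazy .*? after "truncated": keeps
# expanding until it reaches the hashtag's , "text": "music".
lazy = r'"contributors": .*?, "truncated": .*?, "text": "(?!RT)([^"]*)"'
# Only word characters are allowed between "truncated": and , "text":
word = r'"contributors": .*?, "truncated": \w*, "text": "(?!RT)([^"]*)"'
# Stricter: only the two possible values of "truncated".
strict = r'"contributors": .*?, "truncated": (?:true|false), "text": "(?!RT)([^"]*)"'

lazy_results = re.findall(lazy, data)
word_results = re.findall(word, data)
strict_results = re.findall(strict, data)

print(lazy_results)    # ['music', 'plain tweet']
print(word_results)    # ['plain tweet']
print(strict_results)  # ['plain tweet']
</code></pre>
<p>With <code>\w*</code> (or <code>(?:true|false)</code>) the part after <code>"truncated":</code> can no longer run past the retweet's own <code>"text"</code> key, so the hashtag's <code>"music"</code> is no longer captured.</p>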