<p>I'll only answer the first part of your question: the built-in <code>csv</code> module can't do this.</p>
<p>Looking at the CPython source code, the <code>quotechar</code> option is <a href="https://github.com/python/cpython/blob/09eb81711597725f853e4f3b659ce185488b0d8c/Modules/_csv.c#L651" rel="nofollow noreferrer">only processed</a> at the start of a field:</p>
<pre class="lang-c prettyprint-override"><code> case START_FIELD:
/* expecting field */
...
else if (c == dialect->quotechar &&
dialect->quoting != QUOTE_NONE) {
/* start quoted field */
self->state = IN_QUOTED_FIELD;
}
...
break;
</code></pre>
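<p>As a quick illustration from the Python side (a sketch assuming Python 3 and the default <code>excel</code> dialect, where <code>quotechar</code> is <code>"</code>; the variable name is just for illustration), a quote that opens a field puts the parser into <code>IN_QUOTED_FIELD</code>, so the delimiter inside it is kept as data:</p>
<pre class="lang-py prettyprint-override"><code>import csv
import io

# The first character of the field is the quotechar, so the parser enters
# IN_QUOTED_FIELD and the comma between the quotes is stored as data.
row = next(csv.reader(io.StringIO('"a,b",c')))
print(row)  # ['a,b', 'c']
</code></pre>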
<p>Inside a field, <a href="https://github.com/python/cpython/blob/09eb81711597725f853e4f3b659ce185488b0d8c/Modules/_csv.c#L697" rel="nofollow noreferrer">there is no such check</a>:</p>
<pre class="lang-c prettyprint-override"><code> case IN_FIELD:
/* in unquoted field */
if (c == '\n' || c == '\r' || c == '\0') {
/* end of line - return [fields] */
if (parse_save_field(self) < 0)
return -1;
self->state = (c == '\0' ? START_RECORD : EAT_CRNL);
}
else if (c == dialect->escapechar) {
/* possible escaped character */
self->state = ESCAPED_CHAR;
}
else if (c == dialect->delimiter) {
/* save field - wait for new field */
if (parse_save_field(self) < 0)
return -1;
self->state = START_FIELD;
}
else {
/* normal character - save in field */
if (parse_add_char(self, module_state, c) < 0)
return -1;
}
break;
</code></pre>
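<p>The effect of that missing branch can be observed directly. In this sketch (same assumptions: Python 3, default dialect), the field starts with <code>x</code>, so the parser is already in <code>IN_FIELD</code> when it reaches the quote; the quote is stored like any other character, and the comma after it still ends the field:</p>
<pre class="lang-py prettyprint-override"><code>import csv
import io

# 'x' starts the field, so the parser is in IN_FIELD when it reaches '"'.
# IN_FIELD has no quotechar branch: each quote is kept as an ordinary
# character, and every comma still acts as a delimiter.
row = next(csv.reader(io.StringIO('x"a,b",c')))
print(row)  # ['x"a', 'b"', 'c']
</code></pre>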
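<p>While the parser is in the <code>IN_QUOTED_FIELD</code> state, it does check for <code>quotechar</code>; however, after the closing quote it falls back to the <code>IN_FIELD</code> state, meaning the rest of the field is parsed as unquoted. So this is possible:</p>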
<pre><code>>>> import csv
>>> import io
>>> print(next(csv.reader(io.StringIO('"a,b"cd,e'))))
['a,bcd', 'e']
</code></pre>
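<p>Note that accepting <code>cd</code> after the closing quote relies on the default <code>strict=False</code>. With the <code>strict</code> dialect option turned on, the same input should instead be rejected with <code>csv.Error</code>. A small sketch comparing both settings (assuming Python 3; the variable names are only illustrative):</p>
<pre class="lang-py prettyprint-override"><code>import csv
import io

data = '"a,b"cd,e'

# Default dialect (strict=False): the 'c' after the closing quote is
# accepted and parsing continues as if the field were unquoted.
lenient = next(csv.reader(io.StringIO(data)))
print(lenient)  # ['a,bcd', 'e']

# strict=True: the same character after the closing quote raises csv.Error.
try:
    next(csv.reader(io.StringIO(data), strict=True))
    strict_failed = False
except csv.Error as exc:
    strict_failed = True
    print("csv.Error:", exc)
</code></pre>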
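<p>But once the parser reaches the end of the initial quoted section, it treats any subsequent quote as part of the data. I don't know whether this behavior conforms to any (written or unwritten) CSV specification, or whether it's just a bug.</p>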
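<p>To make that concrete (again a sketch assuming Python 3 and the default dialect), a second pair of quotes after the initial quoted part does not open a new quoted section, so the comma between them splits the row and both quote characters end up in the data:</p>
<pre class="lang-py prettyprint-override"><code>import csv
import io

# "a,b" is a real quoted section; after it the parser is back in IN_FIELD,
# so the next two quotes are plain characters and the comma between them
# is treated as a delimiter.
row = next(csv.reader(io.StringIO('"a,b"c"d,e"')))
print(row)  # ['a,bc"d', 'e"']
</code></pre>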