<p>You can use the <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html" rel="nofollow noreferrer">converters</a> parameter of <code>read_csv</code>:</p>
<pre><code>In [156]: def conv(val, default_val=999):
     ...:     try:
     ...:         return int(val)
     ...:     except ValueError:
     ...:         return default_val
     ...:

In [157]: conv('a')
Out[157]: 999

In [158]: pd.read_csv(r'C:\Temp\test.csv', converters={'a':conv})
Out[158]:
     a   b           c
0    1  11  2000-01-01
1  999  12  2000-01-02
2    3  13  2000-01-02
</code></pre>
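<p>For a version that runs without a file on disk, here is a minimal sketch of the same idea. The in-memory CSV text is hypothetical sample data chosen to resemble the output above (it is not the original file), and <code>io.StringIO</code> merely stands in for the file path.</p>

```python
import io

import pandas as pd


def conv(val, default_val=999):
    """Return int(val), or default_val if val is not a valid integer string."""
    try:
        return int(val)
    except ValueError:
        return default_val


# Hypothetical sample data: the second row has a non-numeric value in column "a".
csv_text = "a,b,c\n1,11,2000-01-01\nx,12,2000-01-02\n3,13,2000-01-02\n"

# read_csv calls conv on every raw string value of column "a".
df_conv = pd.read_csv(io.StringIO(csv_text), converters={"a": conv})
print(df_conv)
```

<p>Because the converter receives each cell as a raw string, an empty cell also raises <code>ValueError</code> inside <code>int()</code> and ends up as the default value.</p>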
<p>Another approach is to convert the columns in a vectorized way after parsing the CSV file:</p>
^{pr2}$
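<p>A minimal sketch of this vectorized conversion, based on the expression used in the timing benchmark. The sample data and the contents of <code>int_cols</code> are assumptions made for illustration.</p>

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has a non-numeric value in column "a".
csv_text = "a,b,c\n1,11,2000-01-01\nx,12,2000-01-02\n3,13,2000-01-02\n"

# Assumed list of columns that should end up as integers.
int_cols = ["a", "b"]

df_vec = pd.read_csv(io.StringIO(csv_text), parse_dates=["c"])

# Coerce unparseable values to NaN, replace NaN with the default, then cast to int.
df_vec[int_cols] = (
    df_vec[int_cols]
    .apply(pd.to_numeric, errors="coerce")
    .fillna(999)
    .astype(int)
)
print(df_vec)
```

<p>Note that the two approaches are not strictly equivalent: a value such as <code>"1.5"</code> is rejected by <code>int()</code> in the converter and becomes 999, whereas <code>pd.to_numeric</code> parses it as a float and <code>astype(int)</code> truncates it to 1.</p>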
<p>Timing comparison on a DataFrame with 300,000 rows:</p>
<pre><code>In [175]: df = pd.concat([df] * 10**5, ignore_index=True)
In [176]: df.shape
Out[176]: (300000, 3)
In [177]: filename = r'C:\Temp\test.csv'
In [184]: df.to_csv(filename, index=False)
In [185]: %%timeit
     ...: df = pd.read_csv(filename, parse_dates=['c'], converters={'a':conv, 'b':conv})
     ...:
632 ms ± 25.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [186]: %%timeit
     ...: df = pd.read_csv(filename, parse_dates=['c'])
     ...: df[int_cols] = df[int_cols].apply(pd.to_numeric, errors='coerce').fillna(999).astype(int)
     ...:
706 ms ± 60.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>