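<p><code>pandas</code> doesn't have an option that does exactly what you want, but here is one way to do it that should be relatively efficient: parse the columns with <code>errors='coerce'</code>, then flag any cell that came back as <code>NaT</code> even though it wasn't blank in the original data.</p>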
<pre><code>In [4]: dfBad
Out[4]:
   custId   eventDate  registerDate
0       1  06/10/1992    06/08/2002
1       2  08/24/2012    20/08/2012
2       3  04/24/2015    04/20/2015
3       4
4       5  10/14/2009    10/10/2009
In [7]: cols
Out[7]: ['eventDate', 'registerDate']
In [9]: dts = dfBad[cols].apply(lambda x: pd.to_datetime(x, errors='coerce', format='%m/%d/%Y'))
In [10]: dts
Out[10]:
    eventDate  registerDate
0  1992-06-10    2002-06-08
1  2012-08-24           NaT
2  2015-04-24    2015-04-20
3         NaT           NaT
4  2009-10-14    2009-10-10
In [11]: mask = pd.isnull(dts) & (dfBad[cols] != '')
In [12]: mask
Out[12]:
   eventDate  registerDate
0      False         False
1      False          True
2      False         False
3      False         False
4      False         False
In [13]: mask.any()
Out[13]:
eventDate       False
registerDate     True
dtype: bool
In [14]: is_bad = mask.any()
In [23]: if is_bad.any():
    ...:     raise ValueError("bad dates in col(s) {0}".format(is_bad[is_bad].index.tolist()))
    ...: else:
    ...:     dfBad[cols] = dts
    ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
&lt;ipython-input-23-579c06ce3c77&gt; in &lt;module&gt;()
      1 if is_bad.any():
----> 2     raise ValueError("bad dates in col(s) {0}".format(is_bad[is_bad].index.tolist()))
      3 else:
      4     dfBad[cols] = dts
      5

ValueError: bad dates in col(s) ['registerDate']
</code></pre>
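<p>If you need this check in more than one place, the same steps can be wrapped in a small function. This is only a sketch, not a pandas API: <code>parse_dates_strict</code> is a hypothetical name, the <code>'%m/%d/%Y'</code> default is taken from the session above, and it assumes that both empty strings and genuinely missing values (<code>NaN</code>/<code>None</code>) count as allowed blanks. It also reports the offending row labels for each bad column, and returns a converted copy instead of modifying the frame in place.</p>
<pre><code>import pandas as pd


def parse_dates_strict(df, cols, fmt="%m/%d/%Y"):
    """Hypothetical helper: parse ``cols`` as dates or raise ValueError.

    Blank cells are allowed; any other cell that fails to parse is an error.
    Returns a converted copy and leaves ``df`` unchanged.
    """
    raw = df[cols]
    # Unparseable strings become NaT instead of raising.
    parsed = raw.apply(lambda s: pd.to_datetime(s, errors="coerce", format=fmt))
    # Assumption: "" and real missing values (NaN/None) both count as blank.
    blank = raw.isna() | raw.eq("")
    # A cell is bad if it failed to parse AND was not blank,
    # written here as NOT (parsed OR blank).
    bad = ~(parsed.notna() | blank)
    bad_cols = bad.any()
    if bad_cols.any():
        # Map each offending column to the row labels that failed.
        details = {c: bad[c][bad[c]].index.tolist() for c in bad_cols[bad_cols].index}
        raise ValueError("bad dates in col(s) {0}".format(details))
    out = df.copy()
    out[cols] = parsed
    return out


dfBad = pd.DataFrame({
    "custId": [1, 2, 3, 4, 5],
    "eventDate": ["06/10/1992", "08/24/2012", "04/24/2015", "", "10/14/2009"],
    "registerDate": ["06/08/2002", "20/08/2012", "04/20/2015", "", "10/10/2009"],
})

try:
    parse_dates_strict(dfBad, ["eventDate", "registerDate"])
except ValueError as exc:
    print(exc)  # bad dates in col(s) {'registerDate': [1]}
</code></pre>
<p>Everything stays vectorized (one <code>to_datetime</code> call per column plus a few whole-frame boolean operations), so there is no Python-level loop over individual cells. Writing the mask as <code>~(parsed.notna() | blank)</code> is equivalent to <code>parsed.isna() &amp; ~blank</code> by De Morgan's law.</p>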