<p>I have data stored in a CSV file in the following format:</p>
<pre><code>892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S
894,2,"Myles, Mr. Thomas Francis",male,62,0,0,240276,9.6875,,Q
895,3,"Wirz, Mr. Albert",male,27,0,0,315154,8.6625,,S
896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22,1,1,3101298,12.2875,,S
897,3,"Svensson, Mr. Johan Cervin",male,14,0,0,7538,9.225,,S
</code></pre>
<p>The <strong>data type of each column</strong>:</p>
^{pr2}$
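<p>The first column, <em>892, 893, ..., 897</em>, should be stored in the <code>array</code> as <code>int</code>. The third column, for example <em>"Wilkes, Mrs. James (Ellen Needs)"</em>, should be stored as <code>string</code>. However, although the third column is a <code>string</code>, its length is <strong>not</strong> fixed; that is, I do not know the maximum number of characters stored in this column.</p>
<p><strong>What I have done so far:</strong></p>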
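<pre><code>import csv
import numpy as np

# csv.reader keeps quoted fields (including embedded commas) intact
with open('trainData.csv', newline='') as f:
    csv_file_object = csv.reader(f)
    header = next(csv_file_object)  # first row is treated as the header
    data = []
    for row in csv_file_object:
        data.append(row)
data = np.array(data)  # every column ends up stored as a string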
</code></pre>
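<p>However, the code above reads <strong>all columns</strong> as <code>string</code>, even though many of them are <strong>not <code>string</code> data</strong> and only end up stored as <code>string</code>. On the other hand, if I use <code>genfromtxt</code>, the third column is a problem because it contains a comma inside double quotes.</p>
<p>I want each column stored with its own data type; for example, the first column should be stored as <code>int</code>.</p>
<p><strong>My expected array:</strong></p>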
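<p>To illustrate the quoted-comma problem, here is a minimal sketch using only the standard library. It compares naive comma splitting with <code>csv.reader</code> on one row copied from the sample data; the variable names are illustrative only.</p>

```python
import csv
import io

# One row copied from the sample data above.
line = '893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S\n'

# Naive comma splitting cuts the quoted name into two fields.
naive_fields = line.strip().split(',')

# csv.reader honours the double quotes and keeps the name in one field.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))   # 12
print(len(parsed_fields))  # 11
print(parsed_fields[2])    # Wilkes, Mrs. James (Ellen Needs)
```

<p>Any approach that splits on every comma will see one extra field in each of these rows, which is why the quoting has to be handled by the parser itself.</p>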
<pre><code>892 3 "Kelly, Mr. James" male 34.5 0 0 330911 7.8292 NaN Q
893 3 "Wilkes, Mrs. James (Ellen Needs)" female 47 1 0 363272 7 NaN S
894 2 "Myles, Mr. Thomas Francis" male 62 0 0 240276 9.6875 NaN Q
895 3 "Wirz, Mr. Albert" male 27 0 0 315154 8.6625 NaN S
896 3 "Hirvonen, Mrs. Alexander (Helga E Lindqvist)" female 22 1 1 3101298 12.2875 NaN S
897 3 "Svensson, Mr. Johan Cervin" male 14 0 0 7538 9.225 NaN S
</code></pre>
<p>As you can see, wherever data is unavailable, <code>NaN</code> or an equivalent value should be inserted.</p>
<p>How should I read this CSV file?</p>
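<p>To make the requirement concrete, here is a rough standard-library sketch of the per-column conversion described above. The list <code>COLUMN_TYPES</code> and the helpers <code>convert</code> and <code>read_typed_rows</code> are hypothetical names, and the column types are assumptions guessed from the sample values, not a confirmed schema.</p>

```python
import csv
import io
import math

# Two rows copied from the sample data above.
SAMPLE = (
    '892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q\n'
    '893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S\n'
)

# Assumed column types, guessed from the sample values only.
COLUMN_TYPES = [int, int, str, str, float, int, int, str, float, str, str]


def convert(value, column_type):
    """Convert one field; an empty field becomes NaN (hypothetical helper)."""
    if value == '':
        return math.nan
    return column_type(value)


def read_typed_rows(text):
    """Parse CSV text and convert each field using its column's assumed type."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        rows.append([convert(v, t) for v, t in zip(row, COLUMN_TYPES)])
    return rows


rows = read_typed_rows(SAMPLE)
print(rows[0])
```

<p>The result is a plain list of rows with mixed Python types. Because a NumPy array holds a single dtype, keeping these per-column types in NumPy would need an object or structured dtype, which this sketch does not attempt.</p>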