<p>A few issues:</p>
<ul>
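<li>Calling <code>.find</code> on the loop variable <code>node</code> expects a child node to exist: <code>current_node.find('child_of_current_node')</code>. However, since all the nodes are children of the root, they have no children of their own, so no loop is needed.</li>
<li>Not checking for the <code>NoneType</code> that <code>find()</code> returns when a node is missing, which prevents retrieving <code>.text</code> or other attributes.</li>
<li>Not retrieving the node content with <code>.text</code>; without it, the <code>&lt;Element...</code> object is returned.</li>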
</ul>
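<p>The three points above can be reproduced on a small tree. This is a minimal sketch: the root tag <code>Instance</code> and the sample values are made up for illustration, and only the child names come from the code in this answer.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree; the root tag "Instance" and the values are assumptions.
xroot = ET.Element("Instance", {"ID": "1"})
ET.SubElement(xroot, "StudentID").text = "S01"
ET.SubElement(xroot, "Answer").text = "42"

found = xroot.find("StudentID")    # an Element object, not a string
missing = xroot.find("Comments")   # None, because no such child exists

student_text = found.text          # the node content as a string
# missing.text would raise AttributeError, so guard the lookup first
comments_text = missing.text if missing is not None else None

# The root's children have no children of their own here,
# so calling .find on each of them also returns None.
child_lookups = [child.find("StudentID") for child in xroot]
```

<p>Here <code>found</code> is an <code>Element</code>, <code>missing</code> is <code>None</code>, and every entry of <code>child_lookups</code> is <code>None</code>, which is why looping over the children and calling <code>.find</code> on them yields nothing useful.</p>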
<p>Consider this adjustment using the <a href="https://stackoverflow.com/questions/394809/does-python-have-a-ternary-conditional-operator">ternary conditional expression</a>, <code>a if condition else b</code>, so that each variable gets a value whether or not the node exists:</p>
<pre><code>rows = []
s_name = xroot.attrib.get("ID")
s_student = xroot.find("StudentID").text if xroot.find("StudentID") is not None else None
s_task = xroot.find("TaskID").text if xroot.find("TaskID") is not None else None
s_source = xroot.find("DataSource").text if xroot.find("DataSource") is not None else None
s_desc = xroot.find("ProblemDescription").text if xroot.find("ProblemDescription") is not None else None
s_question = xroot.find("Question").text if xroot.find("Question") is not None else None
s_ans = xroot.find("Answer").text if xroot.find("Answer") is not None else None
s_label = xroot.find("Label").text if xroot.find("Label") is not None else None
s_contextrequired = xroot.find("ContextRequired").text if xroot.find("ContextRequired") is not None else None
s_extraInfoinAnswer = xroot.find("ExtraInfoInAnswer").text if xroot.find("ExtraInfoInAnswer") is not None else None
s_comments = xroot.find("Comments").text if xroot.find("Comments") is not None else None
s_watch = xroot.find("Watch").text if xroot.find("Watch") is not None else None
s_referenceAnswers = xroot.find("ReferenceAnswers").text if xroot.find("ReferenceAnswers") is not None else None
rows.append({"ID": s_name,"StudentID":s_student, "TaskID": s_task,
"DataSource": s_source, "ProblemDescription": s_desc ,
"Question": s_question , "Answer": s_ans ,"Label": s_label,
"ContextRequired": s_contextrequired , "ExtraInfoInAnswer": s_extraInfoinAnswer ,
"Comments": s_comments , "Watch": s_watch, "ReferenceAnswers": s_referenceAnswers
})
out_df = pd.DataFrame(rows, columns = df_cols)
</code></pre>
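<p>As a possible shortening of the repeated ternary lookups, <code>Element.findtext()</code> from the standard library returns the text of the first matching child, or <code>None</code> by default when there is no match. The sketch below is not the code above rewritten in full; the root tag <code>Instance</code>, the sample values, and the <code>fields</code> list are assumptions for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree; the root tag "Instance" and the values are assumptions.
xroot = ET.Element("Instance", {"ID": "1"})
ET.SubElement(xroot, "StudentID").text = "S01"
ET.SubElement(xroot, "TaskID").text = "T07"

# Illustrative subset of the child names used in this answer.
fields = ["StudentID", "TaskID", "Comments"]

row = {"ID": xroot.attrib.get("ID")}
for name in fields:
    # findtext returns None (its default) when the child is missing
    row[name] = xroot.findtext(name)

rows = [row]
```

<p>One difference to keep in mind: for a child that exists but has no text, <code>findtext()</code> returns an empty string, whereas <code>.text</code> gives <code>None</code>.</p>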
<p>Alternatively, run a more dynamic version that assigns to an inner dictionary using the iterator variable:</p>
^{pr2}$
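<p>One way such a dynamic version might look, as a minimal sketch: the root tag <code>Instance</code>, the sample values, and the name <code>inner</code> are assumptions for illustration, not necessarily the exact code intended here.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree; the root tag "Instance" and the values are assumptions.
xroot = ET.Element("Instance", {"ID": "1"})
ET.SubElement(xroot, "StudentID").text = "S01"
ET.SubElement(xroot, "Question").text = "What is 6 x 7?"
ET.SubElement(xroot, "Answer").text = "42"

rows = []
inner = {"ID": xroot.attrib.get("ID")}   # start with the root attribute
for node in xroot:
    # each direct child contributes one key: tag name -> text content
    inner[node.tag] = node.text
rows.append(inner)
```

<p>The resulting <code>rows</code> list holds a single dictionary and can be passed to <code>pd.DataFrame(rows, columns = df_cols)</code> as in the other versions; tags absent from the XML simply never become keys, so no <code>None</code> check is needed.</p>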
<p>Or a list/dict comprehension:</p>
<pre><code>rows = [{node.tag: node.text for node in xroot}]
out_df = pd.DataFrame(rows, columns = df_cols)
</code></pre>