<p>To detect cell styles, you can use an external package such as <a href="https://github.com/deepspace2/StyleFrame" rel="nofollow noreferrer">styleframe</a>. Repeat steps 1 and 2 for each example file.</p>
<ol>
<li>Read the example file and find the indices of the rows whose <code>style</code> is bold</li>
</ol>
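<pre><code>from styleframe import StyleFrame

# read_style=True keeps each cell's formatting so that bold cells can be detected
sf = StyleFrame.read_excel('Example-1.xlsx', read_style=True, use_openpyxl_styles=False)

# Collect the index of every row that contains at least one bold cell
indices = []
for i in range(len(sf)):
    for val in sf.iloc[i]:
        if val.style.bold:
            indices.append(i)
            break  # record each row only once
</code></pre>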
<ol start="2">
<li>Find the values between those indices</li>
</ol>
<pre><code>import pandas as pd

df = pd.read_excel("Example-1.xlsx")
df = df.astype(str)

columns = []
values = []
for i in range(len(indices)):
    # The first cell of a bold row becomes the column name
    columns.append(df.iloc[indices[i]].values[0])
    if i + 1 < len(indices):
        # Rows between this bold row and the next one
        values.append(list(df.iloc[indices[i] + 1:indices[i + 1]].values))
    else:
        # Rows after the last bold row; the slice is empty if there are none
        values.append(list(df.iloc[indices[i] + 1:].values))

# Join the first cell of each row in a group into one string
values = list(map(lambda z: " ".join([x[0] for x in z]), values))
temp_dict = dict(zip(columns, values))
</code></pre>
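<p>The slicing logic in step 2 can be tried without an Excel file. The following is a minimal sketch using plain Python lists; the helper name <code>group_rows</code> and the sample rows are hypothetical and only stand in for the first column of the sheet and the bold row indices found in step 1.</p>

```python
def group_rows(rows, header_indices):
    """Map each header row's text to the joined text of the rows below it."""
    result = {}
    for pos, start in enumerate(header_indices):
        # A section ends where the next header starts, or at the end of the data
        if pos + 1 < len(header_indices):
            end = header_indices[pos + 1]
        else:
            end = len(rows)
        result[rows[start]] = " ".join(rows[start + 1:end])
    return result


# Hypothetical sheet contents: rows 0 and 3 stand in for the bold rows
rows = ["Name", "John", "Smith", "City", "Paris"]
sections = group_rows(rows, [0, 3])
print(sections)
```

<p>A header with no rows below it maps to an empty string, which matches the behaviour of the code in step 2.</p>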
<ol start="3">
<li>The following code builds the final DataFrame in the required shape</li>
</ol>
<pre><code># Collect one dict per example file, then build the final DataFrame
final_dict = []
final_dict.append(temp_dict)
final_df = pd.DataFrame(final_dict)
</code></pre>
<p>The <code>Example File</code> must contain an extra header row to reduce ambiguity:</p>
<p><a href="https://i.stack.imgur.com/oqJZT.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/oqJZT.png" alt="Sample Input"/></a></p>