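<p>Assuming I understand correctly, you can split the input string that contains all the column names (separated by whitespace), use a dictionary comprehension to build a dict keyed by those names, and then create an empty DataFrame from it.</p>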
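<pre><code>import re
import pandas as pd

# Sample input; the exact column widths are assumed.
string = """
                                                  MODIFIED
CORE SERVER   ACTIVE            PASSIVE           PACKAGES
------------- ----------------- ----------------- --------
cs010         1.9.2.0-2+auto166 1.9.2.0-2+auto146 no
"""

# Keep only the header text that precedes the first hyphen.
header = string.split("-")[0]
# Split on newlines or on runs of two or more whitespace characters,
# so the single space inside a name such as "CORE SERVER" is preserved.
col_names = {name: [] for name in re.split(r"\s{2,}|\n", header) if name}
df = pd.DataFrame(col_names)
print(col_names)
print(df)
# Output:
# {'MODIFIED': [], 'CORE SERVER': [], 'ACTIVE': [], 'PASSIVE': [], 'PACKAGES': []}
# Empty DataFrame
# Columns: [MODIFIED, CORE SERVER, ACTIVE, PASSIVE, PACKAGES]
# Index: []
</code></pre>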
<p>If you want to use a regular expression, the documentation for regex splitting is here: <a href="https://docs.python.org/3/library/re.html#re.split" rel="nofollow noreferrer">re.split()</a>.</p>
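<p>To illustrate why the separator pattern matters, here is a minimal sketch. The header line is a made-up example, not your real output: splitting on any whitespace breaks a two-word name apart, while requiring at least two whitespace characters keeps it together.</p>

```python
import re

# Hypothetical header line: names are separated by two or more spaces,
# while "CORE SERVER" itself contains a single space.
line = "CORE SERVER   ACTIVE      PASSIVE"

# Splitting on any run of whitespace breaks the two-word name apart.
by_any_space = re.split(r"\s+", line)

# Requiring at least two whitespace characters keeps it together.
by_wide_gap = re.split(r"\s{2,}", line)

print(by_any_space)  # ['CORE', 'SERVER', 'ACTIVE', 'PASSIVE']
print(by_wide_gap)   # ['CORE SERVER', 'ACTIVE', 'PASSIVE']
```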
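<p>Because a column header in your output can wrap onto a second line, but the row of hyphens appears to mark the column widths, you could use something like:</p>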
<pre><code>import re
import pandas as pd

# Sample input; the exact column widths are assumed.
string = """
                                                  MODIFIED
CORE SERVER   ACTIVE            PASSIVE           PACKAGES
------------- ----------------- ----------------- --------
cs010         1.9.2.0-2+auto166 1.9.2.0-2+auto146 no
"""

rows = re.split(r"\n|\r", string)

# The hyphen row marks the column boundaries: each whitespace
# position in it is where one column ends and the next begins.
indices = [0]
for row in rows:
    if " -" in row:
        indices += [i for i, ch in enumerate(row) if ch.isspace()]
        # Once the column widths are known, stop checking rows.
        break

# Slice every header row (everything above the hyphen row) at those positions.
matrix = []
for row in rows:
    if " -" in row:
        break
    matrix.append([row[i:j] for i, j in zip(indices, indices[1:] + [None])])

# Join the pieces of each column from top to bottom.
col_names = ["".join(column).strip() for column in zip(*matrix)]
df = pd.DataFrame(columns=[c for c in col_names if c != ""])
print(df)
# Output:
# Empty DataFrame
# Columns: [CORE SERVER, ACTIVE, PASSIVE, MODIFIED PACKAGES]
# Index: []
</code></pre>
<p>This code is not the most efficient thing ever written, but it gets the job done without needing many extra functions.</p>
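<p>The same hyphen-row idea can be written more compactly by reading each column's span directly from the runs of hyphens. This is only a sketch under assumptions: <code>column_names</code> is a hypothetical helper, the sample table's column widths are assumed, and every header word is expected to sit within the span of its column's hyphens.</p>

```python
import re

def column_names(text):
    """Hypothetical helper: read column names from the header rows
    that sit above a ruler row made of runs of hyphens."""
    lines = text.splitlines()
    # The ruler is the first non-empty line made only of hyphens and spaces.
    ruler_at = next(
        i for i, line in enumerate(lines)
        if line.strip() and set(line).issubset({"-", " "})
    )
    # Each run of hyphens gives the (start, end) span of one column.
    spans = [m.span() for m in re.finditer(r"-+", lines[ruler_at])]
    names = []
    for start, end in spans:
        # Collect this column's text from every header row, top to bottom.
        parts = [line[start:end].strip() for line in lines[:ruler_at]]
        names.append(" ".join(part for part in parts if part))
    return names

# Sample table; the column widths are assumed.
sample = """
                                                  MODIFIED
CORE SERVER   ACTIVE            PASSIVE           PACKAGES
------------- ----------------- ----------------- --------
cs010         1.9.2.0-2+auto166 1.9.2.0-2+auto146 no
"""

names = column_names(sample)
print(names)  # ['CORE SERVER', 'ACTIVE', 'PASSIVE', 'MODIFIED PACKAGES']
```

<p>Passing the result to <code>pd.DataFrame(columns=names)</code> would then give an empty DataFrame with those columns.</p>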