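<p>Simply read the text file directly, using <code>:</code> as the separator. Then subset the DataFrame to keep only the <code>Detected Text</code> rows, and add the needed column with <code>assign</code>. Wrap all of this in a list comprehension with chained calls:</p>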
<pre class="lang-py prettyprint-override"><code>import glob
import os

import pandas as pd

file_list = glob.glob(os.path.join(os.getcwd(), "text_files", "*.txt"))

# BUILD LIST OF DFs WITH LIST COMPREHENSION
df_list = [(pd.read_csv(file_path, sep=":", header=0, names=['key', 'text'])  # IMPORT COLON-SEPARATED TEXT FILE
              .query("key=='Detected Text'")                                   # SUBSET DF FOR NEEDED ROWS
              .drop(['key'], axis='columns')                                   # DROP UNNEEDED COLUMN
              .assign(id = os.path.basename(file_path).split('_')[0])          # EXTRACT STRING BEFORE UNDERSCORE
              .reindex(['id', 'text'], axis='columns')                         # RE-ORDER COLUMNS
           ) for file_path in file_list]

final_df = pd.concat(df_list, ignore_index=True)                               # VERTICALLY STACK ALL DFs
</code></pre>
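<p>To check the chain without touching the file system, it can be run on an in-memory buffer. The following is only a minimal sketch: the sample contents, the file name <code>abc123_output.txt</code> and the helper <code>parse_detected_text</code> are hypothetical, made up for illustration. It also assumes the file has no header row (<code>header=None</code>, unlike <code>header=0</code> above, which discards the first line) and that values follow the colon after a space (<code>skipinitialspace=True</code>).</p>

```python
import io
import os

import pandas as pd

# Hypothetical file contents and file name, invented for illustration only.
sample_text = (
    "Id: 1\n"
    "Detected Text: hello\n"
    "Confidence: 99.1\n"
    "Detected Text: world\n"
)
sample_path = os.path.join("text_files", "abc123_output.txt")


def parse_detected_text(buffer, file_path):
    """Return one row per 'Detected Text' line, tagged with the file's id prefix."""
    return (
        pd.read_csv(
            buffer,
            sep=":",
            header=None,            # assumption: first line is data, not a header
            names=["key", "text"],
            skipinitialspace=True,  # drop the space that follows each colon
            dtype=str,              # keep every value as text
        )
        .query("key == 'Detected Text'")                       # keep only the needed rows
        .drop(columns="key")                                   # drop the helper column
        .assign(id=os.path.basename(file_path).split("_")[0])  # string before first underscore
        .reindex(["id", "text"], axis="columns")               # re-order columns
        .reset_index(drop=True)
    )


demo_df = parse_detected_text(io.StringIO(sample_text), sample_path)
print(demo_df)
```

<p>If the real files do start with a header line, switching back to <code>header=0</code> would be the matching choice.</p>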