Building a CSV file from tagged text

Posted 2024-05-18 15:33:26


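First of all, I need help with the approach. (If anyone has already faced this problem and has sample code, that would be much appreciated.)

Suppose you have a product, for example soap. Its internal description is a series of tags (inside a text file).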

line 1 productName:SOAP1, productCategory:Bath, productSubCategory: Soap, bla, bla, bla
line 2 productName:SOAP2, productCategory:Bath, productSubCategory: Soap, bla, bla, bla
line 3 productName:SOAP3, productCategory:Bath, productSubCategory: Soap, bla, bla, bla

Every column contains a ":".

I need to convert these tags to CSV with Python code, like this:

productName    productCategory    productSubCategory
  SOAP1             Bath                 Soap
  SOAP2             Bath                 Soap
  SOAP3             Bath                 Soap

I don't know what the best way to do this is.


3 answers

This allows you to have dynamic headers.

import pandas as pd

# Assumes every field in the file has the form tag:value.
# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(r'yourfile.txt', header=None, skipinitialspace=True)
print(df)
#                0                     1                         2
#productName:SOAP1, productCategory:Bath, productSubCategory: Soap
#productName:SOAP2, productCategory:Bath, productSubCategory: Soap
#productName:SOAP3, productCategory:Bath, productSubCategory: Soap

# The tag names in the first row become the column headers
headerlist = []
for x in df.loc[0, :]:
    headerlist.append(x.split(':', 1)[0].strip())

# Keep only the value after the first ':' in every cell
for x in df.index:
    for y in df.columns:
        df.loc[x, y] = df.loc[x, y].split(':', 1)[1].strip()
df.columns = headerlist

print(df)
#  productName  productCategory  productSubCategory
#0       SOAP1             Bath                Soap
#1       SOAP2             Bath                Soap
#2       SOAP3             Bath                Soap

Interestingly, you can use the csv module both to read the input and to write the output file.

import csv

inp_filename = 'tagged.txt'
out_filename = 'csv_from_tagged.csv'

with open(inp_filename, 'r', newline='') as inp:
    # Take the tag names from the first line; strip the blank after each comma
    line = next(inp)
    fieldnames = [elem.split(':', 1)[0].strip() for elem in line.split(',')]

    inp.seek(0)  # Rewind

    with open(out_filename, 'w', newline='') as outp:
        csv_writer = csv.DictWriter(outp, fieldnames)
        csv_writer.writeheader()

        for row in csv.reader(inp, skipinitialspace=True):
            # Split each item on the first ':' only, so values may contain ':'
            pairs = (elem.split(':', 1) for elem in row)
            as_dict = {tag.strip(): value.strip() for tag, value in pairs}
            csv_writer.writerow(as_dict)

print('done')
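
The code above reads the field names from the first line only and expects every item to contain a ":". A minimal sketch of a more forgiving variant, assuming that items without a ":" (such as the bla entries in the question) can simply be dropped, that values contain no commas, and that the header should be the union of all tags in first-seen order. `parse_tagged_line` and `tagged_to_csv` are hypothetical helper names, not part of the csv module:

```python
import csv
import io


def parse_tagged_line(line):
    """Turn 'tag:value, tag:value' into a dict, skipping items without ':'."""
    row = {}
    for item in line.split(','):
        if ':' not in item:
            continue  # assumption: untagged items such as 'bla' are dropped
        tag, value = item.split(':', 1)
        row[tag.strip()] = value.strip()
    return row


def tagged_to_csv(lines, outp):
    """Write the tagged lines to outp as CSV; the header is the union of all tags."""
    rows = [parse_tagged_line(line) for line in lines if line.strip()]
    fieldnames = []
    for row in rows:
        for tag in row:
            if tag not in fieldnames:
                fieldnames.append(tag)
    # restval fills the cells of tags that a given line does not carry
    writer = csv.DictWriter(outp, fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)


sample = [
    'productName:SOAP1, productCategory:Bath, productSubCategory: Soap, bla',
    'productName:SOAP2, productCategory:Bath, productColor:White',
]
buffer = io.StringIO()
tagged_to_csv(sample, buffer)
print(buffer.getvalue())
```

With a real file, `tagged_to_csv` could be given the open input file as `lines` and an output file opened with `newline=''` as `outp`. All rows are held in memory before writing, which is fine for small files but may not suit very large ones.
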
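
import re
import csv

columns = ['productName', 'productCategory', 'productSubCategory']

with open('data.txt') as infile:
    # newline='' keeps the csv module from writing blank rows on Windows
    with open('result.csv', 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, columns)
        writer.writeheader()
        for line in infile:
            row = {}
            for column in columns:
                # The value runs lazily up to the next ", " or the end of the line
                pattern = re.escape(column) + ':(.+?)(, |$)'
                match = re.search(pattern, line)
                # A missing tag yields an empty cell instead of an AttributeError
                row[column] = match.group(1) if match else ''
            writer.writerow(row)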


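If you are not familiar with regular expressions, now is a good time to do some googling and reading.

This solution assumes that each item has the form <tag>:<value>, followed by either (1) a comma and a space (", ") or (2) the end of the line (represented by $ in the regex). If a value contains ", ", the result will be incorrect. Any whitespace after the : is included in the value.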
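
The whitespace caveat can be loosened with a small change to the pattern. A sketch, assuming values never contain a comma: `\s*` after the colon swallows leading blanks, and a lookahead stops the value at the next comma or at the end of the line. `extract` is a hypothetical helper name introduced only for this illustration:

```python
import re


def extract(column, line):
    """Return the value tagged with `column` in `line`, or '' if the tag is absent."""
    # \b keeps the tag from matching inside a longer tag name;
    # re.escape guards against special characters in the tag.
    pattern = r'\b' + re.escape(column) + r':\s*(.*?)\s*(?=,|$)'
    match = re.search(pattern, line)
    return match.group(1) if match else ''


line = 'productName:SOAP1, productCategory:Bath, productSubCategory: Soap, bla'
columns = ['productName', 'productCategory', 'productSubCategory']
print([extract(column, line) for column in columns])
```

The limitation on values that contain the separator remains: here any comma inside a value cuts it short.
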
