Need a regular expression to split a string in Python

Posted 2024-09-27 21:22:42

str = 'FW201703002082017MF0164EXESTBOPF01163500116000 0001201700258000580000116000.WALTERS BAY BOGAWANTALAWA 1M'

The string above needs to be split into the following fields:

Borkername = FW
Sale year = 2017
Saleno = 0300
sale_dte = 20.08.2017 # date needs to be formatted
Factoryno = MF0164
Catalogu code= EXEST
Grade =BOPF
Gross weight =01163.50 #decimal point needed
Net Weight = 01163.50 #decimal point needed
Lot_No = 0001
invoice_year = 2017
invoice_no = 00258
price = 000580.00 #decimal point needed
Netweight = 01160.00 #decimal point needed
Buyer = 'WALTERS BAY BOGAWANTALAWA'
Buyer_code = '1M'

It is a single line with no delimiters. Could you help me write a regular expression in Python that separates each field into its own pandas column?

For example:

(\A[A-Z]{2}) 

This gives me the first two characters. How can I get the next 4 digits as the year?
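A minimal sketch of that next step, assuming the record always starts with two capital letters followed by a four-digit year: append a `(\d{4})` group to the same pattern. The variable is renamed `s` here (an illustrative choice) so it does not shadow the built-in `str`.

```python
import re

# Sample record from the question
s = 'FW201703002082017MF0164EXESTBOPF01163500116000 0001201700258000580000116000.WALTERS BAY BOGAWANTALAWA 1M'

# \A anchors at the start of the string; [A-Z]{2} captures the two-letter code
# and \d{4} captures the four digits that follow it as the year
m = re.match(r'\A([A-Z]{2})(\d{4})', s)
broker, sale_year = m.groups()
print(broker, sale_year)  # FW 2017
```

Each further field is just another fixed-width group appended to the pattern, which is the approach the answer below takes.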


1 answer

Answer 1 · posted 2024-09-27 21:22:42

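You'll need to do it in two passes. First, use a regular expression to split the string into (mostly) fixed-length segments. Then take the returned list and fix up the individual fields into the required format by hand. For example: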

import re            
import csv

headings = [
    "Borkername", "Sale year", "Saleno", "sale_dte", "Factoryno", "Catalogu code", "Grade", "Gross weight", 
    "Net Weight", "Lot_No", "invoice_year", "invoice_no", "price", "Netweight", "Buyer", "Buyer_code"]

re_fields = re.compile(r'(.{2})(.{4})(.{3})(.{8})(.{6})(.{5})(.{4})(.{7})(.{7}) (.{4})(.{4})(.{5})(.{8})(.{7}).(.*?) (.{2})$')

with open('input.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_writer = csv.writer(f_output)
    csv_writer.writerow(headings)

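    for line in f_input:
        # match() returns None if a line does not fit the fixed-width layout,
        # in which case .groups() raises AttributeError
        fields = list(re_fields.match(line).groups())

        # sale_dte: DDMMYYYY -> DD.MM.YYYY
        fields[3] = "{}.{}.{}".format(fields[3][:2], fields[3][2:4], fields[3][4:])
        # Weights and price carry two implied decimal places
        fields[7] = float("{}.{}".format(fields[7][:5], fields[7][5:]))
        fields[8] = float("{}.{}".format(fields[8][:5], fields[8][5:]))
        fields[12] = float("{}.{}".format(fields[12][:6], fields[12][6:]))
        fields[13] = float("{}.{}".format(fields[13][:5], fields[13][5:]))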

        csv_writer.writerow(fields)

This gives you an output.csv containing:

Borkername,Sale year,Saleno,sale_dte,Factoryno,Catalogu code,Grade,Gross weight,Net Weight,Lot_No,invoice_year,invoice_no,price,Netweight,Buyer,Buyer_code
FW,2017,030,02.08.2017,MF0164,EXEST,BOPF,1163.5,1160.0,0001,2017,00258,580.0,1160.0,WALTERS BAY BOGAWANTALAWA,1M
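The CSV above stores the weights and price as floats, so the question's 01163.50 comes out as 1163.5. If the two decimal places matter, one option (a swapped-in technique, not part of the original answer) is `decimal.Decimal.scaleb`, which shifts the implied decimal point without binary-float rounding; `datetime.strptime` likewise rejects impossible dates instead of silently re-slicing them. The helper names `implied_decimal` and `ddmmyyyy` are hypothetical.

```python
from datetime import datetime
from decimal import Decimal


def implied_decimal(digits, places=2):
    """Turn a zero-padded digit string with implied decimals into a Decimal."""
    # Decimal('0116350') is 116350; scaleb(-2) moves the decimal point two places left
    return Decimal(digits).scaleb(-places)


def ddmmyyyy(digits):
    """Validate an 8-digit DDMMYYYY string and reformat it as DD.MM.YYYY."""
    return datetime.strptime(digits, "%d%m%Y").strftime("%d.%m.%Y")


print(implied_decimal("0116350"))                  # 1163.50
print(implied_decimal("00058000"))                 # 580.00
print(format(implied_decimal("0116350"), "08.2f"))  # 01163.50 (zero-padded as in the question)
print(ddmmyyyy("02082017"))                        # 02.08.2017
```

These helpers could replace the `float("{}.{}".format(...))` and string-slicing lines in the loop; `csv.writer` writes a `Decimal` using its string form, so the trailing zeros are kept in the file.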

The resulting output.csv can then be read into pandas:

import pandas as pd

data = pd.read_csv('output.csv')
print(data)

which gives:

  Borkername  Sale year  Saleno    sale_dte Factoryno Catalogu code Grade  Gross weight  Net Weight  Lot_No  \
0         FW       2017      30  02.08.2017    MF0164         EXEST  BOPF        1163.5      1160.0       1   
   invoice_year  invoice_no  price  Netweight                      Buyer Buyer_code  
0          2017         258  580.0     1160.0  WALTERS BAY BOGAWANTALAWA         1M
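As an alternative sketch, pandas can split the lines into columns directly with `Series.str.extract`, which names each column after a named group in the pattern. This also avoids a side effect visible in the output above: `read_csv` parses `Saleno` and `Lot_No` as integers, so `030` becomes `30` and `0001` becomes `1`. The snake_case column names are illustrative assumptions (group names must be valid Python identifiers, so headings such as `Sale year` cannot be used directly), and the field widths are copied from `re_fields`.

```python
import pandas as pd

# In practice these would come from the input file,
# e.g. open('input.txt').read().splitlines()
lines = [
    'FW201703002082017MF0164EXESTBOPF01163500116000 0001201700258000580000116000.WALTERS BAY BOGAWANTALAWA 1M',
]

# Same widths as re_fields, with a named group for each field
pattern = (
    r'^(?P<broker>.{2})(?P<sale_year>.{4})(?P<sale_no>.{3})(?P<sale_date>.{8})'
    r'(?P<factory_no>.{6})(?P<catalogue_code>.{5})(?P<grade>.{4})'
    r'(?P<gross_weight>.{7})(?P<net_weight>.{7}) '
    r'(?P<lot_no>.{4})(?P<invoice_year>.{4})(?P<invoice_no>.{5})'
    r'(?P<price>.{8})(?P<net_weight2>.{7})\.'
    r'(?P<buyer>.*?) (?P<buyer_code>.{2})$'
)

# One column per named group; values stay strings, so leading zeros survive.
# A line that does not match produces a row of NaN instead of raising.
df = pd.Series(lines).str.extract(pattern)

# Convert only the columns that should not stay as text
df['sale_date'] = pd.to_datetime(df['sale_date'], format='%d%m%Y')
for col in ['gross_weight', 'net_weight', 'price', 'net_weight2']:
    # two implied decimal places
    df[col] = pd.to_numeric(df[col]) / 100

print(df.T)
```

Skipping the intermediate CSV means no type guessing happens, so the code-like columns keep exactly the characters that were in the record.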
