How to move strings that start with a certain value into a separate column?

Posted 2024-09-28 21:49:12


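I have the following data, and I want every value that starts with SAR to end up in the "price" column. The "SAR" values are scattered across different columns of the DataFrame: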

name,rating,random,price
"Microwave Oven Sharp 20 Litres, White, R-20AS-W",5.0 out of 5 stars,3,SAR 199.00 
"REBUNE ELECTRIC OVEN 10L, RE1016",SAR 149.00 ,,
Geepas 20 Liter Microwave Oven - GMO1894,SAR 186.00 ,,
Nikai Microwave - 20 LTR -NMO515N8N,5.0 out of 5 stars,3,SAR 192.15 
LG 42 Liter Neo Chef Inverter Microwave with Grill - MH8265CIS,"SAR 1,050.00 ",,

I want the result to look like the following. Where data is not available, the field should say "Unavailable":

name,rating,random,price
"Microwave Oven Sharp 20 Litres, White, R-20AS-W",5.0 out of 5 stars,3,SAR 199.00 
"REBUNE ELECTRIC OVEN 10L, RE1016",Unavailable,Unavailable,SAR 149.00 
Geepas 20 Liter Microwave Oven - GMO1894,Unavailable,Unavailable,SAR 186.00 
Nikai Microwave - 20 LTR -NMO515N8N,5.0 out of 5 stars,3,SAR 192.15 
LG 42 Liter Neo Chef Inverter Microwave with Grill - MH8265CIS,Unavailable,Unavailable,"SAR 1,050.00 "

2 Answers

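Here I would not use pandas but stick with the good old csv module: this file can be processed row by row, whereas pandas' strength lies in operating on whole columns.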

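The algorithm is simple: test every field in a row, and if it starts with SAR, move it to the price column and set the field it came from to Unavailable. Any field that is empty is also set to Unavailable.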

The code could be:

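import csv

NA = 'Unavailable'
PRICE = 3  # index of the "price" column

with open('input.csv', newline='') as fdin, open('output.csv', 'w', newline='') as fdout:
    rd = csv.reader(fdin)
    wr = csv.writer(fdout)
    wr.writerow(next(rd))  # copy the header unchanged
    for row in rd:
        row += [''] * (PRICE + 1 - len(row))  # pad short rows so row[PRICE] exists
        for i in range(PRICE):  # every field before the price column
            if row[i].startswith('SAR'):
                row[PRICE] = row[i]  # move the price into its own column
                row[i] = NA
            elif not row[i]:
                row[i] = NA  # empty field
        if not row[PRICE]:
            row[PRICE] = NA
        wr.writerow(row)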
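To try the row-level approach without touching any files, here is a minimal sketch that wraps the same logic in a helper and runs it on the sample data from the question via `io.StringIO`. The name `fix_row` is hypothetical (not from the answer), and the sketch assumes the price belongs in the fourth column (index 3) and that each row contains at most one field starting with `SAR`.

```python
import csv
import io

NA = 'Unavailable'
PRICE = 3  # assumed index of the "price" column


def fix_row(row):
    """Return a copy of row with any 'SAR ...' field moved to the price column
    and every empty field replaced by NA (hypothetical helper)."""
    row = row + [''] * (PRICE + 1 - len(row))  # pad rows that have too few fields
    for i in range(PRICE):  # only the fields before the price column
        if row[i].startswith('SAR'):
            row[PRICE] = row[i]
            row[i] = NA
        elif not row[i]:
            row[i] = NA
    if not row[PRICE]:
        row[PRICE] = NA
    return row


SAMPLE = '''name,rating,random,price
"Microwave Oven Sharp 20 Litres, White, R-20AS-W",5.0 out of 5 stars,3,SAR 199.00
"REBUNE ELECTRIC OVEN 10L, RE1016",SAR 149.00 ,,
Geepas 20 Liter Microwave Oven - GMO1894,SAR 186.00 ,,
Nikai Microwave - 20 LTR -NMO515N8N,5.0 out of 5 stars,3,SAR 192.15
LG 42 Liter Neo Chef Inverter Microwave with Grill - MH8265CIS,"SAR 1,050.00 ",,
'''

rd = csv.reader(io.StringIO(SAMPLE))
out = io.StringIO()
wr = csv.writer(out)
wr.writerow(next(rd))  # header unchanged
rows = [fix_row(r) for r in rd]
wr.writerows(rows)  # csv.writer re-quotes fields that contain commas
print(out.getvalue())
```

Padding with empty strings guards against rows written without their trailing commas; in the sample every row already has four fields, so the padding is a no-op there.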

One way is to split the string into lists and pull the prices out of them:

a = ''' "Microwave Oven Sharp 20 Litres, White, R-20AS-W",5.0 out of 5 stars,3,SAR 199.00 
    "REBUNE ELECTRIC OVEN 10L, RE1016",Unavailable,Unavailable,SAR 149.00 
    Geepas 20 Liter Microwave Oven - GMO1894,Unavailable,Unavailable,SAR 186.00 
    Nikai Microwave - 20 LTR -NMO515N8N,5.0 out of 5 stars,3,SAR 192.15 
    LG 42 Liter Neo Chef Inverter Microwave with Grill - MH8265CIS,Unavailable,Unavailable,"SAR 1,050.00 " '''

# split the text on 'SAR ', then keep each piece up to its first space (the amount); the first piece comes before any 'SAR', so drop it
b = [x.split(' ')[0] for x in a.split('SAR ')][1:] 

# convert to float
c = [float(x.replace(',','')) for x in b]
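If the goal is only to extract the numeric prices, as this answer does, a regular expression can replace the two split steps. This is an alternative technique, not the answer's own, and it assumes every price is written as `SAR`, a single space, then digits with optional thousands separators and an optional decimal part.

```python
import re

text = '''"Microwave Oven Sharp 20 Litres, White, R-20AS-W",5.0 out of 5 stars,3,SAR 199.00
"REBUNE ELECTRIC OVEN 10L, RE1016",Unavailable,Unavailable,SAR 149.00
Geepas 20 Liter Microwave Oven - GMO1894,Unavailable,Unavailable,SAR 186.00
Nikai Microwave - 20 LTR -NMO515N8N,5.0 out of 5 stars,3,SAR 192.15
LG 42 Liter Neo Chef Inverter Microwave with Grill - MH8265CIS,Unavailable,Unavailable,"SAR 1,050.00 "'''

# capture the amount that follows "SAR " (e.g. "1,050.00"), then drop the thousands separators
prices = [float(m.replace(',', '')) for m in re.findall(r'SAR ([\d,]+(?:\.\d+)?)', text)]
print(prices)  # [199.0, 149.0, 186.0, 192.15, 1050.0]
```

Because `re.findall` returns only the captured group, there is no leading piece to discard as with `split`.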
