Find a column by name and keep certain strings across all of its values

Posted 2024-10-03 21:34:36


I want to format the "Status" column of a CSV file and keep only the strings that appear in single quotes followed by a comma ('sometext',).

For example:

Input: [image]

As in rows 2 & 3: if a column value contains more than one value, they should be joined with a pipe symbol (|), e.g. Phone|Charger.

The expected output should be written back into the same Status column, as shown below:

[image: expected output]

My attempt (not working):

import re
import pandas as pd

df = pd.read_csv("test projects.csv")
scol = df.columns.get_loc("Status")  # get_loc returns the column's integer position, not its values
statusRegex = re.compile("'\t',"?"'\t',")
mo = statusRegex.search(scol.column)
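Because `df.columns.get_loc` returns an integer, `scol.column` cannot work; the pattern has to be run against each cell's text instead. A minimal per-cell sketch, assuming every wanted value appears in the cell text as `'sometext',` (the sample cells below are hypothetical), using pandas' `Series.str.findall` and `Series.str.join`:

```python
import pandas as pd

# Hypothetical sample cells in the "'sometext'," shape described in the question.
df = pd.DataFrame({'Status': ["'Laptop',", "'Phone', 'Charger',"]})

# Capture the text between single quotes when the closing quote is followed by a comma.
pattern = r"'([^']*)',"

# str.findall gives a list of captures per cell; str.join glues each list with '|'.
df['Status'] = df['Status'].str.findall(pattern).str.join('|')

print(df['Status'].tolist())  # ['Laptop', 'Phone|Charger']
```

If the real cells use a different layout, only `pattern` should need to change.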

3 answers

Suppose you have df as:

df = pd.DataFrame([[[{'a':'1', 'b': '4'}]], [[{'a':'1', 'b': '2'}, {'a':'3', 'b': '5'}]]], columns=['pr'])

df:

     pr
0   [{'a': '1', 'b': '4'}]
1   [{'a': '1', 'b': '2'}, {'a': '3', 'b': '5'}]

df['comb'] = df.pr.apply(lambda x: '|'.join([i['a'] for i in x]))

df:

    pr                                              comb
0   [{'a': '1', 'b': '4'}]                          1
1   [{'a': '1', 'b': '2'}, {'a': '3', 'b': '5'}]    1|3
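The same `comb` column can also be sketched by exploding the lists so each dict gets its own row, then re-joining per original row label. This is an alternative technique (`Series.explode` plus `groupby(level=0)`), not the answer's own method; like the answer, it assumes every dict has an `'a'` key.

```python
import pandas as pd

df = pd.DataFrame(
    {'pr': [[{'a': '1', 'b': '4'}], [{'a': '1', 'b': '2'}, {'a': '3', 'b': '5'}]]}
)

# One row per dict; the original row label is repeated for each element.
exploded = df['pr'].explode()

# Pull the 'a' value out of each dict.
a_vals = exploded.map(lambda d: d['a'])

# Re-join the values that share an original row label with a pipe.
df['comb'] = a_vals.groupby(level=0).agg('|'.join)

print(df['comb'].tolist())  # ['1', '1|3']
```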

Thanks, everyone, for the help and suggestions. Here is the final working code:

import re
import pandas as pd

df = pd.read_csv('test projects.csv')

rows = len(df['input'])

def get_values(value):
    # collect every substring enclosed in single quotes
    m = re.findall("'(.+?)'", value)
    word = ""
    for mm in m:
        # skip keys and labels that are not actual values
        if 'value' not in str(mm):
            if 'autolabel_strategy' not in str(mm):
                if 'String Matching' not in str(mm):
                    word += mm + "|"
    # drop the trailing pipe
    return str(word).rsplit('|', 1)[0]

al_lst = []
ans_lst = []

for r in range(rows):
    auto_label = df['autolabeledValues'][r]
    answers = df['answers'][r]
    al = get_values(auto_label)
    ans = get_values(answers)
    al_lst.append(al)
    ans_lst.append(ans)

df['a'] = al_lst
df['b'] = ans_lst

df.to_csv("Output.csv", index=False)
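The nested `if` checks and the manual loop can be condensed into one filter plus `Series.apply`. A minimal sketch, assuming the same three excluded substrings and string cells; the sample cell text is hypothetical, since the real CSV contents are not shown:

```python
import re

import pandas as pd

# Substrings the answer's loop skips (keys and labels rather than data).
EXCLUDED = ('value', 'autolabel_strategy', 'String Matching')


def get_values(value: str) -> str:
    """Pipe-join every single-quoted token that contains none of EXCLUDED."""
    tokens = re.findall(r"'(.+?)'", value)
    return '|'.join(t for t in tokens if not any(x in t for x in EXCLUDED))


# Hypothetical cell text for illustration only.
df = pd.DataFrame({
    'autolabeledValues': [
        "[{'value': 'Laptop', 'autolabel_strategy': 'String Matching'}]",
        "[{'value': 'Phone'}, {'value': 'Charger'}]",
    ]
})

# Series.apply replaces the explicit index loop.
df['a'] = df['autolabeledValues'].apply(get_values)
print(df['a'].tolist())  # ['Laptop', 'Phone|Charger']
```

Joining a filtered generator avoids building a string with a trailing pipe and then stripping it.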

import pandas as pd

# simplified mock data
df = pd.DataFrame(dict(
    value=[23432] * 3,
    Status=[
        [{'product.type': 'Laptop'}],
        [{'product.type': 'Laptop'}, {'product.type': 'Charger'}],
        [{'product.type': 'TV'}, {'product.type': 'Remote'}]
    ]
))

# make a method to do the desired formatting / extraction of data
def da_piper(cell):
    """extracts product.type and concatenates with a pipe"""
    vals = [_['product.type'] for _ in cell]  # get only the product.type values
    return '|'.join(vals)  # join them with a pipe

# save to desired column
df['output'] = df['Status'].apply(da_piper)  # apply the method to the Status col

Additional help: you don't need read_excel, because a CSV file is not in Excel format; it is plain comma-separated values. In that case you can do this:

import ast
import pandas as pd

# make a method to do the desired formatting / extraction of data
def da_piper(cell):
    """extracts product.type and concatenates with a pipe"""
    vals = [_['product.type'] for _ in cell]  # get only the product.type values
    return '|'.join(vals)  # join them with a pipe

# read csv to dataframe
df = pd.read_csv("test projects.csv")

# read_csv returns each Status cell as text, e.g. "[{'product.type': 'Laptop'}]";
# assuming that text is a Python literal, parse it back into a list of dicts first
df['Status'] = df['Status'].apply(ast.literal_eval)

# apply method and save to desired column
df['Status'] = df['Status'].apply(da_piper)  # apply the method to the Status col
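A list-of-dicts cell does not survive a CSV round trip as a list: `DataFrame.to_csv` writes its `str()` form and `read_csv` reads it back as plain text. A self-contained sketch of that round trip, assuming the cells hold Python-literal text; `io.StringIO` stands in for the file on disk and the data is the mock data from the answer above:

```python
import ast
import io

import pandas as pd


def da_piper(cell):
    """Extract product.type from each dict and join the values with a pipe."""
    return '|'.join(d['product.type'] for d in cell)


# Mock data from the answer, written to an in-memory CSV.
original = pd.DataFrame({
    'value': [23432] * 3,
    'Status': [
        [{'product.type': 'Laptop'}],
        [{'product.type': 'Laptop'}, {'product.type': 'Charger'}],
        [{'product.type': 'TV'}, {'product.type': 'Remote'}],
    ],
})
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

# Reading it back yields plain strings in the Status column.
df = pd.read_csv(buffer)
print(type(df.loc[0, 'Status']))  # <class 'str'>

# Parse each string back into a list of dicts, then format it.
df['Status'] = df['Status'].apply(ast.literal_eval).apply(da_piper)
print(df['Status'].tolist())  # ['Laptop', 'Laptop|Charger', 'TV|Remote']
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval` for text read from a file.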
