How to convert a list of strings into a two-column DataFrame based on the string that follows each item

product                ratings
Samsung Galaxy A12     5 out of 5(6)
Screenguard            No rating
Samsung Mos / A02s     4 out of 5(1)
Pillow                 No rating

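2 answers

User

Answer 1 · edited 2024-09-29 01:21:29

This works for the posted data, but I would suggest adjusting the scraping code so that it returns "No rating" whenever a rating is not found.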

import pandas as pd

data = [
    "Samsung Galaxy A12 ",
    "5 out of 5(6)",
    "Screenguard",
    "Samsung Galaxy Mos / A02s ",
    "4 out of 5(1)",
    "Pillow",
]

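# Walk the list once; anything that is not a rating string is a product.
# Using enumerate (rather than data.index) keeps duplicate product names
# from resolving to the position of their first occurrence.
products, ratings = [], []

for idx, item in enumerate(data):
    if "out of" in item:
        continue
    products.append(item)

    # The rating, when present, is the element right after the product.
    if idx + 1 < len(data) and "out of" in data[idx + 1]:
        ratings.append(data[idx + 1])
    else:
        ratings.append("No rating")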

df = pd.DataFrame({'product':products, 'rating': ratings})

Sample Output

                      product         rating
0         Samsung Galaxy A12   5 out of 5(6)
1                 Screenguard      No rating
2  Samsung Galaxy Mos / A02s   4 out of 5(1)
3                      Pillow      No rating
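
The same pairing can also be written as a one-step lookahead, shown here as a minimal sketch rather than a definitive implementation. The function name `pair_products_with_ratings` and its `marker` and `missing` parameters are made up for illustration, and it assumes, as above, that every rating string contains "out of" and directly follows its product. It also strips the trailing spaces from the product names.

```python
import pandas as pd


def pair_products_with_ratings(items, marker="out of", missing="No rating"):
    """Pair each non-rating string with the rating that follows it, if any."""
    rows = []
    # Shift the list by one so each item is seen together with its successor.
    for current, following in zip(items, items[1:] + [None]):
        if marker in current:
            continue  # rating strings are picked up through the lookahead
        has_rating = following is not None and marker in following
        rows.append((current.strip(), following if has_rating else missing))
    return pd.DataFrame(rows, columns=["product", "rating"])


data = [
    "Samsung Galaxy A12 ",
    "5 out of 5(6)",
    "Screenguard",
    "Samsung Galaxy Mos / A02s ",
    "4 out of 5(1)",
    "Pillow",
]

df = pair_products_with_ratings(data)
print(df)
```

Because the loop never searches the list by value, repeated product names are handled the same way as unique ones.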

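User

Answer 2 · edited 2024-09-29 01:21:29

As mentioned in the comments, I think it is better and cleaner to handle this at scraping time.

That said, here is some code that I think solves your problem.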


#Init data
import pandas as pd

data = ['Samsung Galaxy A12 ',
 '5 out of 5(6)',
 'Screenguard',
 'Samsung Galaxy Mos / A02s ',
 '4 out of 5(1)',
 'Pillow']

# Create function
def clean_data_to_df(data):
    phones, ratings = [], []
    for idx, value in enumerate(data):
        # 1st phone
        if idx == 0:
            phones.append(value)
            continue
        # Add rating
        if 'out of' in value:
            ratings.append(value)
            continue

        # If not a rating, it is a phone.
        phones.append(value)

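        # Two products in a row: the previous product had no rating.
        if 'out of' not in data[idx-1]:
            ratings.append('No Rating')

    # The last product had no rating after it.
    if len(phones)>len(ratings):
        ratings.append('No Rating')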
    return pd.DataFrame({'phone':phones, 'ratings':ratings})

clean_data_to_df(data)

Output



        phone                   ratings
0   Samsung Galaxy A12          5 out of 5(6)
1   Screenguard                 No Rating
2   Samsung Galaxy Mos / A02s   4 out of 5(1)
3   Pillow                      No Rating
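
To illustrate the point about handling this at scraping time, here is a rough sketch. The question does not show the scraper, so everything about the input is an assumption: `scraped_cards` is a hypothetical list with one dict per product, where the "rating" key is simply absent when the page showed no rating, and `card_to_row` is a made-up helper name.

```python
import pandas as pd

# Hypothetical scraper output: one dict per product card. The "rating" key
# is missing when no rating was found on the page for that product.
scraped_cards = [
    {"name": "Samsung Galaxy A12", "rating": "5 out of 5(6)"},
    {"name": "Screenguard"},
    {"name": "Samsung Galaxy Mos / A02s", "rating": "4 out of 5(1)"},
    {"name": "Pillow"},
]


def card_to_row(card, missing="No rating"):
    """Build one table row, filling in a placeholder for a missing rating."""
    # A missing key and an empty string both fall back to the placeholder.
    rating = card.get("rating") or missing
    return {"product": card["name"], "rating": rating}


df = pd.DataFrame([card_to_row(card) for card in scraped_cards])
print(df)
```

With the placeholder filled in per product, the flat list never has to be re-paired afterwards.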
