Is there a way to create columns from a list of phrases?

{"TV", "Internet", "Wireless Internet", "Kitchen", "Free Parking on Premises", "Buzzer/Wireless Intercom", "Heating", "Family/Kid Friendly", "Washer,Dryer", "Smoke Detector", "Carbon Monoxide Detector", "First Aid Kit", "Safety Card", "Fire Extinguisher", "Essentials" } {"TV", "Internet", "Wireless Internet", "Air Conditioning", "Kitchen", "Pets Allowed", "Pets live on this property", "Dog(s)", "Heating", "Family/Kid Friendly", "Washer", "Dryer", "Smoke Detector", "Carbon Monoxide Detector", "Fire Extinguisher", "Essentials", "Shampoo", "Lock on Bedroom Door", "Hangers", "Hair Dryer", "Iron" }

3 answers

User

Answer 1 · edited 2024-09-28 03:24:40

You can do it like this:

import pandas as pd

rows = [["TV", "Internet", "Wireless Internet", "Kitchen", "Free Parking on Premises",
         "Buzzer/Wireless Intercom", "Heating", "Family/Kid Friendly",
         "Washer,Dryer", "Smoke Detector", "Carbon Monoxide Detector",
         "First Aid Kit", "Safety Card", "Fire Extinguisher", "Essentials"
         ],

        ["TV", "Internet", "Wireless Internet", "Air Conditioning", "Kitchen",
         "Pets Allowed", "Pets live on this property", "Dog(s)", "Heating",
         "Family/Kid Friendly", "Washer", "Dryer", "Smoke Detector",
         "Carbon Monoxide Detector", "Fire Extinguisher", "Essentials",
         "Shampoo", "Lock on Bedroom Door", "Hangers", "Hair Dryer", "Iron"
         ]
        ]

header = list(set(rows[0]+rows[1]))
words_count = {}
for i in header:
    words_count[i] = []
for row in rows:
    for i in header:
        words_count[i].append(row.count(i))

df = pd.DataFrame(data=words_count, columns=header)

print(df)

# Output
   Safety Card  Hair Dryer  ...  Carbon Monoxide Detector  Shampoo
0            1           0  ...                         1        0
1            0           1  ...                         1        1

[2 rows x 26 columns]
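The code above builds the header from `rows[0]+rows[1]`, so it only covers two rows. A minimal sketch of the same counting idea for any number of rows follows. The helper name `count_phrases` and the shortened sample rows are assumptions made for illustration, and sorting the header is a choice made here so that the column order is reproducible.

```python
# Hypothetical helper; not part of the original answer.
def count_phrases(rows):
    """Return {phrase: [count in row 0, count in row 1, ...]}."""
    # Union of the phrases from every row, sorted for a stable column order.
    header = sorted(set().union(*rows))
    # One count per row for each phrase, exactly as row.count(i) does above.
    return {phrase: [row.count(phrase) for row in rows] for phrase in header}


# Shortened sample data (an assumption, not the data from the question).
rows = [
    ["TV", "Internet", "Kitchen"],
    ["TV", "Heating"],
    ["Kitchen", "Iron", "Iron"],
]
words_count = count_phrases(rows)
print(words_count)
```

The resulting dictionary can be passed to `pd.DataFrame(words_count)` in the same way as in the answer.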

User

Answer 2 · edited 2024-09-28 03:24:40

First, create the columns for the DataFrame:

import pandas as pd

set1 = {"TV", "Internet", "Wireless Internet", "Kitchen", "Free Parking on Premises",
 "Buzzer/Wireless Intercom", "Heating", "Family/Kid Friendly",
 "Washer,Dryer", "Smoke Detector", "Carbon Monoxide Detector",
 "First Aid Kit", "Safety Card", "Fire Extinguisher", "Essentials"
 }

set2 = {"TV", "Internet", "Wireless Internet", "Air Conditioning", "Kitchen",
 "Pets Allowed", "Pets live on this property", "Dog(s)", "Heating",
 "Family/Kid Friendly", "Washer", "Dryer", "Smoke Detector",
 "Carbon Monoxide Detector", "Fire Extinguisher", "Essentials",
 "Shampoo", "Lock on Bedroom Door", "Hangers", "Hair Dryer", "Iron"
 }

# Create a list of iterables for later
list_of_sets = [set1, set2]

# Create a list with the "splat" operator, and then create a set from the list
columns = set([*set1, *set2])

# Optionally remove spaces, commas, etc
columns_optional = set([x.replace(" ", "").replace(",", "").replace("/", "") for x in columns])
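If the cleaned names are meant to become the column labels, it may help to keep a mapping from each original phrase to its cleaned form, since the membership checks still need the original phrases. The sketch below is one way to do that. The helper `clean_name` and the small sample set are assumptions, and unlike the `replace` chain above it strips every non-alphanumeric character, parentheses included.

```python
import re


# Hypothetical helper; not part of the original answer.
def clean_name(phrase):
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^0-9A-Za-z]", "", phrase)


# Small sample of the phrases (an assumption made for brevity).
columns = {"Wireless Internet", "Washer,Dryer", "Buzzer/Wireless Intercom", "Dog(s)"}

# Map each original phrase to its cleaned column name.
renames = {phrase: clean_name(phrase) for phrase in columns}

# Two different phrases could collapse to the same cleaned name; check for that.
has_collision = len(set(renames.values())) != len(renames)
print(renames, has_collision)
```

Once the DataFrame exists, a mapping like this could be applied with `df.rename(columns=renames)`.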

Now create the DataFrame rows:


def create_rows(list_of_iterables, columns):
    """Iterate through list of iterables (i.e. sets of words) 
    and check if they're in the columns"""
    
    list_of_df_rows = []
    for iterable in list_of_iterables:
        # 1 if the phrase is present in this collection, otherwise 0
        row_dict = {col: int(col in iterable) for col in columns}
        list_of_df_rows.append(row_dict)

    return list_of_df_rows

# Create DataFrame rows
rows = create_rows(list_of_sets, columns)

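# Build the DataFrame: one row per set, one column per phrase.
# A list is passed because recent pandas versions reject a set for `columns`.
df = pd.DataFrame(rows, columns=list(columns))

print(df)

# Output (column order varies because sets are unordered)
   Air Conditioning  TV  ...  Free Parking on Premises  Washer,Dryer
0                 0   1  ...                         1             1
1                 1   1  ...                         0             0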

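User

Answer 3 · edited 2024-09-28 03:24:40

Since you are not asking why your code doesn't work, you must be asking for an algorithm. Create a dictionary whose keys are the phrases and whose values are lists of 0 or 1, with one entry per row. A collections.defaultdict(list) should help.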

d = {'phrase1':[row1,row2,...],'phrase2':[row1,row2,...],...}

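Iterate over the rows as sets so that set operations can be used.
For each row:
- Find the phrases in this row that are not yet in the dictionary. This is the difference between the row and the dictionary keys: row - d.keys()
  - For each phrase in that difference, append zeros to its value, one per earlier row
    - For the first row append no zeros, for the second row one zero, for the third row two zeros
- Find the earlier phrases that are not in this row. This is the difference between the dictionary keys and the row: d.keys() - row
  - For each phrase in that difference, append a zero
- Append a one for each phrase in the row
Finally, feed the dictionary to the DataFrame constructor: df = pandas.DataFrame(d)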
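The steps above can be sketched roughly as follows. This is one possible reading of the algorithm, not the answerer's own code. The function name `build_indicator_dict` and the small sample rows are assumptions made for illustration.

```python
from collections import defaultdict


# Hypothetical helper; one possible reading of the steps described above.
def build_indicator_dict(rows):
    """Return {phrase: [0 or 1 for each row]} for an iterable of phrase collections."""
    d = defaultdict(list)
    for n, row in enumerate(rows):
        row = set(row)
        # Phrases seen for the first time: pad with one zero per earlier row.
        for phrase in row - d.keys():
            d[phrase].extend([0] * n)
        # Phrases seen earlier but absent from this row get a zero.
        for phrase in d.keys() - row:
            d[phrase].append(0)
        # Every phrase present in this row gets a one.
        for phrase in row:
            d[phrase].append(1)
    return dict(d)


# Small sample rows (an assumption, not the data from the question).
rows = [{"TV", "Kitchen"}, {"TV", "Iron"}, {"Heating"}]
d = build_indicator_dict(rows)
print(d)
```

Every list in `d` ends up with the same length as `rows`, so the dictionary can go straight into `pandas.DataFrame(d)` as described in the answer.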
