Task: I am trying to create a pandas DataFrame from a dictionary of lists. Problem: this creates a separate DataFrame for each dictionary item.

# Step 1: create a web scraper which takes three sets of data (price, bedrooms
# and bathrooms) from a website and populates three separate lists
for container in containers:
    try:
        price_container = container.find("a", {"class": "listing-price text-price"})
        price_strip = price_container.text.strip()
        price_list = []
        price_list.append(price_strip)
    except TypeError:
        continue
    try:
        bedroom_container = container.find("span", {"class": "icon num-beds"})
        bedroom_strip = bedroom_container["title"]
        bedroom_list = []
        bedroom_list.append(bedroom_strip)
    except TypeError:
        continue
    try:
        bathroom_container = container.find("span", {"class": "icon num-baths"})
        bathroom_strip = bathroom_container["title"]
        bathroom_list = []
        bathroom_list.append(bathroom_strip)
    except TypeError:
        continue
    # Step 2: create a dictionary
    data = {'price': price_list, 'bedrooms': bedroom_list, 'bathrooms': bathroom_list}
    # Step 3: turn it into a pandas DataFrame and print the output
    d = pd.DataFrame(data)
    print(d)

      price bedrooms bathrooms
0  £200,000        3         2

[1 rows x 3 columns]

      price bedrooms bathrooms
0  £400,000        5         3

[1 rows x 3 columns]

     prices bedrooms bathrooms
0  £900,000        6         4

[1 rows x 3 columns]

and so on.....

      price bedrooms bathrooms
0  £200,000        3         2
0  £200,000        3         2
0  £200,000        3         2

[3 rows x 3 columns]

      price bedrooms bathrooms
0  £400,000        5         3
0  £400,000        5         3
0  £400,000        5         3

[3 rows x 3 columns]

      price bedrooms bathrooms
0  £900,000        6         4
0  £900,000        6         4
0  £900,000        6         4

[1 rows x 3 columns]

and so on...

3 Answers

User

#1 · Edited 2024-05-19 11:04:47

The part of the code you are testing here is fine: a dictionary of lists always produces a single DataFrame. So this part:

pd.DataFrame(data)

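cannot be the cause of the problem. The real issue is that it sits inside the loop, so it runs once per iteration (three times in your output). The same applies to your lists, which are redefined on every pass.

Move these parts out of the loop and you should be fine.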
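As a rough sketch of that restructuring: the lists are created once before the loop, and the dictionary and DataFrame are built once after it. The `listings` data below is a made-up stand-in for the scraped containers (an assumption, not part of the question), so the example runs without a web page.

```python
import pandas as pd

# Hypothetical stand-in for the scraped listings; in the question these
# values come from container.find(...) calls on each container.
listings = [
    {"price": "£200,000", "bedrooms": "3", "bathrooms": "2"},
    {"price": "£400,000", "bedrooms": "5", "bathrooms": "3"},
    {"price": "£900,000", "bedrooms": "6", "bathrooms": "4"},
]

# Create the lists once, before the loop, so they accumulate values.
price_list = []
bedroom_list = []
bathroom_list = []

for listing in listings:
    price_list.append(listing["price"])
    bedroom_list.append(listing["bedrooms"])
    bathroom_list.append(listing["bathrooms"])

# Build the dictionary and the DataFrame once, after the loop has finished.
data = {"price": price_list, "bedrooms": bedroom_list, "bathrooms": bathroom_list}
df = pd.DataFrame(data)
print(df)
```

With this layout the loop only collects values, and `pd.DataFrame(data)` is called a single time on the full lists.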

User

#2 · Edited 2024-05-19 11:04:47

You have to merge the three lists:

df = pd.DataFrame(data["price"] + data["bedrooms"] + data["bathrooms"] )

If you want something more general:

list_ = [item for i in data for item in data[i]]
df = pd.DataFrame(list_)
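For comparison, here is a small sketch of what this flattening produces next to passing the dictionary directly. The values in `data` are made up for illustration. Note that concatenating the lists stacks every value into one column, which may or may not be the layout you are after.

```python
import pandas as pd

# Illustrative values only; in the question these come from the scraper.
data = {
    "price": ["£200,000", "£400,000", "£900,000"],
    "bedrooms": ["3", "5", "6"],
    "bathrooms": ["2", "3", "4"],
}

# Flatten all values into one list, in dictionary key order.
list_ = [item for key in data for item in data[key]]
stacked = pd.DataFrame(list_)
print(stacked.shape)  # (9, 1): nine rows in a single unnamed column

# Passing the dictionary itself keeps one column per key.
table = pd.DataFrame(data)
print(table.shape)  # (3, 3): one row per listing
```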

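User

#3 · Edited 2024-05-19 11:04:47

First, you should run price_list=[], bedroom_list=[] and bathroom_list=[] before the for loop. Otherwise each list holds at most one element, because every iteration resets it to [] and then appends a single element.

Second, if you want a single DataFrame, you should create it outside the for loop, i.e. dedent data = {'price':price_list, 'bedrooms':bedroom_list, 'bathrooms':bathroom_list} and the lines below it.

Finally, you should represent missing data explicitly. If any continue other than the first one is executed, price_list, bedroom_list and bathroom_list will end up with different lengths. I suggest replacing the first continue with price_list.append(None), the second with bedroom_list.append(None) and the third with bathroom_list.append(None), so that the DataFrame clearly shows where data is missing.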
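A minimal sketch of that last suggestion, assuming made-up records in place of the scraped containers. Because the stand-in records are plain dictionaries, a missing field raises `KeyError` here; in the real scraper the `except` clause would catch whatever exception the failed lookup raises.

```python
import pandas as pd

# Hypothetical records standing in for scraped containers; some fields
# are deliberately missing to show how the gaps are recorded.
listings = [
    {"price": "£200,000", "bedrooms": "3", "bathrooms": "2"},
    {"price": "£400,000", "bathrooms": "3"},  # bedrooms missing
    {"bedrooms": "6", "bathrooms": "4"},      # price missing
]

price_list, bedroom_list, bathroom_list = [], [], []

for listing in listings:
    # Append None instead of skipping, so all lists stay the same length.
    try:
        price_list.append(listing["price"])
    except KeyError:
        price_list.append(None)
    try:
        bedroom_list.append(listing["bedrooms"])
    except KeyError:
        bedroom_list.append(None)
    try:
        bathroom_list.append(listing["bathrooms"])
    except KeyError:
        bathroom_list.append(None)

data = {"price": price_list, "bedrooms": bedroom_list, "bathrooms": bathroom_list}
df = pd.DataFrame(data)
print(df)
```

Since every iteration appends exactly one entry to each list, the rows stay aligned and the missing values can be found afterwards with `df.isna()`.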
