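I have lists of phrases that I want to convert into columns of a DataFrame, to use as input to a machine learning model. The code should find the unique phrases across all rows of data, create one column per unique phrase, and show a 1 if the phrase is present in the row or a 0 if it is not.
The phrases look like this: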
{"TV", "Internet", "Wireless Internet", "Kitchen", "Free Parking on Premises",
"Buzzer/Wireless Intercom", "Heating", "Family/Kid Friendly",
"Washer,Dryer", "Smoke Detector", "Carbon Monoxide Detector",
"First Aid Kit", "Safety Card", "Fire Extinguisher", "Essentials"
}
{"TV", "Internet", "Wireless Internet", "Air Conditioning", "Kitchen",
"Pets Allowed", "Pets live on this property", "Dog(s)", "Heating",
"Family/Kid Friendly", "Washer", "Dryer", "Smoke Detector",
"Carbon Monoxide Detector", "Fire Extinguisher", "Essentials",
"Shampoo", "Lock on Bedroom Door", "Hangers", "Hair Dryer", "Iron"
}
Desired output in the DataFrame:
You can do it like this.
First create the columns for the DataFrame:
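The code for this step did not survive in the thread, so what follows is only a minimal sketch of one way to do it, not necessarily what the answerer wrote. It assumes each row is a Python set of phrases; the names rows and columns are illustrative, and the two sets are shortened subsets of the data in the question.

```python
# Two example rows, shortened from the question; each row is a set of phrases.
rows = [
    {"TV", "Internet", "Kitchen", "Heating", "Essentials"},
    {"TV", "Internet", "Air Conditioning", "Kitchen", "Shampoo"},
]

# The columns are the unique phrases across all rows.
# Sorting gives a stable, reproducible column order.
columns = sorted(set().union(*rows))
print(columns)
```

Sorting is optional; it only makes the column order independent of set iteration order.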
Now create the DataFrame rows:
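Again, the original code is missing here, so this is a hedged sketch rather than the answerer's exact method. It assumes the rows are sets of phrases (shortened from the question) and rebuilds the column list so the block runs on its own; rows, columns and records are illustrative names.

```python
import pandas as pd

# Two example rows, shortened from the question; each row is a set of phrases.
rows = [
    {"TV", "Internet", "Kitchen", "Heating", "Essentials"},
    {"TV", "Internet", "Air Conditioning", "Kitchen", "Shampoo"},
]

# Unique phrases across all rows, in a stable order.
columns = sorted(set().union(*rows))

# One record per input row: 1 if the phrase is in the row, otherwise 0.
records = [[int(phrase in row) for phrase in columns] for row in rows]

df = pd.DataFrame(records, columns=columns)
print(df)
```

Each row of df is then a 0/1 indicator vector that can be passed to a model alongside other features.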
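Since you are not asking why your code doesn't work, you must be asking for an algorithm. Create a dictionary whose keys are the phrases and whose values are lists holding a 0 or 1 for each row. A collections.defaultdict(list) should help. For each row, the set differences row - d.keys() and d.keys() - row tell you which phrases are new and which known phrases are missing from the row. Finally, df = pandas.DataFrame(d).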
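The comment gives no code, so the following is one possible reading of it, sketched under the assumption that each row is a set of phrases (shortened here from the question). The back-filling of zeros for phrases first seen in a later row is an addition needed to keep all lists the same length; it is not spelled out in the comment.

```python
from collections import defaultdict

import pandas

# Two example rows, shortened from the question; each row is a set of phrases.
rows = [
    {"TV", "Internet", "Kitchen", "Heating", "Essentials"},
    {"TV", "Internet", "Air Conditioning", "Kitchen", "Shampoo"},
]

d = defaultdict(list)  # phrase -> list of 0/1 flags, one per row
for i, row in enumerate(rows):
    # Known phrases that are missing from this row get a 0.
    for phrase in d.keys() - row:
        d[phrase].append(0)
    # Phrases seen for the first time: back-fill zeros for the i earlier rows.
    for phrase in row - d.keys():
        d[phrase].extend([0] * i)
    # Every phrase present in this row gets a 1.
    for phrase in row:
        d[phrase].append(1)

df = pandas.DataFrame(d)
print(df)
```

Column order here follows the order in which phrases are first encountered, which for sets is not fixed; sort the columns afterwards if a stable order matters.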