Using empty quotes in a DataFrame

import pandas as pd

df = pd.read_csv("yelp_business.csv")
df = df.loc[(df['categories'].str.contains('chinese', case=False)) |
            (df['name'].str.contains('subway', case=False)) |
            (df['categories'].str.contains('', case=False)) |
            (df['address'].str.contains('', case=False))]
print(df)

import pandas as pd

df = pd.read_csv("yelp_business.csv")
cusine = 'chinese'
name = 'subway'
address = None  # address has no assigned value, i.e. it is NULL
df = df.loc[(df['categories'].str.contains(cusine, case=False)) |
            (df['name'].str.contains(name, case=False)) |
            (df['address'].str.contains(address, case=False))]
print(df)

2 answers

User
Answer 1 · edited 2024-09-26 22:55:21

For details on contains, see the documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.contains.html
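A minimal sketch of two str.contains behaviours that matter for the question, using a small hypothetical Series rather than the Yelp CSV: an empty pattern matches every non-missing string, so str.contains('') filters nothing out, and case=False makes the match case-insensitive. The na=False argument is added here (an assumption, not part of the question's code) so that missing values give False instead of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the last one is missing.
s = pd.Series(["Chinese;Restaurants", "Fast Food", np.nan])

# An empty pattern matches every string that is not missing,
# so a condition like str.contains('') keeps every such row.
empty_match = s.str.contains("", case=False, na=False)

# case=False makes the search case-insensitive ("chinese" matches "Chinese").
chinese = s.str.contains("chinese", case=False, na=False)

print(empty_match.tolist())  # [True, True, False]
print(chinese.tolist())      # [True, False, False]
```

This is why OR-ing in a str.contains('') term makes the whole filter return (almost) every row.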

User
Answer 2 · edited 2024-09-26 22:55:21

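I think you can chain the conditions with | inside the pattern for the categories column. To find the empty values, use ^""$: it anchors the match to the start and end of the string, so it matches only a value consisting of exactly the two quote characters: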
df = pd.read_csv("yelp_business.csv")
df1 = df.loc[(df['categories'].str.contains('chinese|^""$', case=False)) |
             (df['name'].str.contains('subway', case=False)) |
             (df['address'].str.contains('^""$', case=False))]
print(len(df1))
11320
print(df1.head())
               business_id                     name neighborhood  \
9   TGWhGNusxyMaA4kQVBNeew  "Detailing Gone Mobile"          NaN
53  4srfPk1s8nlm1YusyDUbjg                 "Subway"    Southeast
57  spDZkD6cp0JUUm6ghIWHzA              "Kitchen M"   Unionville
63  r6Jw8oRCeumxu7Y1WRxT7A           "D&D Cleaning"          NaN
88  YhV93k9uiMdr3FlV4FHjwA        "Caviness Studio"          NaN

                          address       city state postal_code   latitude  \
9                              ""  Henderson    NV       89014  36.055825
53  "6889 S Eastern Ave, Ste 101"  Las Vegas    NV       89119  36.064652
57            "8515 McCowan Road"    Markham    ON     L3P 5E5  43.867918
63                             ""     Urbana    IL       61802  40.110588
88                             ""    Phoenix    AZ       85001  33.449967

     longitude  stars  review_count  is_open  \
9  -115.046350    5.0             7        1
53 -115.118954    2.5             6        1
57  -79.283687    3.0            80        1
63  -88.207270    5.0             4        0
88 -112.070223    5.0             4        1

                                           categories
9                           Automotive;Auto Detailing
53                   Fast Food;Restaurants;Sandwiches
57                                Restaurants;Chinese
63         Home Cleaning;Home Services;Window Washing
88  Marketing;Men's Clothing;Restaurants;Graphic D...
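A small sketch of how the ^""$ pattern behaves, assuming (as the output above suggests) that an empty address is stored in the CSV as the two-character string "" rather than as NaN. The DataFrame below is hypothetical toy data, and na=False is an added assumption so missing values count as no match.

```python
import pandas as pd

# Hypothetical rows; an empty address is the literal two-character string "".
df = pd.DataFrame({
    "name": ['"Subway"', '"Kitchen M"', '"Restaurant X"'],
    "address": ['"6889 S Eastern Ave, Ste 101"', '"8515 McCowan Road"', '""'],
    "categories": ["Fast Food;Restaurants", "Restaurants;Chinese", "Home Cleaning"],
})

# ^ and $ anchor the pattern, so only a value that is exactly "" matches;
# a quoted non-empty address such as "8515 McCowan Road" does not.
empty_address = df["address"].str.contains('^""$', na=False)

# 'chinese|^""$' is a regex alternation: 'chinese' anywhere, OR an empty "" value.
cat_mask = df["categories"].str.contains('chinese|^""$', case=False, na=False)

print(empty_address.tolist())  # [False, False, True]
print(cat_mask.tolist())       # [False, True, False]
```

For an exact match like this, df["address"] == '""' gives the same mask without a regex, which is the form used in the edit below.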
EDIT: If you need to filter out the empty values and NaNs:
# & binds tighter than |, so the two | conditions are wrapped in parentheses
# to apply the "not empty" test to both of them
df2 = df.loc[((df['categories'].str.contains('chinese', case=False)) |
              (df['name'].str.contains('subway', case=False))) &
             ~((df['address'] == '""') | (df['categories'] == '""'))]
print(df2.head())
                business_id              name     neighborhood  \
53   4srfPk1s8nlm1YusyDUbjg          "Subway"        Southeast
57   spDZkD6cp0JUUm6ghIWHzA       "Kitchen M"       Unionville
96   dTWfATVrBfKj7Vdn0qWVWg  "Flavor Cuisine"      Scarborough
126  WUiDaFQRZ8wKYGLvmjFjAw    "China Buffet"  University City
145  vzx1WdVivFsaN4QYrez2rw          "Subway"              NaN

                                address       city state postal_code  \
53        "6889 S Eastern Ave, Ste 101"  Las Vegas    NV       89119
57                  "8515 McCowan Road"    Markham    ON     L3P 5E5
96               "8 Glen Watford Drive"    Toronto    ON     M1S 2C1
126  "8630 University Executive Park Dr"  Charlotte    NC       28262
145                  "5111 Boulder Hwy"  Las Vegas    NV       89122

      latitude   longitude  stars  review_count  is_open  \
53   36.064652 -115.118954    2.5             6        1
57   43.867918  -79.283687    3.0            80        1
96   43.787061  -79.276166    3.0             6        1
126  35.306173  -80.752672    3.5            76        1
145  36.112895 -115.062353    3.0             3        1

                                 categories
53         Fast Food;Restaurants;Sandwiches
57                      Restaurants;Chinese
96           Restaurants;Chinese;Food Court
126  Buffets;Restaurants;Sushi Bars;Chinese
145        Sandwiches;Restaurants;Fast Food
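A short sketch of why the parentheses matter when combining boolean masks: in Python, & binds more tightly than |, so A | B & ~C is evaluated as A | (B & ~C). The rows and names below are hypothetical placeholders, not data from the Yelp file.

```python
import pandas as pd

# Hypothetical rows: "Restaurant X" is Chinese but has an empty "" address.
df = pd.DataFrame({
    "name": ['"Subway"', '"Restaurant X"', '"Restaurant Y"'],
    "address": ['"5111 Boulder Hwy"', '""', '"Some Street 1"'],
    "categories": ["Sandwiches;Fast Food", "Restaurants;Chinese", "Restaurants;Chinese"],
})

is_chinese = df["categories"].str.contains("chinese", case=False, na=False)
is_subway = df["name"].str.contains("subway", case=False, na=False)
is_empty = (df["address"] == '""') | (df["categories"] == '""')

# Without parentheses this is is_chinese | (is_subway & ~is_empty),
# so the Chinese row with an empty address is still kept.
unparenthesized = is_chinese | is_subway & ~is_empty

# With parentheses the "not empty" test applies to both alternatives.
parenthesized = (is_chinese | is_subway) & ~is_empty

print(df.loc[unparenthesized, "name"].tolist())  # ['"Subway"', '"Restaurant X"', '"Restaurant Y"']
print(df.loc[parenthesized, "name"].tolist())    # ['"Subway"', '"Restaurant Y"']
```

Naming each mask first, as here, also makes long df.loc conditions easier to read and check.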
