Using empty quotes in a DataFrame

Posted 2024-09-26 22:55:21


I am trying to combine | with df.loc to extract data. The code I wrote extracts everything in the csv file. This is the original csv file: https://drive.google.com/open?id=16eo29mF0pn_qNw-BGpZyVM9PBxv2aN1G

import pandas as pd

df = pd.read_csv("yelp_business.csv")
df = df.loc[(df['categories'].str.contains('chinese', case = False)) | (df['name'].str.contains('subway', case = False)) | (df['categories'].str.contains('', case = False)) | (df['address'].str.contains('', case = False))]

print(df)

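It looks like the empty quotes '' do not work as I expected here, or | does not work in df.loc. Instead of returning only the rows for chinese restaurants (4171 of them) and the rows whose restaurant name is subway, it returns all 174,568 rows.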
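The behavior described above can be reproduced on a tiny frame. This is a minimal sketch, assuming made-up rows that only mimic the shape of the Yelp file; the variable names empty_mask and combined are hypothetical. It shows that an empty pattern is found in every string, so OR-ing it into the filter keeps every row.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Yelp data (not the real file)
df = pd.DataFrame({
    "name": ['"Subway"', '"Kitchen M"', '"D&D Cleaning"'],
    "categories": ["Fast Food;Sandwiches", "Restaurants;Chinese", "Home Cleaning"],
})

# An empty pattern matches at position 0 of every string, so this mask is all True
empty_mask = df["categories"].str.contains("", case=False)

# OR-ing an all-True mask with any other condition keeps every row
combined = df["categories"].str.contains("chinese", case=False) | empty_mask

print(empty_mask.tolist())
print(len(df[combined]), "of", len(df), "rows kept")
```

So | and df.loc behave as documented here; it is the empty pattern that makes the condition true for every row.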

Edit

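The output I want is all rows with the category chinese and all rows with the name subway, taking into account that the address may have no assigned value or may be null.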

import pandas as pd

df = pd.read_csv("yelp_business.csv")

cusine = 'chinese'
name = 'subway'
address #address has no assigned value or is NULL

df = df.loc[(df['categories'].str.contains(cusine, case = False)) |
            (df['name'].str.contains(name, case = False)) | 
            (df['address'].str.contains(address, case = False))]


print(df)

This code gives me the error NameError: name 'address' is not defined
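The NameError comes from using address as a variable that was never assigned. One way to express "address has no value or is null" is to test the column directly instead of passing a pattern. The sketch below assumes made-up rows, a hypothetical no_address mask, and that an unset address shows up as NaN, as an empty string, or as the literal two-quote string "".

```python
import pandas as pd

# Hypothetical rows; the real file has many more columns
df = pd.DataFrame({
    "name": ['"Subway"', '"Kitchen M"', '"Caviness Studio"', '"D&D Cleaning"', '"Flavor Salon"'],
    "address": ['"5111 Boulder Hwy"', '"8515 McCowan Road"', None, '""', '"1 Main St"'],
    "categories": ["Sandwiches;Fast Food", "Restaurants;Chinese", "Marketing",
                   "Home Cleaning", "Hair Salons"],
})

cuisine = "chinese"
name = "subway"

# True where the address is missing (NaN/None) or looks empty
no_address = df["address"].isna() | df["address"].isin(["", '""'])

# na=False makes rows with missing text count as "no match" instead of NaN
mask = (
    df["categories"].str.contains(cuisine, case=False, na=False)
    | df["name"].str.contains(name, case=False, na=False)
    | no_address
)

result = df[mask]
print(result)
```

Whether rows without an address should be included or excluded depends on the intent; flipping no_address with ~ and combining it with & gives the opposite filter.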


2 answers

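An empty pattern '' matches every non-missing string, which is why the original filter returned all rows. I think the conditions for the categories column can be chained with | inside the pattern itself, and to find empty strings you can use ^""$, which matches a value consisting of only the two quote characters, from the start of the string to its end: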

import pandas as pd

df = pd.read_csv("yelp_business.csv")

# 'chinese|^""$' matches categories containing "chinese" or consisting of just ""
df1 = df.loc[(df['categories'].str.contains('chinese|^""$', case = False)) |
             (df['name'].str.contains('subway', case = False)) |
             (df['address'].str.contains('^""$', case = False))]
print (len(df1))
11320

print (df1.head())

               business_id                     name neighborhood  \
9   TGWhGNusxyMaA4kQVBNeew  "Detailing Gone Mobile"          NaN   
53  4srfPk1s8nlm1YusyDUbjg              ***"Subway"    Southeast   
57  spDZkD6cp0JUUm6ghIWHzA              "Kitchen M"   Unionville   
63  r6Jw8oRCeumxu7Y1WRxT7A           "D&D Cleaning"          NaN   
88  YhV93k9uiMdr3FlV4FHjwA        "Caviness Studio"          NaN   

                          address       city state postal_code   latitude  \
9                           ***""  Henderson    NV       89014  36.055825   
53  "6889 S Eastern Ave, Ste 101"  Las Vegas    NV       89119  36.064652   
57            "8515 McCowan Road"    Markham    ON     L3P 5E5  43.867918   
63                          ***""     Urbana    IL       61802  40.110588   
88                          ***""    Phoenix    AZ       85001  33.449967   

     longitude  stars  review_count  is_open  \
9  -115.046350    5.0             7        1   
53 -115.118954    2.5             6        1   
57  -79.283687    3.0            80        1   
63  -88.207270    5.0             4        0   
88 -112.070223    5.0             4        1   

                                           categories  
9                           Automotive;Auto Detailing  
53                   Fast Food;Restaurants;Sandwiches  
57                             ***Restaurants;Chinese  
63         Home Cleaning;Home Services;Window Washing  
88  Marketing;Men's Clothing;Restaurants;Graphic D...  
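To see what the ^""$ pattern does in isolation, here is a minimal sketch on made-up values (the Series names addresses, only_quotes, categories and chinese_or_empty are hypothetical). The anchors ^ and $ force the whole value to be exactly two quote characters, so quoted addresses and truly empty strings do not match.

```python
import pandas as pd

# Hypothetical address values: quoted-empty, quoted address, truly empty, missing
addresses = pd.Series(['""', '"6889 S Eastern Ave, Ste 101"', "", None])

# Matches only values that are exactly two double quotes; na=False treats NaN as no match
only_quotes = addresses.str.contains(r'^""$', na=False)
print(only_quotes.tolist())  # [True, False, False, False]

# The same anchored pattern can be combined with another alternative using |
categories = pd.Series(["Restaurants;Chinese", '""', "Automotive"])
chinese_or_empty = categories.str.contains(r'chinese|^""$', case=False, na=False)
print(chinese_or_empty.tolist())  # [True, True, False]
```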

Edit: if you need to filter out the empty and NaN values:

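# & binds more tightly than |, so the OR is parenthesized to make the
# empty-value exclusion apply to both the chinese and the subway rows
df2 = df.loc[((df['categories'].str.contains('chinese', case = False)) |
              (df['name'].str.contains('subway', case = False))) &
             ~((df['address'] == '""') | (df['categories'] == '""'))]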

print (df2.head())
                business_id              name     neighborhood  \
53   4srfPk1s8nlm1YusyDUbjg          "Subway"        Southeast   
57   spDZkD6cp0JUUm6ghIWHzA       "Kitchen M"       Unionville   
96   dTWfATVrBfKj7Vdn0qWVWg  "Flavor Cuisine"      Scarborough   
126  WUiDaFQRZ8wKYGLvmjFjAw    "China Buffet"  University City   
145  vzx1WdVivFsaN4QYrez2rw          "Subway"              NaN   

                                 address       city state postal_code  \
53         "6889 S Eastern Ave, Ste 101"  Las Vegas    NV       89119   
57                   "8515 McCowan Road"    Markham    ON     L3P 5E5   
96                "8 Glen Watford Drive"    Toronto    ON     M1S 2C1   
126  "8630 University Executive Park Dr"  Charlotte    NC       28262   
145                   "5111 Boulder Hwy"  Las Vegas    NV       89122   

      latitude   longitude  stars  review_count  is_open  \
53   36.064652 -115.118954    2.5             6        1   
57   43.867918  -79.283687    3.0            80        1   
96   43.787061  -79.276166    3.0             6        1   
126  35.306173  -80.752672    3.5            76        1   
145  36.112895 -115.062353    3.0             3        1   

                                 categories  
53         Fast Food;Restaurants;Sandwiches  
57                      Restaurants;Chinese  
96           Restaurants;Chinese;Food Court  
126  Buffets;Restaurants;Sushi Bars;Chinese  
145        Sandwiches;Restaurants;Fast Food  
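When | and & are mixed in one mask, grouping matters, because & binds more tightly than | in Python. The sketch below uses made-up rows and hypothetical mask names (is_chinese, is_subway, has_address, loose, strict) to show how the result changes with and without explicit parentheses. It also passes na=False so that NaN values count as non-matches.

```python
import pandas as pd

# Hypothetical rows: two with a real address, two with the quoted-empty address ""
df = pd.DataFrame({
    "name": ['"China Buffet"', '"Subway"', '"Subway"', '"Wok Express"'],
    "address": ['"8630 University Executive Park Dr"', '"5111 Boulder Hwy"', '""', '""'],
    "categories": ["Buffets;Chinese", "Sandwiches;Fast Food", "Sandwiches",
                   "Restaurants;Chinese"],
})

is_chinese = df["categories"].str.contains("chinese", case=False, na=False)
is_subway = df["name"].str.contains("subway", case=False, na=False)
has_address = df["address"].notna() & (df["address"] != '""')

# Without parentheses this is is_chinese | (is_subway & has_address),
# so a chinese row with an empty address is still kept
loose = is_chinese | is_subway & has_address

# With parentheses the address check applies to both conditions
strict = (is_chinese | is_subway) & has_address

print(df[loose]["name"].tolist())
print(df[strict]["name"].tolist())
```

On these rows the unparenthesized mask keeps "Wok Express" even though its address is empty, while the parenthesized one drops it.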
