Python中的列选择

searchfor = ['ISBN13','BAR CODE','ISBN NO#','ISBN','ISBN1','ISBN 13','ISBN_13','ITEM','ISBN NUMBER','ISBN No','ISBN-13','ISBN (13 DIGITS)','EAN','ALIAS','ITEMCODE'] searchfor1 = ['TITLE','BOOK NAME','NAME','TITLE NAME','TITLES','BOOKNAME','BKDESC','PRODUCT NAME','ITEM DESCRIPTION','TITLE 18','COMPLETETITLE'] for f, i in zip(files_txt1, num1): df = pd.read_excel(f,encoding='sys.getfilesystemencoding()') df.columns = df.columns.str.upper() df1['Isbn'] = df[df.columns[df.columns.str.contains('|'.join(searchfor))]] df1['Title']= df[df.columns[df.columns.to_series().str.contains('|'.join(searchfor1))]]

--------------------------------------------------------------------------- ValueError Traceback (most recent call last) C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\common.py in _asarray_tuplesafe(values, dtype) 376 result = np.empty(len(values), dtype=object) --> 377 result[:] = values 378 except ValueError: ValueError: could not broadcast input array from shape (31807,0) into shape (31807) During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self, value) 2375 try: -> 2376 value = Series(value) 2377 except: C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath) 247 data = _sanitize_array(data, index, dtype, copy, --> 248 raise_cast_failure=True) 249 C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure) 3028 else: -> 3029 subarr = _asarray_tuplesafe(data, dtype=dtype) 3030 C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\common.py in _asarray_tuplesafe(values, dtype) 379 # we have a list-of-list --> 380 result[:] = [tuple(x) for x in values] 381 ValueError: cannot copy sequence with size 0 to array axis with dimension 31807 During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-23-9e043c13fef2> in <module>() 11 df.columns = df.columns.str.upper() 12 #print(list(df.columns)) ---> 13 df1['Isbn'] = df[df.columns[df.columns.str.contains('|'.join(searchfor))]] 14 df1['Title'] = df[df.columns[df.columns.to_series().str.contains('|'.join(searchfor1))]] 15 df1['Curr'] = df[df.columns[df.columns.to_series().str.contains('|'.join(searchfor2))]] C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value) 2329 else: 2330 # set column -> 2331 self._set_item(key, value) 2332 2333 def _setitem_slice(self, key, value): C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value) 2394 """ 2395 -> 2396 self._ensure_valid_index(value) 2397 value = self._sanitize_column(key, value) 2398 NDFrame._set_item(self, key, value) C:\Users\Ruchir_Kumar_Jha\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self, value) 2376 value = Series(value) 2377 except: -> 2378 raise ValueError('Cannot set a frame with no defined index ' 2379 'and a value that cannot be converted to a ' 2380 'Series') ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

2条回答

网友

1楼 · 编辑于 2024-10-03 11:21:53

只要您没有匹配项或只有1个匹配项，这将起作用

searchfor = ['ISBN13','BAR CODE','ISBN NO#','ISBN','ISBN1','ISBN 13','ISBN_13','ITEM','ISBN NUMBER','ISBN No','ISBN-13','ISBN (13 DIGITS)','EAN','ALIAS','ITEMCODE']
searchfor1 = ['TITLE','BOOK NAME','NAME','TITLE NAME','TITLES','BOOKNAME','BKDESC','PRODUCT NAME','ITEM DESCRIPTION','TITLE 18','COMPLETETITLE']

for f, i in zip(files_txt1, num1): 
    df = pd.read_excel(f,encoding='sys.getfilesystemencoding()') 
    df.columns = df.columns.str.upper()

    cols = df.columns

    is_isbn = cols.isin(searchfor)
    df1['Isbn'] = df[cols[is_isbn]] if is_isbn.any() else None

    is_title = cols.isin(searchfor1)
    df1['Title'] = df[cols[is_title]] if is_title.any() else None

网友

2楼 · 编辑于 2024-10-03 11:21:53

你不需要这些，如果你事先知道你的列，只要在创建dataFrame和导出文件到Pandas本身的时候尝试一下，这样你也会大大减少内存的使用。你知道吗

df = pd.read_csv(file_name, usecols=['ISBN13','BAR CODE','ISBN NO#','ISBN','ISBN1','ISBN 13','ISBN_13','ITEM','ISBN NUMBER','ISBN No','ISBN-13','ISBN (13 DIGITS)','EAN','ALIAS','ITEMCODE']).fillna('')

相关问题更多 >

编程相关推荐

热门问题

热门文章