Filtering DataFrame rows by the type of the values in a column

Posted 2024-10-02 12:30:09

I'm trying to filter my DataFrame with .loc, using a condition based on the type of the values in one of my columns.

My goal is to use .apply to run a function only on the rows where that column holds a value of a specific type.

I tried using dtype, but the column holds values of two different types, so all I get is object.

For example, when I run print(df.info(verbose=True)) I get:

 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   address              26419 non-null  object
.
.
.
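The object dtype above only says that pandas stores the cells as arbitrary Python objects; it says nothing about what each cell is. A minimal sketch, assuming an address column that mixes IPv4Network objects with plain strings (the sample values are made up), showing how mapping type over the column reveals the per-cell types:

```python
import ipaddress as ipa

import pandas as pd

# Made-up sample data: IPv4Network objects mixed with plain strings
df = pd.DataFrame({
    "address": [
        ipa.IPv4Network("10.0.0.0/24"),
        "host.example",
        ipa.IPv4Network("192.168.1.0/24"),
    ],
})

# The column dtype is just "object": pandas stores the cells as Python objects
column_dtype = df["address"].dtype
print(column_dtype)

# Mapping type() over the cells reveals what each one actually holds
element_types = df["address"].map(type)
type_counts = element_types.value_counts()
print(type_counts)
```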

Here is what I'm trying to run:

import ipaddress as ipa
.
.
.
    df.loc['EXCEPTION'] = df.loc[isinstance(df['address'], ipa.IPv4Network)].apply(
        return_row_with_exception,
        axis=1)

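It should update only the column "EXCEPTION" of the DataFrame df, and only for the rows where the value in the column "address" is of type IPv4Network. The function return_row_with_exception returns the "exception" string for each row, based on rules that use the other columns of that row.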
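A minimal sketch of one way to do this, assuming the same ipaddress alias as in the question. The return_row_with_exception below is a hypothetical stand-in (the real rules aren't shown), and the owner column and sample values are invented. Two things differ from the attempt above: the isinstance check runs per cell via Series.map, producing a boolean mask, and the assignment uses df.loc[mask, "EXCEPTION"], because df.loc["EXCEPTION"] addresses a row label, not a column.

```python
import ipaddress as ipa

import pandas as pd


def return_row_with_exception(row: pd.Series) -> str:
    # Hypothetical stand-in for the asker's function: the real rules are not
    # shown, so this only builds a string from another column of the same row.
    return f"exception for {row['owner']}"


# Made-up sample data
df = pd.DataFrame({
    "address": [
        ipa.IPv4Network("10.0.0.0/24"),
        "host.example",
        ipa.IPv4Network("192.168.1.0/24"),
    ],
    "owner": ["alice", "bob", "carol"],
})

# Apply isinstance to each cell, not to the whole Series:
# the result is a boolean Series aligned with df's index.
mask = df["address"].map(lambda value: isinstance(value, ipa.IPv4Network))

# Select rows with the mask and the target column by label in one .loc call;
# rows where the mask is False are left as NaN in the new column.
df.loc[mask, "EXCEPTION"] = df.loc[mask].apply(return_row_with_exception, axis=1)

print(df)
```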

Unfortunately I get the following error. Can anyone help me with this? :D

Traceback (most recent call last):
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 93, in pandas._libs.index.Int64Engine._check_type
KeyError: False

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pythonProject1111\main.py", line 14, in <module>
    abc = lib_read_from_imap.process_abc(abc)
  File "pythonProject1111\libs\read_from_abc.py", line 178, in process_abc
    df_file_abc = scaexc.fill_scan_exception(df_file_abc)
  File "pythonProject1111\libs\process_scan_exception.py", line 80, in fill_scan_exception
    print(df.loc[isinstance(df['address'], ipa.IPv4Network)])
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexing.py", line 1110, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexing.py", line 1059, in _get_label
    return self.obj.xs(label, axis=axis)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\generic.py", line 3491, in xs
    loc = self.index.get_loc(key)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: False
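The KeyError comes from the condition itself: isinstance() is called once, on the whole Series, which is never an IPv4Network, so .loc receives the scalar False and looks it up as a row label. A small sketch reproducing this with made-up data (the exact error message differs between pandas versions):

```python
import ipaddress as ipa

import pandas as pd

# Made-up sample data with the default integer index
df = pd.DataFrame({"address": [ipa.IPv4Network("10.0.0.0/24"), "host.example"]})

# isinstance() checks the Series object itself, not its cells,
# so the whole condition collapses to the single value False
condition = isinstance(df["address"], ipa.IPv4Network)
print(condition)  # False

# .loc treats a scalar as a row label; the integer index has no label False
try:
    df.loc[condition]
    raised_key_error = False
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)
```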

Thanks a lot


1 answer
Anonymous user
#1 · Posted 2024-10-02 12:30:09

As you mentioned, dtype doesn't help when a column holds values of more than one type. Here is what you can do:

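import pandas as pd

employees = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177.5),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 45, 167),
            ('Veena', 12, 'Delhi', 'Serge'),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)
            ]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Map every cell to its Python type, then count the types per column;
# a type absent from a column comes out as NaN, so fill those with 0
print(empDfObj.apply(lambda col: col.map(type).value_counts()).fillna(0))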

This gives you:

                 Name  Age  City  Marks
<class 'str'>     7.0  0.0   6.0      1
<class 'int'>     0.0  7.0   1.0      5
<class 'float'>   0.0  0.0   0.0      1

You even get the counts :-)
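The same per-cell type information can drive the row filter the question asks for. A short sketch reusing the answer's sample data; the isinstance-based masks are an illustrative extension, not part of the original answer:

```python
import pandas as pd

employees = [("jack", 34, "Sydney", 155),
             ("Riti", 31, "Delhi", 177.5),
             ("Aadi", 16, "Mumbai", 81),
             ("Mohit", 31, 45, 167),
             ("Veena", 12, "Delhi", "Serge"),
             ("Shaunak", 35, "Mumbai", 135),
             ("Shaun", 35, "Colombo", 111)]
df = pd.DataFrame(employees, columns=["Name", "Age", "City", "Marks"])

# Type counts for a single column
marks_type_counts = df["Marks"].map(type).value_counts()
print(marks_type_counts)

# Boolean masks built per cell select rows by the type of their Marks value
is_str = df["Marks"].map(lambda v: isinstance(v, str))
is_number = df["Marks"].map(lambda v: isinstance(v, (int, float)))

print(df[is_str])     # only Veena's row
print(df[is_number])  # every other row
```

Using isinstance rather than comparing map(type) to a class also matches subclasses; keep in mind that bool is a subclass of int.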
