Strange np.issubdtype behavior

start_time = time.time()
lstNumberCounts = []
lstIllFormed = []
dfClicks = pd.read_csv('Oct3_distinct_Members.csv')
dfClicks['UNIV_MBR_ID'] = dfClicks['UNIV_MBR_ID'].str.split('-').str[0]
dfClicks['UNIV_MBR_ID'] = dfClicks['UNIV_MBR_ID'].apply(pd.to_numeric,errors='ignore')
for item in dfClicks['UNIV_MBR_ID']:
    if (np.issubdtype(item,np.integer)):
        lstNumberCounts.append(math.floor(math.log10(item))+1)
    else:
        lstIllFormed.append(item)
print("---Processing Time: %s seconds ---" % (time.time() - start_time))

1 Answer

User

#1 · Posted 2024-10-01 04:44:52

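With errors='ignore', pd.to_numeric returns either a numeric value or the input unchanged. So for "ga99266e" it returns "ga99266e", which is still a string. If you pass a string to numpy's issubdtype, it checks whether the string is the name of a dtype (for example, np.issubdtype('int', int) returns True).

You therefore need to check first whether the field is still a string; only if it is not can you go on to check whether it is a numpy integer.

Try: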

import math
import time

import numpy as np
import pandas as pd

start_time = time.time()
lstNumberCounts = []  # digit counts of the IDs that parsed as integers
lstIllFormed = []     # IDs that stayed strings after conversion

dfClicks = pd.read_csv('Oct3_distinct_Members.csv')
# Keep only the part of each ID before the first '-'.
dfClicks['UNIV_MBR_ID'] = dfClicks['UNIV_MBR_ID'].str.split('-').str[0]
# Convert each value to a number where possible; unparseable values stay strings.
dfClicks['UNIV_MBR_ID'] = dfClicks['UNIV_MBR_ID'].apply(pd.to_numeric, errors='ignore')

for item in dfClicks['UNIV_MBR_ID']:
    # Rule out strings first, so np.issubdtype never receives one.
    if not isinstance(item, str):
        if np.issubdtype(item, np.integer):
            lstNumberCounts.append(math.floor(math.log10(item)) + 1)
    else:
        lstIllFormed.append(item)

print("---Processing Time: %s seconds ---" % (time.time() - start_time))
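If the only goal is to count digits, one alternative that sidesteps np.issubdtype altogether is to classify the IDs while they are still strings. This is a minimal sketch rather than the approach shown above: classify_ids is a hypothetical helper name, and it assumes a well-formed ID consists solely of ASCII digits before the first '-'.

```python
def classify_ids(raw_ids):
    """Split IDs into digit counts (numeric IDs) and ill-formed IDs.

    Hypothetical helper: assumes a well-formed ID is all ASCII digits
    before the first '-'.
    """
    digit_counts = []
    ill_formed = []
    for raw in raw_ids:
        prefix = str(raw).split("-")[0]
        if prefix.isascii() and prefix.isdigit():
            # Ignore leading zeros so "007" counts as one digit,
            # matching what a numeric conversion would produce.
            digit_counts.append(len(prefix.lstrip("0")) or 1)
        else:
            ill_formed.append(prefix)
    return digit_counts, ill_formed


counts, bad = classify_ids(["12345-01", "ga99266e", "a123456-7", "007"])
print(counts)
print(bad)
```

Because the values are never handed to numpy, a string such as "a123456" cannot be mistaken for a type code. The function accepts any iterable, so a pandas column could be passed in directly.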

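"a123456", or any string that starts with "a" followed by digits, works with np.issubdtype because numpy interprets it as a type code: the first character names the kind of data and the digits that follow give the item size. See: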

Array-protocol type strings (see The Array Interface)
The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are
'?' boolean
'b' (signed) byte
'B' unsigned byte
'i' (signed) integer
'u' unsigned integer
'f' floating-point
'c' complex-floating point
'm' timedelta
'M' datetime
'O' (Python) objects
'S', 'a' zero-terminated bytes (not recommended)
'U' Unicode string
'V' raw data (void)
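A quick way to see how numpy reads such strings is to pass them to np.dtype and inspect the result. The sketch below is illustrative only: describe_type_string is a hypothetical helper name, and the outcome for the 'a' code depends on the installed numpy version, since newer releases may reject it.

```python
import numpy as np


def describe_type_string(text):
    """Return (kind, itemsize) if numpy accepts `text` as a dtype, else None.

    Hypothetical helper used only to illustrate the type-string rules.
    """
    try:
        dt = np.dtype(text)
    except (TypeError, ValueError):
        return None
    return dt.kind, dt.itemsize


# "a123456" may be accepted as a bytes type or rejected, depending on the
# numpy version; "ga99266e" is not a valid type string at all.
for candidate in ["i8", "f4", "S10", "a123456", "ga99266e"]:
    print(candidate, describe_type_string(candidate))
```

A string that np.dtype rejects makes np.issubdtype raise an error, while one it accepts is silently treated as a dtype, which is why checking isinstance(item, str) first is the safer order.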
