问:ValueError:未知类型对象python是什么意思?

2024-09-22 14:28:55 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在做一些测试,以检查我的采样算法中的一些选项是否能更好地更改其值

当我在做这些测试时(直到现在还没有任何问题),我试图运行更多的测试以获得更多的结果,但我得到了以下错误:

ValueError                                Traceback (most recent call last)
<ipython-input-65-41ecc2b0a0ed> in <module>
     22 for g in range(0,10000):
     23         # sample
---> 24         sample_df = stratified_sample(df,test,size=38, keep_index=False)
     25         pathaux = "C://Users//Pedro//Desktop//EscolhasAlgoritmos//Stratified//Stratified_Tests//"
     26         example = "exampleFCUL"

<ipython-input-10-7aba847839db> in stratified_sample(df, strata, size, seed, keep_index)
     79         # final dataframe
     80         if first:
---> 81             stratified_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
     82             first = False
     83         else:

D:\Anaconda\lib\site-packages\pandas\core\frame.py in query(self, expr, inplace, **kwargs)
   3182         kwargs["level"] = kwargs.pop("level", 0) + 1
   3183         kwargs["target"] = None
-> 3184         res = self.eval(expr, **kwargs)
   3185 
   3186         try:

D:\Anaconda\lib\site-packages\pandas\core\frame.py in eval(self, expr, inplace, **kwargs)
   3298             kwargs["target"] = self
   3299         kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
-> 3300         return _eval(expr, inplace=inplace, **kwargs)
   3301 
   3302     def select_dtypes(self, include=None, exclude=None):

D:\Anaconda\lib\site-packages\pandas\core\computation\eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    325         eng = _engines[engine]
    326         eng_inst = eng(parsed_expr)
--> 327         ret = eng_inst.evaluate()
    328 
    329         if parsed_expr.assigner is None:

D:\Anaconda\lib\site-packages\pandas\core\computation\engines.py in evaluate(self)
     68 
     69         # make sure no names in resolvers and locals/globals clash
---> 70         res = self._evaluate()
     71         return _reconstruct_object(
     72             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

D:\Anaconda\lib\site-packages\pandas\core\computation\engines.py in _evaluate(self)
    117             truediv = scope["truediv"]
    118             _check_ne_builtin_clash(self.expr)
--> 119             return ne.evaluate(s, local_dict=scope, truediv=truediv)
    120         except KeyError as e:
    121             # python 3 compat kludge

D:\Anaconda\lib\site-packages\numexpr\necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    820     # Create a signature
    821     signature = [(name, getType(arg)) for (name, arg) in
--> 822                  zip(names, arguments)]
    823 
    824     # Look up numexpr if possible.

D:\Anaconda\lib\site-packages\numexpr\necompiler.py in <listcomp>(.0)
    819 
    820     # Create a signature
--> 821     signature = [(name, getType(arg)) for (name, arg) in
    822                  zip(names, arguments)]
    823 

D:\Anaconda\lib\site-packages\numexpr\necompiler.py in getType(a)
    701     if kind == 'S':
    702         return bytes
--> 703     raise ValueError("unknown type %s" % a.dtype.name)
    704 
    705 

ValueError: unknown type object`

当我试图在stackoverflow和google中理解/搜索我的代码(如下所示)中可以做些什么来修复这个问题时,我找不到关于这个错误的任何信息,也找不到解决方法。 是的,我知道对于python来说,它是一个未知类型,但它是一个我正在输入的列表,就像一直以来的情况一样,它在没有任何故障的情况下运行过列表,因此我无法理解显示的错误

代码:

def transform(multilevelDict):
    return {"t"+'_'+str(key) : (transform(value) if isinstance(value, dict) else value) for key, value in multilevelDict.items()}
df = pd.read_csv('testingwebsitedata6.csv', sep=';')
df['Element_Count'] =  df['Element_Count'].apply((json.loads))
df['Tag_Count'] =  df['Tag_Count'].apply((json.loads))
for i in range(len(df['Tag_Count'])):
    df['Tag_Count'][i] = transform(df['Tag_Count'][i])
df1 = pd.DataFrame(df['Element_Count'].values.tolist())
df2 = pd.DataFrame(df['Tag_Count'].values.tolist())
df = pd.concat([df.drop('Element_Count', axis=1), df1], axis=1)
df= pd.concat([df.drop('Tag_Count', axis=1), df2], axis=1)
df= df.fillna(0)
df[df.select_dtypes(include=['float64']).columns]= df.select_dtypes(include=['float64']).astype(int)
df


test= ['link', 'document', 'heading', 'form', 'textbox', 'button', 'list', 'listitem', 'img', 'navigation', 'banner', 'main', 'article', 'contentinfo', 'checkbox', 'table', 'rowgroup', 'row', 'cell', 'listbox', 'presentation', 'figure', 'columnheader', 'separator', 'group', 'region']

print('test1')
print('\n')
for g in range(0,10000):
        # sample
        sample_df = stratified_sample(df,test,size=38, keep_index=False)
        pathaux = "C://Users//Pedro//Desktop//EscolhasAlgoritmos//Stratified//Stratified_Tests//"
        example = "exampleFCUL"
        randomnumber = g+1
        csv = ".csv"
        path = pathaux + '26'+'//'+ example +str(randomnumber) + csv
        chosencolumns= ["Uri"]
        sample_df.to_csv(path,sep=';', index = False, columns =chosencolumns, header = False)   

使用分层采样功能:

def stratified_sample(df, strata, size=None, seed=None, keep_index= True):
    '''
    It samples data from a pandas dataframe using strata. These functions use
    proportionate stratification:
    n1 = (N1/N) * n
    where:
        - n1 is the sample size of stratum 1
        - N1 is the population size of stratum 1
        - N is the total population size
        - n is the sampling size
    Parameters
    ----------
    :df: pandas dataframe from which data will be sampled.
    :strata: list containing columns that will be used in the stratified sampling.
    :size: sampling size. If not informed, a sampling size will be calculated
        using Cochran adjusted sampling formula:
        cochran_n = (Z**2 * p * q) /e**2
        where:
            - Z is the z-value. In this case we use 1.96 representing 95%
            - p is the estimated proportion of the population which has an
                attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
        adjusted_cochran = cochran_n / 1+((cochran_n -1)/N)
        where:
            - cochran_n = result of the previous formula
            - N is the population size
    :seed: sampling seed
    :keep_index: if True, it keeps a column with the original population index indicator
    
    Returns
    -------
    A sampled pandas dataframe based in a set of strata.
    Examples
    --------
    >> df.head()
        id  sex age city 
    0   123 M   20  XYZ
    1   456 M   25  XYZ
    2   789 M   21  YZX
    3   987 F   40  ZXY
    4   654 M   45  ZXY
    ...
    # This returns a sample stratified by sex and city containing 30% of the size of
    # the original data
    >> stratified = stratified_sample(df=df, strata=['sex', 'city'], size=0.3)
    Requirements
    ------------
    - pandas
    - numpy
    '''
    population = len(df)
    size = __smpl_size(population, size)
    tmp = df[strata]
    tmp['size'] = 1
    tmp_grpd = tmp.groupby(strata).count().reset_index()
    tmp_grpd['samp_size'] = round(size/population * tmp_grpd['size']).astype(int)

    # controlling variable to create the dataframe or append to it
    first = True 
    for i in range(len(tmp_grpd)):
        # query generator for each iteration
        qry=''
        for s in range(len(strata)):
            stratum = strata[s]
            value = tmp_grpd.iloc[i][stratum]
            n = tmp_grpd.iloc[i]['samp_size']

            if type(value) == str:
                value = "'" + str(value) + "'"
            
            if s != len(strata)-1:
                qry = qry + stratum + ' == ' + str(value) +' & '
            else:
                qry = qry + stratum + ' == ' + str(value)
        
        # final dataframe
        if first:
            stratified_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
            first = False
        else:
            tmp_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
            stratified_df = stratified_df.append(tmp_df, ignore_index=True)
    
    return stratified_df



def stratified_sample_report(df, strata, size=None):
    '''
    Generates a dataframe reporting the counts in each stratum and the counts
    for the final sampled dataframe.
    Parameters
    ----------
    :df: pandas dataframe from which data will be sampled.
    :strata: list containing columns that will be used in the stratified sampling.
    :size: sampling size. If not informed, a sampling size will be calculated
        using Cochran adjusted sampling formula:
        cochran_n = (Z**2 * p * q) /e**2
        where:
            - Z is the z-value. In this case we use 1.96 representing 95%
            - p is the estimated proportion of the population which has an
                attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
        adjusted_cochran = cochran_n / 1+((cochran_n -1)/N)
        where:
            - cochran_n = result of the previous formula
            - N is the population size
    Returns
    -------
    A dataframe reporting the counts in each stratum and the counts
    for the final sampled dataframe.
    '''
    population = len(df)
    size = __smpl_size(population, size)
    tmp = df[strata]
    tmp['size'] = 1
    tmp_grpd = tmp.groupby(strata).count().reset_index()
    tmp_grpd['samp_size'] = round(size/population * tmp_grpd['size']).astype(int)
    return tmp_grpd


def __smpl_size(population, size):
    '''
    A function to compute the sample size. If not informed, a sampling 
    size will be calculated using Cochran adjusted sampling formula:
        cochran_n = (Z**2 * p * q) /e**2
        where:
            - Z is the z-value. In this case we use 1.96 representing 95%
            - p is the estimated proportion of the population which has an
                attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
        adjusted_cochran = cochran_n / 1+((cochran_n -1)/N)
        where:
            - cochran_n = result of the previous formula
            - N is the population size
    Parameters

    ----------
        :population: population size
        :size: sample size (default = None)
    Returns
    -------
    Calculated sample size to be used in the functions:
        - stratified_sample
        - stratified_sample_report
    '''
    if size is None:
        cochran_n = round(((1.96)**2 * 0.5 * 0.5)/ 0.02**2)
        n = round(cochran_n/(1+((cochran_n -1) /population)))
    elif size >= 0 and size < 1:
        n = round(population * size)
    elif size < 0:
        raise ValueError('Parameter "size" must be an integer or a proportion between 0 and 0.99.')
    elif size >= 1:
        n = size
    return n 

编辑

在做这个问题之前,我检查了所有变量的所有类型,它们与我之前的测试相同

有什么建议吗

(任何我忘记提及的,您认为对理解问题很重要的内容,请说出来,我将在中编辑)


Tags: ofthesampleinselfdfsizeindex