Trouble identifying the cause: ValueError: infinity or a value too large after normalization

Posted 2024-09-29 21:39:27


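I have a dataset that I first normalize and then strip of NaN values. Now, when I try df[col] = preprocessing.scale(df[col].values), I get an error: ValueError: Input contains infinity or a value too large for dtype('float64').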

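Here are the steps I followed:

1- Make sure the data table (pandas) has no NaN values by dropping them
2- Normalize the values using pct_change
3- Drop NaN values again right after calling pct_change

Then I try the scale function and get the error.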
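
One possible source of the infinity, offered as a sketch and not a confirmed diagnosis: pct_change divides by the previous value, so a zero in the data (for example a zero Volume) produces inf, and dropna does not remove inf. The series below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical volume series containing a zero (illustrative data only).
volume = pd.Series([100.0, 0.0, 50.0, 75.0])

# Percent change is value[i] / value[i-1] - 1, so 50 / 0 - 1 becomes +inf.
changes = volume.pct_change()

# dropna removes the leading NaN but leaves the infinity in place.
after_dropna = changes.dropna()
has_inf = bool(np.isinf(after_dropna).any())

# Treating infinities as missing lets dropna remove them as well.
cleaned = changes.replace([np.inf, -np.inf], np.nan).dropna()
all_finite = bool(np.isfinite(cleaned).all())

print(has_inf, all_finite)
```

If this is the cause, replacing inf with NaN before the dropna call would let the existing cleanup step remove those rows.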

Here are the code snippets:

The call from main:

    dataset = f"./Data/Original/{RATIO_TO_PREDICT}.csv"
    df = pd.read_csv(dataset)
    df.set_index("Timestamp", inplace = True)

    #calculate volume candle type 1
    #calculate volume candle type 2

    #df['VC1_Future'] = df["VC1"].shift(-FUTURE_PERIOD_PREDICT)
    #df['VC1_Target'] = list(map(classify,df["VC1"], df["VC1_Future"]))
    #df['VC2_Future'] = df["VC2"].shift(-FUTURE_PERIOD_PREDICT)
    #df['VC2_Target'] = list(map(classify,df["VC2"], df["VC2_Future"]))

    df.fillna(method="ffill", inplace = True)
    df.dropna(inplace=True)

    df['Price_Future'] = df["Close"].shift(-FUTURE_PERIOD_PREDICT) # We go N number of time to the future, get that value and put it in this row's FUTURE PRICE value
    df['Price_Target'] = list(map(classify,df["Close"], df["Price_Future"])) # Now we compare the current price with that future price to see if we went up, down or none, here we use the 0.015 or 1.5% spread to make sure we pass commision

    # Now we want to separate part of the data for training and another part for testing
    times = sorted(df.index.values)
    last_5pct = times[-int(0.1 * len(times))]

    # We get the final columns we want, making sure we are not including any of the High, Low, and Open values. Remember that Price Target is last. That is OUR GOAL !!!
    #dfs = df[["Close", "Volume", "Price_Future", "Price_Target"]]#, "VC1", "VC2", "VC1_Future", "VC2_Future", "VC1_Target", "VC2_Target", "Price_Future", "Price_Target"]]

    # We finally separate the data into two different lists
    validation_df = df[(df.index >= last_5pct)]
    training_df = df[(df.index < last_5pct)]

    # We save each list into a file so that we don't need to make this process walk through again unless A) we get new data B) we loose previous data on hard drive
    Message(name)
    print(len(df), len(training_df), len(validation_df))
    Message(len(df))

    #training_df.dropna(inplace=True)
    print(np.isfinite(training_df).all())
    print('')
    #validation_df.dropna(inplace=True)
    print(np.isfinite(validation_df).all())

    Train_X, Train_Y = preprocess(training_df)

Now, on to the function. Here is the beginning of it:

    def preprocess(df):
        df.drop('Price_Future', 1)
        #df.drop('VC1_Future', 1)
        #df.drop('VC2_Future', 1)

        for col in df.columns:
            if col != "Price_Target" and col != "VC1_Target" and col != "VC2_Target":
                df[col] = df[col].pct_change() # gets the percent change, other than the volume, the data now should sit between -1 and 1, the formula : (value[i] / value[i-1]) - 1
                df.dropna(inplace=True)
                df[col] = preprocessing.scale(df[col].values)

As you may have noticed, I check for NaN in the main call (with np.isfinite), and the result is:

    Open             True
    High             True
    Low              True
    Close            True
    Volume           True
    Price_Future    False
    Price_Target     True
    dtype: bool
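
A likely reading of the False for Price_Future, sketched here on made-up data: shift with a negative period leaves NaN in the last rows, and np.isfinite reports NaN as not finite. The period of 2 stands in for FUTURE_PERIOD_PREDICT and is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices (illustrative data only).
close = pd.Series([10.0, 11.0, 12.0, 13.0])

# Shifting by -2 pulls future values back, leaving the last two rows as NaN.
future = close.shift(-2)

finite_close = bool(np.isfinite(close).all())
finite_future = bool(np.isfinite(future).all())
missing_rows = int(future.isna().sum())

print(finite_close, finite_future, missing_rows)
```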

At the start of the function I drop the Price_Future column, so why does this error occur at the scaling line?
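
One thing worth checking, shown as a minimal sketch on made-up data: DataFrame.drop returns a new frame by default, so a bare df.drop('Price_Future', 1) call leaves df unchanged unless the result is assigned back. The sketch uses the columns= keyword because recent pandas versions no longer accept the axis as a positional argument.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as in the question.
df = pd.DataFrame({"Close": [1.0, 2.0], "Price_Future": [2.0, None]})

# The returned frame is discarded here, so df still has the column.
df.drop(columns="Price_Future")
still_there = "Price_Future" in df.columns

# Assigning the result back actually removes the column.
df = df.drop(columns="Price_Future")
removed = "Price_Future" not in df.columns

print(still_there, removed)
```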

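In addition, the code above produces many warnings:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

But I am new to Python and all of this, so I don't know how to fix the function's code.

Could someone please help?

Thanks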


1 answer
User
#1 · Posted 2024-09-29 21:39:27

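Ouch, I found the main problem.

df[col] = preprocessing.scale(df[col].values)

is wrong, and it should be

df[col] = preprocessing.scale(df[col])

Note that .values is missing from the scale call!!!

But could someone please help me with those warning messages?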
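
A common way to avoid that warning, given here as a sketch and not a verified fix for this exact script: take an explicit copy when splitting the frame, so later column assignments act on an independent DataFrame. The data and the cutoff value (standing in for last_5pct) are hypothetical.

```python
import pandas as pd

# Hypothetical frame indexed by timestamp-like integers.
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=[10, 20, 30, 40])
cutoff = 30  # hypothetical split point, standing in for last_5pct

# .copy() makes training_df independent of df instead of a slice of it.
training_df = df[df.index < cutoff].copy()

# Assigning to a column of the copy no longer touches (or warns about) df.
training_df["Close"] = training_df["Close"] * 2

original_unchanged = bool(df.loc[10, "Close"] == 1.0)
print(training_df["Close"].tolist(), original_unchanged)
```

Applied to the question's code, this would mean adding .copy() where validation_df and training_df are created.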
