Why do I get random results from seemingly non-random code with Python sklearn?

Posted 2024-06-25 22:38:16


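I have updated the question based on the answers.

I have a list of strings named str_tuple. I want to compute a similarity measure between the first element of the list and each of the remaining elements. I run the six-line snippet below.

What completely baffles me is that the result looks random every time I run the code, yet I cannot see anything in my six lines that introduces randomness.

Update:

It was pointed out that TruncatedSVD() has a random_state parameter. Specifying random_state gives a fixed result (which is entirely correct). However, if you change random_state, the result changes. For other strings (e.g. str2), the result is the same no matter how random_state is changed. These strings come from the Home Depot Kaggle competition. I have a pd.Series containing thousands of such strings, and most of them give non-random results like str2 (regardless of the random_state that is set). For some unknown reason, str1 is an example that gives a different result every time random_state changes. I am starting to think that some intrinsic property of str1 is responsible.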

import numpy as np  # needed for np.dot below
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer

# str1 yields random results
str1 = [u'l bracket', u'simpson strong tie 12 gaug angl', u'angl make joint stronger provid consist straight corner simpson strong tie offer wide varieti angl various size thick handl light duti job project structur connect need bent skew match project outdoor project moistur present use zmax zinc coat connector provid extra resist corros look "z" end model number .versatil connector various 90 connect home repair projectsstrong angl nail screw fasten alonehelp ensur joint consist straight strongdimensions: 3 in. xbi 3 in. xbi 1 0.5 in. made 12 gaug steelgalvan extra corros resistanceinstal 10 d common nail 9 xbi 1 0.5 in. strong drive sd screw', u'simpson strong-tie', u'', u'versatile connector for various 90\xe2\xb0 connections and home repair projects stronger than angled nailing or screw fastening alone help ensure joints are consistently straight and strong dimensions: 3 in. x 3 in. x 1-1/2 in. made from 12-gauge steel galvanized for extra corrosion resistance install with 10d common nails or #9 x 1-1/2 in. strong-drive sd screws']
# str2 yields non-random result     
str2 = [u'angl bracket', u'simpson strong tie 12 gaug angl', u'angl make joint stronger provid consist straight corner simpson strong tie offer wide varieti angl various size thick handl light duti job project structur connect need bent skew match project outdoor project moistur present use zmax zinc coat connector provid extra resist corros look "z" end model number .versatil connector various 90 connect home repair projectsstrong angl nail screw fasten alonehelp ensur joint consist straight strongdimensions: 3 in. xbi 3 in. xbi 1 0.5 in. made 12 gaug steelgalvan extra corros resistanceinstal 10 d common nail 9 xbi 1 0.5 in. strong drive sd screw', u'simpson strong-tie', u'', u'versatile connector for various 90\xe2\xb0 connections and home repair projects stronger than angled nailing or screw fastening alone help ensure joints are consistently straight and strong dimensions: 3 in. x 3 in. x 1-1/2 in. made from 12-gauge steel galvanized for extra corrosion resistance install with 10d common nails or #9 x 1-1/2 in. strong-drive sd screws']   

vectorizer = CountVectorizer(token_pattern=r"\d+\.\d+|\d+\/\d+|\b\w+\b")
# replacing str1 with str2 gives a non-random result regardless of random_state
cmat = vectorizer.fit_transform(str1).astype(float)    # sparse matrix
cmat = TruncatedSVD(2).fit_transform(cmat)    # dense numpy array
cmat = Normalizer().fit_transform(cmat)    # dense numpy array
sim = np.dot(cmat, cmat.T)
sim[0,1:].tolist()

1 answer
User
#1 · Posted 2024-06-25 22:38:16

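By default, TruncatedSVD uses a randomized algorithm. To get repeatable results you therefore have to pass a random_state value, which seeds the random number generator in the same way a value passed to numpy.random.seed would.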

cmat = TruncatedSVD(n_components=2, random_state=42).fit_transform(cmat)

Docs

class sklearn.decomposition.TruncatedSVD(n_components=2, algorithm='randomized', n_iter=5, random_state=None, tol=0.0)
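
As a minimal sketch of the fix, the snippet below fits TruncatedSVD twice with the same random_state and checks that both runs agree. The matrix X is made-up random data used only for illustration, not the question's count matrix, and the seed values 0 and 42 are arbitrary.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical input: a small dense matrix standing in for the count matrix.
rng = np.random.RandomState(0)
X = rng.rand(20, 10)

# Two independent fits with the same seed.
a = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)
b = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)

# With a fixed random_state the two projections should match.
same = np.allclose(a, b)
print(same)
```

Leaving random_state at its default of None lets each fit draw fresh random numbers, which is where run-to-run variation can come from.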


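For it to produce non-random output, the starting element of the list must occur more than once in the list. That is, if the starting element of str1 were any of angl, versatile, or simpson, it would give a non-random result. Because angl, which starts the first element of str2, is repeated several times elsewhere in the list, str2 does not return random output.

The randomness is therefore a measure of how dissimilar the elements of the given list are in terms of the tokens that occur in them. In such cases, specifying random_state helps to produce a reproducible output.
[Thanks to @wen for pointing this out]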
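
The overlap claim can be checked without sklearn. The sketch below tokenizes with the question's token_pattern and lists the tokens that a first element shares with the rest of the list. The helper names tokenize and shared_tokens are hypothetical, the strings in rest are shortened excerpts from the question, and lowercasing is assumed to mirror CountVectorizer's default.

```python
import re

# Same token pattern as the CountVectorizer in the question.
TOKEN_PATTERN = r"\d+\.\d+|\d+\/\d+|\b\w+\b"


def tokenize(text):
    """Return the set of lowercase tokens found in text."""
    return set(re.findall(TOKEN_PATTERN, text.lower()))


def shared_tokens(first, rest):
    """Return the tokens of `first` that also occur in any string of `rest`."""
    rest_tokens = set()
    for text in rest:
        rest_tokens |= tokenize(text)
    return tokenize(first) & rest_tokens


# Shortened excerpts of the question's strings (illustration only).
rest = ["simpson strong tie 12 gaug angl", "angl make joint stronger"]

print(shared_tokens("l bracket", rest))     # set()
print(shared_tokens("angl bracket", rest))  # {'angl'}
```

One plausible reading is that when the first element shares no tokens with the others, its projection onto the leading components is close to zero, so Normalizer amplifies whatever small, seed-dependent error the randomized solver leaves behind.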
