How can I incorporate prior information in pomegranate? In other words: does pomegranate support incremental learning?

Posted 2024-10-02 06:25:52


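Suppose I use pomegranate to fit a model to the data available at the time. Once more data comes in, I would like to update the model accordingly. In other words, is it possible to use pomegranate to update an existing model with new data without overwriting the previous parameters? To be clear: I am not referring to out-of-core learning, because my problem is that the data becomes available at different points in time, not that the data is too large to fit in memory at a single point in time.

Here is what I tried: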

>>> from pomegranate.distributions import BetaDistribution

>>> # suppose a coin generated the following data, where 1 is head and 0 is tail
>>> data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]

>>> # as usual, we fit a Beta distribution to infer the bias of the coin
>>> model = BetaDistribution(1, 1)
>>> model.summarize(data1)  # compute sufficient statistics

>>> # presume we have seen all the data available so far,
>>> # we can now estimate the parameters
>>> model.from_summaries()

>>> # this results in the following model (so far so good)
>>> model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        3.0,
        7.0
    ],
    "frozen" :false
}

>>> # now suppose the coin is flipped a few more times, getting the following data
>>> data2 = [0, 1, 0, 0, 1]

>>> # we would like to update the model parameters accordingly
>>> model.summarize(data2)

>>> # but this fits only data2, overriding the previous parameters
>>> model.from_summaries()
>>> model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        2.0,
        3.0
    ],
    "frozen" :false
}


>>> # however I want to get the result that corresponds to the following,
>>> # but ideally without having to "drag along" data1
>>> data3 = data1 + data2
>>> model.fit(data3)
>>> model  # this should be the final model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        5.0,
        10.0
    ],
    "frozen" :false
}

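Edit:

Another way to ask this question: does pomegranate support incremental or online learning? Basically, I am looking for something similar to scikit-learn's partial_fit().

Given that pomegranate supports out-of-core learning, I feel like I am overlooking something. Any help?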


1 Answer

Answer 1 · posted 2024-10-02 06:25:52

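The problem is actually from_summaries. In the case of the Beta distribution it does: self.summaries = [0, 0]. All of the from_summaries methods are destructive: they turn the summaries into the distribution's parameters and then clear the summaries. Summaries can always be updated with more observations, but parameters cannot.

I think this is poor design. It would be better to treat the summaries as accumulators of observations and the parameters as cached values derived from them.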
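To make the "accumulator plus derived parameters" idea concrete, here is a minimal sketch in plain Python. It is not pomegranate's API: the class name CoinCountAccumulator and its attributes are hypothetical, and it simply mirrors the convention seen in the question's output, where the two parameters equal the head and tail counts.

```python
# Hypothetical sketch, not pomegranate's API: the summaries accumulate,
# and the parameters are derived on demand without resetting anything.
class CoinCountAccumulator:
    def __init__(self):
        self.heads = 0
        self.tails = 0

    def summarize(self, flips):
        """Add a batch of 0/1 flips to the running counts."""
        for flip in flips:
            if flip == 1:
                self.heads += 1
            else:
                self.tails += 1

    @property
    def parameters(self):
        """Derive the two parameters from the counts; the counts are kept."""
        return [float(self.heads), float(self.tails)]


data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
data2 = [0, 1, 0, 0, 1]

acc = CoinCountAccumulator()
acc.summarize(data1)
after_first = acc.parameters   # [3.0, 7.0], as in the question's first fit
acc.summarize(data2)
after_second = acc.parameters  # [5.0, 10.0], as in the question's fit on data3
```

With this design, parameters can be read between batches and later batches still build on the earlier ones, which is the behaviour the question asks for.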

If you do this:

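from pomegranate.distributions import BetaDistribution

# data1 and data2 are the two batches defined in the question
model = BetaDistribution(1, 1)
model.summarize(data1)  # accumulate sufficient statistics for the first batch
model.summarize(data2)  # add the second batch to the same summaries
model.from_summaries()  # estimate the parameters once, from both batches
model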

You will find that it does produce the same result as using model.summarize(data1 + data2).
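The two summarize calls combine because the head and tail counts simply add up across batches. As a worked check with the question's data (the symbols h and t for the head and tail counts are notation introduced here, not pomegranate's):

```latex
\begin{aligned}
(h_1, t_1) &= (3, 7) && \text{counts in data1} \\
(h_2, t_2) &= (2, 3) && \text{counts in data2} \\
(h_1 + h_2,\; t_1 + t_2) &= (5, 10) && \text{same as counting data1 + data2 directly}
\end{aligned}
```

This matches the parameters 5.0 and 10.0 that the question reports for model.fit(data3).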
