Extracting trios from sequences using apriori or another method

Posted 2024-09-24 19:57:15


I am using the following code:

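import pandas as pd

activity = pd.read_excel('data/Activity-Patterns-2.xlsx')
# split() labels the new columns 0, 1, 2; rename them so the selection below works
stops = activity['Activity Sequence'].str.split('---->', expand=True).rename(columns={0: 'Station1', 1: 'Station2', 2: 'Station3'})
activity = activity.join(stops)[['Station1', 'Station2', 'Station3', 'Repetition']]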

   Station1                        Station2                        Station3                     Repetition
0  Singapore Changi Airport (SIN)  Terminal 1                      Universal Studios Singapore  52
1  Gardens by the Bay              Flower Dome                     Cloud Forest                 52
2  Marina Bay Sands                Singapore Changi Airport (SIN)  Gardens by the Bay           52
3  Singapore                       Singapore Changi Airport (SIN)  Gardens by the Bay           51
4  Universal Studios Singapore     Singapore Changi Airport (SIN)  Marina Bay Sands             51

import numpy as np
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

dataset = activity[['Station1', 'Station2', 'Station3']]
# Convert the dataset to a numpy array (one row per travel record)
dataset = np.array(dataset)

# TransactionEncoder replaces OnehotTransactions, which newer mlxtend releases no longer provide
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Each record becomes an unordered set of stations, so the visiting order is not kept
frequent_itemsets_apriori = apriori(df, min_support=0.0002, use_colnames=True, max_len=5)

This lets me extract some of the top sequences for the minimum support I set, but I cannot generate all possible trios.

What I mean is that I only need patterns of the form A>B>C, ranked by support and similar measures.

Here is my dataset: https://drive.google.com/file/d/1eO_BUXW82zR6nojeDKyrH77Ys3UXuQ14/view?usp=sharing
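
To make the A>B>C requirement concrete, here is a minimal sketch of one possible reading. It is not apriori. It counts every run of three consecutive stops, keeps their order, and weights each sequence by its Repetition value. The sample rows, the `ordered_trios` helper, and the definition of support as weighted count divided by total repetitions are all assumptions for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical sample rows: (Activity Sequence, Repetition). Not the real data.
rows = [
    ("A---->B---->C", 52),
    ("A---->B---->C", 10),
    ("C---->B---->A", 51),
    ("A---->B---->C---->D", 5),
]

def ordered_trios(sequence, sep="---->"):
    """Return every run of three consecutive stops, order preserved."""
    stops = [s.strip() for s in sequence.split(sep)]
    return [tuple(stops[i:i + 3]) for i in range(len(stops) - 2)]

# Sum the repetition weight of every ordered trio.
counts = Counter()
for sequence, repetition in rows:
    for trio in ordered_trios(sequence):
        counts[trio] += repetition

# Assumed definition: support = weighted trio count / total repetitions.
total = sum(repetition for _, repetition in rows)
support = {trio: n / total for trio, n in counts.items()}

for trio, n in counts.most_common():
    print(" > ".join(trio), n, round(support[trio], 4))
```

With the real file, the rows could be built with `zip(activity['Activity Sequence'], activity['Repetition'])` before the sequence column is split. Because the trios are stored as tuples, A>B>C and C>B>A are counted separately, which a set-based itemset cannot do.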

