I am using the following code:
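import pandas as pd
activity = pd.read_excel('data/Activity-Patterns-2.xlsx')
# str.split(expand=True) labels the new columns 0, 1, 2, so name them Station1..Station3
stations = activity['Activity Sequence'].str.split('---->', expand=True)
activity = activity.join(stations.rename(columns={0: 'Station1', 1: 'Station2', 2: 'Station3'}))
activity = activity[['Station1', 'Station2', 'Station3', 'Repetition']]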
   Station1                        Station2                        Station3                     Repetition
0  Singapore Changi Airport (SIN)  Terminal 1                      Universal Studios Singapore  52
1  Gardens by the Bay              Flower Dome                     Cloud Forest                 52
2  Marina Bay Sands                Singapore Changi Airport (SIN)  Gardens by the Bay           52
3  Singapore                       Singapore Changi Airport (SIN)  Gardens by the Bay           51
4  Universal Studios Singapore     Singapore Changi Airport (SIN)  Marina Bay Sands             51
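A minimal, self-contained sketch of the split step on two made-up rows (the sequences are hypothetical, not taken from the linked dataset). `str.split('---->', expand=True)` returns one column per position, labelled 0, 1, 2, ..., and pads shorter sequences with missing values, so the columns need explicit names before they can be selected as Station1, Station2 and Station3.

```python
import pandas as pd

# Hypothetical rows in the same "A---->B---->C" format as the 'Activity Sequence' column
activity = pd.DataFrame({
    "Activity Sequence": [
        "Gardens by the Bay---->Flower Dome---->Cloud Forest",
        "Marina Bay Sands---->Singapore Changi Airport (SIN)",
    ],
    "Repetition": [52, 51],
})

# expand=True gives one column per position, labelled 0, 1, 2, ...;
# the shorter second sequence is padded with a missing value
parts = activity["Activity Sequence"].str.split("---->", expand=True)
parts.columns = [f"Station{i + 1}" for i in range(parts.shape[1])]

activity = activity.join(parts)
print(activity[["Station1", "Station2", "Station3", "Repetition"]])
```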
import numpy as np
dataset = activity[['Station1', 'Station2', 'Station3']]
# Convert the dataset to a NumPy array (one row of stations per travel record)
dataset = np.array(dataset)
# TransactionEncoder is the current name of mlxtend's former OnehotTransactions
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
# Boolean matrix: one column per station, True if the record contains it (visiting order is lost)
df = pd.DataFrame(te_ary, columns=te.columns_)
from mlxtend.frequent_patterns import apriori, association_rules
frequent_itemsets_apriori = apriori(df, min_support=0.0002, use_colnames=True, max_len=5)
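Why the apriori route cannot tell A > B > C apart from C > B > A can be seen with a plain-pandas stand-in for the boolean matrix that mlxtend's TransactionEncoder builds (this sketch does not call mlxtend, and the records are hypothetical). Each record is reduced to the set of stations it contains, and the support of an itemset is the fraction of rows that contain all of its items, whatever their order.

```python
import pandas as pd

# Hypothetical toy records: the first two visit the same stations in opposite orders
records = [
    ["Gardens by the Bay", "Flower Dome", "Cloud Forest"],
    ["Cloud Forest", "Flower Dome", "Gardens by the Bay"],
    ["Marina Bay Sands", "Flower Dome", "Cloud Forest"],
]

# One-hot encode item *presence*, similar in spirit to TransactionEncoder's output
onehot = pd.Series(records).str.join("|").str.get_dummies(sep="|").astype(bool)

# Rows 0 and 1 come out identical: the visiting order has been discarded
same_row = bool((onehot.iloc[0] == onehot.iloc[1]).all())

# Apriori-style support of an itemset = fraction of rows containing all of its items
itemset = ["Cloud Forest", "Flower Dome", "Gardens by the Bay"]
support = onehot[itemset].all(axis=1).mean()

print(onehot)
print("rows 0 and 1 identical:", same_row)
print("support of", itemset, "=", support)
```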
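The apriori call above lets me extract some of the top sequences for a given minimum support, but I can't get it to generate all possible triplets. Any ideas?
To be clear, I only need patterns of the form A > B > C, together with their support and similar metrics.
Here is my dataset: https://drive.google.com/file/d/1eO_BUXW82zR6nojeDKyrH77Ys3UXuQ14/view?usp=sharing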
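One way to get ordered A > B > C patterns with a support value, sketched without mlxtend: for each sequence, enumerate every ordered triplet of stations with itertools.combinations (which preserves the input order), count each triplet once per sequence, and divide by the total number (or weight) of sequences. This assumes "support" means the fraction of sequences in which A, B and C appear in that order, not necessarily adjacent; weighting sequences by the Repetition column is a further assumption, and the function name ordered_triplet_support is hypothetical.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def ordered_triplet_support(sequences, weights=None):
    """Count every ordered pattern A > B > C and its support.

    A sequence supports A > B > C if A, B and C occur in that order, not
    necessarily next to each other. Each sequence counts at most once per
    pattern, multiplied by its weight (all weights default to 1).
    """
    if weights is None:
        weights = [1] * len(sequences)
    counts = Counter()
    for seq, weight in zip(sequences, weights):
        # Drop padding (None/NaN) left by str.split(expand=True) on short sequences
        stations = [s for s in seq if isinstance(s, str)]
        # combinations() keeps input order, so each tuple is (A, B, C) with A before B before C
        for triplet in set(combinations(stations, 3)):
            counts[triplet] += weight
    total = sum(weights)
    rows = [(a, b, c, n, n / total) for (a, b, c), n in counts.items()]
    result = pd.DataFrame(rows, columns=["A", "B", "C", "count", "support"])
    return result.sort_values("support", ascending=False, ignore_index=True)


# Hypothetical sequences, loosely modelled on the station names in the question
sequences = [
    ["Marina Bay Sands", "Singapore Changi Airport (SIN)", "Gardens by the Bay"],
    ["Universal Studios Singapore", "Singapore Changi Airport (SIN)", "Marina Bay Sands"],
    ["Marina Bay Sands", "Flower Dome", "Singapore Changi Airport (SIN)", "Gardens by the Bay"],
]
triplets = ordered_triplet_support(sequences)
print(triplets)
```

With the DataFrame from the question, a call along the lines of `ordered_triplet_support(activity[['Station1', 'Station2', 'Station3']].values.tolist(), weights=activity['Repetition'].tolist())`, followed by a filter such as `triplets[triplets['support'] >= 0.0002]`, should list every A > B > C pattern above the threshold. Since each complete record there has exactly three stations, it contributes exactly one triplet; combinations() only matters for longer sequences, where it allows gaps between A, B and C (a sliding window of consecutive stations would be the stricter alternative).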