Distance matrix for a 100k*100k matrix in R

library(stringdist)

# Declare an empty data frame and append rows to it
matchedStr_vecA <- data.frame(row_index = integer(), col_index = integer(),
                              vecA_i = character(), vecA_j = character(),
                              dist_diff_vecA = double(),
                              stringsAsFactors = FALSE)
k <- 1  # index of the next row to fill in the data frame

# Two nested loops compute only one triangle of the matrix (i < j):
# the diagonal elements are zero and the other triangle is its mirror image.
for (i in seq_along(vecA)) {
  for (j in seq_along(vecA)) {
    if (i < j) {
      dist_diff_vecA <- stringdist(vecA[i], vecA[j], method = "lv")
      # list() keeps the column types; c() would coerce every value to character
      matchedStr_vecA[k, ] <- list(i, j, vecA[i], vecA[j], dist_diff_vecA)
      k <- k + 1
    }
  }
}

1 answer

User

#1 · Posted 2024-10-03 06:25:43

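I ran into the same problem when computing a distance matrix and solved it in Python. The following question discusses the key element of the solution, which is making sure the computation is divided evenly among threads: How to split diagonal matrix into equal number of items each along one of axis?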

Two points are worth noting:

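The distance between two points is usually symmetric, so you can exploit this property: compute the distance between elements i and j once, then store or reuse it as the distance between j and i.
Unless you are satisfied with inexact results, the algorithm cannot be optimized below O(n^2). Since you are new to programming, I would not even consider attempting that.
You should be able to parallelize the computation using index splitting, as I suggested in the question above, to get a near-optimal solution.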
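The two points above can be sketched in R roughly as follows. This is only an illustration under stated assumptions, not the code from the linked question: the helper names `split_rows_balanced` and `chunk_distances` are made up for this example, and base R's `utils::adist` (Levenshtein distance) stands in for `stringdist(..., method = "lv")` so that the snippet runs without extra packages. Rows of the triangle are grouped into chunks that hold a similar number of (i, j) pairs, and each pair with i < j is computed exactly once.

```r
# Hypothetical helper: split rows 1..(n-1) of the triangle (i < j) into
# contiguous chunks holding roughly equal numbers of (i, j) pairs.
split_rows_balanced <- function(n, n_chunks) {
  # Row i has n - i pairs; as.numeric() avoids integer overflow for large n
  pairs_per_row <- as.numeric((n - 1):1)
  cum_pairs <- cumsum(pairs_per_row)
  total <- cum_pairs[length(cum_pairs)]
  # Assign each row to a chunk by the midpoint of its cumulative pair range
  midpoint <- cum_pairs - pairs_per_row / 2
  chunk_id <- pmin(n_chunks, floor(midpoint / total * n_chunks) + 1)
  split(seq_len(n - 1), chunk_id)
}

# Hypothetical helper: distances for all pairs (i, j), j > i, with i in `rows`.
chunk_distances <- function(vec, rows) {
  n <- length(vec)
  out <- lapply(rows, function(i) {
    j <- (i + 1):n
    data.frame(row_index = i, col_index = j,
               vecA_i = vec[i], vecA_j = vec[j],
               # adist() is vectorized over its second argument
               dist_diff_vecA = as.vector(utils::adist(vec[i], vec[j])),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

vecA <- c("kitten", "sitting", "mitten", "fitting")
chunks <- split_rows_balanced(length(vecA), n_chunks = 2)

# Each chunk is independent, so this lapply() could be replaced by a
# parallel equivalent such as parallel::mclapply() on Unix-like systems.
parts <- lapply(chunks, function(rows) chunk_distances(vecA, rows))
result <- do.call(rbind, parts)
print(result)
```

For 100,000 strings there are 100000 * 99999 / 2 = 4,999,950,000 pairs, so binding every chunk into one data frame as done here is unlikely to fit in memory. In practice each chunk would probably need to be filtered (for example, keeping only pairs below a distance threshold) or written to disk before moving on to the next one.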
