Building an sklearn text classifier and converting it with coremltools

1 answer

User

#1 · Posted 2024-10-03 02:39:54

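At the time of writing, you cannot include the tf-idf vectorizer in the pipeline if you want to convert it to the .mlmodel format. The workaround is to vectorize the data separately, train the model (LinearSVC, Random Forest, ...) on the vectorized data, and convert only the model. On the device you then have to compute the tf-idf representation of the input yourself and pass it to the model. This only works if the device-side computation reproduces the training-time vectorization exactly: the same vocabulary in the same order, the same tokenization, and the same tf and idf formulas. Here is a copy of the tf-idf function I wrote: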
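For reference, here is the weighting the function below computes, next to what scikit-learn's TfidfVectorizer computes with its default settings (smooth_idf=True, norm='l2', sublinear_tf=False). Here c(t, d) is the number of times vocabulary word t occurs in message d, |d| is the number of words in d, N is the number of training documents and df(t) is the number of training documents containing t. The two are not interchangeable: with N = 4 and df(t) = 1, for example, the first idf is ln 4 ≈ 1.386 and the second is ln(5/2) + 1 ≈ 1.916, and only the second is L2-normalized. So if the training vectors came from a default TfidfVectorizer, the device code has to follow the second form.

```latex
% Weighting computed by the tfidf(document:) function below
x_t = \frac{c(t,d)}{|d|} \cdot \ln\frac{N}{\mathrm{df}(t)}

% scikit-learn TfidfVectorizer defaults (smooth_idf=True, norm='l2', sublinear_tf=False)
\mathrm{idf}(t) = \ln\frac{1 + N}{1 + \mathrm{df}(t)} + 1, \qquad
v_t = c(t,d)\,\mathrm{idf}(t), \qquad
x_t = \frac{v_t}{\lVert v \rVert_2}
```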

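import Foundation
import CoreML

// Computes the tf-idf vector of `document` on the device.
// Expects two bundled resources:
//   words_ordered.txt: the vocabulary, one word per line, in the same order as the model's input features
//   data.txt: the training documents, one per line, used for the document frequencies
// Returns nil if either resource is missing or cannot be read.
func tfidf(document: String) -> MLMultiArray? {
    guard let wordsFile = Bundle.main.path(forResource: "words_ordered", ofType: "txt"),
          let dataFile = Bundle.main.path(forResource: "data", ofType: "txt") else {
        return nil
    }
    do {
        // Drop empty lines (e.g. from a trailing newline) so the vector length equals the vocabulary size.
        let wordsData = try String(contentsOfFile: wordsFile, encoding: .utf8)
            .components(separatedBy: .newlines)
            .filter { !$0.isEmpty }
        let data = try String(contentsOfFile: dataFile, encoding: .utf8)
            .components(separatedBy: .newlines)
            .filter { !$0.isEmpty }
        // Split each training document into a set of whole words once, so "cat" no longer matches "concatenate".
        let dataWords = data.map { Set($0.split(separator: " ").map(String.init)) }
        let wordsInMessage = document.split(separator: " ").map(String.init)
        let vectorized = try MLMultiArray(shape: [NSNumber(value: wordsData.count)], dataType: .double)
        for (i, word) in wordsData.enumerated() {
            // Term frequency: whole-word occurrences in the message divided by the message length.
            let wordCount = wordsInMessage.filter { $0 == word }.count
            // Document frequency: training documents containing the word (only computed if the word occurs).
            let docCount = wordCount > 0 ? dataWords.filter { $0.contains(word) }.count : 0
            if docCount > 0 {
                let tf = Double(wordCount) / Double(wordsInMessage.count)
                let idf = log(Double(data.count) / Double(docCount))
                vectorized[i] = NSNumber(value: tf * idf)
            } else {
                // Absent words, and words never seen in training (which would divide by zero), get 0.
                vectorized[i] = NSNumber(value: 0.0)
            }
        }
        return vectorized
    } catch {
        return nil
    }
}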
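If the model was trained on the output of scikit-learn's TfidfVectorizer with default settings, a closer match may be to skip the corpus scan and reuse the fitted values: the feature names from `vectorizer.get_feature_names_out()` (in column order) and `vectorizer.idf_`. The sketch below assumes both have been exported to the app somehow (loading them is not shown) and uses only Foundation, so it returns a plain `[Double]`; copying that into an `MLMultiArray` works as in the function above. `TfidfSketch` is a hypothetical name, and its tokenizer only approximates the default `token_pattern` `(?u)\b\w\w+\b`, so it is worth comparing against a few `vectorizer.transform(...)` outputs before relying on it. A second initializer computes the smoothed idf from the training texts instead, for when `idf_` was not exported.

```swift
import Foundation

/// Hypothetical sketch of TfidfVectorizer.transform with scikit-learn's defaults
/// (lowercase=True, smooth_idf=True, norm='l2', sublinear_tf=False).
struct TfidfSketch {
    let vocabulary: [String]   // assumed: get_feature_names_out(), in column order
    let idf: [Double]          // assumed: vectorizer.idf_, same order
    private let index: [String: Int]

    init(vocabulary: [String], idf: [Double]) {
        precondition(vocabulary.count == idf.count, "one idf value per vocabulary word")
        self.vocabulary = vocabulary
        self.idf = idf
        var index: [String: Int] = [:]
        for (i, word) in vocabulary.enumerated() { index[word] = i }
        self.index = index
    }

    /// Alternative: smoothed idf from the training texts,
    /// idf(t) = ln((1 + n) / (1 + df(t))) + 1, counting each document at most once per word.
    init(vocabulary: [String], corpus: [String]) {
        let n = Double(corpus.count)
        var df: [String: Int] = [:]
        for doc in corpus {
            for token in Set(TfidfSketch.tokenize(doc)) { df[token, default: 0] += 1 }
        }
        let idf = vocabulary.map { word -> Double in
            let count = Double(df[word] ?? 0)
            return log((1 + n) / (1 + count)) + 1
        }
        self.init(vocabulary: vocabulary, idf: idf)
    }

    /// Approximation of the default token_pattern: lowercase, split on anything that is
    /// not a letter, digit or underscore, keep tokens of two or more characters.
    static func tokenize(_ text: String) -> [String] {
        text.lowercased()
            .split(whereSeparator: { !($0.isLetter || $0.isNumber || $0 == "_") })
            .map(String.init)
            .filter { $0.count >= 2 }
    }

    /// Raw term counts times idf, then L2-normalized; out-of-vocabulary words are ignored.
    func transform(_ document: String) -> [Double] {
        var vector = [Double](repeating: 0, count: vocabulary.count)
        for token in TfidfSketch.tokenize(document) {
            if let i = index[token] { vector[i] += 1 }
        }
        for i in vector.indices { vector[i] *= idf[i] }
        let norm = vector.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
        if norm > 0 {
            for i in vector.indices { vector[i] /= norm }
        }
        return vector
    }
}
```

For example, `TfidfSketch(vocabulary: ["free", "money", "win"], idf: [1, 2, 1]).transform("Win FREE money, free!")` counts free, money and win as 2, 1 and 1, weights them to 2, 2 and 1, and normalizes to [2/3, 2/3, 1/3]. Keeping the idf values precomputed also avoids re-reading data.txt and rescanning every training document on each call.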

Edit: I wrote an article on how to do this: http://gokulswamy.me/imessage-spam-detection/
