Fastest way to store and retrieve a large number of small unstructured messages

import json

message = {
    'meta1': "measurement",
    'location': "NYC",
    'time': "20200101",
    'value1': 1.0,
    'value2': 2.0,
    'value3': 3.0,
    'value4': 4.0
}
json_message = json.dumps(message)

%%timeit
json.loads(json_message)

import msgpack

message = {
    'meta1': "measurement",
    'location': "NYC",
    'time': "20200101",
    'value1': 1.0,
    'value2': 2.0,
    'value3': 3.0,
    'value4': 4.0
}
msgpack_message = msgpack.packb(message)

%%timeit
msgpack.unpackb(msgpack_message)

#include <iostream>
#include <string>
#include "json.hpp"

using json = nlohmann::json;

const std::string message = "{\"value\": \"hello\"}";

int main()
{
    auto jsonMessage = json::parse(message);
    for(size_t i=0; i<1000000; ++i)
    {
        jsonMessage = json::parse(message);
    }
    // Print a value so the compiler cannot optimize the loop away.
    std::cout << jsonMessage["value"] << std::endl;
}

1 Answer

User

Answer 1 · Posted on 2024-09-29 23:27:50

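I assume that messages contain only a few named attributes (defined at runtime) of basic types, such as strings, integers and floating-point numbers.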

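For the implementation to be fast, it is best to:

avoid text parsing (slow because it is inherently sequential and full of conditional branches)
avoid checking whether messages are ill-formed (not needed here, since they should all be well-formed)
avoid allocations as much as possible
work on chunks of messages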

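Thus, we first need to design a simple and fast binary message protocol:

A binary message contains its number of attributes (encoded on 1 byte), followed by the list of attributes. Each attribute consists of a name string prefixed by its size (encoded on 1 byte), followed by the attribute type (the index of the type in the std::variant, encoded on 1 byte) and the attribute value (a size-prefixed string, a 64-bit integer or a 64-bit floating-point number).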

Each encoded message is a stream of bytes that can be placed in one big buffer (allocated once and reused for multiple incoming messages).
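As a rough illustration of the layout described above, the sketch below walks over one encoded message to compute its size in bytes, which is what you would need in order to step through several messages stored back to back in one buffer. The helpers `messageSize` and `countMessages` are hypothetical names introduced here, and the assumption that messages are simply concatenated without any framing is mine, not something the protocol description states.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes used by the encoded message that
// starts at `msgData`. Assumes a well-formed message and the type indices
// 0 (string), 1 (64-bit integer) and 2 (64-bit float).
inline std::size_t messageSize(const char* msgData)
{
    // Read one byte as an unsigned value (lengths above 127 stay positive)
    const auto byteAt = [msgData](std::size_t pos) {
        return static_cast<std::size_t>(static_cast<unsigned char>(msgData[pos]));
    };

    const std::size_t attrCount = byteAt(0);
    std::size_t cur = 1;

    for (std::size_t i = 0; i < attrCount; ++i)
    {
        cur += 1 + byteAt(cur);                    // key: length byte + characters
        const std::size_t attrType = byteAt(cur);
        cur += 1;                                  // type byte
        assert(attrType <= 2);

        if (attrType == 0)
            cur += 1 + byteAt(cur);                // string: length byte + characters
        else
            cur += 8;                              // 64-bit integer or double
    }
    return cur;
}

// Hypothetical helper: counts the messages stored back to back in
// buffer[0, bufferSize), assuming plain concatenation with no framing.
inline std::size_t countMessages(const char* buffer, std::size_t bufferSize)
{
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < bufferSize; pos += messageSize(buffer + pos))
        ++count;
    return count;
}
```

For the single-attribute message {"value": "hello"}, the layout is 1 count byte, 1 + 5 bytes for the key, 1 type byte and 1 + 5 bytes for the string value.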

Here is the code to decode a message from a raw binary buffer:

#include <unordered_map>
#include <variant>
#include <string_view>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Define the possible types here
using AttrType = std::variant<std::string_view, int64_t, double>;

// Decode the `msgData` buffer and write the decoded message into `result`.
// Assume the message is not ill-formed!
// msgData must not be freed or modified while the resulting map is being used.
void decode(const char* msgData, std::unordered_map<std::string_view, AttrType>& result)
{
    static_assert(CHAR_BIT == 8);

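    // Read length and type bytes as unsigned so values above 127 are not sign-extended
    const size_t attrCount = static_cast<unsigned char>(msgData[0]);
    size_t cur = 1;

    result.clear();

    for(size_t i=0 ; i<attrCount ; ++i)
    {
        const size_t keyLen = static_cast<unsigned char>(msgData[cur]);
        std::string_view key(msgData+cur+1, keyLen);
        cur += 1 + keyLen;
        const size_t attrType = static_cast<unsigned char>(msgData[cur]);
        cur++;

        // A switch could be better if there are more types
        if(attrType == 0) // std::string_view
        {
            const size_t valueLen = static_cast<unsigned char>(msgData[cur]);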
            std::string_view value(msgData+cur+1, valueLen);
            cur += 1 + valueLen;

            result[key] = std::move(AttrType(value));
        }
        else if(attrType == 1) // Native-endian 64-bit integer
        {
            int64_t value;

            // Required to not break the strict aliasing rule
            std::memcpy(&value, msgData+cur, sizeof(int64_t));
            cur += sizeof(int64_t);

            result[key] = std::move(AttrType(value));
        }
        else // IEEE-754 double
        {
            double value;

            // Required to not break the strict aliasing rule
            std::memcpy(&value, msgData+cur, sizeof(double));
            cur += sizeof(double);

            result[key] = std::move(AttrType(value));
        }
    }
}

You will probably need to write the encoding function too (based on the same idea).
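A minimal sketch of such an encoder might look like the following. It is not taken from the original answer: the function name `encode`, the choice of a vector of key-value pairs as input and the use of a `std::string` as the output buffer are all assumptions made for illustration. It writes integers and doubles in native byte order, matching what the decoder reads with `std::memcpy`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

using AttrType = std::variant<std::string_view, int64_t, double>;

// Hypothetical encoder: appends the binary form of `attrs` to `out`.
// Reusing the same `out` (after clear()) keeps its capacity, so repeated
// calls do not have to reallocate.
inline void encode(const std::vector<std::pair<std::string_view, AttrType>>& attrs,
                   std::string& out)
{
    assert(attrs.size() <= 255);                  // count must fit in 1 byte
    out.push_back(static_cast<char>(attrs.size()));

    for (const auto& [key, value] : attrs)
    {
        assert(key.size() <= 255);                // key length must fit in 1 byte
        out.push_back(static_cast<char>(key.size()));
        out.append(key);
        out.push_back(static_cast<char>(value.index()));   // type byte

        if (const auto* str = std::get_if<std::string_view>(&value))
        {
            assert(str->size() <= 255);           // value length must fit in 1 byte
            out.push_back(static_cast<char>(str->size()));
            out.append(*str);
        }
        else if (const auto* integer = std::get_if<int64_t>(&value))
        {
            // Native-endian 64-bit integer, copied byte by byte
            out.append(reinterpret_cast<const char*>(integer), sizeof(int64_t));
        }
        else
        {
            // Native-endian 64-bit double, copied byte by byte
            const double real = std::get<double>(value);
            out.append(reinterpret_cast<const char*>(&real), sizeof(double));
        }
    }
}
```

Because the values are written in native byte order, the encoded bytes are only portable between machines with the same endianness and floating-point format.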

Here is a usage example (based on your JSON-related code):

#include <iostream>

const char* message = "\x01\x05value\x00\x05hello";

void bench()
{
    std::unordered_map<std::string_view, AttrType> decodedMsg;
    decodedMsg.reserve(16);

    decode(message, decodedMsg);

    for(size_t i=0; i<1000*1000; ++i)
    {
        decode(message, decodedMsg);
    }

    // Print a value so the compiler cannot optimize the loop away
    std::visit([](const auto& v) { std::cout << "Result: " << v << std::endl; }, decodedMsg["value"]);
}

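On my machine (with an Intel i7-9700KF processor) and based on your benchmark, I get 2.7 million messages/s with the code using the nlohmann json library and 35.4 million messages/s with the new code.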

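Note that this code can be much faster. Indeed, most of the time is spent in hashing and allocations. You can mitigate this by using a faster hash-map implementation (e.g. boost::container::flat_map or ska::bytell_hash_map) and/or by using a custom allocator. An alternative is to build your own carefully tuned hash-map implementation. Another option is to use a vector of key-value pairs and perform lookups with a linear search (this should be fast because your messages should not have many attributes and because you said you need only a small fraction of the attributes per message). However, the larger the messages, the slower the decoding. Thus, you may need to leverage parallelism to decode chunks of messages faster. With all of that, it should be possible to reach more than 100 million messages/s.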
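The vector-of-pairs option could be sketched roughly as follows. This is an illustration under the same wire format as above, not code from the original answer, and it has not been benchmarked; the names `AttrList`, `decodeFlat` and `findAttr` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

using AttrType = std::variant<std::string_view, int64_t, double>;
using AttrList = std::vector<std::pair<std::string_view, AttrType>>;

// Hypothetical variant of decode() that stores attributes in a flat vector.
// clear() keeps the vector's capacity, so reusing `result` avoids allocations.
// msgData must stay alive and unmodified while `result` is in use.
inline void decodeFlat(const char* msgData, AttrList& result)
{
    const auto byteAt = [msgData](std::size_t pos) {
        return static_cast<std::size_t>(static_cast<unsigned char>(msgData[pos]));
    };

    const std::size_t attrCount = byteAt(0);
    std::size_t cur = 1;
    result.clear();

    for (std::size_t i = 0; i < attrCount; ++i)
    {
        const std::size_t keyLen = byteAt(cur);
        const std::string_view key(msgData + cur + 1, keyLen);
        cur += 1 + keyLen;
        const std::size_t attrType = byteAt(cur);
        cur += 1;
        assert(attrType <= 2);

        if (attrType == 0)       // size-prefixed string
        {
            const std::size_t valueLen = byteAt(cur);
            result.emplace_back(key, std::string_view(msgData + cur + 1, valueLen));
            cur += 1 + valueLen;
        }
        else if (attrType == 1)  // native-endian 64-bit integer
        {
            int64_t value;
            std::memcpy(&value, msgData + cur, sizeof value);
            result.emplace_back(key, value);
            cur += sizeof value;
        }
        else                     // IEEE-754 double
        {
            double value;
            std::memcpy(&value, msgData + cur, sizeof value);
            result.emplace_back(key, value);
            cur += sizeof value;
        }
    }
}

// Linear search by key; returns nullptr when the attribute is absent.
inline const AttrType* findAttr(const AttrList& attrs, std::string_view key)
{
    for (const auto& attr : attrs)
        if (attr.first == key)
            return &attr.second;
    return nullptr;
}
```

With only a handful of attributes per message, the linear scan touches a few contiguous entries and involves no hashing, which is the reason this layout may beat a hash map here.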
