如何在事件中心只接收最近的数据

2024-10-02 16:21:46 发布

您现在位置:Python中文网/ 问答频道 /正文

在eventhub中,我有“sender”和“receiver”脚本用于这两个脚本之间的通信

我面临的问题是,似乎我收到了一个数据集,我昨天发送加上一个我刚刚一起发送。我试图通过时间段或事件数来控制数据量

sender.py的基本代码如下:


CONSUMER_GROUP = "$default"
OFFSET = Offset("-1")
PARTITION = "0"

total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
    receiver = client.add_receiver(
        CONSUMER_GROUP, PARTITION, prefetch=0, offset=OFFSET)
    client.run()
    start_time = time.time()
    batch = receiver.receive(timeout=100)

    for event_data in batch[-10:]:
        print("Received: {}".format(event_data.body_as_str(encoding='UTF-8')))
        total += 1

    end_time = time.time()
    client.stop()
    run_time = end_time - start_time
    print("Received {} messages in {} seconds".format(total, run_time))

except KeyboardInterrupt:
    pass
finally:
    client.stop()


Tags: run脚本clienttimeconsumerbatchgroupstart
1条回答
网友
1楼 · 发布于 2024-10-02 16:21:46

我刚刚找到了一个解决方案,它使用偏移量来控制事件数据的读取过程

我们首先要做的是得到事件数据的偏移量

代码如下:

logger = logging.getLogger("azure")

ADDRESS = "amqps://xxx.servicebus.windows.net/xxx"
USER = "RootManageSharedAccessKey"
KEY = "xxx"

CONSUMER_GROUP = "$default"

#first, set offset to -1 to read all the event data
OFFSET = Offset("-1")
PARTITION = "0"

total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
    receiver = client.add_receiver(
        CONSUMER_GROUP, PARTITION, prefetch=5000, offset=OFFSET)
    client.run()
    start_time = time.time()
    print("**begin receive**")
    for event_data in receiver.receive(timeout=100):
        last_offset = event_data.offset.value
        last_sn = event_data.sequence_number
        #here, we print out the offset of each event data
        print("Received: {}, last_offset: {}, last_sn: {}".format(event_data.body_as_str(encoding='UTF-8'),last_offset,last_sn))        
        total += 1

    end_time = time.time()
    client.stop()
    run_time = end_time - start_time
    print("Received {} messages in {} seconds".format(total, run_time))

except KeyboardInterrupt:
    pass
finally:
    client.stop()

执行后,可以看到每个数据的所有偏移量,截图如下:

enter image description here

然后,您知道每个事件数据的偏移量。如果你想得到从40到53的数据,40的偏移量是237080,所以在你的代码中,把偏移量改为一个小于237080的值,在这行代码中把它设置为237079

代码如下:

logger = logging.getLogger("azure")

ADDRESS = "amqps://xxx.servicebus.windows.net/xx"
USER = "RootManageSharedAccessKey"
KEY = "xxx"

CONSUMER_GROUP = "$default"

#set the offset
OFFSET = Offset("237079")
PARTITION = "0"

total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
    receiver = client.add_receiver(
        CONSUMER_GROUP, PARTITION, prefetch=5000, offset=OFFSET)
    client.run()
    start_time = time.time()
    print("**begin receive**")
    for event_data in receiver.receive(timeout=100):
        last_offset = event_data.offset.value
        last_sn = event_data.sequence_number
        print("Received: {}, last_offset: {}, last_sn: {}".format(event_data.body_as_str(encoding='UTF-8'),last_offset,last_sn))        
        total += 1

    end_time = time.time()
    client.stop()
    run_time = end_time - start_time
    print("Received {} messages in {} seconds".format(total, run_time))

except KeyboardInterrupt:
    pass
finally:
    client.stop()

执行代码后,只返回指定偏移量的事件数据。截图如下:

enter image description here

相关问题 更多 >