How to count symbols/bytes and make a histogram

2 answers

User

Answer #1 · edited 2024-10-02 16:30:00

You can use [Python 3.Docs]: class collections.Counter([iterable-or-mapping]) on the file contents:

>>> import collections
>>>
>>> file_name = r"C:\Windows\comsetup.log"
>>>
>>> with open(file_name, "rb") as fin:
...     text = fin.read()
...
>>> len(text)
771
>>>
>>> text
b'COM+[12:31:53]: ********************************************************************************\r\nCOM+[12:31:53]: Setup started - [DATE:12,24,2019 TIME: 12:31 pm]\r\nCOM+[12:31:53]: ********************************************************************************\r\nCOM+[12:31:53]: Start CComMig::Discover\r\nCOM+[12:31:53]: Return XML stream: <migXml xmlns=""><rules context="system"><include><objectSet></objectSet></include></rules></migXml>\r\nCOM+[12:31:53]: End CComMig::Discover - Return 0x00000000\r\nCOM+[12:31:56]: ********************************************************************************\r\nCOM+[12:31:56]: Setup (COMMIG) finished - [DATE:12,24,2019 TIME: 12:31 pm]\r\nCOM+[12:31:56]: ********************************************************************************\r\n'
>>>
>>> hist = collections.Counter(text)
>>>
>>> hist
Counter({42: 320, 58: 38, 32: 32, 49: 26, 101: 19, 50: 17, 51: 17, 77: 16, 116: 16, 67: 14, 91: 11, 93: 11, 48: 11, 109: 11, 79: 10, 115: 10, 105: 10, 43: 9, 53: 9, 13: 9, 10: 9, 114: 9, 117: 8, 110: 8, 60: 8, 62: 8, 111: 7, 99: 7, 108: 7, 83: 5, 100: 5, 69: 5, 112: 4, 68: 4, 84: 4, 44: 4, 103: 4, 34: 4, 47: 4, 97: 3, 45: 3, 73: 3, 88: 3, 120: 3, 54: 3, 65: 2, 52: 2, 57: 2, 118: 2, 82: 2, 61: 2, 98: 2, 106: 2, 76: 1, 121: 1, 40: 1, 71: 1, 41: 1, 102: 1, 104: 1})
>>>
>>> chr(42).encode()  # For testing purposes only
b'*'
>>>
>>> text.count(b"*")
320

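hist is a mapping in which each key is a byte value ([0..255]) found in the text, and the corresponding value is the number of times it occurs. The keys are integers because iterating over a bytes object yields ints, not characters.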
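Since hist already holds the data for a histogram, here is a minimal sketch of how it could be turned into a fixed 256-bin table and printed as a text histogram without extra libraries. It also shows counting symbols (characters) instead of bytes by decoding first, assuming the file is UTF-8. The helper names byte_histogram, symbol_histogram and print_text_histogram are hypothetical, not part of collections.

```python
import collections


def byte_histogram(data: bytes) -> list[int]:
    """Return 256 counts: index b holds how often byte value b occurs in data."""
    counts = collections.Counter(data)  # iterating bytes yields ints in 0..255
    return [counts[b] for b in range(256)]  # missing keys count as 0


def symbol_histogram(data: bytes, encoding: str = "utf-8") -> collections.Counter:
    """Count characters (symbols) instead of bytes; assumes `encoding` is correct."""
    return collections.Counter(data.decode(encoding))


def print_text_histogram(counts: collections.Counter, width: int = 40) -> None:
    """Print one bar per key, scaled so the most common key gets `width` marks."""
    if not counts:
        return
    top = max(counts.values())
    for key, n in counts.most_common():
        bar = "#" * max(1, n * width // top)
        print(f"{key!r:>8} {n:6d} {bar}")


# "ï" takes 2 bytes in UTF-8, so this sample has 12 symbols but 13 bytes.
sample = "naïve ****\r\n".encode("utf-8")
hist = byte_histogram(sample)
print(hist[ord("*")])
print_text_histogram(symbol_histogram(sample))
```

The byte and symbol counts only differ for non-ASCII text; for a pure-ASCII log like the one above they are identical.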

User

Answer #2 · edited 2024-10-02 16:30:00

Try this:

from io import StringIO

import matplotlib.pyplot as plt
import requests
import seaborn as sns  # for data visualization
sns.set()

# Just use a sample file from https://norvig.com/big.txt
fin = StringIO(requests.get('https://norvig.com/big.txt').content.decode('utf8'))

num_symbols, num_bytes = [], []

for line in fin:
    # Size of the line in bytes once encoded as UTF-8
    # (sys.getsizeof would return the size of the Python object instead).
    num_bytes.append(len(line.encode('utf8')))
    # Number of characters (symbols) in the line
    num_symbols.append(len(line))

# Plot the first graph (sns.distplot is deprecated; sns.histplot replaces it).
plt.figure()
sns.histplot(num_symbols)

# Plot the other graph in a new figure.
plt.figure()
sns.histplot(num_bytes)
plt.show()
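Why the byte count needs care: len() on a str counts characters (code points), while the number of bytes depends on the encoding, and sys.getsizeof reports the in-memory size of the Python object (including CPython's per-object overhead), not the size of the encoded data. A small sketch, assuming UTF-8; line_sizes is a hypothetical helper name.

```python
import sys


def line_sizes(line: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (number of characters, number of encoded bytes) for one line."""
    return len(line), len(line.encode(encoding))


for text in ["hello\n", "héllo\n", "日本\n"]:
    chars, nbytes = line_sizes(text)
    # In CPython, sys.getsizeof includes object overhead, so it exceeds
    # both counts above and is not a measure of the line's encoded size.
    print(repr(text), chars, nbytes, sys.getsizeof(text))
```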

Plotting them together is probably more informative; try:

# A dict is treated as wide-form data: one colour and legend entry per key.
sns.histplot({"chars": num_symbols, "bytes": num_bytes}, element="step")
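Overlaid histograms are only directly comparable when both series use the same bin edges. As a sketch of how to check this numerically before (or instead of) plotting, the following pure-Python code buckets the integer line lengths into fixed-width bins with shared edges. binned_counts and side_by_side are hypothetical helper names, and a width of 10 is an arbitrary choice.

```python
from collections import Counter


def binned_counts(values, width=10):
    """Map each bin's lower edge to how many values fall in [edge, edge + width)."""
    if width <= 0:
        raise ValueError("width must be positive")
    return Counter((v // width) * width for v in values)


def side_by_side(num_symbols, num_bytes, width=10):
    """Return (edge, char_count, byte_count) rows over the union of both bin sets."""
    chars = binned_counts(num_symbols, width)
    nbytes = binned_counts(num_bytes, width)
    edges = sorted(set(chars) | set(nbytes))
    return [(edge, chars[edge], nbytes[edge]) for edge in edges]


width = 10
for edge, c, b in side_by_side([1, 5, 12, 12], [1, 7, 14, 25], width=width):
    print(f"[{edge:4d}, {edge + width:4d})  chars={c:3d}  bytes={b:3d}")
```

With the real data, pass the num_symbols and num_bytes lists built above instead of the toy values.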
