Yielding backslash-continued lines as joined lines while excluding comment blocks

# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line

import sys
from typing import Iterator

try:
    file_name = open('path/to/file.txt', 'r')
except FileNotFoundError:
    print("File could not be found. Please check spelling of file name!")
    sys.exit()

#Read lines in file
Lines = file_name.read().splitlines()

class FileLineGen:
    def get_filelines(path: str) -> Iterator[str]:
        for line in Lines:
            #Exclude a line if it starts with #
            if line.startswith("#"):
                line.replace(line, "")
                continue
            if "#" in line:
                #Split at where the # is located
                line.split('#')
                #Yield everything before the comment block
                yield line.split('#')[0]
                continue
            if line.endswith('\\'):
                #Yield everything but the backslash
                line = line[:-1]
                yield line
                continue
            #Yield the line in all other cases
            else:
                yield line

    gen = get_filelines(file_name)
    for line in Lines:
        print(next(gen))

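2 Answers

User

#1 · Edited 2024-09-24 08:27:35

You have two operators, # and \. The latter takes precedence over the former, which means you should check for it and handle it first. Here is a simple way to build up lines using a list as a buffer: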

def my_generator(f):
    buffer = []
    for line in f:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            buffer.append(line[:-1])
            continue
        line = ''.join(buffer) + line
        buffer = []
        if '#' in line:
            line = line[:line.index('#')]
        if line:
            yield line
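
One edge case worth noting: if the last line of the input ends with a backslash, the generator above leaves it in the buffer and never yields it. The following is a minimal sketch of a variant that flushes the buffer at the end of the input. The name join_continued_lines is hypothetical, and treating a dangling continuation as a complete line is an assumption about the desired behavior, not something the question specifies.

```python
def join_continued_lines(lines):
    """Join backslash-continued lines, strip '#' comments, skip empty results.

    Hypothetical variant: also emits whatever is left in the buffer
    when the input ends on a continuation line.
    """
    buffer = []
    for line in lines:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            # Continuation: keep the text without the backslash and move on.
            buffer.append(line[:-1])
            continue
        line = ''.join(buffer) + line
        buffer = []
        # Everything from the first '#' onward is a comment.
        line = line.split('#', 1)[0]
        if line:
            yield line
    if buffer:
        # Input ended on a continuation line: flush what was collected.
        line = ''.join(buffer).split('#', 1)[0]
        if line:
            yield line


result = list(join_continued_lines(["<x>", "<y \\", "z \\"]))
print(result)
```

Here the last two input lines both end with a backslash, so they are only emitted by the final flush.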

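The advantage of wrapping an iterable of lines and relying on duck typing is that you can pass in anything that behaves like a container of strings, not just a text file: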

# Raw string: in a normal string literal, backslash-newline would be removed by Python itself
text = r"""# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line"""

for line in my_generator(text.splitlines()):
    print(line)

The result is as expected:

<line0>
<line1>
<line2>
<line3.1 line3.2 line3.3>
<line4.1 line4.2>
<line5>
<line6>

Another way to write the loop is

print('\n'.join(my_generator(text.splitlines())))
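
To illustrate the duck-typing point, here is a small sketch that feeds the same generator from a list of strings and from an in-memory stream. The generator is repeated so the snippet runs on its own; the sample string and the variable names from_list and from_stream are made up for this example.

```python
import io


def my_generator(f):
    # Same logic as the generator in this answer, repeated for self-containment.
    buffer = []
    for line in f:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            buffer.append(line[:-1])
            continue
        line = ''.join(buffer) + line
        buffer = []
        if '#' in line:
            line = line[:line.index('#')]
        if line:
            yield line


# Illustrative input: one continued line, one full-line comment, one trailing comment.
sample = "<a \\\nb>\n# comment only\n<c># trailing comment\n"

# A list of strings without newline characters.
from_list = list(my_generator(sample.splitlines()))

# A file-like object that yields lines with their trailing newlines.
from_stream = list(my_generator(io.StringIO(sample)))

print(from_list)
print(from_stream)
```

Both calls go through the same code path because the generator only needs something it can iterate over line by line.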

User

#2 · Edited 2024-09-24 08:27:35

I suggest using the re.sub method

import re

def line_gen(text: str):

    text = re.sub(r"\s+\\\n", '', text)   # Remove each \ line break together with the whitespace before it
    text = re.sub(r"#(.*)\n", '\n', text) # Remove any comment
    # If the last line is a comment, it won't have a final \n.
    # We have to remove it as well.
    text = re.sub(r"#.*", '', text)

    # split() with no arguments splits on any whitespace, so blank lines
    # and leftover spaces are dropped.
    for line in text.split():
        yield line


with open("/tmp/data.txt") as f:
    text = f.read()

    for line in line_gen(text):
        print(line)

Contents of data.txt

# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line

Result:

<line0>
<line1>
<line2>
<line3.1line3.2line3.3>
<line4.1line4.2>
<line5>
<line6>
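
Note that this output has no space between the joined fragments (for example `<line3.1line3.2line3.3>`), because the first pattern also consumes the whitespace before each backslash. If the spacing should be kept, one possible adjustment is sketched below. The function name line_gen_keep_spaces and the sample string are hypothetical, and the sketch assumes that splitting on line boundaries rather than on all whitespace is acceptable.

```python
import re


def line_gen_keep_spaces(text: str):
    # Remove only the backslash and the newline, keeping the space before it.
    text = re.sub(r"\\\n", "", text)
    # Remove comments; '.' does not match a newline, so each match stops at end of line.
    text = re.sub(r"#.*", "", text)
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip lines that were comments only
            yield line


# Illustrative excerpt of the data, written with escaped backslashes.
sample = (
    "<line3.1 \\\nline3.2 \\\nline3.3>\n"
    "<line5># comment \\\n# more comment1 \\\nmore comment2>\n"
    "<line6>\n"
)
result = list(line_gen_keep_spaces(sample))
print(result)
```

Because the continuation is resolved before comments are stripped, a comment that is continued with a backslash still swallows the lines that follow it.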
