Problem with a regular expression in Python

Posted 2024-10-01 00:16:05


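I got the following output from the open-source tool "cloc". I want to use a Python regular expression to get every entry in the Language column: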

$ cloc .
       6 text files.
       6 unique files.                              
       3 files ignored.

github.com/AlDanial/cloc v 1.80  T=0.02 s (238.3 files/s, 34909.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           1             46            110            347
Markdown                         1              8              0             35
Dockerfile                       1              6              0             19
YAML                             1              0              0             15
-------------------------------------------------------------------------------
SUM:                             4             60            110            416
-------------------------------------------------------------------------------

I'm using the following code, but so far without any luck:

import logging
import re
import subprocess

logger = logging.getLogger(__name__)


# util.run_command is a project helper (not shown) that returns (returncode, stdout, stderr).
class Cloc:
    def cloc_scan(self, dir_path=None):
        if dir_path is not None:
            cmd = 'cloc {}'.format(dir_path)
            returncode, stdout, stderr = util.run_command(
                cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
            if returncode != 0:
                error = "returncode is {returncode}\nstdout:\n{stdout}".format(
                    returncode=returncode, stdout=stdout)
                raise Exception(error)
            else:
                logger.debug("Cloc scan successful.")
                if stdout:
                    matches = []
                    for line in stdout.splitlines():
                        regex = r"^([^ \t \n \- \gLS]+)"
                        match = re.findall(regex, line)
                        matches.append(match)
                        if line:
                            logger.debug(line)
                    # Drop lines that produced no match, then flatten the per-line match lists.
                    languages = [x for x in matches if x]
                    languages = [item for sublist in languages for item in sublist]
                    logger.info(languages)
                    logger.info(stdout)
                    return stdout
        else:
            logger.info("Unable to run scan without path to source code directory")
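For reference, a minimal sketch of a pattern that does pick out the Language column. One probable reason the pattern above gets nowhere: on current Python 3 releases, \g is not a valid escape inside a character class, so the re call raises re.error ("bad escape \g") before anything is matched. Rather than listing characters to exclude, the sketch anchors on the shape of a data row: a name starting in column 0 (which may contain spaces, as in cloc's "C/C++ Header") followed by exactly four integer columns. The function name cloc_languages and the "SUM:" filter are illustrative assumptions, checked only against the cloc 1.80 layout shown above.

```python
import re
from typing import List

# The plain-text report from the question.
CLOC_OUTPUT = """\
       6 text files.
       6 unique files.
       3 files ignored.

github.com/AlDanial/cloc v 1.80  T=0.02 s (238.3 files/s, 34909.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           1             46            110            347
Markdown                         1              8              0             35
Dockerfile                       1              6              0             19
YAML                             1              0              0             15
-------------------------------------------------------------------------------
SUM:                             4             60            110            416
-------------------------------------------------------------------------------
"""

# A data row: a name starting in column 0 (inner spaces allowed), then exactly
# four integer columns (files, blank, comment, code). [ \t] instead of \s keeps
# a match from running across line breaks.
ROW_RE = re.compile(
    r"^(?P<language>\S(?:.*?\S)?)[ \t]+\d+[ \t]+\d+[ \t]+\d+[ \t]+\d+[ \t]*$",
    re.MULTILINE,
)


def cloc_languages(report: str) -> List[str]:
    """Return the Language column of a plain-text cloc report, minus the SUM: row."""
    return [
        m.group("language")
        for m in ROW_RE.finditer(report)
        if m.group("language") != "SUM:"
    ]


print(cloc_languages(CLOC_OUTPUT))  # ['Python', 'Markdown', 'Dockerfile', 'YAML']
```

Because re.MULTILINE makes ^ and $ match at every line boundary, the whole stdout string can be scanned at once instead of looping over splitlines().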

2 answers

You can use re.split to split each line, with whitespace as the delimiter.

For example, if content holds what cloc wrote to STDOUT, you can do the following:

>>> import re
>>> for line in content.splitlines():
...     print(re.split(r'\s+', line))
... 

The result looks like this:

['']
['', '6', 'text', 'files.']
['', '6', 'unique', 'files.', '']
['', '3', 'files', 'ignored.']
['']
['github.com/AlDanial/cloc', 'v', '1.80', 'T=0.02', 's', '(238.3', 'files/s,', '34909.8', 'lines/s)']
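['-------------------------------------------------------------------------------']
['Language', 'files', 'blank', 'comment', 'code']
['-------------------------------------------------------------------------------']
['Python', '1', '46', '110', '347']
['Markdown', '1', '8', '0', '35']
['Dockerfile', '1', '6', '0', '19']
['YAML', '1', '0', '0', '15']
['-------------------------------------------------------------------------------']
['SUM:', '4', '60', '110', '416']
['-------------------------------------------------------------------------------']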
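The split rows above still need filtering: only rows that end in four numbers are language rows, and the SUM: row has to be dropped. A minimal sketch of that filtering, using str.rsplit instead of re.split so that a multi-word language name such as "C/C++ Header" stays in one piece; the function name languages_from_rows is hypothetical:

```python
from typing import List


def languages_from_rows(content: str) -> List[str]:
    """Pick the Language column out of a plain-text cloc report."""
    languages = []
    for line in content.splitlines():
        # Split off at most four fields from the right; the remainder is the name.
        parts = line.rsplit(None, 4)
        if (
            len(parts) == 5
            and all(field.isdigit() for field in parts[1:])  # files, blank, comment, code
            and parts[0] != "SUM:"                            # skip the totals row
        ):
            languages.append(parts[0])
    return languages
```

With content set to the cloc output from the question, this returns ['Python', 'Markdown', 'Dockerfile', 'YAML']: the progress lines, the github.com header, the dashed rules and the column header all fail the "four trailing integers" check.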

That said, you can make your life much easier by asking cloc for cleaner output in the first place:

cloc ./my_repo_here/ --csv --quiet | tail -n +3 | cut -d ',' -f 2

This produces the following:

Python
Markdown
Dockerfile
YAML
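If the post-processing is done in Python rather than with tail and cut, the csv module can read the same --csv output. A minimal sketch, assuming the files,language,blank,comment,code column layout that cloc 1.80 prints, and assuming the totals appear as a row whose language field is SUM (check both against your cloc version). Passing the command as a list to subprocess.run also sidesteps shell quoting of the directory path. Both function names are hypothetical.

```python
import csv
import io
import subprocess
from typing import List


def languages_from_csv(csv_text: str) -> List[str]:
    """Return the language column of `cloc --csv` output, skipping header and totals."""
    languages = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Data rows start with an integer file count: files,language,blank,comment,code
        if len(row) >= 5 and row[0].isdigit() and row[1] != "SUM":
            languages.append(row[1])
    return languages


def cloc_csv_languages(dir_path: str) -> List[str]:
    """Run cloc on dir_path (cloc must be on PATH) and return its languages."""
    result = subprocess.run(
        ["cloc", "--csv", "--quiet", dir_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return languages_from_csv(result.stdout)
```

Filtering on an integer first column, instead of skipping a fixed number of lines, keeps the parser working if cloc adds or drops a blank line before the header.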
