使用Git(Python3)可靠地检索SHA和行内容

2024-09-28 21:48:58 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在编写一个包(Python>;=3.5),它使用git blame来检索有关文件的信息。我正在努力用自定义代码替换GitPython依赖项,这些代码只支持我们实际需要的功能的一小部分(并以我们实际需要的形式提供数据)

我发现git blame -lts最接近我需要的,即检索文件中每一行的提交SHA和行内容。这给了我类似的输出

82a3e5021b7131e31fc5b110194a77ebee907955 books/main/docs/index.md  5) Softwareplattform [ILIAS](https://www.ilias.de/), die an zahlreichen

我已经处理过了

       line_pattern = re.compile('(.*?)\s.*\s*\d\)(\s*.*)')

        for line in cmd.stdout():
            m = line_pattern.match(line)
            if m:
                sha = m.group(1)
                content = m.group(2).strip()

这很有效。然而,该软件包的维护人员正确地警告说,“这可能会为非常特定的用户组引入难以调试的错误。可能需要跨多个OS和GIT版本进行大量的单元测试。”

我之所以采用这种方法,是因为我发现git blame --porcelain的输出解析起来有些乏味

30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1
author XY
author-mail <XY>
author-time 1580742131
author-tz +0100
committer XY
committer-mail <XY>
committer-time 1580742131
committer-tz +0100
summary Stub-Outline-Dateien
filename home/docs/README.md
        hero: abcdefghijklmnopqrstuvwxyz
82a3e5021b7131e31fc5b110194a77ebee907955 18 18

82a3e5021b7131e31fc5b110194a77ebee907955 19 19
        ---
82a3e5021b7131e31fc5b110194a77ebee907955 20 20

...

我不喜欢那种在字符串列表上进行迭代的方式

我的问题是:

1)我是否应该更好地使用--porcelain输出,因为它明确用于机器消费? 2) 我能期望这种格式在Git版本和操作系统上具有健壮性吗?我是否可以假设以制表符字符开头的行是内容行,这是源行输出的最后一行,并且制表符之后的任何内容都是原始行内容


Tags: 文件代码git版本docs内容linegroup
1条回答
网友
1楼 · 发布于 2024-09-28 21:48:58

不知道这是否是最好的解决方案,我在这里不等待答案就尝试了一下。我假设我的两个问题的答案是“是”

在这里的上下文中可以看到以下代码:https://github.com/uliska/mkdocs-git-authors-plugin/blob/6f5822c641452cea3edb82c2bbb9ed63bd254d2e/mkdocs_git_authors_plugin/repo.py#L466-L565

    def _process_git_blame(self):
        """
        Execute git blame and parse the results.

        This retrieves all data we need, also for the Commit object.
        Each line will be associated with a Commit object and counted
        to its author's "account".
        Whether empty lines are counted is determined by the
        count_empty_lines configuration option.

        git blame  porcelain will produce output like the following
        for each line in a file:

        When a commit is first seen in that file:
            30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 2 1
            author John Doe
            author-mail <j.doe@example.com>
            author-time 1580742131
            author-tz +0100
            committer John Doe
            committer-mail <j.doe@example.com>
            committer-time 1580742131
            summary Fancy commit message title
            filename home/docs/README.md
                    line content (indicated by TAB. May be empty after that)

        When a commit has already been seen *in that file*:
            82a3e5021b7131e31fc5b110194a77ebee907955 4 5
                    line content

        In this case the metadata is not repeated, but it is guaranteed that
        a Commit object with that SHA has already been created so we don't
        need that information anymore.

        When a line has not been committed yet:
            0000000000000000000000000000000000000000 1 1 1
            author Not Committed Yet
            author-mail <not.committed.yet>
            author-time 1583342617
            author-tz +0100
            committer Not Committed Yet
            committer-mail <not.committed.yet>
            committer-time 1583342617
            committer-tz +0100
            summary Version of books/main/docs/index.md from books/main/docs/index.md
            previous 1f0c3455841488fe0f010e5f56226026b5c5d0b3 books/main/docs/index.md
            filename books/main/docs/index.md
                    uncommitted line content

        In this case exactly one Commit object with the special SHA and fake
        author will be created and counted.

        Args:
             -
        Returns:
             - (this method works through side effects)
        """

        re_sha = re.compile('^\w{40}')

        cmd = GitCommand('blame', [' porcelain', str(self._path)])
        cmd.run()

        commit_data = {}
        for line in cmd.stdout():
            key = line.split(' ')[0]
            m = re_sha.match(key)
            if m:
                commit_data = {
                    'sha': key
                }
            elif key in [
                'author',
                'author-mail',
                'author-time',
                'author-tz',
                'summary'
            ]:
                commit_data[key] = line[len(key)+1:]
            elif line.startswith('\t'):
                # assign the line to a commit
                # and create the Commit object if necessary
                commit = self.repo().get_commit(
                    commit_data.get('sha'),
                    # The following values are guaranteed to be present
                    # when a commit is seen for the first time,
                    # so they can be used for creating a Commit object.
                    author_name=commit_data.get('author'),
                    author_email=commit_data.get('author-mail'),
                    author_time=commit_data.get('author-time'),
                    author_tz=commit_data.get('author-tz'),
                    summary=commit_data.get('summary')
                )
                if len(line) > 1 or self.repo().config('count_empty_lines'):
                    author = commit.author()
                    if author not in self._authors:
                        self._authors.append(author)
                    author.add_lines(self, commit)
                    self.add_total_lines()
                    self.repo().add_total_lines()

相关问题 更多 >