Python Git diff parser

Posted 2024-10-01 00:20:58


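I want to parse a git diff with Python code, and I am interested in getting the following information from the diff parser:

  1. The content of the deleted/added lines, along with their line numbers.
  2. The file name.
  3. The status of the file: whether it was deleted, renamed, or added.

For this I am using unidiff 0.5.2, and I wrote the following code: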

    from unidiff import PatchSet
    import git
    import os

    commit_sha1 = 'b4defafcb26ab86843bbe3464a4cf54cdc978696'
    repo_directory_address = '/my/git/repo'
    repository = git.Repo(repo_directory_address)
    commit = repository.commit(commit_sha1)
    diff_index = commit.diff(commit_sha1+'~1', create_patch=True)
    diff_text = reduce(lambda x, y: str(x)+os.linesep+str(y), diff_index).split(os.linesep)
    patch = PatchSet(diff_text)
    print patch[0].is_added_file

I am using GitPython to generate the Git diff. With the code above I get the following error:

(The error traceback was not preserved in this copy.)

I would appreciate any help fixing this error.


2 Answers

Use diff_index[i].diff as tdichp suggested, and prepend the source and target file header lines to the diff; otherwise unidiff will throw. Here is my working code sample:

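import os
from unidiff import PatchSet

# `commit` and `prev_commit` are GitPython Commit objects obtained elsewhere.
diff_index = commit.diff(prev_commit, create_patch=True)

# Keep only modified ('M') JavaScript files.
diffs = [d for d in diff_index.iter_change_type('M') if d.a_path.endswith(".js")]

for d in diffs:
    # The per-file patch text from GitPython has no "---"/"+++" header lines,
    # so add them back before handing the text to unidiff.
    a_path = "--- " + d.a_rawpath.decode('utf-8')
    b_path = "+++ " + d.b_rawpath.decode('utf-8')

    # Get detailed info
    patch = PatchSet(a_path + os.linesep + b_path + os.linesep + d.diff.decode('utf-8'))

    for h in patch[0]:  # each hunk of the single patched file
        for l in h:     # each line of the hunk
            print("  " + str(l.source_line_no) + " <-> " + str(l.target_line_no))
        print("")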
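To make the header step concrete without needing a repository, here is a minimal standard-library sketch. The helper name with_file_headers and the sample path and hunk are made up for illustration; the only thing taken from the answer is the idea of putting a "--- " line and a "+++ " line in front of the hunk text. It joins with "\n" rather than os.linesep, which is an assumption about what the parser expects.

```python
def with_file_headers(a_path, b_path, hunk_text):
    """Prefix hunk-only patch text with '---' and '+++' file header lines.

    Hypothetical helper: each argument may be str or UTF-8 encoded bytes,
    because raw paths and patch bodies are often delivered as bytes.
    """
    def to_text(value):
        return value.decode("utf-8") if isinstance(value, bytes) else value

    header = "--- " + to_text(a_path) + "\n" + "+++ " + to_text(b_path) + "\n"
    return header + to_text(hunk_text)


# Made-up hunk for a made-up file, only to show the resulting layout.
hunks = b"@@ -1,2 +1,2 @@\n-var a = 1;\n+var a = 2;\n var b = 3;\n"
text = with_file_headers(b"src/app.js", b"src/app.js", hunks)
print(text)
```

The returned string has the two file header lines first and the hunk header third, which is the layout a unified diff parser looks for.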

Update
I found that my previous answer no longer works. Here is the new solution:
For this solution you need the git (GitPython) and unidiff packages.

import git
from io import StringIO
from unidiff import PatchSet

commit_sha1 = 'commit_sha'
repo_directory_address = "your/repo/address"

repository = git.Repo(repo_directory_address)
commit = repository.commit(commit_sha1)

# Diff from the parent commit to the commit itself, ignoring blank-line
# changes and whitespace at the end of lines.
uni_diff_text = repository.git.diff(commit_sha1 + '~1', commit_sha1,
                                    ignore_blank_lines=True,
                                    ignore_space_at_eol=True)

# The diff text is already a str, so no encoding argument is passed.
patch_set = PatchSet(StringIO(uni_diff_text))

change_list = []  # list of changes
                  # [(file_name, [row_number_of_deleted_lines],
                  # [row_number_of_added_lines]), ... ]

for patched_file in patch_set:
    file_path = patched_file.path  # file name
    print('file name : ' + file_path)
    # Removed lines are numbered in the source (old) file.
    del_line_no = [line.source_line_no
                   for hunk in patched_file for line in hunk
                   if line.is_removed and
                   line.value.strip() != '']
    print('deleted lines : ' + str(del_line_no))
    # Added lines are numbered in the target (new) file.
    ad_line_no = [line.target_line_no
                  for hunk in patched_file for line in hunk
                  if line.is_added and
                  line.value.strip() != '']
    print('added lines : ' + str(ad_line_no))
    change_list.append((file_path, del_line_no, ad_line_no))
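It may help to see where the two kinds of line numbers come from. The following is a rough standard-library sketch, not unidiff's implementation: it walks a single-file unified diff and counts lines starting from the numbers in each hunk header. The function name changed_line_numbers and the sample diff are hypothetical.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_line_numbers(diff_text):
    """Return (removed, added) line numbers for a single-file unified diff.

    Removed lines are numbered in the old (source) file and added lines in
    the new (target) file. Assumes exactly one file in diff_text.
    """
    removed, added = [], []
    old_no = new_no = None
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            old_no, new_no = int(match.group(1)), int(match.group(2))
        elif old_no is None:
            continue  # file header lines before the first hunk
        elif line.startswith("+"):
            added.append(new_no)
            new_no += 1
        elif line.startswith("-"):
            removed.append(old_no)
            old_no += 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both files, so both counters advance.
            old_no += 1
            new_no += 1
    return removed, added


# Made-up diff: one line removed and two lines added after line 3.
sample = "\n".join([
    "--- a/example.py",
    "+++ b/example.py",
    "@@ -3,3 +3,4 @@",
    " import os",
    "-import sys",
    "+import json",
    "+import re",
    " print('done')",
])
print(changed_line_numbers(sample))
```

A removed line only advances the old-file counter and an added line only advances the new-file counter, which is why deleted lines are reported with source numbers and added lines with target numbers.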


Old solution (may no longer work)

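Finally, I found the solution. The output of gitpython is slightly different from standard git diff output. In standard git diff output the source file line starts with ---, but in gitpython's output it starts with ------, as you can see in the output of running the following Python code (this example was generated with the elasticsearch repository):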

(The code listing was not preserved in this copy.)

Part of the output is as follows:

core/src/main/java/org/elasticsearch/action/index/IndexRequest.java
=======================================================
lhs: 100644 | f8b0ce6c13fd819a02b1df612adc929674749220
rhs: 100644 | b792241b56ce548e7dd12ac46068b0bcf4649195
------ a/core/src/main/java/org/elasticsearch/action/index/IndexRequest.java
+++ b/core/src/main/java/org/elasticsearch/action/index/IndexRequest.java
@@ -20,16 +20,18 @@
package org.elasticsearch.action.index;

 import org.elasticsearch.ElasticsearchGenerationException;
+import org.elasticsearch.Version;
 import org.elasticsearch.action.ActionRequestValidationException;
 import org.elasticsearch.action.DocumentRequest;
 import org.elasticsearch.action.RoutingMissingException;
 import org.elasticsearch.action.TimestampParsingException;
 import org.elasticsearch.action.support.replication.ReplicationRequest;
 import org.elasticsearch.client.Requests;
+import org.elasticsearch.cluster.metadata.IndexMetaData;
 import org.elasticsearch.cluster.metadata.MappingMetaData;
 import org.elasticsearch.cluster.metadata.MetaData;
 import org.elasticsearch.common.Nullable;
-import org.elasticsearch.common.UUIDs;
+import org.elasticsearch.common.Strings;
 import org.elasticsearch.common.bytes.BytesArray;
 import org.elasticsearch.common.bytes.BytesReference;

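As you can see, the fourth line of the output (the source file line) starts with ------. To fix this, you need to edit the regular expression in the unidiff 0.5.2 source, located in /unidiff/constants.py, from: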

RE_SOURCE_FILENAME = re.compile(
                      r'^--- (?P<filename>[^\t\n]+)(?:\t(?P<timestamp>[^\n]+))?')

to:

RE_SOURCE_FILENAME = re.compile(
                      r'^------ (?P<filename>[^\t\n]+)(?:\t(?P<timestamp>[^\n]+))?')

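PS: If the source file is renamed, gitpython generates a diff that starts with ---. It will not throw an error, though, because I filtered renamed files out of the git diff (diff_filter='cr').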
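Editing an installed library's source is fragile, so one possible alternative is to normalize the diff text before parsing it. The sketch below assumes that a nonstandard source header differs from the standard one only by having more than three leading dashes; the function name normalize_source_headers and the sample text are hypothetical.

```python
import re

# Assumption: a nonstandard source header is a run of four or more dashes
# followed by a space. A standard header ("--- ") does not match.
LONG_SOURCE_HEADER = re.compile(r"^-{4,} ", re.MULTILINE)


def normalize_source_headers(diff_text):
    """Rewrite over-long source header prefixes to the standard '--- '.

    Caveat: a removed line whose own content starts with '--- ' would also
    match, so this is only safe when such lines cannot occur.
    """
    return LONG_SOURCE_HEADER.sub("--- ", diff_text)


# Made-up diff text with an over-long source header.
raw = (
    "------ a/IndexRequest.java\n"
    "+++ b/IndexRequest.java\n"
    "@@ -1 +1 @@\n"
    "-old line\n"
    "+new line\n"
)
fixed = normalize_source_headers(raw)
print(fixed)
```

Ordinary removed lines such as "-old line" are left untouched because they do not begin with four dashes followed by a space.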
