Making my awk shell script more efficient (parsing Python)

find ~/svn/ -name *.py | xargs grep -hn "^import\|^from" | awk -F ":" '{print $2}' | awk '{if (/from/) print $2; else {$1 = ""; print $0} }' | sed 's/,\| /\n/g' | sort | uniq > /tmp/pythonpkgs.txt

3 Answers

User

#1 · edited 2024-06-28 16:15:25

A clever setup to start with, but there are a few places that could be cleaned up:

1: find ~/svn/ -name *.py 
2: | xargs grep -hn "^import\|^from"
3: | awk -F ":" '{print $2}' 
4: | awk '{if (/from/) print $2; else {$1 = ""; print $0} }' 
5: | sed 's/,\| /\n/g' 
6: | sort 
7: | uniq > /tmp/pythonpkgs.txt

Line 3: you don't need the first awk to strip and print. Just leave -n off the grep call so that line numbers are never added to the output.

time find ./<<my_large_project>> -name '*.py' \
| xargs grep -h "^import\|^from" \
| awk '{if (/from/) print $2; else {$1 = ""; print $0} }' \
| sed 's/,\| /\n/g' \
| sort \
| uniq
~~snip~~
real    0m0.492s
user    0m0.208s
sys     0m0.116s

Lines 6-7 and 4-5: if there are many duplicate lines, you can speed things up by running sort and uniq before awk and sed.

time find ./<<my_large_project>> -name '*.py' \
| xargs grep -h "^import\|^from" \
| sort \
| uniq \
| awk '{if (/from/) print $2; else {$1 = ""; print $0} }' \
| sed 's/,\| /\n/g'
~~snip~~
real    0m0.464s
user    0m0.224s
sys     0m0.140s
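
One caveat with deduplicating early: at that stage uniq compares whole import lines, not module names, so the split output can still contain repeats. Here is a small Python sketch of the two orderings. The sample lines and the helper split_names are made up for illustration, and the helper only mimics the plain `import a, b` case.

```python
# Hypothetical grep output: three raw import lines, one of them repeated.
raw_lines = ["import os", "import os, sys", "import os"]

def split_names(line):
    """Mimic the awk/sed stage for plain `import a, b` lines."""
    return [name.strip() for name in line[len("import "):].split(",")]

# Order suggested above: sort | uniq first, then split.
early = [n for line in sorted(set(raw_lines)) for n in split_names(line)]

# Original order: split first, then sort | uniq.
late = sorted({n for line in raw_lines for n in split_names(line)})

print(early)  # ['os', 'os', 'sys']
print(late)   # ['os', 'sys']
```

If the goal is a unique package list, a final sort | uniq after the sed stage is still needed.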

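Note that this will miss multi-line imports as described in PEP 0328. Supporting them would make the regex search relatively non-trivial, because you would have to look for optional parentheses and keep track of the preceding whitespace.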
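
To illustrate the limitation, here is a rough Python sketch of what a line-anchored pattern like the grep above sees when it meets a parenthesized import. The sample source is invented, and the Python regex is only a stand-in for the grep expression.

```python
import re

# Invented sample: a PEP 328 style parenthesized import plus a plain one.
source = """\
from os.path import (
    join,
    dirname,
)
import sys
"""

# Stand-in for grep "^import\|^from": keep lines that start with either keyword.
pattern = re.compile(r"^(?:import|from)")
matched = [line for line in source.splitlines() if pattern.match(line)]

print(matched)  # ['from os.path import (', 'import sys']
```

The module name still appears on the first matched line, but the continuation lines (join, dirname, and the closing parenthesis) never reach awk.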

User

#2 · edited 2024-06-28 16:15:25

Grepping source code for specific constructs is very fragile and can fail in many situations. Consider, for example:

import foo ; print 123

or

import foo, \
   bar

or

 str = '''
 import foo
 '''

and so on.

If you are interested in a more robust approach, this is how imported names can be parsed reliably with Python's own compiler:

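import ast

def list_imports(source):
    """Yield the module name of every import statement in `source`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import a, b.c  ->  "a", "b.c"
            for name in node.names:
                yield name.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            # from a.b import c  ->  "a.b"; node.module is None for "from . import c"
            yield node.module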

Usage:

for name in sorted(set(list_imports(some_source))):
    print(name)
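
As a quick check, the sketch below runs an AST-based collector over the three tricky snippets from this answer. It assumes Python 3, so the first snippet uses print(123) instead of the Python 2 print statement, and the helper name imported_modules is my own rather than part of any library.

```python
import ast

def imported_modules(source):
    """Yield module names from the import statements in `source` (Python 3)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            yield node.module

# The three cases that trip up a line-based grep.
samples = {
    "semicolon": "import foo ; print(123)\n",
    "backslash": "import foo, \\\n   bar\n",
    "string":    "s = '''\nimport foo\n'''\n",
}

results = {label: sorted(set(imported_modules(src))) for label, src in samples.items()}
for label, names in results.items():
    print(label, names)
# semicolon ['foo']
# backslash ['bar', 'foo']
# string []
```

The import inside the triple-quoted string is correctly ignored because the parser sees it as string data, not as a statement.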

User

#3 · edited 2024-06-28 16:15:25

Here is my consolidated awk script:

/^[ \t]*import .* as/  {
    sub("^[ \t]+","");          # remove leading whitespace
    sub("[ \t]*#.*","");        # remove comments
    print $2;
    next;
}
/^[ \t]*from (.*) import/ {
    sub("^[ \t]+","");          # remove leading whitespace
    sub("[ \t]*#.*","");        # remove comments
    print $2;
    next;
}
/^[ \t]*import (.*)/  {
    sub("^[ \t]+","");          # remove leading whitespace
    sub("[ \t]*#.*","");        # remove comments
    n = split(substr($0,7),a,","); # split on commas; n is the number of pieces
    for (i=1;i<=n;i++) {           # length(array) is not available in every awk
        gsub("[ \t]+","",a[i]); # trim whitespace
        print a[i];
    }
    next;
}

Invoke it like this:

find . -name '*.py' -exec awk -f scan.awk {} \; | sort | uniq

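As noted above, it does not handle some possible cases, such as statements joined with ";", lines split with "\", or names grouped with "()", but it will cover most Python code.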
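
For comparison, the same directory scan can be sketched in Python with the ast module, which also handles the ";", "\" and "()" cases. This is only a sketch under a few assumptions: the function name scan_tree is hypothetical, files are read as UTF-8, and any file the running interpreter cannot parse (Python 2 sources, for instance) is silently skipped.

```python
import ast
from pathlib import Path

def scan_tree(root):
    """Return the sorted set of module names imported by *.py files under root."""
    found = set()
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8", errors="replace"))
        except (SyntaxError, ValueError):
            # Assumption: unparseable files (e.g. Python 2 code) are skipped.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                found.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                found.add(node.module)
    return sorted(found)
```

Calling scan_tree(".") and printing each element gives roughly the output of the find | sort | uniq pipeline above, at the cost of skipping files the interpreter cannot parse.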
