Bash: Store the replaced substrings

User

Answer 1 · edited 2024-10-06 12:11:41

AWK can be used for this.

See https://www.gnu.org/software/gawk/manual/html_node/Redirection.html, which includes the following conceptual example:

$ awk '{ print $2 > "phone-list"
>        print $1 > "name-list" }' mail-list
$ cat phone-list
-| 555-5553
-| 555-3412
…
$ cat name-list
-| Amelia
-| Anthony
…

In mail-list, the first column holds names and the second holds phone numbers.

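For regular expressions, see the match(string,regex) function (http://www.grymoire.com/Unix/Awk.html#uh-47), keeping in mind that $0 denotes the whole line that was read. match() returns the start position of the match (0 if there is none) and sets RSTART and RLENGTH, which you can pass to the substr(string,position,length) function (http://www.grymoire.com/Unix/Awk.html#uh-43) to get back the matched text (with string=$0 if you search line by line).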
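A minimal sketch of match() and substr() working together, assuming a POSIX-compatible awk. The helper name first_match and the sample lines are made up for illustration. Note that `-v` assignments undergo backslash-escape processing, so a regex containing backslashes would need them doubled.

```shell
# first_match REGEX (hypothetical helper): for each input line, print the
# first substring matching REGEX; lines without a match print nothing.
first_match() {
  awk -v re="$1" '{
    if (match($0, re))                   # returns RSTART (0 = no match) and sets RLENGTH
      print substr($0, RSTART, RLENGTH)  # cut out exactly the matched text
  }'
}

printf '%s\n' 'apple pears banana' 'kiwi ananas cocoa' | first_match '[ab][a-z]+'
```

With this sample input it prints apple and then ananas: awk takes the leftmost match on each line and extends it as far as the regex allows.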

A good introduction to AWK is http://www.grymoire.com/Unix/Awk.html … it may look long, but it is worth the investment.

Update

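If you are actually dealing with several lines of whitespace-separated fields, and you don't particularly care whether the items are printed back in the same column layout, the following will work: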

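echo -e " apple pears banana \n kiwi ananas cocoa\n pork" | 
awk '{
  #printf "\n"                     # see below about keeping the column layout
  for(j=1;j<=NF;j++){
    i=match($j,/[ab][a-z]+/)       # does field j contain the pattern anywhere?
    if(i>0){
      print $j > "removed.txt"     # matching fields go to removed.txt
    }else{
      printf "%s ", $j             # other fields go to stdout; never use data as a format
    }
  }
}'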

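If you do want to keep the column layout, you can use the commented-out printf above; with a small adjustment it gives the right result (the non-matching fields must be written with printf rather than print so that they stay on one line). However, because AWK works on fields, the approach above causes problems if a single field you want to capture contains several instances of the pattern (i.e. with no separator between them): the whole field is written to removed.txt as one item.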
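One way to keep the line layout, sketched under the assumption that each input line should produce exactly one output line and that the kept fields may be re-joined with single spaces. The helper name keep_lines and the sample input are hypothetical; like the loop above, it captures whole fields, so it shares the same limitation with repeated matches inside one field.

```shell
# keep_lines REGEX FILE (hypothetical helper): fields containing REGEX go to
# FILE, one per line; the remaining fields of each input line are printed
# back on a single line, joined by single spaces.
keep_lines() {
  awk -v re="$1" -v file="$2" '{
    out = ""
    for (j = 1; j <= NF; j++) {
      if ($j ~ re)
        print $j > file                     # captured field
      else
        out = (out == "" ? $j : out " " $j) # kept field
    }
    print out                               # one output line per input line
  }'
}

printf '%s\n' 'kiwi apple cocoa' 'pork banana' | keep_lines '[ab][a-z]+' removed.txt
```

Here stdout gets "kiwi cocoa" and "pork", and removed.txt gets apple and banana. Building the line in a variable and printing it once avoids the stray trailing space and missing newline of the printf-per-field version.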

Update 2

Below is a better solution that finds every (non-overlapping) match, independent of fields:

echo -e " apple pears banana \n kiwi ananas cocoa" |
awk '
BEGIN {
  regex="a.{2,3}";   # {2,3} needs interval-expression support (e.g. gawk 4.0 or later)
}
{
  ibeg=1;                                   # start of the part not yet searched
  imat=match(substr($0,ibeg),regex);
  after=$0;
  while (imat) {
    before = substr($0,ibeg,RSTART-1);      # text between the previous match and this one
    pattern = substr($0,ibeg+RSTART-1,RLENGTH);
    after = substr($0,ibeg+RSTART+RLENGTH-1);
    printf "%s", before;                    # never pass data as the printf format
    print pattern >"removed.txt";
    ibeg=ibeg+RSTART+RLENGTH-1;
    imat=match(substr($0,ibeg),regex);
  }
  print after;
}
'

Output:

e peba
kiwi ocoa

Removed:

$ cat removed.txt
appl
ars
anan
anan
as c

User

Answer 2 · edited 2024-10-06 12:11:41

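This can be done with sed, but since the regex and the file name are not fixed and sed does not handle shell variables well, awk is the better tool. The awk code we want to run could look like this: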
{
  head = ""
  tail = $0
  while (match(tail, re)) {                     # while there's a match in the
                                                # part of the line we haven't
                                                # yet inspected
    print substr(tail, RSTART, RLENGTH) > file  # print the match to the file
    head = head substr(tail, 1, RSTART - 1)     # split off the parts before
    tail = substr(tail, RSTART + RLENGTH)       # and after the match
  }
  print head tail                               # print what's left in the end
}
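with suitable values for the parameters re and file. Thanks to @EdMorton, who pointed out a problem with the original code and suggested this fix.
To make this callable, let's wrap it in a little shell boilerplate: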
#!/bin/sh
if [ $# -ne 2 ]; then
  echo "Usage: $0 regex filename" >&2   # usage errors go to stderr
  exit 1
fi
awk -v re="$1" -v file="$2" '
{
  head = ""
  tail = $0
  while (match(tail, re)) {
    print substr(tail, RSTART, RLENGTH) > file
    head = head substr(tail, 1, RSTART - 1)
    tail = substr(tail, RSTART + RLENGTH)
  }
  print head tail
}'
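Put it in a file named magic_script, make it executable with chmod +x, and that's it; it reads the text from standard input. Of course, you can also call awk directly: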
awk -v re=' [ab][a-z]+' -v file=removed.txt '{ head = ""; tail = $0; while(match(tail, re)) { print substr(tail, RSTART, RLENGTH) > file; head = head substr(tail, 1, RSTART - 1); tail = substr(tail, RSTART + RLENGTH); } print head tail }'
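One caveat with the while(match(...)) loop: if re can match the empty string (for example b*), match() succeeds with RLENGTH set to 0, tail never shrinks, and the loop never ends. Below is a sketch of one possible guard, assuming you want such zero-length matches skipped rather than written to the file; the helper name extract_matches is hypothetical.

```shell
# extract_matches RE FILE (hypothetical helper): the same head/tail loop as
# above, plus a guard so that a zero-length match cannot loop forever.
extract_matches() {
  awk -v re="$1" -v file="$2" '{
    head = ""; tail = $0
    while (tail != "" && match(tail, re)) {
      if (RLENGTH == 0) {                    # empty match: step past one character
        head = head substr(tail, 1, RSTART)  # keep everything up to that point
        tail = substr(tail, RSTART + 1)
        continue
      }
      print substr(tail, RSTART, RLENGTH) > file
      head = head substr(tail, 1, RSTART - 1)
      tail = substr(tail, RSTART + RLENGTH)
    }
    print head tail
  }'
}

printf 'abb c\n' | extract_matches 'b*' removed.txt
```

With b* the first match is empty at position 1, so "a" is kept and the search resumes at "bb c"; the result is "a c" on stdout and "bb" in removed.txt. For regexes that can never match the empty string, the output is the same as the original loop's.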

User

Answer 3 · edited 2024-10-06 12:11:41

Here is a solution that keeps each line in place and removes only the matching words, instead of dropping whole lines:

$ echo -e "apple pears banana \n kiwi ananas cocoa" \
| awk '{ for (i=1;i<=NF;++i) { if ($i ~ /^[ab][a-z]+/) { print $i > "removed.txt"; $i=""}} print }'
 pears 
kiwi  cocoa

$ cat removed.txt 
apple
banana
ananas
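Assigning $i = "" leaves an empty field behind, which is why the output above has doubled, leading and trailing spaces. If you would rather collapse them, one option is to re-split the record and then rebuild it. This is a sketch; the helper name drop_fields is hypothetical.

```shell
# drop_fields RE FILE (hypothetical helper): fields matching RE go to FILE;
# the remaining fields are printed with a single space between them.
drop_fields() {
  awk -v re="$1" -v file="$2" '{
    for (i = 1; i <= NF; ++i)
      if ($i ~ re) { print $i > file; $i = "" }
    $0 = $0    # re-split the rebuilt record: the emptied fields disappear
    $1 = $1    # force $0 to be rebuilt from the fields, joined by OFS
    print
  }'
}

printf '%s\n' 'apple pears banana ' ' kiwi ananas cocoa' | drop_fields '^[ab][a-z]+' removed.txt
```

This prints "pears" and "kiwi cocoa", and removed.txt receives apple, banana and ananas, the same words as the one-liner above.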
