java.util.Scanner: explaining this line of Java code


On HackerRank this line of code shows up all the time. I assume it skips whitespace, but what exactly does it mean?

 scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");


6 answers

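# Answer 1

A similar question has already been asked here. The call does not skip spaces, because the Unicode space character (\u0020) is not part of the pattern.

\r = CR (carriage return) // used as the newline character in Mac OS before OS X

\n = LF (line feed) // used as the newline character in Unix/Mac OS X

\r\n = CR + LF // used as the newline sequence in Windows

\u2028 = line separator

\u2029 = paragraph separator

\u0085 = next line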
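
To see the effect in the typical HackerRank setting, here is a minimal sketch. The class name `SkipDemo`, the helper `lineAfterInt`, and the sample inputs are made up for illustration; only `Scanner.nextInt`, `Scanner.skip` and `Scanner.nextLine` come from the JDK.

```java
import java.util.Scanner;

class SkipDemo {
    // The pattern from the question: at most one line terminator.
    static final String EOL = "(\r\n|[\n\r\u2028\u2029\u0085])?";

    // Reads an int, optionally skips the terminator, then reads the next line.
    static String lineAfterInt(String input, boolean useSkip) {
        Scanner scanner = new Scanner(input);
        scanner.nextInt();          // stops right after the digits
        if (useSkip) {
            scanner.skip(EOL);      // consumes a line terminator if one comes next
        }
        return scanner.nextLine();
    }

    public static void main(String[] args) {
        // Without skip, nextLine() returns the empty rest of the first line.
        System.out.println("[" + lineAfterInt("42\nhello world", false) + "]"); // []
        // With skip, nextLine() returns the second line.
        System.out.println("[" + lineAfterInt("42\nhello world", true) + "]");  // [hello world]
        // A space is not a line terminator, so skip leaves it in place.
        System.out.println("[" + lineAfterInt("42 \nhello world", true) + "]"); // [ ]
    }
}
```

The last case shows the point made above: the trailing space after `42` is not consumed, so `nextLine()` still returns the rest of the first line.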
# Answer 2

Skipping \r\n covers Windows.

The rest are the standard \r = CR and \n = LF (see "\r\n , \r , \n what is the difference between them?").

Then come a few special Unicode characters:

\u2028 = LINE SEPARATOR (https://www.fileformat.info/info/unicode/char/2028/index.htm)

\u2029 = PARAGRAPH SEPARATOR (http://www.fileformat.info/info/unicode/char/2029/index.htm)

\u0085 = NEXT LINE (https://www.fileformat.info/info/unicode/char/0085/index.htm)
# Answer 3
```
 scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
```
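1. On Unix and all Unix-like systems, \n is the end-of-line code and \r has no special meaning.
2. As a consequence, in C and in most languages that copied it in some way (even remotely), \n is the standard escape sequence for end of line (translated to or from the OS-specific sequence as needed).
3. On old Mac systems (before OS X), \r was the end-of-line code instead.
4. On Windows (and many older operating systems), the end-of-line code is two characters, \r\n, in that order. As a (surprising ;-) consequence, going back to OSs much older than Windows, \r\n is the standard line termination for text formats on the Internet.

\u0085 NEXT LINE (NEL)

\u2029 PARAGRAPH SEPARATOR

\u2028 LINE SEPARATOR

The whole point is to discard a leftover line terminator when input is read through a Scanner. Note that the pattern matches at most one line terminator and never a space.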
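
Why \r\n is listed before the single-character class can be illustrated with Windows-style input. In this sketch, `CrLfOrderDemo`, `lineAfterInt`, and the input string are illustrative assumptions, and `SINGLE_ONLY` is a deliberately weakened variant of the pattern, not something taken from the JDK or from HackerRank.

```java
import java.util.Scanner;

class CrLfOrderDemo {
    static final String FULL = "(\r\n|[\n\r\u2028\u2029\u0085])?";
    // Hypothetical weakened pattern without the \r\n alternative.
    static final String SINGLE_ONLY = "[\n\r\u2028\u2029\u0085]?";

    // Reads an int, applies skip with the given pattern, returns the next line.
    static String lineAfterInt(String input, String skipPattern) {
        Scanner scanner = new Scanner(input);
        scanner.nextInt();
        scanner.skip(skipPattern);
        return scanner.nextLine();
    }

    public static void main(String[] args) {
        String windowsInput = "5\r\nfoo";
        // FULL consumes \r\n as one unit, so the next line is "foo".
        System.out.println("[" + lineAfterInt(windowsInput, FULL) + "]");        // [foo]
        // SINGLE_ONLY consumes only \r; the leftover \n then ends an empty line.
        System.out.println("[" + lineAfterInt(windowsInput, SINGLE_ONLY) + "]"); // []
    }
}
```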
# Answer 4
The OpenJDK source shows that nextLine() uses this regular expression for line separators:
```
private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
```
- \r\n is a Windows line ending
- \n is a UNIX line ending
- \r is a Macintosh (pre-OS X) line ending
- \u2028 is LINE SEPARATOR
- \u2029 is PARAGRAPH SEPARATOR
- \u0085 is NEXT LINE (NEL)
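
A quick way to check that `nextLine()` treats every one of these as a line break is to feed a Scanner a string containing each terminator once. This is only a sketch; `LineSeparatorDemo`, `readLines`, and the sample string are illustrative names, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineSeparatorDemo {
    // Collects every line that Scanner.nextLine() reports for the input.
    static List<String> readLines(String input) {
        List<String> lines = new ArrayList<>();
        Scanner scanner = new Scanner(input);
        while (scanner.hasNextLine()) {
            lines.add(scanner.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        // One letter per line, separated by each terminator in turn.
        String input = "a\r\nb\nc\rd\u2028e\u2029f\u0085g";
        System.out.println(readLines(input)); // [a, b, c, d, e, f, g]
    }
}
```

Since `skip` in the question uses the same set of terminators, it removes exactly the kind of separator that `nextLine()` would otherwise stop at.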
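# Answer 5

The whole thing is a regular expression, so you can simply paste it into https://regexr.com or https://regex101.com/ and get a full description of what each part of the regex means.

Here is the result for this one: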

(\r\n|[\n\r\u2028\u2029\u0085])? / gm

1st Capturing Group (\r\n|[\n\r\u2028\u2029\u0085])?

? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)

1st Alternative \r\n

\r matches a carriage return (ASCII 13)

\n matches a line-feed (newline) character (ASCII 10)

2nd Alternative [\n\r\u2028\u2029\u0085]

Match a single character present in the list below

[\n\r\u2028\u2029\u0085]

\n matches a line-feed (newline) character (ASCII 10)

\r matches a carriage return (ASCII 13)

\u2028 matches the character with index 2028 in base 16 (8232 in base 10 or 20050 in base 8) literally (case sensitive)

\u2029 matches the character with index 2029 in base 16 (8233 in base 10 or 20051 in base 8) literally (case sensitive)

\u0085 matches the character with index 85 in base 16 (133 in base 10 or 205 in base 8) literally (case sensitive)

Global pattern flags

g modifier: global. All matches (don't return after first match)

m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

As for what scanner.skip does (Scanner Pattern Tutorial):

The java.util.Scanner.skip(Pattern pattern) method skips input that matches the specified pattern, ignoring delimiters. This method will skip input if an anchored match of the specified pattern succeeds. If a match to the specified pattern is not found at the current position, then no input is skipped and a NoSuchElementException is thrown.
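
One consequence of the trailing `?` is worth spelling out: because the group is optional, the pattern can match the empty string, so `skip` with this pattern should not throw when no line terminator comes next. The sketch below contrasts it with a variant without the `?`. The names `OptionalSkipDemo`, `REQUIRED`, and `skipSucceeds` are assumptions made for this example.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

class OptionalSkipDemo {
    static final String OPTIONAL = "(\r\n|[\n\r\u2028\u2029\u0085])?";
    // Hypothetical variant without the trailing '?'.
    static final String REQUIRED = "(\r\n|[\n\r\u2028\u2029\u0085])";

    // Returns true if skip(pattern) succeeds on the input, false if it throws.
    static boolean skipSucceeds(String input, String pattern) {
        Scanner scanner = new Scanner(input);
        try {
            scanner.skip(pattern);
            return true;
        } catch (NoSuchElementException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(skipSucceeds("abc", OPTIONAL));   // true: empty match, nothing skipped
        System.out.println(skipSucceeds("abc", REQUIRED));   // false: no terminator at the current position
        System.out.println(skipSucceeds("\nabc", REQUIRED)); // true: the \n is skipped
    }
}
```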

I also recommend reading Alan Moore's answer to "RegEx in Java: how to deal with newline", where he discusses the new options in Java 1.8.
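# Answer 6
Scanner.skip skips input that matches the pattern; here the pattern is:

(\r\n|[\n\r\u2028\u2029\u0085])?
- ? matches zero or one occurrence of the preceding element (here, the whole group)
- | alternation
- [] matches a single character from the enclosed set
- \r matches a carriage return
- \n matches a line feed
- \u2028 matches the character with index 2028 in base 16 (8232 in base 10 or 20050 in base 8), case sensitive
- \u2029 matches the character with index 2029 in base 16 (8233 in base 10 or 20051 in base 8), case sensitive
- \u0085 matches the character with index 85 in base 16 (133 in base 10 or 205 in base 8), case sensitive

1st Alternative \r\n
- \r matches a carriage return (ASCII 13)
- \n matches a line-feed (newline) character (ASCII 10)

2nd Alternative [\n\r\u2028\u2029\u0085]
- matches a single character from the list [\n\r\u2028\u2029\u0085]
- \n matches a line-feed (newline) character (ASCII 10)
- \r matches a carriage return (ASCII 13)
- \u2028 matches the LINE SEPARATOR character with index 2028 in base 16 (8232 in base 10 or 20050 in base 8) literally (case sensitive)
- \u2029 matches the PARAGRAPH SEPARATOR character with index 2029 in base 16 (8233 in base 10 or 20051 in base 8) literally (case sensitive)
- \u0085 matches the NEXT LINE character with index 85 in base 16 (133 in base 10 or 205 in base 8) literally (case sensitive)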
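
The hexadecimal, decimal and octal indices quoted above can be checked with a few lines of Java. This is just a sketch; `CodePointDemo` and `describe` are illustrative names.

```java
class CodePointDemo {
    // Formats a character's code as "hex / decimal / octal".
    static String describe(char c) {
        return Integer.toHexString(c) + " / " + (int) c + " / " + Integer.toOctalString(c);
    }

    public static void main(String[] args) {
        System.out.println(describe('\u2028')); // 2028 / 8232 / 20050
        System.out.println(describe('\u2029')); // 2029 / 8233 / 20051
        System.out.println(describe('\u0085')); // 85 / 133 / 205
    }
}
```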
