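<p>I figured I'd try something new today, so I decided to "learn" D. Note that this is the first D I've written, so I might be completely off.</p>
<p>The first thing I tried was manual buffering:</p>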
<pre><code>foreach (chunk; infile.byChunk(100000)) {
    linect += splitLines(cast(string) chunk).length;
}
</code></pre>
<p>Note that this is incorrect because it miscounts lines that straddle a chunk boundary, but that gets fixed later.</p>
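<p>For comparison, here is a minimal sketch of a boundary-safe count. It is not what I ended up using; it assumes that counting <code>'\n'</code> bytes is good enough, so it ignores Unicode line and paragraph separators and does not count a final line that has no terminator. The helper name <code>countNewlines</code> is made up for this illustration.</p>
<pre><code>import std.algorithm.searching : count;
import std.stdio;

// Hypothetical helper (not part of the original code): counts '\n' bytes
// in one chunk. Since it counts terminators instead of splitting the chunk
// into lines, a line that straddles two chunks is still counted only once.
size_t countNewlines(const(ubyte)[] chunk) {
    return chunk.count(cast(ubyte) '\n');
}

void main(string[] args) {
    if (args.length < 2) {
        return;
    }
    auto infile = File(args[1]);
    size_t linect;
    foreach (chunk; infile.byChunk(100_000)) {
        linect += countNewlines(chunk);
    }
    writeln(linect);
}
</code></pre>
<p>This sidesteps the boundary problem entirely, at the cost of treating only <code>\n</code> (and therefore <code>\r\n</code>) as a line ending.</p>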
<p>This helped a bit, but not nearly enough. It did allow me to test</p>
^{pr2}$
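<p>which showed that all of the time is spent in <code>splitLines</code>.</p>
<p>I made a local copy of <a href="https://github.com/D-Programming-Language/phobos/blob/master/std/string.d#L1976" rel="nofollow">^{<cd1>}</a>. This alone doubled the speed! I wasn't expecting that. I compiled and ran with both</p>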
<pre><code>dmd -release -inline -O -m64 -boundscheck=on
dmd -release -inline -O -m64 -boundscheck=off
</code></pre>
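<p>Either way, the result is about the same.</p>
<p>Then I rewrote <code>splitLines</code> to be specialized for <code>s[i].sizeof == 1</code>. It now seems to be slower than Python only because it also breaks on paragraph separators.</p>
<p>To finish it off, I made a range and optimized it further, which brings the code close to Python's speed. Considering that Python doesn't break on paragraph separators and that the code underlying it is written in C, this seems fine. This code may have <code>O(n²)</code> performance on lines longer than 8k, but I'm not sure.</p>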
<pre class="lang-none prettyprint-override"><code>import std.range;
import std.stdio;

auto lines(File file, KeepTerminator keepTerm = KeepTerminator.no) {
    struct Result {
        public File.ByChunk chunks;
        public KeepTerminator keepTerm;
        private string nextLine;
        private ubyte[] cache;

        this(File file, KeepTerminator keepTerm) {
            chunks = file.byChunk(8192);
            this.keepTerm = keepTerm;

            if (chunks.empty) {
                nextLine = null;
            }
            else {
                // Initialize cache and run an
                // iteration to set nextLine
                popFront;
            }
        }

        @property bool empty() {
            return nextLine is null;
        }

        @property auto ref front() {
            return nextLine;
        }

        void popFront() {
            size_t i;
            while (true) {
                // Iterate until we run out of cache
                // or we meet a potential end-of-line
                while (
                    i < cache.length &&
                    cache[i] != '\n' &&
                    cache[i] != 0xA8 &&
                    cache[i] != 0xA9
                ) {
                    ++i;
                }

                if (i == cache.length) {
                    // Can't extend; just give the rest
                    if (chunks.empty) {
                        nextLine = cache.length ? cast(string) cache : null;
                        cache = new ubyte[0];
                        return;
                    }

                    // Extend cache
                    cache ~= chunks.front;
                    chunks.popFront;
                    continue;
                }

                // Check for false-positives from the end-of-line heuristic
                if (cache[i] != '\n') {
                    if (i < 2 || cache[i - 2] != 0xE2 || cache[i - 1] != 0x80) {
                        // Not a separator: step past this byte, otherwise
                        // the scan above stops on it again and never advances
                        ++i;
                        continue;
                    }
                }

                break;
            }

            size_t iEnd = i + 1;
            if (keepTerm == KeepTerminator.no) {
                // E2 80 A8 or E2 80 A9
                if (cache[i] != '\n') {
                    iEnd -= 3;
                }
                // \r\n (i > 0 so that a "\r\n" at the very start is handled too)
                else if (i > 0 && cache[i - 1] == '\r') {
                    iEnd -= 2;
                }
                // \n
                else {
                    iEnd -= 1;
                }
            }

            nextLine = cast(string) cache[0 .. iEnd];
            cache = cache[i + 1 .. $];
        }
    }

    return Result(file, keepTerm);
}

int main(string[] args)
{
    if (args.length < 2) {
        return 1;
    }

    auto file = File(args[1]);
    writeln("There are: ", walkLength(lines(file)), " lines.");

    return 0;
}
</code></pre>
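<p>The end-of-line heuristic in <code>popFront</code> may be easier to follow in isolation. The sketch below pulls the check into a standalone function, <code>endsUnicodeSeparator</code>, a name invented for this illustration. It assumes the input is UTF-8, where U+2028 (line separator) is encoded as <code>E2 80 A8</code> and U+2029 (paragraph separator) as <code>E2 80 A9</code>.</p>
<pre><code>// Hypothetical helper: true if the byte at index i is the last byte of a
// UTF-8 encoded U+2028 (E2 80 A8) or U+2029 (E2 80 A9).
bool endsUnicodeSeparator(const(ubyte)[] buf, size_t i) {
    return i >= 2
        && (buf[i] == 0xA8 || buf[i] == 0xA9)
        && buf[i - 2] == 0xE2
        && buf[i - 1] == 0x80;
}

void main() {
    // "a", U+2028, "b" encoded as UTF-8
    ubyte[] withSeparator = [0x61, 0xE2, 0x80, 0xA8, 0x62];
    assert(endsUnicodeSeparator(withSeparator, 3));

    // U+00A9 (copyright sign) is C2 A9: the last byte matches the cheap
    // first test but is not part of a separator, so it is a false positive
    ubyte[] copyrightSign = [0xC2, 0xA9];
    assert(!endsUnicodeSeparator(copyrightSign, 1));
}
</code></pre>
<p>Testing the last byte first keeps the inner scanning loop cheap: the two preceding bytes are only examined for the comparatively rare bytes <code>0xA8</code> and <code>0xA9</code>.</p>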