<p><strong>Method 1</strong></p>
<pre><code>a = np.where(df != "", "1", "0").astype("|S1")
# tobytes() is used here because tostring() is a deprecated alias of it
df["bin"] = np.apply_along_axis(lambda x: x.tobytes().decode("utf-8"), 1, a)
</code></pre>
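<p>As a rough, self-contained illustration of Method 1, the sketch below builds a small made-up frame (the column names and values are hypothetical, not the asker's data) in which empty strings mark missing cells, then joins each row's flags into one string.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: empty strings mark missing cells.
df = pd.DataFrame({
    "a": ["x", "y", ""],
    "b": ["", "y", ""],
    "c": ["x", "y", "z"],
})

# One byte per cell: b"1" for a non-empty cell, b"0" for an empty one.
flags = np.where(df != "", "1", "0").astype("S1")

# Join the bytes of each row into a single string.
bits = np.apply_along_axis(lambda row: row.tobytes().decode("utf-8"), 1, flags)
df["bin"] = bits
print(df["bin"].tolist())  # ['101', '111', '001']
```

<p>Because the lambda runs once per row in Python, this version is expected to be the slowest of the three.</p>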
<p><strong>Method 2</strong></p>
<pre><code>df["bin"] = np.append(
    np.where(df != "", "1", "0").astype("S1"),
    # one b"\n" per row, appended as an extra column
    np.array([["\n"]]).astype("S1").repeat(df.shape[0], axis=0),
    axis=1
).tobytes().decode("utf-8")[:-1].split("\n")  # tostring() is a deprecated alias of tobytes()
</code></pre>
<p>Method 2 appends a <code>\n</code> to the end of every row of the NumPy array:</p>
<pre><code>array([[b'1', b'0', b'1', b'0', b'0', b'\n'],
       [b'1', b'1', b'1', b'0', b'0', b'\n'],
       [b'1', b'1', b'1', b'1', b'0', b'\n'],
       ...,
       [b'1', b'0', b'0', b'0', b'0', b'\n'],
       [b'1', b'0', b'1', b'0', b'1', b'\n'],
       [b'1', b'0', b'1', b'0', b'0', b'\n']], dtype='|S1')
</code></pre>
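<p>The array is then serialized to bytes and decoded into one long string. The trailing <code>\n</code> is removed, and the string is split on <code>\n</code> to give one entry per row.</p>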
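<p>The steps of Method 2 can be traced on a tiny example. This is only a sketch on a made-up three-row frame (hypothetical data), and it builds the newline column with <code>np.full</code> rather than <code>repeat</code>, which should be equivalent here.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: empty strings mark missing cells.
df = pd.DataFrame({"a": ["x", "y", ""], "b": ["", "y", ""], "c": ["x", "y", "z"]})

# Step 1: one byte per cell.
flags = np.where(df != "", "1", "0").astype("S1")

# Step 2: append a column holding one newline byte per row.
newlines = np.full((flags.shape[0], 1), b"\n", dtype="S1")
joined = np.append(flags, newlines, axis=1)

# Step 3: serialize row by row (C order) and decode into one string.
text = joined.tobytes().decode("utf-8")
print(repr(text))  # '101\n111\n001\n'

# Step 4: drop the final newline and split into one string per row.
rows = text[:-1].split("\n")
print(rows)  # ['101', '111', '001']
```

<p>The whole frame is converted in a single pass, so no Python-level function is called per row.</p>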
<p><strong>Method 3</strong> (using <code>view</code>; reference: <a href="https://stackoverflow.com/questions/10984471/numpy-array-of-chars-to-string">numpy array of chars to string</a>)</p>
<pre><code># '|S5' assumes the frame has 5 columns; the result has shape (n, 1)
np.ascontiguousarray(
    np.where(df != "", "1", "0").astype("S1")
).view('|S5').astype(str)
</code></pre>
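<p>Method 3 hard-codes the width in <code>'|S5'</code>. A possible generalization, sketched here on a made-up frame (hypothetical data), derives the width from the number of columns and flattens the result so it can be assigned to a column directly.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: empty strings mark missing cells.
df = pd.DataFrame({"a": ["x", "y", ""], "b": ["", "y", ""], "c": ["x", "y", "z"]})

# The view needs contiguous memory, one byte per cell.
flags = np.ascontiguousarray(np.where(df != "", "1", "0").astype("S1"))

# Reinterpret each row of `width` single bytes as one `width`-byte string.
width = flags.shape[1]
bits = flags.view(f"S{width}").ravel().astype(str)

df["bin"] = bits
print(df["bin"].tolist())  # ['101', '111', '001']
```

<p>The <code>view</code> call only reinterprets the existing buffer, so the only copies made are the ones in <code>astype</code>.</p>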
<h2>Timings:</h2>
<pre><code>(Based on jezrael's setup df = pd.concat([df] * 1000, ignore_index=True))
# method 2
%timeit np.append(np.where(df != "", "1", "0").astype("S1"), np.array([["\n"]]).astype("S1").repeat(df.shape[0], axis=0), axis=1).tostring().decode("utf-8")[:-1].split("\n")
12.3 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# method 3
%timeit np.ascontiguousarray(np.where(df != "", "1", "0").astype("S1")).view('|S5').astype(str)
12.8 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# method 1 (slower)
%timeit np.apply_along_axis(lambda x: x.tostring().decode("utf-8"), 1, np.where(df != "", "1", "0").astype("S1"))
45 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
</code></pre>
<p>Reproducing jezrael's timings on the same machine:</p>
<pre><code>In [99]: %timeit df.astype(bool).astype(int).astype(str).values.sum(axis=1)
28.9 ms ± 782 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [100]: %timeit (df != '').astype(int).astype(str).values.sum(axis=1)
29 ms ± 645 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [101]: %timeit (df != '').astype(int).astype(str).apply(''.join, axis=1)
168 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [102]: %timeit df.astype(bool).astype(int).astype(str).apply(''.join, axis=1)
173 ms ± 7.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [103]: %timeit df.astype(bool).astype(int).apply(lambda row: ''.join(str(i) for i in row), axis=1)
159 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
</code></pre>
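<p>The <code>%timeit</code> lines above are IPython magics. Outside IPython, a comparison along the same lines could be sketched with the standard <code>timeit</code> module. The frame below is hypothetical (it is not jezrael's data), the helper names <code>method2</code> and <code>method3</code> are made up for this sketch, and the numbers it prints will differ from the ones reported above.</p>

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 5-column frame, enlarged the same way as in the timing setup.
base = pd.DataFrame([
    ["a", "", "c", "", ""],
    ["a", "b", "c", "", ""],
    ["a", "b", "c", "d", ""],
])
df = pd.concat([base] * 1000, ignore_index=True)


def method2(frame):
    """Append a newline column, serialize once, split on newlines."""
    flags = np.where(frame != "", "1", "0").astype("S1")
    newlines = np.full((flags.shape[0], 1), b"\n", dtype="S1")
    joined = np.append(flags, newlines, axis=1)
    return joined.tobytes().decode("utf-8")[:-1].split("\n")


def method3(frame):
    """Reinterpret each row of single bytes as one fixed-width string."""
    flags = np.ascontiguousarray(np.where(frame != "", "1", "0").astype("S1"))
    return flags.view(f"S{flags.shape[1]}").ravel().astype(str)


# Both approaches should produce the same strings.
assert method2(df) == method3(df).tolist()

runs = 20
for fn in (method2, method3):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

<p>Checking that the methods agree before timing them guards against comparing the speed of functions that return different results.</p>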