<p>Using the same workflow, you can set the <code>divisions</code> manually, as suggested <a href="https://github.com/dask/dask/issues/1848#issuecomment-267138530" rel="nofollow noreferrer">here</a>:</p>
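<pre><code>import dask.dataframe as dd
import pandas as pd
import numpy as np

# Sample data: 25000 rows, 2 columns
pd.DataFrame(np.random.rand(25000, 2)).to_csv("tempfile.csv", index=False)
df = dd.read_csv("tempfile.csv")  # divisions are unknown here: (None, None)

# 2500 values; each one is repeated 10 times to cover the 25000 rows
ndf = pd.DataFrame(np.random.randint(1000, 3500, size=2500))

# This small file is read as a single partition whose index runs 0..len(df)-1,
# so the divisions can be set by hand (len(df) triggers a computation)
df.divisions = (0, len(df)-1)

# np.repeat flattens the (2500, 1) array into 25000 values, and
# dd.from_array gives the result divisions (0, 24999), matching df
df["Note"] = dd.from_array(np.repeat(ndf.values, 10))
</code></pre>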
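<p>To see why <code>(0, len(df)-1)</code> is the right tuple, the sketch below compares the divisions dask works out by itself. It assumes a recent dask and builds the frame with <code>dd.from_pandas</code> instead of reading a CSV. A single-partition frame with a 0-based index gets divisions <code>(0, n-1)</code>. For a 1-D array of length <code>n</code>, <code>dd.from_array</code> produces the same tuple, so the column assignment lines up.</p>
<pre><code>import dask.dataframe as dd
import numpy as np
import pandas as pd

n = 25000
pdf = pd.DataFrame(np.random.rand(n, 2))

# One partition with a sorted 0-based index: dask records (first, last) as divisions
df = dd.from_pandas(pdf, npartitions=1)

# A 1-D array of length n becomes a Series indexed 0..n-1; assuming the default
# chunksize is larger than n, it is also a single partition
note = dd.from_array(np.arange(n))

print(df.divisions, note.divisions)

# Equal, known divisions let dask align the two collections partition by partition
df["Note"] = note
result = df.compute()
</code></pre>
<p>With <code>dd.read_csv</code> the divisions start out as <code>(None, None)</code>, which is why the answer sets them by hand. This is only safe while the data really sits in one partition with a 0-based index.</p>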
<p>I don't think using <code>np.repeat</code> is very efficient, especially for a large df.</p>
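<p>You can avoid materialising the whole repeated array by computing each row's value from its global row position. Row <code>i</code> takes <code>values[i // 10]</code>, which is exactly what <code>np.repeat(values, 10)</code> holds at position <code>i</code>. The sketch below uses plain NumPy to build the column block by block and checks the result against <code>np.repeat</code>. The helper <code>note_for_rows</code> and the block size of 4000 are hypothetical. Using this inside dask, for example via <code>map_partitions</code>, would also need each partition's global starting row. <code>dd.read_csv</code> does not provide that, because every partition's index restarts at 0.</p>
<pre><code>import numpy as np

REPEATS = 10  # each value covers 10 consecutive rows, as in the answer


def note_for_rows(values, start, stop, repeats=REPEATS):
    """Return the Note values for global row positions [start, stop).

    Row i gets values[i // repeats], the same as
    np.repeat(values, repeats)[start:stop], but only this block is built.
    """
    positions = np.arange(start, stop)
    return values[positions // repeats]


rng = np.random.default_rng(0)
values = rng.integers(1000, 3500, size=2500)  # stands in for ndf's single column

# Build 25000 rows in hypothetical blocks of 4000 rows each
n_rows, block = 25000, 4000
parts = [note_for_rows(values, s, min(s + block, n_rows))
         for s in range(0, n_rows, block)]
note = np.concatenate(parts)

print(np.array_equal(note, np.repeat(values, REPEATS)))
</code></pre>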