<p>Given the spec, I can't see where your first row <code>Nan, Nan</code> comes from. Perhaps it's a typo in your example? In any case, here is one possible solution.</p>
<pre><code>import re
import numpy as np
import pandas as pd

# return every word that contains at least one hyphen
def split_phrase(phrase):
    return re.findall(r'(\w+(?:-\w+)+)', phrase)

# collect all hyphenated words; the [] start value lets sum() concatenate the lists
words_with_hyphens = sum(df.Phrases.apply(split_phrase).values, [])
# split each word into its parts
split_words = [word.split('-') for word in words_with_hyphens]
# keep words with exactly two parts, else use (NaN, NaN)
new_data = [(ws[0], ws[1]) if len(ws) == 2 else (np.nan, np.nan) for ws in split_words]
# create the new DataFrame
pd.DataFrame(new_data, columns=['part1', 'part2'])
# part1 | part2
#
# 0 Yellow | Green
# 1 Jong | il
# 2 methyl | butane
# 3 da | derp
# 4 NaN | NaN
</code></pre>
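<p>To see what the regular expression and the two-part rule do without a DataFrame, here is a minimal standalone sketch. The sample phrases and the <code>to_pair</code> helper are hypothetical and only for illustration, and <code>None</code> stands in for <code>np.nan</code> so that it runs with the standard library alone.</p>

```python
import re

# Same pattern as above: a word followed by one or more "-word" groups.
HYPHENATED = re.compile(r'(\w+(?:-\w+)+)')

def split_phrase(phrase):
    """Return every hyphenated word found in phrase."""
    return HYPHENATED.findall(phrase)

def to_pair(word):
    """Hypothetical helper: (part1, part2) for a two-part word, else (None, None)."""
    parts = word.split('-')
    return (parts[0], parts[1]) if len(parts) == 2 else (None, None)

# Hypothetical sample phrases, not the asker's data.
samples = ["a Yellow-Green car", "no hyphen here", "one-two-three and da-derp"]
pairs = [to_pair(word) for phrase in samples for word in split_phrase(phrase)]
print(pairs)
```

<p>A phrase without hyphens contributes nothing, while a word with more than two parts, such as <code>one-two-three</code>, still yields a row, just an empty one.</p>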