What exactly is this command doing?

2 answers

User

Answer 1 · Edited 2024-09-19 20:52:14

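Yes, you're right. This basically splits one column on a dash, separating the employees' ID numbers from their actual names (in the original data, the two were combined in a single column).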

Let me walk you through what zip and lambda do:

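zip, from the docs: This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. (That is the Python 2 behavior; in Python 3, zip returns an iterator of tuples, so wrap it in list() to see them all at once.)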

Example:

>>> my_list1 = [1, 2, 3, 4, 5]
>>> my_list2 = ['a', 'e', 'i', 'o', 'u']
>>> list(zip(my_list1, my_list2))  # A list of tuples; each pairs a number with the vowel at the same position.
[(1, 'a'), (2, 'e'), (3, 'i'), (4, 'o'), (5, 'u')]
>>>
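The split-a-column idiom usually pairs zip with argument unpacking, zip(*pairs), which transposes a sequence of pairs back into separate sequences. A minimal pure-Python sketch, assuming each original value has already been split into an [id, name] pair; the sample values are invented for illustration:

```python
# Hypothetical sample data: each entry was already split into [id, name].
pairs = [["101", "Alice"], ["102", "Bob"], ["103", "Carol"]]

# zip(*pairs) passes each pair as a separate argument, so zip groups
# all first elements together and all second elements together.
ids, names = zip(*pairs)

print(ids)    # ('101', '102', '103')
print(names)  # ('Alice', 'Bob', 'Carol')
```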

lambda, from the docs:

Small anonymous functions can be created with the lambda keyword. This function returns the sum of its two arguments: lambda a, b: a+b

Example:

>>> #Writing a function that squares numbers
>>>
>>> #Long way 
>>> def square(x):
...     return x**2
... 
>>>
>>> #Short way 
>>> square = lambda x: x**2
>>>
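Putting the two together: the command from the question is not reproduced on this page, but the apply/lambda/zip idiom described above typically looks like the sketch below. The column name 'Original Column' comes from the second answer; the sample data and the output columns ID and Name are hypothetical:

```python
import pandas as pd

# Hypothetical data: employee ID and name combined in one column.
df = pd.DataFrame({"Original Column": ["101-Alice", "102-Bob", "103-Carol"]})

# apply calls the lambda once per row; each call returns an [id, name] list.
# split("-", 1) splits on the first dash only.
split_rows = df["Original Column"].apply(lambda s: s.split("-", 1))

# zip(*...) transposes the per-row lists into one tuple of IDs and one of names,
# which are then assigned to two new columns.
df["ID"], df["Name"] = zip(*split_rows)

print(df)
```

One caveat: a value with no dash yields a one-element list, zip then truncates to the shortest row, and the two-name unpacking raises a ValueError. The str.extract approach in the second answer handles that case.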

User

Answer 2 · Edited 2024-09-19 20:52:14

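The apply method calls the lambda function once for each row of the Series df['Original Column']. Calling a Python function once per row is a recipe for slowness when the Series has many rows. In general, to get the best performance out of Pandas, use apply only when there is no other option.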

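Using zip here adds to the inefficiency. zip returns a Python list of tuples (in Python 3, an iterator of tuples). Python lists and tuples take more space than a Pandas Series when the values in the Series are of a native NumPy dtype. Strings can be represented by a native NumPy dtype, so keeping the data in a Series is more space-efficient. So zip, like apply, should be avoided here if possible.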
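For comparison, pandas also has Series.str.split, which with n=1 and expand=True splits on the first dash and returns the pieces as DataFrame columns, with no per-row lambda and no zip. This is a different vectorized method from the str.extract approach this answer recommends; the sketch below just checks that it gives the same result as the apply/zip version on hypothetical data:

```python
import pandas as pd

# Hypothetical data: ID and name joined by a dash.
s = pd.Series(["101-Alice", "102-Bob", "103-Carol"])

# Row-by-row approach: one Python lambda call per row, then zip to transpose.
ids, names = zip(*s.apply(lambda x: x.split("-", 1)))

# Vectorized approach: split on the first dash only (n=1) and expand the
# pieces into a DataFrame whose columns are labeled 0 and 1.
parts = s.str.split("-", n=1, expand=True)

print(parts)
```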

In this case, you can use Pandas' vectorized string method str.extract instead:

import pandas as pd

df = pd.DataFrame({'Original':['abc-def']*3+['foo']})
#   Original
# 0  abc-def
# 1  abc-def
# 2  abc-def
# 3      foo

df[['New1', 'New2']] = df['Original'].str.extract(r'([^-]*)-?(.*)')
print(df)

yields

  Original New1 New2
0  abc-def  abc  def
1  abc-def  abc  def
2  abc-def  abc  def
3      foo  foo

The argument to extract is a regex pattern. r'([^-]*)-?(.*)' means:

([^-]*)     match 0-or-more characters other than a literal hyphen
-?          match 0-or-1 literal hyphen
(.*)        match 0-or-more of any character
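To check the pattern outside pandas, the same regex can be run with Python's re module. re.match anchors at the start of the string, and because ([^-]*) can match an empty string there, it should give the same groups that str.extract reports for these inputs. Only 'abc-def' and 'foo' come from the answer; the other sample strings are made up:

```python
import re

# Pattern from the answer: group 1 = everything before the first hyphen,
# an optional hyphen, then group 2 = everything that follows.
pattern = re.compile(r"([^-]*)-?(.*)")

samples = ["abc-def", "foo", "a-b-c", "-xyz"]
results = {text: pattern.match(text).groups() for text in samples}

for text, groups in results.items():
    print(f"{text!r:10} -> {groups}")
```

Only the first hyphen splits: 'a-b-c' keeps 'b-c' in the second group, because (.*) also matches hyphens. A string with no hyphen, like 'foo', gets an empty second group.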

The parenthesized patterns define groups, which are then returned by the extract method.
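str.extract also uses named groups, written (?P<name>...), as the column names of the DataFrame it returns, so the target columns need not be listed on the left-hand side. A sketch on the answer's data; the group names New1 and New2 simply mirror the column names used above:

```python
import pandas as pd

df = pd.DataFrame({"Original": ["abc-def"] * 3 + ["foo"]})

# Named groups become the column names of the DataFrame that extract returns.
parts = df["Original"].str.extract(r"(?P<New1>[^-]*)-?(?P<New2>.*)")

# Attach the new columns next to the original one (aligned on the index).
df = df.join(parts)
print(df)
```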
