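I have been trying to parallelize either the whole function, where it is called in main, or any part of the function you see below, with no luck. I cannot seem to get rid of TypeError: function object is not iterable. Thanks for any suggestions.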

from joblib import Parallel, delayed
num_cores = multiprocessing.cpu_count()
parallel = Parallel(n_jobs=num_cores)
from multiprocessing import Pool
p = Pool(4)

def kmean(layerW,cluster):
    weights1d = np.reshape(layerW,-1)

    #Parallelizing Here
    centroids,_ = parallel(delayed(kmeans(weights1d, cluster)))
    idxs,_      = parallel(delayed(vq(weights1d,centroids)))

    #Here, using Parallel
    weights1d_q = parallel(delayed([centroids[idxs[i]] for i in range(len(weights1d))]))

    #OR --- using pool instead
    weights1d_q = p.map([centroids[idxs[i]] for i in range(len(weights1d))])
    weights4d_q  = np.reshape(weights1d_q, np.shape(layerW))
    return weights4d_q

Q : I can't get away with the TypeError: function object is not iterable


The TypeError is raised correctly. It comes from a malformed call: joblib.Parallel( delayed( ... ) ... ) is being called in a way that does not follow its documented call-signature syntax.


>>> from joblib import Parallel, delayed
>>> parallel = Parallel( n_jobs = -1 )
>>> import numpy as np
>>> parallel( delayed( np.sqrt ) ( i**2 ) for i in range( 10 ) )
#          ^  ^^^^^^^     ^^^^     ^^^^   |||
#          |  |||||||     ||||     ||||   vvv
#JOBS(-1):-+  |||||||     ||||     ||||   |||
#DELAYED:    -+++++++     ||||     ||||   |||
#FUN( par ):              ++++     ||||   |||
#     |||                          ||||   |||
#     +++-FUN(signature-"decl.")  -++++   |||
#     ^^^                                 |||
#     |||                                 |||
#     +++-<<<-<iterator>-<<<-<<<-<<<-<<<--+++
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]



>>> from joblib import Parallel, delayed
>>> parallel = Parallel( n_jobs = -1 )
>>> import numpy as np
>>> parallel( delayed( np.sqrt( 10 ) ) )          #### THIS SLOC IS KNOWINGLY WRONG
#          ^  ^^^^^^^     ^^^^(????)  ????   ???  ####
#          |  |||||||     ||||        ||||   vvv  ####
#JOBS(-1):-+  |||||||     ||||        ||||   |||  ####
#DELAYED:    -+++++++     ||||        ||||   |||  #### DELAYED( <float64> )
#FUN( par ):              ++++        ||||   |||  #### GOT NO CALLABLE FUN( par )
#     |||                             ||||   |||  ####        BUT A NUMBER
#     +++-FUN(signature-"decl.")      ++++   |||  ####        FUN( signature )
#     ^^^                                    |||  ####        NOT PRESENT
#     |||                                    |||  ####        AND FEEDER
#     +++-<<<-<iterator>-<<<-<<<-<<<-<<<-<<<-+++  #### <ITERATOR> MISSING
#                                                 ####
Traceback (most recent call last):                ####   FOR DETAILS, READ THE O/P
  File "<stdin>", line 1, in <module>             ####   AND EXPLANATION BELOW
  File ".../lib/python3.5/site-packages/joblib/parallel.py", line 947, in __call__
    iterator = iter(iterable)
TypeError: 'function' object is not iterable

The result confirms that the O/P used a syntax that is incompatible with the documented
joblib.Parallel( delayed(...) ... ) call. Q.E.D.
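Why the message names a 'function' object can be shown with a small stand-in. This is only a sketch: sketch_delayed and sketch_parallel are hypothetical, serial helpers written for illustration, not joblib's own code, and they assume only that delayed captures a call for later and that the Parallel instance begins by calling iter() on its argument, as the traceback above shows.

```python
import math


def sketch_delayed(function):
    """Hypothetical stand-in for delayed: capture a call without running it."""
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture


def sketch_parallel(iterable):
    """Hypothetical serial stand-in for calling a Parallel instance."""
    tasks = iter(iterable)  # the step that fails in the traceback above
    return [function(*args, **kwargs) for function, args, kwargs in tasks]


# Documented pattern: a generator of captured calls.
ok = sketch_parallel(sketch_delayed(math.sqrt)(i ** 2) for i in range(10))

# Malformed pattern: math.sqrt(10) runs first, the wrapper is never called,
# and a bare function object arrives where an iterable is expected.
try:
    sketch_parallel(sketch_delayed(math.sqrt(10)))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

In the malformed call the number is computed immediately, the wrapper returned by the delayed stand-in is passed on uncalled, and iter() rejects it.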


Follow the documented joblib.Parallel( delayed( ... ) ... ) syntax:

#entroids, _ = parallel( delayed( kmeans(weights1d, cluster)))
#                                 ^^^^^^(..................)
#                                 ||||||(..................)
centroids, _ = parallel( delayed( kmeans ) ( weights1d, cluster ) for ... )
#                                 ^^^^^^     ^^^^^^^^^^^^^^^^^^   |||||||
#                                 ||||||     ||||||||||||||||||   vvvvvvv
# CALLABLE FUN()                  ++++++     ||||||||||||||||||   |||||||
#          FUN( <signature> )                ++++++++++++++++++   |||||||
#               ^^^^^^^^^^^                                       |||||||
#               |||||||||||                                       |||||||
#               +++++++++++      <<< feeding-<iterator>           +++++++
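Applied to the function from the question, one possible shape is sketched here. It rests on assumptions the question does not confirm: that kmeans and vq come from scipy.cluster.vq, and that there are several layers to process. A single kmeans or vq call returns one result for one array, so there is nothing for an iterator to feed inside the function; the generator instead runs over the layers. quantize_layers is a hypothetical helper name.

```python
import numpy as np
from joblib import Parallel, delayed
from scipy.cluster.vq import kmeans, vq  # assumption: the question's kmeans / vq


def kmean(layerW, cluster):
    """Quantize one weight array to at most `cluster` centroid values."""
    # One column of scalar observations, as float64 for scipy.
    weights = np.reshape(layerW, (-1, 1)).astype(np.float64)
    centroids, _ = kmeans(weights, cluster)
    idxs, _ = vq(weights, centroids)
    # Fancy indexing replaces the per-element list comprehension.
    return np.reshape(centroids[idxs], np.shape(layerW))


def quantize_layers(layers, cluster, n_jobs=2):
    """Hypothetical helper: one kmean() call per layer, fed by a generator."""
    return Parallel(n_jobs=n_jobs)(
        delayed(kmean)(layerW, cluster) for layerW in layers
    )
```

A call such as quantize_layers(list_of_weight_arrays, 16) would return one quantized array per input layer, each with the shape of its input.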



joblib.Parallel( n_jobs       = None,   # how many jobs will get instantiated
                 backend      = None,   # a method, how these will get instantiated
                 verbose      = 0,
                 timeout      = None,
                 pre_dispatch = '2 * n_jobs',
                 batch_size   = 'auto',
                 temp_folder  = None,
                 max_nbytes   = '1M',
                 mmap_mode    = 'r',
                 prefer       = None,   # None | { 'processes', 'threads' }
                 require      = None    # None | 'sharedmem' ~CONSTRAINTS backend
                 )


      Parallel(  n_jobs = 2 ) ( delayed( sqrt ) ( i ** 2 ) for i in range( 10 ) )
      #          ^              ^^^^^^^  ^^^^     ^^^^^^   |||
      #          |              |||||||  ||||     ||||||   vvv
      #JOBS:    -+              |||||||  ||||     ||||||   |||
      #DELAYED:                -+++++++  ||||     ||||||   |||
      #FUN( par ):                      -++++     ||||||   |||
      #     |||                                   ||||||   |||
      #     +++ FUN(-signature-"declaration"-)   -++++++   |||
      #     ^^^                                            |||
      #     |||                                            |||
      #     +++-<<<-<iterator>-<<<-<<<-<<<-<<<-<<<-<<<-<<<-+++

      Parallel(  n_jobs = -1 ) (
                 delayed( myTupleConsumingFUN ) ( # aFun( aTuple = ( a, b, c, d ) )
                           aTupleOfParametersGeneratingFUN( i ) )
                 for                                        i in range( 10 )
                 )
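A runnable version of the tuple-consuming pattern might look like the following sketch. The names make_params and consume_tuple are hypothetical stand-ins for aTupleOfParametersGeneratingFUN and myTupleConsumingFUN, and n_jobs=2 is an arbitrary choice.

```python
from joblib import Parallel, delayed


def make_params(i):
    """Hypothetical parameter generator: builds one 4-tuple per task."""
    return (i, i + 1, i + 2, i + 3)


def consume_tuple(a_tuple):
    """Hypothetical worker: receives the whole tuple as a single argument."""
    a, b, c, d = a_tuple
    return a + b + c + d


# make_params(i) is evaluated in the calling process while the generator is
# consumed; only the consume_tuple calls are dispatched to the workers.
results = Parallel(n_jobs=2)(
    delayed(consume_tuple)(make_params(i)) for i in range(10)
)
```

If building the parameters is itself the expensive part, it has to move inside the delayed function to benefit from the workers.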


The default backend of joblib will run each function call in isolated Python processes, therefore they cannot mutate a common Python object defined in the main program.

However if the parallel function really needs to rely on the shared memory semantics of threads, it should be made explicit with require='sharedmem'

Keep in mind that relying on the shared-memory semantics is probably suboptimal from a performance point of view, as concurrent access to a shared Python object will suffer from lock contention.
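The difference can be seen in a small sketch. The record function and the two lists are hypothetical names chosen for illustration, and the first call assumes joblib's default process-based backend is in effect.

```python
from joblib import Parallel, delayed


def record(log, i):
    """Append to whatever list object this worker was handed."""
    log.append(i)
    return i


# Default backend: workers are separate processes, so each one appends to
# its own pickled copy and the caller's list is expected to stay untouched.
log_processes = []
Parallel(n_jobs=2)(delayed(record)(log_processes, i) for i in range(5))

# require='sharedmem': workers are threads and append to the caller's list.
log_threads = []
returned = Parallel(n_jobs=2, require='sharedmem')(
    delayed(record)(log_threads, i) for i in range(5)
)
```

The return values arrive in task order either way, so collecting results from the Parallel call is usually preferable to mutating shared state.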






