探究Python多进程编程下线程之间变量的共享问题

 1、问题:

群中有同学贴了如下一段代码,问为何 list 最后打印的是空值?

from multiprocessing import Process, Manager
import os

manager = Manager()
vip_list = []
#vip_list = manager.list()

def testFunc(cc):
  vip_list.append(cc)
  print 'process id:', os.getpid()

if __name__ == '__main__':
  threads = []

  for ll in range(10):
    t = Process(target=testFunc, args=(ll,))
    t.daemon = True
    threads.append(t)

  for i in range(len(threads)):
    threads[i].start()

  for j in range(len(threads)):
    threads[j].join()

  print "------------------------"
  print 'process id:', os.getpid()
  print vip_list

其实如果你了解 python 的多线程模型,GIL 问题,然后了解多线程、多进程原理,上述问题不难回答,不过如果你不知道也没关系,跑一下上面的代码你就知道是什么问题了。

python aa.py
process id: 632
process id: 635
process id: 637
process id: 633
process id: 636
process id: 634
process id: 639
process id: 638
process id: 641
process id: 640
------------------------
process id: 619
[]

将第 6 行注释开启,你会看到如下结果:

process id: 32074
process id: 32073
process id: 32072
process id: 32078
process id: 32076
process id: 32071
process id: 32077
process id: 32079
process id: 32075
process id: 32080
------------------------
process id: 32066
[3, 2, 1, 7, 5, 0, 6, 8, 4, 9]

2、python 多进程共享变量的几种方式:
(1)Shared memory:
Data can be stored in a shared memory map using Value or Array. For example, the following code

http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes

from multiprocessing import Process, Value, Array

def f(n, a):
  n.value = 3.1415927
  for i in range(len(a)):
    a[i] = -a[i]

if __name__ == '__main__':
  num = Value('d', 0.0)
  arr = Array('i', range(10))

  p = Process(target=f, args=(num, arr))
  p.start()
  p.join()

  print num.value
  print arr[:]

结果:

3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

(2)Server process:

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
代码见开头的例子。

http://docs.python.org/2/library/multiprocessing.html#managers
3、多进程的问题远不止这么多:数据的同步

看段简单的代码:一个简单的计数器:

from multiprocessing import Process, Manager
import os

manager = Manager()
sum = manager.Value('tmp', 0)

def testFunc(cc):
  sum.value += cc

if __name__ == '__main__':
  threads = []

  for ll in range(100):
    t = Process(target=testFunc, args=(1,))
    t.daemon = True
    threads.append(t)

  for i in range(len(threads)):
    threads[i].start()

  for j in range(len(threads)):
    threads[j].join()

  print "------------------------"
  print 'process id:', os.getpid()
  print sum.value

结果:

------------------------
process id: 17378
97

也许你会问:WTF?其实这个问题在多线程时代就存在了,只是在多进程时代又杯具重演了而已:Lock!

from multiprocessing import Process, Manager, Lock
import os

lock = Lock()
manager = Manager()
sum = manager.Value('tmp', 0)

def testFunc(cc, lock):
  with lock:
    sum.value += cc

if __name__ == '__main__':
  threads = []

  for ll in range(100):
    t = Process(target=testFunc, args=(1, lock))
    t.daemon = True
    threads.append(t)

  for i in range(len(threads)):
    threads[i].start()

  for j in range(len(threads)):
    threads[j].join()

  print "------------------------"
  print 'process id:', os.getpid()
  print sum.value

这段代码性能如何呢?跑跑看,或者加大循环次数试一下。。。
4、最后的建议:

Note that usually sharing data between processes may not be the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also Python documentation: As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.

5、Refer:

http://stackoverflow.com/questions/14124588/python-multiprocessing-shared-memory

http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing/

http://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.synchronized

(0)

相关推荐

  • python和shell变量互相传递的几种方法

    python -> shell: 1.环境变量 复制代码 代码如下: import os  var=123或var='123'os.environ['var']=str(var)  #environ的键值必须是字符串   os.system('echo $var') 复制代码 代码如下: import os  var=123或var='123'os.environ['var']=str(var)  #environ的键值必须是字符串  os.system('echo $var') 2.字符串连接

  • 从局部变量和全局变量开始全面解析Python中变量的作用域

    理解全局变量和局部变量 1.定义的函数内部的变量名如果是第一次出现, 且在=符号前,那么就可以认为是被定义为局部变量.在这种情况下,不论全局变量中是否用到该变量名,函数中使用的都是局部变量.例如: num = 100 def func(): num = 123 print num func() 输出结果是123.说明函数中定义的变量名num是一个局部变量,覆盖全局变量.再例如: num = 100 def func(): num += 100 print num func() 输出结果是:Unb

  • 详解Python中的变量及其命名和打印

    在程序中,变量就是一个名称,让我们更加方便记忆. cars = 100 space_in_a_car = 4.0 drivers = 30 passengers = 90 cars_not_driven = cars - drivers cars_driven = drivers carpool_capacity = cars_driven * space_in_a_car average_passengers_per_car = passengers / cars_driven print "

  • python中的实例方法、静态方法、类方法、类变量和实例变量浅析

    注:使用的是Python2.7. 一.实例方法 实例方法就是类的实例能够使用的方法.如下: 复制代码 代码如下: class Foo:    def __init__(self, name):        self.name = name    def hi(self):        print self.name if __name__ == '__main__':    foo01 = Foo('letian')    foo01.hi()    print type(Foo)    p

  • 实例讲解Python中global语句下全局变量的值的修改

    Python的全局变量:int string, list, dic(map) 如果存在global就能够修改它的值.而不管这个global是否是存在于if中,也不管这个if是否能够执行到. 但是,如果没有 if bGlobal: global g_strVal; int string 将会报错.而list dic(map)是ok的. #!/usr/bin/dev python import sys import os g_nVal = 0; g_strVal = "aaaa"; g_m

  • python中查看变量内存地址的方法

    本文实例讲述了python中查看变量内存地址的方法.分享给大家供大家参考.具体实现方法如下: 这里可以使用id >>> print id.__doc__ id(object) -> integer Return the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (Hint: it's the object's memory address

  • Python环境变量设置方法

    Alias Maya中的脚本语言是Mel 和 Python,据说Houdini未来也会把Python作为主要的脚本语言,作为影视特效师,掌握Python语言是必备技能:虽然Maya内置了Python运行时,但是,如果要系统学习Python语言,环境变量还是需要配置一下~ 默认情况下,在windows下安装python之后,系统不会自动添加相应的环境变量.此时在命令行输入python命令是不能执行的,配置方法如下: 1. 首先需要在系统中注册python环境变量:假设python的安装路径为c:\

  • 深入理解Python变量与常量

    变量是计算机内存中的一块区域,变量可以存储规定范围内的值,而且值可以改变.基于变量的数据类型,解释器会分配指定内存,并决定什么数据可以被存储在内存中.常量是一块只读的内存区域,常量一旦被初始化就不能被改变. 变量命名字母.数字.下划线组成,不能以数字开头,前文有说不在赘述. 变量赋值 Python中的变量不需要声明,变量的赋值操作即是变量的声明和定义的过程.每个变量在内存中创建都包括变量的标识.名称.和数据这些信息. Python中一次新的赋值,将创建一个新的变量.即使变量的名称相同,变量的标识

  • python实现同时给多个变量赋值的方法

    本文实例讲述了python实现同时给多个变量赋值的方法.分享给大家供大家参考.具体分析如下: python中可以同时给多个变量赋值,下面列举了三种方法 # Assign values directly a, b = 0, 1 assert a == 0 assert b == 1 # Assign values from a list (r,g,b) = ["Red","Green","Blue"] assert r == "Red&q

  • Python 专题六 局部变量、全局变量global、导入模块变量

    定义在函数内的变量有局部作用域,在一个模块中最高级别的变量有全局作用域.本文主要讲述全局变量.局部变量和导入模块变量的方法. 参考:<Python核心编程 (第二版)> 一. 局部变量 声明适用的程序的范围被称为了声明的作用域.在一个过程中,如果名字在过程的声明之内,它的出现即为过程的局部变量:否则出现即为非局部.例: def foo(x): print 'x = ',x x = 200 print 'Changed in foo(), x = ',x x = 100 foo(x) print

随机推荐