Multithreading and Multiprocessing

2018-12-30  田小田txt

1. Multithreading:

Simulating a multithreaded crawler (fetching the list page and the detail pages concurrently)

  import time
  import threading

  # Fetch a detail page (plain-function version; not used by the classes below)
  def get_detail_html(url):
      print("get detail html started")
      time.sleep(2)
      print("get detail html end")

  # Extract detail-page URLs from the list page (plain-function version)
  def get_detail_url(url):
      print("get detail url started")
      time.sleep(4)
      print("get detail url end")

  # Thread subclass that simulates fetching a detail page
  class GetDetailHtml(threading.Thread):
      def __init__(self, name):
          super().__init__(name=name)

      def run(self):
          print("get detail html started")
          time.sleep(2)
          print("get detail html end")

  # Thread subclass that simulates extracting detail-page URLs
  class GetDetailUrl(threading.Thread):
      def __init__(self, name):
          super().__init__(name=name)

      def run(self):
          print("get detail url started")
          time.sleep(4)
          print("get detail url end")

  if __name__ == "__main__":
      thread1 = GetDetailHtml("get_detail_html")
      thread2 = GetDetailUrl("get_detail_url")
      start_time = time.time()
      thread1.start()
      thread2.start()

      thread1.join()    # wait for thread1 to finish before continuing
      thread2.join()

      # join() makes the main thread wait, so this prints about 4 seconds
      # (the longer sleep), not 2 + 4 = 6.
      # Child threads are killed when the main thread exits only if they
      # are daemon threads.
      print("last time: {}".format(time.time() - start_time))
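The example above defines get_detail_html and get_detail_url as plain functions but runs the Thread subclasses instead. As a minimal sketch, the same functions could be run concurrently by passing them as the target of threading.Thread. The helper run_concurrently and the delay parameter are assumptions added for illustration (they are not in the original), and the delays are shortened so the sketch finishes quickly.

```python
import threading
import time


def get_detail_html(url, delay=0.2):
    # Simulated detail-page fetch; `delay` is a hypothetical parameter.
    print("get detail html started")
    time.sleep(delay)
    print("get detail html end")


def get_detail_url(url, delay=0.4):
    # Simulated extraction of detail-page URLs from the list page.
    print("get detail url started")
    time.sleep(delay)
    print("get detail url end")


def run_concurrently(jobs):
    """Start one thread per (func, args) pair, wait for all, return elapsed seconds."""
    threads = [threading.Thread(target=func, args=args) for func, args in jobs]
    start_time = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # block until this thread has finished
    return time.time() - start_time


if __name__ == "__main__":
    elapsed = run_concurrently([
        (get_detail_html, ("detail-url",)),
        (get_detail_url, ("list-url",)),
    ])
    print("last time: {}".format(elapsed))
```

Subclassing Thread is useful when a worker carries its own state; for simple cases like this one, passing a target function is usually shorter.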

2. Thread pool:

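A thread pool reuses threads and manages each task's state and return value (the main thread can call done() at any time to check whether a task has finished).
The concurrent.futures package exposes the same interface for threads and processes, which makes development easier.
The container for a task's result is a Future object: the task may not be finished when the Future is returned, but once it finishes, the result can be retrieved through that object.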

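  from concurrent.futures import ThreadPoolExecutor
  import time

  def get_html(times):
      time.sleep(times)
      print("get page {} success".format(times))
      return times

  executor = ThreadPoolExecutor(max_workers=2)
  # submit() hands the function to the thread pool and returns a Future immediately
  task1 = executor.submit(get_html, 3)
  task2 = executor.submit(get_html, 2)
  print(task1.done())     # non-blocking status check; False while task1 is still running
  print(task1.result())   # blocks until task1 finishes, then returns its result
  # cancel() only succeeds while a task is still waiting in the queue. With
  # max_workers=2 both tasks start right away, so this returns False here.
  print(task2.cancel())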
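The point about the main thread learning of completed tasks can also be sketched with as_completed from concurrent.futures, which yields each Future as soon as it finishes. The second function shows a case where cancel() does succeed, because the pool has only one worker and the second task is still queued. The helper names fetch_in_completion_order and cancel_pending_task are hypothetical, and the delays are shortened for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times


def fetch_in_completion_order(delays, max_workers=3):
    """Submit one task per delay and collect results as each task finishes."""
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(get_html, delay) for delay in delays]
        # as_completed yields futures in the order they complete,
        # not in the order they were submitted.
        for future in as_completed(futures):
            finished.append(future.result())
    return finished


def cancel_pending_task():
    """With a single worker, the second task stays queued and can be cancelled."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        running = executor.submit(get_html, 0.2)
        queued = executor.submit(get_html, 0.1)
        cancelled = queued.cancel()  # True: the task had not started yet
        running.result()
    return cancelled


if __name__ == "__main__":
    print(fetch_in_completion_order([0.3, 0.1, 0.2]))
    print(cancel_pending_task())
```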

3. Multiprocessing:

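  import time
  from multiprocessing import Process, Queue, Pool   # Pool is used in the next section

  def producer(queue):
      queue.put("a")
      time.sleep(2)

  def consumer(queue):
      time.sleep(2)
      data = queue.get()
      print(data)

  # The guard is needed on platforms that start processes with "spawn"
  # (for example Windows), where each child re-imports this module.
  if __name__ == "__main__":
      queue = Queue(10)
      my_producer = Process(target=producer, args=(queue,))
      my_consumer = Process(target=consumer, args=(queue,))
      my_producer.start()
      my_consumer.start()
      my_producer.join()    # wait for both processes to finish
      my_consumer.join()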

4. Process pool:

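  import time
  from multiprocessing import Pool, Manager

  def producer(queue):
      queue.put("a")
      time.sleep(2)

  def consumer(queue):
      time.sleep(2)
      data = queue.get()
      print(data)

  if __name__ == "__main__":
      # A plain multiprocessing.Queue cannot be passed to pool workers;
      # use a queue created by a Manager instead.
      queue = Manager().Queue(10)
      pool = Pool(2)

      pool.apply_async(producer, args=(queue,))
      pool.apply_async(consumer, args=(queue,))

      pool.close()    # no more tasks will be submitted
      pool.join()     # wait for the submitted tasks to finish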
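The note in section 2 that concurrent.futures exposes the same interface for threads and processes can be illustrated with a small sketch: the same submit/result code runs on either executor class. The function square and the helper run_with are hypothetical names used only for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(n):
    # Defined at module level so that a process pool can pickle it.
    return n * n


def run_with(executor_cls, numbers, max_workers=2):
    """Run square() over numbers using whichever executor class is passed in."""
    with executor_cls(max_workers=max_workers) as executor:
        futures = [executor.submit(square, n) for n in numbers]
        # result() blocks until each task is done; order follows submission.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_with(ThreadPoolExecutor, [1, 2, 3]))
    print(run_with(ProcessPoolExecutor, [1, 2, 3]))
```

Only the executor class changes between the two calls, so switching from threads to processes (for CPU-bound work, for example) needs no other code changes in this sketch.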

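Which queue to use in each case:

  from queue import Queue                 # multithreading
  from multiprocessing import Queue       # multiprocessing (between Process objects)
  from multiprocessing import Manager     # process pool: use Manager().Queue()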
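For comparison with the process-based producer and consumer above, a thread-based version using queue.Queue might look like the following sketch. The None sentinel that tells the consumer to stop, the received list, and the helper run_pipeline are assumptions added for illustration and are not part of the original examples.

```python
import threading
from queue import Queue


def producer(queue, items):
    for item in items:
        queue.put(item)
    queue.put(None)  # sentinel (assumed convention): no more items


def consumer(queue, received):
    while True:
        data = queue.get()  # blocks until an item is available
        if data is None:
            break
        received.append(data)


def run_pipeline(items):
    """Pass items from a producer thread to a consumer thread through a Queue."""
    queue = Queue(10)
    received = []
    threads = [
        threading.Thread(target=producer, args=(queue, items)),
        threading.Thread(target=consumer, args=(queue, received)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"]))
```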