蓄水池算法

2018-04-23  本文已影响0人  Jonddy
问题描述:

给你一个长度为N的链表。N很大,但你不知道N有多大。你的任务是从这N个元素中随机取出k个元素。你只能遍历这个链表一次。你的算法必须保证取出的元素恰好有k个,且它们是完全随机的(出现概率均等)。

该算法是针对从一个序列中随机抽取不重复的k个数,保证每个数被抽取到的概率为k/n这个问题而构建的。做法是:

伪代码:

Init : a reservoir with the size: k
        for    i= k+1 to N
            M=random(1, i);
            if( M < k)
                 SWAP the Mth value and ith value
       end for
证明每个数被取到的概率为k/n:
蓄水池算法证明

代码:

import time
import random
import copy
 
def reservoirSampling(seq, k):
    localSeq = copy.deepcopy(seq)
    N = len(localSeq)  
    for i in range(k, N, 1):
        M = int(random.uniform(0, i))
        if M < k :
            temp = copy.deepcopy(localSeq[M])
            localSeq[M] = copy.deepcopy(localSeq[i])
            localSeq[i] = temp
    return localSeq[0:k]
     
a = [4,5,6,3,4,7,7,4,3,3,2,4,5,5,6,9,5,4,3,45,3,23,44,55,33,5,8]
k = 5
 
print reservoirSampling(a, k)

import random

def sample(iterable, n):
    """
    Returns @param n random items from @param iterable.
    """
    reservoir = []
    for t, item in enumerate(iterable):
        if t < n:
            reservoir.append(item)
        else:
            m = random.randint(0,t)
            if m < n:
                reservoir[m] = item
    return reservoir


转自链接:https://www.jianshu.com/p/858cd54238b2

上一篇 下一篇

猜你喜欢

热点阅读