

2018-08-26  orisonchan


1 Spout reliability

// Package names as of Storm 1.x; older releases used backtype.storm.
import java.io.Serializable;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;

public interface ISpout extends Serializable {

    /**
     * Called when a task for this component is initialized within a worker on the cluster.
     * It provides the spout with the environment in which the spout executes.
     * This includes the:
     *
     * @param conf The Storm configuration for this spout. This is the configuration provided to the topology merged in with cluster configuration on this machine.
     * @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
     * @param collector The collector is used to emit tuples from this spout. Tuples can be emitted at any time, including the open and close methods. The collector is thread-safe and should be saved as an instance variable of this spout object.
     */
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);

    /**
     * When this method is called, Storm is requesting that the Spout emit tuples to the
     * output collector. This method should be non-blocking, so if the Spout has no tuples
     * to emit, this method should return. nextTuple, ack, and fail are all called in a tight
     * loop in a single thread in the spout task. When there are no tuples to emit, it is courteous
     * to have nextTuple sleep for a short amount of time (like a single millisecond)
     * so as not to waste too much CPU.
     */
    void nextTuple();

    /**
     * Storm has determined that the tuple emitted by this spout with the msgId identifier
     * has been fully processed. Typically, an implementation of this method will take that
     * message off the queue and prevent it from being replayed.
     */
    void ack(Object msgId);

    /**
     * The tuple emitted by this spout with the msgId identifier has failed to be
     * fully processed. Typically, an implementation of this method will put that
     * message back on the queue to be replayed at a later time.
     */
    void fail(Object msgId);

    // close(), activate() and deactivate() are omitted from this excerpt.
}

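open() receives a parameter of type SpoutOutputCollector. This collector is what a spout uses to emit messages, and one of its methods is List<Integer> emit(List<Object> tuple, Object messageId). Calling an emit() overload without a messageId also works, but then the ack mechanism is not triggered. When emit() is called with a messageId, Storm tracks that tuple, and the tuples derived from it, as they move through the topology, and later hands the same ID back to the spout's ack() or fail(). This is the foundation of the ack mechanism.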
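The bookkeeping a spout typically does around messageId can be sketched as follows. This is a minimal model, not Storm code: the Emitter interface is a hypothetical stand-in for SpoutOutputCollector so the example runs without the Storm jar, and the class name, the in-memory queue and the counter used as message ID are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class ReliableSpoutSketch {

    /** Hypothetical stand-in for SpoutOutputCollector.emit(List<Object>, Object). */
    interface Emitter {
        void emit(List<Object> tuple, Object messageId);
    }

    private final Queue<String> queue = new ArrayDeque<>();       // messages waiting to be emitted
    private final Map<Object, String> pending = new HashMap<>();  // emitted, not yet acked or failed
    private final Emitter emitter;
    private long nextId = 0;

    ReliableSpoutSketch(Emitter emitter, List<String> messages) {
        this.emitter = emitter;
        this.queue.addAll(messages);
    }

    /** Mirrors ISpout.nextTuple(): emit at most one message, return at once if there is none. */
    void nextTuple() {
        String msg = queue.poll();
        if (msg == null) {
            return;
        }
        Object msgId = nextId++;
        pending.put(msgId, msg); // remember the message until it is acked or failed
        emitter.emit(Arrays.<Object>asList(msg), msgId);
    }

    /** Mirrors ISpout.ack(): the message is fully processed, so forget it. */
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    /** Mirrors ISpout.fail(): put the message back on the queue for replay. */
    void fail(Object msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            queue.add(msg);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<Object> emittedIds = new ArrayList<>();
        ReliableSpoutSketch spout = new ReliableSpoutSketch(
                (tuple, id) -> emittedIds.add(id), Arrays.asList("a", "b"));
        spout.nextTuple();             // emits "a" with id 0
        spout.nextTuple();             // emits "b" with id 1
        spout.ack(emittedIds.get(0));  // "a" is done
        spout.fail(emittedIds.get(1)); // "b" goes back on the queue
        spout.nextTuple();             // "b" is replayed with a fresh id
        System.out.println(emittedIds + " pending=" + spout.pendingCount());
        // prints "[0, 1, 2] pending=1"
    }
}
```

In a real spout the call to emitter.emit(...) would be collector.emit(new Values(msg), msgId) on the SpoutOutputCollector saved in open(), and Storm itself would invoke ack() and fail().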

2 Detailed definition of the ack mechanism




2.1 How it is implemented


3 Bolt reliability

3.1 Anchoring

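As mentioned above, a tuple can have tuples derived from it. IBolt's prepare() method takes a parameter similar to the one in ISpout's open(): OutputCollector collector. Unlike SpoutOutputCollector, this is the collector a bolt uses to emit tuples. One of its emit() overloads is List<Integer> emit(Tuple anchor, List<Object> tuple), where anchor is the tuple the bolt received in execute() and the list holds the values of the tuple to emit. The call links the anchor to the tuple being sent to the next bolt, and this link is called anchoring.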
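To make the relationship concrete, here is a small model of an anchored emit. It is a sketch under stated assumptions: SimpleTuple and RecordingCollector are hypothetical stand-ins for Storm's Tuple and OutputCollector so that the example runs on its own, and the parent reference they keep only illustrates the link between tuples, not how Storm stores it internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnchoringSketch {

    /** Hypothetical stand-in for Storm's Tuple: values plus the tuple it was anchored to. */
    static final class SimpleTuple {
        final List<Object> values;
        final SimpleTuple anchor; // null for a tuple that came straight from a spout

        SimpleTuple(List<Object> values, SimpleTuple anchor) {
            this.values = values;
            this.anchor = anchor;
        }
    }

    /** Hypothetical stand-in for OutputCollector that records what was emitted. */
    static final class RecordingCollector {
        final List<SimpleTuple> emitted = new ArrayList<>();

        // Same argument order as OutputCollector.emit(Tuple anchor, List<Object> tuple).
        void emit(SimpleTuple anchor, List<Object> values) {
            emitted.add(new SimpleTuple(values, anchor));
        }
    }

    /** Mirrors a bolt's execute(): split a sentence and anchor every word to the input tuple. */
    static void execute(SimpleTuple input, RecordingCollector collector) {
        String sentence = (String) input.values.get(0);
        for (String word : sentence.split(" ")) {
            collector.emit(input, Arrays.<Object>asList(word));
        }
    }

    public static void main(String[] args) {
        SimpleTuple spoutTuple = new SimpleTuple(Arrays.<Object>asList("hello storm"), null);
        RecordingCollector collector = new RecordingCollector();
        execute(spoutTuple, collector);
        for (SimpleTuple t : collector.emitted) {
            System.out.println(t.values + " anchored to " + t.anchor.values);
        }
    }
}
```

In a real bolt the loop body would be collector.emit(input, new Values(word)), usually followed by collector.ack(input) once the input tuple has been handled.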




