Battle of the Queues

Beanstalkd

Sarah Braden

@ifmoonwascookie

DesertPy - 25 February 2015

We are not here to talk about

a) AWS Elastic Beanstalk: http://aws.amazon.com/elasticbeanstalk/

b) Git workflow thingy: http://beanstalkapp.com/

Why Beanstalkd?

Philotic, Inc. developed beanstalk to improve the response time (reducing the latency of page views) for the Causes on Facebook application (with over 9.5 million users)

Designed from the ground up to be a work queue running tasks asynchronously

http://kr.github.io/beanstalkd/

Client Libraries

In 23 different languages https://github.com/kr/beanstalkd/wiki/client-libraries

Python

pybeanstalk

beanstalkc - The one I use

beanstalkt - An async beanstalkd client for Tornado (a Python web framework and asynchronous networking library)

beanstalkc

A simple beanstalkd client library for Python: https://github.com/earl/beanstalkc

Caveat: beanstalkc is currently only supported on Python 2 and automatically tested against Python 2.6 and 2.7. Python 3 is not (yet) supported.

How to install

Debian and Ubuntu

sudo apt-get install beanstalkd

Homebrew for Mac

brew install beanstalkd

Windows

no

Overview

Beanstalkd Terminology

Worker - a client which connects to the message server to reserve, delete and bury jobs.

Provider - a client which connects to the message server to create jobs.

Persistency

Redundancy is handled on the client side: beanstalkd keeps jobs in memory, so if a server goes down you will lose its jobs.

Beanstalkd can optionally write jobs to a binary log (binlog) on disk. To enable it, launch beanstalkd with the -b option followed by the directory the log should go in.

To restore the queue after a crash, restart beanstalkd with the same -b directory and it will recover the contents of the log. This is a manual step and requires access to the server's disks.
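Conceptually, a binlog is an append-only log: every change is appended as it happens, and on restart the log is replayed in order to rebuild the queue. The toy sketch below illustrates only that idea. The JSON record format and the op names are invented for the example and have nothing to do with beanstalkd's real on-disk format, and a list stands in for the log file.

```python
import json


def append_record(log_lines, op, job_id, body=None):
    """Append one JSON record to the toy 'log' (a list standing in for a file)."""
    log_lines.append(json.dumps({"op": op, "id": job_id, "body": body}))


def replay(log_lines):
    """Rebuild the live jobs by replaying every record in the order written."""
    jobs = {}
    for line in log_lines:
        record = json.loads(line)
        if record["op"] == "put":
            jobs[record["id"]] = record["body"]
        elif record["op"] == "delete":
            jobs.pop(record["id"], None)
    return jobs


log = []
append_record(log, "put", 1, "resize image 42")
append_record(log, "put", 2, "send welcome email")
append_record(log, "delete", 1)

# After a "crash", replaying the log brings back only the unfinished job.
print(replay(log))  # {2: 'send welcome email'}
```

A real log would live in a file that is flushed to disk, which is exactly why recovery needs access to the server's disks.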

Communication

Event-based networking: beanstalkd can handle lots of incoming connections.

Providers and workers talk to the server over plain TCP sockets using a simple text protocol; a worker waiting in reserve receives a job as soon as one becomes ready.

When a provider enqueues a job, a worker can reserve it immediately if it is connected and ready.

Jobs stay reserved until the worker sends a response (delete, bury, release, etc.) or the job's TTR runs out.
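To make the text protocol concrete: a put command is a header line "put <pri> <delay> <ttr> <bytes>" followed by the job body, each terminated by \r\n, and on success the server answers "INSERTED <id>". The helpers below only build and parse those byte strings, so no socket is opened. The function names are made up for this sketch, and the defaults (priority 2**31, TTR 120) are assumed to mirror beanstalkc's defaults.

```python
def build_put(body, priority=2 ** 31, delay=0, ttr=120):
    """Encode a beanstalkd 'put' command for a job body given as bytes."""
    header = "put %d %d %d %d\r\n" % (priority, delay, ttr, len(body))
    return header.encode("ascii") + body + b"\r\n"


def parse_put_response(line):
    """Return the new job id from an INSERTED reply; raise on anything else."""
    parts = line.rstrip(b"\r\n").split()
    if len(parts) == 2 and parts[0] == b"INSERTED":
        return int(parts[1])
    raise ValueError("put failed: %r" % line)


print(build_put(b"hello!"))
# b'put 2147483648 0 120 6\r\nhello!\r\n'
print(parse_put_response(b"INSERTED 2\r\n"))  # 2
```

Note that <bytes> is the length of the body in bytes, which is why the body is handled as bytes rather than text.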

Fidelity

Within a given priority, queues are FIFO (first in, first out).

A reserved job reaches only one worker. Only if that worker's connection is lost or the worker releases the job does it become available for other workers to reserve. When a worker finishes the job, it deletes it.

Unless a worker has timed out (exceeded the job's TTR), two or more workers will never run the same job in parallel.
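The reservation rules above can be mimicked with a small in-memory model. This is a toy, not beanstalkd's implementation: every job shares one priority, so the oldest job (smallest id) is handed out first, and a dropped connection is folded into release().

```python
import heapq


class ToyTube(object):
    """Toy model of one tube: oldest job first, one worker per reserved job."""

    def __init__(self):
        self.ready = []      # heap of ready job ids; smallest id = oldest job
        self.reserved = {}   # job id -> worker currently holding it
        self.next_id = 1

    def put(self):
        job_id = self.next_id
        self.next_id += 1
        heapq.heappush(self.ready, job_id)
        return job_id

    def reserve(self, worker):
        """Give the oldest ready job to exactly one worker, or return None."""
        if not self.ready:
            return None
        job_id = heapq.heappop(self.ready)
        self.reserved[job_id] = worker
        return job_id

    def release(self, job_id):
        """Job handed back (or connection lost): other workers may take it."""
        del self.reserved[job_id]
        heapq.heappush(self.ready, job_id)

    def delete(self, job_id):
        """Job finished: remove it permanently."""
        del self.reserved[job_id]


tube = ToyTube()
tube.put()
tube.put()
a = tube.reserve("worker-a")      # job 1
b = tube.reserve("worker-b")      # job 2, never job 1
print(tube.reserve("worker-c"))   # None: nothing left to hand out
tube.release(a)                   # worker-a gives job 1 back...
print(tube.reserve("worker-c"))   # 1: ...so another worker can take it now
```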

Distribution

The beanstalkd server doesn’t know anything about other beanstalkd instances that are running.

Prioritization

Each job carries a numeric priority that affects the order in which jobs are dequeued: smaller numbers are more urgent (0 is the most urgent), so important jobs can jump ahead of routine ones.
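A ready queue keyed on (priority, insertion order) reproduces this ordering: the smallest priority number wins, and ties go to the older job. The sketch below is a toy model using heapq, not beanstalkd's code; the job names and the value 1024 are arbitrary examples.

```python
import heapq
import itertools

ready = []                  # heap of (priority, insertion number, body)
ids = itertools.count(1)    # insertion order breaks ties, giving FIFO


def put(body, priority):
    heapq.heappush(ready, (priority, next(ids), body))


put("send newsletter", 1024)
put("charge credit card", 0)    # most urgent: jumps the queue
put("resize avatar", 1024)

order = [heapq.heappop(ready)[2] for _ in range(len(ready))]
print(order)  # ['charge credit card', 'send newsletter', 'resize avatar']
```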

Supports TTR (time-to-run)

If a job takes longer than its TTR, it becomes available to other workers even if the original worker hasn't finished yet.

When a job is reserved, a timer starts counting down from the job's TTR (default = 120 seconds).

  • If the timer reaches zero, the job is put back in the ready queue for another worker to reserve.
  • If the job is buried, deleted, or released before the timer runs out, the timer is cancelled.
  • If the job is 'touched' before the timer reaches zero, the timer starts counting down from the full TTR again.

Message State Terminology

    Buried Status

    Puts a job in a failed state. The job cannot be reprocessed until it is manually kicked back into the queue; there is no automatic retry.

    "Kicking" a job returns a previously buried job to the ready queue for workers to pick up.

    Reserved Status

    Delivers a job to a worker and locks it from being delivered to another worker.

    Delayed Status

    Holds a job back from workers for a predetermined number of seconds; after that it becomes ready.
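A delayed job simply becomes ready once its delay has elapsed. The toy model below keeps delayed jobs in a heap ordered by the time they become ready; time is passed in explicitly (in seconds) so the example is deterministic, whereas the real server uses its own clock.

```python
import heapq


class DelayedJobs(object):
    """Toy model of the DELAYED state (not beanstalkd's implementation)."""

    def __init__(self):
        self.delayed = []  # heap of (time the job becomes ready, job name)

    def put(self, name, now, delay):
        heapq.heappush(self.delayed, (now + delay, name))

    def promote(self, now):
        """Move every job whose delay has elapsed to READY and return them."""
        ready = []
        while self.delayed and self.delayed[0][0] <= now:
            ready.append(heapq.heappop(self.delayed)[1])
        return ready


jobs = DelayedJobs()
jobs.put("reminder email", now=0, delay=60)
jobs.put("retry upload", now=0, delay=5)
print(jobs.promote(now=10))  # ['retry upload']
print(jobs.promote(now=60))  # ['reminder email']
```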

       put with delay               release with delay
      ----------------> [DELAYED] <------------.
                            |                   |
                            | (time passes)     |
                            |                   |
       put                  v     reserve       |       delete
      -----------------> [READY] ---------> [RESERVED] --------> *poof*
                           ^  ^                |  |
                           |   \  release      |  |
                           |    `-------------'   |
                           |                      |
                           | kick                 |
                           |                      |
                           |       bury           |
                        [BURIED] <---------------'
                           |
                           |  delete
                            `--------> *poof*
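The lifecycle diagram above is a state machine, so it can be written down as a transition table. This sketch encodes only the arrows drawn in the diagram ("time passes" becomes an explicit "wait" event and *poof* becomes "gone"). The real protocol is somewhat more permissive (for example, delete is also accepted for jobs that are not reserved, and an expired TTR behaves like the release arrow), so treat this as a reading aid rather than a spec.

```python
# Allowed transitions, copied from the lifecycle diagram.
TRANSITIONS = {
    ("DELAYED", "wait"): "READY",
    ("READY", "reserve"): "RESERVED",
    ("RESERVED", "release"): "READY",
    ("RESERVED", "release_with_delay"): "DELAYED",
    ("RESERVED", "bury"): "BURIED",
    ("RESERVED", "delete"): "gone",
    ("BURIED", "kick"): "READY",
    ("BURIED", "delete"): "gone",
}


def initial_state(delay=0):
    """put lands a job in READY, or in DELAYED when a delay is given."""
    return "DELAYED" if delay > 0 else "READY"


def run(state, events):
    """Apply events in order; raise if the diagram has no such arrow."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError("%s is not allowed in state %s" % (event, state))
        state = TRANSITIONS[key]
    return state


# A job that fails, is buried, gets kicked, and then succeeds.
print(run(initial_state(), ["reserve", "bury", "kick", "reserve", "delete"]))
# gone
```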
    

    Run beanstalkd

    Run this in a terminal to start the server (default port: 11300):

    beanstalkd

    Beanstalkc Example

    Besides having beanstalkc installed, you'll typically also need PyYAML: beanstalkc uses it to parse the YAML responses of commands such as tubes() and stats().

    pip install pyyaml

    In [1]:
    import beanstalkc
    

    Set up a connection to an (already running) beanstalkd server

    In [20]:
    beanstalk = beanstalkc.Connection(host='localhost', port=11300)
    

    Enqueue a job (put returns the new job's id)

    In [21]:
    beanstalk.put('hello!')
    
    Out[21]:
    2
    

    Reserve a job (reserve blocks until a job is available)

    In [22]:
    job = beanstalk.reserve()
    

    Read the job body to do the actual work

    In [23]:
    job.body
    
    Out[23]:
    'hello!'
    

    Once you are done processing a job, you have to mark it as done by calling delete. Otherwise beanstalkd re-queues the job once its time to run (TTR, 120 seconds by default) has passed.

    In [24]:
    job.delete()
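The TTR rules (the timer starts at reserve, restarts from the full TTR on touch, and stops on delete, bury or release) fit in a few lines. This is a toy model of the server-side countdown for a single job, with time passed in explicitly in seconds; on a real connection the extension comes from the protocol's touch command, which beanstalkc exposes on its Job objects as touch().

```python
class ReservationTimer(object):
    """Toy model of the TTR countdown for one reserved job."""

    def __init__(self, ttr=120):
        self.ttr = ttr
        self.deadline = None  # None means the timer is not running

    def reserve(self, now):
        self.deadline = now + self.ttr  # timer starts when the job is reserved

    def touch(self, now):
        self.deadline = now + self.ttr  # restart from the full TTR

    def finish(self):
        self.deadline = None            # delete / bury / release cancel it

    def expired(self, now):
        """True if the server would put the job back in the ready queue."""
        return self.deadline is not None and now >= self.deadline


timer = ReservationTimer(ttr=120)
timer.reserve(now=0)
timer.touch(now=100)            # still working: ask for more time
print(timer.expired(now=150))   # False, the deadline moved to 220
print(timer.expired(now=220))   # True, another worker may now reserve it
```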
    

    Tubes

    Separate named queues are called tubes

    In [25]:
    beanstalk.tubes()
    
    Out[25]:
    ['default']
    

    Which tube are you using?

    In [26]:
    beanstalk.using()
    
    Out[26]:
    'default'
    

    If you use a tube that does not exist yet, beanstalkd creates it automatically

    In [27]:
    beanstalk.use('foo_tube')
    beanstalk.use('another_tube')
    
    Out[27]:
    'another_tube'
    

    Tubes that no client is using or watching vanish automatically. A beanstalkd client (such as beanstalkc) can reserve jobs from many tubes; these tubes are "watched" by the client. watch adds a tube and returns how many tubes are now watched, and watching lists the tubes you are currently watching:

    In [28]:
    beanstalk.watch('default')
    
    Out[28]:
    1
    
    In [29]:
    beanstalk.watching()
    
    Out[29]:
    ['default']
    
    In [30]:
    beanstalk.watch('foo_tube')
    
    Out[30]:
    2
    
    In [31]:
    beanstalk.watching()
    
    Out[31]:
    ['default', 'foo_tube']
    

    To stop watching a tube

    In [32]:
    beanstalk.ignore('default')
    
    Out[32]:
    1
    
    In [33]:
    beanstalk.watching()
    
    Out[33]:
    ['foo_tube']
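The counts in the session above follow simple rules: watch and ignore both reply with the number of tubes now being watched, and the server refuses to let a connection ignore its last remaining tube (it answers NOT_IGNORED). The toy model below reproduces those rules; it is not beanstalkc's code, and raising RuntimeError is just this sketch's way of signalling the refusal.

```python
class WatchList(object):
    """Toy model of one connection's watch list."""

    def __init__(self):
        self.tubes = ["default"]  # a new connection watches 'default'

    def watch(self, name):
        if name not in self.tubes:
            self.tubes.append(name)
        return len(self.tubes)

    def ignore(self, name):
        if self.tubes == [name]:
            raise RuntimeError("NOT_IGNORED: cannot ignore the last tube")
        if name in self.tubes:
            self.tubes.remove(name)
        return len(self.tubes)


w = WatchList()
print(w.watch("default"))   # 1 (already watched)
print(w.watch("foo_tube"))  # 2
print(w.ignore("default"))  # 1
print(w.tubes)              # ['foo_tube']
```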
    

    Close the connection

    In []:
    beanstalk.close()
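Putting the calls from this walkthrough together, a typical worker reserves a job, processes it, deletes it on success and buries it on failure so it can be inspected and kicked later. The sketch assumes only beanstalkc's reserve(timeout=...) (which returns None when the timeout expires), job.body, job.delete() and job.bury(); handle is a hypothetical stand-in for your own processing code. So that it runs without a server, it is exercised against a tiny fake connection.

```python
def work(conn, handle, timeout=5):
    """Process jobs until none arrives within `timeout` seconds.

    `conn` is anything with a beanstalkc-style reserve(timeout=...) method.
    Returns a (done, failed) tuple of counts.
    """
    done = failed = 0
    while True:
        job = conn.reserve(timeout=timeout)
        if job is None:          # timed out: the queue is idle, stop
            return done, failed
        try:
            handle(job.body)
        except Exception:
            job.bury()           # keep it for inspection, kick it later
            failed += 1
        else:
            job.delete()         # success: remove it from the queue
            done += 1


class FakeJob(object):
    def __init__(self, body):
        self.body, self.state = body, "reserved"

    def delete(self):
        self.state = "deleted"

    def bury(self):
        self.state = "buried"


class FakeConnection(object):
    """Hands out pre-loaded jobs, then behaves like a timed-out reserve."""

    def __init__(self, bodies):
        self.jobs = [FakeJob(b) for b in bodies]
        self.pending = list(self.jobs)

    def reserve(self, timeout=None):
        return self.pending.pop(0) if self.pending else None


def handle(body):
    if body == "bad":
        raise ValueError("cannot process %r" % body)


conn = FakeConnection(["ok", "bad", "ok"])
print(work(conn, handle))             # (2, 1)
print([j.state for j in conn.jobs])   # ['deleted', 'buried', 'deleted']
```

Burying rather than releasing a failing job avoids handing the same broken job to workers over and over.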
    

    Monitoring Queues

    Use the Linux watch command together with the queueit CLI (https://github.com/chexov/queueit)

    watch -n5 QUEUEIT_HOST=hostname.com q-stat
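If you would rather poll from Python, beanstalkc's stats_tube(name) returns the server's stats-tube counters as a dict (parsed with PyYAML). The sketch below shows which counters are worth watching. So that it runs without a server, it parses a hand-written, abridged sample response with a minimal flat-YAML parser; the numbers in the sample are invented.

```python
def parse_stats(yaml_text):
    """Parse the flat 'key: value' YAML mapping that stats-tube returns.

    Only handles a one-level mapping; use PyYAML for anything more general.
    """
    stats = {}
    for line in yaml_text.splitlines():
        if ":" not in line:
            continue                  # skips the leading '---'
        key, value = line.split(":", 1)
        value = value.strip()
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def summary(stats):
    """One status line with the counters most worth alerting on."""
    return "%s: ready=%d reserved=%d delayed=%d buried=%d" % (
        stats["name"],
        stats["current-jobs-ready"],
        stats["current-jobs-reserved"],
        stats["current-jobs-delayed"],
        stats["current-jobs-buried"],
    )


sample = """---
name: default
current-jobs-ready: 3
current-jobs-reserved: 1
current-jobs-delayed: 0
current-jobs-buried: 2
"""
print(summary(parse_stats(sample)))
# default: ready=3 reserved=1 delayed=0 buried=2
```

A steadily growing buried count is usually the first sign that a worker is failing on some class of jobs.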
    

    Other Tools available: https://github.com/kr/beanstalkd/wiki/Tools

    Slides generated using reveal.js

    ipython nbconvert beanstalk_talk.ipynb --to slides --reveal-prefix "http://cdn.jsdelivr.net/reveal.js/2.6.2" --post serve