Battle of the Queues

Beanstalkd

Sarah Braden

@ifmoonwascookie

DesertPy - 25 February 2015

We are not here to talk about

a) AWS Elastic Beanstalk: http://aws.amazon.com/elasticbeanstalk/

b) Git workflow thingy: http://beanstalkapp.com/

Why Beanstalkd?

Philotic, Inc. developed beanstalkd to reduce page-view latency for the Causes on Facebook application (over 9.5 million users).

Designed from the ground up to be a work queue that runs tasks asynchronously.

http://kr.github.io/beanstalkd/

Client Libraries

Available in 23 different languages: https://github.com/kr/beanstalkd/wiki/client-libraries

Python

pybeanstalk

beanstalkc - The one I use

beanstalkt - An async beanstalkd client for Tornado (a Python web framework and asynchronous networking library)

beanstalkc

A simple beanstalkd client library for Python: https://github.com/earl/beanstalkc

Caveat: beanstalkc is currently only supported on Python 2 and automatically tested against Python 2.6 and 2.7. Python 3 is not (yet) supported.

How to install

Debian and Ubuntu

sudo apt-get install beanstalkd

Homebrew for Mac

brew install beanstalkd

Windows

no

Overview

Beanstalkd Terminology

Worker - a client which connects to the message server to reserve, delete and bury jobs.

Provider - a client which connects to the message server to create jobs.

Persistency

Redundancy is handled on the client side and if a server goes down you will lose jobs.

Beanstalkd does include an option to store jobs in a binary log, by launching Beanstalkd with the -b option.

You can restore the queue manually by restarting beanstalkd with the -b option; it will recover the contents of the log (requires access to the server disks).

Communication

Event based networking - can handle lots of incoming connections.

Providers and workers communicate with the server over TCP using a simple ASCII protocol.

When a provider enqueues a job, a worker can reserve it immediately if it is connected and ready.

Jobs are reserved until a worker has sent a response (delete, bury, etc.)

Fidelity

Queues are FIFO (first in, first out) among jobs of the same priority.

A reserved job reaches only one worker. Only if the connection is lost or the worker releases it will it become available for other workers to reserve. When a worker finishes the job, it deletes it.

Unless a worker has timed out, two or more workers will never run the same job in parallel.
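
The reserve / delete / bury cycle described above can be sketched as a small worker function. This is a minimal sketch under stated assumptions, not a canonical pattern: run_worker and handler are hypothetical names, and conn is assumed to behave like a beanstalkc.Connection, whose reserve(timeout=...) returns a job object (with body, delete() and bury()) or None when the timeout expires.

```python
def run_worker(conn, handler, timeout=1):
    """Reserve jobs until none arrives within `timeout` seconds.

    `conn` is assumed to behave like a beanstalkc.Connection: reserve()
    returns a job, or None when the timeout expires.
    Returns the number of jobs processed successfully.
    """
    done = 0
    while True:
        job = conn.reserve(timeout=timeout)
        if job is None:
            # Nothing became ready in time; stop polling.
            return done
        try:
            handler(job.body)
        except Exception:
            # Park the failed job; it stays buried until someone kicks it.
            job.bury()
        else:
            # Success: delete it, otherwise it is re-queued after its TTR.
            job.delete()
            done += 1
```

Against a real server this could be called as run_worker(beanstalkc.Connection(host='localhost', port=11300), my_handler), where my_handler is whatever function does the actual work.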

Distribution

The beanstalkd server doesn’t know anything about other beanstalkd instances that are running.

Prioritization

More important jobs can be given a more urgent priority, which affects the order in which jobs are dequeued. Priority 0 is the most urgent; larger numbers are less urgent.
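
To make the ordering concrete, here is a small in-memory model of a ready queue. It only illustrates the rule "lowest priority number first, FIFO among equal priorities". It is not beanstalkd's implementation, and ReadyQueue is a hypothetical name. The default of 2 ** 31 is meant to mirror beanstalkc's default priority.

```python
import heapq
import itertools


class ReadyQueue(object):
    """In-memory model of a ready queue: lowest priority number first,
    FIFO among jobs that share a priority. Not beanstalkd's actual code."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order breaks ties

    def put(self, body, priority=2 ** 31):
        heapq.heappush(self._heap, (priority, next(self._counter), body))

    def reserve(self):
        priority, _, body = heapq.heappop(self._heap)
        return body


queue = ReadyQueue()
queue.put('send newsletter')              # default priority
queue.put('reset password', priority=0)   # most urgent
queue.put('resize image')                 # default priority, arrived later
order = [queue.reserve() for _ in range(3)]
print(order)  # ['reset password', 'send newsletter', 'resize image']
```

With beanstalkc the same intent is expressed by passing priority to put, for example beanstalk.put('reset password', priority=0).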

Supports TTR (time-to-run)

If a job takes longer than its TTR, it becomes available to other workers even if the original worker has not finished yet.

When a job has been reserved, a timer starts counting down from the job's TTR (default = 120 seconds).

If the timer reaches zero, the job gets put back in the ready queue for another worker to run it.

If the job is buried, deleted, or released before the timer runs out, the timer ceases to exist.

If the job is 'touched' before the timer reaches zero, the timer starts over counting down from TTR.
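
The countdown rules above can be illustrated with a toy model. This is a sketch for intuition only: ReservedJob is a hypothetical class, time is passed in explicitly so the example is deterministic, and nothing here reflects how beanstalkd keeps time internally.

```python
class ReservedJob(object):
    """Toy model of the TTR countdown on a reserved job."""

    def __init__(self, ttr, now):
        self.ttr = ttr
        self.deadline = now + ttr

    def touch(self, now):
        # Restart the countdown from the full TTR.
        self.deadline = now + self.ttr

    def time_left(self, now):
        return max(0, self.deadline - now)

    def timed_out(self, now):
        # True once the server would put the job back in the ready queue.
        return now >= self.deadline


job = ReservedJob(ttr=120, now=0)
left_at_100 = job.time_left(now=100)   # 20 seconds remain
job.touch(now=100)                     # countdown restarts from 120
left_at_150 = job.time_left(now=150)   # 70 seconds remain
expired = job.timed_out(now=220)       # new deadline was 100 + 120 = 220
```

In beanstalkc, a long-running worker would call job.touch() on the reserved job to request more time.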

Message State Terminology

Buried Status

Puts a job in a failed state. The job cannot be reprocessed until it is manually kicked back into the queue. No auto retry.

"Kicking" (a job) returns a previously buried job to the queue ready for workers to pick up.

Reserved Status

Delivers a job to a worker and locks it from being delivered to another worker.

Delayed Status

Defers a job from being sent to a worker for a predetermined amount of time.
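
As an illustration of how a provider might request a delay, the helper below wraps put. It is a hedged sketch: enqueue is a hypothetical helper, the JSON encoding is just one possible choice for the job body, and conn is assumed to offer beanstalkc's put(body, delay=..., ttr=...).

```python
import json


def enqueue(conn, payload, delay=0, ttr=120):
    """Serialize `payload` to JSON and put it on the tube `conn` is using.

    `conn` is assumed to offer beanstalkc's put(body, delay=..., ttr=...).
    With delay > 0 the job starts out delayed and only becomes ready
    after that many seconds.
    """
    body = json.dumps(payload, sort_keys=True)
    return conn.put(body, delay=delay, ttr=ttr)
```

For example, enqueue(beanstalk, {'task': 'email'}, delay=30) would hold the job back for 30 seconds before any worker can reserve it.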

In []:

   put with delay               release with delay
  ----------------> [DELAYED] <------------.
                        |                   |
                        | (time passes)     |
                        |                   |
   put                  v     reserve       |       delete
  -----------------> [READY] ---------> [RESERVED] --------> *poof*
                       ^  ^                |  |
                       |   \  release      |  |
                       |    `-------------'   |
                       |                      |
                       | kick                 |
                       |                      |
                       |       bury           |
                    [BURIED] <---------------'
                       |
                       |  delete
                        `--------> *poof*
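
The diagram above can also be read as a transition table. The sketch below encodes only the arrows that are drawn, using hypothetical names (TRANSITIONS, next_state, run) and the placeholder states 'new' (before put) and 'deleted' (*poof*). It is a reading aid, not server code.

```python
# (current state, command) -> next state, following the diagram.
TRANSITIONS = {
    ('new', 'put'): 'ready',
    ('new', 'put with delay'): 'delayed',
    ('delayed', 'time passes'): 'ready',
    ('ready', 'reserve'): 'reserved',
    ('reserved', 'release'): 'ready',
    ('reserved', 'release with delay'): 'delayed',
    ('reserved', 'bury'): 'buried',
    ('reserved', 'delete'): 'deleted',
    ('buried', 'kick'): 'ready',
    ('buried', 'delete'): 'deleted',
}


def next_state(state, command):
    """Return the state a job moves to, or raise if no arrow is drawn."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError('%r is not drawn for state %r' % (command, state))


def run(commands, state='new'):
    """Apply a sequence of commands and return the final state."""
    for command in commands:
        state = next_state(state, command)
    return state


# A job that fails once, is kicked, and then succeeds.
final = run(['put', 'reserve', 'bury', 'kick', 'reserve', 'delete'])
print(final)  # deleted
```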

Run beanstalkd

Type into the terminal to start it (default port = 11300)

beanstalkd

Beanstalkc Example

Besides having beanstalkc installed, you'll typically also need PyYAML.

pip install pyyaml

In [1]:

import beanstalkc

Set up a connection to an (already running) beanstalkd server

In [20]:

beanstalk = beanstalkc.Connection(host='localhost', port=11300)

Enqueue a job

In [21]:

beanstalk.put('hello!')

Out[21]:

Request a job

In [22]:

job = beanstalk.reserve()

Read the job body to get the data for the task

In [23]:

job.body

Out[23]:

'hello!'

Once you are done processing a job, you have to mark it as done; otherwise beanstalkd re-queues it after its "time to run" (120 seconds by default) has elapsed. A job is marked as done by calling delete.

In [24]:

job.delete()

Tubes

Different queues are called tubes

In [25]:

beanstalk.tubes()

Out[25]:

['default']

Which tube are you using?

In [26]:

beanstalk.using()

Out[26]:

'default'

If you decide to use a tube that does not yet exist, beanstalkd creates it automatically.

In [27]:

beanstalk.use('foo_tube')
beanstalk.use('another_tube')

Out[27]:

'another_tube'

Tubes that no client is using or watching vanish automatically. A beanstalkd client (here, beanstalkc) can choose many tubes to reserve jobs from; these tubes are "watched" by the client. To watch a tube and then see which tubes you are currently watching:

In [28]:

beanstalk.watch('default')

Out[28]:

In [29]:

beanstalk.watching()

Out[29]:

['default']

In [30]:

beanstalk.watch('foo_tube')

Out[30]:

In [31]:

beanstalk.watching()

Out[31]:

['default', 'foo_tube']

To stop watching a tube

In [32]:

beanstalk.ignore('default')

Out[32]:

In [33]:

beanstalk.watching()

Out[33]:

['foo_tube']

Close the connection
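
In an interactive session, closing is a single call to beanstalk.close(). For scripts, one way to make sure the connection is always closed, even when the work raises, is sketched below. with_connection is a hypothetical helper, and connect is assumed to be any zero-argument callable that returns an object with a close() method, such as a beanstalkc.Connection.

```python
from contextlib import closing


def with_connection(connect, work):
    """Open a connection, run work(conn), and always close the connection.

    `connect` is any zero-argument callable returning an object with a
    close() method; `work` receives that object and its result is returned.
    """
    with closing(connect()) as conn:
        return work(conn)
```

A call might look like with_connection(lambda: beanstalkc.Connection(host='localhost', port=11300), lambda conn: conn.put('hello!')).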