Sarah Braden
@ifmoonwascookie
DesertPy - 25 February 2015
a) AWS Elastic Beanstalk: http://aws.amazon.com/elasticbeanstalk/
b) Git workflow thingy: http://beanstalkapp.com/
Philotic, Inc. developed beanstalkd to reduce page-view latency in the Causes on Facebook application (over 9.5 million users).
Designed from the ground up to be a work queue running tasks asynchronously
In 23 different languages https://github.com/kr/beanstalkd/wiki/client-libraries
pybeanstalk
beanstalkc - The one I use
beanstalkt - An async beanstalkd client for Tornado (a Python web framework and asynchronous networking library)
A simple beanstalkd client library for Python: https://github.com/earl/beanstalkc
Caveat: beanstalkc is currently only supported on Python 2 and automatically tested against Python 2.6 and 2.7. Python 3 is not (yet) supported.
sudo apt-get install beanstalkd
brew install beanstalkd
no
Worker - a client which connects to the message server to reserve, delete and bury jobs.
Provider - a client which connects to the message server to create jobs.
Redundancy is handled on the client side and if a server goes down you will lose jobs.
Beanstalkd does include an option to store jobs in a binary log, by launching Beanstalkd with the -b option.
You can restore the queue manually by restarting beanstalkd with the -b option; it will recover the contents of the log (this requires access to the server's disks).
Event based networking - can handle lots of incoming connections.
Communicates via PUSH sockets providing communication between providers and workers.
When a provider enqueues a job, a worker can reserve it immediately if it is connected and ready.
Jobs are reserved until a worker has sent a response (delete, bury, etc.)
Queues are FIFO (first in, first out).
A reserved job reaches only 1 worker. Only if the connection is lost or the worker releases it does it become available for other workers to reserve. When a worker finishes the job, it deletes it.
Unless a worker has timed out, two or more workers will never run the same job in parallel.
The beanstalkd server doesn’t know anything about other beanstalkd instances that are running.
Jobs with higher importance can be prioritised which will affect the order in which jobs are dequeued.
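To make the interaction between priority and FIFO order concrete, here is a toy in-memory model (it is not beanstalkd, and `ToyTube` is a made-up name). It assumes the beanstalkd convention that a smaller priority number means more urgent, with 0 being the most urgent; the default of 2**31 is only an illustrative middle value.

```python
import heapq
import itertools


class ToyTube(object):
    """Toy in-memory model of a tube's ready queue (not the real server)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order breaks priority ties

    def put(self, body, priority=2 ** 31):
        # Smaller priority value = more urgent; equal priorities stay FIFO.
        heapq.heappush(self._heap, (priority, next(self._counter), body))

    def reserve(self):
        # Hand out the most urgent, oldest job body, or None if empty.
        if not self._heap:
            return None
        _, _, body = heapq.heappop(self._heap)
        return body


tube = ToyTube()
tube.put('send newsletter')              # default priority
tube.put('reset password', priority=0)   # most urgent
tube.put('resize avatar')                # default priority, queued later
order = [tube.reserve() for _ in range(3)]
# order == ['reset password', 'send newsletter', 'resize avatar']
```

The urgent job jumps the line, while the two default-priority jobs keep their arrival order.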
If a job takes more than the defined TTR, it will be available to other consumers even if the original consumer didn't finish yet.
When a job has been reserved, a timer starts counting down from the job's TTR (default = 120 seconds).
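For jobs that may outlast their TTR, one option is to ask the server for more time while working. The sketch below assumes a reserved beanstalkc job object with `body`, `touch()` and `delete()`; `process_in_steps` and the idea of splitting work into `steps` are hypothetical, not something beanstalkc provides.

```python
def process_in_steps(job, steps):
    """Run each step on the job body, touching the job after every step.

    `job` is assumed to behave like a reserved beanstalkc job;
    `steps` is a list of callables that each take the body.
    """
    results = []
    for step in steps:
        results.append(step(job.body))
        job.touch()   # request more time before the TTR runs out
    job.delete()      # all steps done: remove the job for good
    return results
```

With a live connection this would be called roughly as `process_in_steps(beanstalk.reserve(), [step_one, step_two])`. Each individual step still has to finish within the TTR.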
Puts a job in a failed state. The job cannot be reprocessed until it is manually kicked back into the queue. No auto retry.
"Kicking" (a job) returns a previously buried job to the queue ready for workers to pick up.
Delivers a job to a worker and locks it from being delivered to another worker.
Defers a job from being sent to a worker for a predetermined amount of time.
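A minimal sketch of how a worker might combine these operations: reserve a job, delete it on success, bury it on failure, and later kick buried jobs back. It assumes `conn` behaves like a beanstalkc connection (`reserve(timeout=...)`, `kick(bound)`) and that jobs expose `body`, `delete()` and `bury()`; the names `work_one` and `requeue_buried` are hypothetical.

```python
def work_one(conn, handler):
    """Reserve one job; delete it on success, bury it if handler raises."""
    job = conn.reserve(timeout=0)   # assumed to return None if nothing is ready
    if job is None:
        return 'empty'
    try:
        handler(job.body)
    except Exception:
        job.bury()                  # park the job; there is no auto retry
        return 'buried'
    job.delete()
    return 'deleted'


def requeue_buried(conn, limit=10):
    """Kick up to `limit` buried jobs back into the ready queue."""
    return conn.kick(limit)
```

As far as I know, kick acts on the tube the connection is currently using, so select the right tube before calling `requeue_buried`.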
   put with delay               release with delay
  ----------------> [DELAYED] <------------.
                        |                   |
                        | (time passes)     |
                        |                   |
   put                  v     reserve       |       delete
  -----------------> [READY] ---------> [RESERVED] --------> *poof*
                       ^  ^                |  |
                       |   \  release      |  |
                       |    `-------------'   |
                       |                      |
                       | kick                 |
                       |                      |
                       |       bury           |
                    [BURIED] <---------------'
                       |
                       |  delete
                        `--------> *poof*
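The job lifecycle in the diagram can also be written down as a small transition table. This is only a toy encoding of the arrows drawn above, not server code: the real server also reacts to things the diagram leaves out (a TTR timeout, for example), and `TRANSITIONS`, `initial_state` and `step` are hypothetical names.

```python
# (current state, command) -> next state; None means the job is gone (*poof*).
TRANSITIONS = {
    ('DELAYED', 'time passes'): 'READY',
    ('READY', 'reserve'): 'RESERVED',
    ('RESERVED', 'delete'): None,
    ('RESERVED', 'release'): 'READY',
    ('RESERVED', 'release with delay'): 'DELAYED',
    ('RESERVED', 'bury'): 'BURIED',
    ('BURIED', 'kick'): 'READY',
    ('BURIED', 'delete'): None,
}


def initial_state(delay=0):
    """State a job enters after put, depending on its delay."""
    return 'DELAYED' if delay > 0 else 'READY'


def step(state, command):
    """Follow one arrow of the diagram, or raise if there is no such arrow."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError('%r is not drawn for state %r' % (command, state))


# Walk one job through a delayed put, a failed attempt, and a retry.
state = initial_state(delay=30)
history = [state]
for command in ['time passes', 'reserve', 'bury', 'kick', 'reserve', 'delete']:
    state = step(state, command)
    history.append(state)
```

After the loop, `history` lists every state the job passed through, ending in `None` once it has been deleted.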
Type into the terminal to start it (default port = 11300)
beanstalkd
Besides having beanstalkc installed, you'll typically also need PyYAML.
pip install pyyaml
import beanstalkc
Set up a connection to an (already running) beanstalkd server
beanstalk = beanstalkc.Connection(host='localhost', port=11300)
Enqueue a job
beanstalk.put('hello!')
2
Request a job
job = beanstalk.reserve()
Get job body so you can do tasks
job.body
'hello!'
Once you are done processing a job, you have to mark it as done by calling delete. Otherwise beanstalkd re-queues the job once its "time to run" (120 seconds by default) has elapsed.
job.delete()
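Real jobs usually carry more than 'hello!'. Since the body is a string, one common approach is to serialize a small task description as JSON. The sketch below assumes `conn.put` accepts `priority`, `delay` and `ttr` keyword arguments the way I understand beanstalkc's `put` to work; `enqueue_task`, `decode_task` and the payload layout are hypothetical.

```python
import json


def enqueue_task(conn, name, args, priority=2 ** 31, delay=0, ttr=120):
    """Serialize a task description to JSON and put it on the queue."""
    body = json.dumps({'task': name, 'args': args})
    return conn.put(body, priority=priority, delay=delay, ttr=ttr)


def decode_task(job):
    """Turn a reserved job's body back into (task name, args)."""
    payload = json.loads(job.body)
    return payload['task'], payload['args']
```

A worker would call `decode_task(beanstalk.reserve())`, dispatch on the task name, and then delete the job as shown above.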
Different queues are called tubes
beanstalk.tubes()
['default']
Which tube are you using?
beanstalk.using()
'default'
If you decide to use a tube that does not yet exist, beanstalkd creates it automatically
beanstalk.use('foo_tube')
beanstalk.use('another_tube')
'another_tube'
Tubes that no client is using or watching vanish automatically. A beanstalkd client (e.g. beanstalkc) can choose many tubes to reserve jobs from. These tubes are "watched" by the client. To see what tubes you are currently watching:
beanstalk.watch('default')
1
beanstalk.watching()
['default']
beanstalk.watch('foo_tube')
2
beanstalk.watching()
['default', 'foo_tube']
To stop watching a tube
beanstalk.ignore('default')
1
beanstalk.watching()
['foo_tube']
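Putting use, watch and ignore together: producers pick the tube they put into, and workers pick the tubes they reserve from. A small sketch, assuming `conn` offers the same `use`, `put`, `watch`, `watching` and `ignore` calls shown above; `send_to_tube` and `listen_only_to` are hypothetical helper names.

```python
def send_to_tube(conn, tube, body):
    """Producer side: select the tube, then enqueue the job there."""
    conn.use(tube)
    return conn.put(body)


def listen_only_to(conn, tube):
    """Worker side: watch `tube` and ignore every other watched tube."""
    conn.watch(tube)               # watch first, so the watch list never empties
    for name in conn.watching():
        if name != tube:
            conn.ignore(name)
    return conn.watching()
```

Watching the new tube before ignoring the others matters because, as far as I know, the server refuses to let a client ignore its last watched tube.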
Close the connection
beanstalk.close()
Use the Linux 'watch' command with the queueit CLI (https://github.com/chexov/queueit)
watch -n5 QUEUEIT_HOST=hostname.com q-stat
Other Tools available: https://github.com/kr/beanstalkd/wiki/Tools
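If you would rather stay in Python, the client can report queue depths too. This sketch assumes `conn.tubes()` and `conn.stats_tube(name)` behave as in beanstalkc (the stats calls are why PyYAML is needed) and that the returned dict uses the server's `current-jobs-*` keys; `queue_depths` is a hypothetical helper.

```python
def queue_depths(conn):
    """Return {tube: {state: count}} for every tube on the server."""
    states = ('ready', 'reserved', 'delayed', 'buried')
    depths = {}
    for tube in conn.tubes():
        stats = conn.stats_tube(tube)
        depths[tube] = dict(
            (state, stats['current-jobs-' + state]) for state in states)
    return depths
```

Printing the result in a loop with a short sleep gives a rough equivalent of the watch command above.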
Slides generated using reveal.js
ipython nbconvert beanstalk_talk.ipynb --to slides --reveal-prefix "http://cdn.jsdelivr.net/reveal.js/2.6.2" --post serve