PyPy and RPython


22 October 2014

Tim Hochberg

This talk with involve a brief discussion of PyPy, a Python interpreter implemented in Python, followed by a longer discussion of the RPython toolchain which makes this implementation not only possible, but faster than the standard C-Python interpreter. The discussion of RPython will be illustrated using a simple interpreter implemented in Python, which is then translated to native code using the RPython toolchain.

Also, there will be Mandelbrot sets.

What is PyPy

  1. An implementation of Python written in Python because...
  1. Python allows easier experimentation with new features and approaches
    • Stackless (microthreads)
    • Sandboxing
    • Software Transactional Memory
    • ...
  1. Also, over 6× faster than CPython on average!

What's the Catch?

  • PyPy relies on a JIT for speed; not effective for all code.
  • The quoted speedups are based on benchmarks. YMMV.
    • However, everything I timed was faster using PyPy.
  • Not all modules based on C (or other languages) are working.
    • There is a list of compatible modules at
    • Notably, NumPy only partially working. Also, slow.

PyPy Example

In []:

def iterate_z(re0, im0, re, im, n):
    for i in range(n):
        # Z = Z**2 + Z0
        temp = re*re - im*im + re0
        im = 2*re*im + im0
        re = temp
        if hypot(re, im) > r_max:
            return i
    return n
def grey_value(re, im):
    return int(scale * iterate_z(re, im, 0, 0, n))
def pixel(i, j):
    return grey_value((x_offset + pixel_size * i), (y_offset + pixel_size * j))
def generate_pgm():
    write("%s\n" % i_max)
    write("%s\n" % j_max)
    write("%s\n" % colour_max)
    for j in range(j_max):
        for i in range(i_max):
            write("%s\n" % pixel(i,j))
# Setup code and boilerplate stuff
# ....


  • Python 2.7.7: 15.4 seconds
  • PyPy 2.3.1 (Python 2.7.6) : 3.0 seconds

5x Faster

  • In line with benchmarks.

How is this possible?

PyPy (Python) ➡ [Magic Translator] ➡ PyPy (native code with JIT added automagically)

  • If it's possible to translate Python to fast native code, why not always do it?
  • PyPy is really written in RPython
    • RPython is a subset of Python
    • Hints are added to help the the translator

PyPy (RPython with hints) ➡ [RPython Toolchain] ➡ PyPy (native code with JIT)


  • Whole programs; not modules
  • Not fully documented
    • The exact definition is “RPython is everything that our translation toolchain can accept”
  • Many of the dynamic features of Python unavailable after module execution
In []:
def works():
    a = "five"
    b = "ten"
    c1 = a + b
    a = 5
    b = 10
    c2 = a + b
    return c1, c2

def fails(a, b):
    return a + b

fails("five", "ten")
fails(5, 10)
In []:
[translation:info] Error:
   [...about 50 lines of traceback...]
[translation:ERROR] UnionError:
[translation:ERROR] Offending annotations:
[translation:ERROR]   SomeString(const='five', no_nul=True)
[translation:ERROR]   SomeInteger(const=5, knowntype=int, nonneg=True,unsigned=False)
[translation:ERROR] Occurred processing the following simple_call:
[translation:ERROR]       (KeyError getting at the binding!)
[translation:ERROR]     v0 = simple_call((function fails), (5), (10))
[translation:ERROR] In <FunctionGraph of (mandelbrot:91)main at 0x108091088>:
[translation:ERROR] Happened at file line 96
[translation:ERROR]         c1, c2 = works()
[translation:ERROR]         write(c1)
[translation:ERROR]         write(str(c2))
[translation:ERROR]         c1 = fails("five", "ten")
[translation:ERROR] ==>     c2 = fails(5, 10)
[translation:ERROR]         write(c1)
[translation:ERROR]         write(str(c2))
[translation:ERROR]         generate_pgm()
[translation:ERROR] Known variable annotations:
[translation:ERROR] Processing block:
[translation:ERROR]  block@6 is a <class 'rpython.flowspace.flowcontext.EggBlock'>
[translation:ERROR]  in (mandelbrot:91)main
[translation:ERROR]  containing the following operations:
[translation:ERROR]        v2 = getitem(v1, (0))
[translation:ERROR]        v3 = getitem(v1, (1))
[translation:ERROR]        v4 = simple_call((function write), v2)
[translation:ERROR]        v5 = str(v3)
[translation:ERROR]        v6 = simple_call((function write), v5)
[translation:ERROR]        v7 = simple_call((function fails), ('five'), ('ten'))
[translation:ERROR]        v0 = simple_call((function fails), (5), (10))
[translation:ERROR]        v8 = simple_call((function write), v7)
[translation:ERROR]        v9 = str(v0)
[translation:ERROR]        v10 = simple_call((function write), v9)
[translation:ERROR]        v11 = simple_call((function generate_pgm))
[translation:ERROR]  --end--

Interpreter Example

  • First attempt was a basic lisp-like interpreter based on Peter Norvig's
    • Only partially successful
  • Second attempt operated on an assembly-like language.
    • Uglier
    • More successful as an experiment

(Ugly) Interpreter Example

  • Portion of assembly language program for generating the Madelbrot set
  • Equivalent to earlier iterate_z function

      label iterate_z
      #     for k in range(n):
      set k 0
      set re 0.0
      set im 0.0
      label k_loop
      #         temp = re*re - im*im + re0
      exec temp mul re re
      exec temp1 mul im im
      exec temp  sub temp temp1
      exec temp  add temp re0
      #         im = 2*re*im + im0
      exec   im   mul 2.0 im
      exec   im   mul im re
      exec   im   add im im0
      #         re = temp
      set re  temp
      #         if hypot(re, im) > r_max:
      exec  mag  hypot re im
      exec      flag gt   mag  r_max
      branchif  flag  return_from_iterate
      # end of loop
      exec   k add k 1
      exec      flag   lt k n
      branchif  flag    k_loop
      #             return n
      set k n
      jump return_from_iterate

Main Loop

In []:
def execute(program):
    code, mem = compile(program)
    pc = 0
        while pc < len(program):
            jitdriver.jit_merge_point(pc=pc, mem=mem, code=code, program=program)
            op = opcode(code, pc)
            if op == C_SET:
                a = opcode(code, pc+1)
                b = opcode(code, pc+2)
                mem[a] = mem[b]
                pc += 3
            elif op == C_EXEC_1:
                symbol = opcode(code, pc+1)
                op = opcode(code, pc+2)
                a = mem[opcode(code, pc+3)]
                    mem[symbol] = monops[op](a)
                except KeyError:
                    raise ValueError("illegal unary op: %s" % op)
                pc += 4
            elif op == C_EXEC_2:
                symbol = opcode(code, pc+1)
                op = opcode(code, pc+2)
                a = mem[opcode(code, pc+3)]
                b = mem[opcode(code, pc+4)]
                    mem[symbol] = binops[op](a, b)
                except KeyError:
                    raise ValueError("illegal binary op: %s" % op)
                pc += 5
            elif op == C_BRANCHIF:
                x = opcode(code, pc+1)
                loc = opcode(code, pc+2)
                if int_value(mem[x]): 
                    pc = loc
                    pc += 3
            elif op == C_JUMP:
                loc = opcode(code, pc+1)
                pc = loc
            elif op == C_DISPLAY:
                arg = opcode(code, pc+1)
                os.write(stdout_fd, '%s\n' % mem[arg].as_text())
                pc += 2
            elif op == C_END:    
                pc = len(code)
                raise ValueError("unknown command")
        return 0
        os.write(stderr_fd, "ERROR at PC: %s\n" % pc)
        return 1

How Does This Perform With Python?

$ time python mandelbrot.mach > img.pgm

jeopardy theme

real 91m46.877s
user 26m4.570s
sys 0m3.152s

Over 350 times worse than our original Python program!

How Does This Perform With PyPy?

$ time pypy mandelbrot.mach > img.pgm

real 2m33.314s
user 2m29.942s
sys 0m2.939s

Now only 10 times worse than our original program

Translating using RPython

$ pypy rpython

Forty seconds and many messages to stdout later we have an executable machine-c.

$ time ./machine-c mandelbrot.mach > img.pgm

real 0m6.137s
user 0m4.528s
sys 0m1.605s

Suddenly our simple interpreter is running faster than CPython!

Adding a JIT

This involves about 5 lines of hints and boilerplate and passing the --opt==jit option to RPython.

In []:
jitdriver = JitDriver(greens=['pc', 'code', 'program'], reds=['mem'])

def jitpolicy(driver):
    from rpython.jit.codewriter.policy import JitPolicy
    return JitPolicy()

# And, at the top of the event loop (we've seen this before)
            jitdriver.jit_merge_point(pc=pc, mem=mem, code=code, program=program)

$ pypy rpython --opt=jit

Four minutes and many, many messages later we have a new machine-c executable.

JIT Improvement

$ time ./machine-c mandelbrot.mach > img.pgm

real 0m3.155s
user 0m1.561s
sys 0m1.589s

  • Another factor of two improvement:
  • almost as fast as running on PyPy
  • 5× faster than running on CPython
  • 160× faster than mandelbrot.mach running on running on PyPy
  • 1700× faster than mandelbrot.mach on running on CPython

Speed Summary
Python 2.7 15.4 5506 45095
PyPy 3.0 153 532
RPython 2.3 6.1 98
RPython JIT 3.2 504


  • PyPy
    • Fast
    • Missing some modules
  • RPython
    • Harder than writing a interpreter in Python.
    • Much easier than writing a fast interpreter in Python (or any other language that I know of).
    • Might also be useful for writing some standalone applications.


Tim Hochberg