Compilers are complex pieces of software that involve many stages, from parsing code to type checking to performing analyses and transformations, and ultimately generating code for a given target. Intermediate representations (IRs for short) are a key ingredient in almost any modern compiler. IRs help represent programs in a structured and simplified way so as to make analyses and optimizations easier. In this blog post, we’ll delve into Codon’s intermediate representation—Codon IR—including why we created it, how it works, what it can do, how it can be extended through custom IR passes, and ultimately how it serves as a framework for optimizing Python code.
Why we needed a new IR
Codon uses LLVM in its backend to perform general-purpose optimizations and ultimately generate the final code, be it for CPU, GPU or anything else. Originally, the picture looked something like this:
In other words, user code is parsed to create an abstract syntax tree, which is then converted to native code using LLVM. Now, LLVM has an IR of its own along with an expansive suite of analyses and optimizations; many standard compiler optimizations like constant propagation and dead-code elimination are left to LLVM to be performed at the LLVM IR level. LLVM also performs tasks like register allocation, which are necessary to produce the final code.
So, if LLVM has its own IR along with a broad range of utilities and features, why not just use that? Why do we need a new IR at all? The reason comes down to levels of abstraction: by the time we get to LLVM IR, a lot of information about the source program is lost. For example, LLVM IR is built on the concept of basic blocks connected by branches — information about higher-level control-flow constructs like `if`-statements, `while`-loops or `for`-loops is lost. Similarly, type information is largely lost as well, as everything gets distilled down to fundamental types like integers and pointers, meaning it’s difficult (or potentially even impossible) to reconstruct the original semantic meaning of a given piece of generated LLVM IR.
In Codon’s optimization framework, we often found the need to reason about higher-level Python constructs, which wasn’t feasible through LLVM IR. For instance, one optimization Codon performs pertains to removing redundant dictionary accesses, e.g. as found in this common code pattern:
```python
d[k] = d.get(k, 0) + 1  # increment count of key `k` in dict `d`
```
The original code performs two lookups of key `k`, but only one lookup is required. To address this, we want to find patterns of the form `d[k] = d.get(k, i) + j` and replace them with an implementation that performs just one lookup of `k`. It turns out that this is nearly impossible to do at the LLVM IR level, as `dict.__setitem__()` (which corresponds to the assignment `d[k] = ...`) and `dict.get()` both compile to large, complex LLVM IR that we’d have no hope of recognizing and making sense of after the fact — in other words, the semantic meaning of the original code (i.e. “increment the count of `k` in `d`”) is effectively lost.
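To be worth doing, the rewrite must preserve exactly what the original line computes. Here is a plain-Python reference for the pattern’s semantics (the `increment` helper name is ours for illustration; note this sketch still performs two lookups, whereas the compiler-generated replacement performs one):

```python
def increment(d, k, i=0, j=1):
    # Reference semantics of the pattern `d[k] = d.get(k, i) + j`.
    # The optimized version Codon generates does a single hash lookup of `k`.
    d[k] = d.get(k, i) + j

counts = {}
for ch in "abracadabra":
    increment(counts, ch)
# counts == {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}
```

Any single-lookup implementation the pass substitutes must agree with this reference on all inputs.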
To solve this problem, we created Codon IR, which sits between the abstract syntax tree and LLVM IR:
Codon IR is much simpler than the AST, but still retains enough information to reason about semantics and higher-level constructs. Identifying the dictionary pattern above, for example, amounts to simply identifying a short sequence of IR that resembles `dict.__setitem__(d, k, dict.get(d, k, i) + j)` — we’ll get into much more detail about this later.
Codon IR
Now that we’ve discussed the motivation for Codon IR, let’s discuss what it actually is. At the most fundamental level, Codon IR is made up of various objects of different types, called nodes. Nodes represent elements from a given program, from variables to functions to control-flow constructs like `if`-statements and so on. The key difference between the AST and Codon IR is that the latter tries to use as few node types as possible while still preserving the properties we want, i.e. representing the semantics of the original program, being easy to work with for optimizations and transformations, etc.
Here are the different node types in Codon IR, and how they map to an actual piece of Python code:
- Module – a special container node that represents the entire program, i.e. all the functions needed by the program as well as the main code that is executed when running the program (in practice, we simply put this code into a new implicit “main” function).
- Type – a kind of node that is attached to all other nodes and represents the given node’s data type, be it an integer, float, string etc. One important aspect of Codon IR is that it’s fully typed, meaning we never deal with ambiguous or unknown types; type checking is done on the AST before Codon IR is generated.
- Var – represents a variable in a program. Variables are usually produced from assignments like `x = y`, but they can also be produced from control-flow statements like `for i in range(10)`, which creates variable `i`.
- Func – a type of Var that represents a function.
- Value – a generic node that represents values, which can be anything from constants to variable references or function calls and so on.
- Instr – a type of Value that represents a specific operation, such as a function call, return statement, conditional expression (i.e. `x if y else z`) and so on.
- Flow – represents control-flow statements like `if`-statements, `while`-loops, `try`-`except` etc. As shown in the picture, these are first-class values in Codon IR.
And that’s it — those are all the ingredients we need to represent any program with Codon IR! Let’s look at a real example of how a simple function gets translated to Codon IR; consider the following code for a simple Fibonacci function:
```python
def fib(n):
    if n < 2:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
```
Here’s what the generated Codon IR looks like — note that Codon IR is represented textually using S-expressions in a Lisp-like format:
```
(bodied_func
  '"fib[int]"
  (type '"fib[int]")
  (args (var '"n" (type '"int") (global false)))
  (vars)
  (series
    (if (call '"int.__lt__[int,int]" '"n" 2)
      (series (return 1))
      (series
        (return
          (call
            '"int.__add__[int,int]"
            (call
              '"fib[int]"
              (call '"int.__sub__[int,int]" '"n" 1))
            (call
              '"fib[int]"
              (call '"int.__sub__[int,int]" '"n" 2))))))))
```
A few points to note about this IR:
- `bodied_func` is a type of Func node that corresponds to a function defined in the program in the usual way, i.e. with `def`. There are other types of functions in Codon IR too, like external C functions, which correspond to a different type of Func.
- `series` corresponds to a type of Flow that represents a sequence of instructions executed sequentially. Hence, the body of the function, as well as the true- and false-branches of the `if`-statement, are represented by `series` nodes.
- `if` is a type of Flow that corresponds to an `if`-statement in the code, and contains a true-branch as well as a false-branch as children nodes.
- `return` is a type of Instr that corresponds to a `return`-statement in the code.
Another observation we can make from this output is that operators like `+` and `-` are gone, which highlights an important fact about Codon IR: there is no concept of operators! Instead, these are all replaced with function calls corresponding to the appropriate Python magic method that represents the given operator. For instance, in this example, `n - 1` becomes `int.__sub__(n, 1)` — all other operators are similarly replaced, including unary/binary operators, indexing operators as in `x[i]` and so on. This design decision simplifies the IR by minimizing the different types of nodes we have to take into account, without sacrificing the flexibility to recognize patterns involving operators at the IR level (e.g. to recognize `a + b`, we simply look for a corresponding `__add__` call in the IR).
A natural question to ask about this would be: how are the primitive operators like `int.__add__` or `float.__sub__` implemented themselves? For that, we resort to another feature of Codon and Codon IR: inline LLVM functions. In addition to `bodied_func`, there’s another Func subclass called `llvm_func` that corresponds to functions written in LLVM IR via Codon’s `@llvm` annotation. For example, here’s the implementation of `int.__add__`:
```python
@llvm
def __add__(self: int, other: int) -> int:
    %res = add i64 %self, %other
    ret i64 %res
```
The body of this function is written in LLVM IR, and Codon’s parser knows how to handle it once it sees the `@llvm` annotation. This allows Codon IR to be entirely self-contained — all operators and functions are defined within the IR itself.
Passes, analyses and optimizations
If you recall, our whole motivation behind Codon IR was to facilitate transformations and analyses that weren’t viable at the LLVM IR level. So, how do we accomplish this? Codon IR has a number of features and utilities for writing and executing passes — a pass is a term for an operation on the IR that involves traversing IR nodes and potentially modifying or rewriting them. Remember the dictionary lookup optimization we looked at above? Well, that’s implemented as a pass that searches for that particular IR pattern and replaces it with a call to a new function that is semantically equivalent but more efficient by performing only one key lookup.
A pass can either be a transformation or an analysis. Transformations change the IR in some way (e.g. the dictionary optimization is a transformation), whereas analyses don’t change the IR but instead deduce things or reason about it. For instance, we might want to generate a control-flow graph from an IR function, which wouldn’t involve changing the IR itself but would instead amount to traversing it to generate the CFG, which potentially could itself be consumed by subsequent transformations or analyses. The concepts of passes, transformations and analyses are not unique to Codon IR — many modern compilers (including LLVM) employ this same model.
Codon IR is implemented in C++ — each of the node types like Flow and Instr corresponds to an abstract base class that is subclassed by concrete implementations like `IfFlow` and `ReturnInstr`. This structure allows us to use the visitor pattern to implement passes; in other words, a pass is simply implemented as a visitor that can be “accepted” by the different IR nodes. If you’re not familiar with the visitor pattern, don’t worry: conceptually it largely amounts to dynamically dispatching operations on the different types of IR nodes in a streamlined and self-contained way. To make this clear, let’s look at an example: the following is a simple pass that looks for patterns of the form `<const> + <const>` and replaces them with a single constant denoting the sum.
```cpp
#include "codon/cir/transform/pass.h"
#include "codon/cir/util/irtools.h"

using namespace codon::ir;

class MyAddFolder : public transform::OperatorPass {
public:
  static const std::string KEY;
  std::string getKey() const override { return KEY; }

  void handle(CallInstr *v) override {
    auto *f = util::getFunc(v->getCallee());
    if (!f || f->getUnmangledName() != "__add__" || v->numArgs() != 2)
      return;

    auto *lhs = cast<IntConst>(v->front());
    auto *rhs = cast<IntConst>(v->back());

    if (lhs && rhs) {
      long sum = lhs->getVal() + rhs->getVal();
      v->replaceAll(v->getModule()->getInt(sum));
    }
  }
};

const std::string MyAddFolder::KEY = "my-add-folder";
```
Let’s break this down:
- `OperatorPass` is a utility class that exposes various `handle()` methods for the different node types. To create our pass, we simply create a new subclass `MyAddFolder` of `OperatorPass` that overrides `handle(CallInstr)` — this handler will be called on every `CallInstr` in our program (`CallInstr` is an Instr that corresponds to a function or method call).
- Every pass has a string field `KEY` that provides a unique name for the pass. In this case, we’ll call our pass `"my-add-folder"`.
- Inside the handler, we first check if the called function is in fact `__add__` by inspecting the function that is called by the `CallInstr`. If it is, then we examine the arguments to see if they’re both integer constants; we do this by attempting to cast the argument nodes to `IntConst`, which is a type of IR Const.
- If everything checks out, we add the constant values of the arguments and replace the `CallInstr` with a constant integer representing the result. Note that every node in Codon IR has a method `replaceAll()` that replaces all references to the node with a new one. To create the new constant node, we need to get a reference to the IR Module, which itself has methods for creating new nodes such as constants, e.g. `getInt()` in this case, which returns a new Const node.
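To see the logic of `MyAddFolder` in miniature, here is a toy Python model of the same fold (the node classes below are invented for illustration and are not Codon’s actual C++ API):

```python
# Toy IR nodes: integer constants and calls by function name.
class IntConst:
    def __init__(self, val):
        self.val = val

class Call:
    def __init__(self, func, args):
        self.func, self.args = func, args

def fold(node):
    # Post-order traversal: fold children first, then try this node.
    if isinstance(node, Call):
        node.args = [fold(a) for a in node.args]
        if (node.func == "__add__" and len(node.args) == 2
                and all(isinstance(a, IntConst) for a in node.args)):
            # Replace the call with a single constant, like replaceAll().
            return IntConst(node.args[0].val + node.args[1].val)
    return node

# (1 + 2) + 3 folds down to a single constant 6:
expr = Call("__add__",
            [Call("__add__", [IntConst(1), IntConst(2)]), IntConst(3)])
folded = fold(expr)
```

The real pass gets its traversal for free from `OperatorPass`, whereas this sketch recurses by hand.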
So, in just a few lines of code, we’ve created a simple constant folding optimization pass! One of the nice features of Codon IR is that even complex transformations or analyses can usually be expressed very succinctly; there are many examples of different passes in `codon/cir`, ranging from standard compiler passes like constant folding to library-specific passes like operator fusion for NumPy.
Bidirectionality – a new kind of IR
Recall that in the dictionary optimization example, we said we’d replace the dictionary key increment pattern with a new implementation that does only one key lookup. But where does that new implementation come from? Presumably it wouldn’t be present in the IR module beforehand, and generating new IR for it on the fly would be cumbersome. Wouldn’t it be nice if we could just implement this as a function that we could generate IR for as we’re doing the optimization? Well, that’s exactly what Codon IR supports, and it turns out to be tremendously useful and generalizable to many IR use cases.
We call this feature bidirectionality, because it essentially allows us to go “backwards” in the compilation process to the AST / type checking stage from the Codon IR stage. We introduced this concept formally in the Codon paper that appeared in the 2023 Compiler Construction conference. The actual API is pretty straightforward, and comes in the form of three new methods added to our IR Module:
- `getOrRealizeFunc(name, args)`: returns a Func node corresponding to the function with the given name and argument types.
- `getOrRealizeMethod(type, name, args)`: returns a Func node corresponding to the method of the given type with the given name and argument types.
- `getOrRealizeType(name, generics)`: returns a Type node corresponding to the type with the given name and the given generic type parameters.
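As a rough mental model of what “realize” means, here is a toy Python sketch (invented names, not the actual C++ API): a function template is specialized once per combination of argument types, and repeated requests return the cached instance.

```python
# Hypothetical model of getOrRealizeFunc: one cached specialization
# per (function, argument-types) combination.
_realized = {}

def get_or_realize_func(template, arg_types):
    key = (template.__name__, arg_types)
    if key not in _realized:
        # In Codon this step re-invokes the type checker; here we just
        # close over the argument types and check them at call time.
        def instance(*args, _t=arg_types, _f=template):
            assert tuple(type(a) for a in args) == _t
            return _f(*args)
        _realized[key] = instance
    return _realized[key]

def twice(x):
    return x + x

twice_int = get_or_realize_func(twice, (int,))
twice_str = get_or_realize_func(twice, (str,))
```

Requesting `(int,)` again returns the same instance, which mirrors the “get or realize” naming.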
Let’s look at an example to make this concrete. Consider the following code:
```python
def foo(x):
    return x*3 + x

def validate(x, y):
    assert y == x*4

a = foo(10)
b = foo(1.5)
c = foo('a')
```
Imagine we want to insert a call to `validate()` after every call to `foo()`. We can write a Codon IR pass that does this in the following way:
```cpp
#include "codon/cir/transform/pass.h"
#include "codon/cir/util/irtools.h"

using namespace codon::ir;

class ValidateFoo : public transform::OperatorPass {
public:
  static const std::string KEY;
  std::string getKey() const override { return KEY; }

  void handle(AssignInstr *v) override {
    auto *M = v->getModule();
    auto *var = v->getLhs();
    auto *call = cast<CallInstr>(v->getRhs());
    if (!call)
      return;

    auto *foo = util::getFunc(call->getCallee());
    if (!foo || foo->getUnmangledName() != "foo")
      return;

    auto *arg1 = call->front();        // argument of 'foo' call
    auto *arg2 = M->Nr<VarValue>(var); // result of 'foo' call
    auto *validate =
        M->getOrRealizeFunc("validate", {arg1->getType(), arg2->getType()});
    if (!validate)
      return;

    auto *validateCall = util::call(validate, {arg1, arg2});
    insertAfter(validateCall); // call 'validate' after 'foo'
  }
};

const std::string ValidateFoo::KEY = "validate-foo";
```
A lot of the boilerplate is similar to the previous pass we wrote, but there are a few new aspects of this one:
- This time, our handler takes an `AssignInstr`, which represents a variable assignment in the program.
- Once we verify that the assignment has the form `var = foo(...)`, we use `getOrRealizeFunc()` to instantiate the `validate()` function with argument types corresponding to the type of the argument to `foo()` and the type of the assigned variable. Note that this will actually create three separate instances of `validate()` in our example code, since each `foo()` call has a different argument type.
- Finally, we call `insertAfter()` (a utility method of `OperatorPass`) to insert a call to our newly-created `validate()` function directly after the `AssignInstr` we’re processing.
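Putting it together, the pass transforms the example program into something semantically equivalent to the following hand-written Python (the `validate` calls are what the pass inserts; in the real transformation each call uses a separately realized instance of `validate`):

```python
def foo(x):
    return x*3 + x

def validate(x, y):
    assert y == x*4

a = foo(10)
validate(10, a)    # inserted after the assignment to `a`
b = foo(1.5)
validate(1.5, b)   # inserted after the assignment to `b`
c = foo('a')
validate('a', c)   # works for strings too: 'a'*3 + 'a' == 'a'*4
```

Note the string case passes because `x*3 + x` and `x*4` agree under both numeric and sequence semantics.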
Bidirectionality is a very useful feature and is used heavily in Codon IR passes. The reasons for this, as mentioned in the linked paper, are:
- It allows large portions of complex IR transformations to be implemented directly in Codon. For instance, generating IR manually for the `validate()` calls in our example above, combined with the fact that we needed to call the function on different input types, would have been a lot more involved than using the simple `getOrRealizeFunc()` API.
- New instantiations of user-defined data types can be generated on demand. For instance, an optimization that requires the use of a dictionary can instantiate the `dict` type for the appropriate key and value types. Instantiating types or functions is a non-trivial process that requires a full re-invocation of the type checker due to cascading realizations and specializations.
- The IR can take full advantage of Codon’s high-level type system. IR passes can also themselves be generic, using Codon’s expressive type system to operate on a variety of types — for instance, in our `validate()` example, the pass was agnostic to the argument types.
Extending Codon IR with plugins
Codon supports plugins that can be loaded dynamically. Plugins can contain IR passes that can be inserted into the Codon IR pass pipeline. To illustrate this, let’s write a simple plugin for our validate pass above — the first step is to create a `plugin.toml` file that provides some general information about the plugin:
```toml
[about]
name = "MyValidate"
description = "my validation plugin"
version = "0.0.1"
url = "https://example.com"
supported = ">=0.18.0"

[library]
cpp = "build/libmyvalidate"
```
Most of this is pretty self-explanatory: the “about” section has general information like the plugin name, version, supported Codon versions etc. The “library” section lists the C++ and Codon libraries included as part of the plugin. When we build the C++ code containing the IR pass, we’ll generate a shared library `build/libmyvalidate.so`, which will be dynamically loaded by Codon when loading the plugin.
The next step is to incorporate the pass implementation we wrote above. We can use the same pass code, but how do we tell Codon to load this pass and invoke it during the compilation process? We can achieve this by creating a subclass of the `codon::DSL` class and adding a function `load()` to our library that returns an instance of it — the `load()` function is invoked by Codon automatically when it loads a plugin from a shared library:
```cpp
class MyValidate : public codon::DSL {
public:
  void addIRPasses(transform::PassManager *pm, bool debug) override {
    std::string insertBefore = debug ? "" : "core-folding-pass-group";
    pm->registerPass(std::make_unique<ValidateFoo>(), insertBefore);
  }
};

extern "C" std::unique_ptr<codon::DSL> load() {
  return std::make_unique<MyValidate>();
}
```
The `DSL` class has methods for adding Codon IR passes, LLVM IR passes and even new syntax features like keywords (hence the name DSL, since this class effectively enables the creation of domain-specific languages within Codon). In this case, we insert our pass into the Codon IR pass manager before the standard folding pass.
We also need a `CMakeLists.txt` which specifies how to build the plugin with CMake. This, along with all the code for this example, can be found in this GitHub repo — feel free to give it a try or extend it with new functionality.
In summary, plugins are a convenient way to make Codon IR extensible without having to recompile Codon itself, and offer a flexible way to write and test IR passes for Codon.
Applications of Codon IR
Now that we’ve gone into depth about Codon IR, let’s look at a few applications that showcase some of its capabilities.
Optimizing Python code patterns
The most immediate use case that we’ve already alluded to is optimizing common yet inefficient patterns we find in Python code. One example is the dictionary key increment we saw above. Another example has to do with list concatenations like this:
```python
a = [1, 2, 3]
b = [4, 5]
c = b + a[1:-1] + [42,] + b[::-1]
```
While Python’s syntax for manipulating and concatenating lists is clean and concise, executing this code naively would leave a lot of performance on the table, especially if the operation is being performed in a loop. The reason is the multitude of intermediate, temporary lists: each component of the sum on the last line is a new list object that gets created, filled with elements, and then never used again after that line executes.
So, as we did for the dictionary optimization, we can create a Codon IR pass to recognize and optimize this pattern. You can actually view the source of `ListAdditionOptimization` (the name of this optimization pass) in `codon/cir/transform/pythonic/list.cpp` in the Codon repo. The pass works by recognizing expressions of the form `<list_1> + <list_2> + … + <list_N>` and then classifying each of the `list_i` expressions as one of the following:

- Slice of the form `x[a:b:c]`
- Literal of the form `[a_1, a_2, …, a_M]`
- Any other list expression `x`
Then, instead of creating intermediate list objects, the pass will generate new code that pre-allocates an empty list of the correct size and simply appends elements to it as appropriate:
- For slices: `for i in range(a, b, c): result.append(x[i])`
- For literals: `result.append(a_1); …; result.append(a_M)`
- For general list expressions: `for a in x: result.append(a)`
And as a result, no intermediate objects need to be created! One real-world example where this comes up is in our set partition benchmark, which makes heavy use of list slicing and concatenation. This optimization gives upwards of a 70% performance improvement on that benchmark.
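To make the rewrite concrete, here is a plain-Python rendition of what the pass effectively generates for the example above (hand-written for illustration; the real pass also pre-allocates the result to the correct size, which plain Python lists can’t express directly):

```python
a = [1, 2, 3]
b = [4, 5]

# Naive version: creates three temporary lists along the way.
c_naive = b + a[1:-1] + [42,] + b[::-1]

# Rewritten version: append every element into a single result list.
result = []
for x in b:                          # general list expression `b`
    result.append(x)
for i in range(1, len(a) - 1):       # slice `a[1:-1]`
    result.append(a[i])
result.append(42)                    # literal `[42,]`
for i in range(len(b) - 1, -1, -1):  # slice `b[::-1]`
    result.append(b[i])

assert result == c_naive == [4, 5, 2, 42, 5, 4]
```

The two versions produce identical lists; the rewritten one just never materializes the intermediates.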
Of course, the story doesn’t end with just lists and dictionaries; there are many patterns that have corresponding Codon IR passes, including ones related to I/O, string operations and so on. The important takeaway is that Codon IR gives us a unified, systematic way to approach these optimization problems.
Optimizing NumPy code
We went into great detail on Codon’s NumPy-specific compiler optimizations in our last blog post, so you can read that to get all the context. To summarize, Codon IR makes it easy for us to implement NumPy-specific optimizations like operator fusion, which allows us to take code like this:
```python
# `x` and `y` are equal-length ndarrays of floats in [0, 1)
# pi ~= 4 x (fraction of points in circle)
pi = ((x-1)**2 + (y-1)**2 < 1).sum() * (4 / len(x))
```
and optimize it so the array operations are performed in a single pass instead of sequentially, which not only produces a performance improvement but also prevents us from having to allocate memory for temporary arrays. The result is a 6x speedup on this simple code! This pass is implemented in `codon/cir/transform/numpy`.
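As a rough illustration of what fusion buys (sketched in plain Python rather than NumPy, with made-up data): the unfused pipeline materializes a temporary per operation, while the fused loop does everything in one pass with no intermediates.

```python
x = [0.1, 0.5, 0.9]
y = [0.2, 0.4, 0.8]

# Unfused: each step builds an intermediate list (like temporary arrays).
t1 = [(v - 1) ** 2 for v in x]
t2 = [(v - 1) ** 2 for v in y]
t3 = [p + q for p, q in zip(t1, t2)]
inside = sum(1 for v in t3 if v < 1)

# Fused: a single loop, no intermediates allocated.
inside_fused = 0
for xi, yi in zip(x, y):
    if (xi - 1) ** 2 + (yi - 1) ** 2 < 1:
        inside_fused += 1

assert inside == inside_fused
```

The NumPy pass performs the analogous rewrite at the IR level, combining elementwise operations and the reduction into one traversal of the arrays.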
Creating domain-specific languages
Domain-specific languages often make use of domain knowledge about data types and the operations on them in order to perform optimizations that aren’t possible with general-purpose compilers. In many ways, you can think of NumPy as a domain-specific language for array computing, and the NumPy-specific optimizations above are only possible because the compiler has knowledge of the semantics of the different array operations — how they interact with one another, how to compose them, etc. A general-purpose compiler wouldn’t perform many of these optimizations.
The same is true for many other domains. In fact, Codon’s predecessor was initially developed to address performance and scalability challenges in the fields of genomics and bioinformatics, and that effort lives on in the form of a Codon plugin. The central datatype in many genomics applications is the sequence, be it a DNA, RNA or protein sequence. Operations on sequences such as slicing, hashing, reverse complement and alignment are common, so the plugin includes a number of passes that recognize and optimize these operations, whether they appear individually or in sequence.
For example, alignment is often performed in a loop such as:
```python
for t in target_sequences:
    for q in query_sequences:
        a = q.align(t)
```
It turns out that we can do this much more efficiently by batching query-target pairs and doing all the alignments at once via a SIMD-accelerated Smith-Waterman implementation. The transformation that enables this is implemented as an “inter-sequence alignment” compiler pass in Codon IR — doing this transformation by hand would require a lot of code re-organization, batching sequences, invoking the alignment kernel etc. It’s much nicer when the compiler can do it automatically!
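The shape of this transformation can be sketched in plain Python (the `align` and `align_batch` functions below are illustrative stand-ins, not the real SIMD Smith-Waterman kernel):

```python
def align(q, t):
    # Stand-in for a real pairwise alignment kernel.
    return (q, t, len(q) + len(t))

def align_batch(pairs):
    # A SIMD-accelerated kernel would process the whole batch at once;
    # here we just map over it to show the interface.
    return [align(q, t) for q, t in pairs]

target_sequences = ["ACGT", "TTAA"]
query_sequences = ["ACG", "TTA"]

# Original nested loop: one kernel invocation per pair.
naive = [align(q, t) for t in target_sequences for q in query_sequences]

# Transformed: gather all query-target pairs, then one batched invocation.
pairs = [(q, t) for t in target_sequences for q in query_sequences]
batched = align_batch(pairs)

assert batched == naive
```

The compiler pass performs this gathering and batching automatically, preserving the original results.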
Another common pattern is:
```python
for s in sequences:
    info = index[s]  # `index` could be a hash-table, FM-index etc.
```
Again, for many different types of indices, we can do better — this time by leveraging software prefetching and transforming the loop so that useful work can be completed while our prefetches are in flight. This is again made possible by a Codon IR pass, accompanied by a new magic method `__prefetch__` that we add to the index type, which mirrors `__getitem__` but, instead of actually retrieving the item, simply prefetches it.
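A plain-Python sketch of the protocol (the `Table` class and its method bodies are invented for illustration; only the `__prefetch__` / `__getitem__` pairing comes from the post):

```python
class Table:
    def __init__(self, data):
        self.data = data
        self.prefetched = []

    def __getitem__(self, key):
        return self.data[key]

    def __prefetch__(self, key):
        # A real implementation would issue a software prefetch for the
        # bucket holding `key`; here we just record the request.
        self.prefetched.append(key)

index = Table({"ACGT": 1, "TTAA": 2})
sequences = ["ACGT", "TTAA"]

# The transformed loop issues prefetches ahead of the actual lookups.
for s in sequences:
    index.__prefetch__(s)
results = [index[s] for s in sequences]
```

In the real transformation, the pass interleaves prefetches with useful work from other loop iterations rather than issuing them all up front.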
These are relatively simple and high-level explanations of what are actually fairly involved code transformations. This 2021 Nature Biotechnology paper goes into a lot more detail and includes extensive benchmarks on real-world applications.
Wrapping up
We’ve talked about the motivation behind Codon IR, what it is and what it can do. We also discussed some of the features that make it unique, like bidirectionality, as well as how to extend it with plugins. The features and methods we highlighted serve as the foundation for the many compiler passes that Codon uses, from standard optimizations like constant folding or dead code elimination, to Python- or library-specific optimizations to passes pertaining to parallelization or multithreading — all of them use the elements and building blocks we showcased here. Codon IR offers a powerful and flexible framework for reasoning about and optimizing Python code; if you’re interested to learn more or contribute, feel free to take a look at the GitHub repo and/or reach out to us!