The Future of the Stan Math Library and the Bottlenecks in Development

The Stan Math Library

Currently, we have a stable Math library used to write functions, evaluate them at different inputs, and compute their gradients with respect to those inputs. The Math library provides the back end for the Stan language; for every function in Stan, there’s a corresponding Math library function. The Math library is broader in scope than any other C++ library for automatic differentiation (as far as I know; see the arXiv paper). On the technical side, it was designed as a header-only C++ library for single-threaded, non-GPU applications. (It still is… if you don’t use integrate_ode_bdf().) For that purpose, it’s as fast as or faster than every other C++ automatic differentiation library out there.

But we’re not leveraging modern computing architecture well. Nearly every computing device has multiple cores, a lot of laptops and desktops have GPUs, and HPC clusters definitely have both. Using that hardware is how other libraries can, under certain conditions, be faster than Stan Math for the same operations.

In broad strokes, there are really two sensible directions for the Math library to progress:

  1. breadth, meaning adding more functionality
  2. speed, meaning reducing wall time.

I think the next innovation in the Math library is going to be in the speed department. We should be able to use multiple threads and GPUs to get the most out of our computing power. Imagine Stan models running 10x, 100x, or 1000x faster, all without having to change the Stan model itself.

How do we get speed

  1. more threads: split computation across multiple cores (and not just at the map_rect() level)
  2. more GPU: offload computation to a device designed to do a simple job, but fast
  3. analytic gradients: skip automatic differentiation for derivatives we already know how to compute in closed form
  4. forward-mode automatic differentiation: instead of computing a gradient, compute a directional derivative. This doesn’t scale well in general, since a full gradient needs one forward pass per input, but those passes are independent of each other, so it’s embarrassingly parallel.
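
To make the forward-mode point concrete (and to show why it pairs well with more threads), here is a minimal standalone C++ sketch using dual numbers. It is not Stan Math’s API: the `Dual` type, the example function `f`, and the helpers `directional_derivative` and `gradient_parallel` are hypothetical names chosen for illustration. One forward pass seeded with a direction v yields the directional derivative ∇f(x)·v; a full gradient needs one pass per input, but the passes share no state, so each can run on its own thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical forward-mode dual number: `val` carries the value, `tan`
// carries the derivative along whatever direction the inputs were seeded with.
struct Dual {
  double val;
  double tan;
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.tan + b.tan}; }
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.tan * b.val + a.val * b.tan};  // product rule
}
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.tan}; }
Dual exp(Dual a) { return {std::exp(a.val), std::exp(a.val) * a.tan}; }

// Hypothetical example function: f(x) = x0 * x1 + sin(x0) + exp(x2).
template <typename T>
T f(const std::vector<T>& x) {
  return x[0] * x[1] + sin(x[0]) + exp(x[2]);
}

// One forward pass with the inputs seeded by direction v returns the
// directional derivative grad f(x) . v without ever forming the gradient.
double directional_derivative(const std::vector<double>& x,
                              const std::vector<double>& v) {
  assert(x.size() == v.size());
  std::vector<Dual> xd(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) xd[i] = Dual{x[i], v[i]};
  return f(xd).tan;
}

// Full gradient: one forward pass per unit direction e_i. The passes are
// independent, so each runs on its own thread and writes its own slot.
std::vector<double> gradient_parallel(const std::vector<double>& x) {
  const std::size_t n = x.size();
  std::vector<double> grad(n);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n; ++i) {
    workers.emplace_back([&x, &grad, i, n] {
      std::vector<double> e(n, 0.0);
      e[i] = 1.0;
      grad[i] = directional_derivative(x, e);
    });
  }
  for (auto& w : workers) w.join();
  return grad;
}
```

For example, at x = (1, 2, 0) the three passes return 2 + cos(1), 1, and 1, matching the analytic gradient (x1 + cos x0, x0, exp x2). Spawning one `std::thread` per input is only for illustration (and may need `-pthread` when compiling); a real implementation would batch directions onto a thread pool.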

Bottlenecks to developing speed in Math

  1. Technical debt. We’ve amassed enough technical debt in Math that it’s hard to move quickly. We need to pay it down so developers can move faster. Some specific things we could do:
     - a testing framework that makes it easier to validate new and existing code; without it, refactoring is a true pain (we’ll need this sort of thing to tackle more threading and everything else)
     - documentation and testing of the internal functions, so new developers are easier to onboard and existing developers don’t reinvent the wheel
     - a benchmarking suite
     - simplifying the code base
     - removing dead code

     It’s hard to get volunteer effort on this stuff. I’m super-grateful for the people tackling it right now. We’re really indebted to these developers (and I’m probably missing a few): @increasechief, @rok_cesnovar, @Stevo15025, @seantalts, @bob_carpenter, @bbbales2. If you’re looking for a way to help, this is super-important and much appreciated, even though it has less visible impact than adding new features.
    
  2. Expertise. To be an effective Math developer, you need a combination of skills: C++ templating, math (calculus), numerical computation, and software design. Often that’s not enough… there’s also threading, MPI, building and linking executables, GPUs, ODE integration, and numerical approximation, to name a few. These skills don’t usually all live in a single human being, so we often need to collaborate to get features in. When we don’t, we get contributions that add technical debt we have to pay down later.

    An example I’ve been thinking about lately is threading. I do believe it will play a big part in making the Math library faster. It’s taking us a while to get @wds15’s prototype into the Math library (it represents a lot of work and has the potential to be awesome). I feel bad that it’s taking this long, but we want to get it right. Once we have a reasonable design for threading, developers without threading expertise can help with features because the patterns will have been laid out clearly. If we rush and introduce something that isn’t laid out well, we keep other developers from helping, including those with threading expertise.

    This is true for other features too.

  3. Computation costs for testing. We’re just resource-limited here. More computing resources would let us test at will, and quickly.
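
As a small illustration of the kind of check a testing framework could automate, especially for hand-coded analytic gradients, the standalone C++ sketch below compares an analytic gradient against a central finite-difference approximation. It is not Stan Math’s actual test framework: `log_sum_exp_grad`, `finite_diff_grad`, and `gradients_match` are hypothetical names, and the step size and tolerance are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Numerically stable log-sum-exp: log(sum_i exp(x_i)).
double log_sum_exp(const std::vector<double>& x) {
  assert(!x.empty());
  double m = *std::max_element(x.begin(), x.end());
  double s = 0.0;
  for (double xi : x) s += std::exp(xi - m);
  return m + std::log(s);
}

// Hand-written analytic gradient: d/dx_i log_sum_exp(x) = softmax(x)_i.
std::vector<double> log_sum_exp_grad(const std::vector<double>& x) {
  double lse = log_sum_exp(x);
  std::vector<double> g(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) g[i] = std::exp(x[i] - lse);
  return g;
}

// Central finite-difference approximation of the gradient of f at x.
std::vector<double> finite_diff_grad(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x, double h = 1e-6) {
  std::vector<double> g(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    double xi = x[i];
    x[i] = xi + h;
    double fp = f(x);
    x[i] = xi - h;
    double fm = f(x);
    x[i] = xi;  // restore before perturbing the next coordinate
    g[i] = (fp - fm) / (2 * h);
  }
  return g;
}

// Hypothetical check: every component agrees within a mixed
// absolute/relative tolerance.
bool gradients_match(const std::vector<double>& analytic,
                     const std::vector<double>& numeric, double tol = 1e-6) {
  if (analytic.size() != numeric.size()) return false;
  for (std::size_t i = 0; i < analytic.size(); ++i) {
    double scale = std::max(1.0, std::fabs(numeric[i]));
    if (std::fabs(analytic[i] - numeric[i]) > tol * scale) return false;
  }
  return true;
}
```

A real framework would run a check like `gradients_match(log_sum_exp_grad(x), finite_diff_grad(log_sum_exp, x))` over many functions and inputs, so a refactor or a new analytic gradient gets validated automatically rather than by hand.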

If you want to help

We would really appreciate the help. If you’re looking to help, just reach out. I’ll try to keep a running list of active projects on Math where we could use the help.

Of course, you can go and build new features. That’s necessary and appreciated too.

Ideally, we can start paying down the technical debt so the Math library starts progressing more quickly.


Cross-posted on the Stan forums.