The Future of the Stan Math Library and the Bottlenecks in Development

The Stan Math Library

Currently, we have a stable Math library used to write functions, evaluate those functions at different inputs, and get the gradients of the functions with respect to those inputs. The Math library provides the back end for the Stan language; for every function in Stan, there’s a corresponding Math library function. As far as I know, the Math library is broader in scope than any other C++ library for automatic differentiation (see the arXiv paper). On the technical side, it was designed as a header-only C++ library for non-threaded, non-GPU applications. (It still is… if you don’t use integrate_ode_bdf().) For this purpose, it’s as fast as or faster than every other C++ library for automatic differentiation out there.
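A minimal sketch of what "evaluate a function and get its gradient" means under reverse-mode automatic differentiation. This is a toy written in Python, not the Math library’s C++ API; the `Var` class and the `log` and `grad` helpers are hypothetical names made up for illustration.

```python
import math


class Var:
    """A scalar that remembers how it was computed (hypothetical toy class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.adjoint = 0.0      # d(output)/d(this), filled in by grad()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def log(x):
    """Natural log of a Var, recording d/dx log(x) = 1/x."""
    return Var(math.log(x.value), ((x, 1.0 / x.value),))


def grad(output):
    """Propagate adjoints from the output back to every input."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are always appended before children

    visit(output)
    output.adjoint = 1.0
    for node in reversed(order):  # children first, then their parents
        for parent, partial in node.parents:
            parent.adjoint += partial * node.adjoint


# f(x, y) = x * y + log(x), evaluated at x = 2, y = 3
x, y = Var(2.0), Var(3.0)
f = x * y + log(x)
grad(f)
print(f.value, x.adjoint, y.adjoint)  # value, df/dx = y + 1/x, df/dy = x
```

One forward pass evaluates the function and one backward sweep fills in the partial derivative for every input at once, which is why this style suits functions with many inputs and a single output, such as a log density.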


Slides from "How Stan computes the posterior distribution"

I spoke at the Bayesian Data Analysis meetup in NYC last month. It was an interactive session where we worked through how Stan computes lp__, the evaluation of the log probability used by the MCMC sampler to generate proposals.

Here are links to slides and materials:


Tonight at DTUT, 8-10 PM

I’m back at DTUT (1744 2nd Ave) tonight. 8-10 PM. Come through, hang out.

I’ve been listening to a lot of Dilla lately, so I’ll definitely be playing some stuff by Dilla, some collaborations, and other things influenced by his production. And lots of other stuff too. It’s been busy with the rebrand of our company, Generable, and I’ve been putting in a lot of work.

If you want to hear some live recordings, stream here.


Tools

🔨⚙️🔧

I rely on a lot of tools to get by. Here are some of the ones I stand by.

Statistical modeling

Stan

I’m biased towards Stan, but for good reason. Here are a few reasons that come to mind:

  • It’s currently in active development.
  • Stan is its own language with its own grammar and it makes sense, thanks to Bob Carpenter.
  • There are models that I am interested in that can’t be expressed in other high-level languages (including PyMC3 and TensorFlow).
  • Communication of statistical models is unambiguous. You can contrast that with BUGS / JAGS where it’s unclear what are data and what are parameters.
  • Debugging is pretty straightforward.
  • It’s fast when you need it to be.

Don’t get me wrong: Stan isn’t perfect. There’s still a lot of room for improvement, but it’s the best tool for statistical models that matter at the moment.

Honorable mention

PyMC3. If you’re completely pythonic, this is probably the tool for you.

Software

Text / Code editor

I’ve been using Emacs since 1996. I can’t shake it. I’ve tried moving to other text editors and IDEs. I always come back.


Wednesday at DTUT

I’m DJing at DTUT (1744 2nd Ave) on Wednesday, February 1, from 7 - 10 PM.

It’ll be me and my QFO rocking funk and hip hop for a few hours. These past few weeks have been crazy and awesome:

  • Stan Conference 2017 was amazing. We had over 150 people show up to support the statistical modeling language I’ve been working on for the last 5 years. If you want to see video, check out YouTube.
  • Stan Group, of which I’m the CTO and founder, is in the Techstars NYC Winter 2017 class. We’re out to bring more visibility to Stan.

Hope to see some of you at DTUT on Wednesday.


Recordings Moved

My live mixes are now hosted on SoundCloud. They were all recorded live with no editing. I really need to go back and balance out the sound at some point; levels in a live venue when you’re running your own sound are variable.

Oh yeah, and as with all things live, there are mistakes, and I do get warm about 20 minutes in.

P.S. The old host, Mixcrate, was taken down. Hopefully SoundCloud is stable.


DJing on Thursday, 10/6, at DTUT

I’m DJing at DTUT on Thursday from about 6 - 9 PM. It’s been a while. I’m planning to play funk and stuff from my childhood. It’ll get weird!

Here’s the Facebook event, if you’re into that type of stuff.


Stan as a language

I had a wonderful time teaching Stan in Paris last week. It was a 3-day course focused on pharmacometrics organized by Stan Group and our friends at INSERM, France Mentré and Julie Bertrand.

The day after the course, there was a 1-day workshop: Stan for Pharmacometrics Day. It was phenomenal seeing the different applications of Stan, especially since I hadn’t been involved with any of the projects presented. And yes, Stan is used for modeling the effects of drugs on pediatric cancer.

My language skills

Spending time in another country with limited communication in the local language for an extended period of time got me reflecting on learning Stan as a language.

French

I don’t speak French, but I can get by with context. (The same goes for Spanish and Italian.)

When it’s clear that you don’t speak the local language, it narrows the interaction to a subset of the language. At the boulangeries, it was really limited to me exchanging minimal pleasantries, choosing things that were in the display case, and paying for the order. Fortunately, Paris has wonderful bakeries and I couldn’t really go wrong. But… if I spoke the language, I might have asked what just came out of the oven. Or asked what local specialty they made. Or if they could direct me to the best creperie in the area.

In no way am I fluent with French.

While sitting at dinners overhearing the prosody of conversations all around, I started reflecting on what it means to really understand a language. There are clearly different levels of comprehension. There’s also the difference in expression and comprehension. These ideas also extend to programming languages and Stan.

Programming languages

Programming languages are a different story. I’m fluent in enough of these. The ones I’ll actively claim now are C++, Stan, and R, but I’ve used a few more. More importantly, I’ve seen enough of these to know what to look for when I navigate a different language. As I was in Paris reflecting on the language, it dawned on me that there are similarities between natural languages and programming languages.

Disclaimer: I’m not a languages person. This is just me making observations. I do know Stan, though.

Stan as a language

Stan meets all the criteria of a programming language. The BNF grammar can be found in the user manual. And it’s Turing complete.

But Stan’s not meant to be a general purpose computing language. Its main use is in specifying statistical models.

Comparing languages

At the beginning stages of learning a language or having to use it, it really is all about figuring out how to express what you know in one language in the other. It’s usually a very narrow scope of things you want to express. For me, it’s in a boulangerie figuring out how to ask “could I please have two croissants?”

I don’t evaluate languages based on the initial, narrow scope of things I want to express. We have a lot of users coming from other programs (some not full languages) that are designed to do specific tasks. The difficulty of expressing that initial, narrowly focused query isn’t the right yardstick for comparing languages.

Just like with natural languages, I’m looking for where the language is more expressive, more efficient, and more elegant in describing certain things. These differences are what make languages special and give you a small window into the culture. In French, there are lots of different types of breads with their own names–and all the names are used! I really wish I could read Le Petit Prince and not just transliterate it, but really understand the nuance in how it’s expressed.

With computer languages, it’s really a matter of lining up your particular use case with what the language + implementation is good for. With any Turing complete language, you can theoretically compute whatever you want. In practice, though, you wouldn’t use C++ for a stats graphics library, R for experimenting with lambda calculus, Lisp for fast MCMC code, etc.

There is no one language for everything.

Don’t judge the Stan language based on how many lines it takes to write a toy example

In R, the lmer() function from package lme4 does magic. It’s really a one-liner to fit linear mixed models pretty robustly. An equivalent Stan program may take tens of lines to write. And to make matters worse, that one Stan program won’t have as much flexibility as the formula specification in lmer!

My advice: if you know lme4 implements exactly what you want, use it.

One nice aspect of Stan is that you’re explicit about what you’re describing in the statistical language. Describing the statistical program in math is cleaner, but you lose out on the ability to “run” the math or be unambiguous about notation (practically).

So what can Stan express efficiently?

Stan really excels in expressing statistical models that are just beyond the limits of your favorite package.

The language separates the data, the parameters, and the joint probability distribution function that ties everything together. The original motivation was purely computational, but I think we got the abstraction right. As I’ve been getting more fluent in the language (even though I’ve been with Stan since pre v1.0) and figuring out how to teach it, it’s become clear that this language has the benefit of being able to express almost any statistical model while having the pragmatism of being an actual programming language. It forces the user to be unambiguous and specific about how to compute the log joint probability distribution function in a way that is hidden in many other computational frameworks. In this sense, it clarifies what a statistician may think about a statistical model.
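A minimal sketch of that three-way separation, written in Python rather than Stan so it can be run as is. The model (normally distributed observations with normal priors), the data values, and the names `data`, `params`, `normal_lpdf`, and `log_joint` are assumptions chosen for illustration, not taken from any Stan program.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


# Data: observed and fixed. (Made-up numbers.)
data = {"y": [1.2, 0.8, 1.9, 1.1]}

# Parameters: the unknowns. An inference algorithm proposes values for these.
params = {"mu": 1.0, "sigma": 0.5}


def log_joint(data, params):
    """Log joint density of data and parameters: the piece tying them together."""
    mu, sigma = params["mu"], params["sigma"]
    if sigma <= 0.0:
        return -math.inf  # sigma is constrained to be positive
    lp = normal_lpdf(mu, 0.0, 10.0)     # prior on mu
    lp += normal_lpdf(sigma, 0.0, 5.0)  # half-normal prior on sigma, up to a constant
    for y in data["y"]:
        lp += normal_lpdf(y, mu, sigma)  # likelihood of each observation
    return lp


print(log_joint(data, params))
```

Nothing in `log_joint` says how the parameters will be estimated; it only states, unambiguously, how to score a given set of parameter values against the data.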

Another benefit of the language is that it treats the statistical model as a first-class citizen. Users don’t specify how inference is going to happen. Why does this matter? The machine learning and computer science communities have often blurred the lines between the implementation of inference algorithms and the underlying statistical models and called the joint unit an “algorithm.” (It’s not just these communities – stats is guilty too, especially in the papers that include hacks in derivations of Gibbs samplers.) I understand the appeal of treating the two things as one unit. In practice, a statistical model isn’t useful without an implementation, and an algorithm really can’t be made efficient without knowing the structure of the underlying statistical model. Without an abstraction other than a general computing language (R, Python, Matlab, Julia, C++, etc.), it takes an immense amount of discipline and foresight to implement inference for a statistical model in such a way that the two pieces are completely separated, allowing for replacement of the statistical model or the inference algorithm independently. Approximations are often necessary, and implementations often cut across these boundaries. When you’re only concerned with a single algorithm for a single model, it is much less overhead not to worry about the separation. With Stan, the separation of the statistical model from the inference algorithms is explicit.
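A minimal sketch of what that separation buys you, in Python. The sampler below is a plain random-walk Metropolis algorithm, used only as a stand-in because it fits in a few lines; it is not one of Stan’s algorithms. The function names and the two toy models are hypothetical.

```python
import math
import random


def random_walk_metropolis(log_density, init, n_draws, step=1.0, seed=0):
    """Draw from any one-dimensional density given only its log density."""
    rng = random.Random(seed)
    current, current_lp = init, log_density(init)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


# Two different "models", each just a log density up to an additive constant.
def model_a(theta):
    return -0.5 * theta ** 2  # Normal(0, 1)


def model_b(theta):
    return -0.5 * (theta - 3.0) ** 2  # Normal(3, 1)


# The same inference code runs on either model, unchanged.
draws_a = random_walk_metropolis(model_a, init=0.0, n_draws=20000)
draws_b = random_walk_metropolis(model_b, init=0.0, n_draws=20000)
mean_a = sum(draws_a) / len(draws_a)
mean_b = sum(draws_b) / len(draws_b)
print(mean_a, mean_b)
```

The sampler only ever calls `log_density`, so the model can be replaced without touching the algorithm, and the algorithm can be replaced without touching the model.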

The main purpose of Stan’s language is to specify statistical models and the language is expressive enough to cover a large set of these statistical models.

Evaluating languages

As you’re evaluating a new language, think beyond how to express what you already know. Think about what you want to accomplish in the language and whether it’s easier to express it there. For statistical models that go beyond the package you’re used to using, it’s often easier in Stan than in a general language, but not always.


Stan courses: Alaska in August, Paris in September

I’m teaching two Stan short courses in the next month. We’ve been getting requests to teach these courses more frequently and it’s always good to introduce more people to Stan.

Anchorage. 8/23-24

This is a two-day short course organized by the Alaska chapter of the ASA. I’ll cover Stan and RStan with a slight focus on wildlife examples. For information and registration: ASA Alaska Chapter

Paris. 9/19-21

Michael, Bob, and I are teaching a three-day short course on Stan with a focus on pharmacometrics. Over the past two years, we’ve really extended Stan’s capabilities and now we’re able to specify models with ordinary differential equations (ODEs) in the Stan language. Using an ODE to specify the model is still slower than an analytic solution, but it does allow users to write general compartment models more naturally in the Stan language.
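A minimal sketch of the trade-off between an analytic solution and a numerically solved ODE, in Python. It assumes a one-compartment model with first-order elimination, dC/dt = -k·C, and uses a fixed-step fourth-order Runge-Kutta integrator as a stand-in; this is not the solver Stan uses, and the rate constant, initial concentration, and function names are made up for illustration.

```python
import math


def rk4(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step RK4."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2.0, y + h / 2.0 * k1)
        k3 = f(t + h / 2.0, y + h / 2.0 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return y


k_el = 0.3  # elimination rate constant (hypothetical value)
c0 = 10.0   # initial concentration (hypothetical value)


def one_compartment(t, c):
    """Right-hand side of dC/dt = -k_el * C."""
    return -k_el * c


# Numeric route: 100 steps, four right-hand-side evaluations per step.
numeric = rk4(one_compartment, c0, 0.0, 4.0, n_steps=100)

# Analytic route: a single closed-form evaluation.
analytic = c0 * math.exp(-k_el * 4.0)

print(numeric, analytic)
```

Both routes agree closely here, but the numeric one needs hundreds of function evaluations where the closed form needs one. The payoff is that the numeric route still works when the right-hand side is changed to something with no closed-form solution.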

A lot of this work has been inspired and done by our collaborator Sebastian Weber at Novartis Pharma. He’s really driving a lot of the insights, code, and capabilities.

There’s still time to register for the course.

Paris. 9/22

In addition to the short course, there’s a workshop on Bayesian pharmacometric models. This one’s free! Register.


If you’re going to be at any of these events and want to talk, please reach out. I’m always happy talking Stan or stats in general.


DJing on Wednesday, 7/13, at DTUT

I’m back at DTUT on Wednesday. Here’s the Facebook event. I’ll be spinning funk and random stuff.

And in case you missed the last one, here’s what it sounded like.
