Stan as a language

I had a wonderful time teaching Stan in Paris last week. It was a 3-day course focused on pharmacometrics organized by Stan Group and our friends at INSERM, France Mentré and Julie Bertrand.

The day after the course, there was a 1-day workshop: Stan for Pharmacometrics Day. It was phenomenal seeing the different applications of Stan, especially since I haven’t been involved with any of the projects presented. And yes, Stan is used for modeling the effects of drugs on pediatric cancer.

My language skills

Spending an extended period of time in another country, with limited ability to communicate in the local language, got me reflecting on learning Stan as a language.


I don’t speak French, but I can get by with context. (The same goes for Spanish and Italian.)

When it’s clear that you don’t speak the local language, it narrows the interaction to a subset of the language. At the boulangeries, it was really limited to me exchanging minimal pleasantries, choosing things that were in the display case, and paying for the order. Fortunately, Paris has wonderful bakeries and I couldn’t really go wrong. But… if I spoke the language, I might have asked what just came out of the oven. Or asked what local specialty they made. Or if they could direct me to the best creperie in the area.

In no way am I fluent in French.

While sitting at dinners overhearing the prosody of conversations all around, I started reflecting on what it means to really understand a language. There are clearly different levels of comprehension. There’s also the difference in expression and comprehension. These ideas also extend to programming languages and Stan.

Programming languages

Programming languages are a different story. I’m fluent in enough of these. The ones I’ll actively claim now are C++, Stan, and R, but I’ve used a few more. More importantly, I’ve seen enough of these to know what to look for when I navigate a different language. As I was in Paris reflecting on the language, it dawned on me that there are similarities between natural languages and programming languages.

Disclaimer: I’m not a languages person. This is just me making observations. I do know Stan, though.

Stan as a language

Stan meets all the criteria of a programming language. The BNF grammar can be found in the user manual. And it’s Turing complete.

But Stan’s not meant to be a general purpose computing language. Its main use is in specifying statistical models.

Comparing languages

At the beginning stages of learning a language or having to use it, it really is all about figuring out how to express what you know in one language in the other. It’s usually a very narrow scope of things you want to express. For me, it’s in a boulangerie figuring out how to ask “could I please have two croissants?”

I don’t evaluate languages based on the initial, narrow scope of things I want to express. We have a lot of users coming from other programs (some not full languages) that are designed to do specific tasks. Comparing the difficulty of expressing that initial, narrowly focused query isn’t the right test to use when comparing languages.

Just like natural language, I’m looking for where the language is more expressive, more efficient, and more elegant in describing certain things. These differences are what makes languages special and give you a small window into the culture. In French, there are lots of different types of breads with their own names–and all the names are used! I really wish I could read Le Petit Prince and not just transliterate it, but really understand the nuance in how it’s expressed.

With computer languages, it’s really a matter of lining up your particular use case with what the language + implementation is good for. With any Turing complete language, you can theoretically compute whatever you want. In practice, though, you wouldn’t use C++ for a stats graphics library, R for experimenting with lambda calculus, Lisp for fast MCMC code, etc.

There is no one language for everything.

Don’t judge the Stan language based on how many lines it takes to write a toy example

In R, the lmer() function from package lme4 does magic. It’s really a one-liner to fit linear mixed models pretty robustly. An equivalent Stan program may take tens of lines to write. And to make matters worse, that one Stan program won’t have as much flexibility as the formula specification in lmer!
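To make the line-count gap concrete, here’s a rough sketch of a Stan program for a varying-intercept model along the lines of `lmer(y ~ x + (1 | group))`. The variable names and priors are my illustrative choices, not a canonical translation:

```stan
data {
  int<lower=1> N;                   // observations
  int<lower=1> J;                   // groups
  int<lower=1, upper=J> group[N];   // group membership
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;                       // population intercept
  real beta;                        // slope
  vector[J] u;                      // group-level intercept offsets
  real<lower=0> sigma_u;
  real<lower=0> sigma_y;
}
model {
  u ~ normal(0, sigma_u);
  y ~ normal(alpha + beta * x + u[group], sigma_y);
}
```

That’s around twenty lines for one fixed formula, versus lmer handling any formula you throw at it.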

My advice: if you know lme4 implements exactly what you want, use it.

One nice aspect of Stan is that you’re explicit: what you’re describing is the statistical model. Describing the model in math is cleaner, but you lose the ability to “run” the math or (practically) to be unambiguous about notation.

So what can Stan express efficiently?

Stan really excels in expressing statistical models that are just beyond the limits of your favorite package.

The language separates the data, the parameters, and the joint probability distribution function that ties everything together. The original motivation was purely computational, but I think we got the abstraction right. As I’ve been getting more fluent in the language (even though I’ve been with Stan since pre v1.0) and figuring out how to teach it, it’s become clear that this language has the benefit of being able to express almost any statistical model while having the pragmatism of being an actual programming language. It forces the user to be unambiguous and specific about how to compute the log joint probability distribution function in a way that is hidden in many other computational frameworks. In this sense, it clarifies what a statistician may think about a statistical model.
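A concrete way to see the “log joint probability” framing: every sampling statement in a Stan model block is just an increment to the log density accumulator. As a minimal sketch, these two model blocks define the same posterior (up to a constant):

```stan
model {
  mu ~ normal(0, 10);
  y ~ normal(mu, sigma);
}
// is equivalent, up to an additive constant, to:
model {
  target += normal_lpdf(mu | 0, 10);
  target += normal_lpdf(y | mu, sigma);
}
```

Nothing about the sampler appears anywhere in either version; the program only says how to compute the log joint density.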

Another benefit to the language is that it treats the statistical model as a first-class citizen. Users don’t specify how inference is going to happen. Why does this matter? The machine learning and computer science communities have often blurred the lines between implementations of inference algorithms and the underlying statistical models and called the combined unit an “algorithm.” (It’s not just these communities – stats is guilty too, especially in the papers that include hacks in derivations of Gibbs samplers.) I understand the appeal of treating the two things as one unit. In practice, a statistical model isn’t useful without an implementation, and an algorithm really can’t be made efficient without knowing the structure of the underlying statistical model.

Without an abstraction other than a general computing language (R, Python, Matlab, Julia, C++, etc.), it takes an immense amount of discipline and foresight to implement inference for a statistical model in such a way that the two pieces are completely separated, allowing for replacement of the statistical model or the inference algorithm independently. Approximations are often necessary and implementations often cross these boundaries. When you’re only concerned with a single algorithm for a single model, it is much less overhead not to worry about the separation. With Stan, the separation of the statistical model from inference algorithms is explicit.

The main purpose of Stan’s language is to specify statistical models and the language is expressive enough to cover a large set of these statistical models.

Evaluating languages

As you’re evaluating a new language, think beyond how to express what you already know, but think about what you want to accomplish in the language and whether it’s easier to express it there. For statistical models that go beyond the package you’re used to using, it’s often easier in Stan than in a general language, but not always.


Stan courses: Alaska in August, Paris in September

I’m teaching two Stan short courses in the next month. We’ve been getting requests to teach these courses more frequently and it’s always good to introduce more people to Stan.

Anchorage. 8/23-24

This is a two-day short course organized by the Alaska chapter of the ASA. I’ll cover Stan and RStan with a slight focus on wildlife examples. For information and registration: ASA Alaska Chapter

Paris. 9/19-21

Michael, Bob, and I are teaching a three-day short course on Stan with a focus on pharmacometrics. Over the past two years, we’ve really extended Stan’s capabilities, and now we’re able to specify models with ordinary differential equations (ODEs) in the Stan language. Using an ODE to specify the model is still slower than an analytic solution, but it does allow users to write general compartment models more naturally in the Stan language.
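As a sketch of what that looks like, here’s a one-compartment model with first-order absorption, written with Stan’s built-in ODE integrator. The names, priors, and lognormal observation model are my illustrative choices, and I’m treating drug amounts (not concentrations) as the observed quantity to keep it short:

```stan
functions {
  // State: y[1] = amount in gut, y[2] = amount in central compartment.
  real[] one_cpt(real t, real[] y, real[] theta,
                 real[] x_r, int[] x_i) {
    real dydt[2];
    dydt[1] = -theta[1] * y[1];                   // absorption out of gut
    dydt[2] = theta[1] * y[1] - theta[2] * y[2];  // absorption in, elimination out
    return dydt;
  }
}
data {
  int<lower=1> T;
  real t0;
  real ts[T];              // observation times, all > t0
  real y0[2];              // initial amounts (e.g., dose in gut, 0 in central)
  real<lower=0> y_obs[T];  // observed amounts in central compartment
}
transformed data {
  real x_r[0];             // no real- or int-valued data for the ODE
  int x_i[0];
}
parameters {
  real<lower=0> ka;        // absorption rate
  real<lower=0> ke;        // elimination rate
  real<lower=0> sigma;
}
model {
  real theta[2];
  real y_hat[T, 2];
  theta[1] = ka;
  theta[2] = ke;
  y_hat = integrate_ode_rk45(one_cpt, y0, t0, ts, theta, x_r, x_i);
  ka ~ lognormal(0, 1);
  ke ~ lognormal(0, 1);
  sigma ~ cauchy(0, 1);
  for (t in 1:T)
    y_obs[t] ~ lognormal(log(y_hat[t, 2]), sigma);
}
```

The compartment structure is stated directly as a system of differential equations rather than as a hand-derived analytic solution.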

A lot of this work has been inspired and done by our collaborator Sebastian Weber at Novartis Pharma. He’s really driving a lot of the insights, code, and capabilities.

There’s still time to register for the course.

Paris. 9/22

In addition to the short course, there’s a workshop on Bayesian Pharmacometric models. This one’s free! Register.

If you’re going to be at any of these events and want to talk, please reach out. I’m always happy talking Stan or stats in general.


DJing on Wednesday, 7/13, at DTUT

I’m back at DTUT on Wednesday. Here’s the Facebook event. I’ll be spinning funk and random stuff.

And in case you missed the last one, here’s what it sounded like.


Live from DTUT. 6/15.

On Wednesday, I brought out my QFO LE and set up at my local coffeeshop, DTUT, on the Upper East Side. This mix was recorded live from 8 - 10 pm while drinking some whiskey.

I played mostly funk and soul. I had a lot of fun with it.

P.S. I think this is the first mix of mine that’s floating around the interwebs. It’s a shame cause this one’s pretty sloppy. Some technical reasons for that, but really – I’m out of practice.


Iterating over Statistical Models: NCAA Edition.


On Saturday, April 9, I spoke at the New York R Conference. The slides from my talk are available: 2016-04-ny-r-conference.pdf. Video was recorded and will be posted soon.

My Talk

Naturally, I talked about Stan. The point I wanted to get across was that statistical modeling should be treated as a discipline. On the stan-users list, and in what I know to be common practice, I see people embedding statistical models within scripts. This makes it hard to collaborate, and the software world has figured this out with tools like git.

We collaborate all the time. For the statistical models built for the Machine Madness Kaggle competition, we built models, checked them in, discussed them. If Rob Trangucci wasn’t around, we wouldn’t have competed this year.

The Conference

Jared Lander, Jessica Lin, and the two crews from Lander Analytics and Work-Bench did a great job of organizing the conference. Each speaker was given 20 minutes to talk; no questions. It ran pretty smoothly and I picked up a lot of information about R’s development.

For me, highlights included:

  • JJ’s talk on RStudio’s new features
  • Alp’s talk on ADVI
  • Andrew’s talk on social penumbras
  • Drew’s talk on social aspects of data science
  • Vivian’s talk about evoking emotions through data
  • Bas’s talk on program analysis
  • Josh’s talk on building packages for use at New York Times

A recurring theme this year was testing, which is a big step in the right direction.

For a running commentary of the conference, see #rstatsnyc on twitter.


If you have questions about my talk, feel free to reach out.
