Stan as a language
27 Sep 2016
I had a wonderful time teaching Stan in Paris last week. It was a 3-day course focused on pharmacometrics, organized by Stan Group and our friends at INSERM, France Mentré and Julie Bertrand.
The day after the course, there was a 1-day workshop: Stan for Pharmacometrics Day. It was phenomenal seeing the different applications of Stan, especially since I haven’t been involved with any of the projects that were presented. And yes, Stan is used for modeling the effects of drugs on pediatric cancer.
My language skills
Spending time in another country with limited communication in the local language for an extended period of time got me reflecting on learning Stan as a language.
French
I don’t speak French, but I can get by with context (the same goes for Spanish and Italian).
When it’s clear that you don’t speak the local language, it narrows the interaction to a subset of the language. At the boulangeries, it was really limited to me exchanging minimal pleasantries, choosing things that were in the display case, and paying for the order. Fortunately, Paris has wonderful bakeries and I couldn’t really go wrong. But… if I spoke the language, I might have asked what just came out of the oven. Or asked what local specialty they made. Or if they could direct me to the best creperie in the area.
In no way am I fluent in French.
While sitting at dinners overhearing the prosody of conversations all around, I started reflecting on what it means to really understand a language. There are clearly different levels of comprehension. There’s also a difference between expression and comprehension. These ideas also extend to programming languages and Stan.
Programming languages
Programming languages are a different story. I’m fluent in enough of these. The ones I’ll actively claim now are C++, Stan, and R, but I’ve used a few more. More importantly, I’ve seen enough of these to know what to look for when I navigate a different language. As I was in Paris reflecting on the language, it dawned on me that there are similarities between natural languages and programming languages.
Disclaimer: I’m not a languages person. This is just me making observations. I do know Stan, though.
Stan as a language
Stan meets all the criteria of a programming language. The BNF grammar can be found in the user manual. And it’s Turing complete.
But Stan’s not meant to be a general-purpose computing language. Its main use is in specifying statistical models.
Comparing languages
At the beginning stages of learning a language or having to use it, it really is all about figuring out how to express what you know in one language in the other. It’s usually a very narrow scope of things you want to express. For me, it’s in a boulangerie figuring out how to ask “could I please have two croissants?”
I don’t evaluate languages based on the initial, narrow scope of things I want to express. We have a lot of users coming from other programs (some not full languages) that are designed to do specific tasks. Comparing the difficulty expressing that initial, narrowly focused query isn’t the right task to use when comparing languages.
Just like with natural languages, I’m looking for where the language is more expressive, more efficient, and more elegant in describing certain things. These differences are what make languages special and give you a small window into the culture. In French, there are lots of different types of breads with their own names – and all the names are used! I really wish I could read Le Petit Prince and not just transliterate it, but really understand the nuance in how it’s expressed.
With computer languages, it’s really a matter of lining up your particular use case with what the language + implementation is good for. With any Turing complete language, you can theoretically compute whatever you want. In practice, though, you wouldn’t use C++ for a stats graphics library, R for experimenting with lambda calculus, Lisp for fast MCMC code, etc.
There is no one language for everything.
Don’t judge the Stan language based on how many lines it takes to write a toy example
In R, the lmer() function from package lme4 does magic. It’s really a one-liner to fit linear mixed models pretty robustly. An equivalent Stan program may take tens of lines to write. And to make matters worse, that one Stan program won’t have as much flexibility as the formula specification in lmer!
My advice: if you know lme4 implements exactly what you want, use it.
One nice aspect of Stan is that you’re explicit about what you’re describing in the statistical language. Describing the statistical model in math is cleaner, but you lose the ability to “run” the math or to be unambiguous about notation (practically).
So what can Stan express efficiently?
Stan really excels in expressing statistical models that are just beyond the limits of your favorite package.
The language separates the data, the parameters, and the joint probability distribution function that ties everything together. The original motivation was purely computational, but I think we got the abstraction right. As I’ve been getting more fluent in the language (even though I’ve been with Stan since pre v1.0) and figuring out how to teach it, it’s become clear that this language has the benefit of being able to express almost any statistical model while having the pragmatism of being an actual programming language. It forces the user to be unambiguous and specific about how to compute the log joint probability distribution function in a way that is hidden in many other computational frameworks. In this sense, it clarifies what a statistician may think about a statistical model.
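To make the three pieces concrete, here is a rough sketch in plain Python rather than Stan. Everything in it is an assumption made for illustration: the toy linear regression, the priors, the numbers in `data`, and the helper names `normal_lpdf` and `log_joint` (the first is a hand-written helper, not a call into Stan). The only point is the shape: data in one place, parameters in another, and a single function that says exactly how the log joint density is computed.

```python
import math

# "Data": observed quantities, fixed for the whole analysis (made-up numbers).
data = {"x": [0.0, 1.0, 2.0, 3.0], "y": [0.1, 0.9, 2.1, 2.9]}


def normal_lpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_joint(params, data):
    """Log joint density of the data and the parameters (alpha, beta, sigma)."""
    alpha, beta, sigma = params["alpha"], params["beta"], params["sigma"]
    if sigma <= 0.0:
        # sigma is constrained to be positive; outside the support the density is zero.
        return -math.inf
    lp = 0.0
    # Priors (hypothetical choices for this sketch).
    lp += normal_lpdf(alpha, 0.0, 10.0)
    lp += normal_lpdf(beta, 0.0, 10.0)
    lp += normal_lpdf(sigma, 0.0, 5.0)  # half-normal, up to an additive constant
    # Likelihood: y_n ~ Normal(alpha + beta * x_n, sigma).
    for x_n, y_n in zip(data["x"], data["y"]):
        lp += normal_lpdf(y_n, alpha + beta * x_n, sigma)
    return lp


# "Parameters": unknowns that an inference algorithm would propose values for.
params = {"alpha": 0.0, "beta": 1.0, "sigma": 1.0}
print(log_joint(params, data))
```

Nothing here says how inference will happen. The function only answers one question: for these parameter values and this data, what is the log joint density?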
Another benefit of the language is that it treats the statistical model as a first-class citizen. Users don’t specify how inference is going to happen. Why does this matter? The machine learning and computer science communities have often blurred the lines between implementation of inference algorithms and the underlying statistical models and called the joint unit an “algorithm.” (It’s not just these communities – stats is guilty too, especially in the papers that include hacks in derivations of Gibbs samplers.) I understand the appeal of treating the two things as one unit. In practice, a statistical model isn’t useful without an implementation, and an algorithm really can’t be made efficient without knowing the structure of the underlying statistical model. Without an abstraction other than a general computing language (R, Python, Matlab, Julia, C++, etc.), it takes an immense amount of discipline and foresight to implement inference for a statistical model in such a way that the two pieces are completely separated, allowing for replacement of the statistical model or the inference algorithm independently. Approximations are often necessary and implementations often cut across these boundaries. When only concerned with a single algorithm for a single model, it is much less overhead not to worry about the separation. With Stan, the separation of the statistical model from inference algorithms is explicit.
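Here is a small Python sketch of what that separation buys you. It is an illustration under stated assumptions, not how Stan works internally: the sampler is a basic random-walk Metropolis, swapped in for brevity, and the names `metropolis`, `model_a`, and `model_b` are hypothetical. The sampler only ever sees a function that returns a log density, so either side can be replaced without touching the other.

```python
import math
import random


def metropolis(log_density, init, n_draws, step=0.5, seed=0):
    """Random-walk Metropolis. Knows nothing about the model except log_density."""
    rng = random.Random(seed)
    theta = list(init)
    lp = log_density(theta)
    draws = []
    for _ in range(n_draws):
        # Propose a Gaussian perturbation of the current point.
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        lp_prop = log_density(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        draws.append(list(theta))
    return draws


def model_a(theta):
    """Hypothetical model A: standard normal on one parameter (unnormalized)."""
    return -0.5 * theta[0] ** 2


def model_b(theta):
    """Hypothetical model B: normal centred at 3 with sd 0.5 (unnormalized)."""
    return -0.5 * ((theta[0] - 3.0) / 0.5) ** 2


# Same inference code, two different models.
draws_a = metropolis(model_a, [0.0], 5000)
draws_b = metropolis(model_b, [0.0], 5000)
mean_a = sum(d[0] for d in draws_a) / len(draws_a)
mean_b = sum(d[0] for d in draws_b) / len(draws_b)
print(mean_a, mean_b)
```

Replacing `metropolis` with a better algorithm would not require changing `model_a` or `model_b`, and writing a new model would not require changing the sampler. Keeping that boundary clean by hand in a general-purpose language is the discipline described above.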
The main purpose of Stan’s language is to specify statistical models, and the language is expressive enough to cover a large set of them.
Evaluating languages
As you’re evaluating a new language, think beyond how to express what you already know. Think about what you want to accomplish in the language and whether it’s easier to express it there. For statistical models that go beyond the package you’re used to using, it’s often easier in Stan than in a general language, but not always.