Continuing from my more contemplative last post, it's finally time to talk about Quint! In the last couple weeks I've been writing a lot of Quint, and I've been really enjoying it.
It fulfills a big need that I used to satisfy by just starting to code, not realizing that there was a better option. I can't say yet whether it will "transform" my programming, and maybe it will fall by the wayside eventually, but whatever comes of it long term, I'm really excited about it now.
I'm apparently in deep-dive mode this week so get ready.
While my previous post touches on some of the backstory leading up to my discovery of Quint, the reason I looked into it was quite simple: I saw that I might need to write a distributed algorithm, and I wasn't confident I could do it without some way to check my work.
If you want more background on what formal methods are and the different categories of them I've discovered so far, you can check out "What Are Formal Methods" below. It was just a section of this blog before I moved it to a separate page since this post was already super long. 😄
What Are Formal Methods?
Quint is one of several different tools that can help you apply formal methods to software development. Wikipedia has a pretty good short description of "formal methods":
In computer science, formal methods are mathematically rigorous techniques for the specification, development, analysis, and verification of software and hardware systems. The use of formal methods for software and hardware design is motivated by the expectation that, as in other engineering disciplines, performing appropriate mathematical analysis can contribute to the reliability and robustness of a design.
Essentially they are ways to use the power of math to get strong guarantees about how your software behaves. They are often used for checking the safety or liveness properties of a system.
An example of a safety property could be something like "it is impossible for two servers to both think that they are the leader at the same time". A liveness property could be something like "as long as messages are still being sent and received, it's impossible for there to be a deadlock, and the system will still make progress".
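As a rough sketch of what a safety property like that can look like in a tool like Quint (the module, node names, and actions here are invented for illustration, not from any real spec):

```
module leaderElection {
  // Three hypothetical servers; all names are illustrative.
  val nodes = Set("n1", "n2", "n3")

  // The set of servers that currently believe they are the leader.
  var leaders: Set[str]

  action init = leaders' = Set()

  // Any node may claim leadership. This is deliberately buggy,
  // so that the checker has something to find.
  action step = {
    nondet n = nodes.oneOf()
    leaders' = leaders.union(Set(n))
  }

  // Safety property: no two servers think they are the leader at once.
  val atMostOneLeader = leaders.size() <= 1
}
```

A model checker asked to verify `atMostOneLeader` on something like this should come back with a counterexample trace showing two claim steps in a row.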
These kinds of properties are extremely important in critical production systems.
There is a whole spectrum of formal methods. With all the different strategies, it's actually quite confusing to figure out what each tool is doing and what its pros and cons are.
I definitely haven't gotten it all figured out, but I think I've got the general idea behind a lot of the options now so I'll try to give an overview.
Proof Assistants
The most powerful assertion that we can get from formal methods is a proof of some property. Proofs are also, by far, the hardest kind of assertion to craft.
It's difficult to understand what writing a proof is like until you do it. In a funny way, it's almost like literally walking through every possible situation and showing the computer, "look, this is OK because of that", until you have left no possibility unjustified.
The interesting thing is that when you do this recursively, you can potentially prove things about an infinite number of possibilities, even with a finite proof. This ability to deal with infinity can be an advantage over model checkers (though model checkers still have a trick up their sleeve).
Tools like Lean, Rocq, Idris, and Agda are proof assistants. They are like special programming languages that let you model your system or algorithm, and then write and validate proofs about it.
I've tried Lean and Rocq and Lean is by far my favorite, though Rocq is still cool. Lean is actually a pretty nice programming language and can compile to C code too, which is very useful and not normal for proof assistants.
Most of the time you will have to model and prove your algorithm in one language, and then write the actual implementation in another language, but that's not necessarily the case with Lean.
Again, the big caveat with proof assistants is that writing proofs is extremely difficult and time consuming. A proof is often 15-20x larger than the code for the algorithm itself!
There has been talk of AI making it much easier to write proofs. And unlike lots of other AI programming ideas, having an AI write a proof for a property I want to check would be genuinely useful.
I'm still not quite ready to depend on AI to write my proofs as a general-purpose strategy for software verification. If it works, great, but I'll still need other techniques available.
Model Checkers
Model checkers are quite a bit different than proof assistants. When using a model checker you are often defining a state machine in some language like TLA⁺ or Quint, and the model checker will go try as many different states as it can and validate your properties in those states.
This is essentially a kind of brute force test. While a proof can be checked relatively quickly after it's been written, model checking can take a lot of time, CPU, and memory to try every single state, or a significant portion of them. Even still, the trade-off is often worth it, because the model checker can automatically find bugs in your system, without you having to write a detailed proof that your system is correct. This can also give you a much quicker feedback loop while you are actually developing the system.
TLC and Apalache are both model checkers that check TLA⁺. Quint compiles to TLA⁺ so you can use it with both of them. FizzBee is another cool model checker that lets you write models in an easy Python dialect.
As far as I've seen, model checkers are by far the most commonly used formal methods for real software projects. AWS in particular is big on using TLA⁺, and I've seen TLA⁺ come up a lot in distributed systems / database design.
As I mentioned earlier, proofs have an advantage over model checkers because they can deal with infinity without taking an infinite amount of time. Adding just a couple of rounds to a distributed system model can cause massive growth in the number of states that need to be checked, potentially making it infeasible to check even 5 rounds compared to 3, for example[0].
But there is a clever trick that model checkers can use called an inductive invariant. If you can prove that
Your model is correct for every possible starting state, and
Every state you can get to from a correct state is also correct
then you know that every reachable state, no matter how many steps it took to get there, has to be correct.
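A minimal sketch of the idea, using an invented counter model: the invariant `n >= 0` is inductive, because it holds in the initial state and any single step from a state where it holds lands in a state where it still holds.

```
module counter {
  var n: int

  action init = n' = 0
  action step = n' = n + 1

  // Inductive invariant: true initially, and preserved by every step,
  // so it holds in every reachable state, no matter how many steps.
  val inv = n >= 0
}
```

The payoff is that a checker can verify the two conditions separately: check that `init` satisfies `inv`, and then check that one `step` starting from an arbitrary state satisfying `inv` lands back in `inv`. That avoids exploring the whole reachable state space.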
Writing an inductive invariant can take creativity, similar to a proof, and it can still take lots of time and memory, like model checking. It's no silver bullet, but it can be very useful and help you understand why your system is correct. I still need to get a better understanding of this. I think this is a good article, but I haven't properly read through it yet:
Constraint Solvers & Optimizers
Constraint solvers & optimizers are also interesting formal methods. MiniZinc, Z3, and SMT / SAT solvers in general all fall into this category.
These tools are a kind of equation solver. They can be used to find answers to equations of a particular kind, prove that there is no answer, or, sometimes, find the answer that results in the largest or smallest value of another variable.
I'm honestly still wrapping my head around how these work and what they can do. I haven't found a direct use-case in software development for them yet, but they still count as formal methods and I was trying to figure out how they fit into things earlier, so it seemed worth including them here.
As far as I can tell, they aren't good for software modeling by themselves, but they can be very useful when combined as a part of a bigger tool. For example, Z3 is used heavily in the Apalache model checker to find violations of your system properties.
If you want to play with these kinds of solvers, I recommend trying out MiniZinc for something user friendly. It's pretty cool, and Z3 looks like a good choice if you want to embed a solver in a program instead of just writing models by hand.
Settling on Quint
Quint definitely wasn't the first thing I tried.
Trying Out FizzBee
Before I settled on Quint I had actually just found FizzBee and thought that it was really cool.
What I like about FizzBee is that it's extremely easy to get your idea out in the language if you are a programmer. It just easily reflects the way you think about code, and I think that can make formal model checking more approachable to a lot of people.
I was really excited about it, started learning how Paxos[1] worked, and then implemented a FizzBee model for it.
After I finished my first Paxos implementation with FizzBee, I realized that I had no idea how long model checking could take. And because FizzBee works by trying to reach every single possible state to validate that your properties hold, it couldn't tell me how close it was to finishing.
In FizzBee it's also easier than in, for example, TLA⁺ to automatically inject faults throughout your implementation. That can be good because it lets you test more possible failures, but it also multiplies the states you have to check. I didn't really have any idea how many states I had, or whether I needed to simplify my model or reduce the number of actions so that the check would finish. It was bugging me that I had no idea whether I should expect to wait hours, days, or weeks for it to finish.
After looking around, I found that it can be normal for these things to take days or weeks to check larger models, and I wasn't so sure this was a good idea anymore. I had imagined that it should only take minutes and that model checking was a way to get quick feedback, but that's not exactly what I was seeing.
I hadn't yet grasped the enormity of the state spaces that can exist in software. For example, it would take you 585 years just to go through every possible state of a single 64-bit integer, if you were trying 1 billion states per second. And that is for one, single number! 😲
No wonder writing software is so hard.
Ruling Out Lean
That's when I was like, well… can I just write proofs instead? How hard is it really to write a proof? A proof can just be checked without trying every single combination.
So I started working on a Lean proof for the "Die Hard Jug Problem"[2]. I modeled the scenario and after lots of effort wrote a proof that it was impossible to fill the 3 gallon jug with more than 3 gallons of water.
That should be kind of obvious. I intentionally chose an easy proof, but even writing a proof for that was very time consuming. And there's hardly any logic going on there!
As cool as Lean was, writing proofs was as hard as people make it out to be, so that wasn't going to work.
Apalache
That's when I was thinking back to Apalache, which I had found through Quint a while ago. Apalache was interesting because it had a different way of checking through all the states than FizzBee and TLC.
It meant that Apalache could often find a bug faster than TLC, even though it could still take a long time to prove that there were no bugs. At this point that sounded good to me, and probably an advantage over FizzBee.
This is a good comparison between the strategies of TLC and Apalache:
I was realizing by now that formal methods were more nuanced than I had previously thought and I wanted to make sure I checked out the alternatives before settling on anything.
TLA⁺
At this point I started looking into TLA⁺, which was the language used by TLC and Apalache.
It was invented by Leslie Lamport who has earned a lot of respect from me over the course of this investigation.
I had tried to learn TLA⁺ a couple of times before, but didn't really have the time, and it was foreign enough not to really stick. But this time around, I was watching videos of Lamport and he was saying all of these things along the lines of what I had been thinking about for the last two weeks.
One of my favorite things I heard him say was in this talk:
Coding should be the easiest, least important part of programming.
If you're having trouble writing a piece of code you're doing something wrong.
And what you're doing wrong is thinking at the code level and trying to solve the problem that you should be trying to solve above the code level.
Lamport is always pointing out the importance of abstraction: learning to ignore the details that are unnecessary for understanding and reasoning about a particular problem or solution. This was exactly what I was realizing as I thought about the last year of Roomy development.
What Lamport was saying, though, was that having something like TLA⁺, where you can write down a formal description of your algorithm that is not a programming language, is actually a really important tool for thought.
TLA⁺ gives you something that is not ambiguous English, but also not "real code".
"Real code" needs to be executable, and it needs to deal with all the nuances of the real world, but the fact of the matter is that our algorithms don't deal with most of those details. We make algorithms that can ignore, or abstract away, all of these complicating nuances, and focus only on what is necessary for a particular goal.
Our code can't be right if our algorithm is wrong, so we need to make sure we carefully design and test our algorithms, and that is a separate concern from writing the code.
Anyway, I was really loving the ideas and philosophy I was seeing.
Quint
Now, the TLA⁺ syntax and tooling are pretty dated-feeling and a bit annoying to use. Its capabilities are mostly spot on, but there is a desire for a more modern iteration on it. That's where Quint comes in.
Quint is a language that compiles to TLA⁺ and comes with modern tooling, as well as integrations with Apalache and TLC.
It is a thoughtfully designed language that really seems to have made a beneficial iteration on TLA⁺. The syntax is more familiar to programmers, but keeps to the same core concepts underlying TLA⁺. In my opinion, they did this just right.
Quint also comes with a simulator, a REPL, and a type checker. These are all incredibly useful.
The simulator lets you run thousands of random tests per second to check your model. This doesn't give you formal guarantees of having checked every possible execution for a certain number of steps, but it is a fast sanity check. It is great for finding instances where your model does something you might be looking for, good or bad.
The REPL lets you execute steps and explore state interactively, which is great for trying things out and understanding your system.
The type checker is super appreciated and makes it much easier to verify that your spec itself doesn't have bugs or oversights.
It even has "Runs" which can be used like unit tests and let you execute particular scenarios to make sure things behave like you expected. This is something that you don't have in TLA⁺ and it has a lot of practical value.
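For a sense of what a run looks like, here's a sketch on an invented toy counter model (names are illustrative):

```
module counterTest {
  var n: int

  action init = n' = 0
  action step = n' = n + 1

  // A "run" executes one specific scenario, like a unit test,
  // and checks the outcome with expect.
  run twoStepsTest = init.then(step).then(step).expect(n == 2)
}
```

Runs like this are executed with `quint test`, giving you quick, deterministic checks alongside the simulator and model checker.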
Using Quint for Roomy
As soon as I settled on Quint I started trying it out for anything I could. When I started, we were in the middle of evaluating existing private data systems for ATProto and I used a Quint file like a notepad just to put which features each supported, and which features we needed. Then I used the set datatype to calculate the features that we needed that were missing from all the existing options.
That wasn't really that useful, but it did help me start to get a grip on the Quint syntax and data types.
Specifying the Arbiter
The first real use-case I had for Quint was specifying the arbiter that we plan on using for Roomy.
I think diagrams are often a good first step, kind of like a pre-specification, so I started with that.
This had more than the arbiter in it, because we were trying to plan out how Roomy would work with the permissioned data proposal. After iterating over the diagram it was easier to tell what data the arbiter was going to need.
Then I started working on the Quint specification.
As I started adding the data types and state that we would need I realized something awesome. Quint was letting me do one of my favorite things that I tended to do in Rust. It let me get all the different kinds of data I would need out on the table so I could start to put them together.
Rust is great for this because it has a powerful type system, but it's also not as simple and abstract as Quint. Quint doesn't make me think about whether I forgot to derive Clone or PartialEq on my Rust structs so that I can put them in a HashMap. Quint lets me focus just on the actual logic.
This is incredibly exciting to me! There are so many times in the past where to really understand a problem, I needed to start writing it down, unambiguously, to find all the holes in my thinking. The best option I had at the time was "normal" programming code, preferably in Rust, because it had great type checking and validation that would help me find flaws in my thinking earlier.
That is what Quint was letting me do, but it was designed specifically for this kind of use case. I could model abstract problems more easily and with less typing and distraction. It was no more complicated than necessary. In fact, it has serious limitations that really helped to make sure I understood the problem.
So far, I think this has saved me days, at least, working on the arbiter. The recursive role resolution in the arbiter is surprisingly nuanced. There were access levels I thought I needed that I actually didn't, and then there were ones that I didn't think of that I found we needed later. The only way to find all of these nuances is to work through the whole problem.
What data do you need? What are all the things you can do? You need to actually walk through it to really know. Quint makes that easy to do, while almost forcing you to think about it all.
Iterating on the Quint specification is quite fast and easy. You are able to ignore things that programming languages don't let you ignore, freeing you to think about the problem. You can catch errors earlier and fix them quicker.
Model Checking Isn't the Only Advantage
It's interesting to note that all the advantages I've experienced so far have not been from having a model checker. I've been using the simulator and the REPL mostly to explore how the system behaves.
I'll run the simulator and it will try a bunch of different random actions and show me how the state of the system changed with each action. This alone has been handy because you can often see a weird behavior while you're in the middle of iterating on the spec. Or maybe it will find a crash because you accessed a record that didn't exist or something.
I can also run the REPL and interactively call specific actions to see how things work out. Though just running the simulator and getting a random run was actually how I ended up doing things most of the time. It's surprisingly handy.
The model checker is mostly used when you need to make sure some crucial property is satisfied for every possible state of the system. For the arbiter, so far, most of the properties that I need are hard to express as invariants over the state of the system. There are a few, like that the $admin role always exists and there is always at least one Owner.
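Sketching what invariants of that kind might look like (this is invented for illustration, not the real arbiter spec):

```
module arbiterSketch {
  // Illustrative state only -- not the actual Roomy data model.
  var roles: Set[str]
  var owners: Set[str]

  action init = all {
    roles' = Set("$admin"),
    owners' = Set("alice")
  }

  // Stub step that leaves the state unchanged.
  action step = all { roles' = roles, owners' = owners }

  // The kinds of invariants mentioned above:
  val adminRoleExists = roles.contains("$admin")
  val atLeastOneOwner = owners.size() >= 1
}
```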
But most of the properties that I need will probably end up in test cases that run example scenarios and check that users get the access that we want them to have in those scenarios. The reason for this is that most of the "rules" are built around our "human" understanding of "in this situation, I want this to happen", unlike a distributed consensus algorithm, where we say, "we need to make sure there are never two nodes thinking they are leaders in any possible state".
Forcing You to Think About Async State
One of the most interesting things to me has been something that Quint inherits from TLA⁺. Models are atomic state machines made up of two things:
Global state
The actions that can change that state
What this does is force you to think about async execution and states that you didn't even know you had.
For example, in JavaScript we might do something like this:
const resp = await fetch('https://bsky.app')

That's pretty simple, right? We make a web request and wait for the result. You aren't allowed to do this in Quint.
All actions must be atomic and synchronous. So what does that mean for a web fetch? It means that we have to have a new state where we are waiting for that web request.
This is a real-life state that you have in your app, but you aren't thinking about it! Every single time you type await (in JavaScript, really every time you call a function involving promises) you are introducing a new state, and you no longer know what will happen next or in what order.
Quint makes you track this and be explicit. What state does your algorithm actually need? This is stuff you can't miss when designing distributed systems. And honestly, in Roomy's frontend, we have suffered a lot of async JavaScript code that really was hard to keep track of. As soon as you deal with async code you are dealing with a concurrent system that can quickly get difficult to reason about.
Being forced to think about these things makes you want to reduce the amount of state and the number of async tasks as much as absolutely possible. This is great, and similar to how Rust's borrow checker pushes you to simplify and write code that is easier to make reliable.
Making you explicitly define the "waiting" state also makes you realize how there are all these other things that might happen while you are in the middle of waiting for something. The request might time out, it might come back OK, it might come back with an error, etc.
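Here's roughly how that fetch might look as an explicit state machine in Quint. This is a sketch with invented names, but it shows how the "waiting" state, and everything that can happen while you're in it, becomes visible:

```
module fetchStates {
  // The status of the request becomes explicit, visible state,
  // instead of being hidden inside an `await`.
  type Status = Idle | Waiting | Done(str) | Failed

  var status: Status

  action init = status' = Idle

  action send = all { status == Idle, status' = Waiting }

  // While Waiting, any of these can happen, and we must model each one:
  action succeed = all { status == Waiting, status' = Done("response body") }
  action fail    = all { status == Waiting, status' = Failed }
  action timeout = all { status == Waiting, status' = Failed }

  action step = any { send, succeed, fail, timeout }
}
```

The model checker will explore every interleaving of these actions, which is exactly the kind of ordering question that async code quietly hides.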
I'm also considering writing the Rust implementation of the arbiter in this pure, synchronous state machine style. I could wrap the synchronous implementation in nice async Rust functions if I wanted to.
But it's actually really cool to me how portable a synchronous state machine is. I could compile it to WASM and wrap it in JS in the browser to have a hardened arbiter core that has all the guarantees of my Quint model. I could do the same with a C API to make it usable from practically any other programming language, too.
The Quint state machine has essentially forced me to make the most minimal, portable API I possibly could.
Model-based Testing & Trace Validation
Once you've written, tested, and verified your model, you still have to make an actual implementation. And if your implementation is separate from your model, then you want to make sure it matches.
There are two cool options for making sure your implementation matches the spec: model-based testing and trace validation.
In model-based testing you use your model to generate potentially thousands of test cases and you run those against your implementation. The tests will make sure that your implementation returns the same values as the Quint specification, with the test harness orchestrating and converting data types for your implementation as necessary[3].
With trace validation you record state transitions in production, and export those as traces. You can then use the Quint simulator to make sure that the transitions in the traces match the specification. This allows you to catch cases where your production deployment violates an expected state transition, and comes with the added benefit of tracking all the states that led up to the failure, too.
These strategies are similar in some ways to Fuzz testing, but can be far more powerful because they focus on entire behaviors, not just random inputs.
I've experimented a little bit with model-based testing already, and I will get more into it as soon as I start writing the Rust implementation of the arbiter.
Quint & AI
As I was learning Quint an interesting idea hit me. This "spec driven development" idea that some people have about AI really doesn't work, in my opinion, because of two major flaws:
English specifications are ambiguous and have gaps. Every gap will be filled in by the AI and will depend on its judgment.
When writing such an ambiguous spec, you aren't required to formalize your thinking, so you aren't given a chance to let the spec change your own ideas. It is a fully top-down process, like the waterfall model, which has been shown to fail time and time again.
But if your specification was written in Quint, both of those issues largely go away. Thinking back to Lamport's "coding should be easy", what if we made sure that the coding was easy, and then only asked the LLMs to do the easy step?
To be completely frank, so far, AI coding has not caught on for me. I don't really like it. A huge part of that is that I need to write! I need to let the code change me as much as I need to change the code. That can't happen if I don't write.
I have to know when the coding is hard, because that means that my abstractions are wrong. If I don't write the code I can't feel the problem and let it shape my understanding.
If I am writing Quint, I get to write. I get to reason. The most important part of programming will probably happen in Quint at this point. Coding should be easy. Programming remains hard, and I still need to be writing to do that hard part.
Another Lamport quote:
“If you’re thinking without writing, you only think you’re thinking.”
Programmers need to write. They need to think! Otherwise, despite what you might feel like, there is a chance that the AI is doing the thinking for you.
Now, truly, I am not criticizing people who use AI to code. Like really, I'm not.
This is a personal perspective, and using AI appropriately for coding is a personal thing. It's up to what you think, what your goals are, and what your workflows are[4]. I'm not trying to challenge anyone's practices.
But for me personally, my entire growth as an engineer has been built on top of doing the things. It's why I have the skills I have today. It's something I'm very precious about, and I'm protecting that.
This isn't about typing code. It's about truly understanding.
But again, back to Quint, it's letting me do the real programming, and it's the first time I've thought to myself that maybe this is actually a healthy way to integrate AI more deeply into my development workflow. I write the Quint, I learn from the problem, and at that point I'm OK if the AI does the easy part, and I will have automated tests that ensure its behavior matches my specification.
In fact, in this last week, after I started using Quint, they announced that Quint now has its own company that is spinning up and will provide for this kind of use-case:
The timing is uncanny. They basically read my mind. 🙂 I'm really excited for them, and hope that they get more resources to help make Quint better, whether you're using AI or not.
I'm not sure whether or not this will "unlock" AI dev for me. There's still things I'm hesitant about, but I will try it out.
Making Roomy Bulletproof
We've got a lot of new plans taking shape for Roomy, and it's very important to me that it come out reliable and functional. The reliability isn't just about keeping it working, but keeping it working while we change it.
I've really only barely started using Quint, but I haven't enjoyed programming this much in a while. That might be partially 'cause I've spent a year writing tons of TypeScript running in browsers. 😆
Maybe Quint is not as big a deal as it feels like right now. Feelings come and go. But from what I can tell so far, it may be one of the most important discoveries that I've made in a while.
AI coding, for better or worse, is also kind of forcing people to start using these kinds of tools, so it's kind of nice to feel a little ahead of the curve on that maybe.
Either way, I'm so excited for the future of Roomy! Now that I've written over 11k words in blog posts about it, it's time to get back to it. See you 'round the web!
Oh, and if you're interested, here's the work-in-progress Quint specification for the Roomy arbiter!
Learning Resources
As a bonus, here are some learning resources to check out if you want to look more into things!
This blog post is a great overview and gives a sense of the techniques and timings that might be involved in model checking. There's also lots of other cool Quint and formal methods stuff on that site, so you might want to browse through it.
The Apalache model checker site has some good links.
The TLA⁺ website, with Lamport's video courses. They have some good justification early on for why this kind of thing is useful.
I just found this SMT solver tutorial while writing this post and I think I'll go through it later.