PyTorch Developer Podcast

Dynamo - VariableTracker

Episode Summary

I talk about VariableTracker in Dynamo. VariableTracker is Dynamo's representation of the Python heap. I talk about some recent changes, namely eager guards and mutable VT. I also tell you how to find the functionality you care about in VariableTracker (https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6).

Episode Notes

Episode Transcription

Hello everyone, and welcome to the PyTorch Dev podcast.

Today, I want to talk about variable trackers in Dynamo.

What is a variable tracker? Well, to explain the concept, we first have to think about what Dynamo is trying to do. Dynamo takes your Python program and, without actually running it, simulates the execution of every single operation that happens in your program, so that it can find where all the torch operations happen, put them in a graph, and send them off to the rest of the PyTorch compiler, so that we can compile them into efficient code.

So in order to do this, we need to run the code, but without actually running it. Depending on how complicated your program is, that may require us to do a lot of stuff. Let's say that within your model, you're creating a dictionary, putting things into it, and taking things out. In order for us to step through every line of code in this situation, we have to model this dictionary in some way. But we can't just use an actual dictionary. Well, actually, sometimes we can, but sometimes these data structures have side effects: they actually do things, like printing to your terminal or writing to other objects. So in a lot of situations we can't run operations on the actual data structures. What we need to do instead is maintain a parallel universe in Dynamo, a sort of Dynamo-ified copy of all the state in your Python heap, which we can go ahead and do operations on.

For example, if you have a global dictionary, and inside your model code you're incrementing a counter on it, then when we symbolically evaluate the model to extract a Dynamo graph, we can't actually mutate that global dictionary. We have to do it in our local universe, that is, our simulation. Only when we are done do we emit some actual code that replays this effect onto the real object.
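
To make the "local universe" idea concrete, here is a minimal Python sketch. It is not Dynamo's actual mechanism, and `ShadowDict` is a hypothetical name: it keeps a private copy of a real dict, applies writes to the copy during "tracing", logs each write, and only replays the log onto the real dict afterwards.

```python
from typing import Any, Callable, Dict, List


class ShadowDict:
    """Toy stand-in for a tracked dict (hypothetical, not Dynamo's API)."""

    def __init__(self, real: Dict[Any, Any]) -> None:
        self.real = real
        self.local = dict(real)  # private copy; tracing only touches this
        self.pending: List[Callable[[], None]] = []  # effects to replay later

    def getitem(self, key: Any) -> Any:
        return self.local[key]

    def setitem(self, key: Any, value: Any) -> None:
        self.local[key] = value
        # Log the write instead of performing it on the real dict.
        self.pending.append(lambda k=key, v=value: self.real.__setitem__(k, v))

    def replay(self) -> None:
        """Apply the logged writes to the real dict, in order."""
        for effect in self.pending:
            effect()
        self.pending.clear()


counters = {"calls": 0}


def model_step(d: ShadowDict) -> None:
    # The "model code": increments a counter on a global dict.
    d.setitem("calls", d.getitem("calls") + 1)


shadow = ShadowDict(counters)
model_step(shadow)
before_replay = counters["calls"]  # still 0: the real dict is untouched
shadow.replay()
after_replay = counters["calls"]   # now 1
```

Logging closures rather than just copying the final values back keeps the order of effects, which matters once several objects are mutated.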

So variable trackers are essentially our way of representing the Python heap in a form that Dynamo can work with and do analysis on, without actually having to touch the real Python values.

So if you're working on the Dynamo code base and you're thinking, hey, where exactly should I implement the logic for updating some state when I do some operation, or how do I model some data structure that someone has written that needs special handling? Chances are, it's going to live in a variable tracker, one way or another. So variable trackers have a lot of purposes, right? They encapsulate state that we can't get our hands on just by looking at bytecode.

Most frequently, that's because the structure is implemented in C. Like, you know, if you're talking about something like a dictionary or a set from CPython, there is no quote unquote Python implementation, right? It's natively provided by CPython. Anything that is natively provided isn't implemented as Python bytecode, so we have to implement it by hand inside Dynamo, basically replicating the logic that lives inside the C code, because we're not tracing through the C code. The C code is opaque; there's no way for us to look inside it.

If you have a user-defined class, and that class is written entirely in Python, chances are we don't have to write a variable tracker for it. In that case, the object is built out of simpler primitives, which we have written variable trackers for, and all the operations, the constructor, the accessors, and so on, are Python bytecode, so we can just step through them in the normal way.

Okay, so that's what variable trackers are in a nutshell. What are some things that are useful to know about them? We've actually made some changes recently to variable trackers. The PyTorch 2 paper is coming out soon at ASPLOS, but it describes the old version of how variable trackers work, so I think this podcast is going to be one of the first places, besides GitHub issues, where we actually say what has changed.

So there are two big things that I want to convey. The first is that variable trackers are in charge of guards. Remember, a guard is how we say: this symbolic evaluation that we did requires certain aspects of the Python heap to be a certain way. Say I do a conditional on a Boolean and it goes down one path the first time. The next time I run, I need to make sure that Boolean has the same value, because if it changes to false, I'll go down a different path, and that's not the path I traced. I'm not parsing the program; I'm just looking at one particular sequence of execution. So variable trackers are very important for keeping track of guards, because we have all these values floating around, and if we actually poke at a value, it matters what that value is, so we typically need a guard. But we don't want to immediately say that every single object in the Python heap has to be in exactly this state when we start our analysis, because chances are we're not going to touch most of the variables that we model in Dynamo. So we only want to guard on things when they actually matter for the execution at hand.
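
A toy sketch of the "guard only what you touched" idea, assuming a made-up `GuardRecorder` in place of Dynamo's real guard machinery: reading an input's Boolean value records a guard, inputs that are never read get none, and the recorded guards become a check function that decides whether the trace can be reused on a later call.

```python
from typing import Any, Callable, Dict, List, Tuple


class GuardRecorder:
    """Toy tracer state (hypothetical): records a guard only for inputs whose
    value actually influenced the trace."""

    def __init__(self, inputs: Dict[str, Any]) -> None:
        self.inputs = inputs
        self.guards: List[Tuple[str, bool]] = []

    def read_bool(self, name: str) -> bool:
        value = bool(self.inputs[name])
        # Branching on this value specializes the trace, so guard on it.
        self.guards.append((name, value))
        return value


def trace(rec: GuardRecorder) -> str:
    # Only "flag" is inspected; "unused" never gets a guard.
    return "path_a" if rec.read_bool("flag") else "path_b"


def make_guard_fn(guards: List[Tuple[str, bool]]) -> Callable[[Dict[str, Any]], bool]:
    """Build the check run on later calls: reuse the trace only if every
    guarded input still has the value seen during tracing."""
    def check(inputs: Dict[str, Any]) -> bool:
        return all(bool(inputs[name]) == seen for name, seen in guards)
    return check


rec = GuardRecorder({"flag": True, "unused": 123})
path = trace(rec)
check = make_guard_fn(rec.guards)
```

Changing `unused` does not invalidate the trace, while flipping `flag` does, which is exactly the property described above.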

In the old version of Dynamo, the way it worked was that every variable tracker had a set of guards on it, basically saying: if you use this variable tracker in a non-trivial way, here are the guards you need to put either into the global guard state that actually gets installed, or into some other variable tracker derived from the original one, so that all the things you looked at on the variable tracker stay valid in the same way.

It turns out there were two problems with this. The first is that all this propagation logic was a lot of pain, because every time you did something with a variable tracker, you needed to make sure you didn't forget to collect all the guards off each input, blob them together into one giant set, and put that on your new variable tracker. It's very easy to forget and very hard to test: to check that you've done it right, you have to set up some program, change the thing that should have been guarded, and make sure things actually get recompiled. Most of the time, people are just writing tests that check that we can get through some code one way or another. Writing good tests that check that we are guarding enough takes a lot of care, so it's pretty difficult.

The second problem is that maintaining these sets of guards is actually really expensive. Python is not a spring chicken, and if you have these giant sets with tons and tons of objects that need to be hashed every time you put them in a set, it was materially slowing down how quickly Dynamo could trace through things. On our open source benchmark suite, tracing times are not so bad, but we've been using PyTorch 2 on a lot of internal workloads, and these workloads have tons and tons of Python code. Sometimes it would take hours for Dynamo, just Dynamo, not even the compiler, not even Inductor or Triton, to get through all of that code. Part of that was that we were shunting all these guards around.
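
A sketch of the old propagation pattern, using a hypothetical frozen `OldStyleVT` class rather than Dynamo's real one: every derived tracker must union its inputs' guard sets, so one forgotten union silently drops a guard, and every union rehashes the whole set. The guard strings are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class OldStyleVT:
    """Toy model of the old design: each tracker carries the guards that
    must hold if it is used (hypothetical class, not Dynamo's)."""
    value: object
    guards: FrozenSet[str] = frozenset()


def add(a: OldStyleVT, b: OldStyleVT) -> OldStyleVT:
    # Every operation has to remember to union its inputs' guards;
    # each union hashes every element of both sets again.
    return OldStyleVT(a.value + b.value, a.guards | b.guards)


def add_buggy(a: OldStyleVT, b: OldStyleVT) -> OldStyleVT:
    # Forgetting b.guards still traces fine, so a "does it trace?" test
    # passes, but the guard on y is lost.
    return OldStyleVT(a.value + b.value, a.guards)


x = OldStyleVT(1, frozenset({"L['x'] == 1"}))
y = OldStyleVT(2, frozenset({"L['y'] == 2"}))
good = add(x, y)
bad = add_buggy(x, y)
```

Both results compute the same value; only a test that changes `y` and expects a recompile would notice the missing guard.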

So Jason Ansel did a patch that removed the need for this, and the new world order is this: if you have a variable tracker, we have already guarded on it. That's it, right? There are all sorts of things you can access on a variable tracker you hold, and we just assume that everything needed for it has already been guarded. So no propagation is needed: once you've got the variable tracker, you can assume the guards in question are already installed.

Now, sometimes the lazy behavior that we had before is good, right? Say I have a bunch of arguments to your function; I don't want to guard on all of them exactly. So there's some laziness for some variable trackers: you can have a variable tracker that doesn't really exist yet, one we haven't populated into an honest-to-goodness variable tracker. The first time you poke at it and say, "Hey, tell me what this attribute is," or "Tell me what the value of this Boolean is," we populate it into a real variable tracker and install all the guards. So there's targeted laziness in various parts of the code base. I think the new structure is very nice: it removed a lot of the administrative burden, and it made things a lot faster. So, to sum up: if you've got a variable tracker, you can access it however you like, and that will work. And when you create a new variable tracker, you're responsible for installing all the guards you need at that point in time.
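
The new rule plus targeted laziness could be sketched as follows. Everything here is hypothetical (`GuardState`, `ConstantVT`, `LazyVT`) and much simpler than Dynamo's real trackers; the point is only that constructing a tracker installs its guard exactly once into shared state, and a lazy wrapper defers that construction until the value is first inspected.

```python
from typing import Any, List, Optional


class GuardState:
    """Hypothetical stand-in for the guards installed for one compiled frame."""

    def __init__(self) -> None:
        self.guards: List[str] = []


class ConstantVT:
    """Toy tracker: by the time one exists, its guard is already installed."""

    def __init__(self, value: Any, source: str, state: GuardState) -> None:
        state.guards.append(f"{source} == {value!r}")  # guard once, at creation
        self.value = value


class LazyVT:
    """Toy lazy tracker: remembers value and source but installs no guard
    until somebody actually inspects it."""

    def __init__(self, value: Any, source: str, state: GuardState) -> None:
        self._value = value
        self._source = source
        self._state = state
        self._realized: Optional[ConstantVT] = None

    def realize(self) -> ConstantVT:
        if self._realized is None:
            self._realized = ConstantVT(self._value, self._source, self._state)
        return self._realized


state = GuardState()
args = [LazyVT(v, f"L['a{i}']", state) for i, v in enumerate([True, 7, "unused"])]
guards_before = list(state.guards)   # nothing inspected yet, so no guards
flag = args[0].realize().value       # first real use of a0 installs its guard
args[0].realize()                    # later uses reuse it; no duplicate guard
```

No guard sets travel with the trackers: code holding a `ConstantVT` can use it freely, because the guard went into the shared state when it was built.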

Okay, so I talked about how guards work with variable trackers. There's another update, which is pretty nice, from Michael Lazos, landed in December, and this is what we're calling mutable variable trackers. Something you may not have realized about variable trackers in the old days is that they were actually implemented as immutable data structures. The Haskeller in me says, "Hooray!" So why were they implemented as immutable data structures? Well, the motivating reason for making them immutable was to support the checkpointing that we do in Dynamo.

So let me explain what's going on with checkpointing. The reason we need checkpoints in Dynamo is that sometimes we are symbolically executing some code and realize, "Oh no, we messed up. We need to rewind our execution to some earlier point in time where we can actually insert a graph break." The canonical example of this is inlining a function. Say I've got some code in Dynamo and I'm tracing through it happily, I hit a function call, I start inlining it, and inside that function I hit a graph break. What do I do in this situation? Well, if I had some sort of fancy multi-frame reconstruction logic, I could just do a graph break right then and there. But we don't actually have this logic. (Someone should implement it, by the way; that would be great.) Because we don't have it, I have to rewind execution to the point where I was about to call into the function I'm inlining, and do the graph break there. So how can I do this rewinding?

Well, if I have a checkpointing mechanism, whenever I start an inlined function call I can checkpoint the state of all the variables in my Dynamo program, and if I need to rewind, throw out the new state and restore the checkpointed state. Immutable variable trackers make this easy. But there is a cost: the data structures are immutable, so simple operations get expensive. Say you have a list and you're appending to it. Normally an append is O(1), but with an immutable variable tracker I have to create a new copy of the list every time, so appending n elements to the list ends up being O(n^2).
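
A quick way to see the quadratic cost, using plain Python tuples as a stand-in for an immutable list tracker (an assumption for illustration, not how Dynamo stored lists): appending n elements copies 0 + 1 + ... + (n-1) = n(n-1)/2 elements in total, whereas `list.append` is amortized O(1).

```python
from typing import List, Tuple


def build_immutable(n: int) -> Tuple[Tuple[int, ...], int]:
    """Append n items immutably; also return how many elements were copied."""
    xs: Tuple[int, ...] = ()
    copies = 0
    for i in range(n):
        copies += len(xs)   # building the new tuple copies every existing element
        xs = xs + (i,)
    return xs, copies


def build_mutable(n: int) -> List[int]:
    xs: List[int] = []
    for i in range(n):
        xs.append(i)        # amortized O(1); the prefix is not copied
    return xs


xs, copies = build_immutable(1000)
```

For n = 1000 that is 499,500 element copies to build the same contents a mutable list produces with 1000 appends.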

Now, of course, with my Haskell hat on again: why not just use a more efficient functional data structure? And the answer is yes, you could, but Python doesn't have very good support for this sort of thing, because most people in Python just use mutable lists; that's the normal thing to do. So if you wanted to make this go faster, you would have to write a big library full of immutable data structures. It's also kind of a bad idea in a reference-counted language like Python. You keep generating garbage, new copies of nodes that you throw away immediately because you're continuously revving this immutable data structure, and you spend all this time incrementing and decrementing reference counts. It's not like a language with a tracing garbage collector, where making more garbage is cheap, because the collector only needs to traverse the objects that are still live.

So what did we do? We said, okay, fine, checkpointing is cool, but we don't actually need it. And the reason we don't need it is this: remember, Dynamo works in this alternate universe. It symbolically evaluates your program without modifying the original program state. So we have an ultimate checkpoint, the very beginning of your program, where all the state is exactly as it was before we touched anything. We don't need to checkpoint midway through: if we need to rerun Dynamo, we can rewind all the way back to the beginning and run again from the start. So we don't need immutable variable trackers. Michael Lazos got rid of immutable variable trackers; they're now mutable. You can mutate them in the normal way you'd expect, and life is good. This also made some of our internal tracing a lot faster.
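
A toy sketch of "restart from the top instead of checkpointing". It assumes a made-up flat instruction format where `call:<a>,<b>` inlines a callee and `print` forces a graph break; `RestartTracing`, `trace_frame` and `compile_frame` are hypothetical names. Each attempt builds fresh, freely mutated state, and a graph break inside an inlined call just records the call site and retraces from the beginning.

```python
from typing import List, Set


class RestartTracing(Exception):
    """Hypothetical signal: throw away this attempt and trace again."""


def trace_frame(instrs: List[str], break_at: Set[int]) -> List[str]:
    graph: List[str] = []  # fresh per attempt, mutated in place, never checkpointed
    for pc, ins in enumerate(instrs):
        if pc in break_at or ins == "print":
            # Graph break at the top-level frame; resuming the rest of the
            # frame afterwards is omitted from this toy.
            graph.append(f"<graph break before pc={pc}>")
            return graph
        if ins.startswith("call:"):
            for inner in ins[len("call:"):].split(","):
                if inner == "print":
                    # Can't break inside the inlined callee: remember the
                    # call site and start over instead of rolling back.
                    raise RestartTracing(pc)
                graph.append(inner)
        else:
            graph.append(ins)
    return graph


def compile_frame(instrs: List[str]) -> List[str]:
    break_at: Set[int] = set()
    while True:
        try:
            return trace_frame(instrs, break_at)
        except RestartTracing as e:
            break_at.add(e.args[0])


result = compile_frame(["add", "mul", "call:sin,print,cos", "sub"])
```

The partially traced `sin` from the first attempt simply disappears with that attempt's state, which is why no immutability is needed.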

Okay, so I told you why variable trackers exist and about some of the changes that went in. One more thing I want to cover is how to find your way around variable trackers in the Dynamo code base. There are a lot of variable trackers, and sometimes it can be a bit bewildering to figure out which one you should use. We don't have a really good organization for them, but there is some logic to it. In particular, we try to organize the chunks of C code that we are simulating in Dynamo into logical groups instead of lumping them into one giant thing. If you think about it that way, you'll have a sense of where things are.

So in particular, if you have completely immutable state in Python, like a literal such as an integer or a float, we typically model it as a constant variable. There's also an enum variable for enums specifically. If you have C code that is stateless, so there's nothing to modify, then we tend to organize the variable tracker subclasses by where the code comes from. If it comes from PyTorch, it's a torch variable. If it comes from CPython, it's a builtin variable. If it comes from NumPy, it's a NumPy variable. We basically ask where the code lives and put the modeling code in the corresponding place. Now, these are giant classes. Think about torch variable: we have tons and tons of stateless C code, because every single function in the PyTorch API counts as something we need to model. So these classes tend to be very big, but at a high level, the organization is based on where the code comes from.

Similarly, if you have something in C that is stateful, that's where you get the usual arrangement of a dedicated variable tracker per kind of object: we have a tensor variable, a list variable, a set variable, a dict variable. If you need to introduce anything else like that, you're probably going to write a new variable tracker subclass, because the state needs to be handled specially: you need to write logic for how to replay changes to the state back onto the original objects, and so on. And finally, for things that are implemented in Python, where we can inline into them, we have a big pile of user-blah variables, like user function variable, user-defined class variable, and user-defined object variable. These tend to be relatively simple because we don't need any special smarts: we just plan to inline their bytecode.
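
The organization described above can be summarized as a hypothetical classifier. The returned strings echo the categories named in this episode; they are illustrative labels that may not match current class names in the Dynamo source exactly, and this is not how Dynamo actually dispatches. Tensors are left out so the sketch runs without PyTorch installed.

```python
import enum
import types


def tracker_family(value: object) -> str:
    """Hypothetical classifier mirroring the organization in this episode."""
    # 1. Completely immutable values: literals and enums.
    if isinstance(value, enum.Enum):
        return "EnumVariable"
    if value is None or isinstance(value, (bool, int, float, str)):
        return "ConstantVariable"
    # 2. Stateless C functions, grouped by where the code lives.
    #    (Assumes the callable exposes __module__; some C callables, such as
    #    NumPy ufuncs, are not BuiltinFunctionType and would need more cases.)
    if isinstance(value, types.BuiltinFunctionType):
        root = (getattr(value, "__module__", None) or "").split(".")[0]
        return {
            "builtins": "BuiltinVariable",
            "torch": "TorchVariable",
            "numpy": "NumpyVariable",
        }.get(root, "unsupported C function")
    # 3. Stateful C objects: a dedicated tracker per kind of object.
    if isinstance(value, list):
        return "ListVariable"
    if isinstance(value, set):
        return "SetVariable"
    if isinstance(value, dict):
        return "DictVariable"
    # 4. Pure-Python code: no special smarts, just inline the bytecode.
    if isinstance(value, types.FunctionType):
        return "UserFunctionVariable"
    if isinstance(value, type):
        return "UserDefinedClassVariable"
    return "UserDefinedObjectVariable"


class Color(enum.Enum):
    RED = 1


class Point:
    def __init__(self, x: int) -> None:
        self.x = x
```

The order of checks matters: enums are tested before generic literals, and C functions before the pure-Python fallbacks.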

So that's everything I wanted to say about variable tracker today. See you next time.