PyTorch Developer Podcast

Batching

Episode Summary

PyTorch operates on its input data in a batched manner, typically processing many examples of an input at once (rather than one at a time, as would be the case in typical programming). In this podcast, we talk a little about the implications of batching operations in this way, then about how PyTorch's API is structured for batching (hint: poorly), and how NumPy introduced the concepts of ufuncs and gufuncs to standardize broadcasting and batching behavior. There is some overlap between this podcast and previous podcasts about TensorIterator and vmap; you may also be interested in those episodes.
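To make the batching and ufunc/gufunc ideas a little more concrete, here is a small sketch. It uses NumPy rather than PyTorch purely for illustration (PyTorch's `torch.matmul` broadcasts batch dimensions in a similar way). It contrasts a per-example loop with a single batched call, shows an elementwise ufunc broadcasting over a batch, and shows how a gufunc such as `np.matmul` treats any leading dimensions as batch dimensions around its core signature. The shapes and variable names are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 8 input vectors, each of length 3 (batch dimension first).
xs = rng.standard_normal((8, 3))
# A single weight matrix shared by every example in the batch.
w = rng.standard_normal((3, 4))

# One at a time: loop over examples, as in typical scalar-style code.
looped = np.stack([x @ w for x in xs])

# Batched: a single call processes every example at once.
batched = xs @ w

# ufunc: np.add works elementwise and broadcasts a (3,) bias
# against the (8, 3) batch.
biased = np.add(xs, np.ones(3))

# gufunc: np.matmul has the core signature (n?,k),(k,m?)->(n?,m?).
# Any leading dimensions are batch dimensions and broadcast against each other.
a = rng.standard_normal((5, 1, 2, 3))  # batch shape (5, 1), core shape (2, 3)
b = rng.standard_normal((6, 3, 4))     # batch shape (6,),   core shape (3, 4)
c = np.matmul(a, b)                    # batch shapes broadcast to (5, 6)
print(np.matmul.signature)
print(c.shape)  # (5, 6, 2, 4)

# np.vectorize with a signature builds a gufunc-like wrapper around a
# per-example function: here a dot product with core signature (n),(n)->().
dot = np.vectorize(lambda u, v: float(u @ v), signature="(n),(n)->()")
print(dot(xs, xs).shape)  # (8,)
```

The point of the gufunc signature is that the function author only describes the "core" per-example shapes, and the batching (and broadcasting) over all remaining leading dimensions comes for free.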

Episode Notes

Further reading.

ufuncs and gufuncs https://numpy.org/doc/stable/reference/ufuncs.html and https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html
A brief taxonomy of PyTorch operators by shape behavior http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/
Related episodes on TensorIterator and vmap https://pytorch-dev-podcast.simplecast.com/episodes/tensoriterator and https://pytorch-dev-podcast.simplecast.com/episodes/vmap