PyTorch Developer Podcast

Anatomy of a domain library

Episode Summary

What's a domain library? Why do they exist? What do they do for you? What should you know about developing in PyTorch main library versus in a domain library? What's cool about working on domain libraries?

Episode Notes

What's a domain library? Why do they exist? What do they do for you? What should you know about developing in PyTorch main library versus in a domain library? How coupled are they with PyTorch as a whole? What's cool about working on domain libraries?

Further reading.

The classic trio of domain libraries is torchaudio (https://pytorch.org/audio/stable/index.html), torchtext (https://pytorch.org/text/stable/index.html), and torchvision (https://pytorch.org/vision/stable/index.html).

Liner notes.

why do domain libraries exist? lots of domain-specific gadgets,
inappropriate for the main PyTorch library
what does a domain library do
- operator implementations (old days: pure python, not anymore)
  - with autograd support and cuda acceleration
  - esp encoding/decoding, e.g., for domain file formats
    - torchbind for custom objects
    - takes care of getting the dependencies for you
  - esp transformations, e.g., for data augmentation
- models, esp pretrained weights
- datasets
- reference scripts
- full wheel/conda packaging like pytorch
- mobile compatibility
separate repos: external contributors with direct access
- manual sync to fbcode; a lot easier to land code! less
  motion so lower risk
coupling with pytorch? CI typically runs on nightlies
- pytorch itself tests against torchvision, which acts as a
  canary for the extensibility mechanisms
- domain libraries mostly avoid internal tools (e.g.,
  TensorIterator) because they are too unstable (this would
  be good to fix)
closer to research side of pytorch; francesco also part of papers
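
To make the "transformations, e.g., for data augmentation" item in the notes above concrete, here is a minimal sketch of the kind of transform a domain library might ship. It is not code from any of the libraries. The class name HypotheticalHorizontalFlip and its parameter p are assumptions made up for illustration, and it is written in pure Python on top of torch (the "old days" style the notes mention). Because it only uses differentiable tensor ops, autograd support comes for free.

```python
import torch


class HypotheticalHorizontalFlip(torch.nn.Module):
    """Illustrative augmentation (hypothetical name): flip the last
    (width) dimension of an image tensor with probability p."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # torch.rand(1) draws from [0, 1); flip when the draw is below p.
        if torch.rand(1).item() < self.p:
            return img.flip(-1)
        return img


# p=1.0 makes the flip deterministic, which keeps the example reproducible.
flip = HypotheticalHorizontalFlip(p=1.0)

# A tiny 1 x 2 x 3 "image" (channels x height x width) that tracks gradients.
img = torch.arange(6.0).reshape(1, 2, 3).requires_grad_()

out = flip(img)

# Gradients flow back through the flip because it is an ordinary tensor op.
out.sum().backward()
```

After this runs, out holds each row of img reversed and img.grad is a tensor of ones. A real domain library would typically add batching, multiple input types, and compiled kernels on top of a sketch like this.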