# GIL in Python Will Be No More

## Metadata
- Author: [[@dzhulgakov on Twitter]]
- Full Title: GIL in Python Will Be No...
- Category: #tweets
- URL: https://twitter.com/dzhulgakov/status/1685667015800066048

## Highlights
- GIL in Python will be no more. Huge win for AI ecosystem. Congrats to @colesbury - it took 4+ years of amazing engineering and advocacy. Many parts of @PyTorch could become simpler: DataLoader, multi-GPU support (DDP), Python deployment (torch::deploy), … Here’s why. 🧵 https://t.co/Sb7qi4YIEA ([View Tweet](https://twitter.com/dzhulgakov/status/1685667015800066048))
- GIL has been in Python from the beginning. It ensures that only a single Python instruction can execute at any given time. Very convenient and safe - Python just can’t crash on memory corruption from a data race like C++ often does. But pure Python gets barely faster with threads ([View Tweet](https://twitter.com/dzhulgakov/status/1685667017607757824))
- GIL often hasn’t been an issue for AI compute like @numpy and @pytorch as they release the GIL for the duration of the operation. Thus two torch.matmul’s can parallelize. But it still adds up. ([View Tweet](https://twitter.com/dzhulgakov/status/1685667019109134336))
- GPUs keep getting faster and keeping even a single one busy from Python is often a challenge (hence torch.compile and friends). This made driving many GPUs from a single process slow: DataParallel -> DistributedDataParallel switch https://t.co/XFOV6XQC2c ([View Tweet](https://twitter.com/dzhulgakov/status/1685667020875108353))
- Data preprocessing is often heavy on Python, that’s why DataLoader in PyTorch uses multiple processes for num_workers > 0. But it comes with its own complexities like https://t.co/JFNg2Ka7sN ([View Tweet](https://twitter.com/dzhulgakov/status/1685667022942973954))
- GIL makes deployment for inference hard. Even if a model is fast enough for serving in Python, some level of concurrency is needed. And multiprocessing is very intrusive. We built torch::deploy/multipy to deal with it, but it needs a separate Py build https://t.co/veCyL4rvW9 ([View Tweet](https://twitter.com/dzhulgakov/status/1685667024503259136))
- In short, no-GIL allows Python to leverage modern hardware much more naturally. The implementation of this PEP is brilliant too: biased reference counting is neat. Definitely worth a read: https://t.co/PmmAa48v7W ([View Tweet](https://twitter.com/dzhulgakov/status/1685667026038370305))
- In other words, AGI is even closer now as it doesn’t have to suffer from thread contention :) https://t.co/RdJ4GvmnQy ([View Tweet](https://twitter.com/dzhulgakov/status/1685667840043700224))
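
The claim that pure Python gets barely faster with threads can be checked with a small experiment. The sketch below is not from the thread: `count_primes`, `gil_enabled`, and `compare` are hypothetical helper names, and the workload size is an arbitrary assumption. It runs the same CPU-bound pure-Python task serially and on a thread pool, and reports whether the GIL is active via `sys._is_gil_enabled()`, which only exists on CPython 3.13 and later (older interpreters are assumed to have the GIL). On a GIL build the two timings should come out roughly equal; on a free-threaded build the threaded run would be expected to be faster, though actual numbers depend on the machine.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound pure-Python work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def gil_enabled() -> bool:
    """Report whether the GIL is active; assume True when the check is unavailable."""
    check = getattr(sys, "_is_gil_enabled", None)  # present on CPython 3.13+
    return True if check is None else bool(check())


def compare(limit: int = 20_000, workers: int = 4):
    """Run identical tasks serially and on a thread pool; return results and timings."""
    tasks = [limit] * workers

    start = time.perf_counter()
    serial = [count_primes(t) for t in tasks]
    serial_s = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        threaded = list(pool.map(count_primes, tasks))
    threaded_s = time.perf_counter() - start

    return serial, threaded, serial_s, threaded_s


if __name__ == "__main__":
    serial, threaded, serial_s, threaded_s = compare()
    print(f"GIL enabled: {gil_enabled()}")
    print(f"serial:   {serial_s:.3f}s")
    print(f"threaded: {threaded_s:.3f}s")
```

The work is deliberately kept in pure Python (no NumPy or PyTorch calls), since those libraries release the GIL during their operations and would hide the effect.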