- Python 3.12 is out! It includes new features and performance improvements, some contributed by Meta, that we believe will benefit all Python users.
- We’re sharing details about these new features that we worked closely with the Python community to develop.
This week’s release of Python 3.12 marks a milestone in our efforts to make our work developing and scaling Python for Meta’s use cases more accessible to the broader Python community. Open source at Meta is an important part of how we work and share our learnings with the community.
For several years, we have been sharing our work on Python and CPython through our open source Python runtime, Cinder. We have also been working closely with the Python community to introduce new features and optimizations to improve Python’s performance and to allow third parties to experiment with Python runtime optimization more easily.
For the Python 3.12 release, we proposed and implemented features in several areas:
- Immortal Objects
- Type system improvements
- Performance optimizations
- New benchmarks
- Cinder hooks
Immortal Objects
Immortal Objects – PEP 683 makes it possible to create Python objects that don’t participate in reference counting, and will live until Python interpreter shutdown. The original motivation for this feature was to reduce memory use in the forking Instagram web-server workload by reducing copy-on-writes triggered by reference-count updates.
Immortal Objects are also an important step towards truly immutable Python objects that can be shared between Python interpreters with no need for locking (for example, via the global interpreter lock, or GIL). This can enable improved Python single-process parallelism, whether via multiple sub-interpreters or GIL-free multi-threading.
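One way to observe this from Python is to watch an object's reference count while new references to it are created. The sketch below is illustrative only: the helper name `refcount_delta` is ours, and it assumes (following PEP 683) that `None` is among the objects CPython 3.12 treats as immortal. On older interpreters the count for `None` moves like that of any other object.

```python
import sys


def refcount_delta(obj, extra_refs=1000):
    """Return how much obj's reported refcount grows after
    creating `extra_refs` additional references to it."""
    before = sys.getrefcount(obj)
    holders = [obj] * extra_refs  # each list slot holds one new reference
    after = sys.getrefcount(obj)
    del holders
    return after - before


if __name__ == "__main__":
    # An ordinary object: the count rises by the number of new references.
    print("fresh object:", refcount_delta(object()))
    # None: expected to stay fixed on CPython 3.12+ (immortal),
    # but to rise like any other object on earlier versions.
    print("None:", refcount_delta(None))
```

Because the reference count field of an immortal object is never written, the memory page holding it stays untouched, which is what avoids the copy-on-write behavior described above in forked processes.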
Type system improvements
The engineering team behind Pyre, an open source Python type-checker, authored and implemented PEP 698 to add a @typing.override decorator, which helps avoid bugs when refactoring class inheritance hierarchies that use method overriding.
Python developers can apply this new decorator to a subclass method that overrides a method from a base class. As a result, static type checkers will be able to warn developers if the base class is modified such that the overridden method no longer exists. Developers can avoid accidentally turning a method override into dead code. This improves confidence in refactoring and helps keep the code more maintainable.
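A minimal sketch of the decorator in use follows. The class and method names (`Base`, `Child`, `render`) are hypothetical, and the fallback definition exists only so the snippet also runs on interpreters older than 3.12. It is a stand-in, not part of the standard library.

```python
try:
    from typing import override  # available in Python 3.12+
except ImportError:
    # Assumption: a no-op stand-in so the example runs on older versions.
    def override(func):
        return func


class Base:
    def render(self) -> str:
        return "base"


class Child(Base):
    @override
    def render(self) -> str:
        # If Base.render were renamed or removed, a type checker that
        # supports PEP 698 would report that this method overrides nothing.
        return "child"


if __name__ == "__main__":
    print(Child().render())
```

The decorator does not enforce anything at runtime. The check is performed by static type checkers.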
Performance optimizations
Faster comprehensions
In previous Python versions, all comprehensions were compiled as nested functions, and every execution of a comprehension allocated and destroyed a single-use Python function object.
In Python 3.12, PEP 709 inlines all list, dict, and set comprehensions for better performance (up to two times better in the best case).
The implementation and debugging of PEP 709 also uncovered a pre-existing bytecode compiler bug that could result in silently wrong code execution in Python 3.11, which we fixed.
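The effect of inlining can be seen by inspecting a compiled function. The sketch below uses a hypothetical helper, `nested_code_names`, to list the code objects nested inside a function: before 3.12 we would expect a separate `<listcomp>` code object for the comprehension, while on 3.12 and later we would expect none.

```python
import types


def squares(n):
    # A list comprehension inside an ordinary function.
    return [i * i for i in range(n)]


def nested_code_names(func):
    """Names of code objects stored in func's constants, i.e. the
    nested functions the compiler generated for it."""
    return [
        const.co_name
        for const in func.__code__.co_consts
        if isinstance(const, types.CodeType)
    ]


if __name__ == "__main__":
    print(squares(4))
    # Expected to be empty on 3.12+, and to contain '<listcomp>' before.
    print(nested_code_names(squares))
```

The result of the comprehension is the same either way. Only the per-call allocation of a throwaway function object goes away.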
Eager asyncio tasks
While Python’s asynchronous programming support enables single-process concurrency, it also has noticeable runtime overhead. Every call to an async function creates an extra coroutine object, and the standard asyncio library will often bring additional overhead in the form of Task objects and event loop scheduling.
We observed that, in practice, in a fully async codebase, many async functions are often able to return a result immediately, with no need to suspend. In these cases, if the result of the function is immediately awaited, the coroutine/Task objects and event loop scheduling can be unnecessary overhead.
Cinder eliminates this overhead via eager async execution. If an async function call is awaited immediately, it may return a result directly, without creating a coroutine object. If an asyncio.gather() is immediately awaited, and all the async functions it gathers are able to return immediately, there’s no need to ever create a Task or schedule it to the event loop.
Fully eager async execution would be an invasive (and breaking) change to Python, and doesn’t work as well with the new Python 3.11+ TaskGroup API for managing concurrent tasks. So in Python 3.12 we added a simpler version of the feature: eager asyncio tasks. With eager tasks, coroutine and Task objects are still created when a result is available immediately, but we can sometimes avoid scheduling the task to the event loop and instead resolve it right away.
This is more efficient, but it is a semantic change, so this feature is opt-in via a custom task factory.
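As a rough sketch of the opt-in, the example below installs `asyncio.eager_task_factory` on the running loop when it is available (Python 3.12+) and checks whether a task that never suspends is already finished right after creation. The coroutine `cached_lookup` and its return value are hypothetical, and the `getattr` guard is only there so the snippet also runs on older versions, where the task is scheduled as usual.

```python
import asyncio


async def cached_lookup(key):
    # Never awaits anything, so it can complete without suspending.
    return f"USER:{key}"


async def main():
    loop = asyncio.get_running_loop()
    factory = getattr(asyncio, "eager_task_factory", None)  # 3.12+
    if factory is not None:
        loop.set_task_factory(factory)

    task = asyncio.create_task(cached_lookup(42))
    # With the eager factory the coroutine starts running during task
    # creation; if it finishes without blocking, the task is already done.
    finished_eagerly = task.done()
    result = await task
    return result, finished_eagerly


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note the semantic change this illustrates: with eager tasks, code in the coroutine runs before `create_task()` returns, which is why the behavior is not the default.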
Other asyncio improvements
We also landed a faster C implementation of asyncio.current_task and an optimization to async task creation that shows a win of up to 5 percent on asyncio benchmarks.
Faster super() calls
The new LOAD_SUPER_ATTR opcode optimizes code of the form super().attr and super().method(…). Such code previously had to allocate, and then throw away, a single-use “super” object each time it ran. Now it has little more overhead than an ordinary method call or attribute access.
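The pattern in question looks like the hypothetical `Child.describe` method below. The helper `uses_load_super_attr` (also our own name) disassembles a function and reports whether the compiler emitted the new opcode. We would expect it to report `True` on CPython 3.12 and `False` on earlier versions, though opcode names are an implementation detail and may change between releases.

```python
import dis


class Base:
    def describe(self):
        return "base"


class Child(Base):
    def describe(self):
        # A zero-argument super() method call, the form the opcode targets.
        return super().describe() + "+child"


def uses_load_super_attr(func):
    """True if func's bytecode contains a LOAD_SUPER_ATTR instruction."""
    return any(
        ins.opname == "LOAD_SUPER_ATTR" for ins in dis.get_instructions(func)
    )


if __name__ == "__main__":
    print(Child().describe())
    print(uses_load_super_attr(Child.describe))
```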
Other performance optimizations
We also landed two hasattr optimizations and a 3.8x performance improvement to unittest.mock.Mock.
New benchmarks
When we optimize Python for internal use at Meta, we are usually able to test and validate our optimizations directly against our real-world workloads. Optimization work on open-source Python doesn’t have such a production workload to test against and needs to be effective (and avoid regression) on a variety of different workloads.
The Python Performance Benchmark suite is the standard set of benchmarks used in open-source Python optimization work. During the 3.12 development cycle, we contributed several new benchmarks to it so that it more accurately represents workload characteristics we see at Meta.
We added:
Cinder hooks
Some parts of Cinder (our JIT compiler and Static Python) wouldn’t make sense as part of upstream CPython (because of limited platform support, C versus C++, semantic changes, and just the size of the code), so our goal is to package these as an independent extension module, CinderX.
This requires a number of new hooks in the core runtime. We landed many of these hooks in Python 3.12:
These improvements will be useful to anyone building a third party JIT compiler or runtime optimizer for CPython. There are also plans to use the watchers internally in core CPython.
Beyond Python 3.12
Python plays a significant role at Meta. It’s an important part of our infrastructure, including the Instagram server stack. And it’s the lingua franca for our AI/ML work, highlighted by our development of PyTorch, a machine learning framework for a wide range of use cases including computer vision, natural language processing, and more.
Our work with the Python community doesn’t end with the 3.12 release. We are currently discussing a new proposal, PEP 703, with the Python Steering Council to remove the GIL and allow Python to run in multiple threads in parallel. This update could greatly help anyone using Python in a multi-threaded environment.
Meta’s involvement with the Python community also goes beyond code. In 2023, we continued supporting the Developer in Residence program for Python and sponsored events like PyCon US. We also shared our learnings in talks like “Breaking Boundaries: Advancements in High-Performance AI/ML through PyTorch’s Python Compiler” and posts on the Meta Engineering blog.
We are grateful to be a part of this open source community and look forward to working together to move the Python programming language forward.
Acknowledgements
The author would like to acknowledge the following people for their work in contributing to all of these new features: Eddie Elizondo, Vladimir Matveev, Itamar Oren, Steven Troxler, Joshua Xu, Shannon Zhu, Jacob Bower, Pranav Thulasiram Bhat, Ariel Lin, Andrew Frost, and Sam Gross.