I'm aware that CPython keeps a cache of small integers in the range [-5, 256], so I understand that `a=2` and `b=2` will refer to the same memory address (thus causing `a is b` to return `True`). Also, if I store a number higher than 256, I should obtain different memory addresses, as follows:

```python
>>> x=500
>>> y=500
>>> x is y
False
```

However, this is where I get confused:

```python
>>> x,y=500,500
>>> x is y
True
```

Can anyone explain why this happens, or at least what's different when storing values separately as opposed to storing them both at once?

wjandrea
Floella
  • My unconfirmed guess is that when they're on the same line the interpreter detects that the constants are the same and optimizes the bytecode to only have a single constant. When they're typed on separate lines there are two compilation steps and no such optimization. – John Kugelman Dec 09 '22 at 00:04
  • `x = 500; y = 500` all on one line also leads to `x is y`. – John Coleman Dec 09 '22 at 00:06
  • Try this one: `print(id(500))` / `a=500` / `print(id(500))`. The two "500"s are two different objects. This was not meant for mortal minds to understand. – Tim Roberts Dec 09 '22 at 00:10
  • Same question? [The `is` operator behaves unexpectedly with non-cached integers](/q/34147515/4518341). Or this? [What's with the integer cache maintained by the interpreter?](/q/15171695/4518341) – wjandrea Dec 09 '22 at 00:13
  • `x, y = (1,2),(1,2)` also leads to `x is y`, which suggests that this isn't simply integer caching. – John Coleman Dec 09 '22 at 00:15
  • @wjandrea the accepted answer to that second question seems to answer this one as well. – John Coleman Dec 09 '22 at 00:19
  • "Also, if I store a number higher than 256 I should obtain different memory addresses" **no no no**. That is *not a valid inference at all*. There is *nothing* you can infer about that – juanpa.arrivillaga Jan 03 '23 at 19:46

Whenever the compiler spots an easy optimization during one of its compilation steps, it applies it.

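In this case, when compiling the line `x, y = 500, 500`, the compiler creates 500 as a constant in the current context. When it reaches the second 500 in the same compilation step, that constant already exists, so the compiler simply reuses it. By contrast, when `x=500` and `y=500` are typed on separate lines at the interactive prompt, each line is compiled on its own, so each one gets its own 500 object.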
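The effect can be reproduced outside the interactive prompt with the built-in `compile()` and `exec()`, treating each `compile()` call as one compilation unit (roughly what the REPL does for each line it reads). This is a minimal sketch that assumes CPython; the identity results it prints are implementation details and may differ on other implementations or builds.

```python
import platform

# One compilation unit, like typing `x, y = 500, 500` on a single line:
# the compiler can reuse one constant object for both literals.
one_unit = {}
exec(compile("x, y = 500, 500", "<one line>", "exec"), one_unit)

# Two compilation units, like typing `x=500` and `y=500` as separate
# REPL lines: each unit typically gets its own 500 object.
two_units = {}
exec(compile("x = 500", "<line 1>", "exec"), two_units)
exec(compile("y = 500", "<line 2>", "exec"), two_units)

print("implementation:", platform.python_implementation())

# Equality holds in both cases.
print("equal:", one_unit["x"] == one_unit["y"],
      two_units["x"] == two_units["y"])

# Identity is the implementation detail that differs.
print("same object:", one_unit["x"] is one_unit["y"],
      two_units["x"] is two_units["y"])
```

On a standard CPython build the last line typically shows `True False`; other implementations, or other CPython versions and builds, are free to differ.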

Note that if the same optimization lived in a different layer of the implementation, it could also apply to numbers entered on different lines. This behavior is not something to rely on.

There are many small optimizations like this one that can take place but are not part of the language specification. Like the small integer cache, this is an implementation-specific optimization.

In particular, never write code that relies on these optimizations. Compare numbers with `==`, even the well-known small integers: these optimizations not only differ across Python implementations, they can also change without notice within the same implementation.
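A minimal sketch of the recommended style: compare values with `==` and keep `is` for identity checks against singletons such as `None`. The integers are built at runtime from strings (an arbitrary choice for illustration) so no compile-time constant sharing is involved; whether they end up as the same object is left entirely to the implementation.

```python
# Build the integers at runtime (from strings) so no compile-time
# constant sharing is involved.
big_a, big_b = int("500"), int("500")
small_a, small_b = int("100"), int("100")

# Value comparison with == is correct on every Python implementation,
# whether or not the values fall in CPython's small integer cache.
print(big_a == big_b, small_a == small_b)

# Identity comparison with `is` only reports whether both names refer to
# the same object, which is an implementation detail; don't rely on it.
print(big_a is big_b, small_a is small_b)

# `is` remains the right tool for singletons such as None.
value = None
print(value is None)
```

The first and last lines print `True` on any implementation; the middle line may print anything, which is exactly why program logic should not depend on it.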

jsbueno