micropython

Annotated docs/concepts.txt

247:862f407f999c
2009-07-10 Paul Boddie Added notes about matrix and list representations of the tables, along with a discussion of element sizes and fields.
paul@199 1
Concepts
paul@199 2
========
paul@199 3
paul@201 4
This document describes the underlying concepts employed in micropython.
paul@201 5
paul@201 6
  * Namespaces and attribute definition
paul@199 7
  * Contexts and values
paul@200 8
  * Tables, attributes and lookups
paul@199 9
  * Objects and structures
paul@200 10
  * Parameters and lookups
paul@200 11
  * Instantiation
paul@222 12
  * Register usage
paul@245 13
  * List and tuple representations
paul@199 14
paul@201 15
Namespaces and Attribute Definition
paul@201 16
===================================
paul@201 17
paul@201 18
Namespaces are any objects which can retain attributes.
paul@201 19
paul@201 20
  * Module attributes are defined either at the module level or by global
paul@201 21
    statements.
paul@201 22
  * Class attributes are defined only within class statements.
paul@201 23
  * Instance attributes are defined only by assignments to attributes of self
paul@201 24
    within __init__ methods.
paul@201 25
paul@201 26
These restrictions apply because such attributes are thus explicitly declared,
paul@201 27
permitting the use of tables (described below). Module and class attributes
paul@201 28
can also be finalised in this way in order to permit certain optimisations.
paul@201 29
paul@243 30
An additional restriction required for the current implementation of tables
paul@243 31
(as described below) applies to class definitions: each class must be defined
paul@243 32
using a unique name; repeated definition of classes having the same name is
paul@243 33
thus not permitted. This restriction arises from the use of the "full name" of
paul@243 34
a class as a key to the object table, where the full name is a qualified path
paul@243 35
via the module hierarchy ending with the name of the class.
paul@243 36
paul@201 37
See rejected.txt for complicating mechanisms which could be applied to
paul@201 38
mitigate the effects of these restrictions on optimisations.
paul@201 39
paul@199 40
Contexts and Values
paul@199 41
===================
paul@199 42
paul@199 43
Values are used as the common reference representation in micropython: as
paul@199 44
stored representations of attributes (of classes, instances, modules, and
paul@199 45
other objects supporting attribute-like entities) as well as the stored values
paul@199 46
associated with names in functions and methods.
paul@199 47
paul@199 48
Unlike other implementations, micropython does not create things like bound
paul@199 49
method objects for individual instances. Instead, all objects are referenced
paul@199 50
using a context, reference pair:
paul@199 51
paul@199 52
Value Layout
paul@199 53
------------
paul@199 54
paul@199 55
    0           1
paul@199 56
    context     object
paul@199 57
    reference   reference
paul@199 58
paul@199 59
Specific implementations might reverse this ordering for optimisation
paul@199 60
purposes.
paul@199 61
paul@199 62
Rationale
paul@199 63
---------
paul@199 64
paul@199 65
To reduce the number of created objects whilst retaining the ability to
paul@199 66
support bound method invocations. The context indicates the context in which
paul@199 67
an invocation is performed, typically the owner of the method.
paul@199 68
paul@199 69
Usage
paul@199 70
-----
paul@199 71
paul@199 72
The context may be inserted as the first argument when a value is involved in
paul@199 73
an invocation. This argument may then be omitted from the invocation if its
paul@199 74
usage is not appropriate.
paul@199 75
paul@199 76
See invocation.txt for details.
paul@199 77
paul@237 78
Context Value Types
paul@237 79
-------------------
paul@237 80
paul@237 81
The following types of context value exist:
paul@237 82
paul@237 83
    Type            Usage                           Transformations
paul@237 84
    ----            -----                           ---------------
paul@237 85
paul@237 86
    Replaceable     With functions (not methods)    May be replaced with an
paul@237 87
                                                    instance or a class when a
paul@237 88
                                                    value is stored on an
paul@237 89
                                                    instance or class
paul@237 90
paul@237 91
    Placeholder     With classes                    May not be replaced
paul@237 92
paul@237 93
    Instance        With instances (and constants)  May not be replaced
paul@237 94
                    or functions as methods
paul@237 95
paul@237 96
    Class           With functions as methods       May be replaced when a
paul@237 97
                                                    value is loaded from a
paul@237 98
                                                    class attribute via an
paul@237 99
                                                    instance
paul@237 100
paul@199 101
Contexts in Acquired Values
paul@199 102
---------------------------
paul@199 103
paul@237 104
There are four classes of instructions which provide values:
paul@199 105
paul@199 106
    Instruction         Purpose                 Context Operations
paul@199 107
    -----------         -------                 ------------------
paul@199 108
paul@237 109
1)  LoadConst           Load module, constant   Use loaded object with itself
paul@237 110
                                                as context
paul@199 111
paul@237 112
2)  LoadFunction        Load function           Combine replaceable context
paul@237 113
                                                with loaded object
paul@223 114
paul@237 115
3)  LoadClass           Load class              Combine placeholder context
paul@237 116
                                                with loaded object
paul@237 117
paul@237 118
4)  LoadAddress*        Load attribute from     Preserve or override stored
paul@201 119
    LoadAttr*           class, module,          context (as described in
paul@201 120
                        instance                assignment.txt)
paul@199 121
paul@199 122
In order to comply with traditional Python behaviour, contexts may or may not
paul@199 123
represent the object from which an attribute has been acquired.
paul@199 124
paul@199 125
See assignment.txt for details.
paul@199 126
paul@199 127
Contexts in Stored Values
paul@199 128
-------------------------
paul@199 129
paul@223 130
There are two classes of instruction for storing values:
paul@199 131
paul@223 132
    Instruction         Purpose                 Context Operations
paul@223 133
    -----------         -------                 ------------------
paul@199 134
paul@223 135
1)  StoreAddress        Store attribute in a    Preserve context; note that no
paul@223 136
                        known object            test for class attribute
paul@223 137
                                                assignment should be necessary
paul@223 138
                                                since this instruction should only
paul@223 139
                                                be generated for module globals
paul@199 140
paul@223 141
    StoreAttr           Store attribute in an   Preserve context; note that no
paul@223 142
                        instance                test for class attribute
paul@223 143
                                                assignment should be necessary
paul@223 144
                                                since this instruction should only
paul@223 145
                                                be generated for self accesses
paul@199 146
paul@223 147
    StoreAttrIndex      Store attribute in an   Preserve context; since the index
paul@223 148
                        unknown object          lookup could yield a class
paul@223 149
                                                attribute, a test of the nature of
paul@223 150
                                                the nature of the structure is
paul@223 151
                                                necessary in order to prevent
paul@223 152
                                                assignments to classes
paul@199 153
paul@223 154
2)  StoreAddressContext Store attribute in a    Override context if appropriate;
paul@237 155
                        known object            if the value has a replaceable
paul@237 156
                                                context, permit the target to
paul@237 157
                                                take ownership of the value
paul@199 158
paul@199 159
See assignment.txt for details.
paul@199 160
paul@200 161
Tables, Attributes and Lookups
paul@200 162
==============================
paul@199 163
paul@199 164
Attribute lookups, where the exact location of an object attribute is deduced,
paul@199 165
are performed differently in micropython than in other implementations.
paul@199 166
Instead of providing attribute dictionaries, in which attributes are found,
paul@199 167
attributes are located at fixed places in object structures (described below)
paul@199 168
and their locations are stored using a special representation known as a
paul@199 169
table.
paul@199 170
paul@199 171
For a given program, a table can be considered as being like a matrix mapping
paul@199 172
classes to attribute names. For example:
paul@199 173
paul@199 174
    class A:
paul@200 175
        # instances have attributes x, y
paul@199 176
paul@199 177
    class B(A):
paul@200 178
        # introduces attribute z for instances
paul@199 179
paul@199 180
    class C:
paul@200 181
        # instances have attributes a, b, z
paul@199 182
paul@200 183
This would provide the following table, referred to as an object table in the
paul@200 184
context of classes and instances:
paul@199 185
paul@199 186
    Class/attr      a   b   x   y   z
paul@199 187
paul@199 188
    A                       1   2
paul@199 189
    B                       1   2   3
paul@199 190
    C               1   2           3
paul@199 191
paul@199 192
A limitation of this representation is that instance attributes may not shadow
paul@199 193
class attributes: if an attribute with a given name is not defined on an
paul@199 194
instance, an attribute with the same name cannot be provided by the class of
paul@199 195
the instance or any superclass of the instance's class.
paul@199 196
paul@199 197
The table can be compacted using a representation known as a displacement
paul@200 198
list (referred to as an object list in this context):
paul@199 199
paul@199 200
                Classes with attribute offsets
paul@199 201
paul@199 202
    classcode   A
paul@199 203
    attrcode    a   b   x   y   z
paul@199 204
paul@199 205
                        B
paul@199 206
                        a   b   x   y   z
paul@199 207
paul@199 208
                                            C
paul@199 209
                                            a   b   x   y   z
paul@199 210
paul@199 211
    List        .   .   1   2   1   2   3   1   2   .   .   3
paul@199 212
paul@199 213
Here, the classcode refers to the offset in the list at which a class's
paul@199 214
attributes are defined, whereas the attrcode defines the offset within a
paul@199 215
region of attributes corresponding to a single attribute of a given name.
paul@199 216
paul@200 217
Attribute Locations
paul@200 218
-------------------
paul@200 219
paul@200 220
The locations stored in table/list elements are for instance attributes
paul@200 221
relative to the location of the instance, whereas those for class attributes
paul@200 222
and modules are absolute addresses (although these could also be changed to
paul@242 223
object-relative locations). Thus, each occupied table cell has the following
paul@242 224
structure:
paul@242 225
paul@242 226
    attrcode, uses-absolute-address, address (or location)
paul@200 227
paul@247 228
This could be given instead as follows:
paul@247 229
paul@247 230
    attrcode, is-class-or-module, location
paul@247 231
paul@247 232
Since uses-absolute-address corresponds to is-class-or-module, and since there
paul@247 233
is a need to test for classes and modules to prevent assignment to attributes
paul@247 234
of such objects, this particular information is always required.
paul@247 235
paul@247 236
Comparing Tables as Matrices with Displacement Lists
paul@247 237
----------------------------------------------------
paul@247 238
paul@247 239
Although displacement lists can provide reasonable levels of compaction for
paul@247 240
attribute data, the element size is larger than that required for a simple
paul@247 241
matrix: the attribute code (attrcode) need not be stored since each element
paul@247 242
unambiguously refers to the availability of an attribute for a particular
paul@247 243
class or instance of that class, and so the data at a given element need not
paul@247 244
be tested for relevance to a given attribute access operation.
paul@247 245
paul@247 246
Given a program with 20 object types and 100 attribute types, a matrix would
paul@247 247
occupy the following amount of space:
paul@247 248
paul@247 249
    number of object types * number of attribute types * element size
paul@247 250
  = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
paul@247 251
  = 2000
paul@247 252
paul@247 253
In contrast, given a compaction to 40% of the matrix size (without considering
paul@247 254
element size) in a displacement list, the amount of space would be as follows:
paul@247 255
paul@247 256
    number of elements * element size
paul@247 257
  = 40% * (20 * 100) * 2 (assuming that one additional location is required)
paul@247 258
  = 1600
paul@247 259
paul@247 260
Consequently, the principal overhead of using a displacement list is likely to
paul@247 261
be in the need to check element relevance when retrieving values from such a
paul@247 262
list.
paul@247 263
paul@199 264
Objects and Structures 
paul@199 265
======================
paul@199 266
paul@199 267
As well as references, micropython needs to have actual objects to refer to.
paul@199 268
Since classes, functions and instances are all objects, it is desirable that
paul@199 269
certain common features and operations are supported in the same way for all
paul@199 270
of these things. To permit this, a common data structure format is used.
paul@199 271
paul@215 272
    Header....................................................  Attributes.................
paul@200 273
paul@215 274
    Identifier  Identifier  Address     Identifier  Size        Object      Object      ...
paul@199 275
paul@215 276
    0           1           2           3           4           5           6           7
paul@215 277
    classcode   attrcode/   invocation  funccode    size        __class__   attribute   ...
paul@215 278
                instance    reference                           reference   reference
paul@215 279
                status
paul@199 280
paul@206 281
Classcode
paul@206 282
---------
paul@206 283
paul@206 284
Used in attribute lookup.
paul@206 285
paul@199 286
Here, the classcode refers to the attribute lookup table for the object (as
paul@200 287
described above). Classes and instances share the same classcode, and their
paul@200 288
structures reflect this. Functions all belong to the same type and thus employ
paul@200 289
the classcode for the function built-in type, whereas modules have distinct
paul@200 290
types since they must support different sets of attributes.
paul@199 291
paul@206 292
Attrcode
paul@206 293
--------
paul@206 294
paul@206 295
Used to test instances for membership of classes (or descendants of classes).
paul@206 296
paul@242 297
Since, in traditional Python, classes are only ever instances of some generic
paul@242 298
built-in type, support for testing such a relationship directly has been
paul@207 299
removed and the attrcode is not specified for classes: the presence of an
paul@242 300
attrcode indicates that a given object is an instance. In addition, support
paul@242 301
has also been removed for testing modules in the same way, meaning that the
paul@242 302
attrcode is also not specified for modules.
paul@206 303
paul@215 304
See the "Testing Instance Compatibility with Classes (Attrcode)" section below
paul@215 305
for details of attrcodes.
paul@214 306
paul@213 307
Invocation Reference
paul@213 308
--------------------
paul@213 309
paul@213 310
Used when an object is called.
paul@213 311
paul@213 312
This is the address of the code to be executed when an invocation is performed
paul@213 313
on the object.
paul@213 314
paul@215 315
Funccode
paul@215 316
--------
paul@213 317
paul@215 318
Used to look up argument positions by name.
paul@213 319
paul@215 320
The strategy with keyword arguments in micropython is to attempt to position
paul@215 321
such arguments in the invocation frame as it is being constructed.
paul@215 322
paul@215 323
See the "Parameters and Lookups" section for more information.
paul@215 324
paul@215 325
Size
paul@215 326
----
paul@215 327
paul@219 328
Used to indicate the size of an object including attributes.
paul@213 329
paul@209 330
Attributes
paul@209 331
----------
paul@209 332
paul@209 333
For classes, modules and instances, the attributes in the structure correspond
paul@209 334
to the attributes of each kind of object. For functions, however, the
paul@209 335
attributes in the structure correspond to the default arguments for each
paul@209 336
function, if any.
paul@209 337
paul@206 338
Structure Types
paul@206 339
---------------
paul@206 340
paul@199 341
Class C:
paul@199 342
paul@215 343
    0           1           2           3           4           5           6           7
paul@215 344
    classcode   (unused)    __new__     funccode    size        class type  attribute   ...
paul@215 345
    for C                   reference   for                     reference   reference
paul@215 346
                                        instantiator
paul@199 347
paul@199 348
Instance of C:
paul@199 349
paul@215 350
    0           1           2           3           4           5           6           7
paul@215 351
    classcode   attrcode    C.__call__  funccode    size        class C     attribute   ...
paul@215 352
    for C       for C       reference   for                     reference   reference
paul@215 353
                            (if exists) C.__call__
paul@199 354
paul@200 355
Function f:
paul@199 356
paul@215 357
    0           1           2           3           4           5           6           7
paul@215 358
    classcode   attrcode    code        funccode    size        class       attribute   ...
paul@215 359
    for         for         reference                           function    (default)
paul@215 360
    function    function                                        reference   reference
paul@200 361
paul@200 362
Module m:
paul@200 363
paul@215 364
    0           1           2           3           4           5           6           7
paul@219 365
    classcode   attrcode    (unused)    (unused)    (unused)    module type attribute   ...
paul@215 366
    for m       for m                                           reference   (global)
paul@215 367
                                                                            reference
paul@200 368
paul@200 369
The __class__ Attribute
paul@200 370
-----------------------
paul@200 371
paul@200 372
All objects support the __class__ attribute and this is illustrated above with
paul@200 373
the first attribute.
paul@200 374
paul@200 375
Class: refers to the type class (type.__class__ also refers to the type class)
paul@200 376
Function: refers to the function class
paul@200 377
Instance: refers to the class instantiated to make the object
paul@200 378
paul@203 379
Lists and Tuples
paul@203 380
----------------
paul@203 381
paul@203 382
The built-in list and tuple sequences employ variable length structures using
paul@203 383
the attribute locations to store their elements, where each element is a
paul@203 384
reference to a separately stored object.
paul@203 385
paul@214 386
Testing Instance Compatibility with Classes (Attrcode)
paul@200 387
------------------------------------------------------
paul@200 388
paul@200 389
Although it would be possible to have a data structure mapping classes to
paul@200 390
compatible classes, such as a matrix indicating the subclasses (or
paul@200 391
superclasses) of each class, the need to retain the key to such a data
paul@200 392
structure for each class might introduce a noticeable overhead.
paul@200 393
paul@200 394
Instead of having a separate structure, descendant classes of each class are
paul@200 395
inserted as special attributes into the object table. This requires an extra
paul@200 396
key to be retained, since each class must provide its own attribute code such
paul@200 397
that upon an instance/class compatibility test, the code may be obtained and
paul@200 398
used in the object table.
paul@200 399
paul@200 400
Invocation and Code References
paul@200 401
------------------------------
paul@200 402
paul@200 403
Modules: there is no meaningful invocation reference since modules cannot be
paul@200 404
explicitly called.
paul@200 405
paul@200 406
Functions: a simple code reference is employed pointing to code implementing
paul@200 407
the function. Note that the function locals are completely distinct from this
paul@200 408
structure and are not comparable to attributes. Instead, attributes are
paul@200 409
reserved for default parameter values, although they do not appear in the
paul@200 410
object table described above, appearing instead in a separate parameter table
paul@200 411
described below.
paul@200 412
paul@200 413
Classes: given that classes must be invoked in order to create instances, a
paul@200 414
reference must be provided in class structures. However, this reference does
paul@200 415
not point directly at the __init__ method of the class. Instead, the
paul@200 416
referenced code belongs to a special initialiser function, __new__, consisting
paul@200 417
of the following instructions:
paul@200 418
paul@200 419
    create instance for C
paul@200 420
    call C.__init__(instance, ...)
paul@200 421
    return instance
paul@200 422
paul@200 423
Instances: each instance employs a reference to any __call__ method defined in
paul@200 424
the class hierarchy for the instance, thus maintaining its callable nature.
paul@200 425
paul@200 426
Both classes and modules may contain code in their definitions - the former in
paul@200 427
the "body" of the class, potentially defining attributes, and the latter as
paul@200 428
the "top-level" code in the module, potentially defining attributes/globals -
paul@200 429
but this code is not associated with any invocation target. It is thus
paul@200 430
generated in order of appearance and is not referenced externally.
paul@200 431
paul@200 432
Invocation Operation
paul@200 433
--------------------
paul@200 434
paul@200 435
Consequently, regardless of the object an invocation is always done as
paul@200 436
follows:
paul@200 437
paul@200 438
    get invocation reference from the header
paul@200 439
    jump to reference
paul@200 440
paul@200 441
Additional preparation is necessary before the above code: positional
paul@200 442
arguments must be saved in the invocation frame, and keyword arguments must be
paul@200 443
resolved and saved to the appropriate position in the invocation frame.
paul@200 444
paul@200 445
See invocation.txt for details.
paul@200 446
paul@200 447
Parameters and Lookups 
paul@200 448
======================
paul@200 449
paul@200 450
Since Python supports keyword arguments when making invocations, it becomes
paul@200 451
necessary to record the parameter names associated with each function or
paul@200 452
method. Just as object tables record attributes positions on classes and
paul@200 453
instances, parameter tables record parameter positions in function or method
paul@200 454
parameter lists.
paul@200 455
paul@200 456
For a given program, a parameter table can be considered as being like a
paul@200 457
matrix mapping functions/methods to parameter names. For example:
paul@200 458
paul@200 459
    def f(x, y, z):
paul@200 460
        pass
paul@200 461
paul@200 462
    def g(a, b, c):
paul@200 463
        pass
paul@200 464
paul@200 465
    def h(a, x):
paul@200 466
        pass
paul@200 467
paul@200 468
This would provide the following table, referred to as a parameter table in
paul@200 469
the context of functions and methods:
paul@200 470
paul@200 471
    Function/param  a   b   c   x   y   z
paul@200 472
paul@200 473
    f                           1   2   3
paul@200 474
    g               1   2   3
paul@200 475
    h               1           2
paul@200 476
paul@233 477
Confusion can occur when functions are adopted as methods, since the context
paul@233 478
then occupies the first slot in the invocation frame:
paul@233 479
paul@233 480
    def f(x, y, z):
paul@233 481
        pass
paul@233 482
paul@233 483
    f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
paul@233 484
                     -> f(1, 2, 3)
paul@233 485
paul@233 486
    class C:
paul@233 487
        f = f
paul@233 488
paul@233 489
        def g(x, y, z):
paul@233 490
            pass
paul@233 491
paul@233 492
    c = C()
paul@233 493
paul@233 494
    c.f(y=2, z=3) -> f(<context>, 2, 3)
paul@233 495
    c.g(y=2, z=3) -> C.g(<context>, 2, 3)
paul@233 496
paul@200 497
Just as with parameter tables, a displacement list can be prepared from a
paul@200 498
parameter table:
paul@200 499
paul@200 500
                Functions with parameter (attribute) offsets
paul@200 501
paul@200 502
    funccode    f
paul@200 503
    attrcode    a   b   c   x   y   z
paul@200 504
paul@200 505
                                        g
paul@200 506
                                        a   b   c   x   y   z
paul@200 507
paul@200 508
                                                    h
paul@200 509
                                                    a   b   c   x   y   z
paul@200 510
paul@200 511
    List        .   .   .   1   2   3   1   2   3   1   .   .   2   .   .
paul@200 512
paul@200 513
Here, the funccode refers to the offset in the list at which a function's
paul@200 514
parameters are defined, whereas the attrcode defines the offset within a
paul@200 515
region of attributes corresponding to a single parameter of a given name.
paul@200 516
paul@200 517
Instantiation
paul@200 518
=============
paul@200 519
paul@200 520
When instantiating classes, memory must be reserved for the header of the
paul@200 521
resulting instance, along with locations for the attributes of the instance.
paul@200 522
Since the instance header contains data common to all instances of a class, a
paul@200 523
template header is copied to the start of the newly reserved memory region.
paul@222 524
paul@222 525
Register Usage
paul@222 526
==============
paul@222 527
paul@222 528
During code generation, much of the evaluation produces results which are
paul@222 529
implicitly recorded in the "active value" register, and various instructions
paul@222 530
will consume the active value. In addition, some instructions will consume a
paul@222 531
separate "active source value" from a register, typically those which are
paul@222 532
assigning the result of an expression to an assignment target.
paul@222 533
paul@222 534
Since values often need to be retained for later use, a set of temporary
paul@222 535
storage locations are typically employed. However, optimisations may reduce
paul@222 536
the need to use such temporary storage where instructions which provide the
paul@222 537
"active value" can be re-executed and will produce the same result.
paul@245 538
paul@245 539
List and Tuple Representations
paul@245 540
==============================
paul@245 541
paul@245 542
Since tuples have a fixed size, the representation of a tuple instance is
paul@245 543
merely a header describing the size of the entire object, together with a
paul@245 544
sequence of references to the object "stored" at each position in the
paul@245 545
structure. Such references consist of the usual context and reference pair.
paul@245 546
paul@245 547
Lists, however, have a variable size and must be accessible via an unchanging
paul@245 548
location even as more memory is allocated elsewhere to accommodate the
paul@245 549
contents of the list. Consequently, the representation must resemble the
paul@245 550
following:
paul@245 551
paul@245 552
    Structure header for list (size == header plus special attribute)
paul@245 553
    Special attribute referencing the underlying sequence
paul@245 554
paul@245 555
The underlying sequence has a fixed size, like a tuple, but may contain fewer
paul@245 556
elements than the size of the sequence permits:
paul@245 557
paul@245 558
    Special header indicating the current size and allocated size
paul@245 559
    Element
paul@245 560
    ...             <-- current size
paul@245 561
    (Unused space)
paul@245 562
    ...             <-- allocated size
paul@245 563
paul@245 564
This representation permits the allocation of a new sequence when space is
paul@245 565
exhausted in an existing sequence, with the new sequence address stored in the
paul@245 566
main list structure. Since access to the contents of the list must go through
paul@245 567
the main list structure, underlying allocation activities may take place
paul@245 568
without the users of a list having to be aware of such activities.