micropython

Annotated docs/lowlevel.txt

804:c967b47fada4
2014-06-01 Paul Boddie Merged branches. syspython-as-target
paul@766 1
Low-level Implementation Details
paul@766 2
================================
paul@766 3
paul@766 4
Although micropython delegates the generation of low-level program code and
paul@766 5
data to syspython, various considerations of how an eventual program might be
paul@766 6
structured have been used to inform the way micropython represents the details
paul@766 7
of a program. This document describes these considerations and indicates how
paul@766 8
syspython or other technologies might represent a working program.
paul@766 9
paul@768 10
 * Objects and structures 
paul@768 11
 * Instantiation
paul@768 12
 * List and tuple representations
paul@768 13
 * Instruction evaluation model
paul@769 14
 * Exception handling
paul@768 15
paul@766 16
Objects and Structures 
paul@766 17
======================
paul@766 18
paul@766 19
As well as references, micropython needs to have actual objects to refer to.
paul@766 20
Since classes, functions and instances are all objects, it is desirable that
paul@766 21
certain common features and operations are supported in the same way for all
paul@766 22
of these things. To permit this, a common data structure format is used.
paul@766 23
paul@766 24
    Header....................................................  Attributes.................
paul@766 25
paul@766 26
    Identifier  Identifier  Address     Identifier  Size        Object      ...
paul@766 27
paul@766 28
    0           1           2           3           4           5           6
paul@766 29
    classcode   attrcode/   invocation  funccode    size        attribute   ...
paul@766 30
                instance    reference                           reference
paul@766 31
                status
paul@766 32
paul@766 33
Classcode
paul@766 34
---------
paul@766 35
paul@766 36
Used in attribute lookup.
paul@766 37
paul@766 38
Here, the classcode refers to the attribute lookup table for the object (as
paul@766 39
described in concepts.txt). Classes and instances share the same classcode,
paul@766 40
and their structures reflect this. Functions all belong to the same type and
paul@766 41
thus employ the classcode for the function built-in type, whereas modules have
paul@766 42
distinct types since they must support different sets of attributes.
paul@766 43
paul@766 44
Attrcode
paul@766 45
--------
paul@766 46
paul@766 47
Used to test instances for membership of classes (or descendants of classes).
paul@766 48
paul@766 49
Since, in traditional Python, classes are only ever instances of some generic
paul@766 50
built-in type, support for testing such a relationship directly has been
paul@766 51
removed and the attrcode is not specified for classes: the presence of an
paul@766 52
attrcode indicates that a given object is an instance. In addition, support
paul@766 53
has also been removed for testing modules in the same way, meaning that the
paul@766 54
attrcode is also not specified for modules.
paul@766 55
paul@766 56
See the "Testing Instance Compatibility with Classes (Attrcode)" section below
paul@766 57
for details of attrcodes.
paul@766 58
paul@766 59
Invocation Reference
paul@766 60
--------------------
paul@766 61
paul@766 62
Used when an object is called.
paul@766 63
paul@766 64
This is the address of the code to be executed when an invocation is performed
paul@766 65
on the object.
paul@766 66
paul@766 67
Funccode
paul@766 68
--------
paul@766 69
paul@766 70
Used to look up argument positions by name.
paul@766 71
paul@766 72
The strategy with keyword arguments in micropython is to attempt to position
paul@766 73
such arguments in the invocation frame as it is being constructed.
paul@766 74
paul@766 75
See the "Parameters and Lookups" section for more information.
paul@766 76
paul@766 77
Size
paul@766 78
----
paul@766 79
paul@766 80
Used to indicate the size of an object including attributes.
paul@766 81
paul@766 82
Attributes
paul@766 83
----------
paul@766 84
paul@766 85
For classes, modules and instances, the attributes in the structure correspond
paul@766 86
to the attributes of each kind of object. For functions, however, the
paul@766 87
attributes in the structure correspond to the default arguments for each
paul@766 88
function, if any.
paul@766 89
paul@766 90
Structure Types
paul@766 91
---------------
paul@766 92
paul@766 93
Class C:
paul@766 94
paul@766 95
    0           1           2           3           4           5           6
paul@766 96
    classcode   (unused)    __new__     funccode    size        attribute   ...
paul@766 97
    for C                   reference   for                     reference
paul@766 98
                                        instantiator
paul@766 99
paul@766 100
Instance of C:
paul@766 101
paul@766 102
    0           1           2           3           4           5           6
paul@766 103
    classcode   attrcode    C.__call__  funccode    size        attribute   ...
paul@766 104
    for C       for C       reference   for                     reference
paul@766 105
                            (if exists) C.__call__
paul@766 106
paul@766 107
Function f:
paul@766 108
paul@766 109
    0           1           2           3           4           5           6
paul@766 110
    classcode   attrcode    code        funccode    size        attribute   ...
paul@766 111
    for         for         reference                           (default)
paul@766 112
    function    function                                        reference
paul@766 113
paul@766 114
Module m:
paul@766 115
paul@766 116
    0           1           2           3           4           5           6
paul@766 117
    classcode   attrcode    (unused)    (unused)    (unused)    attribute   ...
paul@766 118
    for m       for m                                           (global)
paul@766 119
                                                                reference
paul@766 120
paul@766 121
The __class__ Attribute
paul@766 122
-----------------------
paul@766 123
paul@766 124
All objects should support the __class__ attribute, and in most cases this is
paul@766 125
done using the object table, yielding a common address for all instances of a
paul@766 126
given class.
paul@766 127
paul@766 128
Function: refers to the function class
paul@766 129
Instance: refers to the class instantiated to make the object
paul@766 130
paul@766 131
The object table cannot support two definitions simultaneously for both
paul@766 132
instances and their classes. Consequently, __class__ access on classes must be
paul@766 133
tested for and a special result returned.
paul@766 134
paul@766 135
Class: refers to the type class (type.__class__ also refers to the type class)
paul@766 136
paul@766 137
For convenience, the first attribute of a class will be the common __class__
paul@766 138
attribute for all its instances. As noted above, direct access to this
paul@766 139
attribute will not be possible for classes, and a constant result will be
paul@766 140
returned instead.
paul@766 141
paul@766 142
Lists and Tuples
paul@766 143
----------------
paul@766 144
paul@766 145
The built-in list and tuple sequences employ variable length structures using
paul@766 146
the attribute locations to store their elements, where each element is a
paul@766 147
reference to a separately stored object.
paul@766 148
paul@766 149
Testing Instance Compatibility with Classes (Attrcode)
paul@766 150
------------------------------------------------------
paul@766 151
paul@766 152
Although it would be possible to have a data structure mapping classes to
paul@766 153
compatible classes, such as a matrix indicating the subclasses (or
paul@766 154
superclasses) of each class, the need to retain the key to such a data
paul@766 155
structure for each class might introduce a noticeable overhead.
paul@766 156
paul@766 157
Instead of having a separate structure, descendant classes of each class are
paul@766 158
inserted as special attributes into the object table. This requires an extra
paul@766 159
key to be retained, since each class must provide its own attribute code such
paul@766 160
that upon an instance/class compatibility test, the code may be obtained and
paul@766 161
used in the object table.
paul@766 162
paul@766 163
Invocation and Code References
paul@766 164
------------------------------
paul@766 165
paul@766 166
Modules: there is no meaningful invocation reference since modules cannot be
paul@766 167
explicitly called.
paul@766 168
paul@766 169
Functions: a simple code reference is employed pointing to code implementing
paul@766 170
the function. Note that the function locals are completely distinct from this
paul@766 171
structure and are not comparable to attributes. Instead, attributes are
paul@766 172
reserved for default parameter values, although they do not appear in the
paul@766 173
object table described above, appearing instead in a separate parameter table
paul@766 174
described in concepts.txt.
paul@766 175
paul@766 176
Classes: given that classes must be invoked in order to create instances, a
paul@766 177
reference must be provided in class structures. However, this reference does
paul@766 178
not point directly at the __init__ method of the class. Instead, the
paul@766 179
referenced code belongs to a special initialiser function, __new__, consisting
paul@766 180
of the following instructions:
paul@766 181
paul@766 182
    create instance for C
paul@766 183
    call C.__init__(instance, ...)
paul@766 184
    return instance
paul@766 185
paul@766 186
Instances: each instance employs a reference to any __call__ method defined in
paul@766 187
the class hierarchy for the instance, thus maintaining its callable nature.
paul@766 188
paul@766 189
Both classes and modules may contain code in their definitions - the former in
paul@766 190
the "body" of the class, potentially defining attributes, and the latter as
paul@766 191
the "top-level" code in the module, potentially defining attributes/globals -
paul@766 192
but this code is not associated with any invocation target. It is thus
paul@766 193
generated in order of appearance and is not referenced externally.
paul@766 194
paul@766 195
Invocation Operation
paul@766 196
--------------------
paul@766 197
paul@766 198
Consequently, regardless of the object an invocation is always done as
paul@766 199
follows:
paul@766 200
paul@766 201
    get invocation reference from the header
paul@766 202
    jump to reference
paul@766 203
paul@766 204
Additional preparation is necessary before the above code: positional
paul@766 205
arguments must be saved in the invocation frame, and keyword arguments must be
paul@766 206
resolved and saved to the appropriate position in the invocation frame.
paul@766 207
paul@766 208
See invocation.txt for details.
paul@766 209
paul@766 210
Instantiation
paul@766 211
=============
paul@766 212
paul@766 213
When instantiating classes, memory must be reserved for the header of the
paul@766 214
resulting instance, along with locations for the attributes of the instance.
paul@766 215
Since the instance header contains data common to all instances of a class, a
paul@766 216
template header is copied to the start of the newly reserved memory region.
paul@766 217
paul@766 218
List and Tuple Representations
paul@766 219
==============================
paul@766 220
paul@766 221
Since tuples have a fixed size, the representation of a tuple instance is
paul@766 222
merely a header describing the size of the entire object, together with a
paul@766 223
sequence of references to the object "stored" at each position in the
paul@766 224
structure. Such references consist of the usual context and reference pair.
paul@766 225
paul@766 226
Lists, however, have a variable size and must be accessible via an unchanging
paul@766 227
location even as more memory is allocated elsewhere to accommodate the
paul@766 228
contents of the list. Consequently, the representation must resemble the
paul@766 229
following:
paul@766 230
paul@766 231
    Structure header for list (size == header plus special attribute)
paul@766 232
    Special attribute referencing the underlying sequence
paul@766 233
paul@766 234
The underlying sequence has a fixed size, like a tuple, but may contain fewer
paul@766 235
elements than the size of the sequence permits:
paul@766 236
paul@766 237
    Special header indicating the current size and allocated size
paul@766 238
    Element
paul@766 239
    ...             <-- current size
paul@766 240
    (Unused space)
paul@766 241
    ...             <-- allocated size
paul@766 242
paul@766 243
This representation permits the allocation of a new sequence when space is
paul@766 244
exhausted in an existing sequence, with the new sequence address stored in the
paul@766 245
main list structure. Since access to the contents of the list must go through
paul@766 246
the main list structure, underlying allocation activities may take place
paul@766 247
without the users of a list having to be aware of such activities.
paul@768 248
paul@768 249
Instruction Evaluation Model
paul@768 250
============================
paul@768 251
paul@768 252
Programs use a value stack containing local and temporary storage. A value
paul@768 253
stack pointer indicates the top of this stack. In addition, a separate stack
paul@768 254
is used to record the invocation frames. All stack pointers refer to the next
paul@768 255
address to be used by the stack, not the address of the uppermost element.
paul@768 256
paul@768 257
    Frame Stack     Value Stack
paul@768 258
    -----------     -----------     Address of Callable
paul@768 259
                                    -------------------
paul@768 260
    previous        ...
paul@768 261
    current ------> callable -----> identifier
paul@768 262
                    arg1            reference to code
paul@768 263
                    arg2
paul@768 264
                    arg3
paul@768 265
                    local4
paul@768 266
                    local5
paul@768 267
                    ...
paul@768 268
paul@768 269
Unlike the CPython virtual machine, programs do not use a value stack
paul@768 270
containing the results of expression evaluations. Instead, temporary storage
paul@768 271
is statically allocated and employed by instructions.
paul@768 272
paul@768 273
Loading local names and temporary values is a matter of performing
paul@768 274
frame-relative accesses to the value stack.
paul@768 275
paul@768 276
Invocations and Argument Evaluation
paul@768 277
-----------------------------------
paul@768 278
paul@768 279
When preparing for an invocation, the caller first sets the invocation frame
paul@768 280
pointer. A number of locations for the arguments required by the invocation
paul@768 281
are then reserved. With a frame to prepare, positional arguments are added to
paul@768 282
the frame in order such that the first argument positions are filled. The
paul@768 283
names of keyword arguments are used (in the form of table index values) to
paul@768 284
consult the parameter table and to find the frame location in which such
paul@768 285
arguments are to be stored.
paul@768 286
paul@768 287
    fn(a, b, d=1, e=2, c=3) -> fn(a, b, c, d, e)
paul@768 288
paul@768 289
    Frame
paul@768 290
    -----
paul@768 291
paul@768 292
    a               a               a               a
paul@768 293
    b               b               b               b
paul@768 294
    ___             ___             ___         --> 3
paul@768 295
    ___         --> 1               1           |   1
paul@768 296
    ___         |   ___         --> 2           |   2
paul@768 297
                |               |               |
paul@768 298
    1 -----------   2 -----------   3 -----------
paul@768 299
paul@768 300
Conceptually, the frame can be considered as a collection of attributes, as
paul@768 301
seen in other kinds of structures:
paul@768 302
paul@768 303
Frame for invocation of fn:
paul@768 304
paul@768 305
    0           1           2           3           4           5
paul@768 306
    code        a           b           c           d           e
paul@768 307
    reference
paul@768 308
paul@768 309
Where arguments are specified positionally, such "attributes" are set in
paul@768 310
order. Keyword arguments are set using a name-to-position attribute-like
paul@768 311
mapping, where the position of each argument is discovered using the parameter
paul@768 312
table.
paul@768 313
paul@768 314
Method Invocations
paul@768 315
------------------
paul@768 316
paul@768 317
Method invocations incorporate an implicit first argument which is obtained
paul@768 318
from the context of the method:
paul@768 319
paul@768 320
    method(a, b, d=1, e=2, c=3) -> method(self, a, b, c, d, e)
paul@768 321
paul@768 322
    Frame
paul@768 323
    -----
paul@768 324
paul@768 325
    context of method
paul@768 326
    a
paul@768 327
    b
paul@768 328
    3
paul@768 329
    1
paul@768 330
    2
paul@768 331
paul@768 332
Although it could be possible to permit any object to be provided as the first
paul@768 333
argument, in order to optimise instance attribute access in methods, we should
paul@768 334
seek to restrict the object type.
paul@768 335
paul@768 336
Verifying Supplied Arguments
paul@768 337
----------------------------
paul@768 338
paul@768 339
In order to ensure a correct invocation, it is also necessary to check the
paul@768 340
number of supplied arguments. If the target of the invocation is known at
paul@768 341
compile-time, no additional instructions need to be emitted; otherwise, the
paul@768 342
generated code must test for the following situations:
paul@768 343
paul@768 344
 1. That the number of supplied arguments is equal to the number of expected
paul@768 345
    parameters.
paul@768 346
paul@768 347
 2. That no keyword argument overwrites an existing positional parameter.
paul@768 348
paul@768 349
Default Arguments
paul@768 350
-----------------
paul@768 351
paul@768 352
Some arguments may have default values which are used if no value is provided
paul@768 353
in an invocation. Such defaults are initialised when the function itself is
paul@768 354
initialised, and are used to fill in any invocation frames that are known at
paul@768 355
compile-time.
paul@768 356
paul@768 357
To obtain the default values, a reference to the function object is required.
paul@768 358
This can be provided either in an additional frame location or in a special
paul@768 359
register.
paul@768 360
paul@768 361
Tuples, Frames and Allocation
paul@768 362
-----------------------------
paul@768 363
paul@768 364
Using the approach where arguments are treated like attributes in some kind of
paul@768 365
structure, we could choose to allocate frames in places other than a stack.
paul@768 366
This would produce something somewhat similar to a plain tuple object.
paul@769 367
paul@769 368
Exception Handling
paul@769 369
==================
paul@769 370
paul@769 371
Active Exceptions
paul@769 372
-----------------
paul@769 373
paul@769 374
When an exception is raised, an exception instance is defined as the active
paul@769 375
exception. This active exception remains in force until either a new exception
paul@769 376
is raised in any handling of the exception, or upon exit of a handler where
paul@769 377
the active exception has been caught (thus resetting the active exception).
paul@769 378
paul@769 379
The following operations can be employed to achieve this:
paul@769 380
paul@769 381
StoreException records the current value reference using the exception
paul@769 382
register.
paul@769 383
paul@769 384
LoadException obtains the current exception and puts it in the value register.
paul@769 385
paul@769 386
CheckException checks the current exception against a class referenced by the
paul@769 387
current value.
paul@769 388
paul@769 389
ClearException resets the exception register.
paul@769 390
paul@769 391
Handlers
paul@769 392
--------
paul@769 393
paul@769 394
An exception handler stack is defined such that when a try...except or
paul@769 395
try...finally block is entered, a new handler is defined.
paul@769 396
paul@769 397
When an exception is raised, the program jumps to the most recently defined
paul@769 398
handler. Inside the handler, the stack entry for the handler will be removed.
paul@769 399
paul@769 400
Depending on the nature of the handler and whether the exception is handled,
paul@769 401
the program may jump to the next most recent handler, and so on.
paul@769 402
paul@769 403
If no handler is defined when an exception is raised or re-raised, the program
paul@769 404
should terminate. This might be done by having a "handler #0" which explicitly
paul@769 405
terminates the program.
paul@769 406
paul@769 407
The following operations can be employed to achieve this:
paul@769 408
paul@769 409
PushHandler(block) defines an active handler at the location indicated by the
paul@769 410
given block.
paul@769 411
paul@769 412
PopHandler(n) removes the n topmost active handlers. A single handler is
paul@769 413
typically removed when leaving a try block, but potentially more handlers are
paul@769 414
removed when such a block is exited using a return statement.