micropython

docs/lowlevel.txt

791:ca3b724cf6c9
2014-03-15 Paul Boddie Skip class namespaces when looking for outer namespace definitions of names. syspython-as-target
     1 Low-level Implementation Details
     2 ================================
     3 
     4 Although micropython delegates the generation of low-level program code and
     5 data to syspython, various considerations of how an eventual program might be
     6 structured have been used to inform the way micropython represents the details
     7 of a program. This document describes these considerations and indicates how
     8 syspython or other technologies might represent a working program.
     9 
    10  * Objects and structures 
    11  * Instantiation
    12  * List and tuple representations
    13  * Instruction evaluation model
    14  * Exception handling
    15 
    16 Objects and Structures 
    17 ======================
    18 
    19 As well as references, micropython needs to have actual objects to refer to.
    20 Since classes, functions and instances are all objects, it is desirable that
    21 certain common features and operations are supported in the same way for all
    22 of these things. To permit this, a common data structure format is used.
    23 
    24     Header....................................................  Attributes.................
    25 
    26     Identifier  Identifier  Address     Identifier  Size        Object      ...
    27 
    28     0           1           2           3           4           5           6
    29     classcode   attrcode/   invocation  funccode    size        attribute   ...
    30                 instance    reference                           reference
    31                 status
    32 
    33 Classcode
    34 ---------
    35 
    36 Used in attribute lookup.
    37 
    38 Here, the classcode refers to the attribute lookup table for the object (as
    39 described in concepts.txt). Classes and instances share the same classcode,
    40 and their structures reflect this. Functions all belong to the same type and
    41 thus employ the classcode for the function built-in type, whereas modules have
    42 distinct types since they must support different sets of attributes.
    43 
    44 Attrcode
    45 --------
    46 
    47 Used to test instances for membership of classes (or descendants of classes).
    48 
    49 Since, in traditional Python, classes are only ever instances of some generic
    50 built-in type, support for testing such a relationship directly has been
    51 removed and the attrcode is not specified for classes: the presence of an
    52 attrcode indicates that a given object is an instance. In addition, support
    53 has also been removed for testing modules in the same way, meaning that the
    54 attrcode is also not specified for modules.
    55 
    56 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
    57 for details of attrcodes.
    58 
    59 Invocation Reference
    60 --------------------
    61 
    62 Used when an object is called.
    63 
    64 This is the address of the code to be executed when an invocation is performed
    65 on the object.
    66 
    67 Funccode
    68 --------
    69 
    70 Used to look up argument positions by name.
    71 
    72 The strategy with keyword arguments in micropython is to attempt to position
    73 such arguments in the invocation frame as it is being constructed.
    74 
    75 See the "Parameters and Lookups" section for more information.
    76 
    77 Size
    78 ----
    79 
    80 Used to indicate the size of an object including attributes.
    81 
    82 Attributes
    83 ----------
    84 
    85 For classes, modules and instances, the attributes in the structure correspond
    86 to the attributes of each kind of object. For functions, however, the
    87 attributes in the structure correspond to the default arguments for each
    88 function, if any.
    89 
    90 Structure Types
    91 ---------------
    92 
    93 Class C:
    94 
    95     0           1           2           3           4           5           6
    96     classcode   (unused)    __new__     funccode    size        attribute   ...
    97     for C                   reference   for                     reference
    98                                         instantiator
    99 
   100 Instance of C:
   101 
   102     0           1           2           3           4           5           6
   103     classcode   attrcode    C.__call__  funccode    size        attribute   ...
   104     for C       for C       reference   for                     reference
   105                             (if exists) C.__call__
   106 
   107 Function f:
   108 
   109     0           1           2           3           4           5           6
   110     classcode   attrcode    code        funccode    size        attribute   ...
   111     for         for         reference                           (default)
   112     function    function                                        reference
   113 
   114 Module m:
   115 
   116     0           1           2           3           4           5           6
   117     classcode   attrcode    (unused)    (unused)    (unused)    attribute   ...
   118     for m       for m                                           (global)
   119                                                                 reference
   120 
   121 The __class__ Attribute
   122 -----------------------
   123 
   124 All objects should support the __class__ attribute, and in most cases this is
   125 done using the object table, yielding a common address for all instances of a
   126 given class.
   127 
   128 Function: refers to the function class
   129 Instance: refers to the class instantiated to make the object
   130 
   131 The object table cannot support two definitions simultaneously for both
   132 instances and their classes. Consequently, __class__ access on classes must be
   133 tested for and a special result returned.
   134 
   135 Class: refers to the type class (type.__class__ also refers to the type class)
   136 
   137 For convenience, the first attribute of a class will be the common __class__
   138 attribute for all its instances. As noted above, direct access to this
   139 attribute will not be possible for classes, and a constant result will be
   140 returned instead.
   141 
   142 Lists and Tuples
   143 ----------------
   144 
   145 The built-in list and tuple sequences employ variable length structures using
   146 the attribute locations to store their elements, where each element is a
   147 reference to a separately stored object.
   148 
   149 Testing Instance Compatibility with Classes (Attrcode)
   150 ------------------------------------------------------
   151 
   152 Although it would be possible to have a data structure mapping classes to
   153 compatible classes, such as a matrix indicating the subclasses (or
   154 superclasses) of each class, the need to retain the key to such a data
   155 structure for each class might introduce a noticeable overhead.
   156 
   157 Instead of having a separate structure, descendant classes of each class are
   158 inserted as special attributes into the object table. This requires an extra
   159 key to be retained, since each class must provide its own attribute code such
   160 that upon an instance/class compatibility test, the code may be obtained and
   161 used in the object table.
   162 
   163 Invocation and Code References
   164 ------------------------------
   165 
   166 Modules: there is no meaningful invocation reference since modules cannot be
   167 explicitly called.
   168 
   169 Functions: a simple code reference is employed pointing to code implementing
   170 the function. Note that the function locals are completely distinct from this
   171 structure and are not comparable to attributes. Instead, attributes are
   172 reserved for default parameter values, although they do not appear in the
   173 object table described above, appearing instead in a separate parameter table
   174 described in concepts.txt.
   175 
   176 Classes: given that classes must be invoked in order to create instances, a
   177 reference must be provided in class structures. However, this reference does
   178 not point directly at the __init__ method of the class. Instead, the
   179 referenced code belongs to a special initialiser function, __new__, consisting
   180 of the following instructions:
   181 
   182     create instance for C
   183     call C.__init__(instance, ...)
   184     return instance
   185 
   186 Instances: each instance employs a reference to any __call__ method defined in
   187 the class hierarchy for the instance, thus maintaining its callable nature.
   188 
   189 Both classes and modules may contain code in their definitions - the former in
   190 the "body" of the class, potentially defining attributes, and the latter as
   191 the "top-level" code in the module, potentially defining attributes/globals -
   192 but this code is not associated with any invocation target. It is thus
   193 generated in order of appearance and is not referenced externally.
   194 
   195 Invocation Operation
   196 --------------------
   197 
   198 Consequently, regardless of the object an invocation is always done as
   199 follows:
   200 
   201     get invocation reference from the header
   202     jump to reference
   203 
   204 Additional preparation is necessary before the above code: positional
   205 arguments must be saved in the invocation frame, and keyword arguments must be
   206 resolved and saved to the appropriate position in the invocation frame.
   207 
   208 See invocation.txt for details.
   209 
   210 Instantiation
   211 =============
   212 
   213 When instantiating classes, memory must be reserved for the header of the
   214 resulting instance, along with locations for the attributes of the instance.
   215 Since the instance header contains data common to all instances of a class, a
   216 template header is copied to the start of the newly reserved memory region.
   217 
   218 List and Tuple Representations
   219 ==============================
   220 
   221 Since tuples have a fixed size, the representation of a tuple instance is
   222 merely a header describing the size of the entire object, together with a
   223 sequence of references to the object "stored" at each position in the
   224 structure. Such references consist of the usual context and reference pair.
   225 
   226 Lists, however, have a variable size and must be accessible via an unchanging
   227 location even as more memory is allocated elsewhere to accommodate the
   228 contents of the list. Consequently, the representation must resemble the
   229 following:
   230 
   231     Structure header for list (size == header plus special attribute)
   232     Special attribute referencing the underlying sequence
   233 
   234 The underlying sequence has a fixed size, like a tuple, but may contain fewer
   235 elements than the size of the sequence permits:
   236 
   237     Special header indicating the current size and allocated size
   238     Element
   239     ...             <-- current size
   240     (Unused space)
   241     ...             <-- allocated size
   242 
   243 This representation permits the allocation of a new sequence when space is
   244 exhausted in an existing sequence, with the new sequence address stored in the
   245 main list structure. Since access to the contents of the list must go through
   246 the main list structure, underlying allocation activities may take place
   247 without the users of a list having to be aware of such activities.
   248 
   249 Instruction Evaluation Model
   250 ============================
   251 
   252 Programs use a value stack containing local and temporary storage. A value
   253 stack pointer indicates the top of this stack. In addition, a separate stack
   254 is used to record the invocation frames. All stack pointers refer to the next
   255 address to be used by the stack, not the address of the uppermost element.
   256 
   257     Frame Stack     Value Stack
   258     -----------     -----------     Address of Callable
   259                                     -------------------
   260     previous        ...
   261     current ------> callable -----> identifier
   262                     arg1            reference to code
   263                     arg2
   264                     arg3
   265                     local4
   266                     local5
   267                     ...
   268 
   269 Unlike the CPython virtual machine, programs do not use a value stack
   270 containing the results of expression evaluations. Instead, temporary storage
   271 is statically allocated and employed by instructions.
   272 
   273 Loading local names and temporary values is a matter of performing
   274 frame-relative accesses to the value stack.
   275 
   276 Invocations and Argument Evaluation
   277 -----------------------------------
   278 
   279 When preparing for an invocation, the caller first sets the invocation frame
   280 pointer. A number of locations for the arguments required by the invocation
   281 are then reserved. With a frame to prepare, positional arguments are added to
   282 the frame in order such that the first argument positions are filled. The
   283 names of keyword arguments are used (in the form of table index values) to
   284 consult the parameter table and to find the frame location in which such
   285 arguments are to be stored.
   286 
   287     fn(a, b, d=1, e=2, c=3) -> fn(a, b, c, d, e)
   288 
   289     Frame
   290     -----
   291 
   292     a               a               a               a
   293     b               b               b               b
   294     ___             ___             ___         --> 3
   295     ___         --> 1               1           |   1
   296     ___         |   ___         --> 2           |   2
   297                 |               |               |
   298     1 -----------   2 -----------   3 -----------
   299 
   300 Conceptually, the frame can be considered as a collection of attributes, as
   301 seen in other kinds of structures:
   302 
   303 Frame for invocation of fn:
   304 
   305     0           1           2           3           4           5
   306     code        a           b           c           d           e
   307     reference
   308 
   309 Where arguments are specified positionally, such "attributes" are set in
   310 order. Keyword arguments are set using a name-to-position attribute-like
   311 mapping, where the position of each argument is discovered using the parameter
   312 table.
   313 
   314 Method Invocations
   315 ------------------
   316 
   317 Method invocations incorporate an implicit first argument which is obtained
   318 from the context of the method:
   319 
   320     method(a, b, d=1, e=2, c=3) -> method(self, a, b, c, d, e)
   321 
   322     Frame
   323     -----
   324 
   325     context of method
   326     a
   327     b
   328     3
   329     1
   330     2
   331 
   332 Although it could be possible to permit any object to be provided as the first
   333 argument, in order to optimise instance attribute access in methods, we should
   334 seek to restrict the object type.
   335 
   336 Verifying Supplied Arguments
   337 ----------------------------
   338 
   339 In order to ensure a correct invocation, it is also necessary to check the
   340 number of supplied arguments. If the target of the invocation is known at
   341 compile-time, no additional instructions need to be emitted; otherwise, the
   342 generated code must test for the following situations:
   343 
   344  1. That the number of supplied arguments is equal to the number of expected
   345     parameters.
   346 
   347  2. That no keyword argument overwrites an existing positional parameter.
   348 
   349 Default Arguments
   350 -----------------
   351 
   352 Some arguments may have default values which are used if no value is provided
   353 in an invocation. Such defaults are initialised when the function itself is
   354 initialised, and are used to fill in any invocation frames that are known at
   355 compile-time.
   356 
   357 To obtain the default values, a reference to the function object is required.
   358 This can be provided either in an additional frame location or in a special
   359 register.
   360 
   361 Tuples, Frames and Allocation
   362 -----------------------------
   363 
   364 Using the approach where arguments are treated like attributes in some kind of
   365 structure, we could choose to allocate frames in places other than a stack.
   366 This would produce something somewhat similar to a plain tuple object.
   367 
   368 Exception Handling
   369 ==================
   370 
   371 Active Exceptions
   372 -----------------
   373 
   374 When an exception is raised, an exception instance is defined as the active
   375 exception. This active exception remains in force until either a new exception
   376 is raised in any handling of the exception, or upon exit of a handler where
   377 the active exception has been caught (thus resetting the active exception).
   378 
   379 The following operations can be employed to achieve this:
   380 
   381 StoreException records the current value reference using the exception
   382 register.
   383 
   384 LoadException obtains the current exception and puts it in the value register.
   385 
   386 CheckException checks the current exception against a class referenced by the
   387 current value.
   388 
   389 ClearException resets the exception register.
   390 
   391 Handlers
   392 --------
   393 
   394 An exception handler stack is defined such that when a try...except or
   395 try...finally block is entered, a new handler is defined.
   396 
   397 When an exception is raised, the program jumps to the most recently defined
   398 handler. Inside the handler, the stack entry for the handler will be removed.
   399 
   400 Depending on the nature of the handler and whether the exception is handled,
   401 the program may jump to the next most recent handler, and so on.
   402 
   403 If no handler is defined when an exception is raised or re-raised, the program
   404 should terminate. This might be done by having a "handler #0" which explicitly
   405 terminates the program.
   406 
   407 The following operations can be employed to achieve this:
   408 
   409 PushHandler(block) defines an active handler at the location indicated by the
   410 given block.
   411 
   412 PopHandler(n) removes the n topmost active handlers. A single handler is
   413 typically removed when leaving a try block, but potentially more handlers are
   414 removed when such a block is exited using a return statement.