1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Attribute usage observations
8 * Imports and circular import detection
9 * Contexts and values
10 * Tables, attributes and lookups
11 * Objects and structures
12 * Parameters and lookups
13 * Instantiation
14 * Register usage
15 * List and tuple representations
16
17 Namespaces and Attribute Definition
18 ===================================
19
20 Namespaces are any objects which can retain attributes.
21
22 * Module attributes are defined either at the module level or by global
23 statements.
24 * Class attributes are defined only within class statements.
25 * Instance attributes are defined only by assignments to attributes of self
26 or tentatively as references to attributes of self.
27 * Locals are effectively attributes of the local namespace and are not
28 accessible externally (and thus cannot support closures).
29
30 These restrictions apply because such attributes are thus explicitly declared,
31 permitting the use of tables (described below). Module and class attributes
32 can also be finalised in this way in order to permit certain optimisations.
33
34 Rebinding of attributes outside classes and modules can be allowed if
35 attribute usage observations are being used to detect such external
36 modifications to such objects. Without such observations, such rebinding
37 should be forbidden since apparently constant attributes might be modified in
38 a running program, but code may have been generated that provides specific
39 objects for those attributes under the assumption that they will not be
40 changed.
41
42 Observations during initial program inspection populate namespaces in a
43 simplistic fashion. If a loop is active, definitions record that the value of
44 a name can be set potentially many times. In local namespaces, definitions are
45 also recorded using the mechanisms employed to track attribute usage, and such
46 observations may provide a more sophisticated view of the potential values of
47 local names.
48
49 See rejected.txt for complicating mechanisms which could be applied to
50 mitigate the effects of these restrictions on optimisations.
51
52 Attribute Usage Observations
53 ============================
54
55 Within a scope, a name may be used in conjunction with attribute names in
56 order to access attributes on objects referenced by the name. However, such
57 observations can only be regarded as reliable if the object referenced is not
58 changed independently by some other mechanism or activity.
59
60 With conventional functions and methods, any locally defined names can be
61 considered private to that scope and thus immune to independent modification,
62 at least within reasonable features of the language. Although stack
63 modification may be possible, it seems appropriate to reject such features,
64 especially since they lend themselves to unmaintainable programs.
65
66 For names defined during the initialisation of a class, since the class itself
67 cannot be referenced by name until its declaration has been completely
68 evaluated, no independent modification can occur from outside the class scope.
69
70 For names defined during the initialisation of a module, global declarations
71 in functions permit the rebinding of global variables and since functions may
72 be invoked during module initialisation, independent modification can
73 potentially occur if any functions are called.
74
75 Module globals can be accessed from other modules that can refer to a module
76 by its name. Initially, an import is required to make a module available for
77 modification, but there is no restriction on whether a module has been
78 completely imported (and thus defined) before an import statement can make it
79 available to other modules. Consider the following package root definition:
80
81 # Module changed:
82 def f():
83 import changed.modifier
84 x = 123
85 f()
86
87 # Module changed.modifier:
88 import changed
89 changed.x = 456
90
91 Here, an import of changed will initially set x to 123, but then the function
92 f will be invoked and cause the changed.modifier module to be imported. Since
93 the changed module is already being imported, the import statement will not
94 try to perform the import operation again, but it will make the partially
95 defined module available for access. Thus, the changed.modifier module will
96 then set x to 456, and independent modification of the changed namespace will
97 have been performed.
98
99 In conclusion, module globals cannot therefore be regarded as immune to
100 operations that would disrupt usage observations. Consequently, only locals
101 and class definition "locals" can be reliably employed in attribute usage
102 observations.
103
104 Imports and Circular Import Detection
105 =====================================
106
107 The matter of whether any given module is potentially modified by another
108 module before it has been completely imported can be addressed by separating
109 the import process into distinct phases:
110
111 1. Discovery/loading
112 2. Parsing/structure processing
113 3. Completion/code processing
114
115 Upon discovering a module, a test is made to determine whether it is already
116 being imported; if not, it is then loaded and its structure inspected to
117 determine whether it may import other modules, which will then in turn be
118 discovered and loaded. Once no more modules can be loaded, they will then be
119 completed by undergoing the more thorough systematic processing of each
120 module's contents, defining program units and requesting the completion of
121 other modules when import statements are encountered.
122
123 The motivation for such a multi-phase approach is to detect circular imports
124 in the structure processing phase before modules are populated and deductions
125 made about their objects' behaviour. Thus, globals belonging to a module known
126 to be imported in a circular fashion will already be regarded as potentially
127 modifiable by other modules and attribute usage observations will not be
128 recorded.
129
130 Contexts and Values
131 ===================
132
133 Values are used as the common reference representation in micropython: as
134 stored representations of attributes (of classes, instances, modules, and
135 other objects supporting attribute-like entities) as well as the stored values
136 associated with names in functions and methods.
137
138 Unlike other implementations, micropython does not create things like bound
139 method objects for individual instances. Instead, all objects are referenced
140 using a context, reference pair:
141
142 Value Layout
143 ------------
144
145 0 1
146 context object
147 reference reference
148
149 Specific implementations might reverse this ordering for optimisation
150 purposes.
151
152 Rationale
153 ---------
154
155 To reduce the number of created objects whilst retaining the ability to
156 support bound method invocations. The context indicates the context in which
157 an invocation is performed, typically the owner of the method.
158
159 Usage
160 -----
161
162 The context may be inserted as the first argument when a value is involved in
163 an invocation. This argument may then be omitted from the invocation if its
164 usage is not appropriate.
165
166 See invocation.txt for details.
167
168 Context Value Types
169 -------------------
170
171 The following types of context value exist:
172
173 Type Usage Transformations
174 ---- ----- ---------------
175
176 Replaceable With functions (not methods) May be replaced with an
177 instance or a class when a
178 value is stored on an
179 instance or class
180
181 Placeholder With classes May not be replaced
182
183 Instance With instances (and constants) May not be replaced
184 or functions as methods
185
186 Class With functions as methods May be replaced when a
187 value is loaded from a
188 class attribute via an
189 instance
190
191 Contexts in Acquired Values
192 ---------------------------
193
194 There are four classes of instructions which provide values:
195
196 Instruction Purpose Context Operations
197 ----------- ------- ------------------
198
199 1) LoadConst Load module, constant Use loaded object with itself
200 as context
201
202 2) LoadFunction Load function Combine replaceable context
203 with loaded object
204
205 3) LoadClass Load class Combine placeholder context
206 with loaded object
207
208 4) LoadAddress* Load attribute from Preserve or override stored
209 LoadAttr* class, module, context (as described in
210 instance assignment.txt)
211
212 In order to comply with traditional Python behaviour, contexts may or may not
213 represent the object from which an attribute has been acquired.
214
215 See assignment.txt for details.
216
217 Contexts in Stored Values
218 -------------------------
219
220 There are two classes of instruction for storing values:
221
222 Instruction Purpose Context Operations
223 ----------- ------- ------------------
224
225 1) StoreAddress Store attribute in a Preserve context; note that no
226 known object test for class attribute
227 assignment should be necessary
228 since this instruction should only
229 be generated for module globals
230
231 StoreAttr Store attribute in an Preserve context; note that no
232 instance test for class attribute
233 assignment should be necessary
234 since this instruction should only
235 be generated for self accesses
236
237 StoreAttrIndex Store attribute in an Preserve context; since the index
238 unknown object lookup could yield a class
239 attribute, a test of the nature of
240 the nature of the structure is
241 necessary in order to prevent
242 assignments to classes
243
244 2) StoreAddressContext Store attribute in a Override context if appropriate;
245 known object if the value has a replaceable
246 context, permit the target to
247 take ownership of the value
248
249 See assignment.txt for details.
250
251 Tables, Attributes and Lookups
252 ==============================
253
254 Attribute lookups, where the exact location of an object attribute is deduced,
255 are performed differently in micropython than in other implementations.
256 Instead of providing attribute dictionaries, in which attributes are found,
257 attributes are located at fixed places in object structures (described below)
258 and their locations are stored using a special representation known as a
259 table.
260
261 For a given program, a table can be considered as being like a matrix mapping
262 classes to attribute names. For example:
263
264 class A:
265 # instances have attributes x, y
266
267 class B(A):
268 # introduces attribute z for instances
269
270 class C:
271 # instances have attributes a, b, z
272
273 This would provide the following table, referred to as an object table in the
274 context of classes and instances:
275
276 Class/attr a b x y z
277
278 A 1 2
279 B 1 2 3
280 C 1 2 3
281
282 A limitation of this representation is that instance attributes may not shadow
283 class attributes: if an attribute with a given name is not defined on an
284 instance, an attribute with the same name cannot be provided by the class of
285 the instance or any superclass of the instance's class. This impacts the
286 provision of the __class__ attribute, as described below.
287
288 The table can be compacted using a representation known as a displacement
289 list (referred to as an object list in this context):
290
291 Classes with attribute offsets
292
293 classcode A
294 attrcode a b x y z
295
296 B
297 a b x y z
298
299 C
300 a b x y z
301
302 List . . 1 2 1 2 3 1 2 . . 3
303
304 Here, the classcode refers to the offset in the list at which a class's
305 attributes are defined, whereas the attrcode defines the offset within a
306 region of attributes corresponding to a single attribute of a given name.
307
308 Attribute Locations
309 -------------------
310
311 The locations stored in table/list elements are generally for instance
312 attributes relative to the location of the instance, whereas those for class
313 attributes and module attributes are generally absolute addresses. Thus, each
314 occupied table cell has the following structure:
315
316 attrcode, uses-absolute-address, address (or location)
317
318 This could be given instead as follows:
319
320 attrcode, is-class-or-module-attribute, location
321
322 Since uses-absolute-address corresponds to is-class-or-module-attribute, and
323 since there is a need to test for classes and modules to prevent assignment to
324 attributes of such objects, this particular information is always required.
325
326 The __class__ Attribute
327 -----------------------
328
329 In Python 2.x, at least with new-style classes, instances have __class__
330 attributes which indicate the class from which they have been instantiated,
331 whereas classes have __class__ attributes which reference the type class.
332 With the object table, it is not possible to provide absolute addresses which
333 can be used for both classes and instances, since this would result in classes
334 and instances having the same class, and thus the class of a class would be
335 the class itself.
336
337 One solution is to use object-relative values in the table so that referencing
338 the __class__ attribute of an instance produces a value which can be combined
339 with an instance's address to yield the address of the attribute, which itself
340 refers to the instance's class, whereas referencing the __class__ attribute of
341 a class produces a similar object-relative value that is combined with the
342 class's address to yield the address of the attribute, which itself refers to
343 the special type class.
344
345 Obviously, the above solution requires both classes and instances to retain an
346 attribute location specifically to hold the value appropriate for each object
347 type, whereas a scheme which omits the __class__ attribute on classes would be
348 able to employ an absolute address in the table and maintain only a single
349 address to refer to the class for all instances. The only problem with not
350 providing a sensible __class__ attribute entry for classes would be the need
351 for special treatment of __class__ to prevent inappropriate consultation of
352 the table for classes.
353
354 Comparing Tables as Matrices with Displacement Lists
355 ----------------------------------------------------
356
357 Although displacement lists can provide reasonable levels of compaction for
358 attribute data, the element size is larger than that required for a simple
359 matrix: the attribute code (attrcode) need not be stored since each element
360 unambiguously refers to the availability of an attribute for a particular
361 class or instance of that class, and so the data at a given element need not
362 be tested for relevance to a given attribute access operation.
363
364 Given a program with 20 object types and 100 attribute types, a matrix would
365 occupy the following amount of space:
366
367 number of object types * number of attribute types * element size
368 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
369 = 2000
370
371 In contrast, given a compaction to 40% of the matrix size (without considering
372 element size) in a displacement list, the amount of space would be as follows:
373
374 number of elements * element size
375 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
376 = 1600
377
378 Consequently, the principal overhead of using a displacement list is likely to
379 be in the need to check element relevance when retrieving values from such a
380 list.
381
382 Objects and Structures
383 ======================
384
385 As well as references, micropython needs to have actual objects to refer to.
386 Since classes, functions and instances are all objects, it is desirable that
387 certain common features and operations are supported in the same way for all
388 of these things. To permit this, a common data structure format is used.
389
390 Header.................................................... Attributes.................
391
392 Identifier Identifier Address Identifier Size Object ...
393
394 0 1 2 3 4 5 6
395 classcode attrcode/ invocation funccode size attribute ...
396 instance reference reference
397 status
398
399 Classcode
400 ---------
401
402 Used in attribute lookup.
403
404 Here, the classcode refers to the attribute lookup table for the object (as
405 described above). Classes and instances share the same classcode, and their
406 structures reflect this. Functions all belong to the same type and thus employ
407 the classcode for the function built-in type, whereas modules have distinct
408 types since they must support different sets of attributes.
409
410 Attrcode
411 --------
412
413 Used to test instances for membership of classes (or descendants of classes).
414
415 Since, in traditional Python, classes are only ever instances of some generic
416 built-in type, support for testing such a relationship directly has been
417 removed and the attrcode is not specified for classes: the presence of an
418 attrcode indicates that a given object is an instance. In addition, support
419 has also been removed for testing modules in the same way, meaning that the
420 attrcode is also not specified for modules.
421
422 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
423 for details of attrcodes.
424
425 Invocation Reference
426 --------------------
427
428 Used when an object is called.
429
430 This is the address of the code to be executed when an invocation is performed
431 on the object.
432
433 Funccode
434 --------
435
436 Used to look up argument positions by name.
437
438 The strategy with keyword arguments in micropython is to attempt to position
439 such arguments in the invocation frame as it is being constructed.
440
441 See the "Parameters and Lookups" section for more information.
442
443 Size
444 ----
445
446 Used to indicate the size of an object including attributes.
447
448 Attributes
449 ----------
450
451 For classes, modules and instances, the attributes in the structure correspond
452 to the attributes of each kind of object. For functions, however, the
453 attributes in the structure correspond to the default arguments for each
454 function, if any.
455
456 Structure Types
457 ---------------
458
459 Class C:
460
461 0 1 2 3 4 5 6
462 classcode (unused) __new__ funccode size attribute ...
463 for C reference for reference
464 instantiator
465
466 Instance of C:
467
468 0 1 2 3 4 5 6
469 classcode attrcode C.__call__ funccode size attribute ...
470 for C for C reference for reference
471 (if exists) C.__call__
472
473 Function f:
474
475 0 1 2 3 4 5 6
476 classcode attrcode code funccode size attribute ...
477 for for reference (default)
478 function function reference
479
480 Module m:
481
482 0 1 2 3 4 5 6
483 classcode attrcode (unused) (unused) (unused) attribute ...
484 for m for m (global)
485 reference
486
487 The __class__ Attribute
488 -----------------------
489
490 All objects should support the __class__ attribute, and in most cases this is
491 done using the object table, yielding a common address for all instances of a
492 given class.
493
494 Function: refers to the function class
495 Instance: refers to the class instantiated to make the object
496
497 The object table cannot support two definitions simultaneously for both
498 instances and their classes. Consequently, __class__ access on classes must be
499 tested for and a special result returned.
500
501 Class: refers to the type class (type.__class__ also refers to the type class)
502
503 For convenience, the first attribute of a class will be the common __class__
504 attribute for all its instances. As noted above, direct access to this
505 attribute will not be possible for classes, and a constant result will be
506 returned instead.
507
508 Lists and Tuples
509 ----------------
510
511 The built-in list and tuple sequences employ variable length structures using
512 the attribute locations to store their elements, where each element is a
513 reference to a separately stored object.
514
515 Testing Instance Compatibility with Classes (Attrcode)
516 ------------------------------------------------------
517
518 Although it would be possible to have a data structure mapping classes to
519 compatible classes, such as a matrix indicating the subclasses (or
520 superclasses) of each class, the need to retain the key to such a data
521 structure for each class might introduce a noticeable overhead.
522
523 Instead of having a separate structure, descendant classes of each class are
524 inserted as special attributes into the object table. This requires an extra
525 key to be retained, since each class must provide its own attribute code such
526 that upon an instance/class compatibility test, the code may be obtained and
527 used in the object table.
528
529 Invocation and Code References
530 ------------------------------
531
532 Modules: there is no meaningful invocation reference since modules cannot be
533 explicitly called.
534
535 Functions: a simple code reference is employed pointing to code implementing
536 the function. Note that the function locals are completely distinct from this
537 structure and are not comparable to attributes. Instead, attributes are
538 reserved for default parameter values, although they do not appear in the
539 object table described above, appearing instead in a separate parameter table
540 described below.
541
542 Classes: given that classes must be invoked in order to create instances, a
543 reference must be provided in class structures. However, this reference does
544 not point directly at the __init__ method of the class. Instead, the
545 referenced code belongs to a special initialiser function, __new__, consisting
546 of the following instructions:
547
548 create instance for C
549 call C.__init__(instance, ...)
550 return instance
551
552 Instances: each instance employs a reference to any __call__ method defined in
553 the class hierarchy for the instance, thus maintaining its callable nature.
554
555 Both classes and modules may contain code in their definitions - the former in
556 the "body" of the class, potentially defining attributes, and the latter as
557 the "top-level" code in the module, potentially defining attributes/globals -
558 but this code is not associated with any invocation target. It is thus
559 generated in order of appearance and is not referenced externally.
560
561 Invocation Operation
562 --------------------
563
564 Consequently, regardless of the object an invocation is always done as
565 follows:
566
567 get invocation reference from the header
568 jump to reference
569
570 Additional preparation is necessary before the above code: positional
571 arguments must be saved in the invocation frame, and keyword arguments must be
572 resolved and saved to the appropriate position in the invocation frame.
573
574 See invocation.txt for details.
575
576 Parameters and Lookups
577 ======================
578
579 Since Python supports keyword arguments when making invocations, it becomes
580 necessary to record the parameter names associated with each function or
581 method. Just as object tables record attributes positions on classes and
582 instances, parameter tables record parameter positions in function or method
583 parameter lists.
584
585 For a given program, a parameter table can be considered as being like a
586 matrix mapping functions/methods to parameter names. For example:
587
588 def f(x, y, z):
589 pass
590
591 def g(a, b, c):
592 pass
593
594 def h(a, x):
595 pass
596
597 This would provide the following table, referred to as a parameter table in
598 the context of functions and methods:
599
600 Function/param a b c x y z
601
602 f 1 2 3
603 g 1 2 3
604 h 1 2
605
606 Confusion can occur when functions are adopted as methods, since the context
607 then occupies the first slot in the invocation frame:
608
609 def f(x, y, z):
610 pass
611
612 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
613 -> f(1, 2, 3)
614
615 class C:
616 f = f
617
618 def g(x, y, z):
619 pass
620
621 c = C()
622
623 c.f(y=2, z=3) -> f(<context>, 2, 3)
624 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
625
626 Just as with parameter tables, a displacement list can be prepared from a
627 parameter table:
628
629 Functions with parameter (attribute) offsets
630
631 funccode f
632 attrcode a b c x y z
633
634 g
635 a b c x y z
636
637 h
638 a b c x y z
639
640 List . . . 1 2 3 1 2 3 1 . . 2 . .
641
642 Here, the funccode refers to the offset in the list at which a function's
643 parameters are defined, whereas the attrcode defines the offset within a
644 region of attributes corresponding to a single parameter of a given name.
645
646 Instantiation
647 =============
648
649 When instantiating classes, memory must be reserved for the header of the
650 resulting instance, along with locations for the attributes of the instance.
651 Since the instance header contains data common to all instances of a class, a
652 template header is copied to the start of the newly reserved memory region.
653
654 Register Usage
655 ==============
656
657 During code generation, much of the evaluation produces results which are
658 implicitly recorded in the "active value" or "working" register, and various
659 instructions will consume this active value. In addition, some instructions
660 will consume a separate "active source value" from a register, typically those
661 which are assigning the result of an expression to an assignment target.
662
663 Since values often need to be retained for later use, a set of temporary
664 storage locations are typically employed. However, optimisations may reduce
665 the need to use such temporary storage where instructions which provide the
666 "active value" can be re-executed and will produce the same result. Whether
667 re-use of instructions is possible or not, values still need to be loaded into
668 a "source" register to be accessed by an assignment instruction.
669
670 RSVP instructions generally have the notion of working, target and source
671 registers (see registers.txt). These register "roles" are independent from the
672 actual registers defined for the RSVP machine itself, even though the naming
673 is similar. Generally, instructions do regard the RSVP "working" registers as
674 the class of register in the "working" role, although some occurrences of
675 instruction usage may employ other registers (such as the "result" registers)
676 and thus take advantage of the generality of the RSVP implementation of such
677 instructions.
678
679 List and Tuple Representations
680 ==============================
681
682 Since tuples have a fixed size, the representation of a tuple instance is
683 merely a header describing the size of the entire object, together with a
684 sequence of references to the object "stored" at each position in the
685 structure. Such references consist of the usual context and reference pair.
686
687 Lists, however, have a variable size and must be accessible via an unchanging
688 location even as more memory is allocated elsewhere to accommodate the
689 contents of the list. Consequently, the representation must resemble the
690 following:
691
692 Structure header for list (size == header plus special attribute)
693 Special attribute referencing the underlying sequence
694
695 The underlying sequence has a fixed size, like a tuple, but may contain fewer
696 elements than the size of the sequence permits:
697
698 Special header indicating the current size and allocated size
699 Element
700 ... <-- current size
701 (Unused space)
702 ... <-- allocated size
703
704 This representation permits the allocation of a new sequence when space is
705 exhausted in an existing sequence, with the new sequence address stored in the
706 main list structure. Since access to the contents of the list must go through
707 the main list structure, underlying allocation activities may take place
708 without the users of a list having to be aware of such activities.