1 Low-level Implementation Details
2 ================================
3
4 Although micropython delegates the generation of low-level program code and
5 data to syspython, various considerations of how an eventual program might be
6 structured have been used to inform the way micropython represents the details
7 of a program. This document describes these considerations and indicates how
8 syspython or other technologies might represent a working program.
9
10 * Objects and structures
11 * Instantiation
12 * List and tuple representations
13 * Instruction evaluation model
14 * Exception handling
15
16 Objects and Structures
17 ======================
18
19 As well as references, micropython needs to have actual objects to refer to.
20 Since classes, functions and instances are all objects, it is desirable that
21 certain common features and operations are supported in the same way for all
22 of these things. To permit this, a common data structure format is used.
23
24 Header.................................................... Attributes.................
25
26 Identifier Identifier Address Identifier Size Object ...
27
28 0 1 2 3 4 5 6
29 classcode attrcode/ invocation funccode size attribute ...
30 instance reference reference
31 status
32
33 Classcode
34 ---------
35
36 Used in attribute lookup.
37
38 Here, the classcode refers to the attribute lookup table for the object (as
39 described in concepts.txt). Classes and instances share the same classcode,
40 and their structures reflect this. Functions all belong to the same type and
41 thus employ the classcode for the function built-in type, whereas modules have
42 distinct types since they must support different sets of attributes.
43
44 Attrcode
45 --------
46
47 Used to test instances for membership of classes (or descendants of classes).
48
49 Since, in traditional Python, classes are only ever instances of some generic
50 built-in type, support for testing such a relationship directly has been
51 removed and the attrcode is not specified for classes: the presence of an
52 attrcode indicates that a given object is an instance. In addition, support
53 has also been removed for testing modules in the same way, meaning that the
54 attrcode is also not specified for modules.
55
56 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
57 for details of attrcodes.
58
59 Invocation Reference
60 --------------------
61
62 Used when an object is called.
63
64 This is the address of the code to be executed when an invocation is performed
65 on the object.
66
67 Funccode
68 --------
69
70 Used to look up argument positions by name.
71
72 The strategy with keyword arguments in micropython is to attempt to position
73 such arguments in the invocation frame as it is being constructed.
74
75 See the "Parameters and Lookups" section for more information.
76
77 Size
78 ----
79
80 Used to indicate the size of an object including attributes.
81
82 Attributes
83 ----------
84
85 For classes, modules and instances, the attributes in the structure correspond
86 to the attributes of each kind of object. For functions, however, the
87 attributes in the structure correspond to the default arguments for each
88 function, if any.
89
90 Structure Types
91 ---------------
92
93 Class C:
94
95 0 1 2 3 4 5 6
96 classcode (unused) __new__ funccode size attribute ...
97 for C reference for reference
98 instantiator
99
100 Instance of C:
101
102 0 1 2 3 4 5 6
103 classcode attrcode C.__call__ funccode size attribute ...
104 for C for C reference for reference
105 (if exists) C.__call__
106
107 Function f:
108
109 0 1 2 3 4 5 6
110 classcode attrcode code funccode size attribute ...
111 for for reference (default)
112 function function reference
113
114 Module m:
115
116 0 1 2 3 4 5 6
117 classcode attrcode (unused) (unused) (unused) attribute ...
118 for m for m (global)
119 reference
120
121 The __class__ Attribute
122 -----------------------
123
124 All objects should support the __class__ attribute, and in most cases this is
125 done using the object table, yielding a common address for all instances of a
126 given class.
127
128 Function: refers to the function class
129 Instance: refers to the class instantiated to make the object
130
131 The object table cannot support two definitions simultaneously for both
132 instances and their classes. Consequently, __class__ access on classes must be
133 tested for and a special result returned.
134
135 Class: refers to the type class (type.__class__ also refers to the type class)
136
137 For convenience, the first attribute of a class will be the common __class__
138 attribute for all its instances. As noted above, direct access to this
139 attribute will not be possible for classes, and a constant result will be
140 returned instead.
141
142 Lists and Tuples
143 ----------------
144
145 The built-in list and tuple sequences employ variable length structures using
146 the attribute locations to store their elements, where each element is a
147 reference to a separately stored object.
148
149 Testing Instance Compatibility with Classes (Attrcode)
150 ------------------------------------------------------
151
152 Although it would be possible to have a data structure mapping classes to
153 compatible classes, such as a matrix indicating the subclasses (or
154 superclasses) of each class, the need to retain the key to such a data
155 structure for each class might introduce a noticeable overhead.
156
157 Instead of having a separate structure, descendant classes of each class are
158 inserted as special attributes into the object table. This requires an extra
159 key to be retained, since each class must provide its own attribute code such
160 that upon an instance/class compatibility test, the code may be obtained and
161 used in the object table.
162
163 Invocation and Code References
164 ------------------------------
165
166 Modules: there is no meaningful invocation reference since modules cannot be
167 explicitly called.
168
169 Functions: a simple code reference is employed pointing to code implementing
170 the function. Note that the function locals are completely distinct from this
171 structure and are not comparable to attributes. Instead, attributes are
172 reserved for default parameter values, although they do not appear in the
173 object table described above, appearing instead in a separate parameter table
174 described in concepts.txt.
175
176 Classes: given that classes must be invoked in order to create instances, a
177 reference must be provided in class structures. However, this reference does
178 not point directly at the __init__ method of the class. Instead, the
179 referenced code belongs to a special initialiser function, __new__, consisting
180 of the following instructions:
181
182 create instance for C
183 call C.__init__(instance, ...)
184 return instance
185
186 Instances: each instance employs a reference to any __call__ method defined in
187 the class hierarchy for the instance, thus maintaining its callable nature.
188
189 Both classes and modules may contain code in their definitions - the former in
190 the "body" of the class, potentially defining attributes, and the latter as
191 the "top-level" code in the module, potentially defining attributes/globals -
192 but this code is not associated with any invocation target. It is thus
193 generated in order of appearance and is not referenced externally.
194
195 Invocation Operation
196 --------------------
197
198 Consequently, regardless of the object an invocation is always done as
199 follows:
200
201 get invocation reference from the header
202 jump to reference
203
204 Additional preparation is necessary before the above code: positional
205 arguments must be saved in the invocation frame, and keyword arguments must be
206 resolved and saved to the appropriate position in the invocation frame.
207
208 See invocation.txt for details.
209
210 Instantiation
211 =============
212
213 When instantiating classes, memory must be reserved for the header of the
214 resulting instance, along with locations for the attributes of the instance.
215 Since the instance header contains data common to all instances of a class, a
216 template header is copied to the start of the newly reserved memory region.
217
218 List and Tuple Representations
219 ==============================
220
221 Since tuples have a fixed size, the representation of a tuple instance is
222 merely a header describing the size of the entire object, together with a
223 sequence of references to the object "stored" at each position in the
224 structure. Such references consist of the usual context and reference pair.
225
226 Lists, however, have a variable size and must be accessible via an unchanging
227 location even as more memory is allocated elsewhere to accommodate the
228 contents of the list. Consequently, the representation must resemble the
229 following:
230
231 Structure header for list (size == header plus special attribute)
232 Special attribute referencing the underlying sequence
233
234 The underlying sequence has a fixed size, like a tuple, but may contain fewer
235 elements than the size of the sequence permits:
236
237 Special header indicating the current size and allocated size
238 Element
239 ... <-- current size
240 (Unused space)
241 ... <-- allocated size
242
243 This representation permits the allocation of a new sequence when space is
244 exhausted in an existing sequence, with the new sequence address stored in the
245 main list structure. Since access to the contents of the list must go through
246 the main list structure, underlying allocation activities may take place
247 without the users of a list having to be aware of such activities.
248
249 Instruction Evaluation Model
250 ============================
251
252 Programs use a value stack containing local and temporary storage. A value
253 stack pointer indicates the top of this stack. In addition, a separate stack
254 is used to record the invocation frames. All stack pointers refer to the next
255 address to be used by the stack, not the address of the uppermost element.
256
257 Frame Stack Value Stack
258 ----------- ----------- Address of Callable
259 -------------------
260 previous ...
261 current ------> callable -----> identifier
262 arg1 reference to code
263 arg2
264 arg3
265 local4
266 local5
267 ...
268
269 Unlike the CPython virtual machine, programs do not use a value stack
270 containing the results of expression evaluations. Instead, temporary storage
271 is statically allocated and employed by instructions.
272
273 Loading local names and temporary values is a matter of performing
274 frame-relative accesses to the value stack.
275
276 Invocations and Argument Evaluation
277 -----------------------------------
278
279 When preparing for an invocation, the caller first sets the invocation frame
280 pointer. A number of locations for the arguments required by the invocation
281 are then reserved. With a frame to prepare, positional arguments are added to
282 the frame in order such that the first argument positions are filled. The
283 names of keyword arguments are used (in the form of table index values) to
284 consult the parameter table and to find the frame location in which such
285 arguments are to be stored.
286
287 fn(a, b, d=1, e=2, c=3) -> fn(a, b, c, d, e)
288
289 Frame
290 -----
291
292 a a a a
293 b b b b
294 ___ ___ ___ --> 3
295 ___ --> 1 1 | 1
296 ___ | ___ --> 2 | 2
297 | | |
298 1 ----------- 2 ----------- 3 -----------
299
300 Conceptually, the frame can be considered as a collection of attributes, as
301 seen in other kinds of structures:
302
303 Frame for invocation of fn:
304
305 0 1 2 3 4 5
306 code a b c d e
307 reference
308
309 Where arguments are specified positionally, such "attributes" are set in
310 order. Keyword arguments are set using a name-to-position attribute-like
311 mapping, where the position of each argument is discovered using the parameter
312 table.
313
314 Method Invocations
315 ------------------
316
317 Method invocations incorporate an implicit first argument which is obtained
318 from the context of the method:
319
320 method(a, b, d=1, e=2, c=3) -> method(self, a, b, c, d, e)
321
322 Frame
323 -----
324
325 context of method
326 a
327 b
328 3
329 1
330 2
331
332 Although it could be possible to permit any object to be provided as the first
333 argument, in order to optimise instance attribute access in methods, we should
334 seek to restrict the object type.
335
336 Verifying Supplied Arguments
337 ----------------------------
338
339 In order to ensure a correct invocation, it is also necessary to check the
340 number of supplied arguments. If the target of the invocation is known at
341 compile-time, no additional instructions need to be emitted; otherwise, the
342 generated code must test for the following situations:
343
344 1. That the number of supplied arguments is equal to the number of expected
345 parameters.
346
347 2. That no keyword argument overwrites an existing positional parameter.
348
349 Default Arguments
350 -----------------
351
352 Some arguments may have default values which are used if no value is provided
353 in an invocation. Such defaults are initialised when the function itself is
354 initialised, and are used to fill in any invocation frames that are known at
355 compile-time.
356
357 To obtain the default values, a reference to the function object is required.
358 This can be provided either in an additional frame location or in a special
359 register.
360
361 Tuples, Frames and Allocation
362 -----------------------------
363
364 Using the approach where arguments are treated like attributes in some kind of
365 structure, we could choose to allocate frames in places other than a stack.
366 This would produce something somewhat similar to a plain tuple object.
367
368 Exception Handling
369 ==================
370
371 Active Exceptions
372 -----------------
373
374 When an exception is raised, an exception instance is defined as the active
375 exception. This active exception remains in force until either a new exception
376 is raised in any handling of the exception, or upon exit of a handler where
377 the active exception has been caught (thus resetting the active exception).
378
379 The following operations can be employed to achieve this:
380
381 StoreException records the current value reference using the exception
382 register.
383
384 LoadException obtains the current exception and puts it in the value register.
385
386 CheckException checks the current exception against a class referenced by the
387 current value.
388
389 ClearException resets the exception register.
390
391 Handlers
392 --------
393
394 An exception handler stack is defined such that when a try...except or
395 try...finally block is entered, a new handler is defined.
396
397 When an exception is raised, the program jumps to the most recently defined
398 handler. Inside the handler, the stack entry for the handler will be removed.
399
400 Depending on the nature of the handler and whether the exception is handled,
401 the program may jump to the next most recent handler, and so on.
402
403 If no handler is defined when an exception is raised or re-raised, the program
404 should terminate. This might be done by having a "handler #0" which explicitly
405 terminates the program.
406
407 The following operations can be employed to achieve this:
408
409 PushHandler(block) defines an active handler at the location indicated by the
410 given block.
411
412 PopHandler(n) removes the n topmost active handlers. A single handler is
413 typically removed when leaving a try block, but potentially more handlers are
414 removed when such a block is exited using a return statement.