1 Low-level Implementation Details
2 ================================
3
4 Although micropython delegates the generation of low-level program code and
5 data to syspython, various considerations of how an eventual program might be
6 structured have been used to inform the way micropython represents the details
7 of a program. This document describes these considerations and indicates how
8 syspython or other technologies might represent a working program.
9
10 * Objects and structures
11 * Instantiation
12 * List and tuple representations
13 * Instruction evaluation model
14
15 Objects and Structures
16 ======================
17
18 As well as references, micropython needs to have actual objects to refer to.
19 Since classes, functions and instances are all objects, it is desirable that
20 certain common features and operations are supported in the same way for all
21 of these things. To permit this, a common data structure format is used.
22
23 Header.................................................... Attributes.................
24
25 Identifier Identifier Address Identifier Size Object ...
26
27 0 1 2 3 4 5 6
28 classcode attrcode/ invocation funccode size attribute ...
29 instance reference reference
30 status
31
32 Classcode
33 ---------
34
35 Used in attribute lookup.
36
37 Here, the classcode refers to the attribute lookup table for the object (as
38 described in concepts.txt). Classes and instances share the same classcode,
39 and their structures reflect this. Functions all belong to the same type and
40 thus employ the classcode for the function built-in type, whereas modules have
41 distinct types since they must support different sets of attributes.
42
43 Attrcode
44 --------
45
46 Used to test instances for membership of classes (or descendants of classes).
47
48 Since, in traditional Python, classes are only ever instances of some generic
49 built-in type, support for testing such a relationship directly has been
50 removed and the attrcode is not specified for classes: the presence of an
51 attrcode indicates that a given object is an instance. In addition, support
52 has also been removed for testing modules in the same way, meaning that the
53 attrcode is also not specified for modules.
54
55 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
56 for details of attrcodes.
57
58 Invocation Reference
59 --------------------
60
61 Used when an object is called.
62
63 This is the address of the code to be executed when an invocation is performed
64 on the object.
65
66 Funccode
67 --------
68
69 Used to look up argument positions by name.
70
71 The strategy with keyword arguments in micropython is to attempt to position
72 such arguments in the invocation frame as it is being constructed.
73
74 See the "Parameters and Lookups" section for more information.
75
76 Size
77 ----
78
79 Used to indicate the size of an object including attributes.
80
81 Attributes
82 ----------
83
84 For classes, modules and instances, the attributes in the structure correspond
85 to the attributes of each kind of object. For functions, however, the
86 attributes in the structure correspond to the default arguments for each
87 function, if any.
88
89 Structure Types
90 ---------------
91
92 Class C:
93
94 0 1 2 3 4 5 6
95 classcode (unused) __new__ funccode size attribute ...
96 for C reference for reference
97 instantiator
98
99 Instance of C:
100
101 0 1 2 3 4 5 6
102 classcode attrcode C.__call__ funccode size attribute ...
103 for C for C reference for reference
104 (if exists) C.__call__
105
106 Function f:
107
108 0 1 2 3 4 5 6
109 classcode attrcode code funccode size attribute ...
110 for for reference (default)
111 function function reference
112
113 Module m:
114
115 0 1 2 3 4 5 6
116 classcode attrcode (unused) (unused) (unused) attribute ...
117 for m for m (global)
118 reference
119
120 The __class__ Attribute
121 -----------------------
122
123 All objects should support the __class__ attribute, and in most cases this is
124 done using the object table, yielding a common address for all instances of a
125 given class.
126
127 Function: refers to the function class
128 Instance: refers to the class instantiated to make the object
129
130 The object table cannot support two definitions simultaneously for both
131 instances and their classes. Consequently, __class__ access on classes must be
132 tested for and a special result returned.
133
134 Class: refers to the type class (type.__class__ also refers to the type class)
135
136 For convenience, the first attribute of a class will be the common __class__
137 attribute for all its instances. As noted above, direct access to this
138 attribute will not be possible for classes, and a constant result will be
139 returned instead.
140
141 Lists and Tuples
142 ----------------
143
144 The built-in list and tuple sequences employ variable length structures using
145 the attribute locations to store their elements, where each element is a
146 reference to a separately stored object.
147
148 Testing Instance Compatibility with Classes (Attrcode)
149 ------------------------------------------------------
150
151 Although it would be possible to have a data structure mapping classes to
152 compatible classes, such as a matrix indicating the subclasses (or
153 superclasses) of each class, the need to retain the key to such a data
154 structure for each class might introduce a noticeable overhead.
155
156 Instead of having a separate structure, descendant classes of each class are
157 inserted as special attributes into the object table. This requires an extra
158 key to be retained, since each class must provide its own attribute code such
159 that upon an instance/class compatibility test, the code may be obtained and
160 used in the object table.
161
162 Invocation and Code References
163 ------------------------------
164
165 Modules: there is no meaningful invocation reference since modules cannot be
166 explicitly called.
167
168 Functions: a simple code reference is employed pointing to code implementing
169 the function. Note that the function locals are completely distinct from this
170 structure and are not comparable to attributes. Instead, attributes are
171 reserved for default parameter values, although they do not appear in the
172 object table described above, appearing instead in a separate parameter table
173 described in concepts.txt.
174
175 Classes: given that classes must be invoked in order to create instances, a
176 reference must be provided in class structures. However, this reference does
177 not point directly at the __init__ method of the class. Instead, the
178 referenced code belongs to a special initialiser function, __new__, consisting
179 of the following instructions:
180
181 create instance for C
182 call C.__init__(instance, ...)
183 return instance
184
185 Instances: each instance employs a reference to any __call__ method defined in
186 the class hierarchy for the instance, thus maintaining its callable nature.
187
188 Both classes and modules may contain code in their definitions - the former in
189 the "body" of the class, potentially defining attributes, and the latter as
190 the "top-level" code in the module, potentially defining attributes/globals -
191 but this code is not associated with any invocation target. It is thus
192 generated in order of appearance and is not referenced externally.
193
194 Invocation Operation
195 --------------------
196
197 Consequently, regardless of the object an invocation is always done as
198 follows:
199
200 get invocation reference from the header
201 jump to reference
202
203 Additional preparation is necessary before the above code: positional
204 arguments must be saved in the invocation frame, and keyword arguments must be
205 resolved and saved to the appropriate position in the invocation frame.
206
207 See invocation.txt for details.
208
209 Instantiation
210 =============
211
212 When instantiating classes, memory must be reserved for the header of the
213 resulting instance, along with locations for the attributes of the instance.
214 Since the instance header contains data common to all instances of a class, a
215 template header is copied to the start of the newly reserved memory region.
216
217 List and Tuple Representations
218 ==============================
219
220 Since tuples have a fixed size, the representation of a tuple instance is
221 merely a header describing the size of the entire object, together with a
222 sequence of references to the object "stored" at each position in the
223 structure. Such references consist of the usual context and reference pair.
224
225 Lists, however, have a variable size and must be accessible via an unchanging
226 location even as more memory is allocated elsewhere to accommodate the
227 contents of the list. Consequently, the representation must resemble the
228 following:
229
230 Structure header for list (size == header plus special attribute)
231 Special attribute referencing the underlying sequence
232
233 The underlying sequence has a fixed size, like a tuple, but may contain fewer
234 elements than the size of the sequence permits:
235
236 Special header indicating the current size and allocated size
237 Element
238 ... <-- current size
239 (Unused space)
240 ... <-- allocated size
241
242 This representation permits the allocation of a new sequence when space is
243 exhausted in an existing sequence, with the new sequence address stored in the
244 main list structure. Since access to the contents of the list must go through
245 the main list structure, underlying allocation activities may take place
246 without the users of a list having to be aware of such activities.
247
248 Instruction Evaluation Model
249 ============================
250
251 Programs use a value stack containing local and temporary storage. A value
252 stack pointer indicates the top of this stack. In addition, a separate stack
253 is used to record the invocation frames. All stack pointers refer to the next
254 address to be used by the stack, not the address of the uppermost element.
255
256 Frame Stack Value Stack
257 ----------- ----------- Address of Callable
258 -------------------
259 previous ...
260 current ------> callable -----> identifier
261 arg1 reference to code
262 arg2
263 arg3
264 local4
265 local5
266 ...
267
268 Unlike the CPython virtual machine, programs do not use a value stack
269 containing the results of expression evaluations. Instead, temporary storage
270 is statically allocated and employed by instructions.
271
272 Loading local names and temporary values is a matter of performing
273 frame-relative accesses to the value stack.
274
275 Invocations and Argument Evaluation
276 -----------------------------------
277
278 When preparing for an invocation, the caller first sets the invocation frame
279 pointer. A number of locations for the arguments required by the invocation
280 are then reserved. With a frame to prepare, positional arguments are added to
281 the frame in order such that the first argument positions are filled. The
282 names of keyword arguments are used (in the form of table index values) to
283 consult the parameter table and to find the frame location in which such
284 arguments are to be stored.
285
286 fn(a, b, d=1, e=2, c=3) -> fn(a, b, c, d, e)
287
288 Frame
289 -----
290
291 a a a a
292 b b b b
293 ___ ___ ___ --> 3
294 ___ --> 1 1 | 1
295 ___ | ___ --> 2 | 2
296 | | |
297 1 ----------- 2 ----------- 3 -----------
298
299 Conceptually, the frame can be considered as a collection of attributes, as
300 seen in other kinds of structures:
301
302 Frame for invocation of fn:
303
304 0 1 2 3 4 5
305 code a b c d e
306 reference
307
308 Where arguments are specified positionally, such "attributes" are set in
309 order. Keyword arguments are set using a name-to-position attribute-like
310 mapping, where the position of each argument is discovered using the parameter
311 table.
312
313 Method Invocations
314 ------------------
315
316 Method invocations incorporate an implicit first argument which is obtained
317 from the context of the method:
318
319 method(a, b, d=1, e=2, c=3) -> method(self, a, b, c, d, e)
320
321 Frame
322 -----
323
324 context of method
325 a
326 b
327 3
328 1
329 2
330
331 Although it could be possible to permit any object to be provided as the first
332 argument, in order to optimise instance attribute access in methods, we should
333 seek to restrict the object type.
334
335 Verifying Supplied Arguments
336 ----------------------------
337
338 In order to ensure a correct invocation, it is also necessary to check the
339 number of supplied arguments. If the target of the invocation is known at
340 compile-time, no additional instructions need to be emitted; otherwise, the
341 generated code must test for the following situations:
342
343 1. That the number of supplied arguments is equal to the number of expected
344 parameters.
345
346 2. That no keyword argument overwrites an existing positional parameter.
347
348 Default Arguments
349 -----------------
350
351 Some arguments may have default values which are used if no value is provided
352 in an invocation. Such defaults are initialised when the function itself is
353 initialised, and are used to fill in any invocation frames that are known at
354 compile-time.
355
356 To obtain the default values, a reference to the function object is required.
357 This can be provided either in an additional frame location or in a special
358 register.
359
360 Tuples, Frames and Allocation
361 -----------------------------
362
363 Using the approach where arguments are treated like attributes in some kind of
364 structure, we could choose to allocate frames in places other than a stack.
365 This would produce something somewhat similar to a plain tuple object.