1 = Translating Programs = 2 3 The Lichen toolchain currently targets the C programming language, generating 4 programs that are then compiled using existing C compilers. The process of 5 translation involves the optimisation and generation of program structures, 6 and the translation of Lichen language constructs into equivalent C language 7 constructs. 8 9 <<TableOfContents(2,3)>> 10 11 == Attribute Access Plans == 12 13 During [[../Inspection|inspection]], each attribute access is associated with 14 a unique location and its details stored. During [[../Deduction|deduction]], 15 such accesses are resolved and characterised. During 16 [[../Optimisation|optimisation]], such accesses are encoded as '''instruction 17 plans''' indicating the operations or instructions required to achieve the 18 access. During translation, these instruction plans are generated as part of 19 the final program. 20 21 == Structure Optimisation == 22 23 All object structures need to support the attributes defined for them, and 24 although it may be possible to determine the identity of attributes in advance 25 of a program being run, there generally remains a need to provide mechanisms 26 to determine the nature of an object at run-time and to access its attributes 27 dynamically. To achieve this, an [[../Optimisation|optimisation]] process 28 catalogues the presence of each attribute name on all program objects, then 29 deciding upon a position for the structure member corresponding to that 30 attribute name in all objects. 31 32 == Structure Generation == 33 34 Each program consists of a number of structures providing the program's 35 objects and defining classes, functions, modules and any predefined instances. 36 37 === Attribute Representation === 38 39 Attributes most typically provide references to objects within a certain 40 optional context. Since attributes are employed to encode various other kinds 41 of information, the data structure may be interpreted in other ways within 42 suitably controlled environments. For example, the `__data__` attribute found 43 on instances of certain built-in classes will encode references to other kinds 44 of information that are used within native functions. 45 46 === Object Representation === 47 48 Objects provide collections of attributes, and the object representation is 49 meant to support efficient computed access to attributes whilst also allowing 50 a reasonably efficient means of testing and accessing attributes at run-time. 51 52 === Structure Constants === 53 54 For clarity, several classes of symbolic constant are defined in the 55 translated program to help define structures or to refer to structure members. 56 57 || '''Constant Class''' || '''Purpose''' || '''Example''' || '''Application''' || 58 || `csize` || Indicates the number of attributes in a class || `__csize___builtins___list_list` ||<|3> Structure definition || 59 || `isize` || Indicates the number of attributes in an instance || `__isize___builtins___list_list` || 60 || `msize` || Indicates the number of attributes in a module || `__msize___builtins___list` || 61 || `pminsize` || Indicates the minimum number of parameters expected by a callable || `__pminsize___builtins___list_list_append` ||<|2> Invocation || 62 || `pmaxsize` || Indicates the maximum number of parameters expected by a callable || `__pmaxsize___builtins___list_list_append` || 63 || `pcode` || Assigns a code to a particular parameter name || `__pcode_value` ||<|2> Keyword argument positioning || 64 || `ppos` || Indicates the table position used by a particular parameter name || `__ppos_value` || 65 || `code` || Assigns a code to a particular attribute name || `__code_value` ||<|2> Attribute access || 66 || `pos` || Indicates the table position used by a particular attribute name || `__pos_value` || 67 68 Such constants are encoded using helper functions, producing names resembling 69 those above that are intended to be distinct and not to conflict with other 70 defined names in the final program. 71 72 === Instance Structures === 73 74 Like all program objects, instances are defined in C according to a generic 75 structure, meaning that all instances have the same generic members from the 76 perspective of a C program. However, specific structures are defined for each 77 class in order to conveniently describe the dimensions of instances of that 78 class. For example, for the built-in list class, a declaration resembling the 79 following is issued: 80 81 {{{ 82 typedef struct { 83 const __table * table; 84 unsigned int pos; 85 __attr attrs[__isize___builtins___list_list___init__]; 86 } __obj___builtins___list_list___init__; 87 }}} 88 89 == Program Translation == 90 91 === Useful C Language Features === 92 93 The translated code relies on the availability of certain useful features of 94 C, some of them supported only in modern compilers: 95 96 * Array and structure literals: these are used to define values within 97 expressions that would otherwise require separate definition; invocation 98 arguments are defined using inline expressions and the `__ARGS` macro (for 99 unknown invocation targets and for keyword arguments) 100 * [[https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros| 101 Variadic macros]]: such macros permit variable numbers of arguments, thus 102 making the `__ARGS` macro possible 103 * [[WikiPedia:Comma operator]] sequences: these are used to construct 104 instruction sequences within expressions, providing a mechanism for 105 formulating attribute accesses 106 107 === Function Naming === 108 109 A naming scheme is applied to functions defined in the final C program to 110 support different kinds of callables present in the original program. 111 112 || '''Name Class''' || '''Purpose''' || '''Example''' || 113 || `fn` || Indicates a plain function || `__fn___builtins___str_str` || 114 || `main` || Indicates a module main program || `__main_sys` || 115 || `new` || Indicates a class instantiator || `__new___builtins___list_list` || 116 || `newliteral` || Provides an instantiator for literals mentioned in programs || `__newliteral___builtins___list_list` || 117 118 === Generic Program Functionality === 119 120 Given the representation of the fundamental program objects, a suite of 121 generic operations is provided to support the activities necessary to inspect 122 and update program data. Such operations are largely concerned with generic 123 object and attribute representations and are practically identical from 124 program to program. 125 126 For example, an attribute provided by an object can be accessed in several 127 different ways depending on the information known about the object when the 128 program is translated. Consequently, a suite of attribute access operations is 129 provided to support each different scenario: 130 131 ||<-2|2> ||<-2> '''Attribute provider''' || 132 || '''Class''' || '''Instance''' || 133 ||<|3> '''Accessor''' || '''Class''' || `load_via_object` || || 134 || '''Instance''' || `load_via_class` || `load_via_object` || 135 || '''Class or instance''' || `get_class_and_load` || `check_and_load_via_any` || 136 137 === Specific Program Functionality === 138 139 Since objects at the program level have their layouts 140 [[../Optimisation|optimised]], and since these layouts may differ from program 141 to program, a suite of specific operations is provided to support a range of 142 activities within programs, such operations being configured with 143 program-specific information in order to do the right thing within a given 144 program. 145 146 || '''Function''' || '''Purpose''' || 147 || `__invoke` || A generic invocation function that populates the argument array and obtains the appropriate target || 148 || `__new` || Creates a new object, setting the `__class__` attribute to the indicated class identity || 149 || `__BOOL` || Tests the identity of an attribute value against the `True` constant's address || 150 || `__GETDEFAULT` || Obtains a function default from additional members of a function instance beyond the core members || 151 || `__SETDEFAULT` || Updates a function default from additional members of a function instance beyond the core members || 152 153 === Module Files === 154 155 Each program module is defined in its own file in the translated program. 156 Since C has no notion of general module-level code, a special main function is 157 generated for each module to perform the operations defined at the module 158 level in the original program. These main functions are called in turn by the 159 principal `main` function of the final C program, and the order of invocation 160 is defined by the module initialisation ordering established when resolving 161 inter-module dependencies. 162 163 === Statement Translation === 164 165 Statements in the original program are generally translated to statements in 166 the final C program. During inspection and translation, certain constructs are 167 transformed to produce the operations necessary to implement such constructs. 168 For instance, operators need to be rewritten as function invocations, and thus 169 the form of the original program changes before it is translated to C. 170 171 When translating certain operations that may appear within a statement, a 172 sequence of instructions is generated, and such instructions must be performed 173 in turn. Fortunately, C supports a wide variety of operations - such as 174 assignments - within expressions, and it also provides the comma operator 175 which permits sequences of operations to be performed in order and to yield a 176 result that can then be used in the context of an expression. Such sequences 177 of operations are employed extensively to realise attribute access instruction 178 plans. 179 180 === Name References === 181 182 Names can be translated in different ways for the final C program. Function 183 locals and parameters correspond directly to locals in C functions. 184 Module-level globals and class-level locals are translated to structure 185 accesses. However, temporary names that are used in sequence assignment are 186 translated to C function locals, either in functions corresponding to their 187 equivalents in the original program, or in module initialisation functions 188 corresponding to the module scope in the original program. 189 190 Certain special names that are formulated during [[../Inspection|inspection]] 191 are converted to constant or static references. 192 193 === Attribute Accesses === 194 195 Each attribute access is associated with a particular access location, 196 corresponding to an operation occurring in the original program identified 197 during inspection. This location can be used as a key to access a 198 [[#Attribute_Access_Plans|plan]] that describes how the access may be realised 199 in the final program. Since most of the work computing the nature of the 200 access is done during the preceding deduction and optimisation activities, 201 what remains is mostly the emission of the plan into the generated program 202 text with some substitutions performed. 203 204 === Invocations === 205 206 Each invocation involves an object whose identity may or may not be known at 207 compile-time, plus a collection of arguments. The translation process is 208 concerned with obtaining a target to be invoked in the generated program, 209 populating an argument array appropriately for the target, and providing the 210 means of invocation. 211 212 Where the identity of the object is not known at all, the information provided 213 is effectively passed on to the special `__invoke` function which is then 214 required to do all the work at run-time. However, one of the goals of 215 analysing and compiling the program is to avoid doing such work at run-time 216 and to take advantage of those cases where the identity of the object can be 217 determined. 218 219 Thus, where the identity of the called object is known, details of the 220 object's signature - its parameters and their positions - are used to populate 221 the argument array and to determine whether this can be done successfully. 222 Some callables require a context to be provided, and knowledge about the 223 callable can be used to obtain or omit such a context from the arguments. 224 225 Finally, the invocation target must be obtained. Where some knowledge of the 226 identity of the object is available, it may be sufficient to access the 227 special `__fn__` member directly (potentially testing for its presence if this 228 is not certain) and to call the appropriate function with the assembled 229 argument array. Where a definite identity is available, the actual function 230 itself may be named and thus called with the arguments. Such operations are 231 mere fragments of the total work performed by the `__invoke` function in the 232 least optimal case. 233 234 === Assignments === 235 236 Assignments of objects to names occur for the following kinds of statements: 237 assignments, definition statements (`class` and `def`), import statements 238 (`from` and `import`). 239 240 === Guards === 241 242 Guards are identified during [[../Deduction|deduction]] as being necessary to 243 uphold deductions about the nature of objects referenced via names that may be 244 undermined at run-time by potentially erroneous program behaviour. 245 246 === Exceptions === 247 248 The C programming language does not support exceptions in a form supported by 249 more modern programming languages. Consequently, lower-level mechanisms need 250 to be employed to reproduce the semantics of the `try` statement, its `except` 251 and `finally` clauses, and the `raise` statement, as well as considering the 252 effect of exceptions on the `return` statement. 253 254 Fortunately, some consideration has been given to this topic before. The 255 [[http://www.nicemice.net/cexcept/|cexcept]] library - really a collection of 256 convenience macros - provides a way of expressing exception constructs in a 257 natural way in C while translating those constructs to usage of the `setjmp` 258 and `longjmp` functions. Although direct usage of such functions could be 259 generated during program translation, for convenience and to focus on other 260 implementation concerns, it was decided to employ the cexcept macros and to 261 build any additional functionality in terms of their usage. 262 263 A structure is defined to hold exception-related information: 264 265 {{{ 266 typedef struct 267 { 268 __attr arg; 269 int raising; 270 int raising_else; 271 int completing; 272 } __exc; 273 }}} 274 275 Here, the `arg` member holds a raised exception object reference. Meanwhile, 276 the remaining members interact with language constructs as follows: 277 278 || '''Member''' || '''Description''' || '''Output Program Operation''' || 279 || `raising` || Indicates an exception that has been raised and not yet handled || `__Raise` || 280 || `raising_else` || Indicates an exception that must be re-raised because it occurred in an `else` clause || `__RaiseElse` || 281 || `completing` || Indicates the completion of, or return from, a `try` clause requiring entry into a `finally` clause || `__Complete` and `__Return` || 282 283 === Memory Allocation and Garbage Collection === 284 285 To avoid having to write a garbage collector, the 286 [[http://www.hboehm.info/gc/|Boehm-Demers-Weiser garbage collector]] is 287 employed by programs to allocate and free memory needed for its objects.