Lichen

docs/wiki/Translation

861:f745d151a441
2018-08-17 Paul Boddie Wrapped text in the documentation for more convenient "offline" editing.
     1 = Translating Programs =     2      3 The Lichen toolchain currently targets the C programming language, generating     4 programs that are then compiled using existing C compilers. The process of     5 translation involves the optimisation and generation of program structures,     6 and the translation of Lichen language constructs into equivalent C language     7 constructs.     8      9 <<TableOfContents(2,3)>>    10     11 == Attribute Access Plans ==    12     13 During [[../Inspection|inspection]], each attribute access is associated with    14 a unique location and its details stored. During [[../Deduction|deduction]],    15 such accesses are resolved and characterised. During    16 [[../Optimisation|optimisation]], such accesses are encoded as '''instruction    17 plans''' indicating the operations or instructions required to achieve the    18 access. During translation, these instruction plans are generated as part of    19 the final program.    20     21 == Structure Optimisation ==    22     23 All object structures need to support the attributes defined for them, and    24 although it may be possible to determine the identity of attributes in advance    25 of a program being run, there generally remains a need to provide mechanisms    26 to determine the nature of an object at run-time and to access its attributes    27 dynamically. To achieve this, an [[../Optimisation|optimisation]] process    28 catalogues the presence of each attribute name on all program objects, then    29 deciding upon a position for the structure member corresponding to that    30 attribute name in all objects.    31     32 == Structure Generation ==    33     34 Each program consists of a number of structures providing the program's    35 objects and defining classes, functions, modules and any predefined instances.    36     37 === Attribute Representation ===    38     39 Attributes most typically provide references to objects within a certain    40 optional context. Since attributes are employed to encode various other kinds    41 of information, the data structure may be interpreted in other ways within    42 suitably controlled environments. For example, the `__data__` attribute found    43 on instances of certain built-in classes will encode references to other kinds    44 of information that are used within native functions.    45     46 === Object Representation ===    47     48 Objects provide collections of attributes, and the object representation is    49 meant to support efficient computed access to attributes whilst also allowing    50 a reasonably efficient means of testing and accessing attributes at run-time.    51     52 === Structure Constants ===    53     54 For clarity, several classes of symbolic constant are defined in the    55 translated program to help define structures or to refer to structure members.    56     57 || '''Constant Class''' || '''Purpose''' || '''Example''' || '''Application''' ||    58 || `csize` || Indicates the number of attributes in a class || `__csize___builtins___list_list` ||<|3> Structure definition ||    59 || `isize` || Indicates the number of attributes in an instance || `__isize___builtins___list_list` ||    60 || `msize` || Indicates the number of attributes in a module || `__msize___builtins___list` ||    61 || `pminsize` || Indicates the minimum number of parameters expected by a callable || `__pminsize___builtins___list_list_append` ||<|2> Invocation ||    62 || `pmaxsize` || Indicates the maximum number of parameters expected by a callable || `__pmaxsize___builtins___list_list_append` ||    63 || `pcode` || Assigns a code to a particular parameter name || `__pcode_value` ||<|2> Keyword argument positioning ||    64 || `ppos` || Indicates the table position used by a particular parameter name || `__ppos_value` ||    65 || `code` || Assigns a code to a particular attribute name || `__code_value` ||<|2> Attribute access ||    66 || `pos` || Indicates the table position used by a particular attribute name || `__pos_value` ||    67     68 Such constants are encoded using helper functions, producing names resembling    69 those above that are intended to be distinct and not to conflict with other    70 defined names in the final program.    71     72 === Instance Structures ===    73     74 Like all program objects, instances are defined in C according to a generic    75 structure, meaning that all instances have the same generic members from the    76 perspective of a C program. However, specific structures are defined for each    77 class in order to conveniently describe the dimensions of instances of that    78 class. For example, for the built-in list class, a declaration resembling the    79 following is issued:    80     81 {{{    82 typedef struct {    83     const __table * table;    84     unsigned int pos;    85     __attr attrs[__isize___builtins___list_list___init__];    86 } __obj___builtins___list_list___init__;    87 }}}    88     89 == Program Translation ==    90     91 === Useful C Language Features ===    92     93 The translated code relies on the availability of certain useful features of    94 C, some of them supported only in modern compilers:    95     96  * Array and structure literals: these are used to define values within    97    expressions that would otherwise require separate definition; invocation    98    arguments are defined using inline expressions and the `__ARGS` macro (for    99    unknown invocation targets and for keyword arguments)   100  * [[https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros|   101    Variadic macros]]: such macros permit variable numbers of arguments, thus   102    making the `__ARGS` macro possible   103  * [[WikiPedia:Comma operator]] sequences: these are used to construct   104    instruction sequences within expressions, providing a mechanism for   105    formulating attribute accesses   106    107 === Function Naming ===   108    109 A naming scheme is applied to functions defined in the final C program to   110 support different kinds of callables present in the original program.   111    112 || '''Name Class''' || '''Purpose''' || '''Example''' ||   113 || `fn` || Indicates a plain function || `__fn___builtins___str_str` ||   114 || `main` || Indicates a module main program || `__main_sys` ||   115 || `new` || Indicates a class instantiator || `__new___builtins___list_list` ||   116 || `newliteral` || Provides an instantiator for literals mentioned in programs || `__newliteral___builtins___list_list` ||   117    118 === Generic Program Functionality ===   119    120 Given the representation of the fundamental program objects, a suite of   121 generic operations is provided to support the activities necessary to inspect   122 and update program data. Such operations are largely concerned with generic   123 object and attribute representations and are practically identical from   124 program to program.   125    126 For example, an attribute provided by an object can be accessed in several   127 different ways depending on the information known about the object when the   128 program is translated. Consequently, a suite of attribute access operations is   129 provided to support each different scenario:   130    131 ||<-2|2> ||<-2> '''Attribute provider''' ||   132 || '''Class''' || '''Instance''' ||   133 ||<|3> '''Accessor''' || '''Class''' || `load_via_object` || ||   134 || '''Instance''' || `load_via_class` || `load_via_object` ||   135 || '''Class or instance''' || `get_class_and_load` ||  `check_and_load_via_any` ||   136    137 === Specific Program Functionality ===   138    139 Since objects at the program level have their layouts   140 [[../Optimisation|optimised]], and since these layouts may differ from program   141 to program, a suite of specific operations is provided to support a range of   142 activities within programs, such operations being configured with   143 program-specific information in order to do the right thing within a given   144 program.   145    146 || '''Function''' || '''Purpose''' ||   147 || `__invoke` || A generic invocation function that populates the argument array and obtains the appropriate target ||   148 || `__new` || Creates a new object, setting the `__class__` attribute to the indicated class identity ||   149 || `__BOOL` || Tests the identity of an attribute value against the `True` constant's address ||   150 || `__GETDEFAULT` || Obtains a function default from additional members of a function instance beyond the core members ||   151 || `__SETDEFAULT` || Updates a function default from additional members of a function instance beyond the core members ||   152    153 === Module Files ===   154    155 Each program module is defined in its own file in the translated program.   156 Since C has no notion of general module-level code, a special main function is   157 generated for each module to perform the operations defined at the module   158 level in the original program. These main functions are called in turn by the   159 principal `main` function of the final C program, and the order of invocation   160 is defined by the module initialisation ordering established when resolving   161 inter-module dependencies.   162    163 === Statement Translation ===   164    165 Statements in the original program are generally translated to statements in   166 the final C program. During inspection and translation, certain constructs are   167 transformed to produce the operations necessary to implement such constructs.   168 For instance, operators need to be rewritten as function invocations, and thus   169 the form of the original program changes before it is translated to C.   170    171 When translating certain operations that may appear within a statement, a   172 sequence of instructions is generated, and such instructions must be performed   173 in turn. Fortunately, C supports a wide variety of operations - such as   174 assignments - within expressions, and it also provides the comma operator   175 which permits sequences of operations to be performed in order and to yield a   176 result that can then be used in the context of an expression. Such sequences   177 of operations are employed extensively to realise attribute access instruction   178 plans.   179    180 === Name References ===   181    182 Names can be translated in different ways for the final C program. Function   183 locals and parameters correspond directly to locals in C functions.   184 Module-level globals and class-level locals are translated to structure   185 accesses. However, temporary names that are used in sequence assignment are   186 translated to C function locals, either in functions corresponding to their   187 equivalents in the original program, or in module initialisation functions   188 corresponding to the module scope in the original program.   189    190 Certain special names that are formulated during [[../Inspection|inspection]]   191 are converted to constant or static references.   192    193 === Attribute Accesses ===   194    195 Each attribute access is associated with a particular access location,   196 corresponding to an operation occurring in the original program identified   197 during inspection. This location can be used as a key to access a   198 [[#Attribute_Access_Plans|plan]] that describes how the access may be realised   199 in the final program. Since most of the work computing the nature of the   200 access is done during the preceding deduction and optimisation activities,   201 what remains is mostly the emission of the plan into the generated program   202 text with some substitutions performed.   203    204 === Invocations ===   205    206 Each invocation involves an object whose identity may or may not be known at   207 compile-time, plus a collection of arguments. The translation process is   208 concerned with obtaining a target to be invoked in the generated program,   209 populating an argument array appropriately for the target, and providing the   210 means of invocation.   211    212 Where the identity of the object is not known at all, the information provided   213 is effectively passed on to the special `__invoke` function which is then   214 required to do all the work at run-time. However, one of the goals of   215 analysing and compiling the program is to avoid doing such work at run-time   216 and to take advantage of those cases where the identity of the object can be   217 determined.   218    219 Thus, where the identity of the called object is known, details of the   220 object's signature - its parameters and their positions - are used to populate   221 the argument array and to determine whether this can be done successfully.   222 Some callables require a context to be provided, and knowledge about the   223 callable can be used to obtain or omit such a context from the arguments.   224    225 Finally, the invocation target must be obtained. Where some knowledge of the   226 identity of the object is available, it may be sufficient to access the   227 special `__fn__` member directly (potentially testing for its presence if this   228 is not certain) and to call the appropriate function with the assembled   229 argument array. Where a definite identity is available, the actual function   230 itself may be named and thus called with the arguments. Such operations are   231 mere fragments of the total work performed by the `__invoke` function in the   232 least optimal case.   233    234 === Assignments ===   235    236 Assignments of objects to names occur for the following kinds of statements:   237 assignments, definition statements (`class` and `def`), import statements   238 (`from` and `import`).   239    240 === Guards ===   241    242 Guards are identified during [[../Deduction|deduction]] as being necessary to   243 uphold deductions about the nature of objects referenced via names that may be   244 undermined at run-time by potentially erroneous program behaviour.   245    246 === Exceptions ===   247    248 The C programming language does not support exceptions in a form supported by   249 more modern programming languages. Consequently, lower-level mechanisms need   250 to be employed to reproduce the semantics of the `try` statement, its `except`   251 and `finally` clauses, and the `raise` statement, as well as considering the   252 effect of exceptions on the `return` statement.   253    254 Fortunately, some consideration has been given to this topic before. The   255 [[http://www.nicemice.net/cexcept/|cexcept]] library - really a collection of   256 convenience macros - provides a way of expressing exception constructs in a   257 natural way in C while translating those constructs to usage of the `setjmp`   258 and `longjmp` functions. Although direct usage of such functions could be   259 generated during program translation, for convenience and to focus on other   260 implementation concerns, it was decided to employ the cexcept macros and to   261 build any additional functionality in terms of their usage.   262    263 A structure is defined to hold exception-related information:   264    265 {{{   266 typedef struct   267 {   268     __attr arg;   269     int raising;   270     int raising_else;   271     int completing;   272 } __exc;   273 }}}   274    275 Here, the `arg` member holds a raised exception object reference. Meanwhile,   276 the remaining members interact with language constructs as follows:   277    278 || '''Member''' || '''Description''' || '''Output Program Operation''' ||   279 || `raising` || Indicates an exception that has been raised and not yet handled || `__Raise` ||   280 || `raising_else` || Indicates an exception that must be re-raised because it occurred in an `else` clause || `__RaiseElse` ||   281 || `completing` || Indicates the completion of, or return from, a `try` clause requiring entry into a `finally` clause || `__Complete` and `__Return` ||   282    283 === Memory Allocation and Garbage Collection ===   284    285 To avoid having to write a garbage collector, the   286 [[http://www.hboehm.info/gc/|Boehm-Demers-Weiser garbage collector]] is   287 employed by programs to allocate and free memory needed for its objects.