paul@810 | 1 | = Optimisation = |
paul@810 | 2 | |
paul@861 | 3 | In the process of generating an output representation of a Lichen program, the |
paul@861 | 4 | process of optimisation is concerned with positioning members within program |
paul@861 | 5 | structures. Attributes are positioned in object structures, and '''object |
paul@861 | 6 | tables''' are defined so that attributes can be located in such objects. |
paul@861 | 7 | Similarly, parameters, whose positions are determined by their appearance in |
paul@861 | 8 | function and other callable signatures, have positioning information defined |
paul@861 | 9 | in '''parameter tables''' that can be used to position arguments in parameter |
paul@861 | 10 | arrays when their names are used in argument lists. |
paul@810 | 11 | |
paul@861 | 12 | Also performed during optimisation is the consolidation of constant |
paul@861 | 13 | information, initially discussed [[../Inspection#Literals_and_Constants|in the |
paul@861 | 14 | context of inspection]]. |
paul@810 | 15 | |
paul@810 | 16 | <<TableOfContents(2,3)>> |
paul@810 | 17 | |
paul@810 | 18 | == Populating Objects == |
paul@810 | 19 | |
paul@861 | 20 | A program will have objects that are classes, modules and instances, and each |
paul@861 | 21 | such object will provide a number of attributes. Each object's attributes are |
paul@861 | 22 | stored in an array. For simplicity, each attribute having a given name will |
paul@861 | 23 | always be positioned at the same place in any array provided by an object |
paul@861 | 24 | featuring an attribute with that name. Even different object types will use |
paul@861 | 25 | the same position. |
paul@810 | 26 | |
paul@810 | 27 | Consider an attribute called `elephant` provided by a number of types: |
paul@810 | 28 | |
paul@810 | 29 | || '''Type''' ||<-5> '''Attributes''' || |
paul@810 | 30 | || class C || `__class__` || `ant` || `dog` ||<style="background-color: #ddd"> `elephant` || `cat` || |
paul@810 | 31 | || class D || `__class__` || `ant` || `duck` ||<style="background-color: #ddd"> `elephant` || `horse` || |
paul@810 | 32 | || class E || `__class__` || `ant` || `dog` || `giraffe` || `horse` || |
paul@810 | 33 | || module M || `__class__` || `rhino` || `hippo` ||<style="background-color: #ddd"> `elephant` || `zebra` || |
paul@810 | 34 | |
paul@861 | 35 | Where `elephant` is provided as an attribute, it will always appear in the |
paul@861 | 36 | same position - in this case as the fourth attribute - in any object attribute |
paul@861 | 37 | array. |
paul@810 | 38 | |
paul@861 | 39 | Consequently, any operation involving an object of unknown identity that |
paul@861 | 40 | employs `elephant` can employ the same position information to determine |
paul@861 | 41 | whether the attribute is supported and to retrieve or modify its value. Where |
paul@861 | 42 | an object does not support `elephant`, it may use the same attribute position |
paul@861 | 43 | for another attribute. Determining whether objects support attributes is done |
paul@861 | 44 | using tables, described below. |
paul@810 | 45 | |
paul@861 | 46 | It should be noted that the positions of attributes in object structures are |
paul@861 | 47 | the same as the positions of attribute identifiers in object tables. With |
paul@861 | 48 | attribute positions allocated, table definition is trivial. |
paul@810 | 49 | |
paul@810 | 50 | === Allocation Considerations === |
paul@810 | 51 | |
paul@861 | 52 | Such common positioning of attributes demands a degree of coordination between |
paul@861 | 53 | objects. If `elephant` is positioned in a given place in one object, then |
paul@861 | 54 | given the constraint of it only ever being positioned in a single location for |
paul@861 | 55 | all objects, it may not then be positioned in a different place in another |
paul@861 | 56 | object. Thus, where two attributes co-exist in an object, their positions must |
paul@861 | 57 | be different and will affect all objects supporting those attributes, |
paul@861 | 58 | regardless of whether they support them both. For example: |
paul@810 | 59 | |
paul@810 | 60 | || '''Type''' ||<-5> '''Attributes''' || |
paul@810 | 61 | || class C || `__class__` ||<style="background-color: #ddd"> `ant` || `dog` ||<style="background-color: #ddd"> `elephant` || `cat` || |
paul@810 | 62 | || class D || `__class__` ||<style="background-color: #ddd"> `ant` || `duck` ||<style="background-color: #ddd"> `elephant` || `horse` || |
paul@810 | 63 | || class E || `__class__` ||<style="background-color: #ddd"> `ant` || `dog` || `giraffe` || `horse` || |
paul@810 | 64 | || module M || `__class__` || `rhino` || `hippo` ||<style="background-color: #ddd"> `elephant` || `zebra` || |
paul@810 | 65 | || module N || `__class__` || || ||<style="background-color: #ddd"> `elephant` || `zebra` || |
paul@810 | 66 | |
paul@861 | 67 | Here, module N still positions `elephant` in the fourth position. If |
paul@861 | 68 | `elephant` were moved to the second position, it would conflict with `ant` or |
paul@861 | 69 | `rhino` in those objects supporting those attributes. Such coordination |
paul@861 | 70 | therefore has an impact on allocation strategies. Care has to be taken to |
paul@861 | 71 | allocate attributes in such a way that small objects (having few attributes) |
paul@861 | 72 | do not have their attributes positioned far from the start of their attribute |
paul@861 | 73 | arrays, because failing to do so effectively makes those objects large, |
paul@861 | 74 | inefficiently-represented objects. |
paul@810 | 75 | |
paul@861 | 76 | A reasonable solution is to consider the following when allocating attribute |
paul@861 | 77 | positions: |
paul@810 | 78 | |
paul@810 | 79 | * Attribute frequency (or ubiquity) |
paul@810 | 80 | * Object size |
paul@810 | 81 | * Object kind (whether the object is a class, module or instance) |
paul@810 | 82 | |
paul@861 | 83 | More frequently-occurring (or ubiquitous) attributes may need to appear in |
paul@861 | 84 | both large and small objects and should probably be allocated in lower |
paul@861 | 85 | positions (`__class__` being an extreme example of needing to be allocated for |
paul@861 | 86 | all objects). Attributes featured in small objects should also be given |
paul@861 | 87 | priority for lower positions due to the desirability of keeping such objects |
paul@861 | 88 | small. Meanwhile, classes and modules only appear once in a program, whereas |
paul@861 | 89 | there may be countless instances allocated during the lifetime of a program's |
paul@861 | 90 | execution. Therefore, attributes featured in instances should have priority |
paul@861 | 91 | over other attributes for lower positions in object structures. |
paul@810 | 92 | |
paul@810 | 93 | == Populating Signatures == |
paul@810 | 94 | |
paul@861 | 95 | The positions of parameters in callable signatures are determined by the |
paul@861 | 96 | definitions of those signatures in the source program. When callables are |
paul@861 | 97 | invoked using an argument list, arguments that are not specified using |
paul@861 | 98 | keywords are merely copied into the parameter array in the same position. |
paul@861 | 99 | However, keyword arguments will need to have their positions looked up in the |
paul@861 | 100 | appropriate parameter table for the callable involved. |
paul@810 | 101 | |
paul@861 | 102 | Consequently, no allocation needs to be performed on the parameters |
paul@861 | 103 | themselves: they have specific positions for each callable. However, just as |
paul@861 | 104 | attribute names must yield positions when accessing attributes on objects, a |
paul@861 | 105 | similar mechanism that takes parameter names and yields positions must be |
paul@861 | 106 | provided. It is instead the positions of parameter details that must be |
paul@861 | 107 | allocated in structures to be consulted as if parameter names were attribute |
paul@861 | 108 | names, with attributes providing parameter position details. |
paul@810 | 109 | |
paul@810 | 110 | Consider the following functions: |
paul@810 | 111 | |
paul@810 | 112 | {{{#!python numbers=disable |
paul@810 | 113 | def f(a, b, c, d): |
paul@810 | 114 | ... |
paul@810 | 115 | |
paul@810 | 116 | def g(p, q, r): |
paul@810 | 117 | ... |
paul@810 | 118 | |
paul@810 | 119 | def h(d, p, v): |
paul@810 | 120 | ... |
paul@810 | 121 | }}} |
paul@810 | 122 | |
paul@861 | 123 | In order to find the position of each parameter using its name, the following |
paul@861 | 124 | table could be provided: |
paul@810 | 125 | |
paul@810 | 126 | || '''Table''' ||<-4> '''Parameters''' || |
paul@810 | 127 | || function f || `a` at 1 || `b` at 2 || `c` at 3 || `d` at 4 || |
paul@810 | 128 | || function g || `q` at 2 ||<style="background-color: #ddd"> `p` at 1 || `r` at 3 || || |
paul@810 | 129 | || function h || `v` at 2 ||<style="background-color: #ddd"> `p` at 2 || || `d` at 1 || |
paul@810 | 130 | |
paul@861 | 131 | Such a table behaves like an object structure (or an object table) with |
paul@861 | 132 | parameters acting like attributes in such a structure. Here, the attributes |
paul@861 | 133 | yield the positions of parameters within parameter arrays. Note how `p` always |
paul@861 | 134 | resides in the same location but may yield different positions depending on |
paul@861 | 135 | the actual callable involved (as is also observable with `d`). |
paul@810 | 136 | |
paul@810 | 137 | === Allocation Considerations === |
paul@810 | 138 | |
paul@861 | 139 | Parameter table allocation involves similar considerations to those |
paul@861 | 140 | influencing object table allocation. In order to keep parameter tables small, |
paul@861 | 141 | more frequently appearing parameters should be positioned earlier in arrays. A |
paul@861 | 142 | specific consideration of importance is that of the number of tables |
paul@861 | 143 | generated. There may be many callables with the same signature, and such |
paul@861 | 144 | callables therefore do not need their own parameter tables since they will |
paul@861 | 145 | merely be duplicates. An attempt is therefore made to identify distinct |
paul@861 | 146 | signatures and to generate parameter tables only for these signatures, instead |
paul@861 | 147 | of providing a one-to-one mapping between callables and tables. |
paul@810 | 148 | |
paul@810 | 149 | == Populating Tables == |
paul@810 | 150 | |
paul@861 | 151 | With names allocated to positions in each kind of table, the straightforward |
paul@861 | 152 | task of generating such tables proceeds as follows. |
paul@810 | 153 | |
paul@810 | 154 | === Object Tables and Attribute Codes === |
paul@810 | 155 | |
paul@861 | 156 | Object tables consist of locations directly corresponding to structure member |
paul@861 | 157 | locations. Each location in a table will correspond to a specific name, but |
paul@861 | 158 | since potentially many names may have the same position, a table must provide |
paul@861 | 159 | identifying details for the name that is actually supported. |
paul@810 | 160 | |
paul@861 | 161 | In object tables, such identifying details take the form of '''attribute |
paul@861 | 162 | codes'''. Each attribute name is mapped to a distinct identifier, and upon |
paul@861 | 163 | consulting an object table, the lookup process must read the stored attribute |
paul@861 | 164 | code and compare it to the code for the attribute it intends to access. If |
paul@861 | 165 | these codes match, then it can be assumed that the object involved supports |
paul@861 | 166 | the named attribute. Otherwise, a different attribute (or even no attribute at |
paul@861 | 167 | all) resides at that location in the object structure, making the access |
paul@861 | 168 | inappropriate. |
paul@810 | 169 | |
paul@810 | 170 | A more comprehensive object table resembles the following: |
paul@810 | 171 | |
paul@810 | 172 | || '''Type''' ||<-5> '''Attributes (Codes)''' || |
paul@810 | 173 | || class C || `__class__` (1) || `ant` (2) || `dog` (4) || `elephant` (6) || `cat` (3) || |
paul@810 | 174 | || class D || `__class__` (1) || `ant` (2) || `duck` (5) || `elephant` (6) || `horse` (9) || |
paul@810 | 175 | || class E || `__class__` (1) || `ant` (2) || `dog` (4) || `giraffe` (7) || `horse` (9) || |
paul@810 | 176 | || module M || `__class__` (1) || `rhino` (10) || `hippo` (8) || `elephant` (6) || `zebra` (11) || |
paul@810 | 177 | || module N || `__class__` (1) || || || `elephant` (6) || `zebra` (11) || |
paul@810 | 178 | |
paul@810 | 179 | The following attribute codes would be employed: |
paul@810 | 180 | |
paul@810 | 181 | || '''Attribute Name''' || '''Attribute Code''' || |
paul@810 | 182 | || `__class__` || 1 || |
paul@810 | 183 | || `ant` || 2 || |
paul@810 | 184 | || `cat` || 3 || |
paul@810 | 185 | || `dog` || 4 || |
paul@810 | 186 | || `duck` || 5 || |
paul@810 | 187 | || `elephant` || 6 || |
paul@810 | 188 | || `giraffe` || 7 || |
paul@810 | 189 | || `hippo` || 8 || |
paul@810 | 190 | || `horse` || 9 || |
paul@810 | 191 | || `rhino` || 10 || |
paul@810 | 192 | || `zebra` || 11 || |
paul@810 | 193 | |
paul@810 | 194 | === Parameter Tables and Parameter Codes === |
paul@810 | 195 | |
paul@861 | 196 | Parameter tables consist of locations yielding parameter position information. |
paul@861 | 197 | Each location in a table will correspond to a particular name, but since |
paul@861 | 198 | potentially many names may have the same position, a table must provide |
paul@861 | 199 | identifying details for the name that is actually supported. |
paul@810 | 200 | |
paul@861 | 201 | Just as with object tables, in parameter tables, such identifying details take |
paul@861 | 202 | the form of '''parameter codes'''. Each parameter name is mapped to a distinct |
paul@861 | 203 | identifier, and upon consulting a parameter table, the lookup process must |
paul@861 | 204 | read the stored parameter code and compare it to the code for the parameter it |
paul@861 | 205 | is attempting to position in a parameter list. If these codes match, then it |
paul@861 | 206 | can be assumed that the signature supports the named parameter. Otherwise, a |
paul@861 | 207 | different parameter (or even no parameter at all) resides at that location in |
paul@861 | 208 | the parameter table, making any attempt to set such a parameter inappropriate. |
paul@810 | 209 | |
paul@861 | 210 | Since parameter tables provide both identifying information and parameter |
paul@861 | 211 | positions, a more comprehensive parameter table resembles the following: |
paul@810 | 212 | |
paul@810 | 213 | || '''Table''' ||<-4> '''Parameters (Codes)''' || |
paul@810 | 214 | || function f || `a` at 1 (1) || `b` at 2 (2) || `c` at 3 (3) || `d` at 4 (4) || |
paul@810 | 215 | || function g || `q` at 2 (6) ||<style="background-color: #ddd"> `p` at 1 (5) || `r` at 3 (7) || || |
paul@810 | 216 | || function h || `v` at 2 (8) ||<style="background-color: #ddd"> `p` at 2 (5) || || `d` at 1 (4) || |
paul@810 | 217 | |
paul@810 | 218 | The following parameter codes would be employed: |
paul@810 | 219 | |
paul@810 | 220 | || '''Parameter Name''' || '''Parameter Code''' || |
paul@810 | 221 | || `a` || 1 || |
paul@810 | 222 | || `b` || 2 || |
paul@810 | 223 | || `c` || 3 || |
paul@810 | 224 | || `d` || 4 || |
paul@810 | 225 | || `p` || 5 || |
paul@810 | 226 | || `q` || 6 || |
paul@810 | 227 | || `r` || 7 || |
paul@810 | 228 | || `v` || 8 || |
paul@810 | 229 | |
paul@810 | 230 | == Consolidating Constants == |
paul@810 | 231 | |
paul@861 | 232 | With knowledge about all constants used in a program, it becomes possible to |
paul@861 | 233 | prepare a catalogue of distinct constants and to assign each of them a unique |
paul@861 | 234 | name for convenient referencing in the generated program code. All constants |
paul@861 | 235 | are inspected in turn and a content digest prepared for each of them, |
paul@861 | 236 | establishing a mapping from values to such digests which act as global |
paul@861 | 237 | identifiers. Since local names are associated with constants, a mapping is |
paul@861 | 238 | also established from each local name to the corresponding global identifier. |
paul@810 | 239 | |
paul@861 | 240 | Considering the [[../Inspection#Literals_and_Constants|previous example]], the |
paul@861 | 241 | following mappings would be established, from values to global identifiers: |
paul@810 | 242 | |
paul@810 | 243 | || '''Type''' || '''Encoding''' || '''Value''' || '''Global Identifier (Content Digest)''' || |
paul@810 | 244 | || `__builtins__.str.string` || `iso-8859-15` || `'\xc6\xd8\xc5'` || `OeyoRfPI__XqwJcPgTDTItX9v__OM` || |
paul@810 | 245 | || `__builtins__.int.int` || || `123` || `qPRjUheGUngBhhVaHVqYNbBu4m8` || |
paul@810 | 246 | |
paul@810 | 247 | And from local names or identifiers to global identifiers: |
paul@810 | 248 | |
paul@810 | 249 | || '''Identifier''' || '''Global Identifier (Content Digest)''' || |
paul@810 | 250 | || `__main__.$c0` || `OeyoRfPI__XqwJcPgTDTItX9v__OM` || |
paul@810 | 251 | || `__main__.$c1` || `qPRjUheGUngBhhVaHVqYNbBu4m8` || |
paul@810 | 252 | |
paul@810 | 253 | == Optimisation Products == |
paul@810 | 254 | |
paul@861 | 255 | The optimisation process should produce catalogues of attribute and parameter |
paul@861 | 256 | codes plus positioning information for object tables, object structures and |
paul@861 | 257 | parameter tables. It should also produce a catalogue of distinct constant |
paul@861 | 258 | identifiers. |