140:ab28336894bc
18 months ago Paul Boddie Added a suggested text mode approach.
paul@71 1
The Acorn Electron ULA
paul@71 2
======================
paul@71 3
paul@46 4
Principal Design and Feature Constraints
paul@46 5
----------------------------------------
paul@46 6
paul@116 7
The features of the ULA are limited in sophistication by the amount of time
paul@116 8
and resources that can be allocated to each activity supporting the
paul@116 9
fundamental features and obligations of the unit. Maintaining a screen display
paul@116 10
based on the contents of RAM itself requires the ULA to have exclusive access
paul@116 11
to various hardware resources for a significant period of time.
paul@116 12
paul@116 13
Whilst other elements of the ULA can in principle run in parallel with the
paul@116 14
display refresh activity, they cannot also access the RAM at the same time.
paul@116 15
Consequently, other features that might use the RAM must accept a reduced
paul@116 16
allocation of that resource in comparison to a hypothetical architecture where
paul@116 17
concurrent RAM access is possible at all times.
paul@46 18
paul@46 19
Thus, the principal constraint for many features is bandwidth. The duration of
paul@46 20
access to hardware resources is one aspect of this; the rate at which such
paul@46 21
resources can be accessed is another. For example, the RAM is not fast enough
paul@46 22
to support access more frequently than one byte per 2MHz cycle, and for screen
paul@46 23
modes involving 80 bytes of screen data per scanline, there are no free cycles
paul@46 24
for anything other than the production of pixel output during the active
paul@46 25
scanline periods.
paul@46 26
paul@116 27
Another constraint is imposed by the method of RAM access provided by the ULA.
paul@116 28
The ULA is able to access RAM by fetching 4 bits at a time and thus managing
paul@116 29
to transfer 8 bits within a single 2MHz cycle, this being sufficient to
paul@116 30
provide display data for the most demanding screen modes. However, this
paul@116 31
mechanism's timing requirements are beyond the capabilities of the CPU when
paul@116 32
running at 2MHz.
paul@116 33
paul@116 34
Consequently, the CPU will only ever be able to access RAM via the ULA at
paul@116 35
1MHz, even when the ULA is not accessing the RAM. Fortunately, when needing to
paul@116 36
refresh the display, the ULA is still able to make use of the idle part of
paul@116 37
each 1MHz cycle (or, rather, the idle 2MHz cycle unused by the CPU) to itself
paul@116 38
access the RAM at a rate of 1 byte per 1MHz cycle (or 1 byte every other 2MHz
paul@116 39
cycle), thus supporting the less demanding screen modes.
paul@116 40
paul@22 41
Timing
paul@22 42
------
paul@22 43
paul@40 44
According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
paul@40 45
of which are used to generate pixel data. At 50Hz, this means that 128 cycles
paul@40 46
are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
paul@40 47
312 ~= 128 cycles). This is consistent with the observation that each scanline
paul@37 48
requires at most 80 bytes of data, and that the ULA is apparently busy for 40
paul@37 49
out of 64 microseconds in each scanline.
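
The arithmetic above can be checked with a short sketch. The constant and
function names are illustrative assumptions rather than anything taken from
the Acorn documentation.

```python
# Sketch: 2MHz cycles available per scanline on a 50Hz, 312-line field.
CLOCK_HZ = 2_000_000      # 2MHz RAM access clock
FIELD_RATE_HZ = 50        # fields per second
LINES_PER_FIELD = 312     # scanlines per field (Advanced User Guide 15.3.2)

def cycles_per_scanline(clock_hz=CLOCK_HZ, field_hz=FIELD_RATE_HZ,
                        lines=LINES_PER_FIELD):
    """Return the (fractional) number of clock cycles per scanline."""
    return clock_hz / field_hz / lines

exact = cycles_per_scanline()
nominal = 128                        # value used in these notes
line_us = nominal / CLOCK_HZ * 1e6   # duration of a 128-cycle scanline
busy_us = 80 / CLOCK_HZ * 1e6        # 80 bytes at one byte per cycle

print(f"{exact:.2f} cycles/line, {line_us:.0f}us per line, {busy_us:.0f}us busy")
```

The exact quotient is slightly above 128, which is why the figure is given as
approximate.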
paul@22 50
paul@78 51
(In fact, since the ULA is seeking to provide an image for an interlaced
paul@78 52
625-line display, there are two "fields" involved, one providing 312
paul@78 53
scanlines and one providing 313 scanlines. See below for a description of the
paul@78 54
video system.)
paul@78 55
paul@33 56
Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
paul@33 57
each providing two bits of each byte) using two cycles within the 500ns period
paul@36 58
of the 2MHz clock to complete each access operation. Since the CPU and ULA
paul@36 59
have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
paul@36 60
effectively run at 1MHz (since every other 500ns period involves the ULA
paul@115 61
accessing RAM) during transfers of screen data.
paul@33 62
paul@115 63
The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
paul@138 64
by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle is
paul@115 65
approximately 62.5ns. To access the memory, the following patterns
paul@115 66
corresponding to 16MHz cycles are required:
paul@37 67
     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:        ...F  ...S      ...F  ...S ...
           ~WE:        W               W          ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a.b.    c.      a.b.    c.    ...
      Data ops:  s         f     s         f      ...

       PHI OUT:  ----\_______/------------------- ...
     CPU (RAM):      .....L  ....D                ...
           RnW:      .....R                       ...

       PHI OUT:  ----\_______/-------\_______/--- ...
     CPU (ROM):  D   .....L  ....D   .....L  .... ...
           RnW:      .....R          .....R       ...
paul@97 91
paul@101 92
~RAS must be high for 100ns, ~CAS must be high for 50ns.
paul@101 93
~RAS must be low for 150ns, ~CAS must be low for 90ns.
paul@101 94
Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.
paul@101 95
paul@64 96
Here, "A" and "B" respectively indicate the row and first column addresses
paul@64 97
being latched into the RAM (on a negative edge for ~RAS and ~CAS
paul@64 98
respectively), and "C" indicates the second column address being latched into
paul@64 99
the RAM. Presumably, the first and second half-bytes can be read at "F" and
paul@64 100
"S" respectively, and the row and column addresses must be made available at
paul@138 101
"a" and "b" (and "c") respectively at the latest. The TM4164EC4 datasheet
paul@138 102
suggests that the addresses can be made available as the ~RAS and ~CAS levels
paul@138 103
are brought low. Data can be read at "f" and "s" for the first and second
paul@138 104
half-bytes respectively.
paul@64 105
paul@64 106
The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
paul@99 107
address access time of 90ns (maximum), which appears to mean that ~RAS must be
paul@99 108
held low for at least 150ns and that ~CAS must be held low for at least 90ns
paul@99 109
before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
paul@99 110
cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
paul@99 111
is 1.5 cycles.
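
As a rough check of these figures, the following sketch converts the datasheet
maxima into 16MHz cycles and compares them with the intervals quoted above.
The dictionary names and the "slack" calculation are assumptions added for
illustration.

```python
# Sketch: convert TM4164EC4-15 access times into 16MHz clock cycles.
CYCLE_NS = 1e9 / 16_000_000   # 62.5ns per 16MHz cycle

def ns_to_cycles(ns):
    """Express a duration in nanoseconds as a number of 16MHz cycles."""
    return ns / CYCLE_NS

# Intervals quoted in the text, in 16MHz cycles.
diagram = {"A to F": 2.5, "B to F": 1.5, "C to S": 1.5}
# Datasheet maxima that each interval has to cover, in nanoseconds.
required_ns = {"A to F": 150, "B to F": 90, "C to S": 90}

for name, cycles in diagram.items():
    needed = ns_to_cycles(required_ns[name])
    slack_ns = (cycles - needed) * CYCLE_NS
    print(f"{name}: needs {needed:.2f} cycles, has {cycles}, "
          f"slack {slack_ns:.2f}ns")
```

Under these assumptions each interval exceeds the datasheet maximum by only a
few nanoseconds.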
paul@37 112
paul@38 113
Note that the Service Manual refers to the negative edge of RAS and CAS, but
paul@38 114
the datasheet for the similar TM4164EC4 product shows latching on the negative
paul@38 115
edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
paul@38 116
communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
paul@38 117
"page mode" provides the appropriate behaviour for that particular product.
paul@38 118
paul@76 119
The CPU, when accessing the RAM alone, apparently does not make use of the
paul@76 120
vacated "slot" that the ULA would otherwise use (when interleaving accesses in
paul@76 121
MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
paul@103 122
accessing ROM (and potentially sideways RAM). The principal limitation is the
paul@103 123
amount of time needed between issuing an address and receiving an entire byte
paul@103 124
from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
paul@103 125
4 cycles that would be required for 2MHz operation.
paul@76 126
paul@139 127
Write operations expose some uncertainty about the relationship between the
paul@139 128
ULA's RAM access schedule and the PHI OUT clock. The Service Manual shows PHI
paul@139 129
IN (which should be the ULA's PHI OUT signal) as being synchronised with ~RAS.
paul@139 130
Since the CPU makes its address available potentially as late as 140ns after
paul@139 131
its PHI2 clock goes low (this clock being broadly similar to PHI OUT), it
paul@139 132
would make no sense to expect the ULA to be able to perform a memory access
paul@139 133
immediately. What seems more likely is that the CPU makes data available, and
paul@139 134
this is written during the next 2MHz cycle.
paul@139 135
paul@139 136
For the CPU, "L" indicates the point at which an address is taken from the CPU
paul@139 137
address bus, following a negative edge of PHI OUT, with "D" being the point at
paul@139 138
which data may be asserted for writing, following a positive edge of PHI OUT.
paul@139 139
Here, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven low for
paul@139 140
writing or high for reading, and thus propagates RnW from the CPU, this would
paul@139 141
need to be done before data would be retrieved and, according to the TM4164EC4
paul@139 142
datasheet, even as late as the column address is presented and ~CAS brought
paul@139 143
low.
paul@139 144
paul@139 145
It must be concluded that where accesses are interleaved between the CPU and
paul@139 146
ULA, the CPU access begins concurrently with the ULA access, with the CPU
paul@139 147
address and data retained by the ULA, and after the ULA access, the rest of
paul@139 148
the CPU transaction occurs in the following 2MHz cycle.
paul@139 149
paul@57 150
See: Acorn Electron Advanced User Guide
paul@57 151
See: Acorn Electron Service Manual
paul@115 152
     http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
paul@57 153
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
paul@76 154
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
paul@121 155
See: One of the Most Popular 65,536-Bit (64K) Dynamic RAMs The TMS 4164
paul@121 156
     http://smithsonianchips.si.edu/augarten/p64.htm
paul@139 157
See: https://www.mups.co.uk/project/hardware/acorn_electron/
paul@139 158
See: Rockwell R650X and R651X Microprocessors (CPU)
paul@139 159
See: http://wilsonminesco.com/6502primer/
paul@76 160
paul@119 161
A Note on 8-Bit Wide RAM Access
paul@119 162
-------------------------------
paul@119 163
paul@119 164
It is worth considering the timing when 8 bits of data can be obtained at once
paul@119 165
from the RAM chips:
paul@119 166
     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
   8 MHz cycle:  0   1   2   3   0   1   2   3    ...
                 /-\_/-\_/-\_/-\_/-\_/-\_/-\_/-\_ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-------\_______/-------\_______ ...
Address events:      A   B           A   B        ...
   Data events:          ...E            ...E     ...
           ~WE:          W               W        ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1       0       1       0        ...

   Address ops:     a.  b.          a.  b.        ...
      Data ops:            f     s         f      ...

       PHI OUT:  ----\_______/-------\_______/--- ...
           CPU:  D   .....L  ....D   .....L  .... ...
           RnW:      .....R          .....R        ...
paul@119 186
paul@120 187
Here, "E" indicates the availability of an entire byte.
paul@120 188
paul@119 189
Since only one fetch is required per 2MHz cycle, instead of two fetches for
paul@119 190
the 4-bit wide RAM arrangement, it seems likely that longer 8MHz cycles could
paul@119 191
be used to coordinate the necessary signalling.
paul@119 192
paul@120 193
Another conceivable simplification from using an 8-bit wide RAM access channel
paul@120 194
with a single access within each 2MHz cycle is the possibility of allowing the
paul@120 195
CPU to signal directly to the RAM instead of having the ULA perform the access
paul@124 196
signalling on the CPU's behalf. Note that it is this more leisurely signalling
paul@124 197
that would allow the CPU to conduct accesses at 2MHz: the "compressed"
paul@124 198
signalling being beyond the capabilities of the CPU.
paul@120 199
paul@122 200
Note that 16MHz cycles would still be needed for the pixel clock in MODE 0,
paul@122 201
which needs to output eight pixels per 2MHz cycle, producing 640 monochrome
paul@122 202
pixels per 80-byte line.
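
The relationship between byte rate, pixel depth and pixel clock can be
sketched as follows. The function names are hypothetical, and the
bits-per-pixel parameter is an assumption introduced for generality; only the
MODE 0 case is taken from the text.

```python
# Sketch: pixel clock needed to shift out display bytes fetched at 2MHz.
BYTE_RATE_HZ = 2_000_000       # one display byte per 2MHz cycle
ACTIVE_BYTES_PER_LINE = 80     # bytes fetched per scanline in MODE 0

def pixel_clock_hz(bits_per_pixel):
    """Clock rate needed to emit every pixel of each fetched byte."""
    return BYTE_RATE_HZ * (8 // bits_per_pixel)

def pixels_per_line(bits_per_pixel):
    """Pixels produced from one scanline's worth of display bytes."""
    return ACTIVE_BYTES_PER_LINE * (8 // bits_per_pixel)

# MODE 0 is monochrome, so each pixel occupies one bit.
print(pixel_clock_hz(1), pixels_per_line(1))
```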
paul@122 203
paul@124 204
An obvious consideration with regard to 8-bit wide access is whether the ULA
paul@124 205
could still conduct the "compressed" signalling for its own RAM accesses:
paul@124 206
     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:        ...1  ...2      ...1  ...2 ...
           ~WE:        W               W          ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a.b.    c       a.b.    c     ...
      Data ops:  s         f     s         f      ...

       PHI OUT:  ----\_______/-------\_______/--- ...
           CPU:  D   .....L  ....D   .....L  .... ...
           RnW:      .....R          .....R        ...
paul@124 226
paul@124 227
Here, "1" and "2" in the data events correspond to whole byte accesses,
paul@124 228
effectively upgrading the half-byte "F" and "S" events in the existing ULA
paul@124 229
arrangement.
paul@124 230
paul@124 231
Although the provision of access for the CPU would adhere to the relevant
paul@124 232
timing constraints, providing only one byte per 2MHz cycle, the ULA could
paul@124 233
obtain two bytes per cycle. This would then free up bandwidth for the CPU in
paul@124 234
screen modes where the ULA would normally be dominant (MODE 0 to 3), albeit at
paul@124 235
the cost of extra buffering. Such buffering could also be done for modes where
paul@124 236
the bandwidth is shared (MODE 4 to 6), consolidating pairs of ULA accesses into
paul@124 237
single cycles and freeing up an extra cycle for CPU accesses.
paul@124 238
paul@131 239
A further consideration is whether the CPU and ULA could access the memory on
paul@131 240
interleaved 4MHz cycles, thus replicating the arrangement used by the CPU and
paul@131 241
Video ULA on the BBC Micro. One potential obstacle is that the apparent 4MHz
paul@131 242
access rate employed by the ULA does not involve the complete process for
paul@131 243
accessing the RAM: upon setting up the address and issuing the ~RAS signal,
paul@131 244
the ULA is able to make a pair of column accesses on the same "row" of memory,
paul@134 245
effectively achieving an average access rate of 4MHz in an 8-bit
paul@134 246
configuration.
paul@131 247
paul@131 248
However, if arbitrary pairs of column accesses were to be attempted, as would
paul@131 249
be required by CPU and ULA interleaving, the ~RAS signal would need to be
paul@131 250
re-issued with different addresses being set up. This would expand the time to
paul@131 251
access a memory location to beyond the period of a 4MHz cycle, making it
paul@131 252
impossible to employ interleaved accesses at such a rate.
paul@131 253
paul@134 254
In conclusion, a strict interleaving strategy is not possible, but by using
paul@134 255
pixel data buffering and employing two ULA accesses per 2MHz cycle to obtain
paul@134 256
two bytes in that cycle, each adjacent 2MHz cycle can be given to the CPU,
paul@134 257
thus achieving an effective throughput during display update periods of 3
paul@134 258
bytes for every pair of cycles (2 bytes for the ULA, 1 byte for the CPU), and
paul@134 259
thus 1.5 bytes per cycle, giving an illusion of 3MHz access to RAM.
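
The throughput claim can be expressed as a small calculation. This is only a
sketch of the arithmetic; the function name and its parameters are
assumptions.

```python
# Sketch: effective RAM throughput when the ULA fetches two bytes in one
# 2MHz cycle and the CPU is given the adjacent cycle for a single byte.
from fractions import Fraction

CYCLE_HZ = 2_000_000

def effective_rate_hz(ula_bytes, cpu_bytes, cycles):
    """Bytes transferred per second over a repeating group of 2MHz cycles."""
    return Fraction(ula_bytes + cpu_bytes, cycles) * CYCLE_HZ

bytes_per_cycle = Fraction(2 + 1, 2)      # 3 bytes for every pair of cycles
rate = effective_rate_hz(ula_bytes=2, cpu_bytes=1, cycles=2)
print(float(bytes_per_cycle), int(rate))
```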
paul@134 260
paul@135 261
Some other considerations apply to introducing 8-bit wide access. The ULA
paul@135 262
employs four pins for data transfer to and from the memory devices (RAM0..3),
paul@135 263
and obviously another four pins would be needed in an 8-bit wide scheme.
paul@135 264
However, there may have been a physical limitation on the number of pins
paul@135 265
permissible on a ULA package or the device's socket. This would necessitate
paul@135 266
the reassignment of pins, although few are readily available for such
paul@135 267
reassignment.
paul@135 268
paul@135 269
One approach might involve connecting the RAM devices to the CPU data bus,
paul@135 270
with each line connecting to a different RAM chip. The signalling of the RAM
paul@135 271
would remain under the control of the ULA, thus preventing the RAM devices
paul@135 272
from interfering with other memory transfer operations, with the ROM
paul@135 273
signalling also remaining under the ULA's control. One potential disadvantage
paul@135 274
of this scheme would involve the elimination of the separate data paths
paul@135 275
between the CPU and ROM and between the ULA and RAM.
paul@135 276
paul@135 277
Another approach might involve reclaiming the keyboard input pins (KBD0..3) as
paul@135 278
data pins for ULA access to RAM. This would necessitate the reorganisation of
paul@135 279
the keyboard interface, perhaps integrating the keyboard matrix more directly
paul@135 280
as a kind of ROM device. A bus transceiver could be used to isolate the
paul@135 281
keyboard inputs, with a pin being used to control the transceiver, since the
paul@135 282
keyboard data lines are pulled high. In effect, the transceiver would act as a
paul@135 283
kind of output enable for the keyboard.
paul@135 284
paul@135 285
To make the matrix appear within the sideways ROM region of the memory map,
paul@135 286
A15 would need to be set to a high value and A14 to a low value. Signals A13
paul@135 287
to A0 would then be brought low to select the appropriate column, with the
paul@135 288
individual key states being made available via data lines, perhaps D3 to D0.
paul@135 289
This mostly retains the existing addressing arrangement and scanning
paul@135 290
mechanism. Internally, the ULA would continue to enable access to the keyboard
paul@135 291
through the ROM paging mechanism, but instead of integrating separate data
paul@135 292
pins into the CPU's data path, it would integrate the keyboard inputs using
paul@135 293
the transceiver.
paul@135 294
paul@135 295
Enhancement: Keyboard Matrix Scanning
paul@135 296
-------------------------------------
paul@135 297
paul@135 298
The keyboard scanning mechanism is presumably designed to be as inexpensive as
paul@135 299
possible, being driven by software and avoiding extra logic, but at the
paul@135 300
expense of occupying large regions of the memory map when paged in. A more
paul@135 301
efficient mapping of the keyboard columns could possibly be done using
paul@135 302
decoders such as the 74xx138 part which permits the decoding of three inputs
paul@135 303
to select one of eight outputs. Using two of these parts, six address lines
paul@135 304
would be dedicated to the keyboard columns as follows:
paul@135 305
  A5...A3 select up to eight columns via one decoder
  A2...A0 select up to eight columns via another decoder
paul@135 308
paul@135 309
In this arrangement, only one of the two ranges of pins would be used at any
paul@135 310
given time. If the ULA were to require a certain combination of the remaining
paul@135 311
address bits, a region as small as 64 bytes could be dedicated to the
paul@135 312
keyboard.
paul@135 313
paul@135 314
A more efficient arrangement could be used by introducing logic that allows
paul@135 315
the decoders to work together to address the keyboard:
paul@135 316
  A2...A0 select up to eight columns via both decoders
  A3 would enable one decoder if low and the other decoder if high
paul@135 319
paul@135 320
With ULA constraints on the remaining address bits, a 16-byte region could be
paul@135 321
used to represent the keyboard.
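
A behavioural sketch of the combined decoder arrangement is given below. It
models only the selection logic described above, not the electrical behaviour
of a real 74xx138, and the function names are hypothetical.

```python
# Sketch: two 3-to-8 decoders sharing A2...A0, with A3 choosing between them.
def decode_138(inputs, enabled):
    """Model a 3-to-8 decoder: eight active-low outputs, at most one low."""
    return [0 if enabled and i == inputs else 1 for i in range(8)]

def select_column(address):
    """Return sixteen active-low column select lines for a 16-byte region."""
    low3 = address & 0b111          # A2...A0 feed both decoders
    a3 = (address >> 3) & 1         # A3 enables exactly one decoder
    first = decode_138(low3, enabled=(a3 == 0))
    second = decode_138(low3, enabled=(a3 == 1))
    return first + second

lines = select_column(0x0A)
print(lines.index(0), lines.count(0))
```

Sixteen select lines are produced, which is enough for the fourteen columns
implied by the existing A13 to A0 addressing.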
paul@135 322
paul@135 323
A further refinement might involve combining the existing columns into groups
paul@135 324
of eight keys. This would reduce the number of columns to seven, requiring
paul@135 325
only three address lines, with all eight data lines being used to read the
paul@135 326
matrix.
paul@135 327
paul@135 328
On the BBC Micro, the system 6522 VIA is used to monitor and read from the
paul@135 329
keyboard. The memory locations involved with this chip are located in the
paul@135 330
region from &FE40 to &FE7F inclusive, although the memory is allocated in a
paul@135 331
way that is appropriate to operate that chip, as opposed to merely exposing
paul@135 332
the keyboard matrix.
paul@135 333
paul@135 334
Enhancement: Hardware Device Selection
paul@135 335
--------------------------------------
paul@135 336
paul@135 337
An alternative to the existing, rather cumbersome, sideways ROM mapping of the
paul@135 338
keyboard might involve making it accessible via a hardware-related memory page
paul@135 339
like page FE. With ULA addresses confined to FE0x, and with the ULA itself
paul@135 340
having to trap accesses to page FE, the page selection signal might be brought
paul@135 341
out of the ULA instead of any dedicated signal for the keyboard. Various
paul@135 342
address lines corresponding to A7 through A4, or a subset of these, could be
paul@135 343
fed into a decoder to permit the selection of other devices, with the keyboard
paul@135 344
being one of these.
paul@135 345
paul@135 346
Meanwhile, a more efficient keyboard mapping using the above matrix
paul@135 347
enhancement would permit the different keyboard columns to appear as a group
paul@135 348
of sixteen or eight bytes. Thus:
paul@135 349
  A15...A8 select page FE
   A7...A4 select a device or peripheral
   A3...A0 select a register or keyboard column
paul@135 353
paul@135 354
Conceivably, devices such as sound generators could be mapped to device
paul@135 355
regions.
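
The proposed address layout might be decoded as in the following sketch. The
function name is hypothetical, and the use of device number 1 in the example
is purely an assumption: the text only states that ULA addresses are confined
to FE0x.

```python
# Sketch: split an address in page FE into device and register fields.
def decode_page_fe(address):
    """Return (device, register) for page FE addresses, otherwise None."""
    if (address >> 8) & 0xFF != 0xFE:   # A15...A8 must select page FE
        return None
    device = (address >> 4) & 0xF       # A7...A4 select a device
    register = address & 0xF            # A3...A0 select a register or column
    return device, register

print(decode_page_fe(0xFE00))   # ULA region (FE0x) is device 0
print(decode_page_fe(0xFE1A))   # hypothetical device 1, register/column 10
print(decode_page_fe(0xFD00))   # outside page FE
```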
paul@135 356
paul@110 357
CPU Clock Notes
paul@110 358
---------------
paul@110 359
paul@111 360
"The 6502 receives an external square-wave clock input signal on pin 37, which
paul@111 361
is usually labeled PHI0. [...] This clock input is processed within the 6502
paul@111 362
to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
paul@111 363
is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
paul@111 364
through two inverters and a push-pull amplifier. The same network of
paul@111 365
transistors within the 6502 which generates PHI2 is also tied to PHI1, and
paul@111 366
generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
paul@111 367
available to external devices is so that they know when they can access the
paul@111 368
CPU. When PHI1 is high, this means that external devices can read from the
paul@111 369
address bus or data bus; when PHI2 is high, this means that external devices
paul@111 370
can write to the data bus."
paul@111 371
paul@111 372
See: http://lateblt.livejournal.com/88105.html
paul@111 373
paul@110 374
"The 6502 has a synchronous memory bus where the master clock is divided into
paul@110 375
two phases (Phase 1 and Phase 2). The address is always generated during Phase
paul@110 376
1 and all memory accesses take place during Phase 2."
paul@110 377
paul@111 378
See: http://www.jmargolin.com/vgens/vgens.htm
paul@110 379
paul@111 380
Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
Phase 1" means when PHI1 (the inverse of PHI0) is high and "during Phase 2"
means when PHI2 (essentially a copy of PHI0) is high.
paul@110 383
paul@76 384
Bandwidth Figures
paul@76 385
-----------------
paul@76 386
paul@76 387
Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
paul@76 388
total lines, with 80 cycles occurring in the active periods of display
paul@76 389
scanlines, the following bandwidth calculations can be performed:
paul@76 390
Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:    80 cycles * 256 lines
     = 20480 bytes
CPU:    48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:    80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:    48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:    40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:    40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes
paul@76 422
paul@76 423
Here, the division by 2 for CPU accesses is performed to indicate that the CPU
paul@76 424
only uses every other access opportunity even in uncontended periods. See the
paul@76 425
2MHz RAM Access enhancement below for bandwidth calculations that consider
paul@76 426
this limitation removed.
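
The per-mode figures above follow one pattern, which the sketch below makes
explicit. The function name and the way the modes are parameterised are
assumptions; the line counts are those used in the calculations above.

```python
# Sketch: bytes per field available to the ULA and CPU in each mode.
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
ACTIVE_CYCLES = 80      # cycles in the active part of a display scanline
TOTAL_LINES = 312

def bandwidth(ula_cycles, display_lines):
    """Return (ula_bytes, cpu_bytes) for one field."""
    ula = ula_cycles * display_lines
    border = CYCLES_PER_LINE - ACTIVE_CYCLES        # 48 uncontended cycles
    # Active cycles not taken by the ULA are already alternate cycles;
    # elsewhere the CPU only uses every other access opportunity.
    per_display_line = (ACTIVE_CYCLES - ula_cycles) + border // 2
    per_blank_line = CYCLES_PER_LINE // 2
    cpu = (per_display_line * display_lines
           + per_blank_line * (TOTAL_LINES - display_lines))
    return ula, cpu

modes = {
    "MODE 0, 1, 2": (80, 256),
    "MODE 3": (80, 24 * 8),
    "MODE 4, 5": (40, 256),
    "MODE 6": (40, 24 * 8),
}
for name, (ula_cycles, lines) in modes.items():
    print(name, bandwidth(ula_cycles, lines))
```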
paul@57 427
paul@123 428
A summary of the bandwidth figures is as follows (with extra timing details
paul@123 429
described below):
paul@123 430
                Standard ULA    % Total   Slowdown  BBC-10s BBC-34s
MODE 0, 1, 2    9728 bytes      24%       4.11      43s     105s
MODE 3          12288 bytes     31%       3.25      34s
MODE 4, 5       19968 bytes     50%       2         20s
MODE 6          19968 bytes     50%       2         20s     50s
paul@123 436
paul@123 437
The review of the Electron in Practical Computing (October 1983) provides a
paul@123 438
concise overview of the RAM access limitations and gives timing comparisons
paul@123 439
between modes and BBC Micro performance. In the above, "BBC-10s" is the
paul@123 440
measured or stated time given for a program taking 10 seconds on the BBC
paul@123 441
Micro, whereas "BBC-34s" is the apparently measured time given for the
paul@123 442
"Persian" program taking 34 seconds to complete on the BBC Micro, with a
paul@123 443
"quick" mode presumably switching to MODE 6 using the ULA directly in order to
paul@123 444
reduce display bandwidth usage while the program draws to the screen.
paul@123 445
Evidently, the measured slowdown is slightly lower than the theoretical
paul@123 446
slowdown, most likely due to the running time not being entirely dominated by
paul@123 447
RAM access performance characteristics.
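
The "% Total" and "Slowdown" columns, and the comparison with the "Persian"
timings, can be reproduced with the following sketch. The variable names are
assumptions, and the measured ratios simply divide the quoted Electron times
by the 34 seconds quoted for the BBC Micro.

```python
# Sketch: CPU share of the theoretical maximum, and the implied slowdown.
TOTAL_BYTES = 128 * 312   # theoretical maximum per field

cpu_bytes = {
    "MODE 0, 1, 2": 9728,
    "MODE 3": 12288,
    "MODE 4, 5": 19968,
    "MODE 6": 19968,
}

def share_and_slowdown(cpu):
    """Return (fraction of total bandwidth, theoretical slowdown factor)."""
    return cpu / TOTAL_BYTES, TOTAL_BYTES / cpu

for mode, cpu in cpu_bytes.items():
    share, slowdown = share_and_slowdown(cpu)
    print(f"{mode:<14}{share:>5.0%}{slowdown:>7.2f}")

# Ratios of the quoted "Persian" times to the BBC Micro's 34 seconds.
measured = {"MODE 0, 1, 2": 105 / 34, "MODE 6": 50 / 34}
for mode, ratio in measured.items():
    print(f"{mode:<14}measured {ratio:.2f}")
```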
paul@123 448
paul@40 449
Video Timing
paul@40 450
------------
paul@40 451
paul@40 452
According to 8.7 in the Service Manual, and the PAL Wikipedia page,
paul@40 453
approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
paul@40 454
(including the "colour burst"), and 1.65µs for the "front porch", totalling
paul@40 455
12.05µs and thus leaving 51.95µs for the active video signal for each
paul@40 456
scanline. As the Service Manual suggests in the oscilloscope traces, the
paul@40 457
display information is transmitted more or less centred within the active
paul@40 458
video period since the ULA will only be providing pixel data for 40µs in each
paul@40 459
scanline.
paul@39 460
paul@39 461
Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
paul@39 462
each scanline can be divided into 1024 cycles, although only 640 at most are
paul@40 463
actively used to provide pixel data. Pixel data production should only occur
paul@40 464
within a certain period on each scanline, approximately 262 cycles after the
paul@40 465
start of hsync:
paul@40 466
  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262
paul@40 474
paul@40 475
By choosing a number divisible by 8, the RAM access mechanism can be
paul@84 476
synchronised with the pixel production. Thus, 256 is a more appropriate start
paul@84 477
cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
paul@84 478
pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
paul@84 479
document) occurs at cycle 0.
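
The horizontal timing calculation can be sketched as follows, working in
nanoseconds to keep the arithmetic exact. Rounding down to a multiple of 8 is
an assumption about how the value of 256 is arrived at; the names are
illustrative.

```python
# Sketch: cycle at which pixel data should start on each scanline.
CYCLE_NS = 62.5           # one 16MHz cycle
LINE_NS = 64_000          # one scanline
SYNC_NS = 4_700           # sync pulse
BACK_PORCH_NS = 5_700     # back porch, including colour burst
FRONT_PORCH_NS = 1_650    # front porch
PIXEL_NS = 40_000         # period in which the ULA provides pixel data

active_ns = LINE_NS - (SYNC_NS + BACK_PORCH_NS + FRONT_PORCH_NS)
margin_ns = (active_ns - PIXEL_NS) / 2          # silent period on each side
start_ns = SYNC_NS + BACK_PORCH_NS + margin_ns
start_cycle = start_ns / CYCLE_NS

def align_down(cycle, multiple=8):
    """Round a cycle number down to a multiple of the RAM access period."""
    return int(cycle) // multiple * multiple

aligned = align_down(start_cycle)
print(LINE_NS / CYCLE_NS, start_cycle, aligned)
```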
paul@84 480
paul@84 481
To summarise:
paul@84 482
  HS signal starts at cycle 0 on each horizontal scanline
  HS signal ends approximately 4µs later at cycle 64
  Pixel data starts approximately 12µs later at cycle 256
paul@84 486
paul@84 487
"Re: Electron Memory Contention" provides measurements that appear consistent
paul@84 488
with these calculations.
paul@40 489
paul@40 490
The "vertical blanking period", meaning the period before picture information
paul@78 491
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
paul@78 492
this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
paul@78 493
lines. Thus, the first visible scanline on the first field of a frame occurs
paul@84 494
half way through the 23rd scanline period measured from the start of vsync
paul@84 495
(indicated by "V" in the diagrams below):
paul@40 496
                                        10                  20    23
  Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    Line from 1:       0                                          22 3
 Line on screen: .:::::VVVVV:::::                                   12233445566
                  |_________________________________________________|
                           25 line vertical blanking period
paul@40 503
paul@40 504
In the second field of a frame, the first visible scanline coincides with the
paul@40 505
24th scanline period measured from the start of line 313 in the frame:
paul@40 506
               310                                                 336
  Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
  Line from 313:       0                                            23 4
 Line on screen: 88:::::VVVVV::::                                    11223344
               288 |                                                 |
                   |_________________________________________________|
                            25 line vertical blanking period
paul@40 514
paul@40 515
In order to consider only full lines, we might consider the start of each
paul@40 516
frame to occur 23 lines after the start of vsync.
paul@40 517
paul@40 518
Again, it is likely that pixel data production should only occur on scanlines
paul@40 519
within a certain period on each frame. The "625/50" document indicates that
paul@40 520
only a certain region is "safe" to use, suggesting a vertically centred region
paul@84 521
with approximately 15 blank lines above and below the picture. However, the
paul@84 522
"PAL TV timing and voltages" document suggests 28 blank lines above and below
paul@84 523
the picture. This would centre the 256 lines within the 312 lines of each
paul@84 524
field and thus provide a start of picture approximately 5.5 or 5 lines after
paul@84 525
the end of the blanking period or 28 or 27.5 lines after the start of vsync.
paul@84 526
paul@84 527
To summarise:
paul@84 528
  CSYNC signal starts at cycle 0
  CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
  Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672
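
The cycle figures in this summary can be checked with a little arithmetic. The
following is only a sketch, assuming a 16MHz reference clock (16 cycles per
microsecond) and 64µs scanlines; the constant and function names are
illustrative and do not come from any ULA documentation.

```python
# Sketch only: assumes 16 cycles per microsecond and 64µs per scanline.
CYCLES_PER_US = 16
LINE_US = 64


def cycles(us):
    """Convert a duration in microseconds to 16MHz cycles."""
    return us * CYCLES_PER_US


def lines(us):
    """Convert a duration in microseconds to scanline periods."""
    return us / LINE_US


# CSYNC starts at cycle 0 and lasts 160µs.
csync_end = cycles(160)

# The picture starts 1632µs after the end of CSYNC.
picture_start = csync_end + cycles(1632)

print(csync_end, picture_start, lines(160), lines(1632))
```

Under these assumptions, 1632µs corresponds to 25.5 line periods, giving a
start of picture 28 lines after the start of vsync.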
paul@40 532
paul@57 533
See: http://en.wikipedia.org/wiki/PAL
paul@57 534
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
paul@57 535
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
paul@57 536
     http://lipas.uwasa.fi/~f76998/video/modes/
paul@57 537
See: PAL TV timing and voltages
paul@57 538
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
paul@57 539
See: Line Standards
paul@57 540
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
paul@84 541
See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
paul@84 542
     http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
paul@84 543
See: Re: Electron Memory Contention
paul@84 544
     http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109
paul@57 545
paul@56 546
RAM Integrated Circuits
paul@56 547
-----------------------
paul@56 548
paul@65 549
Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
paul@65 550
CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
paul@65 551
available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
paul@73 552
have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
paul@73 553
ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.
paul@65 554
paul@56 555
The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
paul@64 556
the Samsung-produced KM41464 series is apparently equivalent to the Texas
paul@56 557
Instruments 4164 chips presumably used in the Electron.
paul@56 558
paul@56 559
The TM4164EC4 series combines 4 64K x 1b units into a single package and
paul@57 560
appears similar to the TM4164EA4 featured on the Electron's circuit diagram
paul@57 561
(in the Advanced User Guide but not the Service Manual), and it also has 22
paul@56 562
pins providing 3 additional inputs and 3 additional outputs over the 16 pins
paul@57 563
of the individual 4164-15 modules, presumably allowing concurrent access to
paul@57 564
the packaged memory units.
paul@56 565
paul@56 566
As far as currently available replacements are concerned, the NTE4164 is a
paul@57 567
potential candidate: according to the Vetco Electronics entry, it is
paul@57 568
supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
paul@57 569
parts include the NTE2164 and the NTE6664, both of which appear to have
paul@57 570
largely the same performance and connection characteristics. Meanwhile, the
paul@58 571
NTE21256 appears to be a 16-pin replacement with four times the capacity that
paul@58 572
maintains the single data input and output pins. Using the NTE21256 as a
paul@57 573
replacement for all ICs combined would be difficult because of the single bit
paul@57 574
output.
paul@56 575
paul@57 576
Another device equivalent to the 4164-15 appears to be available under the
paul@57 577
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
paul@57 578
site lists data sheets for other devices on the same page, but these are
paul@57 579
different and actually appear to be provided under the 41574 product code (but
paul@57 580
are listed under 41464-10) and appear to be replacements for the TM4164EC4:
paul@57 581
the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
paul@57 582
employing 4 pins for both input and output.
paul@57 583
paul@64 584
            Pins    I/O pins    Row access  Column access
paul@64 585
            ----    --------    ----------  -------------
paul@64 586
TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
paul@64 587
KM41464AP   18      4           150ns (15)  75ns (15)
paul@64 588
NTE21256    16      1 + 1       150ns       75ns
paul@64 589
HYB 4164-2  16      1 + 1       150ns       100ns
paul@64 590
µPD41464    18      4           120ns (12)  60ns (12)
paul@64 591
paul@40 592
See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
paul@136 593
     https://www.rocelec.com/part/REITM4164EC4-15L
paul@65 594
See: Dynamic RAMS
paul@65 595
     http://www.unicornelectronics.com/IC/DYNAMIC.html
paul@73 596
See: New old stock 8x 4164 chips
paul@73 597
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
paul@56 598
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
paul@56 599
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
paul@57 600
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
paul@57 601
     http://www.vetco.net/catalog/product_info.php?products_id=2806
paul@56 602
See: NTE4164 - IC-NMOS 64K DRAM 150NS
paul@56 603
     http://www.vetco.net/catalog/product_info.php?products_id=3680
paul@56 604
See: NTE21256 - IC-256K DRAM 150NS
paul@56 605
     http://www.vetco.net/catalog/product_info.php?products_id=2799
paul@56 606
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
paul@56 607
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
paul@57 608
See: NTE6664 - IC-MOS 64K DRAM 150NS
paul@57 609
     http://www.vetco.net/catalog/product_info.php?products_id=5213
paul@57 610
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
paul@57 611
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
paul@57 612
See: 4164-150: MAJOR BRANDS
paul@57 613
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
paul@57 614
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
paul@57 615
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
paul@57 616
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
paul@57 617
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
paul@57 618
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
paul@57 619
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
paul@57 620
See: 41464-10: MAJOR BRANDS
paul@57 621
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1
paul@39 622
paul@43 623
Interrupts
paul@43 624
----------
paul@43 625
paul@43 626
The ULA generates IRQs (maskable interrupts) according to certain conditions
paul@43 627
and these conditions are controlled by location &FE00:
paul@43 628
paul@43 629
  * Vertical sync (bottom of displayed screen)
paul@43 630
  * 50Hz real time clock
paul@43 631
  * Transmit data empty
paul@43 632
  * Receive data full
paul@43 633
  * High tone detect
paul@43 634
The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory; setting
this bit restores the normal function of the ULA.
paul@43 639
paul@43 640
ROM Paging
paul@43 641
----------
paul@43 642
paul@43 643
Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
paul@43 644
mappings exist:
paul@43 645
paul@43 646
   8    keyboard
paul@43 647
   9    keyboard (duplicate)
paul@43 648
  10    BASIC ROM
paul@43 649
  11    BASIC ROM (duplicate)
paul@43 650
paul@43 651
Paging in a ROM involves the following procedure:
paul@43 652
paul@43 653
 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
paul@43 654
    2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
paul@43 655
    selected.
paul@43 656
 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
paul@43 657
    whilst writing the desired ROM number n in bits 0 to 2.
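
The procedure above can be sketched as a function producing the sequence of
values to write to bits 0 to 3 of &FE05. This is only an illustration: the
function name is hypothetical, the choice of ROM 12 as the intermediate
selection is an assumption (any of ROMs 12 to 15 should serve), and ROMs 8 to
11 are assumed to need only a single write with bit 3 set.

```python
# Sketch only: names and the default intermediate ROM are assumptions.
ROM_PAGE_ENABLE = 0x08  # bit 3 of the value written to &FE05


def rom_paging_writes(rom, intermediate=12):
    """Return the values to write, in order, to bits 0 to 3 of &FE05."""
    if not 0 <= rom <= 15:
        raise ValueError("ROM number must be in the range 0 to 15")
    if not 12 <= intermediate <= 15:
        raise ValueError("intermediate ROM must be in the range 12 to 15")

    if rom >= 8:
        # Step 1 alone: page enable with n = rom - 8 in bits 0 to 2.
        return [ROM_PAGE_ENABLE | (rom - 8)]

    # Step 1 selects one of ROMs 12 to 15; step 2 clears bit 3 and
    # supplies the desired ROM number in bits 0 to 2.
    return [ROM_PAGE_ENABLE | (intermediate - 8), rom]


print(rom_paging_writes(13))
print(rom_paging_writes(5))
```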
paul@43 658
paul@81 659
See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686
paul@81 660
paul@117 661
Keyboard Access
paul@117 662
---------------
paul@117 663
paul@117 664
The keyboard pages appear to be accessed at 1MHz just like the RAM.
paul@117 665
paul@117 666
See: https://stardot.org.uk/forums/viewtopic.php?p=254155#p254155
paul@117 667
paul@37 668
Shadow/Expanded Memory
paul@37 669
----------------------
paul@37 670
paul@37 671
The Electron exposes all sixteen address lines and all eight data lines
paul@37 672
through the expansion bus. Using such lines, it is possible to provide
paul@37 673
additional memory - typically sideways ROM and RAM - on expansion cards and
paul@37 674
through cartridges, although the official cartridge specification provides
paul@37 675
fewer address lines and only seeks to provide access to memory in 16K units.
paul@37 676
paul@37 677
Various modifications and upgrades were developed to offer "turbo"
paul@37 678
capabilities to the Electron, permitting the CPU to access a separate 8K of
paul@37 679
RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
paul@37 680
the ULA through additional logic. However, an enhanced ULA might support
paul@37 681
independent CPU access to memory over the expansion bus by allowing itself to
paul@37 682
be discharged from providing access to memory, potentially for a range of
paul@37 683
addresses, and for the CPU to communicate with external memory uninterrupted.
paul@33 684
paul@72 685
Sideways RAM/ROM and Upper Memory Access
paul@72 686
----------------------------------------
paul@72 687
Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above (the upper region of memory) at
2MHz, independently of any access to RAM that the ULA might be performing. It
only blocks the CPU, by stopping or stalling its clock, if the CPU attempts to
access addresses of &7FFF and below (the lower region of memory) during any
ULA memory access.
paul@72 695
paul@72 696
Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
paul@72 697
CPU clock if the line goes low, when the CPU is attempting to access the lower
paul@72 698
region of memory.
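
The decision described here reduces to a test on A15. As a minimal sketch (the
function and parameter names are hypothetical, and the real ULA will involve
further timing considerations not modelled here):

```python
def cpu_clock_inhibited(address, ula_accessing_ram):
    """Return True if the CPU clock would be held for this access."""
    a15 = (address >> 15) & 1
    # Only accesses to the lower region (A15 low) compete with the ULA.
    return bool(ula_accessing_ram) and a15 == 0


print(cpu_clock_inhibited(0x7FFF, True))
print(cpu_clock_inhibited(0x8000, True))
```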
paul@72 699
paul@79 700
Hardware Scrolling (and Enhancement)
paul@79 701
------------------------------------
paul@0 702
On the standard ULA, &FE02 and &FE03 together hold a screen start address with
9 significant bits (bits 6 to 14). The lowest 5 bits of the register pair are
unused and bits 0 to 5 of the address are always zero, thus limiting the
scrolling resolution to 64 bytes. An enhanced ULA could support a resolution
of 2 bytes using the same layout of these addresses.
paul@0 707
paul@0 708
|--&FE02--------------| |--&FE03--------------|
paul@0 709
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX
paul@0 710
paul@0 711
   XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX
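
The two bit layouts drawn above can be expressed as a packing of the screen
start address into a pair of bytes. The following sketch follows the diagram
literally, returning the left-hand and right-hand bytes as drawn; the function
names are hypothetical, and the enhanced layout is only the suggestion made
here, not an existing feature.

```python
def standard_start_bytes(address):
    """Pack address bits 14..6 as in the first diagram row."""
    value = (address & 0x7FC0) >> 1   # bit 14 lands in position 13
    return value >> 8, value & 0xFF   # (left byte, right byte)


def enhanced_start_bytes(address):
    """Pack address bits 14..1 as in the second diagram row."""
    value = address & 0x7FFE          # bits already sit in positions 14..1
    return value >> 8, value & 0xFF   # (left byte, right byte)


print([hex(b) for b in standard_start_bytes(0x3040)])
print([hex(b) for b in enhanced_start_bytes(0x3002)])
```

Addresses that are not multiples of 64 (standard) or 2 (enhanced) are simply
truncated by the masks, which reflects the scrolling resolution in each case.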
paul@0 712
Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change of 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).
paul@4 719
paul@4 720
One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
paul@4 721
of changing the screen address by 2 bytes is the change in the number of lines
paul@4 722
from the initial and final character rows that need reading by the ULA, which
paul@9 723
would need to maintain this state information (although this is a relatively
paul@9 724
trivial change). Another pitfall is the complication that might be introduced
paul@9 725
to software writing bitmaps of character height to the screen.
paul@4 726
paul@81 727
See: http://pastraiser.com/computers/acornelectron/acornelectron.html
paul@81 728
paul@82 729
Enhancement: Mode Layouts
paul@82 730
-------------------------
paul@82 731
paul@82 732
Merely changing the screen memory mappings in order to have Archimedes-style
paul@82 733
row-oriented screen addresses (instead of character-oriented addresses) could
paul@82 734
be done for the existing modes, but this might not be sufficiently beneficial,
paul@82 735
especially since accessing regions of the screen would involve incrementing
paul@82 736
pointers by amounts that are inconvenient on an 8-bit CPU.
paul@82 737
paul@82 738
However, instead of using an Archimedes-style mapping, column-oriented screen
paul@82 739
addresses could be more feasibly employed: incrementing the address would
paul@82 740
reference the vertical screen location below the currently-referenced location
paul@82 741
(just as occurs within characters using the existing ULA); instead of
paul@82 742
returning to the top of the character row and referencing the next horizontal
paul@82 743
location after eight bytes, the address would reference the next character row
paul@82 744
and continue to reference locations downwards over the height of the screen
paul@82 745
until reaching the bottom; at the bottom, the next location would be the next
paul@82 746
horizontal location at the top of the screen.
paul@82 747
paul@82 748
In other words, the memory layout for the screen would resemble the following
paul@82 749
(for MODE 2):
paul@82 750
paul@82 751
  &3000 &3100       ... &7F00
paul@82 752
  &3001 &3101
paul@82 753
  ...   ...
paul@82 754
  &3007
paul@82 755
  &3008
paul@82 756
  ...
paul@82 757
  ...                   ...
paul@82 758
  &30FF             ... &7FFF
paul@82 759
paul@82 760
Since there are 256 pixel rows, each column of locations would be addressable
paul@82 761
using the low byte of the address. Meanwhile, the high byte would be
paul@82 762
incremented to address different columns. Thus, addressing screen locations
paul@82 763
would become a lot more convenient and potentially much more efficient for
paul@82 764
certain kinds of graphical output.
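
To illustrate the difference in address calculation, here is a sketch for
MODE 2 comparing the existing character-oriented layout with the suggested
column-oriented one. The function names are hypothetical; "column" means a
byte column (0 to 79) and "line" a pixel row (0 to 255).

```python
SCREEN_BASE = 0x3000        # MODE 2 screen start
BYTES_PER_CHAR_ROW = 640    # 80 byte columns x 8 lines per character row


def character_oriented_address(column, line):
    """Existing layout: 8 consecutive bytes per character cell."""
    return (SCREEN_BASE
            + (line // 8) * BYTES_PER_CHAR_ROW
            + column * 8
            + (line % 8))


def column_oriented_address(column, line):
    """Suggested layout: high byte selects the column, low byte the line."""
    return SCREEN_BASE + (column << 8) + line


print(hex(character_oriented_address(1, 9)))
print(hex(column_oriented_address(1, 9)))
```

The column-oriented calculation needs no multiplication or division, only the
placement of the line number in the low byte of the address.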
paul@82 765
paul@82 766
One potential complication with this simplified addressing scheme arises with
paul@82 767
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
paul@82 768
with the existing ULA) would be achieved by incrementing or decrementing the
paul@82 769
screen start address; by one character row, it would involve adding or
paul@82 770
subtracting 8. However, the ULA only supports multiples of 64 when changing the
paul@82 771
screen start address. Thus, if such a scheme were to be adopted, three
paul@82 772
additional bits would need to be supported in the screen start register (see
paul@82 773
"Hardware Scrolling (and Enhancement)" for more details). However, horizontal
paul@82 774
scrolling would be much improved even under the severe constraints of the
paul@82 775
existing ULA: only adjustments of 256 to the screen start address would be
paul@82 776
required to produce single-location scrolling of as few as two pixels in MODE 2
paul@82 777
(four pixels in MODEs 1 and 5, eight pixels otherwise).
paul@82 778
paul@82 779
More disruptive is the effect of this alternative layout on software.
paul@82 780
Presumably, compatibility with the BBC Micro was the primary goal of the
paul@82 781
Electron's hardware design. With the character-oriented screen layout in
paul@82 782
place, system software (and application software accessing the screen
paul@82 783
directly) would be relying on this layout to run on the Electron with little
paul@82 784
or no modification. Although it might have been possible to change the system
paul@82 785
software to use this column-oriented layout instead, this would have incurred
paul@82 786
a development cost and caused additional work porting things like games to the
paul@82 787
Electron. Moreover, a separate branch of the software from that supporting the
paul@82 788
BBC Micro and closer derivatives would then have needed maintaining.
paul@82 789
The decision to use the character-oriented layout in the BBC Micro may have
been related to the choice of circuitry and made to facilitate a convenient
hardware implementation. By the time the Electron was planned, it was too
late to do anything about this somewhat unfortunate choice.
paul@82 794
paul@89 795
Pixel Layouts
paul@89 796
-------------
paul@89 797
paul@89 798
The pixel layouts are as follows:
paul@89 799
paul@89 800
  Modes         Depth (bpp)     Pixels (from bits)
paul@89 801
  -----         -----------     ------------------
paul@89 802
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
paul@89 803
  1, 5          2               73 62 51 40
paul@89 804
  2             4               7531 6420
paul@89 805
paul@89 806
Since the ULA reads a half-byte at a time, one might expect it to attempt to
paul@89 807
produce pixels for every half-byte, as opposed to handling entire bytes.
paul@89 808
However, the pixel layout is not conducive to producing pixels as soon as a
paul@89 809
half-byte has been read for a given full-byte location: in 1bpp modes the
paul@89 810
first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
paul@89 811
data is spread across the entire byte in different ways.
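
The existing layouts can be sketched as a table of bit positions, with each
pixel value assembled from the listed bits, most significant first. This is
only an illustration with hypothetical names; it also assumes that the first
half-byte fetched holds bits 7 to 4, which is not confirmed here.

```python
# Bit positions contributing to each pixel, leftmost pixel first.
LAYOUTS = {
    1: [(7,), (6,), (5,), (4,), (3,), (2,), (1,), (0,)],
    2: [(7, 3), (6, 2), (5, 1), (4, 0)],
    4: [(7, 5, 3, 1), (6, 4, 2, 0)],
}


def decode_pixels(byte, depth):
    """Return the pixel values encoded in one screen byte."""
    pixels = []
    for bits in LAYOUTS[depth]:
        value = 0
        for bit in bits:
            value = (value << 1) | ((byte >> bit) & 1)
        pixels.append(value)
    return pixels


def pixels_ready_after_high_nibble(depth):
    """Count pixels needing only bits 7 to 4 (assumed to arrive first)."""
    return sum(1 for bits in LAYOUTS[depth] if min(bits) >= 4)


print(decode_pixels(0b10001000, 2))
print([pixels_ready_after_high_nibble(d) for d in (1, 2, 4)])
```

Under that assumption, only the 1bpp layout yields complete pixels from the
first half-byte, which is the limitation described above.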
paul@89 812
paul@89 813
An alternative arrangement might be as follows:
paul@89 814
paul@89 815
  Modes         Depth (bpp)     Pixels (from bits)
paul@89 816
  -----         -----------     ------------------
paul@89 817
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
paul@89 818
  1, 5          2               76 54 32 10
paul@89 819
  2             4               7654 3210
paul@89 820
paul@89 821
Just as the mode layouts were presumably decided by compatibility with the BBC
paul@89 822
Micro, the pixel layouts will have been maintained for similar reasons.
paul@89 823
Unfortunately, this layout prevents any optimisation of the ULA for handling
paul@89 824
half-byte pixel data generally.
paul@89 825
paul@79 826
Enhancement: The Missing MODE 4
paul@79 827
-------------------------------
paul@79 828
paul@79 829
The Electron inherits its screen mode selection from the BBC Micro, where MODE
paul@79 830
3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
paul@79 831
Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
paul@79 832
however, and they are merely implemented by skipping two scanlines in every
paul@79 833
ten after the eight required to produce a character line. Thus, such modes
paul@79 834
provide a 24-row display.
paul@79 835
paul@79 836
In principle, nothing prevents this "text mode" effect being applied to other
paul@79 837
modes. The 20-column modes are not well-suited to displaying text, which
paul@79 838
leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
paul@79 839
2. Although the need for a non-monochrome 40-column text mode is addressed by
paul@79 840
MODE 7 on the BBC Micro, the Electron lacks such a mode.
paul@79 841
paul@79 842
If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it
paul@79 843
would occupy MODE 4 instead of the current MODE 4:
paul@79 844
paul@79 845
  Screen mode  Size (kilobytes)  Colours  Rows  Resolution
paul@79 846
  -----------  ----------------  -------  ----  ----------
paul@79 847
  0            20                2        32    640x256
paul@79 848
  1            20                4        32    320x256
paul@79 849
  2            20                16       32    160x256
paul@79 850
  3            16                2        24    640x256
paul@79 851
  4 (new)      16                4        24    320x256
paul@79 852
  4 (old)      10                2        32    320x256
paul@79 853
  5            10                4        32    160x256
paul@79 854
  6            8                 2        24    320x256
paul@79 855
paul@79 856
Thus, for increasing mode numbers, the size of each mode would be the same or
paul@79 857
less than the preceding mode.
paul@79 858
paul@128 859
Enhancement: Display Mode Property Control
paul@128 860
------------------------------------------
paul@128 861
paul@128 862
It is rather curious that the ULA supports the mode numbers directly in bits 3
paul@128 863
to 5 of &FE07 since these would presumably need to be decoded in order to set
paul@128 864
the fundamental properties of the display mode. These properties are as
paul@128 865
follows:
paul@128 866
paul@128 867
 * Screen data retrieval rate: number of fetches per pair of 2MHz cycles
paul@128 868
 * Pixel colour depth
paul@128 869
 * Text mode vertical spacing
paul@128 870
paul@128 871
From these, the following properties emerge:
paul@128 872
paul@129 873
  Property                        Influences
paul@129 874
  --------                        ----------
paul@129 875
  Character row size (bytes)      Retrieval rate
paul@129 876
paul@129 877
  Number of character rows        Text mode setting
paul@129 878
paul@129 879
  Display size (bytes)            Retrieval rate (character row size)
paul@129 880
                                  Text mode setting (number of rows)
paul@129 881
paul@129 882
  Pixel frequency                 Retrieval rate
paul@129 883
  Horizontal resolution (pixels)  Colour depth
paul@128 884
paul@128 885
One can imagine a register bitfield arrangement as follows:
paul@128 886
paul@129 887
  Field             Values                  Formula
paul@129 888
  -----             ------                  -------
paul@129 889
  Pixel depth       00: 1 bit per pixel     log2(depth)
paul@129 890
                    01: 2 bits per pixel
paul@129 891
                    10: 4 bits per pixel
paul@129 892
paul@129 893
  Retrieval rate     0: twice               2 - fetches per cycle pair
paul@129 894
                     1: once
paul@129 895
paul@129 896
  Text mode enable   0: disable/off         text mode enabled
paul@129 897
                     1: enable/on
paul@128 898
paul@128 899
This arrangement would require four bits. However, one bit in &FE07 is
paul@128 900
seemingly inactive and might possibly be reallocated.
paul@128 901
paul@128 902
The resulting combination of properties would permit all of the existing modes
paul@128 903
plus some additional ones, including the missing MODE 4 mentioned above. With
paul@128 904
the bitfields above ordered from the most significant bits to the least
paul@128 905
significant bits providing the low-level "mode" values, the following table
paul@128 906
can be produced:
paul@128 907
paul@128 908
  Screen mode  Depth Rate   Text  Size (K)  Colours  Rows  Resolution
paul@128 909
  -----------  ----- ----   ----  --------  -------  ----  ----------
paul@128 910
  0  (0000)    1     twice  off   20        2        32    640x256    (MODE 0)
paul@128 911
  1  (0001)    1     twice  on    16        2        24    640x256    (MODE 3)
paul@128 912
  2  (0010)    1     once   off   10        2        32    320x256    (MODE 4)
paul@128 913
  3  (0011)    1     once   on    8         2        24    320x256    (MODE 6)
paul@128 914
  4  (0100)    2     twice  off   20        4        32    320x256    (MODE 1)
paul@128 915
  5  (0101)    2     twice  on    16        4        24    320x256
paul@128 916
  6  (0110)    2     once   off   10        4        32    160x256    (MODE 5)
paul@128 917
  7  (0111)    2     once   on    8         4        24    160x256
paul@128 918
  8  (1000)    4     twice  off   20        16       32    160x256    (MODE 2)
paul@128 919
  9  (1001)    4     twice  on    16        16       24    160x256
paul@128 920
  10 (1010)    4     once   off   10        16       32    80x256
paul@128 921
  11 (1011)    4     once   on    8         16       24    80x256
paul@128 922
paul@128 923
The existing modes would be covered in a way that is incompatible with the
paul@128 924
existing numbering, thus requiring a table in software, but additional text
paul@128 925
modes would be provided for MODE 1, MODE 5 and MODE 2. An additional two lower
paul@128 926
resolution modes would also be conceivable within this scheme, requiring the
paul@128 927
stretching of 16MHz pixels by a factor of eight to yield 80 pixels per
paul@128 928
scanline. The utility of such modes is questionable and such modes might not
paul@128 929
be supported.
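
As a sketch of how the suggested 4-bit value might be decoded, the following
derives the depth, retrieval rate, colours and horizontal resolution shown in
the table. The function name and dictionary keys are hypothetical, and the
figures of 80 and 40 bytes per scanline for the two retrieval rates are taken
from the existing modes.

```python
def decode_mode_value(value):
    """Decode the suggested low-level mode value (0 to 11)."""
    if not 0 <= value <= 11:
        raise ValueError("only values 0 to 11 are defined in this scheme")

    depth_field = (value >> 2) & 0b11   # log2(depth)
    rate_field = (value >> 1) & 1       # 0: twice, 1: once
    text_field = value & 1              # 1: text mode spacing enabled

    depth = 1 << depth_field
    fetches = 2 - rate_field            # fetches per pair of 2MHz cycles
    bytes_per_line = 40 * fetches       # 80 bytes when fetching twice

    return {
        "depth": depth,
        "fetches": fetches,
        "text": bool(text_field),
        "colours": 2 ** depth,
        "width": bytes_per_line * 8 // depth,
    }


print(decode_mode_value(5))
```

Value 5 here describes the 4-colour, 320-pixel text mode proposed earlier as
the missing MODE 4.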
paul@128 930
paul@76 931
Enhancement: 2MHz RAM Access
paul@76 932
----------------------------
paul@76 933
The CPU and ULA both access RAM at 2MHz, but the CPU, even when not competing
with the ULA, only accesses RAM every other 2MHz cycle (as if the ULA still
needed to access the RAM). One useful enhancement would therefore be a
mechanism to let the CPU take over the ULA cycles outside the ULA's period of
activity, comparable to the way the ULA takes over the CPU cycles in MODE 0 to
3.
paul@76 940
paul@76 941
Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
paul@76 942
paul@76 943
  Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
paul@76 944
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
paul@76 945
paul@76 946
In MODE 4 to 6:
paul@76 947
 
paul@76 948
  Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
paul@76 949
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
paul@76 950
paul@76 951
This would improve CPU bandwidth as follows:
paul@76 952
paul@118 953
                Standard ULA    Enhanced ULA    % Total Bandwidth   Speedup
paul@118 954
MODE 0, 1, 2    9728 bytes      19456 bytes     24% -> 49%          2
paul@118 955
MODE 3          12288 bytes     24576 bytes     31% -> 62%          2
paul@118 956
MODE 4, 5       19968 bytes     29696 bytes     50% -> 74%          1.5
paul@118 957
MODE 6          19968 bytes     32256 bytes     50% -> 81%          1.6
paul@76 958
paul@118 959
(Here, the uncontended total 2MHz bandwidth for a display period would be
paul@118 960
39936 bytes, being 128 cycles per line over 312 lines.)
paul@115 961
paul@76 962
With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
paul@76 963
because all access opportunities to RAM are doubled. Meanwhile, in the other
paul@76 964
modes, some CPU accesses occur alongside ULA accesses and thus cannot be
paul@76 965
doubled, but the CPU bandwidth increase is still significant.
paul@76 966
paul@103 967
Unfortunately, the mechanism for accessing the RAM is too slow to provide data
paul@109 968
within the time constraints of 2MHz operation. There is no time remaining in a
paul@118 969
2MHz cycle for the CPU to receive and process any retrieved data once the
paul@124 970
necessary signalling has been performed.
paul@124 971
paul@124 972
The only way for the CPU to be able to access the RAM quickly enough would be
paul@124 973
to do away with the double 4-bit access mechanism and to have a single 8-bit
paul@124 974
channel to the memory. This would require twice as many 1-bit RAM chips or a
paul@124 975
different kind of RAM chip, but it would also potentially simplify the ULA.
paul@124 976
paul@124 977
The section on 8-bit wide RAM access discusses the possibilities around
paul@124 978
changing the memory architecture, also describing the possibility of ULA
paul@124 979
accesses achieving two bytes per 2MHz cycle due to the doubling of the memory
paul@124 980
channel, leaving every other access free for the CPU during the display period
paul@124 981
in MODE 0 to 3...
paul@124 982
paul@124 983
  Standard display period: UUUUUUUU
paul@124 984
  Modified display period: UCUCUCUC
paul@124 985
paul@124 986
...and consolidating accesses in MODE 4 to 6:
paul@124 987
paul@124 988
  Standard display period: UCUCUCUC
paul@124 989
  Modified display period: UCCCUCCC
paul@124 990
paul@124 991
Together with the enhancements for non-display periods, such an "Enhanced+ ULA"
paul@124 992
would perform as follows:
paul@124 993
paul@124 994
                Standard ULA    Enhanced+ ULA   % Total Bandwidth   Speedup
paul@124 995
MODE 0, 1, 2    9728 bytes      29696 bytes     24% -> 74%          3.1
paul@124 996
MODE 3          12288 bytes     32256 bytes     31% -> 81%          2.6
paul@124 997
MODE 4, 5       19968 bytes     34816 bytes     50% -> 87%          1.7
paul@124 998
MODE 6          19968 bytes     36096 bytes     50% -> 90%          1.8
paul@124 999
paul@124 1000
Of course, the principal enhancement would be the wider memory channel, with
paul@124 1001
more buffering in the ULA being its contribution to this arrangement.
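
The figures in the bandwidth tables of this section can be reproduced from the
access patterns. The following is only a sketch under stated assumptions: 128
2MHz cycles per line, 312 lines, 80 cycles of pixel data on each display line,
and 256 display lines (192 for MODE 3 and 6, as implied by the table figures).
All names are illustrative.

```python
CYCLES_PER_LINE = 128
LINES = 312
DISPLAY_CYCLES = 80  # assumed cycles of pixel data per display line

# CPU accesses per 8 cycles: (during display data, at all other times).
# "fast" covers MODE 0 to 3, "slow" covers MODE 4 to 6.
PATTERNS = {
    "standard":  {"fast": (0, 4), "slow": (4, 4)},   # UUUUUUUU/CUCUCUCU, C_C_C_C_
    "enhanced":  {"fast": (0, 8), "slow": (4, 8)},   # UUUUUUUU/CUCUCUCU, CCCCCCCC
    "enhanced+": {"fast": (4, 8), "slow": (6, 8)},   # UCUCUCUC/UCCCUCCC, CCCCCCCC
}


def cpu_bytes(scheme, group, display_lines):
    """Return the CPU RAM accesses available in one display period."""
    in_display, outside = PATTERNS[scheme][group]
    idle_cycles = CYCLES_PER_LINE - DISPLAY_CYCLES
    per_display_line = (DISPLAY_CYCLES // 8) * in_display \
        + (idle_cycles // 8) * outside
    per_blank_line = (CYCLES_PER_LINE // 8) * outside
    return display_lines * per_display_line \
        + (LINES - display_lines) * per_blank_line


total = CYCLES_PER_LINE * LINES
for scheme in PATTERNS:
    row = [cpu_bytes(scheme, "fast", 256), cpu_bytes(scheme, "fast", 192),
           cpu_bytes(scheme, "slow", 256), cpu_bytes(scheme, "slow", 192)]
    print(scheme, row, [round(100 * n / total) for n in row])
```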
paul@103 1002
paul@55 1003
Enhancement: Region Blanking
paul@55 1004
----------------------------
paul@4 1005
paul@4 1006
The problem of permitting character-oriented blitting in programs whilst
paul@4 1007
scrolling the screen by sub-character amounts could be mitigated by permitting
paul@4 1008
a region of the display to be blank, such as the final lines of the display.
paul@4 1009
Consider the following vertical scrolling by 2 bytes that would cause an
paul@4 1010
initial character row of 6 lines and a final character row of 2 lines:
paul@4 1011
paul@4 1012
    6 lines - initial, partial character row
paul@4 1013
  248 lines - 31 complete rows
paul@4 1014
    2 lines - final, partial character row
paul@4 1015
paul@4 1016
If a routine were in use that wrote 8 line bitmaps to the partial character
paul@4 1017
row now split in two, it would be advisable to hide one of the regions in
paul@4 1018
order to prevent content appearing in the wrong place on screen (such as
paul@4 1019
content meant to appear at the top "leaking" onto the bottom). Blanking 6
paul@4 1020
lines would be sufficient, as can be seen from the following cases.
paul@4 1021
paul@4 1022
Scrolling up by 2 lines:
paul@4 1023
paul@4 1024
    6 lines - initial, partial character row
paul@4 1025
  240 lines - 30 complete rows
paul@4 1026
    4 lines - part of 1 complete row
paul@4 1027
  -----------------------------------------------------------------
paul@4 1028
    4 lines - part of 1 complete row (hidden to maintain 250 lines)
paul@4 1029
    2 lines - final, partial character row (hidden)
paul@4 1030
paul@4 1031
Scrolling down by 2 lines:
paul@4 1032
paul@4 1033
    2 lines - initial, partial character row
paul@4 1034
  248 lines - 31 complete rows
paul@4 1035
  ----------------------------------------------------------
paul@4 1036
    6 lines - final, partial character row (hidden)
paul@4 1037
paul@24 1038
Thus, in this case, region blanking would impose a 250 line display with the
paul@24 1039
bottom 6 lines blank.

See the description of the display suspend enhancement for a way of blanking
lines that is more efficient than merely blanking the palette and that allows
the CPU to perform useful work during the blanking period.

To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 15 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.
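
The single-location scheme can be illustrated with a short Python sketch. It
assumes that a stored value of n means n blanked lines (so each 4-bit field
covers 0 to 15 lines) and that the display has 256 lines; the function names
are hypothetical and do not correspond to any existing register.

```python
def pack_blanking(top_lines, bottom_lines):
    """Pack two line counts into one byte: top in bits 7-4, bottom in bits 3-0."""
    if not (0 <= top_lines <= 15 and 0 <= bottom_lines <= 15):
        raise ValueError("each 4-bit field can only hold 0 to 15 lines")
    return (top_lines << 4) | bottom_lines


def unpack_blanking(value):
    """Return (top_lines, bottom_lines) from the packed byte."""
    return (value >> 4) & 0x0F, value & 0x0F


def visible_lines(value, total_lines=256):
    """Number of display lines left visible (total_lines is an assumption)."""
    top, bottom = unpack_blanking(value)
    return total_lines - top - bottom
```

With these definitions, pack_blanking(0, 6) describes the case above where
only the bottom 6 lines are blanked.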

Enhancement: Screen Height Adjustment
-------------------------------------

The height of the screen could be configurable in order to reduce screen
memory consumption. This is not quite done in MODE 3 and 6 since the start of
the screen appears to be rounded down to the nearest page, but by reducing the
height by amounts more than a page, savings would be possible. For example:

  Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
  ------------  -----  ------  --------------  ---------------  -------------
  640           1      252     80              320              &3140 -> &3100
  640           1      248     80              640              &3280 -> &3200
  320           1      240     40              640              &5A80 -> &5A00
  320           2      240     80              1280             &3500
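
The figures in the table can be reproduced with the following Python sketch.
It assumes that screen memory ends at &8000, that a full-height screen has 256
lines, and that the start address is rounded down to a 256-byte page; the
function names are illustrative only.

```python
RAM_TOP = 0x8000   # assumed end of screen memory
PAGE = 0x100       # page size used for rounding down


def screen_start(height, bytes_per_line, ram_top=RAM_TOP):
    """Unrounded start address for a screen of the given height."""
    return ram_top - height * bytes_per_line


def page_aligned(address):
    """Round an address down to the start of its page."""
    return address & ~(PAGE - 1)


def saving(height, bytes_per_line, full_height=256):
    """Bytes saved relative to a full-height screen."""
    return (full_height - height) * bytes_per_line


for height, bytes_per_line in ((252, 80), (248, 80), (240, 40), (240, 80)):
    start = screen_start(height, bytes_per_line)
    print(height, bytes_per_line, saving(height, bytes_per_line),
          "&%04X -> &%04X" % (start, page_aligned(start)))
```

Note that the saving column is measured before rounding; the memory actually
released is determined by the page-aligned address.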

Screen Mode Selection
---------------------

Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially being made available for use.
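
As a small illustration of the three-bit field, the following Python sketch
extracts and replaces bits 3 to 5 of a register value while leaving the other
bits alone. It treats the field purely as a number from 0 to 7; the helper
names are hypothetical.

```python
MODE_SHIFT = 3
MODE_MASK = 0b111 << MODE_SHIFT   # bits 3, 4 and 5


def get_mode_field(register_value):
    """Return the value held in bits 3-5."""
    return (register_value & MODE_MASK) >> MODE_SHIFT


def set_mode_field(register_value, mode):
    """Return the register value with bits 3-5 replaced by mode (0-7)."""
    if not 0 <= mode <= 7:
        raise ValueError("only three bits are available for the mode")
    return (register_value & ~MODE_MASK & 0xFF) | (mode << MODE_SHIFT)
```

Three bits allow only eight distinct values, which is why a wider range of
modes would require other bits of the location to be reassigned.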

Enhancement: Palette Definition
-------------------------------

Since all memory accesses go via the ULA, an enhanced ULA could employ more
specific addresses than &FE*X to perform enhanced functions. For example, the
palette control is done using &FE*8-F and merely involves selecting predefined
colours, whereas an enhanced ULA could support the redefinition of all 16
colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
(colours 8 to 15), where a single byte might provide 8 bits per pixel colour
specifications similar to those used on the Archimedes.

The principal limitation here is actually the hardware: the Electron has only
a single output line for each of the red, green and blue channels, and if
those outputs are strictly digital and can only be set to a "high" or "low"
value, then only the existing eight colours are possible. If a modern ULA were
able to output analogue values (or values at well-defined points between the
high and low values, such as the half-on value supported by the Amstrad CPC
series), it would still need to be assessed whether the circuitry could
successfully handle and propagate such values. Various sources indicate that
only "TTL levels" are supported by the RGB output circuit, and since there are
74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
is likely that the ULA is expected to provide only "high" or "low" values.

Short of adding extra outputs from the ULA (either additional red, green and
blue outputs or a combined intensity output), another approach might involve
some kind of modulation where an output value might be encoded in multiple
pulses at a higher frequency than the pixel frequency. However, this would
demand additional circuitry outside the ULA, and component RGB monitors would
probably not be able to take advantage of this feature; only UHF and composite
video devices (the latter with the composite video colour support enabled on
the Electron's circuit board) would potentially benefit.

Flashing Colours
----------------

According to the Advanced User Guide, "The cursor and flashing colours are
entirely generated in software: This means that all of the logical to physical
colour map must be changed to cause colours to flash." This appears to suggest
that the palette registers must be updated upon the flash counter - read and
written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
colour pairs to be any combination of colours might be possible, instead of
having colour complements as pairs.

It is conceivable that the interrupt code responsible does the simple thing
and merely inverts the current values for any logical colours (LC) for which
the associated physical colour (as supplied as the second parameter to the VDU
19 call) has the top bit of its four bit value set. These top bits are not
recorded in the palette registers but are presumably recorded separately and
used to build bitmaps as follows:

  LC  2 colour  4 colour  16 colour  4-bit value for inversion
  --  --------  --------  ---------  -------------------------
   0  00010001  00010001  00010001   1, 1, 1
   1  01000100  00100010  00010001   4, 2, 1
   2            01000100  00100010      4, 2
   3            10001000  00100010      8, 2
   4                      00010001         1
   5                      00010001         1
   6                      00100010         2
   7                      00100010         2
   8                      01000100         4
   9                      01000100         4
  10                      10001000         8
  11                      10001000         8
  12                      01000100         4
  13                      01000100         4
  14                      10001000         8
  15                      10001000         8

  Inversion value calculation:

   2 colour formula: 1 << (colour * 2)
   4 colour formula: 1 << colour
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
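
The inversion values in the table can be checked with a short Python sketch.
The 16 colour expression used here takes bit 1 and bit 3 of the logical colour
as a two-bit shift count, which is one expression that reproduces the 16
colour column of the table; the function names are illustrative only.

```python
def inversion_value(colour, colours_in_mode):
    """Return the 4-bit inversion value for a logical colour."""
    if colours_in_mode == 2:
        return 1 << (colour * 2)
    if colours_in_mode == 4:
        return 1 << colour
    if colours_in_mode == 16:
        # Bit 1 of the colour gives shift bit 0, bit 3 gives shift bit 1.
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("expected a 2, 4 or 16 colour mode")


def inversion_bitmap(colour, colours_in_mode):
    """Return the 8-bit bitmap: the 4-bit value repeated in both nibbles."""
    value = inversion_value(colour, colours_in_mode)
    return (value << 4) | value


for colour in range(16):
    print(colour, format(inversion_bitmap(colour, 16), "08b"))
```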

For example, where logical colour 0 has been mapped to a physical colour in
the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
the inversion operation. (The lower three bits of the physical colour would be
used to set the underlying colour information affected by the inversion
operation.)

An operation in the interrupt code would then combine the bitmaps for all
logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
combined for groups of logical colours as follows:

   Logical colours
   ---------------
   0,  2,  8, 10
   4,  6, 12, 14
   5,  7, 13, 15
   1,  3,  9, 11

These combined bitmaps would be EORed with the existing palette register
values in order to perform the value inversion necessary to produce the
flashing effect.
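
A minimal Python sketch of the combining and EOR steps for a 16 colour mode
follows. It is speculative in the same way as the description above: it does
not model which palette register each group corresponds to, and simply treats
each group as having one 8-bit value to be inverted. All names are
hypothetical.

```python
GROUPS_16 = [(0, 2, 8, 10), (4, 6, 12, 14), (5, 7, 13, 15), (1, 3, 9, 11)]


def bitmap_16(colour):
    """8-bit inversion bitmap for a logical colour in a 16 colour mode."""
    value = 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    return (value << 4) | value


def combined_bitmaps(flashing):
    """Combine the bitmaps of the flashing logical colours, one per group.

    flashing is the set of logical colours whose physical colour has its
    top bit set.
    """
    result = []
    for group in GROUPS_16:
        combined = 0
        for colour in group:
            if colour in flashing:
                combined |= bitmap_16(colour)
        result.append(combined)
    return result


def flash(group_values, bitmaps):
    """EOR each group's current value with its combined bitmap."""
    return [value ^ bitmap for value, bitmap in zip(group_values, bitmaps)]
```

Applying flash twice with the same bitmaps restores the original values, which
is what makes a simple EOR suitable for alternating between the two states.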

Thus, in the VDU 19 operation, the appropriate inversion value would be
calculated for the logical colour, and this value would then be combined with
other inversion values in a dedicated memory location corresponding to the
colour's group as indicated above. Meanwhile, the palette channel values would
be derived from the lower three bits of the specified physical colour and
combined with other palette data in dedicated memory locations corresponding
to the palette registers.

Interestingly, although flashing colours on the BBC Micro are controlled by
toggling bit 0 of the &FE20 control register location for the Video ULA, the
actual colour inversion is done in hardware.

Enhancement: Palette Definition Lists
-------------------------------------

It can be useful to redefine the palette in order to change the colours
available for a particular region of the screen, particularly in modes where
the choice of colours is constrained, and if an increased colour depth were
available, palette redefinition would be useful to give the illusion of more
than 16 colours in MODE 2. Traditionally, palette redefinition has been done
by using interrupt-driven timers, but a more efficient approach would involve
presenting lists of palette definitions to the ULA so that it can change the
palette at a particular display line.

One might define a palette redefinition list in a region of memory and then
communicate its contents to the ULA by writing the address and length of the
list, along with the display line at which the palette is to be changed, to
ULA registers such that the ULA buffers the list and performs the redefinition
at the appropriate time. Throughput/bandwidth considerations might impose
restrictions on the practical length of such a list, however.
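
The intended behaviour can be modelled in a few lines of Python. This is only
a behavioural sketch under assumed data structures (a dictionary of lists
keyed by display line, with each entry pairing a logical colour with a
physical colour); it says nothing about how the ULA would buffer the list or
which registers would be involved.

```python
def palettes_by_line(initial, lists, lines=256):
    """Return the palette in force on each display line.

    initial: dict mapping logical colour to physical colour.
    lists:   dict mapping a display line to a list of
             (logical colour, physical colour) pairs applied at that line.
    """
    palette = dict(initial)
    result = []
    for line in range(lines):
        # Apply any redefinition list scheduled for this line.
        for logical, physical in lists.get(line, ()):
            palette[logical] = physical
        result.append(dict(palette))
    return result


# Logical colour 1 is white (7) for the top half and red (1) below line 128.
frame = palettes_by_line({0: 0, 1: 7}, {128: [(1, 1)]})
```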

A simple form of palette definition might be useful in text modes. Within the
blank region between lines, the foreground palette could be changed to apply
to the next line. Palette values could be read from a table in RAM, perhaps
preceding the screen data, with 24 2-byte entries providing palette
redefinition support in 2- and 4-colour modes.

Enhancement: Display Synchronisation Interrupts
-----------------------------------------------

When completing each scanline of the display, the ULA could trigger an
interrupt. Since this might impact system performance substantially, the
feature would probably need to be configurable, and it might be sufficient to
have an interrupt only after a certain number of display lines instead.
Permitting the CPU to take action after eight lines would allow palette
switching and other effects to occur on a character row basis.

The ULA provides an interrupt at the end of the display period, presumably so
that software can schedule updates to the screen, avoid flickering or tearing,
and so on. However, some applications might benefit from an interrupt at, or
just before, the start of the display period so that palette modifications or
similar effects could be scheduled.

Enhancement: Palette-Free Modes
-------------------------------

Palette-free modes might be defined where bit values directly correspond to
the red, green and blue channels, although this would mostly make sense only
for modes with depths greater than the standard 4 bits per pixel, and such
modes would require more memory than MODE 2 if they were to have an acceptable
resolution.

Enhancement: Display Suspend
----------------------------

Especially when writing to the screen memory, it could be beneficial to be
able to suspend the ULA's access to the memory, instead producing blank values
for all screen pixels until a program is ready to reveal the screen. This is
different from palette blanking since with a blank palette, the ULA is still
reading screen memory and translating its contents into pixel values that end
up being blank.

This function is reminiscent of a capability of the ZX81, albeit necessary on
that hardware to reduce the load on the system CPU which was responsible for
producing the video output. By allowing display suspend on the Electron, the
performance benefit would be derived from giving the CPU full access to the
memory bandwidth.

Note that since the CPU is only able to access RAM at 1MHz, there is no
possibility to improve performance beyond that achieved in MODE 4, 5 or 6
normally. However, if faster RAM access were to be made possible (see the
discussion of 8-bit wide RAM access), the CPU could benefit from freeing up
the ULA's access slots entirely.

The region blanking feature mentioned above could be implemented using this
enhancement instead of employing palette blanking for the affected lines of
the display.

Enhancement: Memory Filling
---------------------------

A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the
existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.

This capability could also support block filling, where the limits of the
filled memory would be defined by the position and size of a screen area,
although this would demand the provision of additional registers in the ULA to
retain the details of such areas and additional logic to control the fill
operation.

Enhancement: Region Filling
---------------------------

An alternative to memory writing might involve indicating regions using
additional registers or memory where the ULA fills regions of the screen with
content instead of reading from memory. Unlike hardware sprites which should
realistically provide varied content, region filling could employ single
colours or patterns, and one advantage of doing so would be that the ULA need
not access memory at all within a particular region.

Regions would be defined on a row-by-row basis. Instead of reading memory and
blitting a direct representation to the screen, the ULA would read region
definitions containing a start column, region width and colour details. There
might be a certain number of definitions allowed per row, or the ULA might
just traverse an ordered list of such definitions with each one indicating the
row, start column, region width and colour details.

One could even compress this information further by requiring only the row,
start column and colour details with each subsequent definition terminating
the effect of the previous one. However, one would also need to consider the
convenience of preparing such definitions and whether efficient access to
definitions for a particular row might be desirable. It might also be
desirable to avoid having to prepare definitions for "empty" areas of the
screen, effectively making the definition of the screen contents employ
run-length encoding and employ only colour plus length information.
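
The run-length form of a row can be sketched in Python as follows. The
representation, a list of (colour, length) pairs per row, is an assumption
made for illustration; the text does not fix an encoding.

```python
def encode_runs(pixels):
    """Encode a row of pixel colours as a list of (colour, length) runs."""
    runs = []
    for colour in pixels:
        if runs and runs[-1][0] == colour:
            # Extend the current run.
            runs[-1] = (colour, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((colour, 1))
    return runs


def expand_runs(runs, width):
    """Expand (colour, length) runs into a row of exactly width pixels."""
    pixels = []
    for colour, length in runs:
        pixels.extend([colour] * length)
    if len(pixels) != width:
        raise ValueError("runs do not cover the row exactly")
    return pixels
```

A row containing a single filled span needs only three runs, whereas a row of
alternating pixel values needs one run per pixel, which is the worst case for
this representation.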

One application of region filling is that of simple 2D and 3D shape rendering.
Although it is entirely possible to plot such shapes to the screen and have
the ULA blit the memory contents to the screen, such operations consume
bandwidth both in the initial plotting and in the final transfer to the
screen. Region filling would reduce such bandwidth usage substantially.

This way of representing screen images would make certain kinds of images
unfeasible to represent - consider alternating single pixel values which could
easily occur in some character bitmaps - even if an internal queue of regions
were to be supported such that the ULA could read ahead and buffer such
"bandwidth intensive" areas. Thus, the ULA might be better served providing
this feature for certain areas of the display only as some kind of special
graphics window.

Enhancement: Hardware Sprites
-----------------------------

An enhanced ULA might provide hardware sprites, but this would be done in an
way that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.

The specification of sprites could involve the reservation of 16 locations
(for example, &FE20-F) specifying a fixed number of eight sprites, with each
location pair referring to the sprite data. By limiting the ULA to dealing
with a fixed number of sprites, the work required inside the ULA would be
reduced since it would avoid having to deal with arbitrary numbers of sprites.

The principal limitation on providing hardware sprites is that of having to
obtain sprite data, given that the ULA is usually required to retrieve screen
data, and given the lack of memory bandwidth available to retrieve sprite data
(particularly from multiple sprites supposedly at the same position) and
screen data simultaneously. Although the ULA could potentially read sprite
data and screen data in alternate memory accesses in screen modes where the
bandwidth is not already fully utilised, this would result in a degradation of
performance.

Enhancement: Additional Screen Mode Configurations
--------------------------------------------------

Alternative screen mode configurations could be supported. The ULA has to
produce 640 pixel values across the screen, with pixel doubling or quadrupling
employed to fill the screen width:

  Screen width      Columns     Scaling     Depth       Bytes
  ------------      -------     -------     -----       -----
  640               80          x1          1           80
  320               40          x2          1, 2        40, 80
  160               20          x4          2, 4        40, 80

It must also use at most 80 byte-sized memory accesses to provide the
information for the display. Given that characters must occupy an 8x8 pixel
array, if a configuration featuring anything other than 20, 40 or 80 character
columns is to be supported, compromises must be made such as the introduction
of blank pixels either between characters (such as occurs between rows in MODE
3 and 6) or at the end of a scanline (such as occurs at the end of the frame
in MODE 3 and 6). Consider the following configuration:

  Screen width      Columns     Scaling     Depth       Bytes       Blank
  ------------      -------     -------     -----       ------      -----
  208               26          x3          1, 2        26, 52      16

Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
colours could be provided, with 16 blank pixel values (out of a total of 640)
generated either at the start or end (or split between the start and end) of
each scanline.
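
The constraints on a configuration can be expressed as a short Python check.
The limits of 640 output pixel values and 80 byte accesses per scanline come
from the text; the function name and the returned dictionary are illustrative
only.

```python
TOTAL_PIXELS = 640   # pixel values produced per scanline
MAX_BYTES = 80       # byte-sized memory accesses available per scanline


def describe_mode(columns, scaling, depth):
    """Return width, bytes per scanline and blank pixels for a configuration.

    columns: character columns (8 pixels each)
    scaling: how many output pixel values each screen pixel occupies
    depth:   bits per pixel
    """
    width = columns * 8
    output_pixels = width * scaling
    bytes_per_line = columns * depth
    if output_pixels > TOTAL_PIXELS:
        raise ValueError("too many pixel values for one scanline")
    if bytes_per_line > MAX_BYTES:
        raise ValueError("too many memory accesses for one scanline")
    return {"width": width,
            "bytes": bytes_per_line,
            "blank": TOTAL_PIXELS - output_pixels}


print(describe_mode(26, 3, 1))   # the 26 column configuration at 1 bit per pixel
```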

Enhancement: Character Attributes
---------------------------------

The BBC Micro MODE 7 employs something resembling character attributes to
support teletext displays, but depends on circuitry providing a character
generator. The ZX Spectrum, on the other hand, provides character attributes
as a means of colouring bitmapped graphics. Although such a feature is very
limiting as the sole means of providing multicolour graphics, in situations
where the choice is between low resolution multicolour graphics or high
resolution monochrome graphics, character attributes provide a potentially
useful compromise.

For each byte read, the ULA must deliver 8 pixel values (out of a total of
640) to the video output, doing so by either emptying its pixel buffer on a
pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               B
  Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7

And for a screen mode having 320 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               A
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

These additional bytes could provide colour information for the bitmapped data
in the following character column (of 8 pixels). Since it would be desirable
to apply attribute data to the first column, the initial 8 cycles might be
configured to not produce pixel values.

For an entire character, attribute data need only be read for its first row of
pixels. The subsequent rows would have attribute information applied to them,
although this would require the attribute data to be stored in some kind of
buffer. Thus, the following access pattern would be observed:

  Reads:    A B _ B _ B _ B _ B _ B _ B _ B ...

In modes 3 and 6, the blank display lines could be used to retrieve attribute
data:

  Reads (blank):     A _ A _ A _ A _ A _ A _ A _ A _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
                     ...

See below for a discussion of using this for character data as well.

A whole byte used for colour information for a whole character would result in
a choice of 256 colours, and this might be somewhat excessive. By only reading
attribute bytes at every other opportunity, a choice of 16 colours could be
applied individually to two characters.

  Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  Reads:    B               A               B               -
  Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7

Further reductions in attribute data access, offering 4 colours for every
character in a four character block, for example, might also be worth
considering.

Consider the following configurations for screen modes with a colour depth of
1 bit per pixel for bitmap information:

  Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  -------  ---------  ---------  -------  ------------
  320           40       x2       40         40         256      &5300
  320           40       x2       40         20         16       &5580 -> &5500
  320           40       x2       40         10         4        &56C0 -> &5600
  208           26       x3       26         26         256      &62C0 -> &6200
  208           26       x3       26         13         16       &6460 -> &6400
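
The screen start addresses above follow from adding one set of attribute bytes
per character row to the bitmap storage. The Python sketch below reproduces
them, assuming 256 display lines, 32 character rows, 1 bit per pixel bitmap
data and screen memory ending at &8000; the function names are illustrative.

```python
def attribute_bytes(columns, colours):
    """Attribute bytes per character row for the given choice of colours."""
    bits_per_character = (colours - 1).bit_length()   # 256 -> 8, 16 -> 4, 4 -> 2
    return columns * bits_per_character // 8


def screen_start(columns, colours, lines=256, ram_top=0x8000):
    """Unrounded screen start for bitmap data plus attribute data."""
    rows = lines // 8
    bitmap = columns * lines                  # 1 bit per pixel: 1 byte per column
    attributes = attribute_bytes(columns, colours) * rows
    return ram_top - bitmap - attributes


for columns, colours in ((40, 256), (40, 16), (40, 4), (26, 256), (26, 16)):
    start = screen_start(columns, colours)
    print(columns, colours, "&%04X -> &%04X" % (start, start & 0xFF00))
```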

Enhancement: Text-Only Modes using Character and Attribute Data
---------------------------------------------------------------

In modes 3 and 6, the blank display lines could be used to retrieve character
and attribute data instead of trying to insert it between bitmap data accesses,
but this data would then need to be retained:

  Reads:    A C A C A C A C A C A C A C A C ...
  Reads:    B _ B _ B _ B _ B _ B _ B _ B _ ...

Only attribute (A) and character (C) reads would require screen memory
storage. Bitmap data reads (B) would involve either accesses to memory to
obtain character definition details or could, at the cost of special storage
in the ULA, involve accesses within the ULA that would then free up the RAM.
However, the CPU would not benefit from having any extra access slots due to
the limitations of the RAM access mechanism.

A scheme without caching might be possible. The same line of memory addresses
might be visited over and over again for eight display lines, with an index
into the bitmap data being incremented from zero to seven. The access patterns
would look like this:

  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 0)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 1)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 2)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 3)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 4)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 5)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 6)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 7)

The bandwidth requirements would be the sum of the accesses to read the
character values (repeatedly) and those to read the bitmap data to reproduce
the characters on screen.
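
The scheme without caching can be sketched in Python as a pair of reads per
column. The layout of character definitions (eight consecutive bytes per
character, starting at a base address) and the addresses used in the example
are assumptions made for illustration.

```python
CHAR_HEIGHT = 8   # bytes per character definition (assumed layout)


def scanline_reads(memory, row_address, columns, font_base, index):
    """Return the bitmap bytes for one display line of a character row.

    For each column, one C read fetches the character code and one B read
    fetches byte `index` (0 to 7) of that character's definition.
    """
    pixels = []
    for column in range(columns):
        code = memory[row_address + column]                             # C read
        pixels.append(memory[font_base + code * CHAR_HEIGHT + index])   # B read
    return pixels


def reads_per_scanline(columns):
    """Total memory accesses per display line: one C and one B per column."""
    return 2 * columns


# Hypothetical addresses: character row at &7C00, definitions at &7000.
memory = bytearray(0x8000)
memory[0x7C00] = 65
memory[0x7000 + 65 * CHAR_HEIGHT + 3] = 0x3C
line = scanline_reads(memory, 0x7C00, 40, 0x7000, 3)
```

With 40 columns, this amounts to 80 accesses per display line, which matches
the limit of 80 byte-sized accesses mentioned earlier.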

Enhancement: MODE 7 Emulation using Character Attributes
--------------------------------------------------------

If the scheme of applying attributes to character regions were employed to
emulate MODE 7, in conjunction with the MODE 6 display technique, the
following configuration would be required:

  Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &5ECC -> &5E00
  320           40       25    40         10         4        &5FC6 -> &5F00

Although this requires much more memory than MODE 7 (8500 bytes versus MODE
7's 1000 bytes), it does not need much more memory than MODE 6, and it would
at least make a limited 40-column multicolour mode available as a substitute
for MODE 7.

Using the text-only enhancement with caching of data or with repeated reads of
the same character data line for eight display lines, the storage requirements
would be diminished substantially:

  Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &7A24 -> &7A00
  320           40       25    40         10         4        &7B1E -> &7B00
  320           40       25    40         5          2        &7B9B -> &7B00
  320           40       25    40         0          (2)      &7C18 -> &7C00
  640           80       25    80         40         16       &7448 -> &7400
  640           80       25    80         20         4        &763C -> &7600
  640           80       25    80         10         2        &7736 -> &7700
  640           80       25    80         0          (2)      &7830 -> &7800

Note that the colours describe the locally defined attributes for each
character. When no attribute information is provided, the colours are defined
globally.
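
The storage needed for such text-only screens is simply one byte per character
plus the attribute bits for each character. A Python sketch of the
calculation follows, assuming that screen memory ends at &8000; passing None
for the colours represents the case with no attribute data. The function name
is illustrative.

```python
def text_screen_start(columns, rows, colours, ram_top=0x8000):
    """Unrounded screen start for character data plus attribute data.

    colours: number of locally selectable colours per character (16, 4, 2),
             or None when colours are defined globally.
    """
    bits_per_character = 0 if colours is None else (colours - 1).bit_length()
    characters = columns * rows
    attributes = characters * bits_per_character // 8
    return ram_top - characters - attributes


for columns, colours in ((40, 4), (40, 2), (40, None), (80, 16), (80, None)):
    start = text_screen_start(columns, 25, colours)
    print(columns, colours, "&%04X -> &%04X" % (start, start & 0xFF00))
```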
paul@112 1521
paul@130 1522
Enhancement: Character Generator Support and Vertical Scaling
paul@130 1523
-------------------------------------------------------------
paul@130 1524
paul@130 1525
When generating a picture, the ULA traverses screen memory, obtaining 40 or 80
paul@130 1526
bytes of pixel data for each scanline. It then proceeds to the next row of
paul@130 1527
pixel data for each successive scanline, with the exception of the text modes
paul@130 1528
where scanlines may be blank (for which the row address does not advance).
paul@130 1529
This arrangement provides a conventional bitmapped graphics display.
paul@130 1530
paul@130 1531
However, the ULA could instead facilitate the use of character generators. The
paul@130 1532
principles involved can be demonstrated by the Jafa Mode 7 Mark 2 Display Unit
paul@130 1533
expansion for the Electron which feeds the pixel data from a MODE 4 screen to
paul@130 1534
a SAA5050 character generator to create a MODE 7 display. The solution adopted
paul@130 1535
involves the replication of 40 bytes of character data across as many pixel
paul@130 1536
rows as is necessary for the character generator to receive the appropriate
paul@130 1537
character data for all scanlines in any given character row. If only a single
paul@130 1538
40-byte row of character data were to be present for the first scanline of a
paul@130 1539
character row, the character generator would only produce the first scanline
paul@130 1540
(or the uppermost pixels of the characters) correctly, with the rest of the
paul@130 1541
character shapes being ill-defined.
paul@130 1542
paul@130 1543
Here, the ULA could facilitate the use of memory-efficient character mode
paul@130 1544
representations (such as MODE 7) by holding the row address for a number of
paul@130 1545
scanlines, thus providing the same row of screen data for those scanlines,
paul@130 1546
then advancing to the next row. Visualised in terms of pixel data, it would be
paul@130 1547
like providing a display with a very low vertical resolution. Indeed, being
paul@130 1548
able to reduce the vertical resolution of a display mode by a factor of eight
paul@130 1549
or ten would be equivalent to the above character generation technique in
paul@130 1550
terms of the ULA's screen reading activities.
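
The holding of the row address described above can be illustrated with a
minimal sketch. It assumes a linear arrangement of character data (one row
immediately following another), and the function name, the start address of
&7C00 and the scale factor of ten are all chosen purely for illustration; none
of them describes the behaviour of the existing ULA.

```python
def row_addresses(screen_start, bytes_per_row, scanlines, scale):
    """Return the row start address read for each scanline.

    The row address is held for 'scale' scanlines before advancing by
    'bytes_per_row', as in the vertical scaling scheme described above.
    """
    if scale < 1:
        raise ValueError("scale must be at least 1")
    return [screen_start + (line // scale) * bytes_per_row
            for line in range(scanlines)]


# Hypothetical example: 40-byte character rows at &7C00, ten scanlines each.
addresses = row_addresses(0x7C00, 40, 20, 10)
for line, address in enumerate(addresses):
    print("scanline %2d reads row at &%04X" % (line, address))
```

With a scale of one, the same function describes the conventional behaviour
where each scanline advances to a new row of data.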

By combining this vertical scaling or scanline replication with a circuit
switchable between bitmapped graphics output and character graphics output,
MODE 7 support could be made available, potentially as a hardware option
separate from the ULA.

Enhancement: 40-Column Text Modes by Interleaving Screen and Bitmap Accesses
----------------------------------------------------------------------------

Suggested here: https://stardot.org.uk/forums/viewtopic.php?p=393243#p393243

The ULA could be run in high-bandwidth mode to fetch character codes from
screen memory in one cycle and then to use the character code to look up a
pixel row of a character bitmap, reading that bitmap slice in the following
cycle. The bitmap would be converted to pixel values that would then be
emitted over the subsequent two cycles concurrently with the preparation of
the next character's pixels.

  2MHz cycle: 0 1 2 3 4 5 ...
  Reads:      C B C B C B ...
  Pixels:         a   b   ...

The memory access to bitmap data would be computed as follows, assuming the
normal eight pixel height and single-byte encoding of character bitmaps:

  bitmap address = bitmap table base + (character code * 8) + bitmap row

Each successive pixel row on the screen would expose the appropriate row in
the character bitmap, with this "bitmap row" looping from 0 to 7 repeatedly.
Spacing between character lines could be introduced as already done in MODE 6.
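
The interleaving of character code reads (C) and bitmap reads (B) might be
sketched as follows. The bitmap address formula is the one given above, but
the function names, the screen address of &7C00 and the bitmap table base of
&3000 are assumptions made only for the purposes of illustration.

```python
CHAR_HEIGHT = 8  # bytes per character bitmap, as assumed in the text


def bitmap_address(table_base, char_code, bitmap_row):
    """Apply the formula: table base + (character code * 8) + bitmap row."""
    return table_base + char_code * CHAR_HEIGHT + bitmap_row


def scanline_reads(memory, row_start, table_base, bitmap_row, columns=40):
    """Return (cycle, kind, address) tuples for one scanline.

    Even cycles read a character code (C) from screen memory; odd cycles
    read the matching bitmap slice (B) selected by that code.
    """
    reads = []
    for column in range(columns):
        char_address = row_start + column
        char_code = memory[char_address]
        reads.append((2 * column, "C", char_address))
        reads.append((2 * column + 1, "B",
                      bitmap_address(table_base, char_code, bitmap_row)))
    return reads


# Hypothetical layout: text at &7C00, character bitmaps at &3000.
memory = bytearray(0x8000)
memory[0x7C00:0x7C02] = b"HI"
reads = scanline_reads(memory, 0x7C00, 0x3000, 3, columns=2)
for cycle, kind, address in reads:
    print("cycle %d: %s &%04X" % (cycle, kind, address))
```

Calling the function for bitmap rows 0 to 7 in turn, with the same row start
address, would produce the reads for one complete line of text.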

Enhancement: Compressed Character Data
--------------------------------------

Another observation about text-only modes is that they only need to store a
restricted set of bitmapped data values. Encoding this set of values in a
smaller unit of storage than a byte could possibly help to reduce the amount
of storage and bandwidth required to reproduce the characters on the display.

Enhancement: High Resolution Graphics and Larger Colour Depths
--------------------------------------------------------------

Screen modes with higher resolutions and larger colour depths might be
possible, but this would in most cases involve the allocation of more screen
memory, and the ULA would probably then be obliged to page in such memory for
the CPU to be able to sensibly access it all. Higher resolutions would also
involve a faster pixel clock.

However, we may consider a doubled colour depth, with the higher bandwidth
that this requires being provided by a ULA having an 8-bit data bus to the RAM
and utilising two "page mode" transfers per 2MHz cycle. If such transfers were
to access consecutive bytes in the same memory region (for example, bytes
&3000 and &3001), this would require a change to the arrangement of screen
memory, also incurring changes to the memory map for larger modes:

 (&3000 &3001) (&3010 &3011) ...
 (&3002 &3003) (&3012 &3013)
 ...           ...
 (&300E &300F) (&301E &301F)

If such transfers were to access two adjacent columns of bytes (for example,
bytes &3000 and &3008), this would still require a change in the step size
across the screen memory and incur memory map changes for larger modes, and
the method for programs to update the screen would be more complicated:

 (&3000 &3008) (&3010 &3018) ...
 (&3001 &3009) (&3011 &3019)
 ...           ...
 (&3007 &300F) (&3017 &301F)

However, such transfers could instead map the device address bit that is
toggled between transfers to the most significant system memory address bit.
Thus, bits in adjacent locations within each RAM device would actually reside
in different memory regions:

 (&3000 &B000) (&3008 &B008) ...
 (&3001 &B001) (&3009 &B009)
 ...           ...
 (&3007 &B007) (&300F &B00F)

Since &B000 is &3000 combined with &8000, the latter introducing the asserted
uppermost bit, address &B000 can be considered as &3000 in an upper memory
bank.
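
The three arrangements shown above can be compared by computing the pair of
addresses fetched in a single 2MHz cycle for a given column and pixel row.
This is only a sketch reproducing the diagrams: the function names are
hypothetical, and columns and rows are simply counted from zero within one
character row of the display.

```python
def consecutive_pair(base, column, row):
    """Pair of consecutive bytes: (&3000 &3001), (&3002 &3003), ..."""
    first = base + column * 16 + row * 2
    return first, first + 1


def adjacent_column_pair(base, column, row):
    """Pair from adjacent 8-byte columns: (&3000 &3008), (&3001 &3009), ..."""
    first = base + column * 16 + row
    return first, first + 8


def banked_pair(base, column, row):
    """Pair split across banks by the top address bit: (&3000 &B000), ..."""
    first = base + column * 8 + row
    return first, first | 0x8000


for name, scheme in (("consecutive", consecutive_pair),
                     ("adjacent columns", adjacent_column_pair),
                     ("banked", banked_pair)):
    low, high = scheme(0x3000, 1, 7)
    print("%-16s (&%04X &%04X)" % (name, low, high))
```

Only the banked arrangement preserves the existing step of eight bytes between
columns in the lower bank, which is why programs written for the existing
layout would be least affected by it.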

Other mechanisms might be employed to allow programs to access the uppermost
bank, but the ULA would be able to access it trivially and unconditionally.

Enhancement: Genlock Support
----------------------------

The ULA generates a video signal in conjunction with circuitry producing the
output features necessary for the correct display of the screen image.
However, it appears that the ULA drives the video synchronisation mechanism
instead of reacting to an existing signal. Genlock support might be possible
if the ULA were made to be responsive to such external signals, resetting its
address generators upon receiving synchronisation events.

Enhancement: Improved Sound
---------------------------

The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro employs &FE40-&FE4F (the system VIA) for
sound control, and an enhanced ULA could adopt this interface.

The BBC Micro uses the SN76489 chip to produce sound, and the entire
functionality of this chip could be emulated for enhanced sound, with a subset
of the functionality exposed via the &FE*6 interface.

See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
See: http://www.smspower.org/Development/SN76489
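
As an indication of what emulating the tone channels would involve, the
following sketch relates the 10-bit tone divider of the SN76489 to the output
frequency, using the relationship given in the references above (the input
clock divided by 32 times the divider). The 4MHz clock is that of the BBC
Micro arrangement and is an assumption as far as any enhanced ULA is
concerned; the function names are hypothetical.

```python
SN76489_CLOCK_HZ = 4000000  # assumption: clock as supplied in the BBC Micro


def tone_frequency(divider, clock_hz=SN76489_CLOCK_HZ):
    """Return the tone frequency in Hz for a 10-bit divider value."""
    if not 1 <= divider <= 0x3FF:
        raise ValueError("divider must be in the range 1 to 1023")
    return clock_hz / (32.0 * divider)


def tone_divider(frequency_hz, clock_hz=SN76489_CLOCK_HZ):
    """Return the nearest divider for a frequency, limited to 10 bits."""
    divider = int(round(clock_hz / (32.0 * frequency_hz)))
    return max(1, min(0x3FF, divider))


divider = tone_divider(440.0)
print("divider %d gives %.2f Hz" % (divider, tone_frequency(divider)))
```

A divider of zero is excluded here because its treatment differs between
variants of the chip.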

Enhancement: Waveform Upload
----------------------------

As with a hardware sprite function, waveforms could be uploaded or referenced
using memory locations as registers that reference memory regions.

Enhancement: Sound Input/Output
-------------------------------

Since the ULA already controls audio input/output for cassette-based data, it
would have been interesting to entertain the idea of sampling and output of
sounds through the cassette interface. However, a significant amount of
circuitry is employed to process the input signal for use by the ULA and to
process the output signal for recording.

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11

Enhancement: BBC ULA Compatibility
----------------------------------

Although some new ULA functions could be defined in a way that is also
compatible with the BBC Micro, the BBC ULA is itself incompatible with the
Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
map, where it controls various functions specific to the 6845 video
controller; &FE08-F is reserved for the serial controller. It therefore
becomes possible to disregard compatibility where compatibility is already
disregarded for a particular area of functionality.

&FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE08-F on the
Electron) and other system-specific functions. Since the location usage is
generally incompatible, this region could be reused for other purposes.

Enhancement: Increased RAM, ULA and CPU Performance
---------------------------------------------------

More modern implementations of the hardware might feature faster RAM coupled
with an increased ULA clock frequency in order to increase the bandwidth
available to the ULA and to the CPU in situations where the ULA is not needed
to perform work. A ULA employing a 32MHz clock would be able to complete the
retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
to access the RAM for the following 250ns even in display modes requiring the
retrieval of a byte for the display every 500ns. The CPU could, subject to
timing issues, run at 2MHz even in MODEs 0, 1 and 2.
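
The arithmetic behind these figures can be checked with a short sketch. It
assumes that fetching one byte occupies eight ULA clock periods, this being
inferred from the existing 16MHz clock and the rate of one byte per 2MHz
(500ns) cycle; the constant and function names are hypothetical.

```python
CLOCKS_PER_BYTE = 8        # assumption: 16MHz clock, one byte per 500ns
DISPLAY_PERIOD_NS = 500.0  # a display byte is needed every 500ns


def byte_fetch_ns(clock_mhz):
    """Time taken by the ULA to fetch one byte at the given clock rate."""
    return CLOCKS_PER_BYTE * 1000.0 / clock_mhz


def cpu_window_ns(clock_mhz):
    """Time left for CPU access to RAM in each 500ns display period."""
    return DISPLAY_PERIOD_NS - byte_fetch_ns(clock_mhz)


for clock_mhz in (16, 32):
    print("%dMHz: fetch %.0fns, CPU window %.0fns"
          % (clock_mhz, byte_fetch_ns(clock_mhz), cpu_window_ns(clock_mhz)))
```

At 16MHz the fetch consumes the whole period, leaving no window for the CPU,
whereas at 32MHz half of each period becomes available.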

A scheme such as that described above would have a similar effect to the
scheme employed in the BBC Micro, although the latter made use of RAM with a
higher bandwidth in order to complete memory transfers within 250ns and thus
permit the CPU to run continuously at 2MHz.

Higher bandwidth could potentially be used to implement exotic features such
as RAM-resident hardware sprites or indeed any feature demanding RAM access
concurrent with the production of the display image.

Enhancement: Multiple CPU Stacks and Zero Pages
-----------------------------------------------

The 6502 maintains a stack for subroutine calls and register storage in page
&01. Although the stack pointer can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would
be able to coexist without overwriting each other's stacks.

One way that this issue could be alleviated would involve the provision of a
facility to redirect accesses to page &01 to other areas of memory. The ULA
would provide a register that defines a physical page for the use of the CPU's
"logical" page &01, and upon any access to page &01 by the CPU, the ULA would
change the asserted address lines to redirect the access to the appropriate
physical region.

By providing an 8-bit register, mapping to the most significant byte (MSB) of
a 16-bit address, the ULA could then replace any MSB equal to &01 with the
register value before the access is made. Where multiple programs coexist,
upon switching programs, the register would be updated to point the ULA to the
appropriate stack location, thus providing a simple memory management unit
(MMU) capability.

In a similar fashion, zero page accesses could also be redirected so that code
could run from sideways RAM and have zero page operations redirected to "upper
memory" - for example, to page &BE (with stack accesses redirected to page
&BF, perhaps) - thereby permitting most CPU operations to occur without
inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
CPU as it contends with the ULA for memory access.
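
The address translation involved in redirecting the stack and zero page might
be modelled as follows. This is a sketch of the idea only: the class and
method names are hypothetical, and pages &BE and &BF are merely the examples
suggested above.

```python
class PageRedirector:
    """Model of 8-bit page registers replacing the MSB of CPU addresses."""

    def __init__(self):
        # Logical page -> physical page; pages not listed are left unchanged.
        self.page_map = {}

    def set_page(self, logical_page, physical_page):
        """Load the register redirecting one logical page."""
        self.page_map[logical_page & 0xFF] = physical_page & 0xFF

    def translate(self, address):
        """Return the physical address asserted for a CPU address."""
        page = (address >> 8) & 0xFF
        physical_page = self.page_map.get(page, page)
        return (physical_page << 8) | (address & 0xFF)


mmu = PageRedirector()
mmu.set_page(0x00, 0xBE)  # zero page redirected to page &BE
mmu.set_page(0x01, 0xBF)  # stack page redirected to page &BF

for address in (0x0070, 0x01FD, 0x3000):
    print("&%04X -> &%04X" % (address, mmu.translate(address)))
```

Switching between coexisting programs would then amount to calling set_page
with a different physical page for each program's stack.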

Such facilities could also be provided by a separate circuit between the CPU
and ULA in a fashion similar to that employed by a "turbo" board, but unlike
such boards, no additional RAM would be provided: all memory accesses would
occur as normal through the ULA, albeit redirected when configured
appropriately.

ULA Pin Functions
-----------------

The functions of the ULA pins are described in the Electron Service Manual. Of
interest to video processing are the following:

  CSYNC (low during horizontal or vertical synchronisation periods, high
         otherwise)

  HS (low during horizontal synchronisation periods, high otherwise)

  RED, GREEN, BLUE (pixel colour outputs)

  CLOCK IN (a 16MHz clock input, 4V peak to peak)

  PHI OUT (a clock signal for the CPU, running at 1MHz or 2MHz, or stopped)

More general memory access pins:

  RAM0...RAM3 (data lines to/from the RAM)

  RA0...RA7 (address lines for sending both row and column addresses to the RAM)

  RAS (row address strobe setting the row address on a negative edge - see the
       timing notes)

  CAS (column address strobe setting the column address on a negative edge -
       see the timing notes)

  WE (sets write enable with logic 0, read with logic 1)

  ROM (select data access from ROM)

CPU-oriented memory access pins:

  A0...A15 (CPU address lines)

  PD0...PD7 (CPU data lines)

  R/W (indicates CPU write with logic 0, CPU read with logic 1)

Interrupt-related pins:

  NMI (CPU request for uninterrupted 1MHz access to memory)

  IRQ (signal event to CPU)

  POR (power-on reset, resetting the ULA on a positive edge and asserting the
       CPU's RST pin)

  RST (master reset for the CPU signalled on power-up and by the Break key)

Keyboard-related pins:

  KBD0...KBD3 (keyboard inputs)

  CAPS LOCK (control status LED)

Sound-related pins:

  SOUND O/P (sound output using internal oscillator)

Cassette-related pins:

  CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)

  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)

  CAS RC (detect high tone)

  CAS MO (motor relay output)

  ÷13 IN (~1200 baud clock input)

ULA Socket
----------

The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.

References
----------

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm

About this Document
-------------------

The most recent version of this document and accompanying distribution should
be available from the following location:

http://hgweb.boddie.org.uk/ULA

Copyright and licence information can be found in the docs directory of this
distribution - see docs/COPYING.txt for more information.