The Acorn Electron ULA
======================

Principal Design and Feature Constraints
----------------------------------------

The features of the ULA are limited in sophistication by the amount of time
and resources that can be allocated to each activity supporting the
fundamental features and obligations of the unit. Maintaining a screen display
based on the contents of RAM itself requires the ULA to have exclusive access
to various hardware resources for a significant period of time.

Whilst other elements of the ULA can in principle run in parallel with the
display refresh activity, they cannot also access the RAM at the same time.
Consequently, other features that might use the RAM must accept a reduced
allocation of that resource in comparison to a hypothetical architecture where
concurrent RAM access is possible at all times.

Thus, the principal constraint for many features is bandwidth. The duration of
access to hardware resources is one aspect of this; the rate at which such
resources can be accessed is another. For example, the RAM is not fast enough
to support access more frequently than one byte per 2MHz cycle, and for screen
modes involving 80 bytes of screen data per scanline, there are no free cycles
for anything other than the production of pixel output during the active
scanline periods.

Another constraint is imposed by the method of RAM access provided by the ULA.
The ULA is able to access RAM by fetching 4 bits at a time and thus managing
to transfer 8 bits within a single 2MHz cycle, this being sufficient to
provide display data for the most demanding screen modes. However, this
mechanism's timing requirements are beyond the capabilities of the CPU when
running at 2MHz.

Consequently, the CPU will only ever be able to access RAM via the ULA at
1MHz, even when the ULA is not accessing the RAM. Fortunately, when needing to
refresh the display, the ULA is still able to make use of the idle part of
each 1MHz cycle (or, rather, the idle 2MHz cycle unused by the CPU) to access
the RAM itself at a rate of 1 byte per 1MHz cycle (or 1 byte every other 2MHz
cycle), thus supporting the less demanding screen modes.

Timing
------

According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
of which are used to generate pixel data. At 50Hz, this means that 128 cycles
are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
312 ~= 128 cycles). This is consistent with the observation that each scanline
requires at most 80 bytes of data, and that the ULA is apparently busy for 40
out of 64 microseconds in each scanline.
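The scanline arithmetic above can be checked with a short Python sketch. It
assumes a 2MHz clock, 50 fields per second and 312 lines per field, as quoted
from the Advanced User Guide; the constant names are illustrative only.

```python
# Scanline arithmetic, assuming a 2MHz RAM access clock, 50 fields per second
# and 312 scanlines per field.
CLOCK_HZ = 2_000_000
FIELDS_PER_SECOND = 50
LINES_PER_FIELD = 312
CYCLES_PER_US = CLOCK_HZ / 1_000_000       # 2 cycles per microsecond

cycles_per_field = CLOCK_HZ // FIELDS_PER_SECOND        # 40000
cycles_per_line = cycles_per_field / LINES_PER_FIELD    # about 128.2
line_us = 128 / CYCLES_PER_US      # duration of 128 cycles: 64us
busy_us = 80 / CYCLES_PER_US       # 80 bytes at one byte per cycle: 40us

print(cycles_per_field, round(cycles_per_line), line_us, busy_us)
```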

(In fact, since the ULA is seeking to provide an image for an interlaced
625-line display, there are two "fields" involved, one providing 312
scanlines and one providing 313 scanlines. See below for a description of the
video system.)

Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
each providing two bits of each byte) using two half-byte accesses within the
500ns period of the 2MHz clock to complete each access operation. Since the
CPU and ULA have to take turns in accessing the RAM in MODE 4, 5 and 6, the
CPU must effectively run at 1MHz (since every other 500ns period involves the
ULA accessing RAM) during transfers of screen data.

The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle lasts
62.5ns. To access the memory, the following patterns corresponding to 16MHz
cycles are required:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:        ...F  ...S      ...F  ...S ...
           ~WE:        W               W          ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a.b.    c.      a.b.    c.    ...
      Data ops:  s         f     s         f      ...

       PHI OUT:  ----\_______/------------------- ...
     CPU (RAM):      .....L  ....D                ...
           RnW:      .....R                       ...

       PHI OUT:  ----\_______/-------\_______/--- ...
     CPU (ROM):  D   .....L  ....D   .....L  .... ...
           RnW:      .....R          .....R       ...

~RAS must be high for 100ns, ~CAS must be high for 50ns.
~RAS must be low for 150ns, ~CAS must be low for 90ns.
Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.

Here, "A" and "B" respectively indicate the row and first column addresses
being latched into the RAM (on a negative edge for ~RAS and ~CAS
respectively), and "C" indicates the second column address being latched into
the RAM. Presumably, the first and second half-bytes can be read at "F" and
"S" respectively, and the row and column addresses must be made available at
"a" and "b" (and "c") respectively at the latest. The TM4164EC4 datasheet
suggests that the addresses can be made available as the ~RAS and ~CAS levels
are brought low. Data can be read at "f" and "s" for the first and second
half-bytes respectively.

The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that ~RAS must be
held low for at least 150ns and that ~CAS must be held low for at least 90ns
before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
is 1.5 cycles.
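A small Python sketch can express these access times in 16MHz cycles and
compare them with the spacings read off the diagram. The spacings are
transcribed from the text ("A" to "F" and so on), and treating each access
time as a simple minimum is an assumption rather than a full model of the
datasheet's timing parameters.

```python
# Compare DRAM access times with event spacings from the timing diagram,
# measured in 16MHz cycles.
CYCLE_NS = 1e9 / 16_000_000   # 62.5ns per 16MHz cycle

T_RAC_NS = 150   # row address access time (maximum), TM4164EC4-15
T_CAC_NS = 90    # column address access time (maximum)

def to_cycles(ns: float) -> float:
    """Express a duration in nanoseconds as a number of 16MHz cycles."""
    return ns / CYCLE_NS

# (spacing in cycles taken from the diagram, access time it must cover)
schedule = {
    "A to F": (2.5, T_RAC_NS),   # row address latched -> first half-byte
    "B to F": (1.5, T_CAC_NS),   # first column address -> first half-byte
    "C to S": (1.5, T_CAC_NS),   # second column address -> second half-byte
}

for name, (cycles, required_ns) in schedule.items():
    needed = to_cycles(required_ns)
    print(f"{name}: {cycles} cycles, {needed:.2f} needed, ok={cycles >= needed}")
```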

Note that the Service Manual refers to the negative edge of RAS and CAS, but
the datasheet for the similar TM4164EC4 product shows latching on the negative
edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
"page mode" provides the appropriate behaviour for that particular product.

The CPU, when accessing the RAM alone, apparently does not make use of the
vacated "slot" that the ULA would otherwise use (when interleaving accesses in
MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
accessing ROM (and potentially sideways RAM). The principal limitation is the
amount of time needed between issuing an address and receiving an entire byte
from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
4 cycles that would be required for 2MHz operation.

Write operations expose some uncertainty about the relationship between the
ULA's RAM access schedule and the PHI OUT clock. The Service Manual shows PHI
IN (which should be the ULA's PHI OUT signal) as being synchronised with ~RAS.
Since the CPU makes its address available potentially as late as 140ns after
its PHI2 clock goes low (this clock being broadly similar to PHI OUT), it
would make no sense to expect the ULA to be able to perform a memory access
immediately. What seems more likely is that the CPU makes data available, and
this is written during the next 2MHz cycle.

For the CPU, "L" indicates the point at which an address is taken from the CPU
address bus, following a negative edge of PHI OUT, with "D" being the point at
which data may be asserted for writing, following a positive edge of PHI OUT.
For RAM accesses, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven
low for writing or high for reading, and thus propagates RnW from the CPU,
this must be done before data is retrieved or, according to the TM4164EC4
datasheet, as late as the point at which the column address is presented and
~CAS brought low.

It must be concluded that where accesses are interleaved between the CPU and
ULA, the CPU access begins concurrently with the ULA access, with the CPU
address and data retained by the ULA, and after the ULA access, the rest of
the CPU transaction occurs in the following 2MHz cycle.

See: Acorn Electron Advanced User Guide
See: Acorn Electron Service Manual
     http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
See: One of the Most Popular 65,536-Bit (64K) Dynamic RAMs The TMS 4164
     http://smithsonianchips.si.edu/augarten/p64.htm
See: https://www.mups.co.uk/project/hardware/acorn_electron/
See: Rockwell R650X and R651X Microprocessors (CPU)
See: http://wilsonminesco.com/6502primer/

A Note on 8-Bit Wide RAM Access
-------------------------------

It is worth considering the timing when 8 bits of data can be obtained at once
from the RAM chips:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
   8 MHz cycle:  0   1   2   3   0   1   2   3    ...
                 /-\_/-\_/-\_/-\_/-\_/-\_/-\_/-\_ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-------\_______/-------\_______ ...
Address events:      A   B           A   B        ...
   Data events:          ...E            ...E     ...
           ~WE:          W               W        ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1       0       1       0        ...

   Address ops:     a.  b.          a.  b.        ...
      Data ops:            f     s         f      ...

       PHI OUT:  ----\_______/-------\_______/--- ...
           CPU:  D   .....L  ....D   .....L  .... ...
           RnW:      .....R          .....R        ...

Here, "E" indicates the availability of an entire byte.

Since only one fetch is required per 2MHz cycle, instead of two fetches for
the 4-bit wide RAM arrangement, it seems likely that longer 8MHz cycles could
be used to coordinate the necessary signalling.

Another conceivable simplification from using an 8-bit wide RAM access channel
with a single access within each 2MHz cycle is the possibility of allowing the
CPU to signal directly to the RAM instead of having the ULA perform the access
signalling on the CPU's behalf. Note that it is this more leisurely signalling
that would allow the CPU to conduct accesses at 2MHz: the "compressed"
signalling being beyond the capabilities of the CPU.

Note that 16MHz cycles would still be needed for the pixel clock in MODE 0,
which needs to output eight pixels per 2MHz cycle, producing 640 monochrome
pixels per 80-byte line.

An obvious consideration with regard to 8-bit wide access is whether the ULA
could still conduct the "compressed" signalling for its own RAM accesses:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:        ...1  ...2      ...1  ...2 ...
           ~WE:        W               W          ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a.b.    c       a.b.    c     ...
      Data ops:  s         f     s         f      ...

       PHI OUT:  ----\_______/-------\_______/--- ...
           CPU:  D   .....L  ....D   .....L  .... ...
           RnW:      .....R          .....R        ...

Here, "1" and "2" in the data events correspond to whole byte accesses,
effectively upgrading the half-byte "F" and "S" events in the existing ULA
arrangement.

Although the provision of access for the CPU would adhere to the relevant
timing constraints, providing only one byte per 2MHz cycle, the ULA could
obtain two bytes per cycle. This would then free up bandwidth for the CPU in
screen modes where the ULA would normally be dominant (MODE 0 to 3), albeit at
the cost of extra buffering. Such buffering could also be done for modes where
the bandwidth is shared (MODE 4 to 6), consolidating pairs of ULA accesses into
single cycles and freeing up an extra cycle for CPU accesses.

A further consideration is whether the CPU and ULA could access the memory on
interleaved 4MHz cycles, thus replicating the arrangement used by the CPU and
Video ULA on the BBC Micro. One potential obstacle is that the apparent 4MHz
access rate employed by the ULA does not involve the complete process for
accessing the RAM: upon setting up the address and issuing the ~RAS signal,
the ULA is able to make a pair of column accesses on the same "row" of memory,
effectively achieving an average access rate of 4MHz in an 8-bit
configuration.

However, if arbitrary pairs of column accesses were to be attempted, as would
be required by CPU and ULA interleaving, the ~RAS signal would need to be
re-issued with different addresses being set up. This would expand the time to
access a memory location to beyond the period of a 4MHz cycle, making it
impossible to employ interleaved accesses at such a rate.

In conclusion, a strict interleaving strategy is not possible, but by using
pixel data buffering and employing two ULA accesses per 2MHz cycle to obtain
two bytes in that cycle, each adjacent 2MHz cycle can be given to the CPU,
thus achieving an effective throughput during display update periods of 3
bytes for every pair of cycles (2 bytes for the ULA, 1 byte for the CPU), and
thus 1.5 bytes per cycle, giving an illusion of 3MHz access to RAM.
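The throughput argument can be sketched by counting bytes over a repeating
pattern of 2MHz cycles. The two patterns below are hypothetical models of the
existing MODE 4 to 6 arrangement and of the buffered 8-bit scheme described
above; they ignore blanking periods and any buffering overhead.

```python
from fractions import Fraction

def bytes_per_cycle(pattern):
    """Average bytes per 2MHz cycle for a repeating pattern.

    Each entry is (ula_bytes, cpu_bytes) for one 2MHz cycle.
    """
    total = sum(ula + cpu for ula, cpu in pattern)
    return Fraction(total, len(pattern))

# Existing MODE 4 to 6 arrangement: ULA and CPU alternate, one byte each.
existing = [(1, 0), (0, 1)]
# Hypothetical 8-bit scheme: two buffered ULA bytes (same row), then a CPU cycle.
buffered = [(2, 0), (0, 1)]

for name, pattern in (("existing", existing), ("buffered", buffered)):
    rate = bytes_per_cycle(pattern)
    # One byte per 2MHz cycle corresponds to an apparent 2MHz access rate.
    print(f"{name}: {rate} bytes/cycle, apparent {float(rate) * 2:g}MHz")
```

Using exact fractions keeps the 3/2 ratio visible rather than hiding it in a
floating-point value.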

Some other considerations apply to introducing 8-bit wide access. The ULA
employs four pins for data transfer to and from the memory devices (RAM0..3),
and obviously another four pins would be needed in an 8-bit wide scheme.
However, there may have been a physical limitation on the number of pins
permissible on a ULA package or the device's socket. This would necessitate
the reassignment of pins, although few are readily available for such
reassignment.

One approach might involve connecting the RAM devices to the CPU data bus,
with each line connecting to a different RAM chip. The signalling of the RAM
would remain under the control of the ULA, thus preventing the RAM devices
from interfering with other memory transfer operations, with the ROM
signalling also remaining under the ULA's control. One potential disadvantage
of this scheme would be the elimination of the separate data paths between
the CPU and ROM and between the ULA and RAM.

Another approach might involve reclaiming the keyboard input pins (KBD0..3) as
data pins for ULA access to RAM. This would necessitate the reorganisation of
the keyboard interface, perhaps integrating the keyboard matrix more directly
as a kind of ROM device. A bus transceiver could be used to isolate the
keyboard inputs, with a pin being used to control the transceiver, since the
keyboard data lines are pulled high. In effect, the transceiver would act as a
kind of output enable for the keyboard.

To make the matrix appear within the sideways ROM region of the memory map,
A15 would need to be set to a high value and A14 to a low value. Signals A13
to A0 would then be brought low to select the appropriate column, with the
individual key states being made available via data lines, perhaps D3 to D0.
This mostly retains the existing addressing arrangement and scanning
mechanism. Internally, the ULA would continue to enable access to the keyboard
through the ROM paging mechanism, but instead of integrating separate data
pins into the CPU's data path, it would integrate the keyboard inputs using
the transceiver.
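Assuming that column n is selected by bringing address line An low while the
other lines in A13 to A0 are held high (one reading of the description above),
the addresses involved can be computed as follows. The function name and this
column-to-line convention are illustrative assumptions.

```python
# Hypothetical helper: an address in the sideways ROM region (&8000-&BFFF,
# so A15 = 1 and A14 = 0) with only address line A<column> brought low.
SIDEWAYS_BASE = 0x8000
COLUMN_LINES = 0x3FFF   # A13..A0

def column_address(column: int) -> int:
    if not 0 <= column <= 13:
        raise ValueError("column must correspond to one of A0..A13")
    return SIDEWAYS_BASE | (COLUMN_LINES & ~(1 << column))

for column in (0, 1, 13):
    print(f"column {column:2}: &{column_address(column):04X}")
```

Every such address falls within the 16K sideways region, which is why the
existing scheme occupies so much of the memory map when the keyboard is paged
in.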

Enhancement: Keyboard Matrix Scanning
-------------------------------------

The keyboard scanning mechanism is presumably designed to be as inexpensive as
possible, being driven by software and avoiding extra logic, but at the
expense of occupying large regions of the memory map when paged in. A more
efficient mapping of the keyboard columns could possibly be done using
decoders such as the 74xx138 part which permits the decoding of three inputs
to select one of eight outputs. Using two of these parts, six address lines
would be dedicated to the keyboard columns as follows:

  A5...A3 select up to eight columns via one decoder
  A2...A0 select up to eight columns via another decoder

In this arrangement, only one of the two ranges of pins would be used at any
given time. If the ULA were to require a certain combination of the remaining
address bits, a region as small as 64 bytes could be dedicated to the
keyboard.

A more efficient arrangement could be used by introducing logic that allows
the decoders to work together to address the keyboard:

  A2...A0 select up to eight columns via both decoders
  A3 would enable one decoder if low and the other decoder if high

With ULA constraints on the remaining address bits, a 16-byte region could be
used to represent the keyboard.
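The combined decoder arrangement can be modelled in Python to check that each
of the 16 offsets in such a region selects exactly one column line. The
function names are hypothetical, and the decoder model covers only the
behaviour relied upon here (one active-low output when enabled, all outputs
high otherwise), not the full set of 74xx138 enable inputs.

```python
def decode_138(select: int, enabled: bool) -> list[int]:
    """Simplified 74xx138: when enabled, output number `select` is driven low
    (0) and the rest stay high (1); when disabled, all outputs are high."""
    return [0 if enabled and out == select else 1 for out in range(8)]

def column_lines(offset: int) -> list[int]:
    """Combined arrangement: A2..A0 feed both decoders, and A3 enables one
    decoder when low and the other when high (16 column lines in total)."""
    select = offset & 0b111
    a3 = (offset >> 3) & 1
    return decode_138(select, enabled=(a3 == 0)) + decode_138(select, enabled=(a3 == 1))

def selected_column(lines: list[int]):
    """Return the single column line driven low, or None if not exactly one."""
    low = [index for index, level in enumerate(lines) if level == 0]
    return low[0] if len(low) == 1 else None

# Region sizes implied by the number of address lines each scheme consumes.
print("two independent decoders:", 2 ** 6, "bytes; combined:", 2 ** 4, "bytes")
for offset in (0x0, 0x5, 0x8, 0xD):
    print(f"offset &{offset:X} -> column {selected_column(column_lines(offset))}")
```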

A further refinement might involve combining the existing columns into groups
of eight keys. This would reduce the number of columns to seven, requiring
only three address lines, with all eight data lines being used to read the
matrix.

On the BBC Micro, the system 6522 VIA is used to monitor and read from the
keyboard. The memory locations involved with this chip are located in the
region from &FE40 to &FE7F inclusive, although the memory is allocated in a
way that is appropriate to operate that chip, as opposed to merely exposing
the keyboard matrix.

Enhancement: Hardware Device Selection
--------------------------------------

An alternative to the existing, rather cumbersome, sideways ROM mapping of the
keyboard might involve making it accessible via a hardware-related memory page
like page FE. With ULA addresses confined to FE0x, and with the ULA itself
having to trap accesses to page FE, the page selection signal might be brought
out of the ULA instead of any dedicated signal for the keyboard. Various
address lines corresponding to A7 through A4, or a subset of these, could be
fed into a decoder to permit the selection of other devices, with the keyboard
being one of these.

Meanwhile, a more efficient keyboard mapping using the above matrix
enhancement would permit the different keyboard columns to appear as a group
of sixteen or eight bytes. Thus:

  A15...A8 select page FE
   A7...A4 select a device or peripheral
   A3...A0 select a register or keyboard column

Conceivably, devices such as sound generators could be mapped to device
regions.
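The proposed page FE layout amounts to splitting an address into bit fields.
A minimal sketch, assuming the field boundaries listed above; the function
name is hypothetical, and the device numbers used here are purely illustrative
(no particular assignment for the keyboard is implied).

```python
def decode_page_fe(address: int):
    """Split a page FE address into (device, register); None if outside page FE.

    Assumed layout: A15..A8 = &FE, A7..A4 = device, A3..A0 = register or
    keyboard column.
    """
    if ((address >> 8) & 0xFF) != 0xFE:
        return None
    return (address >> 4) & 0xF, address & 0xF

print(decode_page_fe(0xFE05))   # (0, 5): FE0x, where the ULA registers lie
print(decode_page_fe(0xFE3A))   # (3, 10): a hypothetical device 3
print(decode_page_fe(0xBFFE))   # None: not in page FE
```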

CPU Clock Notes
---------------

"The 6502 receives an external square-wave clock input signal on pin 37, which
is usually labeled PHI0. [...] This clock input is processed within the 6502
to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
through two inverters and a push-pull amplifier. The same network of
transistors within the 6502 which generates PHI2 is also tied to PHI1, and
generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
available to external devices is so that they know when they can access the
CPU. When PHI1 is high, this means that external devices can read from the
address bus or data bus; when PHI2 is high, this means that external devices
can write to the data bus."

See: http://lateblt.livejournal.com/88105.html

"The 6502 has a synchronous memory bus where the master clock is divided into
two phases (Phase 1 and Phase 2). The address is always generated during Phase
1 and all memory accesses take place during Phase 2."

See: http://www.jmargolin.com/vgens/vgens.htm

Thus, the inverse of PHI OUT provides the "other phase" of the clock.
Consistent with the CPU making its address available after PHI2 goes low,
"during Phase 1" corresponds to PHI1 being high (PHI2 being low) and "during
Phase 2" corresponds to PHI2 (essentially PHI0) being high.

Bandwidth Figures
-----------------

Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
total lines, with 80 cycles occurring in the active periods of display
scanlines, the following bandwidth calculations can be performed:

Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:    80 cycles * 256 lines
     = 20480 bytes
CPU:    48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:    80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:    48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:    40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:    40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes

Here, the division by 2 for CPU accesses is performed to indicate that the CPU
only uses every other access opportunity even in uncontended periods. See the
2MHz RAM Access enhancement below for bandwidth calculations that consider
this limitation removed.
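The figures above can be reproduced with a short Python sketch encoding the
same assumptions: 128 2MHz cycles per scanline, 312 lines per field, 80 active
cycles per display scanline, the ULA taking 80 or 40 of those cycles depending
on the mode, and the CPU using every other cycle outside the slots it shares
with the ULA. The function and constant names are illustrative.

```python
CYCLES_PER_LINE = 128    # 2MHz cycles per scanline
ACTIVE_CYCLES = 80       # cycles in the active part of a display scanline
TOTAL_LINES = 312

def bandwidth(display_lines: int, ula_cycles: int) -> tuple[int, int]:
    """Return (ULA bytes, CPU bytes) per field under the assumptions above."""
    ula = ula_cycles * display_lines
    shared = ACTIVE_CYCLES - ula_cycles                  # interleaved CPU slots
    spare = (CYCLES_PER_LINE - ACTIVE_CYCLES) // 2       # every other cycle
    cpu = ((shared + spare) * display_lines
           + CYCLES_PER_LINE // 2 * (TOTAL_LINES - display_lines))
    return ula, cpu

modes = {
    "MODE 0, 1, 2": (256, 80),
    "MODE 3": (24 * 8, 80),
    "MODE 4, 5": (256, 40),
    "MODE 6": (24 * 8, 40),
}

print("total", CYCLES_PER_LINE * TOTAL_LINES, "bytes")
for name, (lines, ula_cycles) in modes.items():
    ula, cpu = bandwidth(lines, ula_cycles)
    print(f"{name:<13} ULA {ula:5} bytes  CPU {cpu:5} bytes")
```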

A summary of the CPU bandwidth figures per field is as follows (with the
timing columns described below):

                Standard ULA    % Total   Slowdown  BBC-10s BBC-34s
MODE 0, 1, 2    9728 bytes      24%       4.11      43s     105s
MODE 3          12288 bytes     31%       3.25      34s
MODE 4, 5       19968 bytes     50%       2         20s
MODE 6          19968 bytes     50%       2         20s     50s

The review of the Electron in Practical Computing (October 1983) provides a
concise overview of the RAM access limitations and gives timing comparisons
between modes and BBC Micro performance. In the above, "BBC-10s" is the
measured or stated Electron time for a program taking 10 seconds on the BBC
Micro, whereas "BBC-34s" is the apparently measured Electron time for the
"Persian" program taking 34 seconds to complete on the BBC Micro, with a
"quick" mode presumably switching to MODE 6 using the ULA directly in order to
reduce display bandwidth usage while the program draws to the screen.
Evidently, the measured slowdown for the "Persian" program is lower than the
theoretical slowdown, most likely due to the running time not being entirely
dominated by RAM access performance characteristics.
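The "% Total" and "Slowdown" columns follow from dividing by the theoretical
maximum of 39936 bytes, and the quoted times can be turned into slowdown
ratios in the same way. A sketch using the figures from the table (the names
are illustrative):

```python
TOTAL_BYTES = 39936   # theoretical maximum per field

def theoretical_slowdown(cpu_bytes: int) -> float:
    """Total bandwidth divided by the bandwidth available to the CPU."""
    return TOTAL_BYTES / cpu_bytes

# mode: (CPU bytes per field, time for the 10s program, time for "Persian")
table = {
    "MODE 0, 1, 2": (9728, 43, 105),
    "MODE 3": (12288, 34, None),
    "MODE 4, 5": (19968, 20, None),
    "MODE 6": (19968, 20, 50),
}

for mode, (cpu, t10, t34) in table.items():
    measured = f"{t10 / 10:.2f}" + (f", {t34 / 34:.2f}" if t34 else "")
    print(f"{mode:<13} {cpu / TOTAL_BYTES:4.0%}  theoretical "
          f"{theoretical_slowdown(cpu):.2f}  measured {measured}")
```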

Video Timing
------------

According to 8.7 in the Service Manual, and the PAL Wikipedia page,
approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
(including the "colour burst"), and 1.65µs for the "front porch", totalling
12.05µs and thus leaving 51.95µs for the active video signal for each
scanline. As the Service Manual suggests in the oscilloscope traces, the
display information is transmitted more or less centred within the active
video period since the ULA will only be providing pixel data for 40µs in each
scanline.

Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
each scanline can be divided into 1024 cycles, although only 640 at most are
actively used to provide pixel data. Pixel data production should only occur
within a certain period on each scanline, approximately 262 cycles after the
start of hsync:

  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262

By choosing a number divisible by 8, the RAM access mechanism can be
synchronised with the pixel production. Thus, 256 is a more appropriate start
cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
document) occurs at cycle 0.

To summarise:

  HS signal starts at cycle 0 on each horizontal scanline
  HS signal ends approximately 4µs later at cycle 64
  Pixel data starts approximately 12µs after the end of HS at cycle 256

"Re: Electron Memory Contention" provides measurements that appear consistent
with these calculations.
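The horizontal timing calculation can be sketched in Python as follows. The
PAL durations are those quoted above; obtaining the multiple of 8 by rounding
down is one way of arriving at the value of 256 chosen in the text, not
necessarily the method used by the ULA itself.

```python
LINE_US = 64.0
CYCLE_US = LINE_US / 1024               # 0.0625us, i.e. 62.5ns
SYNC_US, BACK_PORCH_US, FRONT_PORCH_US = 4.7, 5.7, 1.65
PIXEL_US = 40.0

active_us = LINE_US - (SYNC_US + BACK_PORCH_US + FRONT_PORCH_US)  # ~51.95us
margin_us = (active_us - PIXEL_US) / 2                             # ~5.975us
start_us = SYNC_US + BACK_PORCH_US + margin_us                     # ~16.375us
start_cycle = round(start_us / CYCLE_US)                           # 262

# Align to a multiple of 8 (8 cycles of 16MHz per 2MHz RAM access cycle).
aligned_start = start_cycle - start_cycle % 8                      # 256
hs_end_cycle = round(4.0 / CYCLE_US)                               # 64 (4us HS)

print(f"{active_us:.2f} {start_us:.3f} {start_cycle} {aligned_start} {hs_end_cycle}")
```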

The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
lines. Thus, the first visible scanline on the first field of a frame occurs
halfway through the 23rd scanline period measured from the start of vsync
(indicated by "V" in the diagrams below):

                                        10                  20    23
  Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    Line from 1:       0                                          22 3
 Line on screen: .:::::VVVVV:::::                                   12233445566
                  |_________________________________________________|
                           25 line vertical blanking period

In the second field of a frame, the first visible scanline coincides with the
24th scanline period measured from the start of line 313 in the frame:

               310                                                 336
  Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
  Line from 313:       0                                            23 4
 Line on screen: 88:::::VVVVV::::                                    11223344
               288 |                                                 |
                   |_________________________________________________|
                            25 line vertical blanking period

In order to consider only full lines, we might consider the start of each
frame to occur 23 lines after the start of vsync.

Again, it is likely that pixel data production should only occur on scanlines
within a certain period on each frame. The "625/50" document indicates that
only a certain region is "safe" to use, suggesting a vertically centred region
with approximately 15 blank lines above and below the picture. However, the
"PAL TV timing and voltages" document suggests 28 blank lines above and below
the picture. This would centre the 256 lines within the 312 lines of each
field and thus provide a start of picture approximately 5.5 or 5 lines after
the end of the blanking period or 28 or 27.5 lines after the start of vsync.

To summarise:

  CSYNC signal starts at cycle 0
  CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
  First picture line starts approximately 1632µs (25.5 lines) later at cycle 28672
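Converting these line counts into 16MHz cycles is straightforward at 1024
cycles per 64µs line. A sketch, assuming the first picture line begins 28
lines after the start of vsync as discussed above (names are illustrative):

```python
CYCLES_PER_LINE_16MHZ = 1024   # 16MHz cycles per 64us scanline
LINE_US = 64

def lines_to_cycles(lines: float) -> int:
    """Convert a (possibly half) number of scanlines into 16MHz cycles."""
    return round(lines * CYCLES_PER_LINE_16MHZ)

csync_end = lines_to_cycles(2.5)       # end of the 2.5-line field sync
picture_start = lines_to_cycles(28)    # first picture line, 28 lines after vsync
gap_lines = (picture_start - csync_end) / CYCLES_PER_LINE_16MHZ
blanking_ms = 25 * LINE_US / 1000      # 25-line vertical blanking period

print(csync_end, picture_start, gap_lines, gap_lines * LINE_US, blanking_ms)
```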

See: http://en.wikipedia.org/wiki/PAL
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
     http://lipas.uwasa.fi/~f76998/video/modes/
See: PAL TV timing and voltages
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
See: Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
See: Re: Electron Memory Contention
     http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109

RAM Integrated Circuits
-----------------------

Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.

The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM4164 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.

The TM4164EC4 series combines four 64K x 1-bit units into a single package and
appears similar to the TM4164EA4 featured on the Electron's circuit diagram
(in the Advanced User Guide but not the Service Manual). It also has 22 pins,
providing 3 additional inputs and 3 additional outputs over the 16 pins of the
individual 4164-15 modules, presumably allowing concurrent access to the
packaged memory units.

As far as currently available replacements are concerned, the NTE4164 is a
potential candidate: according to the Vetco Electronics entry, it is
supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
parts include the NTE2164 and the NTE6664, both of which appear to have
largely the same performance and connection characteristics. Meanwhile, the
NTE21256 appears to be a 16-pin replacement with four times the capacity that
maintains the single data input and output pins. Using the NTE21256 as a
replacement for all ICs combined would be difficult because of the single bit
output.

Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these are
different and actually appear to be provided under the 41574 product code (but
are listed under 41464-10). These appear to be replacements for the TM4164EC4:
the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
using the same 4 pins for both input and output.

            Pins    I/O pins    Row access  Column access
            ----    --------    ----------  -------------
TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
KM41464AP   18      4           150ns (15)  75ns (15)
NTE21256    16      1 + 1       150ns       75ns
HYB 4164-2  16      1 + 1       150ns       100ns
µPD41464    18      4           120ns (12)  60ns (12)

See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
     https://www.rocelec.com/part/REITM4164EC4-15L
See: Dynamic RAMS
     http://www.unicornelectronics.com/IC/DYNAMIC.html
See: New old stock 8x 4164 chips
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
     http://www.vetco.net/catalog/product_info.php?products_id=2806
See: NTE4164 - IC-NMOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=3680
See: NTE21256 - IC-256K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=2799
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
See: NTE6664 - IC-MOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=5213
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
See: 4164-150: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
See: 41464-10: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1

Interrupts
----------

The ULA generates IRQs (maskable interrupts) according to certain conditions
and these conditions are controlled by location &FE00:

  * Vertical sync (bottom of displayed screen)
  * 50Hz real time clock
  * Transmit data empty
  * Receive data full
  * High tone detect

The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set after an NMI (non-maskable
interrupt) has occurred, since the NMI suspends ULA access to memory; setting
this bit restores the normal function of the ULA.

ROM Paging
----------

Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
mappings exist:

   8    keyboard
   9    keyboard (duplicate)
  10    BASIC ROM
  11    BASIC ROM (duplicate)

Paging in a ROM involves the following procedure:

 1. Assert ROM page enable (bit 3) together with a ROM number n (from 4 to 7)
    in bits 0 to 2, corresponding to ROM number 8+n, such that one of ROMs 12
    to 15 is selected.
 2. Where a ROM numbered from 0 to 7 is to be selected, then set bit 3 to zero
    whilst writing the desired ROM number n in bits 0 to 2.
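
The procedure above can be sketched as a function returning the values
written to bits 0 to 3 of &FE05. This is a sketch under stated assumptions:
the function name is hypothetical, ROM 12 is arbitrarily chosen as the
intermediate selection before a ROM from 0 to 7, and selecting ROMs 8 to 11
with a single write (bit 3 set) extends step 1 beyond the 12 to 15 range it
describes.

```python
# Sketch of the &FE05 ROM paging procedure (bits 0 to 3 only).
ROM_PAGE_ENABLE = 0x08   # bit 3


def paging_writes(rom, intermediate=12):
    """Return the sequence of values to write to &FE05 to select rom."""
    if not 0 <= rom <= 15:
        raise ValueError("ROM number must be from 0 to 15")
    if rom >= 8:
        # Step 1 alone: bit 3 set together with n = rom - 8 in bits 0 to 2.
        # (Using this for ROMs 8 to 11 is an assumption of this sketch.)
        return [ROM_PAGE_ENABLE | (rom - 8)]
    if not 12 <= intermediate <= 15:
        raise ValueError("intermediate ROM must be from 12 to 15")
    # Step 1 selects one of ROMs 12 to 15; step 2 writes n with bit 3 clear.
    return [ROM_PAGE_ENABLE | (intermediate - 8), rom]


print(paging_writes(15))  # [15]
print(paging_writes(3))   # [12, 3]
```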

See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686

Keyboard Access
---------------

The keyboard pages appear to be accessed at 1MHz just like the RAM.

See: https://stardot.org.uk/forums/viewtopic.php?p=254155#p254155

Shadow/Expanded Memory
----------------------

The Electron exposes all sixteen address lines and all eight data lines
through the expansion bus. Using such lines, it is possible to provide
additional memory - typically sideways ROM and RAM - on expansion cards and
through cartridges, although the official cartridge specification provides
fewer address lines and only seeks to provide access to memory in 16K units.

Various modifications and upgrades were developed to offer "turbo"
capabilities to the Electron, permitting the CPU to access a separate 8K of
RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
the ULA through additional logic. However, an enhanced ULA might support
independent CPU access to memory over the expansion bus by allowing itself to
be relieved of providing access to memory, potentially for a range of
addresses, thus letting the CPU communicate with external memory
uninterrupted.

Sideways RAM/ROM and Upper Memory Access
----------------------------------------

Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above (the upper region of memory) at
2MHz independently of any access to RAM that the ULA might be performing. It
only blocks the CPU, by stopping or stalling its clock, if the CPU attempts to
access addresses of &7FFF and below (the lower region of memory) during any
ULA memory access.

Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
CPU clock when the line is low, indicating that the CPU is attempting to
access the lower region of memory.
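
A behavioural sketch of this decision follows. It models only the rule
described above (hold the CPU when A15 is low and the ULA is using the RAM)
and is not a description of the ULA's actual logic; the function name is
hypothetical.

```python
# Behavioural model of the A15-based decision described above.
A15 = 0x8000


def cpu_must_wait(address, ula_using_ram):
    """Return True if the CPU clock would be stopped or stretched."""
    lower_region = (address & A15) == 0   # &0000 to &7FFF
    return lower_region and ula_using_ram


for addr in (0x3000, 0x7FFF, 0x8000, 0xC000):
    print(f"&{addr:04X}", cpu_must_wait(addr, ula_using_ram=True))
```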

Hardware Scrolling (and Enhancement)
------------------------------------

On the standard ULA, &FE02 and &FE03 together hold a screen start address with
9 significant bits (address bits 6 to 14), the least significant 5 bits of the
register pair being unused, thus limiting the scrolling resolution to 64
bytes. An enhanced ULA could support a resolution of 2 bytes using the same
layout of these addresses.

|--&FE02--------------| |--&FE03--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

   XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX

Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change of 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).
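
The arithmetic can be sketched as follows, comparing the 64-byte step of the
standard ULA with an 8-byte step. The helper names are hypothetical, and the
bits per pixel for each mode are taken from the pixel layout table in this
document.

```python
# Screen start granularity arithmetic (hypothetical helper names).
# Bits per pixel for each Electron screen mode.
BPP = {0: 1, 1: 2, 2: 4, 3: 1, 4: 1, 5: 2, 6: 1}


def effective_start(address, resolution):
    """Round a requested screen start down to the register resolution."""
    return address - address % resolution


def horizontal_step_pixels(step_bytes, mode):
    """Pixels moved sideways when the screen start changes by step_bytes.

    Each character cell occupies 8 consecutive bytes, so every 8 bytes of
    change moves the display by one byte column.
    """
    byte_columns = step_bytes // 8
    return byte_columns * (8 // BPP[mode])


print(hex(effective_start(0x3123, 64)), hex(effective_start(0x3123, 2)))
for mode in sorted(BPP):
    print(mode, horizontal_step_pixels(64, mode), horizontal_step_pixels(8, mode))
```

With the standard 64-byte step, the smallest horizontal movement is eight
byte columns, for example 64 pixels in MODE 0.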

One argument for a 2-byte resolution is smooth vertical scrolling, two pixel
rows at a time. A pitfall of changing the screen address by 2 bytes is the
change in the number of lines of the initial and final character rows that
need reading by the ULA, which would need to maintain this state information
(although this is a relatively trivial change). Another pitfall is the
complication that might be introduced to software writing bitmaps of character
height to the screen.

See: http://pastraiser.com/computers/acornelectron/acornelectron.html

Enhancement: Mode Layouts
-------------------------

Merely changing the screen memory mappings in order to have Archimedes-style
row-oriented screen addresses (instead of character-oriented addresses) could
be done for the existing modes, but this might not be sufficiently beneficial,
especially since accessing regions of the screen would involve incrementing
pointers by amounts that are inconvenient on an 8-bit CPU.

However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
reference the vertical screen location below the currently-referenced location
(just as occurs within characters using the existing ULA); instead of
returning to the top of the character row and referencing the next horizontal
location after eight bytes, the address would reference the next character row
and continue to reference locations downwards over the height of the screen
until reaching the bottom; at the bottom, the next location would be the next
horizontal location at the top of the screen.

In other words, the memory layout for the screen would resemble the following
(for MODE 2):

  &3000 &3100       ... &7F00
  &3001 &3101
  ...   ...
  &3007
  &3008
...
  ...                   ...
  &30FF             ... &7FFF

Since there are 256 pixel rows, each column of locations would be addressable
using the low byte of the address. Meanwhile, the high byte would be
incremented to address different columns. Thus, addressing screen locations
would become a lot more convenient and potentially much more efficient for
certain kinds of graphical output.
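
The difference between the two schemes can be sketched by computing the
address of a byte column x (0 to 79) and pixel row y (0 to 255) in a 20K mode
such as MODE 2, assuming the screen starts at &3000 and is not scrolled. The
function names are hypothetical.

```python
# Address calculation for the existing character-oriented layout and the
# proposed column-oriented layout (unscrolled 20K mode at &3000).
SCREEN_START = 0x3000
BYTES_PER_LINE = 80


def character_oriented(x, y):
    """Existing layout: 8-byte cells, 640 bytes per character row."""
    row, line = divmod(y, 8)
    return SCREEN_START + row * BYTES_PER_LINE * 8 + x * 8 + line


def column_oriented(x, y):
    """Proposed layout: the high byte selects the column, the low byte the row."""
    return SCREEN_START + (x << 8) + y


for x, y in ((0, 7), (0, 8), (1, 0), (79, 255)):
    print((x, y), hex(character_oriented(x, y)), hex(column_oriented(x, y)))
```

Moving down from row 7 to row 8 requires adding 633 (&279) in the existing
layout but only 1 in the column-oriented one, while moving right always adds
256, a change to the high byte alone.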

One potential complication with this simplified addressing scheme arises with
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
with the existing ULA) would be achieved by incrementing or decrementing the
screen start address; by one character row, it would involve adding or
subtracting 8. However, the ULA only supports multiples of 64 when changing the
screen start address. Thus, if such a scheme were to be adopted, three
additional bits would need to be supported in the screen start register to
permit character-row scrolling, and six to permit single pixel-row scrolling
(see "Hardware Scrolling (and Enhancement)" for more details). However,
horizontal scrolling would be much improved even under the severe constraints
of the existing ULA: only adjustments of 256 to the screen start address would
be required to produce single-location scrolling of as few as two pixels in
MODE 2 (four pixels in MODEs 1 and 5, eight pixels otherwise).

More disruptive is the effect of this alternative layout on software.
Presumably, compatibility with the BBC Micro was the primary goal of the
Electron's hardware design. With the character-oriented screen layout in
place, system software (and application software accessing the screen
directly) would be relying on this layout to run on the Electron with little
or no modification. Although it might have been possible to change the system
software to use this column-oriented layout instead, this would have incurred
a development cost and caused additional work porting things like games to the
Electron. Moreover, a separate branch of the software from that supporting the
BBC Micro and closer derivatives would then have needed maintaining.

The decision to use the character-oriented layout in the BBC Micro may have
been related to the choice of circuitry and intended to facilitate a
convenient hardware implementation, and by the time the Electron was planned,
it was too late to do anything about this somewhat unfortunate choice.

Pixel Layouts
-------------

The pixel layouts are as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               73 62 51 40
  2             4               7531 6420

Since the ULA reads a half-byte at a time, one might expect it to attempt to
produce pixels for every half-byte, as opposed to handling entire bytes.
However, the pixel layout is not conducive to producing pixels as soon as a
half-byte has been read for a given full-byte location: in 1bpp modes the
first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
data is spread across the entire byte in different ways.

An alternative arrangement might be as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               76 54 32 10
  2             4               7654 3210

Just as the mode layouts were presumably dictated by compatibility with the
BBC Micro, the pixel layouts will have been maintained for similar reasons.
Unfortunately, this layout prevents any optimisation of the ULA for handling
half-byte pixel data generally.
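
To illustrate why the existing layout frustrates half-byte processing, the
sketch below decodes a byte into logical colour values, left to right, under
both the existing and the alternative layouts tabulated above. The function
names are hypothetical. The example byte has only its high nibble set,
standing in for the situation where only the first half-byte has been read.

```python
# Decoding one screen byte into logical colour values (left to right).


def pixels_existing(byte, bpp):
    """Existing layout: the bits of each pixel are interleaved."""
    count = 8 // bpp                      # pixels per byte
    pixels = []
    for p in range(count):
        value = 0
        # Pixel p uses bits 7-p, 7-p-count, ..., most significant first.
        for bit in range(7 - p, -1, -count):
            value = (value << 1) | ((byte >> bit) & 1)
        pixels.append(value)
    return pixels


def pixels_alternative(byte, bpp):
    """Alternative layout: each pixel occupies adjacent bits."""
    count = 8 // bpp
    mask = (1 << bpp) - 1
    return [(byte >> (8 - bpp * (p + 1))) & mask for p in range(count)]


# Only the high nibble (&B) has been read; the low nibble is still zero.
partial = 0xB0
print(pixels_existing(partial, 2))     # [2, 0, 2, 2]: each pixel lacks a bit
print(pixels_alternative(partial, 2))  # [2, 3, 0, 0]: first two pixels final
```

In the existing 2bpp layout every pixel still depends on the low nibble,
whereas the alternative layout lets the first two pixels be produced as soon
as the high nibble arrives.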

Enhancement: The Missing MODE 4
-------------------------------

The Electron inherits its screen mode selection from the BBC Micro, where MODE
3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
however, and they are merely implemented by skipping two scanlines in every
ten after the eight required to produce a character line. Thus, such modes
provide a 25-row display occupying 250 of the 256 scanlines.

In principle, nothing prevents this "text mode" effect being applied to other
modes. The 20-column modes are not well-suited to displaying text, which
leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
2. Although the need for a non-monochrome 40-column text mode is addressed by
MODE 7 on the BBC Micro, the Electron lacks such a mode.

If the 4-colour, 25-row variant of MODE 1 were to be provided, logically it
would occupy MODE 4 instead of the current MODE 4:

  Screen mode  Size (kilobytes)  Colours  Rows  Resolution
  -----------  ----------------  -------  ----  ----------
  0            20                2        32    640x256
  1            20                4        32    320x256
  2            20                16       32    160x256
  3            16                2        25    640x256
  4 (new)      16                4        25    320x256
  4 (old)      10                2        32    320x256
  5            10                4        32    160x256
  6            8                 2        25    320x256

Thus, for increasing mode numbers, the size of each mode would be the same or
less than the preceding mode.

Enhancement: Display Mode Property Control
------------------------------------------

It is rather curious that the ULA supports the mode numbers directly in bits 3
to 5 of &FE07 since these would presumably need to be decoded in order to set
the fundamental properties of the display mode. These properties are as
follows:

 * Screen data retrieval rate: number of fetches per pair of 2MHz cycles
 * Pixel colour depth
 * Text mode vertical spacing

From these, the following properties emerge:

  Property                        Influences
  --------                        ----------
  Character row size (bytes)      Retrieval rate

  Number of character rows        Text mode setting

  Display size (bytes)            Retrieval rate (character row size)
                                  Text mode setting (number of rows)

  Pixel frequency                 Retrieval rate
  Horizontal resolution (pixels)  Colour depth

One can imagine a register bitfield arrangement as follows:

  Field             Values                  Formula
  -----             ------                  -------
  Pixel depth       00: 1 bit per pixel     log2(depth)
                    01: 2 bits per pixel
                    10: 4 bits per pixel

  Retrieval rate     0: twice               2 - fetches per cycle pair
                     1: once

  Text mode enable   0: disable/off         text mode enabled
                     1: enable/on

This arrangement would require four bits, one more than the three (bits 3 to
5) currently used for the mode number. However, one bit in &FE07 is seemingly
inactive and might possibly be reallocated.

The resulting combination of properties would permit all of the existing modes
plus some additional ones, including the missing MODE 4 mentioned above. With
the bitfields above ordered from the most significant bits to the least
significant bits providing the low-level "mode" values, the following table
can be produced:

  Screen mode  Depth Rate   Text  Size (K)  Colours  Rows  Resolution
  -----------  ----- ----   ----  --------  -------  ----  ----------
  0  (0000)    1     twice  off   20        2        32    640x256    (MODE 0)
  1  (0001)    1     twice  on    16        2        25    640x256    (MODE 3)
  2  (0010)    1     once   off   10        2        32    320x256    (MODE 4)
  3  (0011)    1     once   on    8         2        25    320x256    (MODE 6)
  4  (0100)    2     twice  off   20        4        32    320x256    (MODE 1)
  5  (0101)    2     twice  on    16        4        25    320x256
  6  (0110)    2     once   off   10        4        32    160x256    (MODE 5)
  7  (0111)    2     once   on    8         4        25    160x256
  8  (1000)    4     twice  off   20        16       32    160x256    (MODE 2)
  9  (1001)    4     twice  on    16        16       25    160x256
  10 (1010)    4     once   off   10        16       32    80x256
  11 (1011)    4     once   on    8         16       25    80x256

The existing modes would be covered in a way that is incompatible with the
existing numbering, thus requiring a table in software, but additional text
modes would be provided for MODE 1, MODE 5 and MODE 2. An additional two lower
resolution modes would also be conceivable within this scheme, requiring the
stretching of 16MHz pixels by a factor of eight to yield 80 pixels per
scanline. The utility of such modes is questionable and such modes might not
be supported.
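
As a consistency check, the table of low-level "mode" values can be derived
from the bitfields. This sketch assumes that the size column rounds up to
whole kilobytes and that text modes store 25 character rows of 8 lines
(16000 and 8000 bytes, hence 16K and 8K); the function name is hypothetical.

```python
import math

# Decodes the hypothetical 4-bit "mode" value: bits 3-2 pixel depth
# (00: 1bpp, 01: 2bpp, 10: 4bpp), bit 1 retrieval rate (0: twice,
# 1: once), bit 0 text mode. Text modes store 25 rows, others 32 rows.


def mode_properties(value):
    """Return (size in K, colours, rows, horizontal resolution)."""
    depth_field = (value >> 2) & 0b11
    if depth_field == 0b11:
        raise ValueError("depth field 11 is not defined")
    depth = 1 << depth_field                 # bits per pixel
    fetch_once = (value >> 1) & 1
    text = value & 1
    bytes_per_line = 40 if fetch_once else 80
    rows = 25 if text else 32
    size_k = math.ceil(bytes_per_line * 8 * rows / 1024)
    colours = 1 << depth
    width = bytes_per_line * 8 // depth
    return size_k, colours, rows, width


for value in range(12):
    print(value, f"({value:04b})", mode_properties(value))
```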

Enhancement: 2MHz RAM Access
----------------------------

Although the CPU and ULA both access RAM at 2MHz, the CPU, when not competing
with the ULA, only accesses RAM every other 2MHz cycle (as if the ULA still
needed to access the RAM). One useful enhancement would therefore be a
mechanism to let the CPU take over the ULA cycles outside the ULA's period of
activity, comparable to the way the ULA takes over the CPU cycles in MODE 0 to
3.

Thus, the RAM access cycles would resemble the following in MODE 0 to 3:

  Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

In MODE 4 to 6:

  Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

This would improve CPU bandwidth as follows:

                Standard ULA    Enhanced ULA    % Total Bandwidth   Speedup
MODE 0, 1, 2    9728 bytes      19456 bytes     24% -> 49%          2
MODE 3          11968 bytes     23936 bytes     30% -> 60%          2
MODE 4, 5       19968 bytes     29696 bytes     50% -> 74%          1.5
MODE 6          19968 bytes     31936 bytes     50% -> 80%          1.6

(Here, the uncontended total 2MHz bandwidth for a display period would be
39936 bytes, being 128 cycles per line over 312 lines.)

With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
because all access opportunities to RAM are doubled. Meanwhile, in the other
modes, some CPU accesses occur alongside ULA accesses and thus cannot be
doubled, but the CPU bandwidth increase is still significant.

Unfortunately, the mechanism for accessing the RAM is too slow to provide data
within the time constraints of 2MHz operation. There is no time remaining in a
2MHz cycle for the CPU to receive and process any retrieved data once the
necessary signalling has been performed.

The only way for the CPU to be able to access the RAM quickly enough would be
to do away with the double 4-bit access mechanism and to have a single 8-bit
channel to the memory. This would require twice as many 1-bit RAM chips or a
different kind of RAM chip, but it would also potentially simplify the ULA.

The section on 8-bit wide RAM access discusses the possibilities around
changing the memory architecture, also describing the possibility of ULA
accesses achieving two bytes per 2MHz cycle due to the doubling of the memory
channel, leaving every other access free for the CPU during the display period
in MODE 0 to 3...

  Standard display period: UUUUUUUU
  Modified display period: UCUCUCUC

...and consolidating accesses in MODE 4 to 6:

  Standard display period: UCUCUCUC
  Modified display period: UCCCUCCC

Together with the enhancements for non-display periods, such an "Enhanced+ ULA"
would perform as follows:

                Standard ULA    Enhanced+ ULA   % Total Bandwidth   Speedup
MODE 0, 1, 2    9728 bytes      29696 bytes     24% -> 74%          3.1
MODE 3          11968 bytes     31936 bytes     30% -> 80%          2.7
MODE 4, 5       19968 bytes     34816 bytes     50% -> 87%          1.7
MODE 6          19968 bytes     35936 bytes     50% -> 90%          1.8

Of course, the principal enhancement would be the wider memory channel, with
more buffering in the ULA being its contribution to this arrangement.
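
The bandwidth figures can be reproduced with a simple per-scanline model,
sketched below under stated assumptions: 128 2MHz cycles per line, 312 lines
per field, an 80-cycle active region per line, and the CPU access patterns
shown above for each ULA variant. The figures for MODE 0 to 2 and MODE 4 and
5 follow exactly. For MODE 3 and 6 the result depends on the number of lines
on which screen data is fetched, taken here as 200 (25 character rows of 8
lines). The names are hypothetical.

```python
# Per-scanline model of CPU RAM bandwidth per field.
LINES = 312
CYCLES_PER_LINE = 128
TOTAL = LINES * CYCLES_PER_LINE          # 39936

# For each ULA variant: CPU cycles in (active region, rest of line) on a
# line with screen fetches, keyed by bytes fetched per line, followed by
# CPU cycles on a line without fetches.
SCHEMES = {
    "standard":  ({80: (0, 24), 40: (40, 24)}, 64),
    "enhanced":  ({80: (0, 48), 40: (40, 48)}, 128),
    "enhanced+": ({80: (40, 48), 40: (60, 48)}, 128),
}

# Bytes fetched per line and number of lines with fetches.
MODES = {
    "MODE 0, 1, 2": (80, 256),
    "MODE 3": (80, 200),
    "MODE 4, 5": (40, 256),
    "MODE 6": (40, 200),
}


def cpu_bytes(scheme, bytes_per_line, fetch_lines):
    """CPU-accessible RAM cycles (bytes) per field."""
    per_line, idle = SCHEMES[scheme]
    active, rest = per_line[bytes_per_line]
    return fetch_lines * (active + rest) + (LINES - fetch_lines) * idle


for name, (bpl, fetch_lines) in MODES.items():
    results = [cpu_bytes(s, bpl, fetch_lines) for s in SCHEMES]
    percents = [round(100 * b / TOTAL) for b in results]
    print(name, results, percents)
```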

Enhancement: Region Blanking
----------------------------

The problem of permitting character-oriented blitting in programs whilst
scrolling the screen by sub-character amounts could be mitigated by permitting
a region of the display to be blank, such as the final lines of the display.
Consider the following vertical scrolling by 2 bytes that would cause an
initial character row of 6 lines and a final character row of 2 lines:

    6 lines - initial, partial character row
  248 lines - 31 complete rows
    2 lines - final, partial character row
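
The split of a 256-line display into partial and complete character rows can
be sketched as follows, assuming the screen start has moved by the given
number of bytes (and therefore pixel rows) into a character row. The function
name is hypothetical.

```python
def row_split(offset, lines=256):
    """Return (initial partial lines, complete rows, final partial lines)
    for a screen start offset by `offset` bytes into a character row."""
    offset %= 8
    if offset == 0:
        return 0, lines // 8, 0
    initial = 8 - offset
    complete, final = divmod(lines - initial, 8)
    return initial, complete, final


print(row_split(2))  # (6, 31, 2)
print(row_split(6))  # (2, 31, 6)
```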

If a routine were in use that wrote 8 line bitmaps to the partial character
row now split in two, it would be advisable to hide one of the regions in
order to prevent content appearing in the wrong place on screen (such as
content meant to appear at the top "leaking" onto the bottom). Blanking 6
lines would be sufficient, as can be seen from the following cases.

Scrolling up by 2 lines:

    6 lines - initial, partial character row
  240 lines - 30 complete rows
    4 lines - part of 1 complete row
  -----------------------------------------------------------------
    4 lines - part of 1 complete row (hidden to maintain 250 lines)
    2 lines - final, partial character row (hidden)

Scrolling down by 2 lines:

    2 lines - initial, partial character row
  248 lines - 31 complete rows
  ----------------------------------------------------------
    6 lines - final, partial character row (hidden)

Thus, in this case, region blanking would impose a 250 line display with the
bottom 6 lines blank.

See the description of the display suspend enhancement for a way of blanking
lines that is more efficient than merely blanking the palette, since it allows
the CPU to perform useful work during the blanking period.

To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 15 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.
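
A sketch of the single-location encoding follows. Both the encoding (0 to 15
lines per nibble) and the function names are hypothetical, since no such
register exists on the standard ULA.

```python
# Hypothetical blanking control value: upper nibble = lines blanked at the
# top, lower nibble = lines blanked at the bottom (0 to 15 lines each).
def encode_blanking(top, bottom):
    if not (0 <= top <= 15 and 0 <= bottom <= 15):
        raise ValueError("each region must be 0 to 15 lines")
    return (top << 4) | bottom


def visible_lines(value, lines=256):
    """Return the range of scanlines left visible by a blanking value."""
    top, bottom = value >> 4, value & 0x0F
    return range(top, lines - bottom)


value = encode_blanking(0, 6)            # the 250-line case above
print(f"&{value:02X}", len(visible_lines(value)))  # &06 250
```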

Enhancement: Screen Height Adjustment
-------------------------------------

The height of the screen could be configurable in order to reduce screen
memory consumption. This is not quite done in MODE 3 and 6 since the start of
the screen appears to be rounded down to the nearest page, but by reducing the
height by amounts more than a page, savings would be possible. For example:

  Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
  ------------  -----  ------  --------------  ---------------  -------------
  640           1      252     80              320              &3140 -> &3100
  640           1      248     80              640              &3280 -> &3200
  320           1      240     40              640              &5A80 -> &5A00
  320           2      240     80              1280             &3500
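
The table can be reproduced with a few lines of arithmetic, sketched here
assuming (as the table does) that the screen ends at &8000 and that its start
is rounded down to a page (256-byte) boundary. The function name is
hypothetical.

```python
SCREEN_END = 0x8000


def height_saving(bytes_per_line, height, full_height=256):
    """Return (saving in bytes, exact start, page-aligned start)."""
    start = SCREEN_END - bytes_per_line * height
    saving = bytes_per_line * (full_height - height)
    return saving, start, start & ~0xFF


for bpl, height in ((80, 252), (80, 248), (40, 240), (80, 240)):
    saving, start, aligned = height_saving(bpl, height)
    print(bpl, height, saving, f"&{start:04X} -> &{aligned:04X}")
```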

Screen Mode Selection
---------------------

Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, and bit 0
might also be made available for use.
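
Updating the mode field without disturbing the other bits can be sketched as
below. The function names are hypothetical, and where the rest of the value
comes from (for example, a copy of the last value written) is outside the
scope of this sketch.

```python
# Screen mode field (bits 3 to 5) of a value destined for &FE07.
MODE_SHIFT = 3
MODE_MASK = 0b111 << MODE_SHIFT      # &38: bits 3, 4 and 5


def with_mode(value, mode):
    """Return value with its mode field replaced by mode (0 to 7)."""
    if not 0 <= mode <= 7:
        raise ValueError("the mode field is only three bits wide")
    return (value & ~MODE_MASK & 0xFF) | (mode << MODE_SHIFT)


def mode_of(value):
    """Extract the mode field from a value written to &FE07."""
    return (value & MODE_MASK) >> MODE_SHIFT


v = with_mode(0xFF, 6)
print(f"&{v:02X}", mode_of(v))   # &F7 6
```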

Enhancement: Palette Definition
-------------------------------

Since all memory accesses go via the ULA, an enhanced ULA could employ more
specific addresses than &FE*X to perform enhanced functions. For example, the
palette control is done using &FE*8-F and merely involves selecting predefined
colours, whereas an enhanced ULA could support the redefinition of all 16
colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
(colours 8 to 15), where a single byte might provide 8 bits per pixel colour
specifications similar to those used on the Archimedes.

The principal limitation here is actually the hardware: the Electron has only
a single output line for each of the red, green and blue channels, and if
those outputs are strictly digital and can only be set to a "high" or "low"
value, then only the existing eight colours are possible. If a modern ULA were
able to output analogue values (or values at well-defined points between the
high and low values, such as the half-on value supported by the Amstrad CPC
series), it would still need to be assessed whether the circuitry could
successfully handle and propagate such values. Various sources indicate that
only "TTL levels" are supported by the RGB output circuit, and since there are
74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
is likely that the ULA is expected to provide only "high" or "low" values.

Short of adding extra outputs from the ULA (either additional red, green and
blue outputs or a combined intensity output), another approach might involve
some kind of modulation where an output value might be encoded in multiple
pulses at a higher frequency than the pixel frequency. However, this would
demand additional circuitry outside the ULA, and component RGB monitors would
probably not be able to take advantage of this feature; only UHF and composite
video devices (the latter with the composite video colour support enabled on
the Electron's circuit board) would potentially benefit.

Flashing Colours
----------------

According to the Advanced User Guide, "The cursor and flashing colours are
entirely generated in software: This means that all of the logical to physical
colour map must be changed to cause colours to flash." This appears to suggest
that the palette registers must be updated upon the flash counter - read and
written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
colour pairs to be any combination of colours might be possible, instead of
having colour complements as pairs.

It is conceivable that the interrupt code responsible does the simple thing
and merely inverts the current values for any logical colours (LC) for which
the associated physical colour (as supplied as the second parameter to the VDU
19 call) has the top bit of its four-bit value set. These top bits are not
recorded in the palette registers but are presumably recorded separately and
used to build bitmaps as follows:

  LC  2 colour  4 colour  16 colour  4-bit value for inversion
  --  --------  --------  ---------  -------------------------
   0  00010001  00010001  00010001   1, 1, 1
   1  01000100  00100010  00010001   4, 2, 1
   2            01000100  00100010      4, 2
   3            10001000  00100010      8, 2
   4                      00010001         1
   5                      00010001         1
   6                      00100010         2
   7                      00100010         2
   8                      01000100         4
   9                      01000100         4
  10                      10001000         8
  11                      10001000         8
  12                      01000100         4
  13                      01000100         4
  14                      10001000         8
  15                      10001000         8

  Inversion value calculation:

   2 colour formula: 1 << (colour * 2)
   4 colour formula: 1 << colour
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
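
The table can be checked mechanically. The Python sketch below (function
names illustrative) computes each mode's 4-bit inversion value and the
corresponding bitmap, with the value repeated in both nibbles as in the table.
The 16-colour rule used here is read off the table rows themselves: bit 1 of
the logical colour adds 1 to the bit position and bit 3 adds 2.

```python
# Sketch: reproduce the "4-bit value for inversion" column and the 8-bit
# bitmaps of the table above for 2, 4 and 16 colour modes.

def inversion_value(colour: int, colours: int) -> int:
    """Return the 4-bit inversion value for a logical colour."""
    if colours == 2:
        return 1 << (colour * 2)
    if colours == 4:
        return 1 << colour
    if colours == 16:
        # Bit 1 of the colour selects position +1, bit 3 selects +2.
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("colours must be 2, 4 or 16")


def inversion_bitmap(colour: int, colours: int) -> int:
    """Duplicate the 4-bit value into both nibbles of a byte."""
    value = inversion_value(colour, colours)
    return (value << 4) | value


if __name__ == "__main__":
    for lc in range(16):
        print(f"{lc:2}  {inversion_bitmap(lc, 16):08b}")
```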

For example, where logical colour 0 has been mapped to a physical colour in
the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
the inversion operation. (The lower three bits of the physical colour would be
used to set the underlying colour information affected by the inversion
operation.)

An operation in the interrupt code would then combine the bitmaps for all
logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
combined for groups of logical colours as follows:

   Logical colours
   ---------------
   0,  2,  8, 10
   4,  6, 12, 14
   5,  7, 13, 15
   1,  3,  9, 11

These combined bitmaps would be EORed with the existing palette register
values in order to perform the value inversion necessary to produce the
flashing effect.

Thus, in the VDU 19 operation, the appropriate inversion value would be
calculated for the logical colour, and this value would then be combined with
other inversion values in a dedicated memory location corresponding to the
colour's group as indicated above. Meanwhile, the palette channel values would
be derived from the lower three bits of the specified physical colour and
combined with other palette data in dedicated memory locations corresponding
to the palette registers.
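
Putting these steps together, a minimal Python sketch of the hypothesised
bookkeeping for a 16-colour mode might look like this. The names FlashState,
vdu19 and flash are hypothetical. Each group is keyed by bits 0 and 2 of the
logical colour (logical & 5), which reproduces the four groups listed above.
Clearing a colour's contribution when it is redefined as non-flashing is an
assumption, and palette values are treated as abstract bytes rather than the
Electron's actual register layout.

```python
# Minimal sketch of hypothesised flash bookkeeping for a 16-colour mode.
# Palette values are abstract bytes; channel bits are not modelled.

GROUP_KEYS = (0, 4, 5, 1)  # {0,2,8,10}, {4,6,12,14}, {5,7,13,15}, {1,3,9,11}


def inversion_bitmap16(logical: int) -> int:
    """16-colour inversion value for a logical colour, in both nibbles."""
    value = 1 << (((logical & 2) >> 1) + ((logical & 8) >> 2))
    return (value << 4) | value


class FlashState:
    def __init__(self) -> None:
        self.masks = {key: 0 for key in GROUP_KEYS}    # per-group EOR masks
        self.palette = {key: 0 for key in GROUP_KEYS}  # abstract values

    def vdu19(self, logical: int, physical: int) -> None:
        """Record whether a logical colour flashes (physical colour bit 3)."""
        key = logical & 5
        bitmap = inversion_bitmap16(logical)
        if physical & 8:
            self.masks[key] |= bitmap
        else:
            self.masks[key] &= ~bitmap & 0xFF
        # physical & 7 would supply the channel values; omitted here.

    def flash(self) -> None:
        """Called when the flash counter reaches zero: EOR every group."""
        for key, mask in self.masks.items():
            self.palette[key] ^= mask


if __name__ == "__main__":
    state = FlashState()
    state.vdu19(0, 9)    # logical colour 0 flashes
    state.vdu19(10, 12)  # logical colour 10, same group, also flashes
    state.flash()
    print({key: f"{value:08b}" for key, value in state.palette.items()})
```

Because each logical colour within a group owns a distinct bit pair, the
per-group masks can be updated independently by successive VDU 19 calls.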

Interestingly, although flashing colours on the BBC Micro are controlled by
toggling bit 0 of the &FE20 control register location for the Video ULA, the
actual colour inversion is done in hardware.

Enhancement: Palette Definition Lists
-------------------------------------

It can be useful to redefine the palette in order to change the colours
available for a particular region of the screen, particularly in modes where
the choice of colours is constrained. If an increased colour depth were
available, palette redefinition could also give the illusion of more than 16
colours in MODE 2. Traditionally, palette redefinition has been done by using
interrupt-driven timers, but a more efficient approach would involve
presenting lists of palette definitions to the ULA so that it can change the
palette at a particular display line.

One might define a palette redefinition list in a region of memory and then
communicate its contents to the ULA by writing the address and length of the
list, along with the display line at which the palette is to be changed, to
ULA registers such that the ULA buffers the list and performs the redefinition
at the appropriate time. Throughput/bandwidth considerations might impose
restrictions on the practical length of such a list, however.
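
To illustrate the intended behaviour rather than any real register interface,
the following Python sketch models a list of (display line, palette) entries
being buffered and applied as each line is reached. The entry format and the
256-line frame are assumptions; the eight-byte palette mirrors the eight
&FE*8-F registers but is otherwise opaque.

```python
# Hypothetical model of a palette redefinition list: (display line, palette)
# entries handed to the ULA, buffered, and applied as each line is reached.
from typing import Iterator, List, Tuple

Entry = Tuple[int, bytes]  # (display line, values for 8 palette registers)


def palettes_per_line(initial: bytes, entries: List[Entry],
                      lines: int = 256) -> Iterator[Tuple[int, bytes]]:
    """Yield (line, palette in effect) for every display line."""
    pending = sorted(entries, key=lambda entry: entry[0])
    current = initial
    index = 0
    for line in range(lines):
        # Apply every entry due at or before this line, in line order.
        while index < len(pending) and pending[index][0] <= line:
            current = pending[index][1]
            index += 1
        yield line, current


if __name__ == "__main__":
    table = dict(palettes_per_line(bytes(8), [(128, bytes([0xFF] * 8))]))
    print(table[127].hex(), table[128].hex())
```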

A simple form of palette redefinition might be useful in text modes. Within
the blank region between character rows, the foreground palette could be
changed to apply to the next row. Palette values could be read from a table in
RAM, perhaps preceding the screen data, with 24 2-byte entries providing
palette redefinition support in 2- and 4-colour modes.
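
One reading of the 24-entry figure is one entry per gap between the 25
character rows of MODE 3 or MODE 6, with the first row using the palette
already in effect. Under that assumption, and assuming the table sits
immediately before the screen start, the entry addresses could be computed as
in this Python sketch (function names hypothetical):

```python
# Hypothetical layout: 24 two-byte palette entries stored immediately before
# the screen data, entry n (1-24) being applied in the gap before character
# row n, so that row 0 uses the palette already in effect.
ENTRY_SIZE = 2
ENTRIES = 24


def palette_table_base(screen_start: int) -> int:
    """Address of the first palette entry."""
    return screen_start - ENTRIES * ENTRY_SIZE


def entry_address(screen_start: int, row: int) -> int:
    """Address of the entry applied before character row `row` (1-24)."""
    if not 1 <= row <= ENTRIES:
        raise ValueError("rows 1 to 24 have entries; row 0 does not")
    return palette_table_base(screen_start) + (row - 1) * ENTRY_SIZE


if __name__ == "__main__":
    # MODE 6 screen memory normally starts at &6000.
    print(hex(palette_table_base(0x6000)), hex(entry_address(0x6000, 24)))
```

The table costs 48 bytes, taken from the memory just below the screen.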

Enhancement: Display Synchronisation Interrupts
-----------------------------------------------

When completing each scanline of the display, the ULA could trigger an
interrupt. Since this might impact system performance substantially, the
feature would probably need to be configurable, and it might be sufficient to
have an interrupt only after a certain number of display lines instead.
Permitting the CPU to take action after eight lines would allow palette
switching and other effects to occur on a character row basis.

The ULA provides an interrupt at the end of the display period, presumably so
that software can schedule updates to the screen, avoid flickering or tearing,
and so on. However, some applications might benefit from an interrupt at, or
just before, the start of the display period so that palette modifications or
similar effects could be scheduled.
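
For the configurable line interrupt suggested at the start of this section, a
down-counter reloaded from an interval register is one simple possibility. The
Python sketch below (function name and 256-line display assumed) lists the
lines at whose end an interrupt would fire; with an interval of 8 it fires
once per character row of a 32-row mode.

```python
# Sketch of a configurable line-interval interrupt: a down-counter reloaded
# with `interval` raises an interrupt at the end of every `interval` lines.
from typing import List


def interrupt_lines(interval: int, display_lines: int = 256) -> List[int]:
    """Return the display lines at whose end an interrupt would be raised."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    raised = []
    counter = interval
    for line in range(display_lines):
        counter -= 1
        if counter == 0:  # interval complete: raise and reload
            raised.append(line)
            counter = interval
    return raised


if __name__ == "__main__":
    print(len(interrupt_lines(1)), len(interrupt_lines(8)), interrupt_lines(8)[:3])
```

An interval of 1 corresponds to the per-scanline interrupt, with 256
interrupts per frame instead of 32.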

Enhancement: Palette-Free Modes
-------------------------------

Palette-free modes might be defined where bit values directly correspond to
the red, green and blue channels, although this would mostly make sense only
for modes with depths greater than the standard 4 bits per pixel, and such
modes would require more memory than MODE 2 if they were to have an acceptable
resolution.
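
To put the memory requirement in figures, this Python sketch computes the size
of a 256-line display for a given width and depth (function name
hypothetical). The six-bit case, two bits per channel, is only an example.

```python
# Rough memory cost of a 256-line display at a given width and depth, used
# here only to compare MODE 2 with hypothetical palette-free modes.
def screen_bytes(width: int, bits_per_pixel: int, lines: int = 256) -> int:
    """Bytes needed for `lines` rows of `width` pixels."""
    return width * bits_per_pixel * lines // 8


if __name__ == "__main__":
    print("MODE 2 (160 x 4bpp):", screen_bytes(160, 4))
    print("palette-free 160 x 6bpp:", screen_bytes(160, 6))
    print("palette-free 160 x 8bpp:", screen_bytes(160, 8))
```

Even at MODE 2's horizontal resolution, eight bits per pixel would need 40960
bytes, more than the Electron's 32KB of RAM, and six bits per pixel would
still need 30720 bytes.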

Enhancement: Display Suspend
----------------------------

Especially when writing to the screen memory, it could be beneficial to be
able to suspend the ULA's access to the memory, instead producing blank values
for all screen pixels until a program is ready to reveal the screen. This is
different from palette blanking since with a blank palette, the ULA is still
reading screen memory and translating its contents into pixel values that end
up being blank.

This function is reminiscent of a capability of the ZX81 (its FAST mode),
albeit necessary on that hardware to reduce the load on the system CPU, which
was responsible for producing the video output. By allowing display suspend on
the Electron, a performance benefit would be derived from giving the CPU full
access to the memory bandwidth.

Note that since the CPU is only able to access RAM at 1MHz, there is no
possibility of improving performance beyond that normally achieved in MODE 4,
5 or 6. However, if faster RAM access were to be made possible (see the
discussion of 8-bit wide RAM access), the CPU could benefit from freeing up
the ULA's access slots entirely.

The region blanking feature mentioned above could be implemented using this
enhancement instead of employing palette blanking for the affected lines of
the display.

Enhancement: Memory Filling
---------------------------

A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the
existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.

This capability could also support block filling, where the limits of the
filled memory would be defined by the position and size of a screen area,
although this would demand the provision of additional registers in the ULA to
retain the details of such areas and additional logic to control the fill
operation.
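
The address arithmetic such a block fill would need follows the existing
screen layout, in which each character cell occupies eight consecutive bytes
and each character row occupies a fixed number of bytes (320 in MODE 4, 640 in
MODE 0). A Python sketch with hypothetical function and parameter names, for a
block given in byte columns and pixel rows:

```python
# Sketch: enumerate the addresses a block fill would touch, following the
# character-cell screen layout in which each 8-pixel-high cell is 8
# consecutive bytes and each character row is `row_bytes` long.
from typing import List


def block_fill_addresses(screen_start: int, row_bytes: int, x_byte: int,
                         y: int, width: int, height: int) -> List[int]:
    """Addresses for a block `width` bytes wide and `height` pixel rows tall,
    whose top-left corner is byte column `x_byte` on pixel row `y`."""
    addresses = []
    for line in range(y, y + height):
        row_base = screen_start + (line // 8) * row_bytes + (line % 8)
        for column in range(x_byte, x_byte + width):
            addresses.append(row_base + column * 8)
    return addresses


def block_fill(memory: bytearray, addresses: List[int], value: int) -> None:
    """The fill itself: write one predetermined value to every address."""
    for address in addresses:
        memory[address] = value


if __name__ == "__main__":
    ram = bytearray(0x8000)
    # MODE 4: screen at &5800, 320 bytes per character row.
    block_fill(ram, block_fill_addresses(0x5800, 320, 1, 9, 2, 1), 0xFF)
    print([hex(a) for a, v in enumerate(ram) if v])
```

In hardware, the ULA would presumably step through the same sequence using a
few counters rather than building a list.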

Enhancement: Region Filling
---------------------------

An alternative to memory writing might involve indicating regions using
additional registers or memory where the ULA fills regions of the screen with
content instead of reading from memory. Unlike hardware sprites, which should
realistically provide varied content, region filling could employ single
colours or patterns, and one advantage of doing so would be that the ULA need
not access memory at all within a particular region.

Regions would be defined on a row-by-row basis. Instead of reading memory and
blitting a direct representation to the screen, the ULA would read region
definitions containing a start column, region width and colour details. There
might be a certain number of definitions allowed per row, or the ULA might
just traverse an ordered list of such definitions with each one indicating the
row, start column, region width and colour details.
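
As an illustration of the ordered-list variant, this Python sketch renders one
row from a list of (row, start column, width, colour) definitions. The tuple
layout, pixel-based columns and background colour are assumptions; later
definitions simply overwrite earlier ones where they overlap.

```python
# Hypothetical region definitions: (row, start column, width, colour), with
# columns measured in pixels; pixels outside every region take a background
# colour, and later definitions overwrite earlier ones where they overlap.
from typing import Iterable, List, Tuple

Region = Tuple[int, int, int, int]


def render_row(regions: Iterable[Region], row: int, width: int = 320,
               background: int = 0) -> List[int]:
    """Produce the pixel colours for one display row."""
    pixels = [background] * width
    for region_row, start, length, colour in regions:
        if region_row != row:
            continue
        for x in range(max(start, 0), min(start + length, width)):
            pixels[x] = colour
    return pixels


if __name__ == "__main__":
    shapes = [(0, 10, 5, 3), (1, 0, 2, 1), (1, 316, 10, 2)]
    print(render_row(shapes, 1)[:4], render_row(shapes, 1)[-5:])
```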

One could even compress this information further by requiring only the row,
start column and colour details with each subsequent definition terminating
the effect of the previous one. However, one would also need to consider the
convenience of preparing such definitions and whether efficient access to
definitions for a particular row might be desirable. It might also be
desirable to avoid having to prepare definitions for "empty" areas of the
screen, effectively making the definition of the screen contents a form of
run-length encoding employing only colour plus length information.
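
The colour-plus-length form amounts to ordinary run-length encoding of each
row. A Python sketch (function names hypothetical) that expands and produces
such runs for one row:

```python
# Run-length variant: each row is a list of (colour, length) runs that must
# cover the whole row exactly, so "empty" areas need only a single run.
from typing import List, Tuple

Run = Tuple[int, int]


def expand_runs(runs: List[Run], width: int = 320) -> List[int]:
    """Turn a row's runs into pixel colours."""
    pixels: List[int] = []
    for colour, length in runs:
        pixels.extend([colour] * length)
    if len(pixels) != width:
        raise ValueError("runs must cover exactly one row")
    return pixels


def encode_runs(pixels: List[int]) -> List[Run]:
    """Merge adjacent pixels of the same colour into runs."""
    runs: List[Run] = []
    for colour in pixels:
        if runs and runs[-1][0] == colour:
            runs[-1] = (colour, runs[-1][1] + 1)
        else:
            runs.append((colour, 1))
    return runs


if __name__ == "__main__":
    print(encode_runs(expand_runs([(0, 100), (5, 20), (0, 200)])))
    print(len(encode_runs([0, 1] * 160)))
```

Alternating single pixels produce one run per pixel, which illustrates why
such content is costly in this representation.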

One application of region filling is that of simple 2D and 3D shape rendering.
Although it is entirely possible to plot such shapes to the screen and have
the ULA blit the memory contents to the screen, such operations consume
bandwidth both in the initial plotting and in the final transfer to the
screen. Region filling would reduce such bandwidth usage substantially.

This way of representing screen images would make certain kinds of images
infeasible to represent - consider alternating single pixel values which could
easily occur in some character bitmaps - even if an internal queue of regions
were to be supported such that the ULA could read ahead and buffer such
"bandwidth intensive" areas. Thus, the ULA might be better served providing
this feature for certain areas of the display only as some kind of special
graphics window.

Enhancement: Hardware Sprites
-----------------------------

An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.

The specification of sprites could involve the reservation of 16 locations
(for example, &FE20-F) specifying a fixed number of eight sprites, with each
location pair referring to the sprite data. By limiting the ULA to dealing
with a fixed number of sprites, the work required inside the ULA would be
reduced since it would avoid having to deal with arbitrary numbers of sprites.

The principal limitation on providing hardware sprites is that of having to
obtain sprite data, given that the ULA is usually required to retrieve screen
data, and given the lack of memory bandwidth available to retrieve sprite data
(particularly from multiple sprites supposedly at the same position) and
screen data simultaneously. Although the ULA could potentially read sprite
data and screen data in alternate memory accesses in screen modes where the
bandwidth is not already fully utilised, this would result in a degradation of
performance.
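
To make the suggested register allocation concrete, this Python sketch maps
each of the eight sprites to a pair of locations in &FE20-F and combines a
pair into a 16-bit sprite data address. The low-byte-first order follows the
6502's usual convention but is an assumption here, as are the function names.

```python
# Hypothetical register map: &FE20-&FE2F hold eight address pairs, one per
# sprite, each pair (low byte, high byte) pointing at that sprite's data.
from typing import Dict, Tuple

SPRITE_BASE = 0xFE20
SPRITES = 8


def sprite_registers(sprite: int) -> Tuple[int, int]:
    """Return the (low, high) register locations for a sprite."""
    if not 0 <= sprite < SPRITES:
        raise ValueError("only eight sprites are supported")
    low = SPRITE_BASE + sprite * 2
    return low, low + 1


def sprite_address(registers: Dict[int, int], sprite: int) -> int:
    """Combine a sprite's register pair into its data address."""
    low, high = sprite_registers(sprite)
    return registers[low] | (registers[high] << 8)


if __name__ == "__main__":
    regs = {0xFE24: 0x00, 0xFE25: 0x30}
    print(hex(sprite_address(regs, 2)))
```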

Enhancement: Additional Screen Mode Configurations
--------------------------------------------------

Alternative screen mode configurations could be supported. The ULA has to
produce 640 pixel values across the screen, with pixel doubling or quadrupling
employed to fill the screen width:

  Screen width      Columns     Scaling     Depth       Bytes
  ------------      -------     -------     -----       -----
  640               80          x1          1           80
  320               40          x2          1, 2        40, 80
  160               20          x4          2, 4        40, 80

It must also use at most 80 byte-sized memory accesses per scanline to provide
the information for the display. Given that characters must occupy an 8x8
pixel array, if a configuration featuring anything other than 20, 40 or 80
character columns is to be supported, compromises must be made such as the
introduction of blank pixels either between characters (such as occurs between
rows in MODE 3 and 6) or at the end of a scanline (such as occurs at the end of
the frame in MODE 3 and 6). Consider the following configuration:

  Screen width      Columns     Scaling     Depth       Bytes       Blank
  ------------      -------     -------     -----       ------      -----
  208               26          x3          1, 2        26, 52      16

Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
colours could be provided, with 16 blank pixel values (out of a total of 640)
generated either at the start or end (or split between the start and end) of
each scanline.
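
The figures in these tables follow from the fixed 640 pixel values per
scanline and the 80-byte limit. A Python sketch (function name hypothetical)
derives the screen width, columns, bytes per scanline and blank pixel values
for a given scaling factor and depth, rejecting configurations that would
exceed the limit:

```python
# Sketch: derive configuration figures from the fixed 640 pixel values per
# scanline and the budget of 80 byte reads per scanline.
SCANLINE_PIXELS = 640
BYTE_BUDGET = 80


def configuration(scaling: int, depth: int):
    """Return (screen width, columns, bytes per scanline, blank values)."""
    columns = (SCANLINE_PIXELS // scaling) // 8   # whole 8-pixel characters
    width = columns * 8                           # logical pixels displayed
    bytes_per_line = width * depth // 8
    blank = SCANLINE_PIXELS - width * scaling     # unused pixel values
    if bytes_per_line > BYTE_BUDGET:
        raise ValueError("configuration exceeds 80 bytes per scanline")
    return width, columns, bytes_per_line, blank


if __name__ == "__main__":
    for scaling, depth in [(1, 1), (2, 1), (2, 2), (4, 2), (4, 4), (3, 1), (3, 2)]:
        print(f"x{scaling} depth {depth}:", configuration(scaling, depth))
```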

Enhancement: Character Attributes
---------------------------------

The BBC Micro MODE 7 employs something resembling character attributes to
support teletext displays, but depends on circuitry providing a character
generator. The ZX Spectrum, on the other hand, provides character attributes
as a means of colouring bitmapped graphics. Although such a feature is very
limiting as the sole means of providing multicolour graphics, in situations
where the choice is between low resolution multicolour graphics or high
resolution monochrome graphics, character attributes provide a potentially
useful compromise.

For each byte read, the ULA must deliver 8 pixel values (out of a total of
640) to the video output, doing so by either emptying its pixel buffer on a
pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               B
  Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7

And for a screen mode having 320 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               A
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

These additional bytes could provide colour information for the bitmapped data
in the following character column (of 8 pixels). Since it would be desirable
to apply attribute data to the first column, the initial 8 cycles might be
configured not to produce pixel values.

For each character, attribute data need only be read for its first row of
pixels. The subsequent rows would have attribute information applied to them,
although this would require the attribute data to be stored in some kind of
buffer. Thus, the following access pattern would be observed:

  Reads:    A B _ B _ B _ B _ B _ B _ B _ B ...

In modes 3 and 6, the blank display lines could be used to retrieve attribute
data:

  Reads (blank):     A _ A _ A _ A _ A _ A _ A _ A _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
...

See below for a discussion of using this for character data as well.

A whole byte used for colour information for a whole character would result in
a choice of 256 colours, and this might be somewhat excessive. By only reading
attribute bytes at every other opportunity, a choice of 16 colours could be
applied individually to two characters.

  Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  Reads:    B               A               B               -
  Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7

Further reductions in attribute data access, offering 4 colours for every
character in a four character block, for example, might also be worth
considering.
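
These variants amount to packing 8, 4 or 2 attribute bits per character into
each attribute byte, giving 40, 20 or 10 attribute bytes for a 40-column row.
The Python sketch below picks out the attribute for a given column. The
most-significant-bits-first ordering and the function name are assumptions.

```python
# Hypothetical packing of attributes: `bits` per character, most significant
# bits first, so 4 bits give 16 colours for each of 2 characters per byte
# and 2 bits give 4 colours for each of 4 characters per byte.
def attribute_for_column(attributes: bytes, column: int, bits: int = 4) -> int:
    """Return the attribute value applying to a character column."""
    if bits not in (1, 2, 4, 8):
        raise ValueError("bits must divide 8")
    per_byte = 8 // bits
    byte = attributes[column // per_byte]
    shift = 8 - bits * (column % per_byte + 1)
    return (byte >> shift) & ((1 << bits) - 1)


if __name__ == "__main__":
    row = bytes([0x5A, 0xC3])
    print([attribute_for_column(row, c) for c in range(4)])
```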

Consider the following configurations for screen modes with a colour depth of
1 bit per pixel for bitmap information:

  Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  -------  ---------  ---------  -------  ------------
  320           40       x2       40         40         256      &5300
  320           40       x2       40         20         16       &5580 -> &5500
  320           40       x2       40         10         4        &56C0 -> &5600
  208           26       x3       26         26         256      &62C0 -> &6200
  208           26       x3       26         13         16       &6460 -> &6400
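
The screen start addresses appear to follow from placing screen memory
immediately below &8000 and rounding down to a page boundary, with bitmap
bytes read on all 256 scanlines and attribute bytes read once per 8-scanline
character row (32 rows). A Python sketch that reproduces the table under that
reading (function name hypothetical):

```python
# Sketch: screen start for a bitmap + attribute mode placed below &8000,
# returned both exactly and rounded down to a 256-byte page boundary.
TOP_OF_RAM = 0x8000


def screen_start(bitmap_bytes: int, attribute_bytes: int,
                 scanlines: int = 256, rows: int = 32):
    """Return (exact start, page-aligned start)."""
    total = bitmap_bytes * scanlines + attribute_bytes * rows
    start = TOP_OF_RAM - total
    return start, start & ~0xFF


if __name__ == "__main__":
    for b, a in [(40, 40), (40, 20), (40, 10), (26, 26), (26, 13)]:
        exact, aligned = screen_start(b, a)
        print(f"B={b:2} A={a:2}  &{exact:04X} -> &{aligned:04X}")
```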

Enhancement: Text-Only Modes using Character and Attribute Data
---------------------------------------------------------------

In modes 3 and 6, the blank display lines could be used to retrieve character
and attribute data instead of trying to insert it between bitmap data accesses,
but this data would then need to be retained:

  Reads:    A C A C A C A C A C A C A C A C ...
  Reads:    B _ B _ B _ B _ B _ B _ B _ B _ ...

Only attribute (A) and character (C) reads would require screen memory
storage. Bitmap data reads (B) would either involve accesses to memory to
obtain character definition details or, at the cost of special storage in the
ULA, involve accesses within the ULA that would then free up the RAM.
However, the CPU would not benefit from having any extra access slots due to
the limitations of the RAM access mechanism.

A scheme without caching might be possible. The same line of memory addresses
might be visited over and over again for eight display lines, with an index
into the bitmap data being incremented from zero to seven. The access patterns
would look like this:

  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 0)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 1)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 2)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 3)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 4)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 5)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 6)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 7)

The bandwidth requirements would be the sum of the accesses to read the
character values (repeatedly) and those to read the bitmap data to reproduce
the characters on screen.
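
Counting the accesses in the pattern above, each character column costs one
character read and one bitmap read per scanline. A Python sketch (function
names hypothetical) shows that 40 columns use exactly the 80 byte reads per
scanline available in the high-bandwidth modes, whereas 80 columns would not
fit:

```python
# Sketch: per-scanline reads for the uncached text-only scheme, where each
# character column needs one character read (C) and one bitmap read (B).
BYTE_BUDGET = 80  # byte reads available per scanline in high-bandwidth modes


def reads_per_scanline(columns: int) -> int:
    return 2 * columns


def reads_per_character_row(columns: int, lines: int = 8) -> int:
    """Character codes are re-read on each of the row's display lines."""
    return reads_per_scanline(columns) * lines


def fits_budget(columns: int) -> bool:
    return reads_per_scanline(columns) <= BYTE_BUDGET


if __name__ == "__main__":
    for columns in (20, 40, 80):
        print(columns, reads_per_scanline(columns), fits_budget(columns))
```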

Enhancement: MODE 7 Emulation using Character Attributes
--------------------------------------------------------

If the scheme of applying attributes to character regions were employed to
emulate MODE 7, in conjunction with the MODE 6 display technique, the
following configuration would be required:

  Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &5ECC -> &5E00
  320           40       25    40         10         4        &5FC6 -> &5F00

Although this requires much more memory than MODE 7 (8500 bytes versus MODE
7's 1000 bytes), it does not need much more memory than MODE 6, and it would
at least make a limited 40-column multicolour mode available as a substitute
for MODE 7.

Using the text-only enhancement with caching of data or with repeated reads of
the same character data line for eight display lines, the storage requirements
would be diminished substantially:

  Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &7A24 -> &7A00
  320           40       25    40         10         4        &7B1E -> &7B00
  320           40       25    40         5          2        &7B9B -> &7B00
  320           40       25    40         0          (2)      &7C18 -> &7C00
  640           80       25    80         40         16       &7448 -> &7400
  640           80       25    80         20         4        &763C -> &7600
  640           80       25    80         10         2        &7736 -> &7700
  640           80       25    80         0          (2)      &7830 -> &7800

Note that the colours describe the locally defined attributes for each
character. When no attribute information is provided, the colours are defined
globally.
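
As with the earlier configuration table, these screen start values can be
reproduced by subtracting the per-row storage for 25 rows from &8000 and
rounding down to a page boundary. A Python sketch under that assumption
(function name hypothetical):

```python
# Sketch: screen start for 25 character rows of per-row storage placed below
# &8000, returned both exactly and rounded down to a page boundary.
TOP_OF_RAM = 0x8000


def text_screen_start(row_bytes: int, attribute_bytes: int, rows: int = 25):
    """Return (exact start, page-aligned start)."""
    start = TOP_OF_RAM - (row_bytes + attribute_bytes) * rows
    return start, start & ~0xFF


if __name__ == "__main__":
    for row_bytes, attribute_bytes in [(40, 20), (40, 10), (40, 5), (40, 0),
                                       (80, 40), (80, 20), (80, 10), (80, 0)]:
        exact, aligned = text_screen_start(row_bytes, attribute_bytes)
        print(f"&{exact:04X} -> &{aligned:04X}")
```

Passing 320 bytes per row (40 columns of 8 bitmap bytes) with 20 or 10
attribute bytes gives &5ECC and &5FC6, matching the bitmap-based MODE 7
emulation figures as well.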

Enhancement: Character Generator Support and Vertical Scaling
-------------------------------------------------------------

When generating a picture, the ULA traverses screen memory, obtaining 40 or 80
bytes of pixel data for each scanline. It then proceeds to the next row of
pixel data for each successive scanline, with the exception of the text modes
where scanlines may be blank (for which the row address does not advance).
This arrangement provides a conventional bitmapped graphics display.

However, the ULA could instead facilitate the use of character generators. The
principles involved can be demonstrated by the Jafa Mode 7 Mark 2 Display Unit
expansion for the Electron which feeds the pixel data from a MODE 4 screen to
a SAA5050 character generator to create a MODE 7 display. The solution adopted
involves the replication of 40 bytes of character data across as many pixel
rows as is necessary for the character generator to receive the appropriate
character data for all scanlines in any given character row. If only a single
40-byte row of character data were to be present for the first scanline of a
character row, the character generator would only produce the first scanline
(or the uppermost pixels of the characters) correctly, with the rest of the
character shapes being ill-defined.

Here, the ULA could facilitate the use of memory-efficient character mode
representations (such as MODE 7) by holding the row address for a number of
scanlines, thus providing the same row of screen data for those scanlines,
then advancing to the next row. Visualised in terms of pixel data, it would be
like providing a display with a very low vertical resolution. Indeed, being
able to reduce the vertical resolution of a display mode by a factor of eight
or ten would be equivalent to the above character generation technique in
terms of the ULA's screen reading activities.

By combining this vertical scaling or scanline replication with a circuit
switchable between bitmapped graphics output and character graphics output,
MODE 7 support could be made available, potentially as a hardware option
separate from the ULA.
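
The row address holding described above reduces to an integer division of the
scanline number by the scaling factor, with the remainder telling a character
generator which line of the character row is being produced. A Python sketch,
using a hypothetical 40 x 25 character screen at &7C00 with ten scanlines per
row (function name also hypothetical):

```python
# Sketch of vertical scaling: the row address is held for `scale` scanlines,
# so a character generator sees the same row of character codes together
# with the line number within the character row.
from typing import Tuple


def scaled_fetch(screen_start: int, bytes_per_row: int, scanline: int,
                 scale: int) -> Tuple[int, int]:
    """Return (row address, line within character row) for a scanline."""
    row, line = divmod(scanline, scale)
    return screen_start + row * bytes_per_row, line


if __name__ == "__main__":
    for scanline in (0, 9, 10, 249):
        address, line = scaled_fetch(0x7C00, 40, scanline, 10)
        print(scanline, hex(address), line)
```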

Enhancement: 40-Column Text Modes by Interleaving Screen and Bitmap Accesses
----------------------------------------------------------------------------

Suggested here: https://stardot.org.uk/forums/viewtopic.php?p=393243#p393243

The ULA could be run in high-bandwidth mode to fetch character codes from
screen memory in one cycle and then to use the character code to look up a
pixel row of a character bitmap, reading that bitmap slice in the following
cycle. The bitmap would be converted to pixel values that would then be
emitted over the subsequent two cycles concurrently with the preparation of
the next character's pixels.

  2MHz cycle: 0 1 2 3 4 5 ...
  Reads:      C B C B C B ...
  Pixels:         a   b   ...

The memory access to bitmap data would be computed as follows, assuming the
normal eight pixel height and single-byte encoding of character bitmaps:

  bitmap address = bitmap table base + (character code * 8) + bitmap row

Each successive pixel row on the screen would expose the appropriate row in
the character bitmap, with this "bitmap row" looping from 0 to 7 repeatedly.
Spacing between character lines could be introduced as already done in MODE 6.
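
The read sequence for one scanline can be sketched in Python, treating memory
as a flat 64K array. The bitmap table base, screen row address and function
names are hypothetical. Spacing lines, as in MODE 6's ten-line character rows,
are modelled by returning no bitmap row for lines 8 and 9.

```python
# Sketch of the interleaved fetch: for each column, a character read (C)
# from screen memory followed by a bitmap read (B) at
# bitmap table base + (character code * 8) + bitmap row.
from typing import List, Optional, Sequence, Tuple


def bitmap_row(scanline: int, line_height: int = 8) -> Optional[int]:
    """Bitmap row for a scanline, or None for a spacing line."""
    row = scanline % line_height
    return row if row < 8 else None


def scanline_reads(memory: Sequence[int], bitmap_base: int, row_start: int,
                   columns: int, row: int) -> List[Tuple[str, int]]:
    """List the (kind, address) reads made for one scanline."""
    reads = []
    for column in range(columns):
        code_address = row_start + column
        code = memory[code_address]
        reads.append(("C", code_address))
        reads.append(("B", bitmap_base + code * 8 + row))
    return reads


if __name__ == "__main__":
    ram = bytearray(0x10000)
    ram[0x7C00] = 0x41  # character code for "A"
    print([(kind, hex(a)) for kind, a in scanline_reads(ram, 0x3000, 0x7C00, 40, 3)[:2]])
```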

Enhancement: Compressed Character Data
--------------------------------------

Another observation about text-only modes is that they only need to store a
restricted set of bitmapped data values. Encoding this set of values in a
smaller unit of storage than a byte could possibly help to reduce the amount
of storage and bandwidth required to reproduce the characters on the display.
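
One possible reading of this idea is a table of the distinct row patterns used
by a character set, with each character row stored as a small index into that
table, at the cost of an extra lookup per bitmap read. In the Python sketch
below the function names are hypothetical and the three glyphs are made-up
examples, not the Electron's character set.

```python
# Sketch: collect the distinct bitmap row values used by a character set and
# store each character row as a small index into that table.
from math import ceil, log2
from typing import Dict, List, Tuple

Font = Dict[int, List[int]]


def build_row_table(font: Font) -> Tuple[List[int], Dict[int, int], int]:
    """Return (distinct rows, row -> index, bits needed per index)."""
    rows = sorted({value for glyph in font.values() for value in glyph})
    index = {value: position for position, value in enumerate(rows)}
    bits = max(1, ceil(log2(len(rows))))
    return rows, index, bits


def encode_font(font: Font):
    """Replace every row byte with its index in the row table."""
    rows, index, bits = build_row_table(font)
    encoded = {code: [index[v] for v in glyph] for code, glyph in font.items()}
    return rows, encoded, bits


EXAMPLE_FONT = {
    0x20: [0x00] * 8,                                        # space
    0x49: [0x3C, 0x18, 0x18, 0x18, 0x18, 0x18, 0x3C, 0x00],  # "I"
    0x4C: [0x60, 0x60, 0x60, 0x60, 0x60, 0x60, 0x7E, 0x00],  # "L"
}

if __name__ == "__main__":
    rows, encoded, bits = encode_font(EXAMPLE_FONT)
    print(len(rows), "distinct rows,", bits, "bits per row")
```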

Enhancement: High Resolution Graphics and Larger Colour Depths
--------------------------------------------------------------

Screen modes with higher resolutions and larger colour depths might be
possible, but this would in most cases involve the allocation of more screen
memory, and the ULA would probably then be obliged to page in such memory for
the CPU to be able to sensibly access it all. Higher resolutions would also
involve a faster pixel clock.

However, we may consider a doubled colour depth and the need for higher
bandwidth transfers by a ULA having an 8-bit data bus to access the RAM,
utilising two "page mode" transfers per 2MHz cycle. If such transfers were to
access consecutive bytes in the same memory region (for example, bytes &3000
and &3001), this would require a change to the arrangement of screen memory,
also incurring changes to the memory map for larger modes:

 (&3000 &3001) (&3010 &3011) ...
 (&3002 &3003) (&3012 &3013)
 ...           ...
 (&300E &300F) (&301E &301F)
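The arrangement above amounts to a simple address calculation. This Python
sketch assumes, as in the example, a screen starting at &3000, and numbers
each 16-byte group (two bytes wide, eight rows high) from 0; the function name
is hypothetical.

```python
SCREEN_BASE = 0x3000  # screen start used in the example above


def consecutive_pair(group: int, row: int) -> tuple[int, int]:
    """Bytes fetched together for one pixel row when the two page mode
    transfers read consecutive addresses."""
    first = SCREEN_BASE + group * 16 + row * 2
    return first, first + 1
```

Each group now spans 16 bytes, so the step between groups across the screen
is twice the 8-byte step of the standard arrangement.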

If such transfers were to access two adjacent columns of bytes (for example,
bytes &3000 and &3008), this would still require a change in the step size
across the screen memory and incur memory map changes for larger modes, and
it would also make the method for programs to update the screen more
complicated:

 (&3000 &3008) (&3010 &3018) ...
 (&3001 &3009) (&3011 &3019)
 ...           ...
 (&3007 &300F) (&3017 &301F)
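The column-pairing arrangement can likewise be sketched in Python, again
assuming the &3000 screen start of the example and numbering each pair of
8-byte columns as a group from 0 (the function name is hypothetical).

```python
SCREEN_BASE = 0x3000  # screen start used in the example above


def column_pair(group: int, row: int) -> tuple[int, int]:
    """Bytes fetched together for one pixel row when the two transfers read
    the same row of two adjacent 8-byte columns."""
    first = SCREEN_BASE + group * 16 + row
    return first, first + 8
```

Within each column the rows remain one byte apart, but a program drawing a
single pixel row must now write to two bytes eight locations apart, and the
step between groups is again 16 bytes.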

However, such transfers could instead map the device address bit that is
toggled between transfers to the most significant system memory address bit.
Thus, bits in adjacent locations within each RAM device would actually reside
in different memory regions:

 (&3000 &B000) (&3008 &B008) ...
 (&3001 &B001) (&3009 &B009)
 ...           ...
 (&3007 &B007) (&300F &B00F)

Since &B000 is &3000 combined with &8000, the latter asserting the uppermost
address bit, address &B000 can be considered as &3000 in an upper memory
bank.
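In Python, this mapping reduces to setting bit 15 of the address, as in the
following sketch. It assumes the &3000 screen start of the example and a
hypothetical function name; unlike the two arrangements that double the group
size, the step between columns remains 8 bytes, as in the standard layout.

```python
SCREEN_BASE = 0x3000     # screen start used in the example above
UPPER_BANK_BIT = 0x8000  # most significant system address bit


def banked_pair(column: int, row: int) -> tuple[int, int]:
    """Bytes fetched together when the device address bit toggled between
    transfers is mapped to the most significant system address bit."""
    first = SCREEN_BASE + column * 8 + row  # standard 8-byte column step
    return first, first | UPPER_BANK_BIT
```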

Other mechanisms might be employed to allow programs to access the uppermost
bank, but the ULA would be able to access it trivially and unconditionally.

Enhancement: Genlock Support
----------------------------

The ULA generates a video signal in conjunction with circuitry producing the
output features necessary for the correct display of the screen image.
However, it appears that the ULA drives the video synchronisation mechanism
instead of reacting to an existing signal. Genlock support might be possible
if the ULA were made to be responsive to such external signals, resetting its
address generators upon receiving synchronisation events.
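The difference between a free-running and a genlocked address generator can
be modelled in a few lines of Python. This is a behavioural sketch only: the
class is hypothetical, and the figures of 128 cycles per line (a 64us line at
2MHz) and 312 lines per frame are assumptions chosen for illustration.

```python
class ScanGenerator:
    """Hypothetical model of the ULA's line and frame position counters."""

    def __init__(self, cycles_per_line: int = 128, lines_per_frame: int = 312,
                 genlocked: bool = False):
        self.cycles_per_line = cycles_per_line
        self.lines_per_frame = lines_per_frame
        self.genlocked = genlocked
        self.cycle = 0
        self.line = 0

    def _next_line(self) -> None:
        self.cycle = 0
        self.line = (self.line + 1) % self.lines_per_frame

    def tick(self) -> None:
        """Advance by one 2MHz cycle."""
        self.cycle += 1
        # Free-running: the ULA itself decides when each line ends.
        if not self.genlocked and self.cycle >= self.cycles_per_line:
            self._next_line()

    def external_hsync(self) -> None:
        """Genlocked: an external horizontal sync ends the current line."""
        if self.genlocked:
            self._next_line()

    def external_vsync(self) -> None:
        """Genlocked: an external vertical sync restarts the frame."""
        if self.genlocked:
            self.cycle = 0
            self.line = 0
```

In the genlocked case the counters only wrap when external events arrive, so
the screen image follows the timing of the external signal.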

Enhancement: Improved Sound
---------------------------

The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro employs &FE40-&FE4F (the system VIA) for
sound control, and an enhanced ULA could adopt this interface.

The BBC Micro uses the SN76489 chip to produce sound, and the entire
functionality of this chip could be emulated for enhanced sound, with a subset
of the functionality exposed via the &FE*6 interface.

See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
See: http://www.smspower.org/Development/SN76489
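An emulation would need to accept the SN76489's command bytes. The following
Python sketch encodes them as described in the SN76489 references above: a
latch byte of the form 1cctdddd (channel, type, low data bits) optionally
followed by a data byte carrying the upper six bits of a 10-bit tone divider.
The function names are hypothetical, and the 4MHz chip clock used in the
usage note is an assumption.

```python
def tone_bytes(channel: int, divider: int) -> tuple[int, int]:
    """Latch and data bytes setting a tone channel's 10-bit divider."""
    if not 0 <= channel <= 2:
        raise ValueError("tone channels are numbered 0 to 2")
    if not 0 <= divider <= 0x3FF:
        raise ValueError("the divider is a 10-bit value")
    latch = 0x80 | (channel << 5) | (divider & 0x0F)  # type bit 4 = 0: tone
    data = (divider >> 4) & 0x3F                      # upper six bits
    return latch, data


def volume_byte(channel: int, attenuation: int) -> int:
    """Latch byte setting attenuation (0 loudest, 15 silent); channel 3 is noise."""
    if not 0 <= channel <= 3 or not 0 <= attenuation <= 15:
        raise ValueError("channel or attenuation out of range")
    return 0x80 | (channel << 5) | 0x10 | attenuation  # type bit 4 = 1: volume


def tone_frequency(clock_hz: float, divider: int) -> float:
    """Output frequency for a non-zero divider: clock / (32 * divider)."""
    return clock_hz / (32 * divider)
```

With an assumed 4MHz clock, a divider of 250 gives
tone_frequency(4_000_000, 250), which is 500Hz.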

Enhancement: Waveform Upload
----------------------------

As with a hardware sprite function, waveforms could be uploaded or referenced
using register locations that refer to memory regions.

Enhancement: Sound Input/Output
-------------------------------

Since the ULA already controls audio input/output for cassette-based data, it
would have been interesting to entertain the idea of sampling and output of
sounds through the cassette interface. However, a significant amount of
circuitry is employed to process the input signal for use by the ULA and to
process the output signal for recording.

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11

Enhancement: BBC ULA Compatibility
----------------------------------

Although some new ULA functions could be defined in a way that is also
compatible with the BBC Micro, the BBC ULA is itself incompatible with the
Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
map, where it controls functions specific to the 6845 video controller, and
&FE08-F is reserved for the serial controller. It therefore becomes possible
to disregard compatibility where compatibility is already disregarded for a
particular area of functionality.

&FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE08-F on the
Electron) and other system-specific functions. Since the location usage is
generally incompatible, this region could be reused for other purposes.

Enhancement: Increased RAM, ULA and CPU Performance
---------------------------------------------------

More modern implementations of the hardware might feature faster RAM coupled
with an increased ULA clock frequency in order to increase the bandwidth
available to the ULA and to the CPU in situations where the ULA is not needed
to perform work. A ULA employing a 32MHz clock would be able to complete the
retrieval of a byte from RAM in only 250ns and thus allow the CPU to access
the RAM for the following 250ns, even in display modes requiring the
retrieval of a byte for the display every 500ns. The CPU could, subject to
timing issues, run at 2MHz even in MODE 0, 1 and 2.
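The arithmetic behind this can be checked with a few lines of Python. The
sketch assumes that a byte fetch always costs the same number of ULA clock
cycles, eight being the figure implied by 250ns at 32MHz; at the existing
16MHz clock the same fetch would then occupy the whole 500ns display period,
consistent with there being no free cycles in the most demanding modes.

```python
DISPLAY_BYTE_PERIOD_NS = 500  # one display byte per 2MHz cycle (MODE 0, 1, 2)
CYCLES_PER_FETCH = 8          # assumed: 250ns at 32MHz is 8 clock cycles


def fetch_time_ns(ula_clock_mhz: float) -> float:
    """Time taken by the ULA to retrieve one byte from RAM."""
    return CYCLES_PER_FETCH * 1000 / ula_clock_mhz


def cpu_window_ns(ula_clock_mhz: float) -> float:
    """Time left in each display byte period for a CPU access."""
    return max(0.0, DISPLAY_BYTE_PERIOD_NS - fetch_time_ns(ula_clock_mhz))


for clock in (16, 32):
    print(f"{clock}MHz: fetch {fetch_time_ns(clock):.0f}ns, "
          f"CPU window {cpu_window_ns(clock):.0f}ns")
```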

A scheme such as that described above would have a similar effect to the
scheme employed in the BBC Micro, although the latter made use of RAM with a
wider bandwidth in order to complete memory transfers within 250ns and thus
permit the CPU to run continuously at 2MHz.

Higher bandwidth could potentially be used to implement exotic features such
as RAM-resident hardware sprites or indeed any feature demanding RAM access
concurrent with the production of the display image.

Enhancement: Multiple CPU Stacks and Zero Pages
-----------------------------------------------

The 6502 maintains a stack for subroutine calls and register storage in page
&01. Although the stack pointer can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would
be able to coexist without overwriting each other's stacks.

One way of alleviating this issue would be to provide a facility to redirect
accesses to page &01 to other areas of memory. The ULA would provide a
register that defines a physical page for the use of the CPU's "logical" page
&01, and upon any access to page &01 by the CPU, the ULA would change the
asserted address lines to redirect the access to the appropriate physical
region.

By providing an 8-bit register, mapping to the most significant byte (MSB) of
a 16-bit address, the ULA could then replace any MSB equal to &01 with the
register value before the access is made. Where multiple programs coexist,
upon switching programs, the register would be updated to point the ULA to the
appropriate stack location, thus providing a simple memory management unit
(MMU) capability.
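The MSB substitution just described can be sketched in Python as follows. The
class and attribute names are hypothetical; the sketch only models the
address translation, not the bus timing involved in performing it.

```python
class StackPageRegister:
    """Hypothetical ULA register relocating the CPU's logical page &01."""

    def __init__(self, physical_page: int = 0x01):
        self.physical_page = physical_page  # rewritten when switching programs

    def translate(self, address: int) -> int:
        """Address actually asserted for a given CPU address."""
        msb, lsb = address >> 8, address & 0xFF
        if msb == 0x01:
            msb = self.physical_page  # replace the MSB with the register value
        return (msb << 8) | lsb
```

After writing &0A to the register when switching to a program whose stack
lives in page &0A, a push to &01FF would be redirected to &0AFF, while
accesses outside page &01 pass through unchanged.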

In a similar fashion, zero page accesses could also be redirected so that code
could run from sideways RAM and have zero page operations redirected to "upper
memory" - for example, to page &BE (with stack accesses redirected to page
&BF, perhaps) - thereby permitting most CPU operations to occur without
inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
CPU as it contends with the ULA for memory access.
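Extending the translation to zero page, and checking whether an access still
lands in the contended RAM, might look like this in Python. Pages &BE and &BF
are the example pages from the text; the function names are hypothetical, and
the sketch takes "lower memory" to be the Electron's 32K of RAM at
&0000-&7FFF.

```python
ZERO_PAGE_TARGET = 0xBE    # example page from the text
STACK_PAGE_TARGET = 0xBF   # example page from the text
LOWER_MEMORY_END = 0x8000  # RAM shared with the ULA occupies &0000-&7FFF


def redirect(address: int) -> int:
    """Redirect zero page and stack accesses into upper memory."""
    msb, lsb = address >> 8, address & 0xFF
    if msb == 0x00:
        msb = ZERO_PAGE_TARGET
    elif msb == 0x01:
        msb = STACK_PAGE_TARGET
    return (msb << 8) | lsb


def contends_with_ula(address: int) -> bool:
    """True if the access would reach RAM and might stall the CPU."""
    return address < LOWER_MEMORY_END
```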

Such facilities could also be provided by a separate circuit between the CPU
and ULA in a fashion similar to that employed by a "turbo" board, but unlike
such boards, no additional RAM would be provided: all memory accesses would
occur as normal through the ULA, albeit redirected when configured
appropriately.

ULA Pin Functions
-----------------

The functions of the ULA pins are described in the Electron Service Manual. Of
interest to video processing are the following:

  CSYNC (low during horizontal or vertical synchronisation periods, high
         otherwise)

  HS (low during horizontal synchronisation periods, high otherwise)

  RED, GREEN, BLUE (pixel colour outputs)

  CLOCK IN (a 16MHz clock input, 4V peak to peak)

  PHI OUT (a 1MHz, 2MHz or stopped clock signal for the CPU)

More general memory access pins:

  RAM0...RAM3 (data lines to/from the RAM)

  RA0...RA7 (address lines for sending both row and column addresses to the RAM)

  RAS (row address strobe setting the row address on a negative edge - see the
       timing notes)

  CAS (column address strobe setting the column address on a negative edge -
       see the timing notes)

  WE (sets write enable with logic 0, read with logic 1)

  ROM (select data access from ROM)
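The way RA0...RA7 carry both halves of a RAM address can be illustrated with
a small Python sketch of the signal sequence for one access. It assumes,
purely for illustration, that the high byte of a 16-bit device address forms
the row and the low byte the column; the real mapping and timing are covered
by the timing notes, and the function names are hypothetical.

```python
def split_address(address: int) -> tuple[int, int]:
    """Split a 16-bit device address into row and column values.

    Assumption: high byte as row, low byte as column.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("expected a 16-bit address")
    return (address >> 8) & 0xFF, address & 0xFF


def access_steps(address: int, write: bool = False) -> list:
    """Ordered signal events for one RAM access, following the pin notes."""
    row, column = split_address(address)
    return [
        ("WE", 0 if write else 1),  # logic 0 writes, logic 1 reads
        ("RA", row),                # row address on RA0...RA7
        ("RAS", "falling"),         # row address latched on the negative edge
        ("RA", column),             # column address on the same lines
        ("CAS", "falling"),         # column address latched on the negative edge
    ]
```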

CPU-oriented memory access pins:

  A0...A15 (CPU address lines)

  PD0...PD7 (CPU data lines)

  R/W (indicates CPU write with logic 0, CPU read with logic 1)

Interrupt-related pins:

  NMI (CPU request for uninterrupted 1MHz access to memory)

  IRQ (signal event to CPU)

  POR (power-on reset, resetting the ULA on a positive edge and asserting the
       CPU's RST pin)

  RST (master reset for the CPU signalled on power-up and by the Break key)

Keyboard-related pins:

  KBD0...KBD3 (keyboard inputs)

  CAPS LOCK (status LED control)

Sound-related pins:

  SOUND O/P (sound output using internal oscillator)

Cassette-related pins:

  CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)

  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)

  CAS RC (detect high tone)

  CAS MO (motor relay output)

  ÷13 IN (~1200 baud clock input)

ULA Socket
----------

The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.

References
----------

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm

About this Document
-------------------

The most recent version of this document and accompanying distribution should
be available from the following location:

http://hgweb.boddie.org.uk/ULA

Copyright and licence information can be found in the docs directory of this
distribution - see docs/COPYING.txt for more information.
