ST STE Scanlines

From AtariForumWiki

This page provides pseudocode descriptions of various processes involved in the generation of each frame of the video output, in both the Atari ST and the STE.

Although this page aims to contain everything that's currently known, it should be considered incomplete.

Model

Two state machines exist in the GLUE, a horizontal machine and a vertical machine. These observe the programmer-set display frequency and produce sync and blank signals appropriately. They also produce an output that indicates when pixels should appear, which is used as a trigger by the Shifter to fetch data.

The vertical state machine is the same between the ST and STE. The horizontal state machine differs, owing to the STE's support for hardware scrolling.

Outputs common to the two state machines are a pixels-enabled flag and signals to indicate blank and sync periods.

Conventionally the flags that enable pixels are called H and V for the horizontal and vertical state machines respectively, the sync signals are HSYNC and VSYNC, and both blank signals are called BLANK. BLANKs are combined with a logical OR, SYNCs with either an OR or an XOR, and H and V are joined with a logical AND into a separate output, DE (for 'Display Enabled').
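The combination rules above can be sketched in a few lines of Python. This is only an illustration of the model: the `MachineOutputs` type, the `combine` function and the `sync_xor` switch are hypothetical names, and the choice between OR and XOR for the sync signals is left as a parameter because the text allows either.

```python
from dataclasses import dataclass


@dataclass
class MachineOutputs:
    """Outputs of one state machine (horizontal or vertical)."""
    enable: bool  # H or V
    blank: bool   # this machine's BLANK
    sync: bool    # HSYNC or VSYNC


def combine(h, v, sync_xor=False):
    """Return (DE, BLANK, SYNC) from the two machines' outputs."""
    de = h.enable and v.enable        # H AND V
    blank = h.blank or v.blank        # BLANKs are ORed
    # SYNCs are combined with either OR or XOR.
    sync = (h.sync != v.sync) if sync_xor else (h.sync or v.sync)
    return de, blank, sync
```

For instance, `combine(MachineOutputs(True, False, False), MachineOutputs(False, True, False))` yields no DE and an active BLANK, as expected for a line in the vertical border.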

Latencies apply between the actions of each state machine and the consequences of any change in their outputs.

These state machines model the hardware as observed by a programmer; they do not accurately describe its implementation. This article primarily discusses the model, not the hardware.

Input Latency

Output frequency is set by the programmer using two registers — $ff820a (FREQ) and $ff8260 (RES). These are combined by the GLUE to produce a single value: current output frequency.

However, $ff820a (FREQ) is latched one cycle later than $ff8260 (RES). Relative to the timings given in this article, the GLUE therefore always uses the value that FREQ held one cycle earlier.
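A minimal sketch of this latency, assuming per-cycle lists of register values. The function names are hypothetical, and the bit meanings (bit 1 of FREQ selecting 50Hz, bit 1 of RES selecting high resolution) come from general ST register documentation rather than from this page.

```python
def output_frequency(freq_reg, res_reg):
    """Combine the two register values into 50, 60 or 71 (Hz)."""
    # Assumption: bit 1 of RES set means high resolution (71Hz).
    if res_reg & 0b10:
        return 71
    # Assumption: bit 1 of FREQ set means 50Hz, clear means 60Hz.
    return 50 if freq_reg & 0b10 else 60


def glue_frequency(freq_by_cycle, res_by_cycle, cycle):
    """Frequency the GLUE works with at `cycle`.

    RES is taken at `cycle`; FREQ is taken from one cycle earlier.
    Cycle 0 has no predecessor here, so index 0 is reused for it.
    """
    freq = freq_by_cycle[max(cycle - 1, 0)]
    res = res_by_cycle[cycle]
    return output_frequency(freq, res)
```

With FREQ values `[2, 2, 0, 0]` and RES held at 0, the GLUE still sees 50Hz at cycle 2 and only sees 60Hz from cycle 3.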

Vertical state machine

The table below lists the times at which certain tests are applied. If a test is passed the consequence will occur shortly afterwards.

Line      Cycle  Test    Consequence
34        502    IF(60)  V = TRUE
47/63*    502    IF(50)  V = TRUE
234       502    IF(60)  V = FALSE
247/263*  502    IF(50)  V = FALSE
258       502    IF(60)  VBLANK = TRUE
308       502    IF(50)  VBLANK = TRUE


  * 47/63 and 247/263: the earlier values (47 and 247) apply to "short top border" STs, a few early models with a GLUE revision where the PAL 50Hz screen began higher up.
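The table can be expressed as a small lookup, sketched below in Python. The function names are hypothetical, and the sketch assumes that lines 47 and 247 are the "short top border" values (the screen begins higher up, so V is raised earlier). It ignores the latency between a test and its consequence.

```python
def vertical_rules(short_top_border=False):
    """Rows of the vertical table as (line, frequency, signal, value)."""
    # Assumption: 47/247 belong to "short top border" machines.
    start_50, end_50 = (47, 247) if short_top_border else (63, 263)
    return [
        (34, 60, "V", True),
        (start_50, 50, "V", True),
        (234, 60, "V", False),
        (end_50, 50, "V", False),
        (258, 60, "VBLANK", True),
        (308, 50, "VBLANK", True),
    ]


def vertical_test(line, freq, short_top_border=False):
    """Consequences of the cycle-502 test on `line` at frequency `freq`."""
    return [
        (signal, value)
        for rule_line, rule_freq, signal, value in vertical_rules(short_top_border)
        if rule_line == line and rule_freq == freq
    ]


def displayed_lines(freq, short_top_border=False):
    """Lines with V raised when the frequency never changes."""
    rules = vertical_rules(short_top_border)
    start = next(l for l, f, s, v in rules if f == freq and s == "V" and v)
    end = next(l for l, f, s, v in rules if f == freq and s == "V" and not v)
    return end - start
```

Running a line at 60Hz when the 50Hz test is due (`vertical_test(63, 60)`) returns no consequences, which is the model's view of a failed test.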

It's possible to avoid VBLANK at line 308, which will result in two extra full lines of graphics as well as a few pixels on line 311 until VSYNC kicks in hard at cycle ~30.

Current unknowns:

  • The number of lines that will be displayed is likely to be decided when the screen starts, just as the number of cycles in a line is decided early in that line. Untested.

Horizontal GLUE state machine

ST

This model of the horizontal state machine uses an additional internal piece of information, LINE, which is set early on in the line and used to decide its total length.

Where two consequences are listed in the table below, they will not occur simultaneously.

Cycle    Test          Consequence
4        IF(71)        H = TRUE
24       IF(60)        BLANK = FALSE
28       IF(50)        BLANK = FALSE
30       Needs to be !71 in non-mono lines. Unknown.
52       IF(60)        H = TRUE
54       IF(60 or 50)  LINE = 508 IF 60; ELSE LINE = 512
56       IF(50)        H = TRUE
164      IF(71)        H = FALSE
184      IF(71)        BLANK = TRUE
372      IF(60)        H = FALSE
376      IF(50)        H = FALSE
450      IF(!71)       BLANK = TRUE
LINE-50  IF(!71)       HSYNC = TRUE, H = FALSE
LINE-10  IF(!71)       HSYNC = FALSE
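As a rough illustration, the ST table can be walked in Python for one scanline. This is a sketch of the model only: `run_st_line` and `freq_at` are hypothetical names, the starting values of the signals and the default LINE of 512 are assumptions, the unknown cycle-30 row is left out, and latencies and wakestates are ignored.

```python
# Rows of the ST table: (cycle, required frequency, changes).
# A required frequency of None stands for the "!71" test.
ST_RULES = [
    (4, 71, {"H": True}),
    (24, 60, {"BLANK": False}),
    (28, 50, {"BLANK": False}),
    (52, 60, {"H": True}),
    (54, 60, {"LINE": 508}),
    (54, 50, {"LINE": 512}),
    (56, 50, {"H": True}),
    (164, 71, {"H": False}),
    (184, 71, {"BLANK": True}),
    (372, 60, {"H": False}),
    (376, 50, {"H": False}),
    (450, None, {"BLANK": True}),
]


def passes(required, freq):
    """True if the frequency satisfies the row's test."""
    return freq != 71 if required is None else freq == required


def run_st_line(freq_at):
    """Return (cycle, signal, value) changes for one line.

    `freq_at(cycle)` gives the frequency (50, 60 or 71) seen at that cycle.
    """
    # Assumed starting state; LINE stays 512 if the cycle-54 test fails.
    state = {"H": False, "BLANK": True, "HSYNC": False, "LINE": 512}
    events = []

    def apply(cycle, changes):
        for name, value in changes.items():
            if state[name] != value:
                state[name] = value
                events.append((cycle, name, value))

    for cycle, required, changes in ST_RULES:
        if passes(required, freq_at(cycle)):
            apply(cycle, changes)
    # The last two rows are relative to the LINE value chosen earlier.
    for offset, changes in ((50, {"HSYNC": True, "H": False}), (10, {"HSYNC": False})):
        cycle = state["LINE"] - offset
        if passes(None, freq_at(cycle)):
            apply(cycle, changes)
    return events
```

Calling `run_st_line(lambda c: 60 if c == 376 else 50)` models a line that is at 60Hz only when the cycle-376 test is made: the IF(50) test fails, so H stays raised until HSYNC lowers it at LINE-50.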

STE

In the STE the GLUE and MMU were combined into a single circuit, the GST MCU. Several changes were made to accommodate hardware scroll support — notably including an earlier check for starting the screen in high res, causing many older border-removing demos to fail.

In addition to H, HSYNC and BLANK the following flag is used by the STE state machine:

VAR PRELOAD: MMU starts LOADing Shifter with words for hardware scroll purposes, no screen address changes

Cycle    Test          Consequence
0        IF(71)        PRELOAD = TRUE
24       IF(60)        BLANK = FALSE
28       IF(50)        BLANK = FALSE
28       Needs to be !71 in non-mono lines. Unknown.
36       IF(60)        PRELOAD = TRUE
40       IF(50)        PRELOAD = TRUE
56       IF(60 or 50)  LINE = 508 IF 60; ELSE LINE = 512
58       Unknown cause. Also related to line length similar to above for 50/60Hz.
164      IF(71)        H = FALSE
184      IF(71)        BLANK = TRUE
372      IF(60)        H = FALSE
376      IF(50)        H = FALSE
448      IF(!71)       BLANK = TRUE
LINE-52  IF(!71)       HSYNC = TRUE, H = FALSE
LINE-12  IF(!71)       HSYNC = FALSE

The following pseudo code describes the effect of PRELOAD:

WORDS_READ = 0
WHILE(PRELOAD == TRUE) {
  LOAD                                               // fetch one word (four cycles)
  WORDS_READ++
  IF(RES == HI AND WORDS_READ >= 1) PRELOAD = FALSE  // one word is enough in HI
  IF(RES == LO AND WORDS_READ >= 4) PRELOAD = FALSE  // four words are needed in LO
}
H = TRUE

In regular HI resolution the routine will exit after four cycles (one word) and in LO resolution it will take 16 cycles (four words). This is what makes the STE timings for raised DE match up with the ST. It's also why it's possible to create +20 (left border), +4 and +6 (regular) lines by disrupting this code, according to the following:

Cycle  Action                                          Result
4      IF(RES == LO) PRELOAD will run until cycle 16   (56-16)/2 = +20
44     IF(RES == HI) PRELOAD will exit after 4 cycles  (376-44)/2 = 166 = 160 + 6
48     IF(RES == HI) PRELOAD will exit after 8 cycles  (376-48)/2 = 164 = 160 + 4
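The PRELOAD loop and the disruptions in the table can be tried out with a short Python sketch. The names `preload_end` and `res_at` are hypothetical, and the sketch assumes that each LOAD takes four cycles and that RES is sampled right after each word is read. Medium resolution is not covered by the pseudocode, so it is not modelled here either.

```python
def preload_end(start_cycle, res_at, cycles_per_word=4, max_words=64):
    """Cycle at which PRELOAD ends and H is raised.

    `res_at(cycle)` returns "HI" or "LO" for the resolution seen at that
    cycle. Assumption: one LOAD takes `cycles_per_word` cycles.
    """
    words_read = 0
    cycle = start_cycle
    while words_read < max_words:
        cycle += cycles_per_word      # LOAD one word
        words_read += 1
        res = res_at(cycle)
        if res == "HI" and words_read >= 1:
            return cycle
        if res == "LO" and words_read >= 4:
            return cycle
    # Guard against input the pseudocode does not describe.
    raise RuntimeError("PRELOAD never ended")
```

Under these assumptions an undisturbed 50Hz line gives `preload_end(40, lambda c: "LO") == 56`, which lines up with the ST's H = TRUE at cycle 56, and a switch to LO before cycle 4 on a line that started PRELOAD at cycle 0 keeps the loop running until cycle 16.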

Todo: Describe when H becomes DE in cycles as already done for ST MMU

Software-Observable Latencies

Interrupts

The path from DE via Timer B to the 68000's interrupt input is subject to a substantial delay in the MFP responding to the change in its input. A rule of thumb for the net signalling latency is around 28 cycles at the 68000's clock rate; the time to get into an interrupt routine is longer, depending in part on which instructions the 68000 is executing when the interrupt request is received.

Both vertical and horizontal sync interrupts are autovectored. On a 68000 that means that the processor will internally synchronise with its divide-by-10 E clock. In both cases a latch external to the CPU observes the transition from active to inactive; the state of the latch dictates the interrupt level presented to the CPU and the latches are cleared only by a 68000 interrupt acknowledgement. Therefore software may delay a sync interrupt arbitrarily by selecting a higher interrupt level, and a pending interrupt will not expire of its own accord.

'Wakestates' and Phase

The GLUE and the MMU start up independently when power is applied, and therefore run at an arbitrary relative phase. Each phase is commonly referred to as a 'wakestate'.

Although cycle timings are given on this page, from a programmer's point of view all absolute positions are subject to a fixed offset of 0–3 cycles, set when the ST is powered on.

Because it is a result of the power-on procedure, wakestate does not change while the ST is running or when it is reset.

Because the STE combines the GLUE and MMU, the two are fully synchronised. Thus there are no GLUE wakeup modes/wakestates known on the STE.

Effects of Wakestate

Depending on the wakestate, the test applied by an ST at cycle 56 might actually occur at any of four different times:

  • Offset 0 reads values at cycle 56 (FREQ) and 57 (RES)
  • Offset 1 reads values at cycle 57 (FREQ) and 58 (RES)
  • Offset 2 reads values at cycle 58 (FREQ) and 59 (RES)
  • Offset 3 reads values at cycle 59 (FREQ) and 60 (RES)

So a program should change the values of GLUE registers by the following cycles:

  • WS1 (DL6): Changes made by 56 (FREQ), 56 (RES)
  • WS3 (DL5): Changes made by 56 (FREQ), 58 (RES)
  • WS4 (DL4): Changes made by 58 (FREQ), 58 (RES)
  • WS2 (DL3): Changes made by 58 (FREQ), 60 (RES)

These four numbered combinations are the known wakestates. Any program performing sufficiently detailed sync manipulation should detect which wakestate the ST is in and modify its timing appropriately.

Since these modes weren't documented until 2006, some classic demos show errors in timing. The TCB "tv-snow" screen in Swedish New Year Demo has a distorting logo that's centred only in wakestate 2; in all others it's offset to the left. [1]

An alternative naming scheme for the wakestates is based on the delay between the GLUE raising DE (Display Enable) and the MMU detecting it [2]. If the MMU detects GLUE DE at cycle 62 (and therefore raises LOAD at cycle 64) then the DL-moniker [3] can be calculated from the cycle at which the GLUE raised DE:

  • 64-58 = 6 = DL6
  • 64-59 = 5 = DL5
  • 64-60 = 4 = DL4
  • 64-61 = 3 = DL3
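The subtraction above, together with the WS/DL pairings listed earlier, fits in a small Python sketch. The function and table names are hypothetical; the LOAD cycle of 64 is the one used in the example above.

```python
# Pairings taken from the list of wakestates earlier in this section.
WAKESTATE_BY_DL = {6: "WS1", 5: "WS3", 4: "WS4", 3: "WS2"}


def dl_moniker(glue_de_cycle, mmu_load_cycle=64):
    """Cycles between the GLUE raising DE and the MMU raising LOAD."""
    return mmu_load_cycle - glue_de_cycle


def wakestate_name(glue_de_cycle, mmu_load_cycle=64):
    """WS name for the DL value, or None if it is not a known wakestate."""
    return WAKESTATE_BY_DL.get(dl_moniker(glue_de_cycle, mmu_load_cycle))
```

A GLUE that raises DE at cycle 61 gives `dl_moniker(61) == 3`, which is DL3, also known as WS2.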

A side effect of the wakestates is to reposition the graphics part of the display horizontally in increments of one pixel: WS2 (DL3) is leftmost and WS1 (DL6) rightmost. This happens because monitors use HSYNC to place the screen, and the distance between the HSYNC pulse and the displayed pixels differs with the wakestate.

Shifter state machine

After LOAD it takes 16 cycles, plus 2 due to internal delays, for the Shifter to set the first values on the RGB pins. As long as DE is high LOAD will loop with new data available. For each word read by the Shifter the MMU will increase the video counter.

Examples:

  • Cycle 8, GLUE raised DE. MMU will raise LOAD at cycle 12
  • Cycle 60, GLUE raised DE. MMU will raise LOAD at cycle 64
  • Cycle 380, GLUE lowered DE. MMU will no longer raise LOAD from cycle 384
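The examples above all show a four-cycle gap between the GLUE changing DE and the MMU changing LOAD. A small Python sketch of that arithmetic follows; the constant and function names are hypothetical, and the four-cycle figure is simply read off the examples, so it should not be taken as exact for every wakestate.

```python
DE_TO_LOAD = 4        # gap seen in the examples above
LOAD_TO_RGB = 16 + 2  # 16 cycles plus 2 for internal delays


def load_cycle(de_cycle):
    """Cycle at which the MMU follows a DE change with LOAD."""
    return de_cycle + DE_TO_LOAD


def first_pixel_cycle(de_cycle):
    """Cycle at which the Shifter sets the first values on the RGB pins."""
    return load_cycle(de_cycle) + LOAD_TO_RGB
```

For a regular line with DE raised at cycle 60 this gives LOAD at cycle 64 and the first RGB values at cycle 82.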

Regular sync scrolling is achieved with changes affecting the GLUE. 4-pixel sync scrolling, as well as "stable" and "unstable" sync lines, are due to the Shifter.

Work in progress. Please read through Alien's articles first for terminology. This writeup will be expanded upon in time, and likely contains severe errors.

The Shifter has two types of registers: a four-word FIFO buffer ("IR" in Alien's articles) and four words used as shift registers ("RR"). The shift registers are shifted out every cycle, together with a palette lookup that sets the correct value on the RGB/Mono pins. When the Shifter receives LOAD from the MMU it reads a word into the FIFO. The shift registers shift out data constantly: in low resolution all four words shift together, in medium resolution two of them shift together, and in high resolution one shifts at a time. Every 16 cycles a check is made to see whether the FIFO is full; if it is, its contents are copied to RR. If DE is not active, LOAD is not asserted, the FIFO is not refilled, and RR is not updated. When RR is not updated with new data, the registers hold 0 and the border color is displayed.

The FIFO concept is hypothesized to explain "every other 16 pixels black", a Shifter substate in which the "black" is in reality border color (palette 0) and is only "every other" in low resolution; in high resolution it is every fifth group of 16 pixels. The cause is speculated to be that the copy from IR to RR sometimes simply fails, which causes all zeros (palette 0) to be shifted out for 16 cycles. Since the next group of 16 pixels is displayed in the correct position, one word must have been pushed out of the FIFO and the next added correctly.


Sync line lengths

Sync scrollers are created by combining scanlines of different lengths, measured in bytes read, so that the displayed screen is offset by a chosen amount. The number of bytes in a line created by modifying the FREQ and RES registers in the GLUE can be calculated as follows (see the GLUE state machines for the specific actions needed):

Bytes  Method
0      DE never activated
54     DE activated at 60, deactivated at 168. (168-60)/2 = 54
56     DE activated at 56, deactivated at 168. (168-56)/2 = 56
80     DE activated at 8, deactivated at 168. (168-8)/2 = 80
158    DE activated at 60, deactivated at 376. (376-60)/2 = 158
160    DE activated at 56, deactivated at 376. (376-56)/2 = 160
160    DE activated at 60, deactivated at 380. (380-60)/2 = 160
162    DE activated at 56, deactivated at 380. (380-56)/2 = 162
184    DE activated at 8, deactivated at 376. (376-8)/2 = 184
186    DE activated at 8, deactivated at 380. (380-8)/2 = 186
204    DE activated at 60, deactivated at 468. (468-60)/2 = 204
206    DE activated at 56, deactivated at 468. (468-56)/2 = 206
230    DE activated at 8, deactivated at 468. (468-8)/2 = 230
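Every non-zero row of the table follows the same rule: the number of bytes is half the number of cycles for which DE is active. A short Python sketch that regenerates the table from the activation and deactivation cycles it mentions (the names are hypothetical):

```python
DE_ON_CYCLES = (8, 56, 60)             # activation cycles used in the table
DE_OFF_CYCLES = (168, 376, 380, 468)   # deactivation cycles used in the table


def line_bytes(de_on, de_off):
    """Bytes read on a line with DE active from `de_on` to `de_off`."""
    return (de_off - de_on) // 2


def all_line_lengths():
    """Map each (de_on, de_off) pair from the table to its byte count."""
    return {
        (on, off): line_bytes(on, off)
        for on in DE_ON_CYCLES
        for off in DE_OFF_CYCLES
    }
```

Note that 160 bytes can be reached in two ways, (56, 376) and (60, 380), which is why it appears twice in the table.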

The above examples are for PAL, which most demos use. It is equally easy to calculate how wide (in bytes; multiply by two to get pixels) a left border would be in NTSC compared to PAL:

  • DE activated at 8 compared to regular NTSC line at 56. (56-8)/2 = 24 bytes
  • DE activated at 8 compared to regular PAL line at 60. (60-8)/2 = 26 bytes

And a right border:

  • DE deactivated at 468 compared to regular PAL line at 380. (468-380)/2 = 44 bytes
  • DE deactivated at 464 compared to regular NTSC line at 376. (464-376)/2 = 44 bytes
    • Note: An NTSC line is 508 cycles instead of 512 so the deactivation due to HSYNC will happen at 464
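The border widths above can be reproduced with the sketch below. The function names are hypothetical; the cycle numbers are the ones quoted in the lists, and the NTSC HSYNC position is derived from the 508-cycle line length as described in the note.

```python
LEFT_OPEN_CYCLE = 8      # DE activation with the left border removed
PAL_LINE_CYCLES = 512
PAL_HSYNC_DE_OFF = 468   # DE deactivation by HSYNC on a 512-cycle line


def left_border_bytes(regular_de_on):
    """Extra bytes gained on the left, relative to a regular line."""
    return (regular_de_on - LEFT_OPEN_CYCLE) // 2


def right_border_bytes(regular_de_off, line_cycles):
    """Extra bytes gained on the right, relative to a regular line."""
    # A shorter line moves the HSYNC deactivation earlier by the same amount.
    hsync_de_off = PAL_HSYNC_DE_OFF - (PAL_LINE_CYCLES - line_cycles)
    return (hsync_de_off - regular_de_off) // 2


def bytes_to_pixels(byte_count):
    """Multiply by two to get pixels, as stated in the text."""
    return byte_count * 2
```

Both right borders come out at 44 bytes because the four cycles lost from the NTSC line length are matched by the regular NTSC line ending four cycles earlier.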

Paulo has written a test program for ST that displays many of the possible combinations, attached to a forum post here: [4]

Clean vs sync disrupting

Some effects that are made possible by changing the state of the GLUE also change the BLANK and HSYNC signals. Depending on how forgiving the monitor or TV set is, these effects might seem to work fully, or they might cause "bent" lines or discolouration of parts of the screen. As a general rule these signals should not be modified, but since they have been in demos, this is a brief documentation of their effects (timings used are ST; see the STE state machine for counterparts):

  • Delaying BLANK at 450 - caused by the use of HI/LO stabilizer at 444/456. Additional pixel data, if existing, will be displayed and could cause sync signal disruption.
  • Cancelling HSYNC at 462 - black line (no pixels shown) and disrupted sync signal if right border wasn't removed
  • Cancelling HSYNC at 462 - fully displayed line with disrupted sync signal if the right border had been opened, Display Enable constantly set. This is done by mistake by the initialization code in the game Enchanted Lands by TCB [5]
  • Extending HSYNC at 502 - 0 byte line and disrupted sync signal.
  • Extending BLANK at 30 - 0 byte line and disrupted sync signal.

It's possible (currently untested) that using MID/LO stabilizer at 440/456 (also known as "ULM stabilizer") won't delay BLANK. For the other cases there are other ways to black out a line as well as creating 0 byte lines that do not disrupt sync.

Future research / Incomplete

  • The 14 byte line: RES = HI at cycle 32 will cause HSYNC, which will cancel DE 4 cycles later, so (36-8)/2 = 14. It disrupts sync and shouldn't be used anyway.
  • Shifter state machine

Observations for Emulator Authors

While this page might make it look as though an emulator or simulator must be fully one-cycle accurate to capture the potential relative phases, as far as is currently known that is unnecessary, because sync tricks can only be applied at two-cycle precision (by using EXG+MOVE and similar instructions). It's therefore possible to emulate all tricks with two-cycle emulation granularity.

Unpredictable phase is the hypothesised cause behind what are known as ST "wakeup modes". Two potential modes were known and documented beginning in 2006 [6], and these were more recently (2013) divided into four, now known as "wakestates" [7].

References

This page was originally collected, edited, researched and typed up by Troed of SYNC Information, and therefore owes its greatest debt to him; information in it has been sourced from the people below, but a general thanks goes out to everyone who's ever written anything on the subject.

  • Alien of ST Connexion (Overscan Techniques part I and II) [8] [9]
  • Paulo Simoes (posts on Atari-Forum, wakestate discovery and ST documentation) [10]
  • Dio (posts on Atari-Forum, trace diagrams, DE-to-LOAD) [11]
  • Troed of SYNC (GLUE-CPU wakestate hypothesis, STE pre-fetch impact) [12]
  • Ijor (original research and confirmation from chip-decaps) [13]