TEMLIB r4.1

More OSes, fewer bugs !

FIX : CPU : LDSTUB and SWAP instructions (!)
FIX : TS : ESP : Support mismatching SCSI command length
FIX : TS : ESP : Better multiple disks handling
FIX : TS : SCC : RR15 access. Interrupts.
FIX : TS : TIMER : System timer disabling.
FIX : DEBUG software : Data breakpoint issues
ENH : TS : LANCE : support split buffers for emission.
ENH : TS : SCSI : Stubs for unhandled commands.
ENH : TS : TCX : Hardware acceleration : Blitter, Filler, Stippler
ENH : TS : TCX : CG3 emulation mode
ENH : TS : OpenBIOS : Faster boot (rather less slow).
ADD : Cross-clock PLOMB bridge.

This is mainly a bugfix release, with a few enhancements to support more OSes, because R5 is not finished yet.

SWAP and LDSTUB

LDSTUB and SWAP bugs were quite severe. The LDSTUB instruction was plain broken.

SWAP and LDSTUB also had issues with the MMU.
These instructions perform a coupled LOAD then STORE data access : they should trigger an MMU fault if a read-only memory area is accessed (Solaris dares to do that often).
Before the fix, the MMU signalled the fault on the second access, the STORE cycle, but since the IU only checks the first access of multicycle instructions (SWAP, LDD, STD…), the fault was ignored.
Both the LOAD and STORE parts of SWAP and LDSTUB must be cancelled so that the instruction can be restarted after a trap. The fix is therefore not to start handling faults on the second access, but instead to tell the MCU : “OK, here is a load for the cache and memory, but the MMU should treat it as a store.”
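
The difference between the two behaviours can be modelled in a few lines. This is only a toy sketch in Python, not TEMLIB's VHDL : the `Memory` class, the page size and the `check_as_store` flag are made-up names for illustration. The only architectural fact it relies on is that LDSTUB reads a byte and writes 0xFF back to it.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class MMUFault(Exception):
    """Raised when an access violates the page protection."""


class Memory:
    """Toy memory with per-page write protection (hypothetical model)."""

    def __init__(self, size, read_only_pages=()):
        self.data = bytearray(size)
        self.read_only_pages = set(read_only_pages)

    def _check(self, addr, as_store):
        # Loads are always allowed here; stores fault on read-only pages.
        if as_store and addr // PAGE_SIZE in self.read_only_pages:
            raise MMUFault(hex(addr))

    def load(self, addr, check_as_store=False):
        # check_as_store models "a load for the cache and memory,
        # but treated as a store by the MMU".
        self._check(addr, as_store=check_as_store)
        return self.data[addr]

    def store(self, addr, value):
        self._check(addr, as_store=True)
        self.data[addr] = value & 0xFF


def ldstub_before_fix(mem, addr):
    """Fault raised on the STORE cycle only, and ignored by the IU."""
    old = mem.load(addr)
    try:
        mem.store(addr, 0xFF)
    except MMUFault:
        pass  # the second access of a multicycle instruction was not checked
    return old


def ldstub_after_fix(mem, addr):
    """The LOAD cycle is checked with store permissions, so the fault
    is raised on the first access, before anything is committed."""
    old = mem.load(addr, check_as_store=True)
    mem.store(addr, 0xFF)
    return old
```

With page 1 marked read-only, `ldstub_before_fix(mem, PAGE_SIZE)` returns 0 and leaves memory untouched, so software believes it took a lock that was never written. `ldstub_after_fix` raises `MMUFault` instead, and the instruction can be restarted once the OS has handled the trap.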

OSes

The other fixes are related to the slightly different ways OSes use the peripherals.
The worst is certainly the serial/keyboard/mouse controller : Each OS has a different way of checking received characters and acknowledging interrupts.

TCX acceleration was not planned, but it was necessary for NextSTEP, which is not compatible with CG3 (TCX and CG3 are the names of video boards).
Luckily, it is not very complex : this blitter can only copy up to 32 pixels. The main difficulty is dealing with misaligned accesses : I wrote the code, but I’m not sure I really understand how it works.
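
As a rough illustration of why misalignment is the awkward part, here is a byte-lane model of a small copy. It is a generic sketch, neither the real TCX blitter nor the TEMLIB implementation : the 8-bit pixels, the 32-bit big-endian words, the `blit` name and the gather-then-write order are all assumptions.

```python
MAX_PIXELS = 32  # the blitter copies at most 32 pixels per operation


def read_pixel(words, addr):
    """Read one 8-bit pixel from big-endian 32-bit words."""
    shift = (3 - (addr & 3)) * 8
    return (words[addr >> 2] >> shift) & 0xFF


def blit(words, src, dst, count):
    """Copy `count` pixels from byte address `src` to `dst`.

    Source and destination may start on any byte lane, so the first
    and last destination words are usually written only partially.
    """
    if not 0 <= count <= MAX_PIXELS:
        raise ValueError("count must be between 0 and 32")

    # Gather all source pixels first (assumed order), so that
    # overlapping ranges cannot corrupt data that is not yet read.
    pixels = [read_pixel(words, src + i) for i in range(count)]

    i = 0
    while i < count:
        addr = dst + i
        lane = addr & 3                  # first byte lane in this word
        n = min(4 - lane, count - i)     # pixels landing in this word
        data = 0
        mask = 0                         # byte enables for a partial write
        for k in range(n):
            shift = (3 - (lane + k)) * 8
            data |= pixels[i + k] << shift
            mask |= 0xFF << shift
        index = addr >> 2
        words[index] = (words[index] & ~mask & 0xFFFFFFFF) | data
        i += n
```

For example, copying 5 pixels from byte address 1 to byte address 14 touches two destination words, the first with only its two low lanes enabled and the second with its three high lanes enabled.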

TCX is not complete yet : no hardware cursor and no 24-bit colour.

CG3 was added for SunOS before I realized that SunOS 4.1.4 is also compatible with TCX, so it is not very useful.

While debugging, I grew tired of waiting for OpenBIOS to boot. OpenBIOS is now a bit faster thanks to activating the cache, using an optimised memcopy and doing multisector reads.

With these fixes and enhancements, we can now run SunOS (4.1.4), Solaris (7 & 8) and NextSTEP (3.3) in addition to the already supported Linux, NetBSD and OpenBSD.
There are still issues, certainly related to cache management, so sometimes different configurations are needed to run the OSes, and some features, particularly the network, don’t work yet with all OSes.

6 thoughts on “TEMLIB r4.1”

  1. Excellent, congratulations! Love to see SunOS and NeXTstep running… Do you have an estimate for the FPGA utilization (LUTs, BRAMs, etc.) on the SP605 for the current version?

    • SP605 ; Spartan6 XC6SLX45T

      ————————————————————————–
      Post synthesis :
      Number of Slice Registers: 8755 out of 54576 16%
      Number of Slice LUTs: 17082 out of 27288 62%
      Number used as Logic: 16539 out of 27288 60%
      Number used as Memory: 543 out of 6408 8%
      Number used as RAM: 140
      Number used as SRL: 403

      Number of LUT Flip Flop pairs used: 19583
      Number with an unused Flip Flop: 10828 out of 19583 55%
      Number with an unused LUT: 2501 out of 19583 12%
      Number of fully used LUT-FF pairs: 6254 out of 19583 31%
      Number of unique control sets: 383

      Number of IOs: 199
      Number of bonded IOBs: 181 out of 296 61%
      IOB Flip Flops/Latches: 10

      Number of Block RAM/FIFO: 32 out of 116 27%
      Number using Block RAM only: 32
      Number of BUFG/BUFGCTRLs: 8 out of 16 50%
      Number of DSP48A1s: 6 out of 58 10%
      Number of PLL_ADVs: 1 out of 4 25%

      ————————————————————————–
      Post PAR :
      Slice Logic Utilization:
      Number of Slice Registers: 8,772 out of 54,576 16%
      Number of Slice LUTs: 14,649 out of 27,288 53%
      Number used as logic: 14,170 out of 27,288 51%
      Number used as Memory: 346 out of 6,408 5%
      Number used exclusively as route-thrus: 133

      Slice Logic Distribution:
      Number of occupied Slices: 5,052 out of 6,822 74%
      Number of MUXCYs used: 2,400 out of 13,644 17%
      Number of LUT Flip Flop pairs used: 16,052
      Number with an unused Flip Flop: 7,928 out of 16,052 49%
      Number with an unused LUT: 1,403 out of 16,052 8%
      Number of fully used LUT-FF pairs: 6,721 out of 16,052 41%
      Number of slice register sites lost
      to control set restrictions: 0 out of 54,576 0%

      Number of RAMB16BWERs: 28 out of 116 24%
      Number of RAMB8BWERs: 7 out of 232 3%

      ————————————————————————–

      The size could be reduced a bit by removing the video acceleration, the network interface and the second disk interface, or with fewer cache ways, fewer TLBs, a slower FPU, no debugger…

      #####################################################

      There is also an unfinished version (complete but with bugs) for an Altera CycloneV GX (Terasic “Cyclone V GX Starter Kit”)

      +——————————————————————————+
      ; Fitter Summary ;
      +——————————————————————————+
      ; Device ; 5CGXFC5C6F27C7 ;
      ; Logic utilization (in ALMs) ; 9,490 / 29,080 ( 33 % ) ;
      ; Total registers ; 11070 ;
      ; Total pins ; 324 / 364 ( 89 % ) ;
      ; Total block memory bits ; 560,896 / 4,567,040 ( 12 % ) ;
      ; Total RAM Blocks ; 83 / 446 ( 19 % ) ;
      ; Total DSP Blocks ; 5 / 150 ( 3 % ) ;
      ; Total PLLs ; 1 / 12 ( 8 % ) ;
      ; Total DLLs ; 1 / 4 ( 25 % ) ;
      +———————————+——————————————–+

      Altera’s fitter gives detailed sizings. Main blocks are (copy/paste to a spreadsheet) :

      ; Compilation Hierarchy Node ; ALMs needed [=A-B+C] ; [A] ALMs used in final placement ; [B] Estimate of ALMs recoverable by dense packing ; [C] Estimate of ALMs unavailable ; ALMs used for memory ; Combinational ALUTs ; Dedicated Logic Registers ; I/O Registers ; Block Memory Bits ; M10Ks ; DSP Blocks
      ; |c5g_ts ; 9489.5 (24.8) ; 10875.5 (29.7) ; 1526.5 (4.9) ; 140.5 (0.0) ; 40.0 (0.0) ; 13769 (49) ; 10870 (41) ; 200 (200) ; 560896 ; 83 ; 5
      ; |iu:i_iu| ; 3695.6 (1356.5) ; 4078.5 (1511.8) ; 452.7 (194.0) ; 69.8 (38.7) ; 0.0 (0.0) ; 5327 (1878) ; 3285 (1115) ; 0 (0) ; 10752 ; 6 ; 5
      ; |fpu:i_fpu| ; 2095.8 (405.2) ; 2321.7 (445.8) ; 252.9 (46.4) ; 27.0 (5.7) ; 0.0 (0.0) ; 3116 (443) ; 1838 (614) ; 0 (0) ; 2048 ; 4 ; 4
      ; |mcu:i_mcu| ; 1558.2 (1543.6) ; 1738.8 (1719.9) ; 218.9 (214.8) ; 38.4 (38.4) ; 0.0 (0.0) ; 2173 (2141) ; 1739 (1739) ; 0 (0) ; 294144 ; 40 ; 0
      ; |lpddr2:i_lpddr2| ; 1712.5 (0.0) ; 1977.0 (0.0) ; 284.0 (0.0) ; 19.5 (0.0) ; 40.0 (0.0) ; 2407 (0) ; 1659 (0) ; 0 (0) ; 116736 ; 18 ; 0

      (C5G_TS includes everything, IU includes the FPU, MCU includes the cache. LPDDR2 is Altera’s DRAM memory controller. Difference is all the peripherals, bus interconnect and glue logic.)
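
The `[=A-B+C]` column and the "difference" mentioned above can be checked with a few lines of arithmetic. This is only a convenience sketch : the figures are copied from the top-level totals of the rows above, and the dictionary layout and the 0.15 ALM rounding tolerance are choices made for this example.

```python
# (ALMs needed, [A] used, [B] recoverable, [C] unavailable), top-level totals
rows = {
    "c5g_ts": (9489.5, 10875.5, 1526.5, 140.5),
    "iu":     (3695.6, 4078.5, 452.7, 69.8),
    "fpu":    (2095.8, 2321.7, 252.9, 27.0),
    "mcu":    (1558.2, 1738.8, 218.9, 38.4),
    "lpddr2": (1712.5, 1977.0, 284.0, 19.5),
}

# Check "ALMs needed = A - B + C" for every row, allowing for the
# rounding to one decimal place done by the fitter report.
for name, (needed, used, recoverable, unavailable) in rows.items():
    assert abs(needed - (used - recoverable + unavailable)) < 0.15, name

# The FPU is already counted inside the IU, so it is not subtracted again.
rest = rows["c5g_ts"][0] - rows["iu"][0] - rows["mcu"][0] - rows["lpddr2"][0]
print(f"peripherals, interconnect and glue: {rest:.1f} ALMs")
```

The remainder comes to about 2,523 ALMs, roughly a quarter of the design.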

  2. Hello from 2017, how is the project going?

    I’ve finally gotten around to working on / updating my gentoo sparc32 images… maybe I’ll have something interesting before too long.

    • Hello Chase.

      2016 was awful.

      Now I’m back doing some VHDL for work; the last time was before I started making that Sparc CPU, a long time ago.
      But I learned a lot pursuing that crazy idea.

      2017 begins better. I hope I will be able to re-start TEMLIB soon.

      • Well I’m glad to hear things are better at least!

        Unfortunately gentoo on sparc seems to have some issues at the moment … I was attempting to install on my Ultra 45 and I started getting libraries installed in / and usr instead of the lib folders… so that will have to get sorted out.
