TEMLIB r4.1

More OSes, fewer bugs !

FIX : CPU : LDSTUB and SWAP instructions (!)
FIX : TS : ESP : Support mismatching SCSI command length
FIX : TS : ESP : Better multiple disks handling
FIX : TS : SCC : RR15 access. Interrupts.
FIX : TS : TIMER : System timer disabling.
FIX : DEBUG software : Data breakpoint issues
ENH : TS : LANCE : support split buffers for emission.
ENH : TS : SCSI : Stubs for unhandled commands.
ENH : TS : TCX : Hardware acceleration : Blitter, Filler, Stippler
ENH : TS : TCX : CG3 emulation mode
ENH : TS : OpenBIOS : Faster boot (rather less slow).
ADD : Cross-clock PLOMB bridge.

This is mainly a bugfix release, with a few enhancements to support more OSes, because R5 is not finished yet.

SWAP and LDSTUB

LDSTUB and SWAP bugs were quite severe. The LDSTUB instruction was plain broken.

SWAP and LDSTUB also had issues with the MMU.
These instructions perform coupled LOAD then STORE data accesses : They should trigger an MMU fault if a read-only memory area is accessed (Solaris dares to do that often).
Before the fix, the MMU would signal the fault on the second access, the STORE cycle, but as the IU only checks the first access of multicycle instructions (SWAP, LDD, STD…), the fault was ignored.
Both the LOAD and STORE parts of SWAP and LDSTUB must be cancelled so that the instruction can be restarted after a trap. The fix is therefore not to handle faults on the second access, but instead to tell the MCU “OK, here is a load for the cache and memory, but it should be considered as a store by the MMU”.
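
The effect of the fix can be illustrated with a toy model. This is only a sketch in Python, not the TEMLIB VHDL: the `ToyMMU` class, the 4 KB page size and the dictionary standing in for memory are assumptions made for illustration. The point it shows is that the first access is presented to the MMU as a store, so a write to a read-only page faults before anything has been read or modified.

```python
class MMUFault(Exception):
    """Raised when an access violates the page permissions."""


class ToyMMU:
    """Hypothetical MMU model: every page is readable, only some are writable."""

    def __init__(self, writable_pages, page_size=4096):
        self.writable_pages = set(writable_pages)
        self.page_size = page_size

    def check(self, addr, is_store):
        # Loads always pass in this toy model; stores need a writable page.
        if is_store and addr // self.page_size not in self.writable_pages:
            raise MMUFault(f"store to read-only address {addr:#x}")


def ldstub(mmu, mem, addr):
    """LDSTUB: return the old byte and set the location to 0xFF."""
    # The load half is checked as if it were a store, so the fault is
    # raised on the first access and memory is left untouched.
    mmu.check(addr, is_store=True)
    old = mem.get(addr, 0)
    mem[addr] = 0xFF
    return old


def swap(mmu, mem, addr, reg):
    """SWAP: exchange a register value with a memory word."""
    mmu.check(addr, is_store=True)
    old = mem.get(addr, 0)
    mem[addr] = reg & 0xFFFFFFFF
    return old
```

Because the check happens before the load, a trap handler can make the page writable and simply restart the instruction.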

OSes

The other fixes are related to the slightly different ways OSes use the peripherals.
The worst is certainly the serial/keyboard/mouse controller : Each OS has a different way of checking received characters and acknowledging interrupts.

TCX acceleration was not planned, but it was necessary for NextSTEP, which is not compatible with CG3 (TCX and CG3 are the names of video boards).
Luckily, it is not very complex : this blitter can only copy up to 32 pixels. The main difficulty is dealing with misaligned accesses : I wrote the code, but I’m not sure I really understand how it works.
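
To give an idea of why misalignment is the awkward part, here is a minimal sketch in Python. It is not the TCX register interface nor the actual VHDL: it assumes 8-bit pixels, a memory that is only accessible as aligned 32-bit words with byte enables, and a frame buffer whose length is a multiple of four. The names `read_word`, `write_word` and `blit` are hypothetical.

```python
WORD = 4  # pixels per 32-bit memory word, assuming 8-bit pixels


def read_word(fb, word_index):
    """Read one aligned word (4 pixels) from the frame buffer."""
    return bytes(fb[word_index * WORD:(word_index + 1) * WORD])


def write_word(fb, word_index, data, byte_enables):
    """Write one aligned word, touching only the enabled byte lanes."""
    base = word_index * WORD
    for lane in range(WORD):
        if byte_enables[lane]:
            fb[base + lane] = data[lane]


def blit(fb, src, dst, count):
    """Copy `count` (1..32) pixels from pixel address src to dst."""
    if not 1 <= count <= 32:
        raise ValueError("this blitter copies between 1 and 32 pixels")

    # 1. Fetch every aligned word that holds a source pixel, then drop
    #    the leading bytes that sit before the unaligned start.
    first = src // WORD
    last = (src + count - 1) // WORD
    fetched = b"".join(read_word(fb, w) for w in range(first, last + 1))
    pixels = fetched[src % WORD:src % WORD + count]

    # 2. Write the pixels back word by word; the first and last words
    #    are usually partial, hence the byte enables.
    pos = 0
    while pos < count:
        word_index = (dst + pos) // WORD
        lane0 = (dst + pos) % WORD
        n = min(WORD - lane0, count - pos)
        data = bytearray(WORD)
        enables = [False] * WORD
        for i in range(n):
            data[lane0 + i] = pixels[pos + i]
            enables[lane0 + i] = True
        write_word(fb, word_index, data, enables)
        pos += n
```

Reading all the source pixels before writing keeps the sketch correct when source and destination overlap; real hardware would have to shift and merge words on the fly, which is where the complexity comes from.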

TCX is not complete yet : No hardware cursor and no 24-bit colours.

CG3 was added for SunOS before I realized that SunOS 4.1.4 is also compatible with TCX, so this is not very useful.

While debugging, I grew tired of waiting for OpenBIOS to boot. OpenBIOS is now a bit faster thanks to activating the cache, an optimised memcopy and multisector reads.

With these fixes and enhancements, we can now run SunOS (4.1.4), Solaris (7 & 8) and NextSTEP (3.3) in addition to the already supported Linux, NetBSD and OpenBSD.
There are still issues, certainly related to cache management, so sometimes different configurations are needed to run the OSes, and some features, particularly the network, don’t work yet with all OSes.

13 thoughts on “TEMLIB r4.1”

  1. Excellent, congratulations! Love to see SunOS and NeXTstep running… Do you have an estimate for the FPGA utilization (LUTs, BRAMs, etc.) on the SP605 for the current version?

    • SP605 ; Spartan6 XC6SLX45T

      ————————————————————————–
      Post synthesis :
      Number of Slice Registers: 8755 out of 54576 16%
      Number of Slice LUTs: 17082 out of 27288 62%
      Number used as Logic: 16539 out of 27288 60%
      Number used as Memory: 543 out of 6408 8%
      Number used as RAM: 140
      Number used as SRL: 403

      Number of LUT Flip Flop pairs used: 19583
      Number with an unused Flip Flop: 10828 out of 19583 55%
      Number with an unused LUT: 2501 out of 19583 12%
      Number of fully used LUT-FF pairs: 6254 out of 19583 31%
      Number of unique control sets: 383

      Number of IOs: 199
      Number of bonded IOBs: 181 out of 296 61%
      IOB Flip Flops/Latches: 10

      Number of Block RAM/FIFO: 32 out of 116 27%
      Number using Block RAM only: 32
      Number of BUFG/BUFGCTRLs: 8 out of 16 50%
      Number of DSP48A1s: 6 out of 58 10%
      Number of PLL_ADVs: 1 out of 4 25%

      ————————————————————————–
      Post PAR :
      Slice Logic Utilization:
      Number of Slice Registers: 8,772 out of 54,576 16%
      Number of Slice LUTs: 14,649 out of 27,288 53%
      Number used as logic: 14,170 out of 27,288 51%
      Number used as Memory: 346 out of 6,408 5%
      Number used exclusively as route-thrus: 133

      Slice Logic Distribution:
      Number of occupied Slices: 5,052 out of 6,822 74%
      Number of MUXCYs used: 2,400 out of 13,644 17%
      Number of LUT Flip Flop pairs used: 16,052
      Number with an unused Flip Flop: 7,928 out of 16,052 49%
      Number with an unused LUT: 1,403 out of 16,052 8%
      Number of fully used LUT-FF pairs: 6,721 out of 16,052 41%
      Number of slice register sites lost
      to control set restrictions: 0 out of 54,576 0%

      Number of RAMB16BWERs: 28 out of 116 24%
      Number of RAMB8BWERs: 7 out of 232 3%

      ————————————————————————–

      The size could be reduced a bit : Removal of the video acceleration, network interface, second disk interface, fewer cache ways, TLBs, slower FPU, no debugger…

      #####################################################

      There is also an unfinished version (complete but with bugs) for an Altera CycloneV GX (Terasic “Cyclone V GX Starter Kit”)

      +——————————————————————————+
      ; Fitter Summary ;
      +——————————————————————————+
      ; Device ; 5CGXFC5C6F27C7 ;
      ; Logic utilization (in ALMs) ; 9,490 / 29,080 ( 33 % ) ;
      ; Total registers ; 11070 ;
      ; Total pins ; 324 / 364 ( 89 % ) ;
      ; Total block memory bits ; 560,896 / 4,567,040 ( 12 % ) ;
      ; Total RAM Blocks ; 83 / 446 ( 19 % ) ;
      ; Total DSP Blocks ; 5 / 150 ( 3 % ) ;
      ; Total PLLs ; 1 / 12 ( 8 % ) ;
      ; Total DLLs ; 1 / 4 ( 25 % ) ;
      +———————————+——————————————–+

      Altera’s fitter gives detailed sizings. Main blocks are (copy/paste to a spreadsheet) :

      ; Compilation Hierarchy Node ; ALMs needed [=A-B+C] ; [A] ALMs used in final placement ; [B] Estimate of ALMs recoverable by dense packing ; [C] Estimate of ALMs unavailable ; ALMs used for memory ; Combinational ALUTs ; Dedicated Logic Registers ; I/O Registers ; Block Memory Bits ; M10Ks ; DSP Blocks
      ; |c5g_ts ; 9489.5 (24.8) ; 10875.5 (29.7) ; 1526.5 (4.9) ; 140.5 (0.0) ; 40.0 (0.0) ; 13769 (49) ; 10870 (41) ; 200 (200) ; 560896 ; 83 ; 5
      ; |iu:i_iu| ; 3695.6 (1356.5) ; 4078.5 (1511.8) ; 452.7 (194.0) ; 69.8 (38.7) ; 0.0 (0.0) ; 5327 (1878) ; 3285 (1115) ; 0 (0) ; 10752 ; 6 ; 5
      ; |fpu:i_fpu| ; 2095.8 (405.2) ; 2321.7 (445.8) ; 252.9 (46.4) ; 27.0 (5.7) ; 0.0 (0.0) ; 3116 (443) ; 1838 (614) ; 0 (0) ; 2048 ; 4 ; 4
      ; |mcu:i_mcu| ; 1558.2 (1543.6) ; 1738.8 (1719.9) ; 218.9 (214.8) ; 38.4 (38.4) ; 0.0 (0.0) ; 2173 (2141) ; 1739 (1739) ; 0 (0) ; 294144 ; 40 ; 0
      ; |lpddr2:i_lpddr2| ; 1712.5 (0.0) ; 1977.0 (0.0) ; 284.0 (0.0) ; 19.5 (0.0) ; 40.0 (0.0) ; 2407 (0) ; 1659 (0) ; 0 (0) ; 116736 ; 18 ; 0

      (C5G_TS includes everything, IU includes the FPU, MCU includes the cache. LPDDR2 is Altera’s DRAM memory controller. Difference is all the peripherals, bus interconnect and glue logic.)
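
The arithmetic behind the table above can be checked with a few lines of Python. This is only an illustrative sketch: the numbers are the top-level totals copied from the fitter rows, while the dictionary layout and the helper names `formula_error` and `remainder` are assumptions.

```python
# (ALMs needed, A, B, C) totals copied from the fitter table above.
BLOCKS = {
    "c5g_ts": (9489.5, 10875.5, 1526.5, 140.5),
    "iu":     (3695.6, 4078.5, 452.7, 69.8),
    "fpu":    (2095.8, 2321.7, 252.9, 27.0),
    "mcu":    (1558.2, 1738.8, 218.9, 38.4),
    "lpddr2": (1712.5, 1977.0, 284.0, 19.5),
}


def formula_error(needed, a, b, c):
    """Distance between the reported 'ALMs needed' and A - B + C."""
    return abs((a - b + c) - needed)


def remainder(blocks):
    """ALMs left for peripherals, bus interconnect and glue logic.

    The FPU is not subtracted separately because the IU row already
    includes it.
    """
    top = blocks["c5g_ts"][0]
    counted = sum(blocks[name][0] for name in ("iu", "mcu", "lpddr2"))
    return round(top - counted, 1)
```

With these figures, `A - B + C` matches the reported column to within the rounding of the report, and `remainder(BLOCKS)` comes to about 2,523 ALMs for everything outside the IU, MCU and DRAM controller.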

  2. Hello from 2017 how is the project going?

    I’ve finally gotten around to working on / updating my gentoo sparc32 images… maybe I’ll have something interesting before too long.

    • Hello Chase.

      2016 was awful.

      Now, I’m back doing some VHDL for work; the last time was before I started making that Sparc CPU, a long time ago.
      But I learned a lot pursuing that crazy idea.

      2017 begins better. I hope I will be able to re-start TEMLIB soon.

      • Well I’m glad to hear things are better at least!

        Unfortunately gentoo on sparc seems to have some issues at the moment … I was attempting to install on my Ultra 45 and I started getting libraries installed in / and usr instead of the lib folders… so that will have to get sorted out.

    • I am busy making the next release, there will be many changes and support for Altera/Intel chips, in addition to Spartan6.

      An Artix7 version should not be very difficult, most of the design is very portable. I’ve used Vivado, it is both better and worse than ISE. Trouble comes from complex interfaces such as the memory controller, HDMI/DVI encoders…

      Which Artix chip do you have ? Maybe a Digilent eval board ?

      I think you need at least an XC7A35T for the SparcStation.
      Of course, a larger chip would be better. You will also need tens of megabytes of RAM, some nonvolatile memory, a video output, a few GPIOs for the keyboard & mouse, an SD card slot, and, eventually, an Ethernet PHY.

  3. No, making my own boards. But have an xc7a100 on it, 256MB DDR3, VGA analogue, Ethernet & USB Phys, PS/2 for keyb/mouse, so I should be good.

    Will wait for your next release 😉

    Cheers & thanks!

  4. I have been following your work for quite a while, also on Youtube. Remarkable achievement!

    I have to take a look at whether it’s possible to reduce your design to a SPARC V7 IU to simulate a sun4c kernel architecture with maybe CG3 and LANCE on a single FPGA. Next to your FPGA variant I’d like to see a real SBC variant like Raspberry Pi or Arduino. I have no idea how to realize this.

    I hope to see some more updates on your project.
    Keep it up, man!
    Cheers, Stephan

    • Thank you very much Stephan!

      Why do you want SparcV7 and Sun4c ?

      For the CPU, you will practically only avoid the integer multiply and divide instructions, but they are not the most complex parts. The FPU is much
      larger but, as all SparcStations had an FPU, you probably cannot avoid it without changing the OSes.

      (there were also embedded Cypress 7C601, a rad-hard version “ERC32” for space applications. This is now covered by Gaisler’s LEON CPU).

      I don’t think Sun4c would be much simpler than Sun4m, particularly in this FPGA implementation (there are a few things in the original computers which
      are not present/needed here, for example the FIFO for buffering DMA accesses, or the HDLC and floppy interfaces).
      Finally, the biggest advantage of Sun4m is that there is more software for it. Both original OSes (from SunOS to NextSTEP…) and recent code, such as QEMU
      and OpenBIOS. Being able to re-use OpenBIOS and have disk images generated by QEMU is a big advantage.

      There are a few parts that can be configured to reduce the size : Smaller caches, fewer TLBs, no debugger, no Ethernet (or no display and remote X11 over Ethernet)…
      The next version (coming soon, promised) will be available for newer FPGA eval boards, much cheaper and smaller than the now obsolete SP605.

      And with QEMU, you can also emulate a SparcStation on a Raspberry Pi!

      • It is also worth noting that sun4c has no support at all on Linux these days as it was dropped as a supported architecture some time ago (years).

        sun4m is still there in the tree but mostly untested, what you really want is something like a sun4m+ with the backported CASA instruction as this is basically what the Leon3 and 4 support and it makes some things simpler for the kernel and toolchain, namely atomics. Also you’re probably better off emulating Happy Meal ethernet… as that gets you 100Mbit at least.

        If you don’t have the CASA instruction you can’t run systemd (not really a bad thing in my book). So you must resort to something like mdev, which is usually fine as sparc doesn’t have much pluggable hardware and it’s more lightweight.

        Is there any chance the new release will support SMP or > 100MHz? Less than 100MHz would not be too bad either as long as the pipeline has wide enough instruction issue… an Ultra 1 @ 166MHz is actually quite nice for instance. One issue I’ve seen with most FPGA boards is very limited DDR capacity :/ which makes it impractical to run much real software.

        • Hello !
          Sorry, the next version is a bit faster, but I doubt a 100MHz SPARC could be achievable with current affordable FPGAs.

          Adding the SparcV9/LEON CASA instruction could be quite useful, you’ve convinced me. I will try, it doesn’t seem very difficult : it means tweaking the hardware around SWAP and LDSTUB a bit.
          There are already the SparcV9 LDFA/STFA instructions, which were added to help with the debugger, so it would not be the first nonstandard extension.

          It’s a bit strange that Sun made multi-CPU workstations and servers so early, but with CPUs lacking decent atomic instructions. The older MC68K already had a CAS instruction, they could have defined it in SPARC from the beginning.

          Besides useful stuff such as CASA, the LEON has a few gratuitous incompatible details in the MMU, compared to old SPARCs, so, while LEON has allowed SPARCv8 to be kept alive in GCC and Linux (thanks Mr Gaisler !), it is not quite enough for Sun4m.

          For Ethernet, the current version already supports 100Mbps while emulating the original 10Mbps AMD LANCE chip. IIRC HappyMeal had additional hardware offloading goodies. I’d like first to fix remaining issues with LANCE and have the network working with all OSes.

          [I finally have a Terasic C5G! For Ethernet I will use an “eBay” WaveShare LAN8720 PHY board]
