Designing a benchmark for the UCA – Part #1

A benchmark feature was planned very early in the development process of the Universal Chip Analyzer. Given the vast differences in microarchitecture between a 486 and an 8080, two benchmarks are necessary: one to compare older 8-bit CPUs from various manufacturers with incompatible instruction sets (Motorola 6800, Intel 8080, MOS 6502, etc.), and another for 16- and 32-bit CPUs based on Intel’s x86 ISA. Let’s begin with building an Integer/FP benchmark for “modern” CPUs like the 386 or 486. (I’ll cover how to evaluate older CPUs’ performance in a later post.)

So, what is a CPU benchmark? Essentially, it’s a score derived from the ratio between a piece of code designed to emulate real-world programs (using similar sets of instructions) and the time required to execute that code. While there is no debate about how to measure time, there are endless discussions about the best instructions to use for an effective benchmark. By profiling various common programs, benchmark writers determine how instructions are statistically used and then create synthetic code that mimics a similar instruction distribution.

Let’s see if and how this approach can apply to the UCA.

INT Performance Benchmark

In the 80s and 90s, the industry standard for measuring integer performance was the “Dhrystone” benchmark, originally published in 1984 in Ada by Reinhold P. Weicker and later ported to C by Rick Richardson. Dhrystone was designed to evaluate overall system performance with a focus on integer operations, as floating-point instructions were rare at that time. The real-world programs Weicker used to define the instruction distribution for Dhrystone were written in Fortran, Pascal, and long-obsolete languages like ALGOL. A complete description of the instruction statistics behind the C version of Dhrystone can be found in the dhry.h header file.

To implement similar code in the Universal Chip Analyzer, I first needed to understand exactly what Dhrystone does. I began by compiling the original C code using GCC 4.9, targeting the “i386” architecture with the “-O1” optimization flag to avoid extreme optimization. Then, I used Intel’s Pin Architecture Analysis tool to log every instruction executed by the Dhrystone binary and sorted them. Finally, I grouped the instructions by family.
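
The grouping step can be sketched in a few lines of Python. This is only an illustration: the trace format (one disassembled instruction per line) and the FAMILIES mapping are assumptions made for the example, not the actual output of Pin or the exact families used for the statistics in this post.

```python
from collections import Counter

# Hypothetical mapping from x86 mnemonics to instruction families.
# Anything not listed here is counted as "Other".
FAMILIES = {
    "mov": "MOV", "movsd": "String", "movsb": "String",
    "add": "Integer", "sub": "Integer", "imul": "Integer", "idiv": "Integer",
    "and": "Bit", "or": "Bit", "xor": "Bit", "shl": "Bit", "shr": "Bit",
    "lea": "LEA",
    "jnz": "Jump", "jz": "Jump", "jmp": "Jump",
    "push": "Stack/Call", "pop": "Stack/Call",
    "call": "Stack/Call", "ret": "Stack/Call",
}

def family_shares(trace_lines):
    """Return {family: percentage} for a trace with one instruction per line."""
    counts = Counter()
    for line in trace_lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        mnemonic = parts[0].lower()
        counts[FAMILIES.get(mnemonic, "Other")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * n / total for family, n in counts.items()}

# Tiny made-up trace, just to exercise the function.
sample = [
    "mov eax, [ebx]", "add eax, 1", "mov [ebx], eax", "jnz loop",
    "push eax", "pop eax", "mov ecx, eax", "cmp eax, ecx",
]
shares = family_shares(sample)
for family, percent in sorted(shares.items()):
    print(f"{family}: {percent:.1f}%")
```

On a real trace the mapping would need to cover every mnemonic the compiler emits, and MOVs would have to be split further by operand type (read, write, register only).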

Key findings include:

      • MOV Instructions: About 30% of executed instructions are MOVs. Of these, 68.8% are used to read memory, 22.7% to write to memory (roughly a 3:1 ratio, higher than the usual 2:1), and 8.5% involve registers only.
      • Integer Operations: Integer ADDs account for 13.7% of total instructions executed, while Integer SUBs account for 3.1%.
      • Bit Operations: Bit operations (Boolean manipulation, shift, rotate, etc.) account for approximately 11% of total instructions, and bit-based comparisons account for 5.5%.
      • String Operations: About 20% of MOV instructions are related to string operations (such as movsd).
      • LEA Instructions: LEA instructions, a compiler trick to optimize basic arithmetic operations using a memory computation-related instruction, remain below 5%.
      • Program Flow Control: JUMPs are mainly conditional, with 70% being jnz (Jump if not zero). Stack operations (push/pop) and function control (call/ret) account for 16.3% of the total instructions.

As an arithmetic benchmark, Dhrystone also performs some multiplication (0.2% of the total) and division (also 0.2%). While this 0.4% may seem insignificant compared to the 16.7% for addition and subtraction, it’s important to understand that ADD and SUB instructions require only 2 cycles on an i486, while IDIV and IMUL instructions can require up to 43 cycles, making them about 20 times slower. Consequently, these 0.4% of div/mult operations take as much time as 50% of all add/sub operations. This must be considered to avoid an issue where a single time-consuming instruction skews the final score.
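
The arithmetic behind that claim is easy to check. The short Python snippet below only reuses the figures quoted above (shares of executed instructions and i486 cycle counts, with 43 cycles as the worst case); the variable and function names are mine.

```python
# Figures quoted in the text: share of executed instructions (in %)
# and cycles per instruction on an i486 (43 is the worst case).
ADD_SUB_SHARE = 16.7
MUL_DIV_SHARE = 0.4
ADD_SUB_CYCLES = 2
MUL_DIV_CYCLES = 43

def weighted_cost(share_percent, cycles):
    """Relative time spent in a family: how often it runs times how long it takes."""
    return share_percent * cycles

add_sub_cost = weighted_cost(ADD_SUB_SHARE, ADD_SUB_CYCLES)  # about 33.4
mul_div_cost = weighted_cost(MUL_DIV_SHARE, MUL_DIV_CYCLES)  # about 17.2

slowdown = MUL_DIV_CYCLES / ADD_SUB_CYCLES
ratio = mul_div_cost / add_sub_cost

print(f"mult/div is {slowdown:.1f}x slower per instruction")
print(f"mult/div time is {ratio:.0%} of add/sub time")
```

In other words, an instruction family that represents less than half a percent of the instruction count ends up weighing about half as much as all additions and subtractions combined.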

Another crucial point is related to the memory access subsystem (including the cache, when available). A benchmark like Dhrystone doesn’t solely evaluate CPU performance, but rather how the CPU and memory perform together. To what extent? The instruction statistics show that approximately 30% of executed instructions reference a memory location, resulting in a read/write operation. With the slow memory used in the 1980s and 1990s, the final score could be directly linked to the performance of the memory (or the memory controller, or the internal cache). Should this be simulated by the UCA, given that the memory simulated by the UCA is extremely fast with zero wait state? I don’t think so. The goal here is to design a pure CPU benchmark, as independent of memory subsystem speed as possible. Nonetheless, some memory operations are still necessary to consider the latency and bandwidth of very common instructions referencing memory like MOVs.

With these results in mind, and taking into consideration the number of cycles required for instructions on CPUs ranging from the 8086 to the 80486, here is the instruction dispatch I selected for the Integer benchmark of the Universal Chip Analyzer:

      • 25% MOVs: 10% Direct (Reg/Reg), 10% Read (Reg/Mem), 5% Write (Mem/Reg)
      • 16% ADDs + 8% SUB: Basic arithmetic operation.
      • 5% MULT + 0.25% DIV: Same execution time as the 24% ADD/SUB
      • 12% Boolean operation: AND, OR, XOR, INC, DEC, …
      • 6% Rotation/Shift: ROR, ROL, SHL, SHR, …
      • 10% conditional JUMPs: JNZ, JZ, JNE, JE, …
      • ~20% for Flow control and stack management: PUSH, POP, CALL, RET, …
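
As a rough sketch of how such a dispatch turns into an actual instruction stream, the Python snippet below converts the percentages above into instruction counts for a block of a given size. The block size of 1000 and the largest-remainder rounding are arbitrary choices for the example; the listed shares add up to slightly more than 100% (flow control is only approximate), so they are normalized first.

```python
# Shares (in %) taken from the list above; "Flow/stack" is the approximate ~20%.
DISPATCH = {
    "MOV reg/reg": 10.0, "MOV read": 10.0, "MOV write": 5.0,
    "ADD": 16.0, "SUB": 8.0, "MULT": 5.0, "DIV": 0.25,
    "Boolean": 12.0, "Rotate/Shift": 6.0,
    "Conditional jump": 10.0, "Flow/stack": 20.0,
}

def allocate(dispatch, slots):
    """Split `slots` instructions between families, proportionally to their share.

    Uses largest-remainder rounding so the counts always add up to `slots`.
    """
    total = sum(dispatch.values())
    exact = {name: share * slots / total for name, share in dispatch.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = slots - sum(counts.values())
    # Hand the remaining slots to the families with the largest fractional part.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

counts = allocate(DISPATCH, 1000)
for name, n in counts.items():
    print(f"{name:18s} {n:4d}")
```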

Of course, this code can be changed easily at any time to fit specific benchmarking needs.

FP Performance Benchmark

When considering floating-point benchmarks, the two clear choices were Whetstone and Linpack. I profiled both using various tools. Let’s begin with Linpack to understand why I ultimately preferred Whetstone.

As shown in the instruction statistics charts, the authors of Linpack recognized early on that the FMA (Floating-point Multiply-Add) would become the cornerstone of intensive compute activities for decades to come. Consequently, they fine-tuned Linpack to focus almost exclusively on FMA operations.

Linpack exhibits very few memory dependencies (less than 3%) and even fewer control flow and stack instructions (less than 2%), which could be great for the UCA. However, the floating-point instruction dispatch reveals that only four FP instructions are predominantly used: FADD, FMUL, and FLD/FSTP for loading static values and storing results in registers. Linpack essentially measures the FMA performance of an FP execution unit, which is insufficient for properly evaluating an entire FPU.

Now let’s profile Whetstone the same way, starting with all-instructions statistics:

Whetstone demonstrates a more balanced use of non-FP instructions, though this comes at the expense of a lower volume of FP instructions overall. Memory dependencies, stack usage, and function control are significantly higher compared to Linpack. Next, let’s examine the distribution of FP instructions:

Whetstone utilizes the complete set of instructions available on early x87 FPUs, maintaining a balance between very fast instructions like FADD and much slower functions like FSQRT (square root), as well as even more time-consuming logarithmic, exponential, or trigonometric instructions like FPATAN, FSIN, or FCOS (which can take hundreds of cycles on any 386 or 486!). This comprehensive range is exactly what we need for a thorough evaluation of FPU performance.

To summarize, for the Universal Chip Analyzer FP benchmark, we need the non-FP instruction dispatch statistics of Linpack (with very few memory dependencies and minimal stack/program control flow instructions) combined with the variety of FP operations found in Whetstone (not just FMA but the full suite of FP operations, from square roots to trigonometric functions). Of course, we must balance these instructions to ensure no single operation disproportionately affects the overall performance.

Here is my proposed dispatch that will be implemented as “V1” of the UCA FP Benchmark:

FMAs account for 45% of the total execution time, FDIV/FSQRT for 30% and log/exp/trigonometric functions for 17%. Memory dependencies are reduced to the bare minimum, as are other program control flow instructions, including stack operations.
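
Because these targets are expressed in execution time and not in instruction count, slow instructions must appear far less often than fast ones. The Python sketch below shows the conversion. The cycle counts in ASSUMED_CYCLES are placeholders chosen only to illustrate the calculation; they are not datasheet values and differ from one FPU to another.

```python
# Target share of execution time (in %), from the text above.
TIME_SHARE = {"FMA": 45.0, "FDIV/FSQRT": 30.0, "log/exp/trig": 17.0}

# ASSUMPTION: placeholder cycle counts for illustration only.
ASSUMED_CYCLES = {"FMA": 20, "FDIV/FSQRT": 80, "log/exp/trig": 300}

def instruction_mix(time_share, cycles):
    """Convert time shares into instruction-count shares (in %).

    time = count * cycles, so count is proportional to time / cycles.
    """
    weights = {name: time_share[name] / cycles[name] for name in time_share}
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}

mix = instruction_mix(TIME_SHARE, ASSUMED_CYCLES)
for name, percent in mix.items():
    print(f"{name:14s} {percent:5.1f}% of FP instructions")
```

With these placeholder costs, the slow transcendental functions make up only a few percent of the FP instruction count while still accounting for 17 units of execution time against 45 for the FMAs.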

Now it’s time to implement this!

Related Sources:

1. An overview of common benchmarks by R. Weicker
2. Benchmark Programs and Reports
3. 80×86 Integer Instruction Set (8088 – Pentium)
4. 80×87 Instruction Set (x87 – Pentium)

 

Final Universal Chip Analyzer disclosed!

If you’ve been following my blog since the inception of the Universal Chip Analyzer (UCA) journey, you might recall a post with the same title I published in mid-2021. At that time, I was excited to unveil the second version of the UCA, a project I had been dedicated to since 2017. Unfortunately, shortly after this announcement, the integrated circuit (IC) industry faced a significant global shortage. This crisis led to exorbitant price hikes, severely impacting the production of the UCA’s initial batch. In 2022, I made a pivotal decision to entirely overhaul the UCA project, encompassing both its hardware and software components.

Initially, I questioned the necessity of upgrading beyond the original components, a Xilinx Spartan-6 FPGA and a Cortex-M0 based microcontroller (MCU). These components seemed adequate for the previously stated applications. However, I’ve come to realize the immense benefits of this upgrade. The rationale behind the UCA’s redesign is twofold: versatility and longevity. By incorporating more powerful components, the UCA’s capabilities have been greatly enhanced, truly living up to its “Universal” moniker. This complete modular redesign not only broadens its applications in retro-computing beyond mere component testing but also incorporates every feature I envisioned without the constraints of the original Mojo development platform.

This latest iteration, internally referred to as “v3,” marks a significant evolution from its predecessors. The UCA v1 was a modification of the Mojo v3 from Alchitry, featuring custom firmware and an upgraded Flash chip. The UCA v2 introduced a superior Cortex-M0 based MCU while maintaining the core architecture of the Xilinx Spartan-6 FPGA. Below is an image showing the progression from v1 (right) to v2 (middle), leading up to the latest v3 (left):

UCA v3 Hardware

Let’s delve into the specifics of the final version of the Universal Chip Analyzer (UCA).

    1. FPGA Upgrade – The most notable enhancement in the UCA v3 is the transition from a Spartan-6 to an Artix-7 FPGA. The original Spartan-6 XC6SLX9, a 45nm FPGA, featured 9,152 logic cells and 576 Kbit of Block RAM. In stark contrast, the Artix-7 XC7A35T, fabricated on a more advanced 28nm process, boasts 33,280 logic cells and 1.8 Mbit of Block RAM. This represents a more than threefold increase in size, coupled with improved power efficiency and additional features like an integrated ADC. Despite being available only in BGA packaging, soldering the Artix-7 proved to be surprisingly manageable.

    2. Microcontroller Unit (MCU) – The MCU plays a crucial role in the UCA, responsible for loading the appropriate FPGA firmware, facilitating communication between the FPGA and external devices via USB, managing integrated voltage regulators and their safety protocols, displaying information on the OLED screen, and handling various auxiliary tasks. The MicroMod Teensy, a collaboration between SparkFun and PJRC, was selected for its ease of use, impressive power, and open-source status. Based on the NXP i.MX RT1062 microcontroller (ARM Cortex-M7 @ 600 MHz), it includes 1024 KB of RAM, 16 MB of Flash, and supports an extensive range of peripherals. Its integration into the MicroMod ecosystem means it’s housed on a small, replaceable board with a standard pinout and a common M.2 connector. This not only simplifies the design but also reduces costs and facilitates future upgrades if needed.

    3. MicroSD Card Integration – The UCA operates on various FPGA configuration files, known as bitfiles, with each hardware configuration requiring a specific bitfile. Earlier versions used a 256 Mbit (32 MB) SPI Flash IC, limiting the storage to about 100 configurations. To accommodate the larger bitfiles required by the Artix-7 FPGA, which are over 2 MB, a MicroSD card was introduced. This solution offers practically unlimited storage and, thanks to the performance capabilities of the new MCU, enables quick firmware loading – a 2.2 MB firmware can be loaded in just 500 ms.

    4. USB-C Connectivity – First introduced in the UCA v2, the USB-C connector continues to facilitate communication with the companion app for advanced testing and monitoring. The enhanced NXP i.MX RT1062 MCU paves the way for future feature expansions.

    5. DC connector – The inclusion of a standard 2.1mm low profile DC connector is essential for powering the adjustable DC-DC voltage regulator on the Interface (IF) board with a 9V to 12V power source. This is particularly crucial for powering newer CPUs like the 486 DX2/DX4. Two versions of the IF board are planned: a fully-featured variant supporting a wide range of voltages and a simplified version for testing 5V ICs without the need for external DC power.

    6. Enhanced Power Regulators – The shift to the Artix-7 FPGA necessitated a complete overhaul of the UCA’s power management system, given its more complex power requirements compared to the Spartan-6. The board now incorporates two DC-DC switching regulators (one for the main 1.0V FPGA core voltage and another for the 3.3V MCU) and two linear converters (one for the 1.8V FPGA auxiliary voltage, and another as a low-noise converter for 3.3V I/O).

Despite these significant upgrades, the UCA retains its compact dimensions (85×63 mm), similar to a credit card, and continues to be based on a mezzanine stack design. The first layer is the FPGA board (as described), followed by the Interface (IF) Board, which integrates the OLED display, DUT (Device Under Test) power management, and signal conversion. The final layer is the Adapter Board, equipped with a socket suitable for a specific type of chip (CPU, DRAM, etc.).

Here’s a glimpse of the complete UCA v3 with the DIP40 Adapter Board:

UCA v3 Software

From a software perspective, the transition from the Spartan-6 to the Artix-7 FPGA in the UCA v3 brings substantial advantages. Let’s delve into some of the key developments in the FPGA domain. Xilinx, now part of AMD, recently extended the lifespan of the Spartan-6 series to at least 2030. This sounds promising, but there’s a significant caveat: Xilinx discontinued their 6-Series FPGA toolchain and development tools, known as Xilinx ISE, back in … 2013! As a result, anyone looking to code for a Xilinx 6-Series FPGA today is forced to use an outdated, bug-ridden tool that lacks support for modern operating systems like Windows 10. The only viable workaround is running the tool on an old Linux VM. To put it bluntly, working with ISE in 2024 is a nightmare.

The shift to a 7-Series FPGA like the Artix-7, on the other hand, enables the use of Xilinx’s current “Vivado” toolchain, which is under active maintenance. This change is significant. When I initially embarked on the UCA project in 2018, the learning curve was steep, and I relied heavily on Xilinx 6-Series specific primitives and IPs for simplicity. With the move to the Artix-7, I made a strategic decision to rewrite everything in pure (System)Verilog, minimizing the use of specific IPs. This approach not only facilitates future transitions to different FPGA brands or models if necessary, but it also allows for the release of all the code as open-source software.

What’s Next?

The immediate objective is to finalize the UCA v3 hardware validation as quickly as possible. My aim is to ensure the FPGA and IF boards won’t require further modifications for years, allowing me to concentrate on software development, adapter creation, and other innovative features. The initial focus will be on beta testing the DIP40 and 486 adapters. If the UCA v3 successfully runs a 486 DX4 and an 8080, like the UCA v2, I’m confident that the hardware will be robust enough for any CPU released in the 70s, 80s, and up to the mid-90s.

The Universal Chip Analyzer v3 will be available for sale soon, starting with the DIP40 adapter. Subsequent adapters will then be released one-by-one. Pricing details will be announced in the near future. While I’m considering another Kickstarter campaign, akin to my experience with the ATX2AT Smart Converter, I’m mindful of the significant time investment such crowdfunding efforts entail.

Stay tuned for more updates on the UCA, and feel free to share your thoughts and comments!

 

UCA now able to test early DRAMs!

The Universal Chip Analyzer was not called the Universal CPU Analyzer for a reason: from the very beginning of the project back in 2017, I have had in mind a tester for various ICs and not just CPUs. The 8087 was the first non-CPU that could be tested on the UCA, but an FPU is technically close to a CPU. Another component that often fails on 70s and early 80s computers is the memory (“RAM”) chip. Back in those days, DRAM didn’t come in SIMM or DIMM modules like in the 90s and later, but simply as individual ICs assembled on the motherboard. The most common were the 2164/4164 (64 Kbits) and the 21256/41256 (256 Kbits) as found on the original IBM PC XT motherboard or many other iconic computers like the Commodore 64. Below is a motherboard from a DTK PC clone featuring 18x 41256s in the bottom-right corner, totaling 576 KB of raw memory (512 KB usable).

IBM PC Clone motherboard (DTK PIM-TB10-Z)

There are a LOT of different DRAM chips from that era, so I had to build an almost universal DRAM shield, able to supply all the common voltages (+5V, -5V and +12V). I chose a universal DIP20 ZIF socket that should cover almost all the 1-bit and 4-bit wide DRAMs used in these computers. Unfortunately, an almost infinite variety of pinouts existed back then. The easiest solution to route the correct signals to their appropriate pins was to add dedicated “Setup cards”, one for each different pinout. PCBs are cheap these days and these setup cards only use 2-layer PCBs, costing less than $2 each. The type of card inserted is of course automatically detected by the UCA. I started with the 2164/4164 and 21256/41256 (1-bit wide DRAMs). For the first proof of concept, I used a standard PCI Express x4 connector. I’m not totally happy with this connector because it requires the plugged-in card to be perfectly perpendicular to ensure a good electrical connection. On the other hand, it’s also very convenient and easy to use. I successfully tested many different 2164, 4164, 21256 and 41256 chips, as well as other DRAMs that use the same pinout (like the IMS2600).

The test routine run by the FPGA includes checking for stuck 0s, stuck 1s and adjacent bits flipping correctly from 0 to 1 and from 1 to 0. I have also added more advanced tests (thanks to the years spent on Memtest86+) like memory retention between refreshes, which can degrade over time. The firmware can detect any errors and report the memory address that failed. It can also detect if the DRAM chip supports advanced features like paging (subsequent column access without toggling RAS) and the obscure nibble mode (a limited paging access).
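
To illustrate the basic checks, here is a minimal Python sketch of a stuck-at and checkerboard test run against a simulated 1-bit-wide memory. The real test runs in FPGA logic against the physical chip; the SimulatedDram class, its fault-injection parameter and the function names are all hypothetical, and the model ignores RAS/CAS timing, refresh and retention entirely.

```python
class SimulatedDram:
    """Hypothetical 1-bit-wide DRAM model.

    `stuck` maps an address to the bit value that cell always returns,
    which lets us inject stuck-at-0 and stuck-at-1 faults.
    """
    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        self.stuck = stuck or {}

    def write(self, addr, bit):
        self.cells[addr] = bit

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])

def run_pattern(dram, pattern):
    """Write pattern(addr) everywhere, read it back, return failing addresses."""
    size = len(dram.cells)
    for addr in range(size):
        dram.write(addr, pattern(addr))
    return [addr for addr in range(size) if dram.read(addr) != pattern(addr)]

def check_dram(dram):
    """Run the four basic patterns and return the sorted list of bad addresses."""
    failures = set()
    failures.update(run_pattern(dram, lambda a: 0))            # catches stuck 1s
    failures.update(run_pattern(dram, lambda a: 1))            # catches stuck 0s
    failures.update(run_pattern(dram, lambda a: a & 1))        # checkerboard 0101...
    failures.update(run_pattern(dram, lambda a: (a & 1) ^ 1))  # inverted 1010...
    return sorted(failures)

good = check_dram(SimulatedDram(64))
bad = check_dram(SimulatedDram(64, stuck={5: 1, 40: 0}))
print("good chip:", good)
print("faulty chip:", bad)
```

The two checkerboard passes make every cell hold the opposite value of its neighbours in both polarities, which is the simplest way to exercise adjacent bits flipping in each direction.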

One of the most interesting features of the UCA is the ability to test DRAMs at their rated speed. A chip originally rated at 150 ns access time may now be limited to 200 ns due to aging, for example. I have added 100 ns, 120 ns, 150 ns and 200 ns as standard access times, but other timings are possible as well. Of course, the current and power consumption of the chip are measured. I’m also thinking about more settings like the refresh time. Here is a logic analyzer dump of the standard test process right now:

I use a 1.5 ms (1500 µs) delay between refreshes because it’s the shortest refresh interval required by some ICs according to their datasheets, but more advanced DRAMs support an “Extended Refresh time” that allows up to 4 ms between refreshes without data loss. Adding a customizable tREFRESH delay could be helpful in the future. Also, I will build a new UCA™ Setup Card© for the 4116, as these chips were very common in late 70s computers like the Apple II. They need a specific card because they require -5V and +12V in addition to +5V, unlike the single +5V supply of the later 4164 & 41256. 4-bit wide DRAMs like the 4464 and 44256 will probably follow soon.

Feel free to suggest any different DRAM (with max 20 pins) from that era!

All PGA 386s benchmarked with the UCA

The C&T Super386 being the last of the PGA 386-compatible CPUs ever released to be supported by the Universal Chip Analyzer, it was time to publish some benchmarks! The current integrated benchmark uses a lot of standard x86 operations (mov, add, conditional and unconditional jumps, …) and integer math instructions (add, sub, div, mult). Keep in mind that the UCA can achieve a 0-wait-states-everywhere communication with the CPU, nullifying any added latencies from chipset, RAM, or whatever. The results below are 100% linked to the raw CPU power without any limitation from the subsystem.

First, all the 386-compatible manufacturers that claimed superior performance over Intel’s 386 actually delivered on their promises. AMD’s 386s use the exact same die and consequently offer the same performance. Intel’s own RapidCAD is only 6% faster than the standard 386 on integer operations but comes with an integrated FPU offering much higher speed on floating-point operations. The C&T Super386 is significantly faster than Intel’s 386: about 20% faster. C&T claimed its microprocessors were up to 10 percent more powerful than Intel’s, which looks about right for real-world applications, where added latencies from buses and memory reduce the raw gain.

Anyway, the much more advanced Cyrix 486 core (and its licensed clone from Texas Instruments) takes the lead by a giant margin despite being pin-compatible with the 386 socket. The slowest Cyrix 486DLC-25 is almost as fast as an Intel/AMD 386 clocked at 40 MHz, and the clock-doubled 486DRx2s are twice as fast as the fastest Intel 386!

Stay tuned for bigger UCA news tomorrow!

The UCA now supports C&T Super386!

Chips and Technologies (C&T or CHIPS) was a little-known company founded in California in 1984. The company first developed one of the first EGA video chipsets and some system logic chipsets for IBM’s PC-XT and PC-AT. In September 1991, C&T announced its very first and only x86-compatible CPU: an Intel 386 compatible chip named the “Super386”. It used a clean-room implementation process (basically reverse engineering), but Intel sued them almost immediately for patent infringement. Since C&T could not afford a costly trial against Chipzilla, the case was settled in 1993, making the Super386 a very short-lived CPU produced only in small quantities in 1992. Later, C&T refocused on laptop graphics chips and was ultimately acquired by Intel in 1997.

Here are the x86-related products from C&T announced in 1991:

    • J38600DX – A PGA CPU pin-compatible with the Intel 386DX at 20, 25, 33 and 40 MHz at $157, $157, $195 and $206. Only the 25 MHz and 33 MHz parts seem to have reached the market.
    • J38600SX – A 386SX pin-compatible CPU announced in 16 MHz, 20 MHz and 25 MHz versions at $59, $88 and $92 in volume quantities. No retail or prototype parts ever surfaced.
    • J38605SX/DX – A more advanced 386 with 0.5 KB cache and an innovative feature called SuperState V. Not pin-compatible with the Intel 386. Never released, but a couple of prototypes are known.
    • J38700SX/DX – A 387 math co-processor, by far the most “common” chip of them all. Available in PGA (DX) and QFP (SX) versions at speeds ranging from 16 MHz to 40 MHz. Pin-compatible with their Intel counterparts.

Since the Super386 (J38600DX) is the only x86 CPU from Chips and Technologies that reached the stores (albeit in very small quantities), it had to be supported by the UCA! I was finally able to find one and add support for it.

Running at 25 MHz with a much lower power consumption than the first 386s, the C&T Super386 identifies itself with CPUID 0x300. The same CPUID was used for very early (and very rare) A-Step Intel 386. To distinguish them, you must check for the undocumented 0x0F, 0x18 instruction, only available on the C&T Super386.
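
The detection logic described above boils down to a two-step decision. Here is a minimal Python sketch of it; the function name, its two inputs and the returned strings are hypothetical and only illustrate the decision, not the actual UCA firmware (which has to run the undocumented opcode on the CPU itself to obtain the second input).

```python
SUPER386_ID = 0x300  # identifier shared with very early A-step Intel 386s

def identify_386(cpu_id, executes_0f18):
    """Distinguish a C&T Super386 from an A-step Intel 386.

    cpu_id        -- identifier reported by the CPU
    executes_0f18 -- True if the undocumented 0x0F, 0x18 opcode runs
                     (per the text, only the Super386 implements it)
    """
    if cpu_id != SUPER386_ID:
        return "other CPU"
    if executes_0f18:
        return "C&T Super386"
    return "Intel 386 (A-step)"

print(identify_386(0x300, True))
print(identify_386(0x300, False))
```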

I’ll publish some benchmarks tomorrow.

More interesting information about the C&T Super386 can be found here:

 

UCA production postponed due to IC shortage

It’s been quite a long time without any Universal Chip Analyzer news. No worries: I continued to work on the project and added some nice features in the past months. It’s now time to publish some updates. Let’s start with the biggest issue: the final UCA v2 is now ready for production but unfortunately, the global IC shortage is so serious that I’m not able to start producing them right now.

The availability and price of some key components used for the UCA have been totally out of control since last summer. The lead time for many ICs is now counted in months, sometimes even more than a year. Prices have gone insane: a 2x to 3x increase for DC-DC converters and common MCUs like the ARM Cortex-M0 ATSAMD21 used on the UCA, and even much more for the base FPGA.

The exact same FPGA from the same supplier, which I bought for $5.90 less than one year ago, now costs $65, more than a 10x increase! Volume quantities are still available, but only from a few brokers (with questionable quality control) and only at indecent prices. Some of them even bought batches of previously assembled boards to salvage the chips and sell them as “used” parts! Based on these crazy prices, a complete UCA (FPGA + IF board without top interface board) would cost in the $200 range, which is way too high to start production.

I’ve studied many workarounds, but unfortunately, none of them can solve the problem quickly. One of the options was to switch from the Spartan-6 line to the newer Spartan-7 or Artix-7 line. Both are still active but unavailable at a decent price right now. Still, I would expect the Xilinx FPGAs based on the 28nm process (7 series) to become available again sooner than the more mature 45nm line (6 series).

While the Spartan-6 is not expected to become EOL before 2027 at least, switching to the Spartan-7 would allow access to the newer software development suite from Xilinx (Vivado) instead of the old ISE. On the downside, the higher performance of the Spartan-7 is almost useless in the UCA, they only come in BGA form factor (which is more costly to assemble than the TQFP package used on the Spartan-6) and they also require a much more complex power distribution scheme that would increase the overall BOM by at least $15. To make things even harder, Artix-7/Spartan-7 FPGAs require a much bigger configuration file (bitstream) that would break all the programming tricks I use on the UCA. A new MCU based on a more powerful Cortex-M3 would be mandatory to program the FPGA, with bitstream files (up to 100+ on the UCA!) stored on an SD Card instead of a Flash EEPROM.

All these changes would approximately double the price of the base UCA board without any significant advantage for the end-user. For now, I’ll stick with the Spartan-6 as it’s the perfect FPGA for the UCA, but I will probably start working on a completely redesigned Spartan-7 base board as a last-resort backup solution.

So much for the bad news. Stay tuned for better news tomorrow and even more throughout the week!

Experimental Pentium Overdrive testing with the UCA

When I designed the Universal Chip Analyzer, the goal was to be able to test everything from the 4004 to the 486 DX4-100 (and CPUs sharing the same pinout like the AMD/Cyrix 5×86). Any Pentium-class CPU was out of scope due to physical limitations. Even if the UCA architecture can probably handle them from an electrical point of view, the adapters are just too small for the 200+ pin socket needed (Socket 4 or 5). Even the Socket 3 used by the Pentium Overdrive can’t fit on the PCB between the two connectors. So P5 support on the UCA looked really impossible. Really? Wait a minute…

First, let’s have a look at the pinout of the Pentium Overdrive, and especially at the outer pin rows:

As we can see, all but 8 of these pins are either reserved (no connection) or used for power supply (VCC/VSS). Pentium ODs require much more power than any 486, so Intel basically doubled the number of power supply pins to ensure stability. The INIT (F19) pin is basically useless because it’s redundant with RESET and fitted with an internal pull-down to avoid spurious triggering. The 7 other pins are related to the Write-Back L2 cache, which is useless on the UCA because the internal RAM is as fast as the L1 cache. So maybe the Pentium OD can work without connecting the outer rows?

To check that incredible possibility, I built a 486 adapter with a standard socket.

The Pentium Overdrive 83 MHz fits perfectly with the outer pins floating.

And … IT WORKS! At the UCA boot frequency (FSB 16 MHz), the CPU was able to run perfectly fine at 40 MHz using its internal 2.5x multiplier!

After adding some code to support the new CPUID (0x1532), the UCA Analyzer tool was able to detect the Pentium Overdrive correctly and run the full test suite without issue.

The Pentium OD doesn’t support JTAG, unfortunately. Power consumption is quite low for a Pentium-class CPU: 865 mA at 40 MHz. The last question is: how high can it go without the additional power supply pins connected? I tried 20 MHz FSB (50 MHz clock) without issue, then 25 MHz FSB for a 62.5 MHz clock (like the Pentium Overdrive 63 MHz).

And it still works! Current consumption rises to ~1.3 A, for about 6.4 W. I also tried running the Pentium OD at full speed (33.3 MHz FSB for an 83 MHz final clock) but unfortunately, it only runs for a couple of seconds before crashing. There is no doubt that the crash comes from the missing power supply pins, but being able to run it at 63 MHz on the UCA is quite impressive!
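
For reference, the clock and power figures from this test are related by simple arithmetic, sketched here in Python. The 2.5x multiplier and the FSB values come from the text above; the supply voltage is not stated there, so the snippet only derives the voltage implied by the measured current and power, assuming P = V × I.

```python
MULTIPLIER = 2.5  # internal clock multiplier of the Pentium Overdrive

def core_clock_mhz(fsb_mhz):
    """Core clock obtained from a given FSB frequency."""
    return fsb_mhz * MULTIPLIER

def implied_voltage(power_w, current_a):
    """Supply voltage implied by a power and current measurement (P = V * I)."""
    return power_w / current_a

for fsb in (16, 20, 25, 33.3):
    print(f"FSB {fsb} MHz -> core {core_clock_mhz(fsb):.2f} MHz")

volts = implied_voltage(6.4, 1.3)
print(f"6.4 W at 1.3 A implies about {volts:.1f} V")
```

The implied value is close to 5 V, which is what one would expect for a CPU plugged into a 486 socket.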

 

ATX2AT Smart Converter – Firmware 1.21 released

I’ve just released a new firmware (1.21) for the ATX2AT Smart Converter and an update (0.4b) for the Windows companion tool (ATX2AT Configuration tool). Both are available as source and binary on the GitHub page.

Here is the change log:

    • Added a configuration option for AT-Style push button
    • Added a “firmware outdated” version check at startup
    • Added a firmware update feature within the Configuration tool for easy update
    • Solved an issue with Infinite (disabled) screensaver setting
    • Solved an issue with log display

Basically, you just need to download the ATX2AT Configuration tool v0.4b binary package and use the “FW Update” button located in the bottom-right corner. The tool should be able to auto-detect the ATX2AT Smart Converter, switch it to bootloader mode, then use the embedded avrdude to flash the new firmware. If all goes well, you will see your new Firmware Revision as 1.21:

You will notice a new option called “Power Button Type” that defaults to the standard ATX-style (momentary push button). Some users asked for a way to use the ATX2AT Smart Converter with a genuine AT case using the standard switch (SPST). So here it is. With the Power Button Type set to “AT”, it’s now possible to wire a standard AT button on the 2-pin EXT_PWR connector (2.54 mm / 0.1″ header).

Universal Chip Analyzer v2 disclosed!

With the development of the PGA Shields (now able to support all Intel CPUs from the 80186 to the 80486) and rising demand from collectors, it was time to think about producing a batch of the Universal Chip Analyzer. In January, I finally decided to rebuild everything from scratch to get rid of old issues and restart from a “clean” foundation. The original Mojo v3 board I had used since the very beginning was a fantastic tool, but after way too many patches, I encountered “hard” limitations which would have become major issues later. As I don’t want to rework the base FPGA board or the main interface (IF) board for years to come, the solution was to build the perfect PCBs once and for all.

So, let me introduce the Universal Chip Analyzer v2!

UCA FPGA Base Board

The Mojo v3 was a great tool, but it’s a 2013 Kickstarter product tailored as a development board. I hesitated for a long time over replacing the Xilinx Spartan-6 FPGA with a “new-gen” Spartan-7 or even an Artix-7. I finally decided to stay with the Spartan-6 for several reasons.

    1. Xilinx 7-Series FPGAs are only available in BGA and not in QFP packaging. That means a more complex PCB and higher manufacturing cost.
    2. While 6-series FPGAs are happy with two simple 3.3V and 1.2V linear regulators, 7-series FPGAs require 3.3V/1.8V/1.35V and 1.0V. That means noisy DC-DC buck converters, more filtering, and ultimately MUCH higher BOM and assembly costs.
    3. The speed and logic cell count of the Spartan-6 XC6SLX9-2 are enough for all current and future uses I can think of. I could have used more Block RAM, but it’s not a limitation.
    4. Xilinx announced that this FPGA is a “long term product” that will be manufactured at least until 2027. It’s also quite cheap now (< $10).

Switching to a Spartan-7 or Artix-7 would have significantly increased the price and overall complexity without adding any feature. The only thing I will miss is related to the development toolchain: I could have finally gotten rid of the infamous Xilinx ISE in favor of the new Xilinx Vivado Design Suite. But after all, I’m now quite comfortable with all of ISE’s damn bugs, so…

Here is the new Universal Chip Analyzer board next to the old one.

I kept the overall form factor, just a bit (6 mm) taller, but many components changed.

    1. ARM Main microcontroller – The original 8-bit ATMEGA32U4 (at 16 MHz, with 32 KB Flash & 2.5 KB SRAM) has been replaced with a 32-bit ARM-based ATSAMD21G18. The new MCU is clocked at 48 MHz, Flash capacity is 8x higher (256 KB) and SRAM is now upgraded to 32 KB. It’s also MUCH faster and I have room for many future improvements. While the ATMEGA32U4 was 80% full, the new ATSAMD21G18 is under 20% after a full code rewrite, and with more features added!
    2. 512 Mb Flash Memory – The original Mojo v3 used a 4 Mb SPI Flash able to store a single FPGA configuration file. With the first UCA, I upgraded the flash to 128 Mb to store up to 40 different configuration bit-files. The final UCA now uses a 512 Mb Flash to store more than 150 configuration files simultaneously.
    3. EEPROM – A small 64 Kb I2C EEPROM to store calibration constants, configuration, serial numbers, etc. has been added.
    4. USB-C Connector – The good old Micro-USB connector is becoming obsolete. The new reversible USB-C connector will soon become the standard. It is also more robust.
    5. Better XO – The main 50 MHz oscillator has been upgraded to a 20 ppm, low-power one for lower jitter and better stability at high frequency.
    6. Stronger power filtering – The filtering/decoupling stage was limited on the previous board. It is now much stronger, allowing higher noise immunity and better switching speed for fast CPUs like 486s. Thermal dissipation has also been vastly improved.
    7. Power Connector – The first prototype of the old UCA v2 used a tiny 1.35mm jack located on the IF board. The final one comes with a standard 2.1mm jack with reverse-polarity protection. An additional 9V or 12V power supply is mandatory for all supported ICs. I tested some USB to 9V/12V adapters, and they work fine, making testing from a power bank in the field possible.
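
As a rough sanity check on the flash figures in item 2, here is a minimal sketch. It assumes (this is my assumption, not a measured value) that each configuration occupies the same amount of flash as on the first UCA, i.e. 128 Mb shared by 40 slots. The function name is hypothetical and only serves the illustration.

```python
# Figures quoted in item 2 of the list above (Mb = megabits).
FIRST_UCA_FLASH_MB = 128
FIRST_UCA_SLOTS = 40
FINAL_UCA_FLASH_MB = 512


def scaled_slot_count(old_flash_mb: int, old_slots: int, new_flash_mb: int) -> int:
    """Scale a known slot count to a larger flash.

    Assumes the per-slot size stays constant (old_flash_mb / old_slots).
    Integer arithmetic is used on purpose to avoid float rounding.
    """
    if old_flash_mb <= 0 or old_slots <= 0 or new_flash_mb < 0:
        raise ValueError("sizes and slot counts must be positive")
    return new_flash_mb * old_slots // old_flash_mb


if __name__ == "__main__":
    slot_mb = FIRST_UCA_FLASH_MB / FIRST_UCA_SLOTS  # estimated Mb per slot
    slots = scaled_slot_count(FIRST_UCA_FLASH_MB, FIRST_UCA_SLOTS, FINAL_UCA_FLASH_MB)
    print(f"about {slot_mb:.1f} Mb per slot, {slots} slots in 512 Mb")
```

Under that assumption the 512 Mb part holds 160 slots, which is consistent with the “more than 150” figure once some space is kept for anything other than bit-files.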

There are also many layout changes, allowing, for example, I2C communication from the MCU to the IF board and on to the adapter boards.

UCA Interface (IF) Board

The final IF board has been upgraded to fit perfectly on top of the FPGA board. The PCB has been enhanced for reliability while lowering BOM cost. All but one of the tantalum capacitors have been replaced by MLCC (ceramic) caps. The layout has also been improved for better decoupling efficiency. Along with the main voltage transceivers, the UCA IF board includes a 2A DC-DC voltage converter, precision voltage and current monitoring, and adjustable fast overcurrent protection. Voltage can be set by software (25 mV steps). A standard 3-pin fan header is available for high-power CPUs like DX4s.
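
To illustrate what a 25 mV step means in practice, here is a small sketch that rounds a requested core voltage to the nearest step. Only the 25 mV step size comes from the text above; the function name and the way the value would be sent to the hardware are hypothetical, and this is not the actual UCA firmware code.

```python
STEP_MV = 25  # step size quoted in the text


def quantize_mv(requested_mv: int, step_mv: int = STEP_MV) -> int:
    """Round a requested voltage (in millivolts) to the nearest step.

    Working in integer millivolts avoids floating-point rounding issues.
    """
    if requested_mv < 0:
        raise ValueError("voltage must be non-negative")
    if step_mv <= 0:
        raise ValueError("step must be positive")
    return ((requested_mv + step_mv // 2) // step_mv) * step_mv


if __name__ == "__main__":
    for target in (3300, 3313, 3450, 5000):
        print(f"requested {target} mV -> set {quantize_mv(target)} mV")
```

Keeping everything in integer millivolts means a request like 3313 mV lands on 3325 mV rather than on a value that only looks right after float formatting.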

The slightly taller PCB leaves room for an optional 0.91″ 128×64 OLED display on top of the board. It will be used later to display additional information about the test status. Right now, it shows the selected CPU family and the voltage/current used.

UCA Adapters

The pinout of both 50-pin connectors located on the IF Board has slightly changed to accommodate the previous modifications. I added some new signals to avoid future limitations. For example, the I2C bus is now passed from the ARM MCU to the adapter boards. Adapter IDs also changed to their final values, so all currently designed adapters require a small layout change.

Let’s look at the currently designed adapters and their status.

    • UCA 80486 Adapter

The 486 adapter was recently upgraded to support JTAG reading. From a hardware point of view, the adapter is almost finished. There is still a small side feature I would like to add, but it’s a minor modification. The 486 adapter is able to test all 486s ever released, from the Intel 486 SX-16 to the Cyrix 5×86-P133, as well as 487s, AMD 586, TI, UMC and IBM 486s.

    • UCA 80386 Adapter

The 386 Adapter has been the most difficult one to build so far. While the hardware is now almost fine, it still needs some work on the FPGA code to fine-tune some timings.

    • UCA 80286 Adapter

Almost finished and working as expected with all kinds of 286s. The internal MCU code must be rewritten to accommodate the new communication protocol, but it’s not a very complex task.

    • UCA 80186 Adapter

The 186 Adapter was the first adapter to be built directly for the new UCA “v2”. It was used to debug the new communication protocol between the different parts of the UCA. Both hardware and software are now done. The only missing feature is the automatic detection of 186 vs 188 (currently, you have to select the correct bus type with the DIP switch).

    • UCA DIP40 Adapter (8088/8086 & more) 

The “iAPX-86 Adapter” has been renamed the “DIP40” Adapter as it is also able to test various other DIP40 ICs. Along with 8086s & 8088s, the UCA DIP40 Adapter can test 8085s, NSC800s, MCS48 and MCS51 MCUs, and RCA “COSMAC” CDP1802s without the need for any additional adapter. With a specific adapter that plugs on top of the DIP40 ZIF, it can also test Zilog Z80s, 8080s, MOS 65xx and Motorola 68xx.

    • UCA 8087 Adapter

The 8087 Adapter has been quickly developed to show the UCA’s ability to also test FPUs. It requires a fixed 8086-compatible CPU that runs in MAX mode (while the DIP40 Adapter uses the MIN Mode).

    • UCA 8080 Adapter

After discussions with fellow CPU collectors, I developed a standalone adapter for Intel 8080s. The price and features of this one are the same as those of the adapter that fits on top of the DIP40 Adapter:


At this time, I’m not sure which solution is the best. Maybe the standalone version is better to avoid mistakes with DIP switches… Leave a comment to give your thoughts!

    • UCA Debug Adapters

These Adapters are just for internal use, but I wanted to share some pictures just for fun.

The left one is fitted with many precision power resistors and is needed to calibrate the power monitoring IC at various current loads (10 mA, 50 mA, 100 mA, 250 mA and 2×500 mA). The right one is mainly used to test all signals of newly assembled FPGA/IF boards. It can detect shorts to VCC, open circuits or adjacent-signal shorts. A backplate “Firmware Programmer” board with tiny pogo-pins has also been developed to flash the initial bootloader inside a blank UCA.
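
For readers curious how a signal-test adapter can tell these three fault types apart, here is a minimal sketch of the classic “walking one” technique, run against a simulated board. This is not the actual UCA test code: the function names, the fault model and the simulated board are all hypothetical and only illustrate the principle.

```python
from typing import Callable, List, Sequence, Tuple

# A "board" is modeled as a function: drive a pattern, read the levels back.
Board = Callable[[Sequence[int]], List[int]]


def make_fake_board(shorts: Sequence[Tuple[int, int]] = (),
                    stuck_high: Sequence[int] = (),
                    opens: Sequence[int] = ()) -> Board:
    """Build a simulated board with injected faults (illustration only)."""
    def drive_and_read(pattern: Sequence[int]) -> List[int]:
        levels = list(pattern)
        for a, b in shorts:            # shorted lines see each other's level
            levels[a] = levels[b] = pattern[a] | pattern[b]
        for i in opens:                # an open line never sees the driven level
            levels[i] = 0
        for i in stuck_high:           # a line shorted to VCC always reads high
            levels[i] = 1
        return levels
    return drive_and_read


def walking_one_test(n_lines: int, board: Board) -> List[str]:
    """Drive one line high at a time and classify what is read back."""
    faults: List[str] = []
    idle = board([0] * n_lines)
    stuck = {i for i, level in enumerate(idle) if level}
    for i in sorted(stuck):
        faults.append(f"line {i}: stuck high (possible short to VCC)")
    for i in range(n_lines):
        pattern = [0] * n_lines
        pattern[i] = 1
        readback = board(pattern)
        if not readback[i]:
            faults.append(f"line {i}: reads low when driven high (possible open)")
        for j in range(i + 1, n_lines):  # report each shorted pair only once
            if readback[j] and j not in stuck:
                faults.append(f"lines {i} and {j}: shorted together")
    return faults


if __name__ == "__main__":
    board = make_fake_board(shorts=[(2, 3)], stuck_high=[5], opens=[6])
    for fault in walking_one_test(8, board):
        print(fault)
```

On real hardware an open line may float rather than read a clean low, so an actual tester would typically add pull resistors or repeat the pass with inverted levels.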

Stay tuned for more news about IC support and UCA production soon!

 

The UCA now supports Intel 487 SX

Released in 1991 and marketed as a floating-point coprocessor for the Intel 486 SX, the Intel 80487 was actually a fully featured Intel 486 DX with a slightly different pinout. Intel added an unconnected 169th pin as a mechanical key for the 487 Socket. Another pin known as “MP#” (Math Present) was used to entirely disable the original 486 SX by triggering its “back-off” (from bus) mode. As the 487 SX is almost 100% compatible with the 486 DX, supporting it with the Universal Chip Analyzer was trivial. I bought several 168-pin sockets, drilled a 1 mm hole for the extra pin, and it worked immediately.

 

 

According to the 487’s datasheet, it was rated at 25 MHz maximum, but it also runs fine at 33 MHz. A B0-step Intel 487 can be detected by its unique CPUID (0x421). AFAIK, all retail 487s are B0-step. A0-step parts are engineering samples only (with an unknown CPUID, maybe 0x420).
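
For reference, an identifier like 0x421 can be split into its fields using the standard x86 signature layout (stepping in bits 3:0, model in bits 7:4, family in bits 11:8, type in bits 13:12). The sketch below only shows that bit layout; the function name is hypothetical, and the mapping from a stepping number to a name such as “B0” is not derived here.

```python
def decode_signature(sig: int) -> dict:
    """Split an x86 processor signature into its standard bit fields."""
    return {
        "type": (sig >> 12) & 0x3,
        "family": (sig >> 8) & 0xF,
        "model": (sig >> 4) & 0xF,
        "stepping": sig & 0xF,
    }


if __name__ == "__main__":
    for sig in (0x421, 0x420):
        f = decode_signature(sig)
        print(f"0x{sig:03X}: family {f['family']}, model {f['model']}, "
              f"stepping {f['stepping']}")
```

With this layout, 0x421 decodes to family 4, model 2, stepping 1, and the 0x420 guess for A0-step samples would differ only in the stepping field.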

While testing the 487, I noticed a strange behavior that deserves more investigation later. It seems the 487SX needs a longer reset period than a standard 486 to initialize properly. Technically, it makes sense: this additional delay might be needed to let the original 486SX disable itself and back off properly from the bus (before the 487SX takes full control).