LEON3 area

From RAD Lab

LEON3 area report

Preliminary report (2 LEON3s on Xilinx XC2VP70)

Processor configuration:

  • Version 1.07 LEON3
  • 8 register windows/1 hardware watchpoint
  • Pipelined GRFPU complies with IEEE 754 and full
  • Cache organization (snooping support)
    • LRU 16KB-I cache: 4 sets, 4 KB/set, 32 bytes/line
    • LRU 16KB-D cache: 2 sets, 8 KB/set, 32 bytes/line
  • 8-entry (combined i/d TLB) MMU, increment replacement
  • Serial port debugging unit

Peripherals:

  • AMBA bus: AHB, APB and AHB/APB bridge
  • Ethernet
  • 1 UART
  • 2 32-bit Timers
  • Interrupt controller
  • 8KB on-chip RAM


Xilinx XST 8.1i SP3

53,743 LUTs (81%); 21% BRAM


Mentor Graphics Precision Physical 2005c.115

LUT count breakdown
AHB RAM 23
AMBA APB arbiter 184
AMBA AHB arbiter 392
UART 289
Debug Unit 1090
Timer 376
Ethernet 1459
Interrupt Controller 447
LEON3 (single) 23,038
  HW div 322
  HW mul 205
  MMU 820
  Cache Controller 3257
  Others (pipeline etc.) 3999
  IEEE 754 FPU (XST netlist) 14,435
  Total: 8,603 + 14,435 = 23,038
Total (2 LEON3) 50,455 (76%)
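
The per-core total and the utilization percentages above can be cross-checked with a few lines of Python. This is only a sketch: the figure of 66,176 LUTs for the XC2VP70 (33,088 slices with two LUTs each) is an assumption taken from the device data sheet, not from this report, and the helper name `utilization` is made up for illustration.

```python
# LUT counts for one LEON3 core, copied from the breakdown above.
leon3_core = {
    "HW div": 322,
    "HW mul": 205,
    "MMU": 820,
    "Cache Controller": 3257,
    "others (pipeline etc.)": 3999,
}
fpu_luts = 14435  # pipelined IEEE 754 FPU (XST netlist)

core_without_fpu = sum(leon3_core.values())
core_with_fpu = core_without_fpu + fpu_luts

# Assumption (not from the report): XC2VP70 has 33,088 slices x 2 LUTs.
XC2VP70_LUTS = 33088 * 2

def utilization(luts, device_luts=XC2VP70_LUTS):
    """Return device utilization as a percentage."""
    return 100.0 * luts / device_luts

print(core_without_fpu, core_with_fpu)
print(f"FPU share of one core: {100.0 * fpu_luts / core_with_fpu:.0f}%")
print(f"XST total (53,743 LUTs): {utilization(53743):.0f}%")
print(f"Precision total (50,455 LUTs): {utilization(50455):.0f}%")
```

With these inputs the FPU alone accounts for well over half of a single core, which is the point the breakdown makes.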

LEON3 is much bigger than we expected. Without the MMU and FPU it is around 3,500 LUTs (similar to a fully configured MicroBlaze). The cache controller is large, around the size of the baseline processor itself. The IEEE 754 FPU is huge (14,435 LUTs for the pipelined FPU, 4,012 LUTs for the non-pipelined version). No RTL source code is available for the FPU, only a Xilinx XST netlist, so its size is stuck with XST's poor synthesis result.

State-of-the-art FPGA synthesis tools are much more efficient than Xilinx XST (around 30% fewer LUTs).


Mentor Graphics Precision result (1 LEON)

Due to a Precision optimization bug on the Ethernet debugging interface, Mentor Graphics Precision was run in a compatibility mode, which costs about 25% more LUTs. In this version, LEON3 and the peripherals are updated to version 1.09. The DDR memory controller, which takes around 1,000-1,500 LUTs, is not included.

Peripherals (1 LEON in the system)

AHB RAM (4 KBytes) 22
Boot ROM ++ 223
AMBA APB arbiter + 58
AMBA AHB arbiter + 321
Debug UART 546
Timer (2 32-bit) 533
Ethernet 3158 (w. debugging support)/ 1823 (wo. debugging support)
Interrupt Controller + 238
Total 5099 (w. Ethernet debugging)/ 3764 (wo. debugging support)

+ Size increases as more LEONs are added.

++ Size depends on the boot ROM code size.
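
The two peripheral totals above differ only in the Ethernet core. A minimal sketch that recomputes both totals and the cost of Ethernet debugging support from the listed rows (the dictionary keys and the function name `peripheral_total` are illustrative):

```python
# Peripheral LUT counts from the Precision run above (1 LEON in the system).
peripherals = {
    "AHB RAM (4 KBytes)": 22,
    "Boot ROM": 223,
    "AMBA APB arbiter": 58,
    "AMBA AHB arbiter": 321,
    "Debug UART": 546,
    "Timer (2 32-bit)": 533,
    "Interrupt Controller": 238,
}
# Ethernet is listed with and without debugging support.
ethernet = {"with_debug": 3158, "without_debug": 1823}

def peripheral_total(ethernet_variant):
    """Sum all peripherals plus the chosen Ethernet variant."""
    return sum(peripherals.values()) + ethernet[ethernet_variant]

total_with = peripheral_total("with_debug")
total_without = peripheral_total("without_debug")
debug_cost = total_with - total_without

print(total_with, total_without, debug_cost)
```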


LEON3

MMU with combined I/D TLB
                Increment TLB Replacement   LRU TLB Replacement
2 entry TLB     550                         554
8 entry TLB     895                         966
  • Changing the MMU configuration doesn't affect the rest of LEON (pipeline, cache controller, etc.)


Cache Controller (I/D cache have the same configuration)
Dimension (Way/Total Size (KB)/Block Size (B))   LRU Replacement   LRR Replacement   Random Replacement
1/4/32 (direct map)                              1905
2/8/32                                           2690              2223              2212
4/16/32                                          4468              2887              2860
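
One way to read the cache controller table is to express each replacement policy as an overhead over random replacement. The sketch below does that for the Precision numbers. The data layout and the function name `overhead_vs_random` are illustrative choices, not part of the original report.

```python
# Cache controller LUTs from the Precision table above, keyed by
# (ways, total size in KB, block size in bytes).
cache_luts = {
    (2, 8, 32): {"LRU": 2690, "LRR": 2223, "Random": 2212},
    (4, 16, 32): {"LRU": 4468, "LRR": 2887, "Random": 2860},
}

def overhead_vs_random(config, policy):
    """Extra LUTs of `policy` over random replacement, as a percentage."""
    row = cache_luts[config]
    return 100.0 * (row[policy] - row["Random"]) / row["Random"]

for config in cache_luts:
    for policy in ("LRU", "LRR"):
        print(config, policy, f"{overhead_vs_random(config, policy):.1f}%")
```

On these numbers LRU gets noticeably more expensive as associativity grows, while LRR stays within about one percent of random replacement.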


The rest of LEON
Debug Unit 761
HW Multiplier 357
HW Divider 427
IU (integer pipeline etc.)* 4,900-5,200
IEEE 754 FPU ** 14,435 (pipelined version)/ 4,012 (non-pipelined version)

* IU size varies slightly with the cache configuration.

** Only the result for the Xilinx XST netlist is available.


Synplify result (1 LEON on Virtex-2 Pro and Virtex-5)

The results are from Synplify Premier 8.6.1. The target Virtex-5 device is the XC5VLX220-3 (34,560*4 6-input LUTs). No Virtex-5 block RAM optimizations were applied in the experiments. Results are given in pairs, with the first number for Virtex-2 Pro and the second for Virtex-5.

Without any floorplanning or physical synthesis, LEON can run at 120 MHz on Virtex-5 with DDR memory clocked at 200 MHz (DDR400).

Peripherals (1 LEON in the system)

AHB RAM (4 KBytes) 24/19
Boot ROM ++ 0/72
AMBA APB arbiter + 172/130
AMBA AHB arbiter + 395/284
Debug UART 433/341
Timer (2 32-bit) 315/219
Ethernet (w. LEON debugging support) 2410/1863
Interrupt Controller + 234/171
DDR memory controller 1936/1480
Total 5919/4579

+ Size increases as more LEONs are added.

++ Size depends on the boot ROM code size. On Virtex-2 Pro, Synplify implements the boot ROM in block RAM, while on Virtex-5 a LUT-based ROM is used.
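
The paired "Virtex-2 Pro/Virtex-5" cells are easy to split and sum. The sketch below parses the peripheral rows above and reproduces the two totals. The helper `parse_pair` is a made-up name, and the percentage compares 4-input LUTs against 6-input LUTs, so it is only a rough indication.

```python
# Peripheral rows from the Synplify table above, as "Virtex-2 Pro/Virtex-5".
rows = {
    "AHB RAM (4 KBytes)": "24/19",
    "Boot ROM": "0/72",
    "AMBA APB arbiter": "172/130",
    "AMBA AHB arbiter": "395/284",
    "Debug UART": "433/341",
    "Timer (2 32-bit)": "315/219",
    "Ethernet (w. LEON debugging support)": "2410/1863",
    "Interrupt Controller": "234/171",
    "DDR memory controller": "1936/1480",
}

def parse_pair(cell):
    """Split a 'v2p/v5' cell into a tuple of two integers."""
    v2p, v5 = cell.split("/")
    return int(v2p), int(v5)

pairs = {name: parse_pair(cell) for name, cell in rows.items()}
total_v2p = sum(v2p for v2p, _ in pairs.values())
total_v5 = sum(v5 for _, v5 in pairs.values())

print(total_v2p, total_v5)
print(f"Virtex-5 uses {100.0 * (1 - total_v5 / total_v2p):.0f}% fewer LUTs")
```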


LEON3

MMU with combined I/D TLB
                Increment TLB Replacement   LRU TLB Replacement
2 entry TLB     471/386                     493/389
8 entry TLB     825/616                     922/653
  • Changing the MMU configuration doesn't affect the rest of LEON (pipeline, cache controller, etc.)


Cache Controller (I/D cache have the same configuration)
Dimension (Way/Total Size (KB)/Block Size (B))   LRU Replacement   LRR Replacement   Random Replacement
1/4/32 (direct map)                              1414/1085
2/8/32                                           2037/1582         1952/1424         1907/1403
4/16/32                                          3063/2258         2544/1935         2427/1830


The rest of LEON
Debug Unit 612/468
HW Multiplier 368/384
HW Divider 378/320
IU (integer pipeline etc.)* 4163/3099

* IU size varies slightly with the cache configuration.



Summary

The following table presents a cumulative area report for LEON. The last column is the total for a fully configured LEON plus peripherals. The second-to-last column is the processor only (for comparison with PPC, MicroBlaze, etc.).

The statistics assume the following LEON3 configuration:

  • 4 KB direct-mapped I/D caches (separate)
  • MMU with 8-entry incremental replacement TLB
Area summary (in LUTs)
              baseline LEON  +HW MUL/DIV  +LEON debug  +MMU   +Cache Controller  +FPU (Gaisler nonpipelined)*  +Peripheral (buses, DDR mctrl, Ethernet)
Virtex-2 Pro  4,163          4,909        5,521        6,346  7,760              11,772                        17,691
Virtex-5 LX   3,099          3,803        4,271        4,887  5,972              -                             10,551

* FPU is in 4-input LUT netlist only, therefore is not applicable to Virtex-5.
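
The summary row for each device is a running sum of the component sizes reported in the Synplify section, plus the non-pipelined FPU figure from the Precision section. A minimal sketch of that accumulation follows. The step labels are shorthand, and treating the missing Virtex-5 FPU as zero is an assumption made here so the final total lines up.

```python
from itertools import accumulate

# (step, Virtex-2 Pro LUTs, Virtex-5 LUTs); None means no Virtex-5 figure.
steps = [
    ("baseline LEON (IU)", 4163, 3099),
    ("+HW MUL/DIV", 368 + 378, 384 + 320),
    ("+LEON debug", 612, 468),
    ("+MMU (8 entry, increment)", 825, 616),
    ("+Cache Controller (1/4/32)", 1414, 1085),
    ("+FPU (nonpipelined)", 4012, None),
    ("+Peripheral", 5919, 4579),
]

v2p_running = list(accumulate(v2p for _, v2p, _ in steps))
# The FPU netlist targets 4-input LUTs, so it adds nothing on Virtex-5.
v5_running = list(accumulate(v5 or 0 for _, _, v5 in steps))

for (name, _, v5), v2p_total, v5_total in zip(steps, v2p_running, v5_running):
    v5_text = "-" if v5 is None else f"{v5_total:,}"
    print(f"{name:28s} {v2p_total:>7,} {v5_text:>7}")
```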


LEON3's Pros:

  • 32-bit SPARCv8 compatible with MMU support
  • IEEE 754 compatible FPU; the Sun Meiko FPU is also supported
  • SMP support with snoopy protocol
  • Well-written open source VHDL code, fully configurable
  • Full development environment (gcc, eclipse, debugging tools)
  • Full Linux 2.6 support through a special distribution (snap-gear)
  • LEON3's mailing list is the main channel for support, and the author is quite helpful on it
  • Apart from the DDR memory controller, few unpleasant experiences like those with MicroBlaze. The author is also very responsive in fixing problems reported on the mailing list.


LEON3's Cons:

  • Larger and slower (in terms of clock frequency) than MicroBlaze. Roughly, double the size (including FPU) and half the clock frequency.
  • Snooping is disabled when the MMU is used (cache tags are virtual addresses). This will be fixed in a future release.
  • No source code for FPU. Need to find a smaller and open source one.
  • Documentation is no better than MicroBlaze's, although the code is well written.
  • LEON3 currently relies on the 32-bit ARM AMBA bus. Some peripheral cores are not 64-bit ready.
  • LEON3 is not virtualizable, no hypervisor mode in SPARCv9.


LEON4 preview:

  • Stays with the 32-bit SPARCv8 standard
  • Snooping with MMU enabled
  • Processor front-end bus widened to 64 bits, which improves performance and allows plugging in 64-bit peripherals (e.g. Niagara's 64-bit ALU)
  • Separate integer and floating-point pipelines