LEON3 area
From RAD Lab
LEON3 area report
Preliminary report (2 LEON3s on Xilinx XC2VP70)
Processor configuration:
- Version 1.07 LEON3
- 8 register windows/1 hardware watchpoint
- Pipelined GRFPU complies with IEEE 754 and full
- Cache organization (snooping support)
- LRU 16KB-I cache: 4 sets, 4 KB/set, 32 bytes/line
- LRU 16KB-D cache: 2 sets, 8 KB/set, 32 bytes/line
- 8-entry (combined i/d TLB) MMU, increment replacement
- Serial port debugging unit
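The cache organization above can be turned into line counts and address-field widths with a few lines of arithmetic. The sketch below is a textbook model and not taken from the LEON3 source: it assumes that "sets" in the configuration means ways (associativity), that addresses are 32 bits wide, and that the tag is simply whatever remains after the index and offset bits. The class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Textbook set-associative cache model (hypothetical helper)."""
    ways: int               # called "sets" in the configuration list (assumption)
    way_kib: int            # capacity of one way, in KiB
    line_bytes: int         # bytes per cache line
    address_bits: int = 32  # assumed address width

    @property
    def total_kib(self) -> int:
        return self.ways * self.way_kib

    @property
    def lines_per_way(self) -> int:
        return self.way_kib * 1024 // self.line_bytes

    @property
    def offset_bits(self) -> int:
        # log2 of a power of two, computed without floating point
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        return self.lines_per_way.bit_length() - 1

    @property
    def tag_bits(self) -> int:
        return self.address_bits - self.index_bits - self.offset_bits


icache = CacheGeometry(ways=4, way_kib=4, line_bytes=32)
dcache = CacheGeometry(ways=2, way_kib=8, line_bytes=32)

for name, c in (("I-cache", icache), ("D-cache", dcache)):
    print(f"{name}: {c.total_kib} KiB, {c.lines_per_way} lines/way, "
          f"offset={c.offset_bits} index={c.index_bits} tag={c.tag_bits}")
```

Under these assumptions both caches come out at 16 KiB, with 128 lines per way for the I-cache and 256 for the D-cache.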
Peripherals:
- AMBA bus: AHB, APB and AHB/APB bridge
- Ethernet
- 1 UART
- 2 32-bit Timers
- Interrupt controller
- 8KB on-chip RAM
Xilinx XST 8.1i SP3
53,743 LUTs (81%); 21% BRAM
Mentor Graphics Precision Physical 2005c.115
| Component | LUTs |
|---|---|
| AHB RAM | 23 |
| AMBA APB arbiter | 184 |
| AMBA AHB arbiter | 392 |
| UART | 289 |
| Debug Unit | 1090 |
| Timer | 376 |
| Ethernet | 1459 |
| Interrupt Controller | 447 |
| LEON3 (single) | HW div 322, HW mul 205, MMU 820, cache controller 3,257, others (pipeline, etc.) 3,999, IEEE 754 FPU (XST netlist) 14,435. Total: 8,603 + 14,435 = 23,038 |
| Total (2 LEON3) | 50,455 (76%) |
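As a quick check of the single-core figure, the per-block numbers can be summed directly. This is only a sketch of the arithmetic using the LUT counts reported in the table; the dictionary keys and variable names are illustrative.

```python
# LUT counts for one LEON3 core, copied from the Precision table.
core_blocks = {
    "HW div": 322,
    "HW mul": 205,
    "MMU": 820,
    "Cache controller": 3257,
    "Others (pipeline etc.)": 3999,
}
FPU_LUTS = 14_435  # pipelined IEEE 754 FPU, XST netlist

core_without_fpu = sum(core_blocks.values())
core_total = core_without_fpu + FPU_LUTS
fpu_share = FPU_LUTS / core_total

print(f"core without FPU: {core_without_fpu}")
print(f"core with FPU:    {core_total}")
print(f"FPU share:        {fpu_share:.1%}")
```

The pipelined FPU alone accounts for well over half of the single-core LUT count.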
LEON3 is much bigger than we expected. Without the MMU and FPU it is around 3,500 LUTs (similar to a fully configured MicroBlaze). The cache controller is large, around the size of the baseline processor itself. The IEEE 754 FPU is huge: 14,435 LUTs for the pipelined FPU and 4,012 LUTs for the non-pipelined version. No FPU RTL source code is available, only a Xilinx XST netlist, so the FPU area reflects XST's poor results.
State-of-the-art FPGA synthesis is much more efficient than Xilinx XST, saving around *30%* of the LUTs.
Mentor Graphics Precision result (1 LEON)
Due to a Precision optimization bug on the Ethernet debugging interface, Mentor Graphics Precision runs in a compatibility mode, which leads to 25% more LUTs. In this version, LEON3 and the peripherals are updated to version 1.09. The DDR memory controller, which takes around 1,000-1,500 LUTs, is not included.
Peripherals (1 LEON in the system)
| Component | LUTs |
|---|---|
| AHB RAM (4 KBytes) | 22 |
| Boot ROM ++ | 223 |
| AMBA APB arbiter + | 58 |
| AMBA AHB arbiter + | 321 |
| Debug UART | 546 |
| Timer (2 32-bit) | 533 |
| Ethernet | 3158 (with debugging support) / 1823 (without debugging support) |
| Interrupt Controller + | 238 |
| Total | 5099 (with Ethernet debugging) / 3764 (without Ethernet debugging) |
+ Size increases as more LEON cores are added.
++ Size depends on the boot ROM code size.
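The two peripheral totals differ only in the Ethernet core, so the cost of Ethernet debugging support can be read off directly. A minimal sketch using the Precision numbers from the table above (names are illustrative):

```python
# Peripheral LUT counts from the Precision run, excluding Ethernet.
peripherals = {
    "AHB RAM (4 KBytes)": 22,
    "Boot ROM": 223,
    "AMBA APB arbiter": 58,
    "AMBA AHB arbiter": 321,
    "Debug UART": 546,
    "Timer (2 32-bit)": 533,
    "Interrupt Controller": 238,
}
ETHERNET_WITH_DEBUG = 3158
ETHERNET_WITHOUT_DEBUG = 1823


def peripheral_total(ethernet_luts: int) -> int:
    """Sum of all peripherals for a given Ethernet variant."""
    return sum(peripherals.values()) + ethernet_luts


total_with_debug = peripheral_total(ETHERNET_WITH_DEBUG)
total_without_debug = peripheral_total(ETHERNET_WITHOUT_DEBUG)
debug_cost = ETHERNET_WITH_DEBUG - ETHERNET_WITHOUT_DEBUG

print(total_with_debug, total_without_debug, debug_cost)
```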
LEON3
| | Increment TLB Replacement | LRU TLB Replacement |
|---|---|---|
| 2 entry TLB | 550 | 554 |
| 8 entry TLB | 895 | 966 |
- Changing the MMU configuration doesn't touch the rest of LEON (pipeline, cache controller, etc.)
| Dimension (Ways/Total Size (KB)/Block Size (B)) | LRU Replacement | LRR Replacement | Random Replacement |
|---|---|---|---|
| 1/4/32 (direct mapped) | 1905 | | |
| 2/8/32 | 2690 | 2223 | 2212 |
| 4/16/32 | 4468 | 2887 | 2860 |
| Component | LUTs |
|---|---|
| Debug Unit | 761 |
| HW Multiplier | 357 |
| HW Divider | 427 |
| IU (Integer Pipeline, etc.)* | 4,900-5,200 |
| IEEE 754 FPU ** | 14,435 (pipelined version) / 4,012 (non-pipelined version) |
* IU size varies slightly with the cache configuration.
** Only the Xilinx XST netlist result is available.
Synplify result (1 LEON on Virtex-2 Pro and Virtex-5)
The results are from Synplify Premier 8.6.1. The target Virtex-5 device is the XC5VLX220-3 (34,560 × 4 6-input LUTs). No Virtex-5 block RAM optimizations are applied in the experiments. Results are given as pairs: the former number is for Virtex-2 Pro and the latter for Virtex-5.
Without any floorplanning or physical synthesis, LEON runs at 120 MHz on Virtex-5 with DDR memory clocked at 200 MHz (DDR400).
Peripherals (1 LEON in the system)
| Component | LUTs (Virtex-2 Pro/Virtex-5) |
|---|---|
| AHB RAM (4 KBytes) | 24/19 |
| Boot ROM ++ | 0/72 |
| AMBA APB arbiter + | 172/130 |
| AMBA AHB arbiter + | 395/284 |
| Debug UART | 433/341 |
| Timer (2 32-bit) | 315/219 |
| Ethernet (with LEON debugging support) | 2410/1863 |
| Interrupt Controller + | 234/171 |
| DDR memory controller | 1936/1480 |
| Total | 5919/4579 |
+ Size increases as more LEON cores are added.
++ Size depends on the boot ROM code size. On Virtex-2, Synplify implements the boot ROM in block RAM, while on Virtex-5 a LUT-based ROM is used.
LEON3
| | Increment TLB Replacement | LRU TLB Replacement |
|---|---|---|
| 2 entry TLB | 471/386 | 493/389 |
| 8 entry TLB | 825/616 | 922/653 |
- Changing the MMU configuration doesn't touch the rest of LEON (pipeline, cache controller, etc.)
| Dimension (Ways/Total Size (KB)/Block Size (B)) | LRU Replacement | LRR Replacement | Random Replacement |
|---|---|---|---|
| 1/4/32 (direct mapped) | 1414/1085 | | |
| 2/8/32 | 2037/1582 | 1952/1424 | 1907/1403 |
| 4/16/32 | 3063/2258 | 2544/1935 | 2427/1830 |
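One way to read the cache controller table is to express each replacement policy's cost relative to random replacement. The sketch below does that for the Synplify numbers; it is an illustration of the arithmetic only, and the data layout and function name are invented for this example.

```python
# Cache controller LUTs from the Synplify table: (Virtex-2 Pro, Virtex-5).
cache_luts = {
    "2/8/32": {"LRU": (2037, 1582), "LRR": (1952, 1424), "Random": (1907, 1403)},
    "4/16/32": {"LRU": (3063, 2258), "LRR": (2544, 1935), "Random": (2427, 1830)},
}
VIRTEX2_PRO, VIRTEX5 = 0, 1  # index into each tuple


def overhead_vs_random(config: str, policy: str, device: int) -> float:
    """Fractional LUT increase of `policy` over random replacement."""
    base = cache_luts[config]["Random"][device]
    return (cache_luts[config][policy][device] - base) / base


for config in cache_luts:
    for policy in ("LRU", "LRR"):
        v2p = overhead_vs_random(config, policy, VIRTEX2_PRO)
        v5 = overhead_vs_random(config, policy, VIRTEX5)
        print(f"{config} {policy}: +{v2p:.1%} (Virtex-2 Pro), +{v5:.1%} (Virtex-5)")
```

On Virtex-2 Pro the LRU overhead grows from under 7% for the 2-way cache to about 26% for the 4-way cache, which suggests the LRU bookkeeping scales poorly with associativity.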
| Component | LUTs (Virtex-2 Pro/Virtex-5) |
|---|---|
| Debug Unit | 612/468 |
| HW Multiplier | 368/384 |
| HW Divider | 378/320 |
| IU (Integer Pipeline, etc.)* | 4163/3099 |
* IU size varies slightly with the cache configuration.
Summary
The following table presents a cumulative area report for LEON. The last column is the LUT count of a fully configured LEON plus peripherals. The second-to-last column is for the processor only (useful for comparison with PPC, MicroBlaze, etc.).
The statistics assume the following LEON3 configuration:
- Separate 4 KB direct-mapped I and D caches
- MMU with an 8-entry TLB using increment replacement
| | baseline LEON | +HW MUL/DIV | +LEON debug | +MMU | +Cache Controller | +FPU (Gaisler nonpipelined)* | +Peripheral (buses, DDR mctrl, Ethernet) |
|---|---|---|---|---|---|---|---|
| Virtex-2 Pro | 4,163 | 4,909 | 5,521 | 6,346 | 7,760 | 11,772 | 17,691 |
| Virtex-5 LX | 3,099 | 3,803 | 4,271 | 4,887 | 5,972 | - | 10,551 |
* The FPU is available as a 4-input LUT netlist only and is therefore not applicable to Virtex-5.
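The cumulative columns can be reproduced from the per-block Synplify results reported earlier (IU, multiplier plus divider, debug unit, 8-entry increment TLB, direct-mapped cache controller, non-pipelined FPU, peripheral total). The following is a sketch of that running sum; treating the missing Virtex-5 FPU as a zero increment is an assumption made here so the two lists line up.

```python
from itertools import accumulate

columns = [
    "baseline LEON", "+HW MUL/DIV", "+LEON debug", "+MMU",
    "+Cache Controller", "+FPU", "+Peripheral",
]

# Per-step LUT increments taken from the Synplify tables.
virtex2_pro_steps = [4163, 368 + 378, 612, 825, 1414, 4012, 5919]
# No FPU netlist for Virtex-5, so that step adds 0 (assumption for alignment).
virtex5_steps = [3099, 384 + 320, 468, 616, 1085, 0, 4579]

virtex2_pro_total = list(accumulate(virtex2_pro_steps))
virtex5_total = list(accumulate(virtex5_steps))

for name, v2p, v5 in zip(columns, virtex2_pro_total, virtex5_total):
    print(f"{name:<20} {v2p:>7,} {v5:>7,}")
```

The running sums match every entry in the summary table, with the Virtex-5 FPU column simply repeating the previous value.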
LEON3's Pros:
- 32-bit SPARCv8 compatible with MMU support
- IEEE 754 compatible FPU; the Sun Meiko FPU is also supported
- SMP support with a snooping protocol
- Well-written, open-source VHDL code that is fully reconfigurable
- Full development environment (gcc, Eclipse, debugging tools)
- Full Linux 2.6 support through a special distribution (snap-gear)
- The LEON3 mailing list is the main support channel, but the author is quite helpful on it
- Except for the DDR memory controller, few unpleasant experiences like those with MicroBlaze. In addition, the author is very responsive in fixing problems reported on the mailing list.
LEON3's Cons:
- Larger and slower (in terms of clock frequency) than MicroBlaze. Roughly, double the size (including FPU) and half the clock frequency.
- Snooping is disabled when the MMU is used (cache tags are virtual addresses). This will be fixed in a future release.
- No source code for the FPU. A smaller, open-source one needs to be found.
- Documentation is no better than MicroBlaze's, though the code is well written.
- LEON3 currently relies on the 32-bit ARM AMBA bus. Some peripheral cores are not 64-bit ready.
- LEON3 is not virtualizable, no hypervisor mode in SPARCv9.
LEON4 preview:
- Stays with the 32-bit SPARCv8 standard
- Snooping with MMU enabled
- Increases the processor front-end bus to 64 bits wide, improving performance and allowing 64-bit peripherals to be plugged in (e.g. Niagara's 64-bit ALU)
- Separate integer and floating-point pipelines
