chipsalliance/t1

T1 (Torrent-1) is a RISC-V Vector implementation inspired by the Cray X1 vector machine and named after the T0 vector processor.
T1 aims to implement the RISC-V Vector extension in a lane-based micro-architecture, with strong chaining support and the VRF implemented in SRAM.
T1 supports the Zve32f and Zve32x extensions, and its VLEN/DLEN can scale up to 64K, hitting the architectural limits of the RISC-V Vector specification.
T1 implements the significant parts of a vector machine, e.g., lanes, chaining, and a large LSU. T1 ships with T1Emulator, which is released to users.
T1 uses a modified version of the Rocket core as its scalar part, but this is not officially supported today; it can be replaced with any other RISC-V scalar CPU.
T1 only supports bare-metal loading and execution; test examples can be found in the tests/ folder.
The T1 vector processor can be integrated with any scalar RISC-V core.
- Default support for multiple lanes (32 bits per lane).
- Load and store instructions with chaining ability.
- Configurable SRAM-based VRF with dual-port, two-port, and single-port SRAM support.
- Pipelined/asynchronous Vector Function Units (VFUs) with comprehensive chaining support, offering 4 VFU slots per lane; multiple different VFUs can be attached to the corresponding lane.
- T1 lane execution can skip masked elements for mask instructions, to accommodate sparsity in the mask.
- A directly connected datapath between lanes for widen and narrow instructions.
- Configurable banked memory ports.
- Instruction-level out-of-order (OoO) load/store, hiding long memory latency, e.g., from a vector cache.
- Configurable outstanding-transaction size to hide memory latency.
- Fully configurable Vector Function Units (VFUs).
Compared to some out-of-order core designs with advanced speculation schemes, the micro-architecture of a vector machine is quite straightforward: instead of spending area on a branch prediction unit (BPU), a reorder buffer (ROB), or prefetching, vector instructions provide enough metadata to let T1 run across thousands of elements without needing any speculation.
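To make this concrete, here is a minimal sketch using the standard RVV C intrinsics (riscv_vector.h; not code from this repository): the vl value returned by vsetvl announces, before any work starts, exactly how many elements the following instructions touch, and the dependent add can chain onto the loads.

#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Stripmined vector add: a[i] = b[i] + c[i].
void vec_add(int32_t *a, const int32_t *b, const int32_t *c, size_t n) {
  for (size_t i = 0; i < n;) {
    // vsetvl returns how many elements this iteration processes;
    // this metadata is what lets the hardware avoid speculation.
    size_t vl = __riscv_vsetvl_e32m1(n - i);
    vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
    vint32m1_t vc = __riscv_vle32_v_i32m1(c + i, vl);
    // With chaining, the add may start consuming elements of vb/vc
    // before the loads have fully completed.
    vint32m1_t va = __riscv_vadd_vv_i32m1(vb, vc, vl);
    __riscv_vse32_v_i32m1(a + i, va, vl);
    i += vl;
  }
}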
T1 is designed to balance the bandwidth, area, and frequency of the VRF, VFU, and LSU. With the T1 generator, it is easy to configure T1 for either high efficiency or high performance, depending on the desired frequency: users can tune the pipeline depth, add or remove functional units, and choose whether to include an FPU, supporting Zve32f, or stay with Zve32x only.
The methodology for micro-architecture tuning is based on these trade-off ideas:
The overall vector-core frequency is limited by the VRF memory. Based on this principle, the VFU pipeline can be split into several stages to meet the target frequency. For a small, efficient core, designers should choose high-density memory (which usually does not deliver high frequency) and fewer VFU stages. For a high-performance core, they need to increase the pipeline stages and use the fastest possible SRAM for the VRFs.
The bandwidth bottleneck is the VRF SRAM. Each operating VFU may encounter hazards due to the limited number of VRF memory ports. Users can increase the number of banks in the VRF, but VRF banking forces a full crossbar between the VFUs and the VRF banks, which heavily impacts physical design. Users need to trade off bandwidth between the VFUs and the VRF by limiting the connectivity between them.
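As an illustration of the banking trade-off, here is a hypothetical element-to-bank mapping in C (T1's real mapping lives in the Chisel sources; the bank count and modulo scheme below are assumptions for the sketch):

#include <stdbool.h>
#include <stdint.h>

#define VRF_BANKS 4u  // assumed bank count, cf. the "bank4" entry in the config table

// Hypothetical mapping: consecutive element indices rotate across banks.
static uint32_t vrf_bank_of(uint32_t element_index) {
  return element_index % VRF_BANKS;
}

// Two same-cycle accesses stall iff they hit the same bank. More banks
// make conflicts rarer, but every VFU then needs a path to every bank,
// i.e., the full VFU-to-bank crossbar mentioned above.
static bool vrf_bank_conflict(uint32_t elem_a, uint32_t elem_b) {
  return vrf_bank_of(elem_a) == vrf_bank_of(elem_b);
}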
LSU bandwidth is limited by the memory ports: the LSU is also configurable, to provide high memory bandwidth at a small cost. It comes with these bus constraints:
- It requires FIFO (first-in, first-out) ordering on the bus. If FIFO ordering is not guaranteed by the bus IP, a large reorder unit has to be implemented, because of the number of outstanding sourceIds in TileLink (like AWID, ARID, WID, RID, BID in the AXI protocol).
- It requires no MMU on the high-bandwidth ports, because T1 may request DLEN/32 translations from the TLB each cycle in indexed load/store modes, and taking a page fault in the middle of such an access is unreasonable. These features are not supported by the current Rocket core.
- No coherence support: a high-performance cache cannot sustain T1's DLEN/32 requests per cycle.
A key point of the T1 LSU design is its support for multiple memory banks. Each memory bank has 3 MSHRs for outstanding memory instructions, and each instruction can record thousands of transaction states in FIFO order. T1 also supports interleaved vector load/store instructions to maximize the use of the memory ports for high memory bandwidth.
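A hedged sketch of how such bank interleaving typically works (not T1's exact address map; the beat size and bank count below are assumptions taken from the blastoise-style configs later in this document): consecutive bus beats rotate across banks, so a unit-stride vector access keeps all memory ports busy at once.

#include <stdint.h>

#define BEATBYTES 8u  // assumed bus beat size, cf. "beatbyte 8"
#define LSU_BANKS 8u  // assumed memory bank count, cf. "bank8"

// Hypothetical interleaving: consecutive beats map to consecutive banks,
// so a long unit-stride vector load/store spreads over all banks, and each
// bank's MSHRs track its own in-flight requests independently.
static uint32_t lsu_bank_of(uint64_t addr) {
  return (uint32_t)((addr / BEATBYTES) % LSU_BANKS);
}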
For tuning your ideal vector machine, follow this performance-tuning methodology (a worked sizing example follows the list):
- Choose the DLEN for your required parallelism, aka the required bandwidth of the vector unit.
- Match the bandwidth of the VRF, VFU, and LSU accordingly.
- Based on your workload, choose the required VLEN, as it dictates the VRF memory area.
- Choose the memory type for the VRF, which determines the chip frequency.
- Run T1Emulator and PnR for your workloads to tune the micro-architecture.
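As the promised worked example, this back-of-envelope program sizes the blastoise configuration (DLEN 256, VLEN 512). The formulas use only relations stated elsewhere in this document: 32-bit lanes, the RVV architecture's 32 vector registers of VLEN bits each, and the DLEN/32 indexed-access rate mentioned above.

#include <stdio.h>

int main(void) {
  const unsigned DLEN = 256, VLEN = 512;  // blastoise

  unsigned lanes     = DLEN / 32;             // 32-bit lanes          -> 8
  unsigned vrf_kib   = 32u * VLEN / 8 / 1024; // 32 vregs x VLEN bits  -> 2 KiB
  unsigned lsu_bytes = DLEN / 8;              // LSU bytes per cycle   -> 32
  unsigned tlb_reqs  = DLEN / 32;             // worst-case indexed accesses per cycle -> 8

  printf("lanes=%u vrf=%uKiB lsu=%uB/cycle indexed=%u/cycle\n",
         lanes, vrf_kib, lsu_bytes, tlb_reqs);
  return 0;
}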
We provide an IP emulator under the ./t1emu directory. Spike is used as the scalar-core reference, integrated with the Verilated vector IP. Using an online difftest methodology, the emulator compares loads/stores and VRF writes between Spike and T1 to verify T1's correctness.
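Schematically, the online difftest check looks like this (an illustrative sketch only; the field layout and helper below are made up, and the real checker lives in the emulator sources): each VRF write observed from the RTL is matched against the write Spike produced for the same instruction.

#include <stdbool.h>
#include <stdint.h>

// One VRF write event, observed from either the DUT or the Spike reference.
// The record layout here is illustrative, not the emulator's actual format.
typedef struct {
  uint32_t vd;        // destination vector register
  uint32_t offset;    // byte offset inside the register group
  uint32_t len;       // number of bytes written
  uint8_t  data[64];
  uint8_t  mask[64];  // byte enables: only enabled bytes must match
} vrf_write_t;

static bool vrf_write_matches(const vrf_write_t *dut, const vrf_write_t *ref) {
  if (dut->vd != ref->vd || dut->offset != ref->offset || dut->len != ref->len)
    return false;
  for (uint32_t i = 0; i < dut->len; i++)
    if (dut->mask[i] && dut->data[i] != ref->data[i])
      return false;  // mismatch: report a difftest failure
  return true;
}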
docker pull ghcr.io/chipsalliance/t1-$config:latest
# For example, the config with DLEN 256, VLEN 512, and FP support
docker pull ghcr.io/chipsalliance/t1-blastoise:latest
Or build the image using Nix and load it into Docker:
nix build -L ".#t1.$config.release.docker-image" --out-link docker-image.tar.gz
docker load -i ./docker-image.tar.gz
Building the Docker image with Nix requires KVM, so this derivation is not available on platforms without QEMU/KVM support.
We use Nix flakes as our primary build system. If you have not installed Nix, install it following the official guide and enable flakes following the wiki. Alternatively, you can try the installer provided by Determinate Systems, which enables flakes by default.
T1 includes a hardware design written in Chisel and an emulator powered by Verilator. The elaborator and the emulator can be run with different configurations. Configurations can be named after your favorite Pokemon! The only restriction is that T1 uses the Pokemon's type to indicate the DLEN, aka the lane size, following the corresponding type-to-DLEN mapping. The Bug type is reserved for users to submit bug reports.
Users can add their own Pokemon to configgen/src/Main.scala to add configurations with different parameters.
You can build T1's components with the following commands:
$ nix build .#t1.elaborator # the wrapped jar file of the Chisel elaborator
# Build T1
$ nix build .#t1.<config>.t1.rtl # the elaborated IP core .sv files
# Build T1 Emu
$ nix build .#t1.<config>.t1emu.rtl # the elaborated IP core .sv files
$ nix build .#t1.<config>.t1emu.verilator-emu # build the IP core emulator using verilator
$ nix build .#t1.<config>.t1emu.vcs-emu --impure # build the IP core emulator using VCS w/ VCS environment locally
$ nix build .#t1.<config>.t1emu.vcs-emu-trace --impure # build the IP core emulator using VCS w/ trace support
# Build T1 Rocket emulator
$ nix build .#t1.<config>.t1rocketemu.rtl # the elaborated T1 with Rocket core .sv files
$ nix build .#t1.<config>.t1rocketemu.verilator-emu # build the t1rocket emulator using verilator
$ nix build .#t1.<config>.t1rocketemu.vcs-emu # build the t1rocket emulator using VCS
$ nix build .#t1.<config>.t1rocketemu.vcs-emu-trace # build the t1rocket emulator using VCS with trace support
<config> should be replaced with a configuration name, e.g. blastoise. The build output will be placed in the ./result directory by default.
Currently tested configs:
Config Name | Short summary |
---|---|
blastoise | DLEN256 VLEN512; FP; VRF p0rw,p1rw bank1; LSU bank8 beatbyte 8 |
machamp | DLEN512 VLEN1K ; NOFP; VRF p0r,p1w bank2; LSU bank8 beatbyte 16 |
sandslash | DLEN1K VLEN4K ; NOFP; VRF p0rw bank4; LSU bank16 beatbyte 16 |
alakazam | DLEN2K VLEN16K; NOFP; VRF p0rw bank8; LSU bank8 beatbyte 64 |
t1rocket | Configs specific to t1rocket |
The <ip> can also be t1rocket, a special configuration name that pairs T1 with the Rocket core for scalar instruction support.
To see all possible combinations of <config> and <ip>, inspect the flake outputs.
Usage:
To run a test case on the IP emulator, use the following script:
$ nix develop -c t1-helper run -i <top-name> -c <config-name> -e <emulator-type> <case-name>
where:
- <config-name> is the configuration name;
- <top-name> is one of t1emu, t1rocketemu;
- <emulator-type> is one of verilator-emu, verilator-emu-trace, vcs-emu, vcs-emu-trace, vcs-emu-cover;
- <case-name> is the name of a test case; you can list the runnable test cases with the command: t1-helper listCases -c <config-name>
For example:
$ nix develop -c t1-helper run -i t1emu -c blastoise -e vcs-emu intrinsic.linear_normalization
To get a waveform, use the trace emulator:
$ nix develop -c t1-helper run -i t1emu -c blastoise -e vcs-emu-trace intrinsic.linear_normalization
The -c, -i, and -e options are cached under $XDG_CONFIG_HOME, so if you want to run several test cases with the same emulator, you don't have to pass the -c, -i, and -e options each time.
For example:
$ nix develop -c t1-helper run -i t1emu -c blastoise -e vcs-emu-trace intrinsic.linear_normalization
$ nix develop -c t1-helper run pytorch.llama
To get verbose logging, add the -v option:
$ nix develop -c t1-helper run -v pytorch.lenet
The t1-helper run subcommand only runs the driver, without checking the internal state. For design verification, use the t1-helper check subcommand:
$ nix develop -c t1-helper run -i t1emu -c blastoise -e vcs-emu mlir.hello
$ nix develop -c t1-helper check
The t1-helper check subcommand reads the RTL events produced by the run stage, so make sure you run a test before check.
To get the coverage report, use the vcs-emu-cover emulator type:
$ nix develop -c t1-helper run -i t1emu -c blastoise -e vcs-emu-cover mlir.hello
$ nix run .#t1.<config>.<ip>.omreader # export the contents of the specified key
$ nix run .#t1.<config>.<ip>.emu-omreader # export the contents of the specified key with emulation support
To dump all the available keys and preview their contents:
$ nix run .#t1.<config>.<ip>.omreader -- run --dump-methods
$ nix run .#t1.<config>.<ip>.emu-omreader -- run --dump-methods
Schema (an illustrative example follows the table):
Path | Kind |
---|---|
[*] | Object |
[*].attributes | Array |
[*].attributes[*] | Object |
[*].attributes[*].description | String |
[*].attributes[*].identifier | String |
[*].attributes[*].value | String |
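Under this schema, a dumped entry would look roughly like the following (a made-up illustration of the shape; the identifier, description, and value shown are not real omreader output):

[
  {
    "attributes": [
      {
        "identifier": "vlen",
        "description": "vector register length in bits",
        "value": "512"
      }
    ]
  }
]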
$ nix develop .#t1.elaborator # bring up scala environment, circt tools, and create submodules
$ nix develop .#t1.elaborator.editable # or if you want submodules editable
$ mill -i elaborator # build and run elaborator
$ nix develop .#t1.<config>.<ip>.vcs-dpi-lib # replace <config> with your configuration name
$ cd difftest
$ cargo build --features vcs
The tests/ directory contains all the test cases:
- asm
- codegen
- intrinsic
- mlir
- perf
- pytorch
- rvv_bench
To see which tests are available to run, use the t1-helper listCases subcommand:
$ nix develop -c t1-helper listCases -c <config-name> -i <top-name> <regexp>
For example,
$ t1-helper listCases -c blastoise -i t1emu mlir
(INFO) Fetching current test cases
* mlir.axpy_masked
* mlir.conv
* mlir.hello
* mlir.matmul
* mlir.maxvl_tail_setvl_front
* mlir.rvv_vp_intrinsic_add
* mlir.rvv_vp_intrinsic_add_scalable
* mlir.stripmining
* mlir.vectoradd
$ t1-helper listCases -c blastoise -i t1emu '.*vslid.*'
(INFO) Fetching current test cases
* codegen.vslide1down_vx
* codegen.vslide1up_vx
* codegen.vslidedown_vi
* codegen.vslidedown_vx
* codegen.vslideup_vi
* codegen.vslideup_vx
To develop a specific test case, enter the development shell:
# nix develop .#t1.<config>.<ip>.cases.<case-type>.<case-name>
#
# For example:
$ nix develop .#t1.blastoise.t1emu.cases.pytorch.llama
Build tests:
# build a single test
$ nix build .#t1.<config>.<ip>.cases.intrinsic.matmul -L
$ ls -al ./result
To develop coverage for a test case, use the following steps:
- Write a coverage description JSON file at the same level as the test case's source code.
- Update the default.nix file to parse the coverage description file.
For example, to develop coverage for the mlir.hello test case:
tests/mlir/hello/hello.json:
{
  "assert": [
    {
      "name": "vmv_v_i",
      "description": "single instruction vmv.v.i"
    }
  ],
  "tree": [],
  "module": []
}
tests/mlir/default.nix:
if [ -f ${caseName}.json ]; then
  ${jq}/bin/jq -r '[.assert[] | "+assert " + .name] + [.tree[] | "+tree " + .name] + [.module[] | "+module " + .name] | .[]' \
    ${caseName}.json > $pname.cover
else
  echo "-assert *" > $pname.cover
fi
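With the hello.json above, the generated $pname.cover file would contain the single line "+assert vmv_v_i": the assert array contributes one entry, while the empty tree and module arrays contribute nothing.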
Then you can build the test case to check whether the coverage description is generated correctly:
nix build .#t1.blastoise.t1emu.cases.mlir.hello -L
Use the vcs-emu-cover emulator type to run the test case and generate the coverage report:
nix develop -c t1-helper run -i t1emu -c blastoise -e vcs-emu-cover mlir.hello
Bump Nixpkgs:
Bump Chisel submodule versions:
$ cd nix/t1/dependencies
$ nix run '.#nvfetcher'
Copyright © 2022-2023, Jiuyang Liu. Released under Apache-2.0 license.