HDL4SE: Learning Verilog for Software Engineers (Part 17)
Posted by 饶先宏
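17 The AXI Bus
In real-world projects, Xilinx FPGAs are used heavily, while Altera parts seem to be used less and less. Many factors are probably involved, but the development software is likely an important one: Xilinx's Vivado has a much lower barrier to entry. To get our RISC-V core running on a Xilinx FPGA, I bought a Xilinx FPGA development board specifically for the port.
FPGA development is becoming more and more system-oriented. Altera has QSys, while Xilinx builds its system design support on the AXI bus. Its Vivado design software includes a graphical design tool and provides many commonly used IP cores, which removes a lot of design work, and its automatic code generation eliminates many human errors. Good design software directly lowers the barrier to entry and lets designers focus on project-specific logic rather than spending effort on boilerplate such as format compliance. This section describes how to adapt the RISC-V core for Vivado and use it in a project: we first introduce the AXI-Lite bus protocol and then add AXI-Lite bus support to the RISC-V core.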
17.1 AXI-Lite
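The AXI bus is part of ARM's AMBA bus protocol family. The complete specification can be downloaded from the ARM website; only a general description is given here.
AXI-Lite is a simplified version of the AXI bus. Each read or write transfers only one word, so it suits reading and writing device status and is not suited to large, continuous, high-speed data transfers. The AXI-Lite bus can transfer five kinds of information independently. The transfers have no dependencies on one another and can be implemented separately, although completing a read or write operation still requires several of them to work together. Nodes on an AXI-Lite bus are of two types, Master and Slave.
The five kinds of information are write address, write data (including byte enables), write response, read address, and read data. Each has its own set of bus signals: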
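- Write address: awvalid, awready, awaddr, awprot; the Master is the source and the Slave is the destination
- Write data: wvalid, wready, wdata, wstrb; the Master is the source and the Slave is the destination
- Write response: bvalid, bready, bresp; the Master is the destination and the Slave is the source
- Read address: arvalid, arready, araddr, arprot; the Master is the source and the Slave is the destination
- Read data: rvalid, rready, rdata, rresp; the Master is the destination and the Slave is the source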
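Addresses and data can be 32 or 64 bits wide. Every kind of information has a valid signal and a ready signal, which implement the handshake during a transfer. AXI-Lite transfers information through a handshake between source and destination: the source asserts valid together with the information to be transferred, and the destination asserts ready. There is no requirement on which of the two is asserted first; a transfer completes when valid and ready are both asserted in the same clock cycle. To avoid deadlock, AXI specifies that within a group the valid signal must not depend on the ready signal to be asserted, whereas ready may be generated from valid. As long as every source and destination follows this rule, the two sides can never deadlock waiting for each other. Once the source sees ready, the transfer is complete and it may deassert valid.
AXI-Lite also specifies that once the source has asserted valid, it must not deassert valid until the transfer completes, and the information being transferred must remain stable as well.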
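The two source-side rules above (valid must not wait for ready, and valid plus its payload must be held until the handshake completes) can be sketched as a small module. This is only an illustrative sketch: the module name hs_source and the signals send and payload_in are hypothetical and do not come from the HDL4SE sources.

```verilog
// Hypothetical valid/ready source. All names here are illustrative only.
module hs_source #(parameter WIDTH = 32) (
    input  wire             clk,
    input  wire             resetn,      // active-low synchronous reset
    input  wire             send,        // request to start one transfer
    input  wire [WIDTH-1:0] payload_in,  // information to transfer
    output reg              valid,
    output reg  [WIDTH-1:0] payload,
    input  wire             ready
);
    always @(posedge clk)
        if (~resetn) begin
            valid <= 1'b0;
        end else if (valid && ready) begin
            // valid and ready are both high on this edge: transfer complete.
            // A send request arriving in this same cycle is ignored.
            valid <= 1'b0;
        end else if (send && ~valid) begin
            // Raise valid without looking at ready, and latch the payload
            // so that it stays stable until the handshake completes.
            valid   <= 1'b1;
            payload <= payload_in;
        end
endmodule
```

Because valid is set from send alone, the destination is free to derive its ready from valid without creating a circular wait.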
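These five kinds of information let the Master initiate read and write operations on the Slave:
1. Read: the Master transfers the read address to the Slave. After receiving it, the Slave prepares the data for that address and, when the data is ready, transfers the read data back to the Master. A read operation is thus completed with two transfers, read address and read data. Note that in the AXI-Lite protocol, after sending a read address the Master must wait for the Slave to return the read data before starting the next read.
2. Write: the Master transfers both the write address and the write data to the Slave. The order of the two transfers does not matter; one may precede the other, or they may happen at the same time. After receiving both, the Slave returns a write response to the Master, which completes the write operation. After sending the write address and write data, the Master must wait for the write response before starting the next write.
Note that reads and writes may be interleaved; neither waits on the other.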
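As a rough illustration of the read operation described above, here is a minimal sketch of the Slave side of the read address and read data channels. It is not the led_key module from this article: the module name is hypothetical, and the returned data (the address echoed back) is a placeholder chosen only to keep the example self-contained.

```verilog
// Hypothetical Slave-side read path: one read in flight at a time.
module axil_read_slave_sketch (
    input  wire        aclk,
    input  wire        aresetn,
    input  wire [3:0]  araddr,
    input  wire        arvalid,
    output wire        arready,
    output reg  [31:0] rdata,
    output wire [1:0]  rresp,
    output reg         rvalid,
    input  wire        rready
);
    // Accept a new read address only while no read data is pending.
    assign arready = ~rvalid;
    assign rresp   = 2'b00;   // OKAY response

    always @(posedge aclk)
        if (~aresetn) begin
            rvalid <= 1'b0;
        end else if (arvalid && arready) begin
            // Read-address handshake: prepare the data, then offer it.
            rdata  <= {28'b0, araddr};  // placeholder: echo the address
            rvalid <= 1'b1;
        end else if (rvalid && rready) begin
            // Read-data handshake: the read operation is complete.
            rvalid <= 1'b0;
        end
endmodule
```

Here rvalid is raised from the address handshake alone and held until rready is seen, so the Slave, acting as the source of the read data, follows the same valid/ready rules as the Master does on the address channel.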
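17.2 Implementing an AXI-Lite Master in the RISC-V Core
17.2.1 Changes Inside the RISC-V Core
In the earlier RISC-V implementation, the external bus was a fairly simple local bus: writes needed no write response, and reads assumed the return value would be available in the cycle after the read signal was asserted. These assumptions are not compatible with AXI-Lite, so for the RISC-V core to support the AXI-Lite bus we have to add response signals for reads and writes. We change the RISC-V ports as follows: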
module riscv_core_v5(
input wClk,
input nwReset,
output wWrite,
output [31:0] bWriteAddr,
output [31:0] bWriteData,
output [3:0] bWriteMask,
input wWriteReady,
output reg wRead,
output reg [31:0] bReadAddr,
input [31:0] bReadData,
input wReadReady,
output reg [4:0] regno,
output reg [3:0] regena,
output reg [31:0] regwrdata,
output reg regwren,
input [31:0] regrddata,
output reg [4:0] regno2,
output reg [3:0] regena2,
output reg [31:0] regwrdata2,
output reg regwren2,
input [31:0] regrddata2
);
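The newly added wWriteReady and wReadReady signals are the responses to write and read operations. The RISC-V core initiates reads and writes as the Master, and a read or write is complete when wReadReady or wWriteReady is asserted.
We modify the internal state machine of the RISC-V core as follows:
The state transitions marked in red are the ones that wait for the corresponding ready signal. The three states read registers (RISCVSTATE_READ_REGS), wait for load result (RISCVSTATE_WAIT_LD), and wait for load result 2 (RISCVSTATE_WAIT_LD2) move to the next state only when read ready arrives, while write RAM (RISCVSTATE_WAIT_ST) and write RAM 2 (RISCVSTATE_WAIT_ST2) move on only when write ready is asserted. Note that valid and the information being transferred must be held while waiting, so some registers were added internally to store them. The state transition code is shown below; the rest of the code is omitted. Sharp-eyed readers will notice that this version also adds watchdog support. The watchdog is implemented through a CSR: at startup the RISC-V core sets the watchdog register to half a second, and software must then feed the dog within every half second. The watchdog register is CSR number 12'hb20 and is 32 bits wide. It decrements by one every clock cycle; when it reaches zero, pc and state are set to their reset values and execution restarts, so software must write a suitable value to the register before it reaches zero. Arguably it would be better to generate an nwReset reset signal instead.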
//DEFINE_FUNC(riscv_core_gen_state, "state, instr, nwReset")
always @(posedge wClk)
if (~nwReset
`ifdef WATCHDOG
|| (watchdog == 0)
`endif
) begin
state <= `RISCVSTATE_READ_INST;
end else begin
case (state)
`RISCVSTATE_READ_INST: state <= `RISCVSTATE_READ_REGS;
`RISCVSTATE_READ_REGS:
if (wReadReady) begin
state <= `RISCVSTATE_EXEC_INST;
end
`RISCVSTATE_EXEC_INST: begin
if (opcode == 5'h00) begin
state <= `RISCVSTATE_WAIT_LD;
end else if (opcode == 5'h08) begin
state <= `RISCVSTATE_WAIT_ST;
end else if (opcode == 5'h0c && instr[25] && func3[2] && (rs2 != 0)) begin
state <= `RISCVSTATE_WAIT_DIV;
divclk <= 31;
end else if (opcode == 5'h0c && instr[25] && (func3[2]==0) ) begin
state <= `RISCVSTATE_WAIT_MUL;
divclk <= 3;
end else
state <= `RISCVSTATE_READ_REGS;
end
`RISCVSTATE_WAIT_LD: begin
if (wReadReady) begin
if (func3 == 1 && ldaddr[1:0] == 3) begin /* lh */
state <= `RISCVSTATE_WAIT_LD2;
end
else if (func3 == 2 && ldaddr[1:0] != 0) begin /* lw */
state <= `RISCVSTATE_WAIT_LD2;
end
else if (func3 == 5 && ldaddr[1:0] == 3) begin /* lhu */
state <= `RISCVSTATE_WAIT_LD2;
end
else begin
state <= `RISCVSTATE_READ_REGS;
end
end
end
`RISCVSTATE_WAIT_LD2:
if (wReadReady) begin
state <= `RISCVSTATE_READ_REGS;
end
`RISCVSTATE_WAIT_ST: if (wWriteReady) begin
state <= `RISCVSTATE_READ_REGS;
if (opcode == 5'h08) begin
if (func3 == 1 && (lastaddr & 3) == 3) begin /* sh */
state <= `RISCVSTATE_WAIT_ST2;
end
else if (func3 == 2 && (lastaddr & 3) != 0) begin
state <= `RISCVSTATE_WAIT_ST2;
end
end
end
`RISCVSTATE_WAIT_ST2:
if (wWriteReady )
state <= `RISCVSTATE_READ_REGS;
`RISCVSTATE_WAIT_MUL: begin
`ifdef USEMUL32
if (muldone)
state <= `RISCVSTATE_READ_REGS;
`else
if (divclk == 0)
state <= `RISCVSTATE_READ_REGS;
else
divclk <= divclk - 1;
`endif
end
`RISCVSTATE_WAIT_DIV: begin
`ifdef USEDIV32
if (divdone)
state <= `RISCVSTATE_READ_REGS;
`else
if (divclk == 0)
state <= `RISCVSTATE_READ_REGS;
else
divclk <= divclk - 1;
`endif
end
endcase
end
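The watchdog counter itself is not shown in the listing above. A minimal sketch of how such a counter might behave, based only on the description given earlier, is shown below. The module name, the csr_write, csr_addr and csr_wdata ports, and the reload value (which assumes a 50 MHz clock) are all assumptions for illustration and are not taken from the actual riscv_core_v5 code.

```verilog
// Hypothetical watchdog counter; the interface and RELOAD are assumptions.
module watchdog_sketch #(
    // Assumed: half a second at an assumed 50 MHz clock.
    parameter [31:0] RELOAD = 32'd25_000_000
) (
    input  wire        clk,
    input  wire        resetn,
    input  wire        csr_write,   // high for one cycle on a CSR write
    input  wire [11:0] csr_addr,
    input  wire [31:0] csr_wdata,
    output reg  [31:0] watchdog
);
    always @(posedge clk)
        if (~resetn || watchdog == 0)
            // Start-up, or the budget ran out: reload. The state machine
            // sees watchdog == 0 for one cycle and restarts the core.
            watchdog <= RELOAD;
        else if (csr_write && csr_addr == 12'hb20)
            watchdog <= csr_wdata;      // software feeds the dog
        else
            watchdog <= watchdog - 1;   // count down every clock cycle
endmodule
```

With this arrangement, the `watchdog == 0` term in the reset condition of the state machine is true for exactly one clock cycle after a timeout, which is enough to force the state back to its reset value.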
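17.2.2 Implementing the AXI-Lite Master
Next we add AXI-Lite Master signals on top of the modified local bus. Our approach is to wrap the core in an outer module and instantiate the memory inside it as well. The AXI-Lite bus may be extended externally through an interconnect, and after external handshaking and forwarding the latency is fairly large. That hurts the efficiency of instruction fetches and of load/store instructions, and the problem is especially serious because we have no cache. We therefore instantiate the memory inside the wrapper, and the CPU and memory are still connected directly by the local bus. This has another advantage: when we add cache support later, we only need to replace the memory accesses with cache accesses. Here is the code: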
`timescale 1 ns / 1 ps
module riscv_core_with_axi_master (
input wire m00_axi_aclk,
input wire m00_axi_aresetn,
output wire [31 : 0] m00_axi_awaddr,
output wire [2 : 0] m00_axi_awprot,
output wire m00_axi_awvalid,
input wire m00_axi_awready,
output wire [31 : 0] m00_axi_wdata,
output wire [3 : 0] m00_axi_wstrb,
output wire m00_axi_wvalid,
input wire m00_axi_wready,
input wire [1 : 0] m00_axi_bresp,
input wire m00_axi_bvalid,
output wire m00_axi_bready,
output wire [31 : 0] m00_axi_araddr,
output wire [2 : 0] m00_axi_arprot,
output wire m00_axi_arvalid,
input wire m00_axi_arready,
input wire [31 : 0] m00_axi_rdata,
input wire [1 : 0] m00_axi_rresp,
input wire m00_axi_rvalid,
output wire m00_axi_rready
);
reg axi_awvalid; assign m00_axi_awvalid = axi_awvalid;
reg [31:0] axi_awaddr; assign m00_axi_awaddr = axi_awaddr;
assign m00_axi_awprot = 3'b000;
reg axi_wvalid; assign m00_axi_wvalid = axi_wvalid;
reg [31:0] axi_wdata; assign m00_axi_wdata = axi_wdata;
reg [3:0] axi_wstrb; assign m00_axi_wstrb = axi_wstrb;
assign m00_axi_bready = 1'b1;
reg axi_arvalid; assign m00_axi_arvalid = axi_arvalid;
reg [31:0] axi_araddr; assign m00_axi_araddr = axi_araddr;
assign m00_axi_arprot = 3'b001;
assign m00_axi_rready = 1'b1;
wire wWrite, wRead, wReadReady, wWriteReady;
wire [31:0] bWriteAddr, bWriteData, bReadAddr, bReadData, bReadDataRam, bReadDataKey;
wire [3:0] bWriteMask;
wire [4:0] regno;
wire [3:0] regena;
wire [31:0] regwrdata;
wire regwren;
wire [31:0] regrddata;
wire [4:0] regno2;
wire [3:0] regena2;
wire [31:0] regwrdata2;
wire regwren2;
wire [31:0] regrddata2;
reg [31:0] lastreadaddr;
reg lastread;
always @(posedge m00_axi_aclk)
if (~m00_axi_aresetn) begin
lastreadaddr <= 0;
lastread <= 0;
end else begin
lastreadaddr <= bReadAddr;
lastread <= wRead;
end
wire isramaddr = (lastreadaddr & 32'hfff0_0000) == 32'h0000_0000; /* 1MB ram addr */
assign bReadData = isramaddr ? bReadDataRam : m00_axi_rdata;
assign wReadReady = isramaddr ? lastread : m00_axi_rvalid;
wire isramwriteaddr = (bWriteAddr & 32'hfff0_0000) == 32'h0000_0000; /* 1MB ram addr */
wire isramreadaddr = (bReadAddr & 32'hfff0_0000) == 32'h0000_0000; /* 1MB ram addr */
wire [29:0] ramaddr;
assign ramaddr = wWrite?bWriteAddr[31:2]:bReadAddr[31:2];
reg [4:0] lastregno;
reg [4:0] lastregno2;
always @(posedge m00_axi_aclk) begin
lastregno <= regno;
lastregno2 <= regno2;
end
regfile regs(regno, regena, m00_axi_aclk, regwrdata, regwren, regrddata);
regfile regs2(regno2, regena2, m00_axi_aclk, regwrdata2, regwren2, regrddata2);
`define ALTERA
`ifdef ALTERA
ram4kB ram(.clock(m00_axi_aclk), .address(ramaddr), .byteena(~bWriteMask), .data(bWriteData), .wren(isramwriteaddr ? wWrite : 1'b0), .q(bReadDataRam));
`else
ram4KB ram(.clka(m00_axi_aclk), .ena(1'b1), .addra(ramaddr), .wea((isramwriteaddr && wWrite)?(~bWriteMask):4'b0), .dina(bWriteData) , .douta(bReadDataRam));
`endif
riscv_core_v5 core(
m00_axi_aclk,
m00_axi_aresetn,
wWrite,
bWriteAddr,
bWriteData,
bWriteMask,
wWriteReady,
wRead,
bReadAddr,
bReadData,
wReadReady,
regno,
regena,
regwrdata,
regwren,
(lastregno == 0) ? 0 : regrddata,
regno2,
regena2,
regwrdata2,
regwren2,
(lastregno2 == 0) ? 0 : regrddata2
);
//Write Address
wire writeaxi = (wWrite && ~isramwriteaddr);
reg [31:0] awaddr;
reg awvalid;
always @(posedge m00_axi_aclk)
if (~m00_axi_aresetn) begin
awvalid <= 1'b0;
end else if (writeaxi) begin
awaddr <= bWriteAddr;
awvalid <= 1'b1;
end else if (m00_axi_awready) begin
awvalid <= 1'b0;
end
always @(wWrite or awvalid or bWriteAddr or awaddr)
begin
axi_awvalid = writeaxi ? 1'b1 : awvalid;
axi_awaddr = wWrite ? bWriteAddr : awaddr;
end
/* Write Data */
reg [31:0] waddr;
reg [31:0] wdata;
reg [3:0] wstrb;
reg wvalid;
reg write_local;
always @(posedge m00_axi_aclk)
begin
if (~m00_axi_aresetn) begin
wvalid <= 1'b0;
end else if (writeaxi) begin
waddr <= bWriteAddr;
wdata <= bWriteData;
wstrb <= ~bWriteMask;
wvalid <= 1'b1;
end if (m00_axi_wready) begin
wvalid <= 1'b0;
end
if (~m00_axi_aresetn) begin
write_local <= 1'b0;
end else if (wWrite) begin
if (isramwriteaddr) begin
write_local <= 1;
end else begin
write_local <= 0;
end
end
end
reg writeready;
assign wWriteReady = writeready;
always @(posedge m00_axi_aclk)
if (~m00_axi_aresetn)
writeready <= 1'b0;
else if (~writeready)
writeready <= m00_axi_bvalid || write_local || isramwriteaddr;
always @(wWrite or wvalid or bWriteData or wdata or bWriteMask or wstrb)
begin
axi_wvalid = writeaxi ? 1'b1 : wvalid;
axi_wdata = writeaxi ? bWriteData : wdata;
axi_wstrb = writeaxi ? ~bWriteMask : wstrb;
end
wire readaxi = wRead && ~isramreadaddr;
//Read Address
reg [31:0] araddr;
reg arvalid;
always @(posedge m00_axi_aclk)
if (~m00_axi_aresetn) begin
arvalid <= 1'b0;
end else if (readaxi) begin
araddr <= bReadAddr;
arvalid <= 1'b1;
end else if (m00_axi_arready) begin
arvalid <= 1'b0;
end
always @(wRead or arvalid or bReadAddr or araddr)
begin
axi_arvalid = readaxi ? 1'b1 : arvalid;
axi_araddr = wRead ? bReadAddr : araddr;
end
endmodule
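We reserve the lowest 1 MB of the address space for memory; all other addresses are forwarded to the external AXI-Lite interface.
Note: do not change the order or names of the AXI-Lite ports. That way Xilinx's Vivado recognizes the module as an AXI-Lite Master interface, and it can be used directly as a component in Vivado's Block Design graphical tool.
17.3 Implementing an AXI-Lite Slave
We wrap access to the board's LEDs and keys in an AXI-Lite Slave module, which Vivado can likewise recognize directly. This code is fairly simple, so let's go straight to it: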
`timescale 1 ns / 1 ps
module led_key
(
input wire s00_axi_aclk,
input wire s00_axi_aresetn,
input wire [3 : 0] s00_axi_awaddr,
input wire [2 : 0] s00_axi_awprot,
input wire s00_axi_awvalid,
output wire s00_axi_awready,
input wire [31 : 0] s00_axi_wdata,
input wire [3 : 0] s00_axi_wstrb,
input wire s00_axi_wvalid,
output wire s00_axi_wready,
output wire [1 : 0] s00_axi_bresp,
output wire s00_axi_bvalid,
input wire s00_axi_bready,
input wire [3 : 0] s00_axi_araddr,
input wire [2 : 0] s00_axi_arprot,
input wire s00_axi_arvalid,
output wire s00_axi_arready,
output wire [31 : 0] s00_axi_rdata,
output wire [1 : 0] s00_axi_rresp,
output wire s00_axi_rvalid,
input wire s00_axi_rready,
input wire [2:0] key,
output wire [3:0] led
);
reg [31:0] count;
reg [31:0] cpucount;
reg [3:0] led_r;
assign led = led_r;
always @(posedge s00_axi_aclk)
if (~s00_axi_aresetn) begin
count <= 0;
led_r <= 4'b1111;
end else begin
led_r[0] <= cpucount[17];
led_r[1] <= cpucount[19];
led_r[2] <= cpucount[21];
if (count >= 25000000) begin
count <= 0;
led_r[3] <= ~led_r[3];
end else begin
count <= count + 1;
end
end
reg [3:0] axi_awaddr_r;
reg axi_awvalid_r;
wire axi_awvalid = s00_axi_awvalid || axi_awvalid_r;
wire [3:0] axi_awaddr = s00_axi_awvalid ? s00_axi_awaddr : axi_awaddr_r;
reg [31:0] axi_wdata_r;
reg [3:0] axi_wstrb_r;
reg axi_wvalid_r;
wire axi_wvalid = s00_axi_wvalid || axi_wvalid_r;
wire [31:0] axi_wdata = s00_axi_wvalid ? s00_axi_wdata : axi_wdata_r;
wire [3:0] axi_wstrb = s00_axi_wvalid ? s00_axi_wstrb : axi_wstrb_r;
assign s00_axi_awready = 1;
assign s00_axi_wready = 1;
always @(posedge s00_axi_aclk)
if (~s00_axi_aresetn) begin
end else begin
if (s00_axi_awvalid) begin
axi_awaddr_r <= s00_axi_awaddr;
end
if (s00_axi_wvalid) begin
axi_wdata_r <= s00_axi_wdata;
axi_wstrb_r <= s00_axi_wstrb;
end
end
always @(posedge s00_axi_aclk)
if (~s00_axi_aresetn) begin
axi_wvalid_r <= 0;
axi_awvalid_r <= 0;
end else begin