contributed by < E94079029 施丞宥 >
Design a Tensor Processing Unit (TPU) with 4x4 processing elements (PEs) that can compute an 8-bit integer matrix multiplication of the form (4*K)*(K*4), where K is limited by the size of the input global buffer.
Project Constraints
- Your design should be written in Verilog.
- Your PE array shouldn't be larger than 4x4, and a 2D systolic array architecture is strictly required in this project.
- An 8-bit data width design.
- 3 KiB of global buffer size in total.
- The data is pipelined through an increasing number of registers, which gives the systolic array behavior.
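The register skew described above can be modeled in software. The following is a minimal cycle-level sketch in Python, not the Verilog design itself. It assumes an output-stationary array in which rows of A enter from the left and columns of B enter from the top; the source does not state the dataflow, and the names `systolic_matmul_4x4`, `a_reg`, `b_reg`, and `acc` are made up for illustration.

```python
def systolic_matmul_4x4(A, B):
    """Cycle-level model of a 4x4 output-stationary systolic array (assumed dataflow).

    A is 4 x K and enters from the left, B is K x 4 and enters from the top.
    Row i of A and column j of B are delayed by i and j cycles, which models
    the growing chain of pipeline registers in front of the array.
    """
    N = 4
    K = len(B)
    acc = [[0] * N for _ in range(N)]     # one accumulator per PE
    a_reg = [[0] * N for _ in range(N)]   # value each PE forwards to its right neighbor
    b_reg = [[0] * N for _ in range(N)]   # value each PE forwards to the PE below

    # The last operand pair reaches PE(3,3) at cycle (K - 1) + 3 + 3.
    for t in range(K + 2 * (N - 1)):
        new_a = [[0] * N for _ in range(N)]
        new_b = [[0] * N for _ in range(N)]
        for i in range(N):
            for j in range(N):
                if j == 0:
                    k = t - i             # row i is skewed by i cycles
                    a_in = A[i][k] if 0 <= k < K else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j             # column j is skewed by j cycles
                    b_in = B[k][j] if 0 <= k < K else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # multiply-accumulate in place
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b       # all registers update on the clock edge
    return acc


if __name__ == "__main__":
    identity = [[1 if i == k else 0 for k in range(4)] for i in range(4)]
    B = [[1, 2, 3, 4], [5, 6, 7, 8], [-1, -2, -3, -4], [9, 10, 11, 12]]
    print(systolic_matmul_4x4(identity, B) == B)
```

With this skew, PE(i,j) sees the operand pair with index k at cycle k + i + j, so the two operands of each product always arrive together.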
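- IDLE : When start=1, the FSM enters BUZY and starts the MAC operations.
- BUZY : Each time one 4*4 systolic array computation finishes, the FSM enters OUTP.
- OUTP : The computed results are stored into the output global buffer.
- DONE : Once all computations are finished, the FSM enters DONE to indicate that the operation has ended.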
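The four states can be sketched as a next-state function. This is a Python model for illustration, not the actual Verilog FSM. The inputs `tile_done` and `all_done` are hypothetical signal names, and two transitions are assumptions that the notes do not spell out: OUTP returns to BUZY while tiles remain, and DONE holds its state.

```python
IDLE, BUZY, OUTP, DONE = "IDLE", "BUZY", "OUTP", "DONE"


def next_state(state, start, tile_done, all_done):
    """Return the next FSM state.

    start     : start input from the notes
    tile_done : hypothetical flag, one 4*4 computation has finished
    all_done  : hypothetical flag, no tiles are left to compute
    """
    if state == IDLE:
        return BUZY if start else IDLE
    if state == BUZY:
        return OUTP if tile_done else BUZY
    if state == OUTP:
        # Assumption: after writing the results, loop back unless everything is done.
        return DONE if all_done else BUZY
    return DONE  # Assumption: DONE is held once reached


def run(inputs, state=IDLE):
    """Apply a sequence of (start, tile_done, all_done) tuples and record the states."""
    trace = [state]
    for start, tile_done, all_done in inputs:
        state = next_state(state, start, tile_done, all_done)
        trace.append(state)
    return trace


if __name__ == "__main__":
    stimulus = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    print(run(stimulus))  # ['IDLE', 'BUZY', 'OUTP', 'DONE']
```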
- Pass at least test1~3
- Support (M*K)*(K*N)
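One common way to handle a general (M*K)*(K*N) product on a 4x4 array is to split the output into 4x4 tiles and zero-pad the edges. The notes do not say how this design does it, so the Python sketch below is only an assumption. `matmul_4xK_Kx4` stands in for one pass of the array, and all names are hypothetical.

```python
TILE = 4


def matmul_4xK_Kx4(A4, B4):
    """Stand-in for one pass of the 4x4 array: (4 x K) times (K x 4)."""
    K = len(B4)
    return [[sum(A4[i][k] * B4[k][j] for k in range(K)) for j in range(TILE)]
            for i in range(TILE)]


def tiled_matmul(A, B):
    """Compute (M x K) times (K x N) one 4x4 output tile at a time (K >= 1)."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for r0 in range(0, M, TILE):
        for c0 in range(0, N, TILE):
            # Zero-pad rows of A and columns of B that fall outside the matrices.
            A4 = [A[r] if r < M else [0] * K for r in range(r0, r0 + TILE)]
            B4 = [[B[k][c] if c < N else 0 for c in range(c0, c0 + TILE)]
                  for k in range(K)]
            C4 = matmul_4xK_Kx4(A4, B4)
            # Copy back only the entries that belong to the real output.
            for i in range(TILE):
                for j in range(TILE):
                    if r0 + i < M and c0 + j < N:
                        C[r0 + i][c0 + j] = C4[i][j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]   # 5 x 2
    B = [[1, 0, -1, 2, 3, 1], [0, 1, 2, -2, 1, 0]]  # 2 x 6
    for row in tiled_matmul(A, B):
        print(row)
```

Each tile still uses the full K dimension, so the limit on K from the input global buffer applies unchanged.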
- Synthesis
- Area report
- Timing report
- Cell library
- tsmc13_neg