/ICtoAIacclerators

Use google.

Primary language: C++. License: MIT.

From ICs to AI Accelerators

Deep learning frameworks are still evolving, which makes it hard to design custom hardware for them. Reconfigurable devices such as field-programmable gate arrays (FPGAs) make it easier to evolve hardware, frameworks, and software alongside each other. Designing robust hardware requires a broad base of knowledge, ranging from undergraduate CS fundamentals to embedded systems and, of course, deep learning. It is becoming harder to find people who understand the full stack from first principles. This micro-curriculum will help you understand the system stack, from the building blocks of ICs up to AI accelerators, from a first-principles perspective.

Note: This is in no way complete. I will keep updating this repo.

Transistors and Digital Logic

  • Building blocks of ICs: learn about transistors. Follow Microelectronic Circuits by Sedra and Smith: BJTs, FETs, power transistors. Some basic circuit theory (to be divided into sub-chapters). IC design wiki.
  • Sequential and combinational circuits: synchronous and asynchronous design, register-transfer level (RTL), introduction to VLSI design, Verilog and VHDL.
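
The difference between combinational and sequential logic can be sketched in plain C++. This is only a software model, not HDL: `full_adder` and `Counter4` are hypothetical names chosen for illustration. The adder's outputs depend only on its current inputs, while the counter holds state that changes once per simulated clock tick.

```cpp
#include <cstdint>

// Combinational logic: outputs are a pure function of the current inputs.
struct AdderOut {
    bool sum;
    bool carry;
};

AdderOut full_adder(bool a, bool b, bool cin) {
    bool sum = (a != b) != cin;                    // a XOR b XOR cin
    bool carry = (a && b) || (cin && (a != b));    // carry-out
    return {sum, carry};
}

// Sequential logic: a 4-bit counter whose state register only
// changes when tick() is called (standing in for a clock edge).
struct Counter4 {
    std::uint8_t value = 0;  // the stored state (4 bits used)

    void tick(bool enable) {
        if (enable) {
            // Next-state logic is combinational; the assignment models the register.
            value = static_cast<std::uint8_t>((value + 1) & 0x0F);
        }
    }
};
```

Calling `full_adder(true, true, false)` always gives the same result, whereas the value of a `Counter4` depends on how many enabled ticks it has seen so far.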

Microprocessors and Microcontrollers

  • 8051
  • 8085, 8086: basics, instruction cycle
  • AVR family: Arduino, plus some basic projects such as an LCD controller, servo motors, and sensors.
  • PIC family: some background knowledge.

FPGAs

  1. How FPGAs are built: An FPGA from 7400s. The most basic building block of an FPGA is the cell, or slice. Programmable look-up tables (LUTs).
  2. 5 Easy Steps to Building an Embedded Processor System Inside an FPGA. Designing an FPGA from Scratch (38-part tutorial): writing a software device driver and an application program to run on the system. Pick out a suitable development board.
  3. All about FPGAs
  4. How to get started with FPGA programming? What is FPGA programming?
  • Digital logic design: combinational and sequential circuits.
  • Verilog/VHDL languages
  • Simulation: ModelSim
  • Synthesis and implementation: Xilinx ISE Design Suite
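
To make the idea of a programmable LUT concrete, here is a minimal software sketch, assuming a 4-input LUT (real devices vary in LUT width). `Lut4` and `program_lut` are hypothetical names, not part of any vendor toolchain. The LUT is just 16 stored bits, one per input combination, and "programming" it means filling in that truth table.

```cpp
#include <cstdint>

// A 4-input LUT modeled as 16 configuration bits: bit i holds the
// output for the input combination whose binary encoding is i.
struct Lut4 {
    std::uint16_t config;

    bool eval(bool a, bool b, bool c, bool d) const {
        unsigned index = (a ? 1u : 0u) | (b ? 2u : 0u) | (c ? 4u : 0u) | (d ? 8u : 0u);
        return ((config >> index) & 1u) != 0;
    }
};

// "Program" the LUT by evaluating an arbitrary 4-input boolean
// function on all 16 input combinations and storing the results.
template <typename F>
Lut4 program_lut(F f) {
    std::uint16_t config = 0;
    for (unsigned i = 0; i < 16; ++i) {
        bool out = f((i & 1u) != 0, (i & 2u) != 0, (i & 4u) != 0, (i & 8u) != 0);
        if (out) {
            config = static_cast<std::uint16_t>(config | (1u << i));
        }
    }
    return Lut4{config};
}
```

For example, passing a lambda that returns `a && b && c && d` yields a LUT that behaves as a 4-input AND gate. The same structure can hold any other 4-input function, which is what makes the cell reconfigurable.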

ARM Architecture

  • Read about RISC architectures: RISC wiki
  • Learn about ARM organisation: the ARM core dataflow model, and 3-stage and 5-stage pipelines. ARM7 and ARM9. Explaining Pipelining in ARM Processors.
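
As a rough illustration of why pipelining helps, the sketch below counts cycles for an idealized pipeline. It assumes every stage takes exactly one cycle and there are no stalls, branches, or hazards, which real cores do not guarantee. The function names are hypothetical.

```cpp
#include <cstdint>

// Ideal pipeline: the first instruction needs `stages` cycles to
// complete, and each later instruction completes one cycle after
// the previous one.
std::uint64_t pipelined_cycles(std::uint64_t stages, std::uint64_t instructions) {
    if (instructions == 0) {
        return 0;
    }
    return stages + instructions - 1;
}

// Without pipelining, each instruction occupies the whole datapath
// for `stages` cycles before the next one starts.
std::uint64_t unpipelined_cycles(std::uint64_t stages, std::uint64_t instructions) {
    return stages * instructions;
}
```

Under these assumptions, 10 instructions take 12 cycles on a 3-stage pipeline and 14 cycles on a 5-stage one. The deeper pipeline pays off only because shorter stages usually allow a higher clock frequency, which this cycle count does not model.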

RISC-V Architecture

Include material for the RISC-V architecture as well.

ARM CPU

  • ARM Assembly Basics tutorial series: Writing ARM Assembly. Learn about the assembly language, data types, and addressing modes. A good reading source is Computer Organization and Architecture by William Stallings. 32-bit ARM and 16-bit Thumb instruction sets.
  • ARM assembly language
  • ARM data types
  • ARM addressing modes
  • ARM instruction formats
  • ARM processors and cores:
    • ARM Cortex M
    • ARM Cortex A
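
To give a feel for ARM instruction formats, here is a small sketch that splits a 32-bit ARM (A32) data-processing instruction word into its fields. It assumes the caller already knows the word is a data-processing instruction; it does not tell that format apart from multiplies, loads, or other encodings. `DataProcessing` and `decode_data_processing` are hypothetical names.

```cpp
#include <cstdint>

// Fields of a classic 32-bit ARM data-processing instruction.
struct DataProcessing {
    unsigned cond;      // bits 31:28, condition code (0xE = always)
    bool immediate;     // bit 25, operand2 is an immediate when set
    unsigned opcode;    // bits 24:21, e.g. 0x4 = ADD, 0xD = MOV
    bool set_flags;     // bit 20, the S bit
    unsigned rn;        // bits 19:16, first operand register
    unsigned rd;        // bits 15:12, destination register
    unsigned operand2;  // bits 11:0, shifter operand (left undecoded here)
};

DataProcessing decode_data_processing(std::uint32_t instr) {
    DataProcessing d;
    d.cond = (instr >> 28) & 0xFu;
    d.immediate = ((instr >> 25) & 0x1u) != 0;
    d.opcode = (instr >> 21) & 0xFu;
    d.set_flags = ((instr >> 20) & 0x1u) != 0;
    d.rn = (instr >> 16) & 0xFu;
    d.rd = (instr >> 12) & 0xFu;
    d.operand2 = instr & 0xFFFu;
    return d;
}
```

For instance, the word `0xE0810002` encodes `ADD r0, r1, r2`: condition "always", opcode ADD, Rn = r1, Rd = r0, and a register operand2 that selects r2.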

ARM Operating System

Building an ARM7 CPU, Coding a BootROM, Coding an Assembler.

Compiler Design

System on Chip (SoC)

Needed for system-on-chip design for ASICs.

  • SoC wiki
  • SoC design methodology, overview of the SoC design process.
  • Canonical SoC design, system design flow, system architecture, components of the system, hardware and software, processor architectures, system architecture and complexity. Parameterized Systems-on-a-Chip, System-on-a-Chip Peripheral Cores.
  • Overview of SoC external memory, internal memory, size, cache memory, cache organization, cache data. Types of cache: split-level caches, multi-level caches. SoC memory system.
  • SoC notes
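
Cache organization is easier to follow with a concrete address split. The sketch below assumes a direct-mapped cache whose line size and line count are both powers of two, with 32-bit addresses; other organizations (set-associative, multi-level) need more than this. All names are hypothetical.

```cpp
#include <cstdint>

// The three parts a direct-mapped cache extracts from an address.
struct CacheAddress {
    std::uint32_t tag;     // identifies which memory block occupies the line
    std::uint32_t index;   // selects the cache line
    std::uint32_t offset;  // selects the byte within the line
};

// log2 of a power of two, e.g. 64 -> 6.
unsigned log2_pow2(std::uint32_t x) {
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        ++n;
    }
    return n;
}

// Assumes line_bytes and num_lines are powers of two and that
// offset bits + index bits is less than 32.
CacheAddress split_address(std::uint32_t addr, std::uint32_t line_bytes, std::uint32_t num_lines) {
    unsigned offset_bits = log2_pow2(line_bytes);
    unsigned index_bits = log2_pow2(num_lines);
    CacheAddress out;
    out.offset = addr & (line_bytes - 1);
    out.index = (addr >> offset_bits) & (num_lines - 1);
    out.tag = addr >> (offset_bits + index_bits);
    return out;
}
```

With 64-byte lines and 256 lines (a 16 KiB cache), address `0x12345678` splits into offset `0x38`, index `0x59`, and tag `0x48D1`. A lookup hits when the tag stored at line `0x59` equals `0x48D1`.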

Peripheral Devices

  • Buffers and latches, crystal, reset circuit, chip-select logic circuit, timers and counters, universal asynchronous receiver-transmitter (UART), pulse-width modulators.
  • Building a UART (Verilog, 100): an intro chapter to Verilog, copy a real UART, introducing the concept of MMIO. Serial test echo program and LED control. Software Serial (arduino.cc).
  • Implementing a UART in Verilog and Migen
  • UART, Serial Port, RS-232 Interface
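
Before writing a UART in Verilog, it can help to model the frame in software. The sketch below assumes the common 8N1 format (one start bit, eight data bits sent least-significant bit first, no parity, one stop bit). Other framings exist, and the function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build the 10 line levels for one 8N1 frame, in transmission order.
// false = line low, true = line high (idle level).
std::array<bool, 10> uart_frame_8n1(std::uint8_t byte) {
    std::array<bool, 10> bits{};
    bits[0] = false;  // start bit pulls the idle-high line low
    for (std::size_t i = 0; i < 8; ++i) {
        bits[1 + i] = ((byte >> i) & 1u) != 0;  // data bits, LSB first
    }
    bits[9] = true;   // stop bit returns the line high
    return bits;
}

// Recover the byte from a frame; returns -1 on a framing error
// (start bit not low or stop bit not high).
int uart_decode_8n1(const std::array<bool, 10>& bits) {
    if (bits[0] || !bits[9]) {
        return -1;
    }
    int value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        if (bits[1 + i]) {
            value |= 1 << i;
        }
    }
    return value;
}
```

Since each byte costs 10 bit times in this format, a link running at 115200 baud carries at most 11520 bytes per second. A hardware transmitter does the same thing with a shift register clocked at the baud rate.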

Understanding Memory

  • Semiconductor main memory: SRAM, DRAM, chip logic. Flash memory: NOR and NAND flash. External memory.
  • DDR DRAM: DDR SDRAM

Understanding AI Accelerators

Jetson Nano Developer's Kit

Basically a Raspberry Pi on steroids.

Google TPU

CUDA programming.

CUDA provides two APIs (application programming interfaces) for developers: the CUDA driver API and the CUDA runtime API. The driver API is lower-level and more flexible. The runtime API is built on top of the driver API and is easier to use. We only consider the CUDA runtime API here.

CUDA C++ extends C++ by allowing the programmer to define C++ functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C++ functions.

A kernel is defined using the __global__ declaration specifier, and the number of CUDA threads that execute that kernel for a given kernel call is specified using the new <<<...>>> execution configuration syntax. A few examples have been provided in cuda programming.
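
The execution model can be mimicked on the CPU in plain C++. This is an analogy only, not CUDA code: `launch` is a hypothetical stand-in for the `<<<...>>>` syntax, and it runs the "threads" one after another in a loop, where a GPU would run them in parallel. In real CUDA C++ the index is not passed as an argument; the kernel reads it from the built-in `threadIdx` variable.

```cpp
#include <cstddef>
#include <vector>

// Plays the role of a __global__ kernel body: each "thread" handles
// exactly one element, chosen by its thread index.
void vec_add_kernel(std::size_t thread_idx, const float* a, const float* b, float* c) {
    c[thread_idx] = a[thread_idx] + b[thread_idx];
}

// Plays the role of a kernel launch with n_threads threads: the body
// runs once per index. Sequential here, parallel on a GPU.
template <typename Kernel, typename... Args>
void launch(std::size_t n_threads, Kernel kernel, Args... args) {
    for (std::size_t i = 0; i < n_threads; ++i) {
        kernel(i, args...);
    }
}

// Host-side wrapper; assumes b has at least as many elements as a.
std::vector<float> vec_add(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> c(a.size());
    launch(a.size(), vec_add_kernel, a.data(), b.data(), c.data());
    return c;
}
```

The key point carried over from CUDA is that the kernel contains no loop over elements: the loop is replaced by the launch, and each invocation only knows its own index.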