Pre and Post Synthesis Simulation of a Design VSDMemSOC

VSD Hardware Design Flow


Day-0 Tool Installations


Environment: Windows-11 and Ubuntu-22.04

Tool: Yosys

Installation Flow:

$ git clone https://github.com/YosysHQ/yosys.git  
$ cd yosys  
$ sudo apt install make   # only needed if make is not already installed
$ sudo apt-get install build-essential clang bison flex \  
    libreadline-dev gawk tcl-dev libffi-dev git \  
    graphviz xdot pkg-config python3 libboost-system-dev \  
    libboost-python-dev libboost-filesystem-dev zlib1g-dev  
$ make   
$ sudo make install  

yosys

Tool: OpenSTA

Installation guide: https://github.com/The-OpenROAD-Project/OpenSTA#installing-with-cmake

Additional Dependency

$ sudo apt-get install swig

Installation Flow

$ git clone https://github.com/The-OpenROAD-Project/OpenSTA.git  
$ cd OpenSTA  
$ mkdir build  
$ cd build  
$ cmake ..  
$ make  
$ sudo make install  

Image:

STA

Tool : ngspice

Installation source: https://sourceforge.net/projects/ngspice/files/ng-spice-rework/38/

Installation guide: https://github.com/ngspice/ngspice/blob/master/INSTALL (section "Install from tarball")

Installation flow

After downloading the tarball from https://sourceforge.net/projects/ngspice/files/ to a local directory, unpack and build it. The commands below use ngspice-37; substitute the version number of the tarball you actually downloaded.

$ tar -zxvf ngspice-37.tar.gz  
$ cd ngspice-37  
$ mkdir release  
$ cd release  
$ ../configure  --with-x --with-readline=yes --disable-debug  
$ make  
$ sudo make install  

Image:
ngspice

Day-1 Introduction to Verilog RTL design and Synthesis


Introduction to the open-source simulator iverilog

Simulator: A tool for checking whether our RTL design meets the required specifications. Icarus Verilog (iverilog) is the open-source simulator used here to simulate RTL designs written in Verilog, which is one of several hardware description languages.

Design: The Verilog code, or set of Verilog codes, that implements the intended functionality to meet the required specifications.

Testbench: A setup that applies stimulus (test vectors) to the design to check its functionality. It tests whether the design produces outputs that match the specifications. The RTL design is instantiated inside the testbench.

Screenshot_20221129_063956

iverilog Based Design Flow

1. The iverilog simulator takes the RTL design and the testbench as inputs.
2. It produces a VCD (value change dump) file as output. The simulator only looks for changes: the outputs are re-evaluated and dumped only when an input changes.
3. We use GTKWave to view these output changes graphically.
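As a minimal sketch of how a design, its testbench and the VCD dump fit together, consider the following. The module name mux2, the testbench name tb_mux2 and the file name tb_mux2.vcd are hypothetical and chosen only for illustration; they are not files from the workshop repository.

```verilog
`timescale 1ns / 1ps

// Design: hypothetical 2:1 multiplexer used only for illustration.
module mux2 (input i0, input i1, input sel, output y);
    assign y = sel ? i1 : i0;
endmodule

// Testbench: instantiates the design, applies stimulus and dumps a VCD file.
module tb_mux2;
    reg  i0, i1, sel;
    wire y;

    // The design is instantiated inside the testbench.
    mux2 uut (.i0(i0), .i1(i1), .sel(sel), .y(y));

    initial begin
        $dumpfile("tb_mux2.vcd");   // VCD file that GTKWave can open later
        $dumpvars(0, tb_mux2);      // record every signal below tb_mux2
        i0 = 1'b0; i1 = 1'b0; sel = 1'b0;
        #300 $finish;
    end

    // Toggle the inputs at different rates to exercise the input combinations.
    always #75 sel = ~sel;
    always #10 i0  = ~i0;
    always #55 i1  = ~i1;
endmodule
```

If both modules are saved in one file, compiling it with iverilog and running the resulting a.out should produce the VCD file named in $dumpfile.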

Labs using iverilog and gtkwave

mkdir vsd  
cd vsd  
git clone https://github.com/kunalg123/vsdflow.git  
mkdir vlsi
cd vlsi
git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git  
cd sky130RTLDesignAndSynthesisWorkshop  
cd my_lib  
cd lib  
cd ..
cd verilog_model
cd ..
cd ..
cd verilog_files

The screenshots below show the above directory structure inside vsd, down to the my_lib directories, as set up through the terminal.

directory structure-1

directory structure-2

The screenshot below shows the list of Verilog files. Each Verilog design file has an associated testbench file.

directory structure-3

Since the environment is now set up, we simulate a Verilog design named good_mux from verilog_files with the help of its testbench and GTKWave. The steps are mentioned below:

We compile the RTL design and the associated testbench.

iverilog good_mux.v tb_good_mux.v

As a result, an a.out file is created, which can be seen in the list of Verilog files. We run it to dump the output into a VCD file:

./a.out  

iverilog-output

The following command opens the GTKWave window, in which we can see all our outputs.

gtkwave tb_good_mux.vcd

gtkwave mux op

We can also view our Verilog RTL design and testbench code using

gvim tb_good_mux.v -o good_mux.v  

mux code n tb

Introduction to yosys and Logic Synthesis

Synthesizer: A tool used for converting RTL into a netlist. Yosys is the synthesizer that we will be using.
Netlist: A representation of the design given to Yosys in terms of the standard cells present in the library.

Different levels of abstraction and synthesis:

Screenshot_20221129_072106

  • read_verilog : It is used to read the design
  • read_liberty : It is used to read the library .lib
  • write_verilog : It is used to write out the netlist

Labs using yosys and sky130 PDKs

The commands to synthesise an RTL design (good_mux) are:

yosys  
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog good_mux.v  
synth -top good_mux  
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
show  

yosys-2

Note: abc is the command that maps our RTL logic to gates. The gates it maps to are the ones specified in the library passed with -liberty, here ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib. The logic of good_mux is realised through the standard cells present in that library. Executing the abc command gives us a report of the number of input and output signals of the design.

yosys-3

It also reports the type and number of cells used in the synthesis of the RTL design.

Screenshot_20221129_020509

Screenshot_20221129_020539

Commands to write the netlist

write_verilog -noattr good_mux_netlist.v
!gvim good_mux_netlist.v

Screenshot_20221129_020912


Day-2 Timing Libs, Hierarchical vs Flat Synthesis And Efficient Flop Coding Styles


Introduction to Timing.lib

The library is a collection of all the standard cells along with their different flavors. We begin by understanding the name of the library. To look into the library, we use the gvim command

gvim ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib

The following window appears, showing the library file sky130_fd_sc_hd__tt_025C_1v80.lib

Screenshot_20221201_041101

The first line is the name of the library, where tt stands for typical. For a design to work, three parameters of the library are important:

  • P - Process: variations due to fabrication.
  • V - Voltage: variations in voltage also impact the behavior of the circuit.
  • T - Temperature: semiconductors are very sensitive to temperature variations too.

All these variations determine whether a piece of silicon is fast or slow, so our libraries are characterized to model them. The process, voltage and temperature conditions are specified in the library name: tt is the typical process, 025C is a temperature of 25 °C and 1v80 is a voltage of 1.8 V.

Switching off the red syntax coloring in gvim

:syn off

Screenshot_20221201_041614

The lib contains different types of cells as well as different flavors of the same cell.

Screenshot_20221201_044836

Screenshot_20221201_045229

As we see in the above window, the library also lists the features of each cell, such as its leakage power, the various input combinations and the operations between them.

We pick a small gate for better understanding and look at its behavioural view.

Screenshot_20221201_045806

We can see in the gvim window above that the AND gate has two inputs and thus four possible input combinations, for each of which the leakage power and the logic levels are specified. We now compare the different AND gates.

Screenshot_20221201_051054

On comparison, we see that the AND gate "and2_4" has more area than "and2_2", which in turn has more area than "and2_0". It is thus evident that and2_4 employs wider transistors. These are the different flavours of the same AND gate. Being the widest, and2_4 has the largest leakage power and area, but it is also the fastest and so has the smallest delay.

Hierarchical vs Flat Synthesis

While synthesizing an RTL design in which multiple modules are present, the synthesis can be done in two forms: hierarchical or flat.

heirarchial_flat-1

The design has two sub-modules. Sub-module 1 is an AND gate and sub-module 2 is an OR gate. The top module, called multiple_modules, instantiates sub-module 1 as u1 and sub-module 2 as u2. It has three inputs a, b, c and an output y.
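The structure described here can be sketched in RTL as follows. This is a reconstruction from the description and from the synthesis report (where sub_module1 infers an AND gate and sub_module2 an OR gate), not a copy of the file; the internal net name net1 and the exact wiring of c are assumptions.

```verilog
// Sub-module 1: 2-input AND gate.
module sub_module1 (input a, input b, output y);
    assign y = a & b;
endmodule

// Sub-module 2: 2-input OR gate.
module sub_module2 (input a, input b, output y);
    assign y = a | b;
endmodule

// Top module: instantiates both sub-modules as u1 and u2.
module multiple_modules (input a, input b, input c, output y);
    wire net1;                                        // assumed internal net name

    sub_module1 u1 (.a(a),    .b(b), .y(net1));       // net1 = a & b
    sub_module2 u2 (.a(net1), .b(c), .y(y));          // y = (a & b) | c
endmodule
```

In a hierarchical netlist the instances u1 and u2 remain visible, whereas a flattened netlist would contain only the gates themselves.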

heirarchial_flat-2

The report shows sub_module1 with one AND gate, sub_module2 with one OR gate and multiple_modules with two cells. Now we link this design to the library using the abc command. To show the graphical version, we use the command

show multiple_modules  

heirarchial_flat-3

Instead of the AND and OR gates, it shows the instances u1 and u2, preserving the hierarchy. This is called the hierarchical design.

In the netlist, the OR is not implemented with an OR cell; the circuit is implemented using NAND and inverter gates. Stacked NMOS transistors (as in NAND gates) are always preferred to stacked PMOS transistors (as in a NOR cascaded with an inverter to form an OR), because PMOS has very poor mobility and therefore has to be made quite wide to obtain a good logical effort.

When we use flatten, a flat netlist is generated. Here there are no instances of u1 and u2 and the hierarchy is not present.

heirarchial_flat-4

Sub-Module Level Synthesis And Necessity

Need for sub-module synthesis

  • Module-level synthesis is preferred when we have multiple instances of the same module. Let's assume a top module having multiple instances of the same unit (say, a multiplier). Rather than synthesizing the multiplier multiple times as mult1, mult2, mult3, it is better to synthesise it once and replicate it multiple times.
  • Divide and conquer approach: Let's assume our RTL design is massive and the tool does not do a good job on it as a whole. Instead of giving one massive design to the tool, we give it portion by portion so that it produces an optimised netlist for each, and we stitch all these netlists together at the top level.

Hence we control the module that we are synthesizing using the command

synth -top "sub-module name"

In the following example, we synthesize at the sub_module1 level although the RTL file is read at the higher module level, multiple_modules.v

read_verilog multiple_modules.v  
synth -top sub_module1

sub-module synthesis

In the synthesis report, only one AND gate is inferred.

Asynchronous And Synchronous Resets

Asynchronous reset: this reset signal does not wait for a clock. The moment the asynchronous reset signal is received, the output Q becomes 0 irrespective of the clock.

Asynchronous set: this set signal does not wait for a clock. The moment the asynchronous set signal is received, the output Q becomes 1 irrespective of the clock.

Synchronous reset: this reset signal is sampled by the clock. The output Q becomes 0 only at the next active clock edge after the reset is asserted.
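A minimal sketch of these flop styles is given below, with a synchronous-reset flop added for comparison. The module and port names are assumptions made for illustration. What distinguishes the styles is whether the reset or set signal appears in the sensitivity list.

```verilog
// Asynchronous reset: the reset is in the sensitivity list, so q clears
// as soon as async_reset rises, without waiting for a clock edge.
module dff_asyncres (input clk, input async_reset, input d, output reg q);
    always @(posedge clk, posedge async_reset) begin
        if (async_reset) q <= 1'b0;
        else             q <= d;
    end
endmodule

// Asynchronous set: same structure, but q is forced to 1.
module dff_async_set (input clk, input async_set, input d, output reg q);
    always @(posedge clk, posedge async_set) begin
        if (async_set) q <= 1'b1;
        else           q <= d;
    end
endmodule

// Synchronous reset (for comparison): only the clock is in the sensitivity
// list, so the reset takes effect at the next positive clock edge.
module dff_syncres (input clk, input sync_reset, input d, output reg q);
    always @(posedge clk) begin
        if (sync_reset) q <= 1'b0;
        else            q <= d;
    end
endmodule
```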

GTKWAVE RTL Simulation and Observations :

sync_res

async_res

async_set

Asynchronous reset Synthesis Implementation results

async_res_implementaion

Optimisations

Let's consider the following two designs:

  1. A 3-bit input is multiplied by 2 and the output is a 4-bit value.
  2. A 3-bit input is multiplied by 9 and the output is a 6-bit value.

RTL codes for both designs can be found below.

multiplication RTL codes

In the first design, the output y[3:0] is the input a[2:0] with a 0 appended at the LSB, i.e. y = a x 2 = {a,0}.

On synthesizing the design and looking at its graphical realisation, we see the same optimisation in the netlist. No hardware is required for it.

optimisation-1

In the second design, the output y[5:0] is equal to the input a[2:0] appended with itself, since y = a x 9 = a x 8 + a = {a,a}.

optimisation-2
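The two identities can be checked in simulation with a small self-checking sketch. The module names mul2 and mul9 and the testbench are assumptions written for this illustration; the RTL in the image may differ in naming.

```verilog
// Multiply a 3-bit input by 2: the result is a shifted left by one bit.
module mul2 (input [2:0] a, output [3:0] y);
    assign y = a * 2;
endmodule

// Multiply a 3-bit input by 9: a*9 = a*8 + a, i.e. a appended with itself.
module mul9 (input [2:0] a, output [5:0] y);
    assign y = a * 9;
endmodule

module tb_mul;
    reg  [2:0] a;
    wire [3:0] y2;
    wire [5:0] y9;
    integer i, errors;

    mul2 u2 (.a(a), .y(y2));
    mul9 u9 (.a(a), .y(y9));

    initial begin
        errors = 0;
        for (i = 0; i < 8; i = i + 1) begin
            a = i[2:0];
            #1;
            // Compare against the pure-wiring forms of the two products.
            if (y2 !== {a, 1'b0}) errors = errors + 1;
            if (y9 !== {a, a})    errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

Because both products reduce to concatenations, a synthesis tool can implement them with wiring alone, which is why no cells are needed.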

Day-3 Combinational and Sequential optimisations


Introduction to logic optimisations

In order to produce a digital circuit design that is optimised in terms of area and power, the synthesizer performs many types of optimisations on the combinational and sequential circuits.

  1. Combinational optimisation methods
  • Squeezing the logic to get the optimised design
    • Area and power savings
  • Constant propagation
    • Direct optimisation
  • Boolean logic optimisation
    • K-map
    • Quine-McCluskey algorithm
  2. Sequential optimisation methods
  • Basic
    • Sequential constant propagation
  • Advanced
    • State optimisation
    • Retiming
    • Sequential logic cloning

Combinational Logic Optimisations

We will try to understand each of the above-mentioned combinational optimisations through different RTL code examples. We also check the synthesis implementation through Yosys to understand how the optimisations take place. All the optimisation examples are in the files opt_check.v, opt_check2.v, opt_check3.v, opt_check4.v, counter.v and multiple_module_opt.v. All of these files are present under the verilog_files directory.

Below image shows the RTL code for some of the examples mentioned above

opt_0_1_2_3_4

In the code for opt_check, the ternary operator would ideally give us a mux. But the constant 0 propagates further into the logic, and Boolean simplification gives y = ab.

Synthesizing this in yosys :

Before realising the netlist, we must issue a command to Yosys to perform optimisations. It removes all unused cells and wires to produce an optimised digital circuit. This is done using the opt_clean -purge command as shown below.

and_statistics

Observation: After executing synth -top opt_check, the report shows that 1 AND gate has been inferred.

Next,

abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr opt_check_netlist.v  
show  

On viewing the graphical realisation of the synthesis, we can see that Yosys has synthesized an AND gate as expected.
and_gate

For opt_check2, Yosys synthesis results in an OR gate.
or_gate

For the RTL code in opt_check3.v, we expect the output to be a 3-input AND gate based on constant propagation and Boolean logic optimisation. The output y simplifies to y = abc. Next we generate the netlist and observe its graphical representation after synthesis.
and_3input
Yosys synthesizes a 3-input AND gate as expected because of optimisations.

For opt_check4.v, Boolean logic optimisation simplifies the output to a single XNOR gate, i.e. y = a xnor c. Next we generate the netlist and observe its graphical representation after synthesis.
xnor
Yosys synthesizes a 2-input XNOR gate as expected because of optimisations.
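The simplifications above can be cross-checked in simulation. The sketch below assumes one plausible ternary form for each of the four examples, consistent with the simplified results stated in the text; the actual RTL is only shown in the image, so these expressions are assumptions, as is the testbench.

```verilog
// Assumed RTL forms; each comment gives the expected simplified logic.
module opt_check  (input a, input b, output y);
    assign y = a ? b : 1'b0;                  // y = a.b
endmodule

module opt_check2 (input a, input b, output y);
    assign y = a ? 1'b1 : b;                  // y = a + b
endmodule

module opt_check3 (input a, input b, input c, output y);
    assign y = a ? (c ? b : 1'b0) : 1'b0;     // y = a.b.c
endmodule

module opt_check4 (input a, input b, input c, output y);
    assign y = a ? (b ? (a & c) : c) : (!c);  // y = a xnor c
endmodule

// Exhaustive check of each design against its simplified expression.
module tb_opt_check;
    reg  a, b, c;
    wire y1, y2, y3, y4;
    integer i, errors;

    opt_check  u1 (.a(a), .b(b), .y(y1));
    opt_check2 u2 (.a(a), .b(b), .y(y2));
    opt_check3 u3 (.a(a), .b(b), .c(c), .y(y3));
    opt_check4 u4 (.a(a), .b(b), .c(c), .y(y4));

    initial begin
        errors = 0;
        for (i = 0; i < 8; i = i + 1) begin
            {a, b, c} = i[2:0];
            #1;
            if (y1 !== (a & b))     errors = errors + 1;
            if (y2 !== (a | b))     errors = errors + 1;
            if (y3 !== (a & b & c)) errors = errors + 1;
            if (y4 !== ~(a ^ c))    errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

Note that b drops out of opt_check4 entirely: when a is 1 both branches of the inner ternary reduce to c, and when a is 0 the result is !c.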

In the image below we can find the codes for multiple_module_opt.v and multiple_module_opt2.v

While synthesizing this in Yosys, we use flatten before opt_clean -purge. The multiple_module_opt module instantiates both sub-modules 1 and 2. We must use flat synthesis here, otherwise the optimisations are not carried across the sub-module boundaries.

multiple_module_opt

For multiple_module_opt2.v, Boolean optimisation simply gives y=1. Its synthesis yields

multiple_module_opt2

Sequential Logic Optimisations

We will try to understand each of the sequential optimisations through different RTL code examples. For each example, we also check the synthesis implementation through Yosys to understand how the optimisations take place. All the optimisation examples are in the files dff_const1.v, dff_const2.v, dff_const3.v, dff_const4.v, dff_const5.v, counter_opt.v and counter_opt2.v. All of these files are under the verilog_files directory.

dff_1_2

dff_3_4_5

In the code dff_const1.v, it appears that the output Q should be equal to an inverted reset, i.e. Q=!reset. However, the flop updates Q from D only on a clock edge, so even though D is tied to logic 1, Q does not immediately go to 1 when reset becomes 0. It waits until the next positive clock edge. This is observed by simulating the design with iverilog and viewing the VCD with GTKWave as follows
dff_const1_gtk

Observation: In the waveform above, when reset becomes 0, Q becomes 1 at the next clock edge. Since Q can be either 1 or 0, we do not get a sequential constant, and no optimisation should be possible here. We verify this using Yosys synthesis and optimisation. During synthesis, we use

dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib

dfflibmap is the command that tells the synthesizer which library to pick the sequential cells (mainly DFFs and latches) from.

We then generate the netlist

abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr dff_const1_netlist.v  
show  

dff_const1_block

As expected, no optimisation is performed by Yosys during synthesis.

In dff_const2.v, regardless of the inputs, the output Q always remains constant at 1. This is observed by simulating the design with iverilog and viewing the VCD with GTKWave as follows
dff_const2_gtk

Since the output is always constant, i.e. Q=1, it can easily be optimised during synthesis.
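The difference between these first two examples can be sketched as below. The exact RTL is only shown in the images, so the forms here are assumptions reconstructed from the described behaviour; in particular, placing reset in the sensitivity list is an assumption, and the conclusion is the same if the reset is sampled by the clock instead.

```verilog
// dff_const1 (assumed form): D is tied to 1, reset clears Q.
// Q is 0 during reset and 1 afterwards, so it is not a constant
// and the flop has to be kept.
module dff_const1 (input clk, input reset, output reg q);
    always @(posedge clk, posedge reset) begin
        if (reset) q <= 1'b0;
        else       q <= 1'b1;
    end
endmodule

// dff_const2 (assumed form): Q is driven to 1 both under reset and
// otherwise, so Q is a sequential constant and the flop can be removed.
module dff_const2 (input clk, input reset, output reg q);
    always @(posedge clk, posedge reset) begin
        if (reset) q <= 1'b1;
        else       q <= 1'b1;
    end
endmodule
```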

In dff_const3.v, when reset goes from 1 to 0, Q1 follows D at the next positive clock edge in an ideal circuit. In reality, Q1 becomes 1 a little after that clock edge because of the clock-to-Q delay. Thus, at that edge Q samples the old value of Q1 and takes the value 0 until the following clock edge, when it reads an input of 1 from Q1. This is confirmed by the simulated waveform below.
dff_const3_gtk

Since Q takes both logic 0 and 1 values in different clock cycles, it is wrong to say that Q=!(reset) or Q=Q1. Hence, both flip-flops are retained and no optimisations are performed on this design. We can confirm this using Yosys as shown below.
dff_const3_block

Both the D flip-flops are present in the synthesized netlist.
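A sketch of a two-flop design with this behaviour is shown below. The reset values (q to 1, q1 to 0) and the sensitivity list are assumptions chosen to reproduce the described waveform, since the actual code is only visible in the image.

```verilog
// Assumed form of dff_const3: two flops in series.
module dff_const3 (input clk, input reset, output reg q);
    reg q1;

    always @(posedge clk, posedge reset) begin
        if (reset) begin
            q  <= 1'b1;   // assumed reset value of q
            q1 <= 1'b0;   // assumed reset value of q1
        end else begin
            q1 <= 1'b1;   // q1 goes to 1 at the first edge after reset
            q  <= q1;     // q samples the old q1, so it is 0 for one cycle
        end
    end
endmodule
```

Under these assumptions q goes 1, then 0 for one clock cycle, then 1 again after reset is released, which is why neither flop can be replaced by a constant.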

In dff_const4.v, regardless of whether reset is applied or not, Q1 is always constant, i.e. Q1=1. Q can only be 1 or Q1 depending on the reset input, and since Q1=1, Q is also constant at 1. We can confirm this with the simulated waveforms shown below.
dff_const4_gtk

As the output is always constant, it is easily optimised by Yosys, as shown in the graphical representation.
dff_4_block

In the image below, the codes of both counter_opt.v and counter_opt2.v are present.
counter

counter_stats

Synthesised output of counter_opt2.v
counter_synthesis

Day-4 Gate level simulations, Non blocking and blocking assignments, Synthesis-Simulation mismatch


Introduction to gate level simulations

We validate our RTL design by applying stimulus through the testbench and checking whether it meets our specifications. Earlier, we ran the testbench with the RTL code as the design under test. In GLS, we instead apply the netlist to the testbench as the design under test. What we described at the behavioural level in the RTL code has been transformed into a netlist in terms of the standard cells present in the library, so the netlist is logically the same as the RTL code. Both have the same inputs and outputs, so the netlist should fit seamlessly in place of the RTL code. We put the netlist in place of the RTL file and run the simulation with the same testbench.

When we simulate the RTL code there is no notion of timing, such as the setup and hold times that are critical for a circuit. To meet these setup and hold requirements, the library contains different flavours of each cell.

Screenshot_20221207_063403

In GLS using the iverilog flow, the design given to the iverilog simulator is the netlist, expressed in terms of the standard cells present in the library. The library has different flavours of the same type of cell. So that the simulator understands what these cells are, the gate-level Verilog models are also given as an input. If the gate-level models are timing-aware (delay-annotated), we can use GLS for timing validation as well.

The netlist needs functional validation, even though it is meant to be a true representation of the RTL code, because of synthesis-simulation mismatches.

Synthesis-Simulation mismatches

  • Missing sensitivity list
  • Blocking and Non-Blocking statements

Missing sensitivity list: The simulator works on the basis of activity. An always block is evaluated only when a signal in its sensitivity list changes; if none of them changes, the simulator does not re-evaluate the output at all.

Below image shows the codes of multiplexers in different ways. The codes are for ternary_operator_mux.v, bad_mux.v and good_mux.v

Screenshot_20221204_065205

The problem in the bad_mux code is that the always block is evaluated only when select changes, so if select is stable and there are changes in i0 or i1, they get completely missed. For the simulator this is as good as a latch, but the synthesizer does not look at the sensitivity list; it looks at the functionality and creates a mux.

Simulation infers a latch and synthesis results in a mux. Hence, a mismatch.
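The effect of a missing sensitivity list can be reproduced with the sketch below. The two mux modules follow the description (one sensitive to sel only, one to all inputs); their exact coding in the repository may differ, and the testbench is an assumption written for this illustration.

```verilog
// Sensitivity list contains only sel: changes on i0 or i1 are missed.
module bad_mux (input i0, input i1, input sel, output reg y);
    always @(sel) begin
        if (sel) y = i1;
        else     y = i0;
    end
endmodule

// Sensitivity list covers every input that is read.
module good_mux (input i0, input i1, input sel, output reg y);
    always @(*) begin
        if (sel) y = i1;
        else     y = i0;
    end
endmodule

module tb_mux_mismatch;
    reg  i0, i1, sel;
    wire y_bad, y_good;

    bad_mux  u_bad  (.i0(i0), .i1(i1), .sel(sel), .y(y_bad));
    good_mux u_good (.i0(i0), .i1(i1), .sel(sel), .y(y_good));

    initial begin
        i0 = 1'b0; i1 = 1'b0; sel = 1'b1;
        #5 sel = 1'b0;   // sel changes: both blocks evaluate, y = i0 = 0
        #5 i0  = 1'b1;   // only i0 changes: bad_mux is not re-evaluated
        #1 $display("sel=%b i0=%b y_good=%b y_bad=%b", sel, i0, y_good, y_bad);
        $finish;
    end
endmodule
```

At the $display, y_good has followed i0 while y_bad still holds the value from the last time sel changed, which is the latch-like behaviour described above.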

Blocking and non-blocking statements inside an always block

  • Blocking: executes the statements in the order they are written, so the first statement is evaluated before the second statement.
  • Non-blocking: evaluates all the RHS when the always block is entered and then assigns to the LHS. Parallel evaluation.

Blocking assignments:

Screenshot_20221207_063659

In this case, d is assigned to q0, which is then assigned to q. Due to optimisation, a single flop is formed where q is equal to d.

Non-blocking assignments:

begin
q0<=d;
q1<=q0;
end

In non-blocking assignments, all the RHS are evaluated first and then assigned to the LHS in parallel, irrespective of the order in which the statements appear. So we always get a two-flop shift register.

Therefore, we always use non-blocking statements for writing sequential circuits.
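The two coding styles can be compared side by side with the following sketch. Module names, the clock period and the stimulus are assumptions made for illustration.

```verilog
// Blocking assignments: q0 is updated first, so q sees the new q0
// in the same clock edge and the pair collapses into a single flop.
module shift_blocking (input clk, input d, output reg q);
    reg q0;
    always @(posedge clk) begin
        q0 = d;
        q  = q0;
    end
endmodule

// Non-blocking assignments: both right-hand sides are sampled before
// any update, so q gets the old q0 and a two-flop shift register results.
module shift_nonblocking (input clk, input d, output reg q);
    reg q0;
    always @(posedge clk) begin
        q0 <= d;
        q  <= q0;
    end
endmodule

module tb_shift;
    reg  clk, d;
    wire q_b, q_nb;

    shift_blocking    u_b  (.clk(clk), .d(d), .q(q_b));
    shift_nonblocking u_nb (.clk(clk), .d(d), .q(q_nb));

    always #5 clk = ~clk;   // 10-unit clock period

    initial begin
        clk = 1'b0; d = 1'b0;
        $monitor("t=%0t d=%b q_blocking=%b q_nonblocking=%b",
                 $time, d, q_b, q_nb);
        #12 d = 1'b1;       // change d between clock edges
        #40 $finish;
    end
endmodule
```

In the monitor output, q_blocking should pick up the new d one clock edge earlier than q_nonblocking, reflecting one flop versus two.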

Example:

Screenshot_20221207_063825

We enter the always block whenever any of the inputs a, b or c changes, but y is assigned using the old q0 value, i.e. the value from the previous evaluation, so the simulator mimics a delay or a flop. During synthesis, on the other hand, we see the OR and AND gates as expected.

Therefore, while using blocking statements in this case, we should evaluate q0 first and then y, so that y takes on the updated value of q0. Both orderings synthesize to the same digital circuit comprising AND and OR gates, but in simulation they behave differently.
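The two statement orders can be written out as below. The module names are hypothetical and the signal names follow the text (a, b, c, q0, y); this is a sketch of the ordering issue rather than the exact code in the screenshot.

```verilog
// Problematic order: y is computed before q0 is refreshed, so in
// simulation y uses the q0 left over from the previous evaluation.
module blocking_caveat (input a, input b, input c, output reg y);
    reg q0;
    always @(*) begin
        y  = q0 & c;
        q0 = a | b;
    end
endmodule

// Corrected order: q0 is evaluated first, then y uses the fresh value.
module blocking_fixed (input a, input b, input c, output reg y);
    reg q0;
    always @(*) begin
        q0 = a | b;
        y  = q0 & c;
    end
endmodule
```

Both modules describe y = (a | b) & c to a synthesizer, which is why only the RTL simulation of the first one shows the stale value.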

Labs on GLS and Synthesis-Simulation mismatch

Screenshot_20221204_065205

Simulation of the ternary_operator_mux.v using design as UUT is below.

Screenshot_20221204_065752

Synthesis of the ternary_operator_mux.v is

Screenshot_20221204_070633

To invoke GLS,

  • We need to read our netlist file and the testbench file associated with it.
  • We need to read 2 extra files that contain the Verilog models of the cells used in the netlist.

iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v ternary_operator_mux_net.v tb_ternary_operator_mux.v

Screenshot_20221204_071302

To see the waveform of the gate-level simulation using the netlist as UUT, we then execute the following commands

./a.out
gtkwave tb_ternary_operator_mux.vcd

Screenshot_20221204_072115

RTL codes of bad_mux.v and good_mux.v

Screenshot_20221205_064525

For bad_mux.v, the RTL simulation using the testbench and the design gives the following result in GTKWave.

Screenshot_20221205_064707

The design simulates as a latch rather than a 2X1 mux, but the Yosys implementation shows a 2X1 mux. If we now run GLS on its gate-level netlist and observe the waveform, it shows the behaviour of a 2X1 mux as shown below:

Screenshot_20221205_065245

The waveform of the simulated RTL code is that of a latch, while the waveform of the gate-level netlist through GLS after synthesis is that of a 2X1 mux. We therefore see a synthesis-simulation mismatch.

Example of synthesis-simulation mismatch due to wrong order of assignment in blocking assignments.

Screenshot_20221205_071646

We enter the always block whenever any of the inputs a, b or c changes, but d is assigned using the old x value, i.e. the value from the previous evaluation, so the simulator mimics a delay or a flop. During synthesis, on the other hand, we see the OR and AND gates as expected.

Screenshot_20221205_071605

Consider the instant where both the inputs a and b are 1. a | b should output 1, which, when ANDed with c, should give an output of 1. The output d should thus hold the value 1. Instead, it holds the value 0: because of the order of the blocking statements in the RTL code, d is computed from the value of a OR b from the previous evaluation, giving us an incorrect output.

The netlist representation on synthesis yields

Screenshot_20221205_070722

The synthesizer does not look at the sensitivity list but at the functionality of the RTL design. Hence, the netlist representation does not include any latches to hold delayed values from the previous cycle. It only includes an OR gate feeding an AND gate.

If we run gate level simulations on this netlist in verilog, we observe the following waveform.

Screenshot_20221205_071419

Here, we observe that the circuit behaves as the intended combinational circuit. Output d results from the present values of the inputs, and not from the previous values as in the RTL simulation results. Since the waveforms of the simulated RTL code do not match the gate-level simulation of the generated netlist, we get a synthesis-simulation mismatch again.

Day-5 if, case, for loop and for generate


if construct

The if construct is used to write priority logic. Condition 1 (the if) has priority over the subsequent else branches. Only when condition 1 is not met is condition 2 evaluated, and so on, and y is assigned according to the first matching condition. So, if-else code translates to a ladder-like multiplexer structure in the final design instead of a single multiplexer.

An incomplete if statement infers a latch.

if(condition1)
y=a;
else if(condition2)
y=b;

In the above code, if condition 1 is matched y is equal to a, else if condition 2 is matched y is equal to b, but nothing is specified for the case when condition 2 is not matched either. In that case y has to retain its value. Retaining the value would need a combinational loop; to avoid that, the synthesizer infers a latch. The enable of this latch is the OR of condition 1 and condition 2. If neither condition 1 nor condition 2 is met, the OR gate output disables the latch, and the latch retains the stored value of y. This is called an inferred latch due to an incomplete if statement, and it is very dangerous in RTL design. It should be avoided except for some special cases like the counter.

reg [2:0] count;
always@(posedge clk)
begin
if(reset)
count <=3'b000;
else if(enable)
count <= count+1;
end

This is also a case of an incomplete if statement. Here, if there is no enable, the counter should hold its previous value. For example, if the counter has counted up to 4 and there is no enable, it should retain the value 4 rather than going to 0 again. So here the incomplete if statement results in retaining the previous value, which is the desired behaviour in a counter. The earlier mux example was a combinational circuit, where inferred latches must not appear.

Note: if and case statements are used inside an always block. In Verilog, any variable assigned inside an if or case statement must be declared as a reg.
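For the combinational case, the latch-inferring pattern and one way of completing it can be sketched as follows. The module and port names are hypothetical; the second module simply adds the missing else branch.

```verilog
// Incomplete if: nothing is assigned when i0 is 0, so y has to hold
// its previous value and a latch is inferred.
module incomp_if_sketch (input i0, input i1, output reg y);
    always @(*) begin
        if (i0) y = i1;
    end
endmodule

// Complete if: every path assigns y, so this is a plain 2:1 mux.
// Note that y is declared as reg because it is assigned in an always block.
module comp_if_sketch (input i0, input i1, input i2, output reg y);
    always @(*) begin
        if (i0) y = i1;
        else    y = i2;
    end
endmodule
```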

case construct

always@(*)
begin
case(sel)
    2'b00: y= statement1;
    2'b01: y= statement2;
    2'b10: y= statement3;
    2'b11: y= statement4;
endcase
end

Case statements do not infer priority logic like if statements. y is assigned according to the case item that matches.

Some caveats with using CASE statements:

  • Incomplete case
  • Partial assignments
  • Overlapping cases

Incomplete Cases

reg [1:0] sel;
always@(*)
begin
    case(sel)
    2'b00: condition 1;
    2'b01: condition 2;
   endcase
end

If sel is 10 or 11, no assignment is specified. This is an incomplete case: some of the possible values are not covered inside the case block, which results in inferred latches for these two values that hold on to the output y. In this example, since 2'b10 and 2'b11 are not mentioned, the tool would synthesize inferred latches at the 3rd and 4th inputs of the multiplexer. The solution is to add a default inside the case block so that the tool knows what to do when a value that is not listed occurs.

reg [1:0] sel;
always@(*)
begin
    case(sel)
    2'b00: condition 1;
    2'b01: condition 2;
    default:condition 3;
   endcase
end
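Written out as complete modules, the incomplete and the completed case might look like the sketch below. The module names, the ports i0, i1, i2 and y, and the choice of i2 for the default are assumptions for illustration.

```verilog
// Incomplete case: sel = 2'b10 and 2'b11 are not covered, so y must
// retain its value for those selections and a latch is inferred.
module incomp_case_sketch (input i0, input i1, input i2,
                           input [1:0] sel, output reg y);
    always @(*) begin
        case (sel)
            2'b00: y = i0;
            2'b01: y = i1;
        endcase
    end
endmodule

// Complete case: the default covers every remaining value of sel,
// so y is assigned on every path and no latch is needed.
module comp_case_sketch (input i0, input i1, input i2,
                         input [1:0] sel, output reg y);
    always @(*) begin
        case (sel)
            2'b00:   y = i0;
            2'b01:   y = i1;
            default: y = i2;
        endcase
    end
endmodule
```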

Partial Assignments

reg [1:0] sel;
always@(*)
begin
    case(sel)
    2'b00: begin
            x = a;
            y = b;
            end
    2'b01: begin 
            x = c;
            end
    default: begin 
            x = d;
            y = d;
            end
   endcase
end

In the above example, we have 2 outputs, x and y. This creates two 4X1 multiplexers with the respective outputs. In case 2'b01, we have specified the value of x but not the value of y. It appears that this is okay, as a default case is specified for both outputs, and one might expect the tool to apply the default to y whenever y is not directly specified. This, however, is incorrect. In partial assignments such as this, the synthesizer infers a latch at the 2nd input of the multiplexer for y, as no value is specified for y in that case.
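One common way to avoid latches from partial assignments is to give every output a default value at the top of the always block, before the case statement. The sketch below applies this to the example above; the module name and port list are hypothetical.

```verilog
module partial_case_fixed (input a, input b, input c, input d,
                           input [1:0] sel,
                           output reg x, output reg y);
    always @(*) begin
        // Defaults first: both outputs get a value on every path.
        x = d;
        y = d;
        case (sel)
            2'b00: begin
                x = a;
                y = b;
            end
            2'b01: begin
                x = c;      // y keeps the default assigned above
            end
            default: ;      // defaults already applied
        endcase
    end
endmodule
```

The alternative is to assign both x and y explicitly in every case item; either way, the point is that no path through the block leaves an output unassigned.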

Overlapping case

reg [1:0] sel;
always@(*)
begin
    case(sel)
    2'b00: begin
            x = a;
            end
    2'b01: begin 
            x = b;
            end
     2'b10: begin 
            x = c;
            end
     2'b1?: begin 
            x = d;
            end
   endcase
end

In the above code block, 2'b1? specifies that the corresponding bit can be either 0 or 1. This means that when the sel input holds the value 2, i.e. 2'b10, cases 3 and 4 both hold true. What gets implemented is then at the mercy of the tool, and this can lead to synthesis-simulation mismatches. With an if construct, the priority logic would ignore condition 4 whenever condition 3 is met. In a case statement the items are expected to be mutually exclusive; when they overlap, the output becomes unpredictable.

Labs on incorrect IF and case constructs

Below are the files incomp_if.v and incomp_if2.v, which can be found in the directory verilog_files.

Screenshot_20221205_102737

The code in incomp_if.v contains an incomplete if statement, as no else branch is specified. On simulating this design, the following waveform is obtained in GTKWave

Screenshot_20221205_102942

From the above waveform, we observe no change in y when i0=0; y stays equal to its previous value. This shows latching action, which is verified by looking at the synthesis implementation using Yosys. A D-latch is created in the synthesised netlist.

Screenshot_20221205_104237

The code in incomp_if2.v contains an incomplete if statement as well. Here, we have 2 data inputs i1 and i3, and 2 conditional inputs i0 and i2. We do not specify the case when both i0 and i2 are low, which results in a latch being inferred during synthesis. The waveform of the simulated design is below

Screenshot_20221205_105050

Observation: When i0 is high, the output follows i1. When i0 is low, it looks at i2. If i2 is high, it follows i3. But if i2 is low (and i0 is already low), y holds its previous value.

This can be verified by checking the graphical realisation of the yosys synthesis below.

Screenshot_20221205_105342

Yosys synthesizes a multiplexer as well as a latch with some combinational logic at its enable pin.

Below are the codes for incomp_case.v, comp_case.v

Screenshot_20221205_011958

Whenever sel[1]=1, latching action takes place. The Yosys synthesis implementation is given below.

Screenshot_20221205_011918

Observation: 1. sel[1] goes to the D latch enable. 2. The inputs i0, sel[0] and !(sel[1]) go to the combinational logic above it, which drives the D pin of the latch.

In comp_case.v, the output follows i2 in the default case, i.e. when neither the i0 nor the i1 case is selected. Hence a 4X1 mux is synthesized without any latch, which can be verified below.

Screenshot_20221205_012308

Partial assignments

Screenshot_20221205_012558

The 2X1 mux with output y is inferred without any latch. The second output x infers a latch. The image below shows the statistics of the gates and latches that are inferred.

Screenshot_20221205_013451

4:1 mux with overlapping case:

Screenshot_20221205_015430

In gtkwaveform of RTL simulation:

Screenshot_20221205_015849

Observation : When sel[1:0]=11, the output neither follows i2 nor i3. It simply latches to 1.

However, when running GLS on the netlist, the waveform of the synthesized netlist behaves as a 4X1 mux, as shown below.

Screenshot_20221205_020910

Thus, overlapping cases confuse the simulator and lead to synthesis-simulation mismatches.

Introduction to Looping constructs

There are two types of FOR loops in verilog.

  • A for loop, used inside the always block to evaluate expressions.
  • A generate for loop, used only outside the always block to instantiate hardware.

For loops are extremely useful when we want to write a design that involves multiple assignments or evaluations within the always block. Let us take an example. If we want to write the code for a 4:1 multiplexer, we can easily do so using either four if blocks or a case block with 4 cases, as seen in the previous if-else examples. But this approach is not suitable for complicated designs with numerous inputs/outputs, say a 256X1 mux. If we wanted to design a 256X1 multiplexer this way, we would have to write 256 lines of condition statements using select and the corresponding assignments. With a for loop, be it 4X1 or 256X1, we always write the same four lines of code, although we need to provide the 256 inputs using an internal bus.

integer k;
always @(*)
begin
    // Walk through every input; the one whose index matches sel drives y
    for (k = 0; k < 256; k = k + 1)
    begin
        if (k == sel)
            y = in[k];
    end
end

This code can be scaled to any size by just replacing the bound in the condition k < 256 with the desired number of inputs for our multiplexer.

Similarly, we can create High input demultiplexers as well.

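integer k;
always @(*)
begin
    // Default every output low so that no latch is inferred
    int_bus[15:0] = 16'b0;
    for (k = 0; k < 16; k = k + 1)
    begin
        // Route the single data input to the output selected by sel
        if (k == sel)
            int_bus[k] = inp;
    end
end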

Here, we have created a 1:16 demultiplexer using a for loop within the always block. int_bus[15:0] is our internal output bus, one bit of which takes on the input of the demux. It is necessary to assign all outputs low for every new value of sel; otherwise latches will be inferred, resulting in an incorrect implementation of our logic.

mux_generate.v implements a 4X1 mux using a for loop.

Screenshot_20221205_034433

The gtkwave obtained after the simulation

Screenshot_20221205_033829

demux_generate.v implements an 8X1 demux using a for loop.

Screenshot_20221205_035547

The above code is readable, scalable and easy to write. Let's verify that it functions as an 8X1 demux as expected by viewing its simulated waveform in GTKWave.

Screenshot_20221205_041550

for generate

For generate is used when we need to create multiple instances of the same hardware. We must use for generate outside the always block.

We take the example of an 8-bit Ripple Carry Adder (RCA) to understand the ease of instantiation provided by the for generate statement. An RCA consists of full adders tied in series, where the carry out of the previous full adder is fed as the carry in of the next full adder in the chain. Hence, we can make use of generate for to instantiate every full adder in the design, as they all represent the same hardware.

For this example, we use the file rca.v, which holds the code for the ripple carry adder. It instantiates fa, which is defined in another Verilog design file containing the full adder submodule; that file also needs to be included in our simulation. The definition, from the fa.v file, is shown below.

Screenshot_20221205_051042

In the RCA verilog code, we instantiate fa in a loop using generate for outside the always block.
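To make the carry chaining concrete, here is a bit-level behavioural model of the same structure. It is a Python sketch, not the contents of rca.v or fa.v; the function names and the width parameter are assumptions, and the loop index simply plays the role that the generate-for index plays in hardware.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(num1, num2, width=8):
    """Add two unsigned integers bit by bit, chaining the carry like an RCA."""
    carry = 0                      # carry-in of the first stage is tied low
    total = 0
    for i in range(width):         # one full adder per bit position
        a = (num1 >> i) & 1
        b = (num2 >> i) & 1
        s, carry = full_adder(a, b, carry)
        total |= s << i
    return total | (carry << width)  # final carry becomes the extra MSB


print(ripple_carry_add(200, 100))
```

Each iteration consumes the carry produced by the previous one, which is why the worst-case delay of a ripple carry adder grows with the number of bits.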

Now, let us simulate this design and view its waveform with GTKWave. As the rca design references the file fa.v, we must specify it in our commands as follows

iverilog fa.v rca.v tb_rca.v
./a.out
gtkwave tb_rca.vcd

The resulting GTKWave waveform, which shows the adder being simulated, is given below:

Screenshot_20221205_052051

Day-7 to 9 Timing and constraints fundamentals


what are constraints?

RTL code can be synthesized in multiple ways using the standard cells available. A constraint file guides the synthesizer to select the appropriate library cells to meet the timing and performance requirements.

what is SDC?

SDC stands for Synopsys Design Constraints. It has become an industry-standard format, used by various design tools, for specifying design constraints so that synthesis can optimise towards the best implementation.

During synthesis a constraint file is provided along with the RTL and .lib to help the synthesis tool decide what flavour of the cell to use to optimize the design for performance and area.

Screenshot_20230209_111441

Static Timing Analysis

  • Setup Time Requirement(Max Delay)

The data must arrive at the capturing flop at least one setup time before its clock edge, which bounds the maximum delay of the data path.

Screenshot_20230209_124122

Let us say the design runs at 200 MHz, i.e. Tclk = 5 ns. The combinational delay must then satisfy TCOMBI < 5 ns - TCQ_A - TSETUP_B.
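As a worked example of this budget, the sketch below computes the largest allowed combinational delay for a 200 MHz clock. The clock-to-Q and setup values are assumed numbers chosen for illustration, not values from a library.

```python
def max_comb_delay(t_clk, t_cq, t_setup):
    """Largest combinational delay that still meets setup (all in ns)."""
    return t_clk - t_cq - t_setup


t_clk_ns = 1e3 / 200.0            # 200 MHz clock -> 5 ns period
budget = max_comb_delay(t_clk_ns, t_cq=0.5, t_setup=0.3)
print(f"TCOMBI must stay below {budget:.2f} ns")
```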

  • Hold Time Requirement(Min Delay)

The data must remain stable for the hold time of the capturing flop after its clock edge, which bounds the minimum delay of the data path.

Screenshot_20230209_124109

This defines the constraint given by the hold window. It usually matters when we delay the capture clock (with the delay circuits shown in red) so that a fixed combinational delay can be met with a clock period shorter than that delay (e.g. TCOMBI = 8 ns, Tclk = 5 ns). The condition is THOLD_B + TPUSH < TCQ_A + TCOMBI, where TPUSH is the delay inserted by the delay circuits.
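The hold condition THOLD_B + TPUSH < TCQ_A + TCOMBI can be checked numerically. This is a minimal sketch with assumed delay values; a positive slack means the check passes.

```python
def hold_slack(t_cq, t_comb, t_hold, t_push=0.0):
    """Hold slack in ns: data path delay minus (hold time + clock push)."""
    return (t_cq + t_comb) - (t_hold + t_push)


# Without a pushed capture clock the short path is safe ...
print(hold_slack(t_cq=0.5, t_comb=0.2, t_hold=0.3))
# ... but pushing the capture clock by 0.6 ns creates a hold violation.
print(hold_slack(t_cq=0.5, t_comb=0.2, t_hold=0.3, t_push=0.6))
```

Unlike the setup check, the clock period does not appear here, so a hold violation cannot be fixed by slowing the clock down.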

Screenshot_20230209_034157

Parameters affecting the delay

  • Higher inflow of current (faster input transition) corresponds to lower delay.
  • Higher load capacitance (output load) corresponds to higher delay.

Timing Arcs

For a COMBINATIONAL CELL, delay information from every input pin to output pin which it can control is present in timing arc.

For a SEQUENTIAL CELL

  • Delay from clk to Q for DFlop
  • Delay from clk to Q or D to Q for Dlatch
  • Setup and Hold times

Screenshot_20230105_050118

Setup and hold times are calculated around the sampling points.

For a positive level-sensitive latch, setup and hold times are calculated at the negative edge of the clock. Similarly, for a negative level-sensitive latch they are calculated at the positive edge of the clock, since that edge is where the data is sampled.

Screenshot_20230209_050759

Understanding Timing paths and IO Modelling

There are different paths in a circuit which determine the critical path of a circuit. The critical path is the path with the highest delay (slowest path) and that determines the operating frequency of the circuit.

Starting points for the paths are:

  • Input Ports
  • Clk pins of Registers

End points of a timing path are:

  • Output ports
  • D pin of DFF/DLAT

This gives us 3 types of paths:

  • Reg2Reg Timing path

    • Clk pin to D pin
    • Constrained by the clock
  • IO Timing Path

    • Clk to Output or Input to D
    • Reg2Out path constrained by external delay, output load and clock period.
    • In2Reg path constrained by input external delay, Input transition and clock period.
    • Modeling the above 2 paths is referred to as IO delay Modeling and has to be constrained for both max and min delay.
  • IO timing Path - Input to Output ports.

Screenshot_20230210_033259

Based on the clock period the synthesizer decides the maximum possible combinational delay that meets the STA requirement.

In a typical design the working frequency (and hence Tclk) is fixed to achieve a certain performance, so the components (TCOMBI) need to be optimized. The clock period limits the delays in Reg2Reg paths, so the synthesis tool needs to select suitable technology cells from the library (the .lib file contains TCQ, TSETUP/HOLD and TCOMBI_cell) to meet the clock period.

  • Input/output External Delay: We have no control over the external circuit elements, and other influences such as routing also add delay, so we need to define a timing margin that reduces the time available to the "input/output circuit". The external delay is usually defined by standards (e.g. SPI, I2C) or by IO budgeting, and is given by the designer of the external circuitry, which is usually organized as a module or IP.
  • Input transition / Output load: These describe the real behavior of the input/output logic signal due to parasitic elements.

The above parameters are modeled inside the library based on the technology behavior.

Understanding contents in .lib file:

Screenshot_20221201_041614

  • Technology: CMOS

  • default_max_transition: 1.500 - it is the maximum transition time, in the library's time unit (usually ns), allowed at a pin

  • default_operating_conditions : tt_025c_1v80 - typical process corner, 25 °C temperature and 1.80 V supply respectively.

  • delay_model : table_lookup - it is a table format for 2 parameters and during the simulation the tool will use it to get interpolated values for each specific case.

  • Timing sense: For each cell we get whether a timing arc is positive unate, negative unate or non-unate. Depending on this unateness, the DC tool knows which transition at the input propagates as which transition at the output.

    For AND gate

    • timing_sense : positive unate
    • timing_type : combinational

    For sequential register

    • timing_sense : non-unate
    • timing_type : falling_edge

Screenshot_20230210_040326
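To illustrate the table_lookup delay model, the sketch below interpolates a delay from a two-index table (input transition by output load). The axis points and table values are invented for illustration and are not taken from any .lib file, and real tools may use a different interpolation scheme.

```python
from bisect import bisect_right


def interp_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed as table[slew][load]."""
    def bracket(axis, x):
        # Lower index of the axis segment used for x. The index is clamped to
        # the outermost segment, so points outside the table are extrapolated.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i = bracket(slews, slew)
    j = bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


slews = [0.01, 0.1, 0.5]          # input transition axis (assumed, ns)
loads = [0.001, 0.01, 0.1]        # output load axis (assumed, pF)
table = [[0.05, 0.10, 0.50],      # delay values (assumed, ns)
         [0.08, 0.13, 0.55],
         [0.20, 0.25, 0.70]]

print(interp_delay(slews, loads, table, slew=0.055, load=0.0055))
```

At a grid point the function returns the stored table entry; between grid points it blends the four surrounding entries.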

Clock Tree Modelling

Due to practical floorplanning and clock tree synthesis in the physical design, there are routing delays, and not all registers receive the clock at the same time.

Jitter refers to the inherent variations that exist in clock sources due to stochastic effects.

The clock edge does not arrive at one exact instant; it arrives within a small window, i.e. Tclk ± Delta.

Hence for setup time analysis our equation becomes Tclk - Tjitter > TCQ_A + TCOMBI + TSETUP_B


Clock Skew

The difference in clock arrival times at different registers, caused by the unequal clock paths generated during CTS, is referred to as clock skew. It can result in timing failures post clock tree synthesis: Tclk - Tskew > TCQ_A + TCOMBI + TSETUP_B
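A small numerical sketch of the setup check with clock uncertainty follows. It subtracts both jitter and skew from the clock period, which is one pessimistic way of combining the two inequalities above; all delay values are assumed for illustration.

```python
def setup_slack(t_clk, t_cq, t_comb, t_setup, t_jitter=0.0, t_skew=0.0):
    """Setup slack in ns after removing jitter and skew from the period."""
    usable_period = t_clk - t_jitter - t_skew
    return usable_period - (t_cq + t_comb + t_setup)


# An ideal clock leaves positive slack ...
ideal = setup_slack(t_clk=5.0, t_cq=0.5, t_comb=3.9, t_setup=0.3)
# ... but 0.2 ns of jitter plus 0.2 ns of skew turns it into a violation.
real = setup_slack(t_clk=5.0, t_cq=0.5, t_comb=3.9, t_setup=0.3,
                   t_jitter=0.2, t_skew=0.2)
print(ideal, real)
```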


Factors for clock modelling

  • Period
  • Source Latency: Time taken by clock source to generate clock
  • Clock Network Latency: Time taken by clock distribution network
  • Clock skew: clock path delay mismatches which cause a difference in the arrival time of the clock.
  • Jitter:
    • Duty cycle jitter
    • Period jitter

Skew and clock network latency have to be accounted for before CTS, because the clock tree is not yet built. Post CTS, the only remaining uncertainty in the clock is due to jitter.

Writing Synopsys Design Constraints(SDC)

Useful DC commands:

Querying command: get_*

Querying ports

  • get_ports clk
  • get_ports *clk* Returns collection of ports whose name contains clk
  • get_ports * Returns all ports of the design
  • get_ports * -filter "direction == in" Returns all ports which satisfy the filter condition direction == in
  • get_ports * -filter "direction == out" Returns all output ports

Querying clocks

  • get_clocks * Returns all clocks in the design
  • get_clocks *clk* Returns all clocks with name clk in it
  • get_clocks * -filter "period>10" Returns all clocks with period greater than 10
  • get_clocks my_clk Returns the clock named my_clk
  • get_attribute [get_clocks my_clk] period Returns the period of the clock
  • get_attribute [get_clocks my_clk] is_generated Reports whether the clock has the is_generated attribute

Screenshot 2023-03-03 182622


Creating Clocks

create_clock -name <clk_name> -period <time> [get_ports <port_name>] The clock is defined on a specific port
Example: create_clock -name MY_CLK -period 5 [get_ports clk]
set_clock_latency <time> [get_clocks MY_CLK]

Screenshot 2023-03-03 184241

Screenshot 2023-03-03 184341


Input IO Modelling

>set_input_delay -max <time> -clock [get_clocks <clk_name>] [get_ports <port_name>] Eg:set_input_delay -max 5 -clock [get_clocks MY_CLK] [get_ports IN_*]     
>set_input_delay -min <time> -clock [get_clocks <clk_name>] [get_ports <port_name>] 
>set_input_transition -max <time> [get_ports <port_name>] Eg: set_input_transition -max 5 [get_ports IN_*]
>set_input_transition -min <time> [get_ports <port_name>]   

output IO Modelling

>set_output_delay -max <time> -clock [get_clocks <clk_name>] [get_ports <port_name>] Eg:set_output_delay -max 5 -clock [get_clocks MY_CLK] [get_ports OUT_Y]     
>set_output_delay -min <time> -clock [get_clocks <clk_name>] [get_ports <port_name>] 
>set_load -max <cap_unit> [get_ports <port_name>] Eg: set_load -max 5 [get_ports OUT_Y]
>set_load -min <cap_unit> [get_ports <port_name>]

Pure combinational logic from input to output can be constrained using set_max_delay or a virtual clock.
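As a rough illustration of how these commands fit together, the helper below assembles a small constraint file as text. It is a Python sketch: the helper name, port names and numeric values are hypothetical, and set_load is used as the standard SDC command for the output load.

```python
def make_sdc(clk_port, clk_name, period, in_ports, out_ports,
             in_delay, out_delay, out_load):
    """Return SDC text constraining one clock and its input/output ports."""
    clk = f"[get_clocks {clk_name}]"
    return "\n".join([
        f"create_clock -name {clk_name} -period {period} [get_ports {clk_port}]",
        f"set_input_delay -max {in_delay} -clock {clk} [get_ports {in_ports}]",
        f"set_output_delay -max {out_delay} -clock {clk} [get_ports {out_ports}]",
        f"set_load -max {out_load} [get_ports {out_ports}]",
    ])


# Hypothetical design: clock port clk, inputs IN_*, output OUT_Y
sdc_text = make_sdc("clk", "MY_CLK", 5, "IN_*", "OUT_Y",
                    in_delay=2, out_delay=2, out_load=0.4)
print(sdc_text)
```

The printed text can be saved to a .sdc file; the matching -min constraints would be added in the same way.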

Day 10-15 Importance of MOSFETS in STA/EDA


Fundamentals of N-mos and P-mos

N-MOS

  • P type substrate, n+ Diffusion Regions
  • Isolation region(SiO2), PolySi or Metal Gate
  • 4-Terminal element, Gate,Source,Drain,Body
  • Threshold Voltage
    • Threshold voltage is defined as the gate voltage at which significant current starts to flow from source to drain.
    • With Vgs=0 and Vs=Vd=Vb=0, the substrate-source (B-S) and substrate-drain (B-D) regions form pn-junction diodes. Both junctions are off due to the 0 V bias, so the source-to-drain resistance is high.
    • Applying a small positive potential at the gate terminal repels the positive charges in the p-substrate, forming a depletion region.
    • As we increase Vgs, the depletion region width increases gradually. At one point the surface inverts into n-type material. This phenomenon is called strong inversion.
    • The Vgs at which the inversion happens is referred to as the threshold voltage (Vt).
    • A further increase in Vgs does not change the width of the depletion region; instead it attracts more electrons from the adjacent n+ regions, leading to an increase in channel width and forming a continuous channel from source to drain.
    • The conductivity of this continuous n-channel from S to D is modulated by Vgs.
    • When a source-to-body voltage Vsb is applied, additional potential is required for strong inversion.
    • Vto is the threshold voltage at Vsb=0 and is a function of the manufacturing process.

Screenshot 2023-03-07 184738

Spice Simulation

SPICE file:day1_nfet_idvds_L1p2_W1p8.spice

*Model Description
.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description



XM1 Vdd n1 0 0 sky130_fd_pr__nfet_01v8 w=5 l=2

R1 n1 in 55

Vdd vdd 0 1.8V
Vin in 0 1.8V

*simulation commands

.op
.dc Vdd 0 1.8 0.1 Vin 0 1.8 0.2

.control

run
display
setplot dc1
.endc

.end

SPICE NMOS Id-Vds Graph

Screenshot 2023-03-09 143031

Basics of N-mos Drain current(id) and drain to source voltage(Vds)

  • Resistive Operation

    • At Vgs>Vt condition with small Vds
    • Induced charge Qi is proportional to (Vgs-Vt)
    • Induced charge at any point 'x' in the channel: Qi(x) is proportional to [(Vgs-V(x))-Vt]
    • Currents in this mode of operation
      • Drift current, due to potential difference
      • Diffusion current, due to difference in carrier concentration
    • Id = Kn' (W/L) ((Vgs-Vt)*Vds-(Vds**2)/2) = Kn*((Vgs-Vt)*Vds-(Vds**2)/2)
      • Kn' = Transconductance parameter
      • Kn = Kn'(W/L)
    • When (Vgs-Vt) >> Vds, Id ~= Kn*(Vgs-Vt)*Vds, a linear function of Vds.
  • Saturation Region

    • Pinch-off starts when (Vgs-Vds) <= Vt; the electron channel under the gate begins to disappear at the drain end.
    • In saturation, the channel voltage stays constant at (Vgs-Vt).
      • Id(sat) = Kn*((Vgs-Vt)*(Vgs-Vt)-((Vgs-Vt)**2)/2) = (Kn/2)*(Vgs-Vt)**2
      • Ideally the device acts like a perfect current source, as there is no dependency on Vds. In reality the current is still affected by Vds.
      • Id(sat) = (Kn/2)*((Vgs-Vt)**2)*(1+(lambda*Vds))
  • SPICE Simulation

Spice file: day2_nfet_idvgs_L0p25_W0p375.spice

.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description

XM1 Vdd n1 0 0 sky130_fd_pr__nfet_01v8 w=0.39 l=0.15

R1 n1 in 55

Vdd vdd 0 1.8V
Vin in 0 1.8V

*simulation commands

.op
.dc Vin 0 1.8 0.1 

.control

run
display
setplot dc1
.endc

.end

SPICE NMOS Id-Vgs Graph

Screenshot 2023-03-09 145103

Spice file:day2_nfet_idvds_L0p25_W0p375.spice

*Model Description
.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description

XM1 Vdd n1 0 0 sky130_fd_pr__nfet_01v8 w=0.39 l=0.15
R1 n1 in 55
Vdd vdd 0 1.8V
Vin in 0 1.8V

*simulation commands

.op
.dc Vdd 0 1.8 0.1 Vin 0 1.8 0.2

.control

run
display
setplot dc1
.endc

.end

SPICE NMOS ID-Vds Graph:

Screenshot 2023-03-09 145453

Velocity Saturation

For smaller technology nodes there is a fourth region of operation: velocity saturation. At lower electric fields the carrier velocity increases linearly with the field; at higher fields the velocity tends to a constant because of scattering effects.

Velocity saturation effect

  • Long channel (>250nm)
  • Short channel (<250nm)
  • Id = Kn*((Vgt*Vmin)-((Vmin**2)/2))*(1+lambda*Vds), where Vgt = Vgs-Vt
  • Vmin = min(Vgt,Vds,Vdsat)

Screenshot 2023-03-09 162802

Vdsat is the saturation voltage at which the carrier velocity saturates. It is independent of Vgs and Vds and is a technology parameter.
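The unified drain-current expression Id = Kn*(Vgt*Vmin - Vmin**2/2)*(1 + lambda*Vds), with Vgt = Vgs - Vt and Vmin = min(Vgt, Vds, Vdsat), covers the resistive, saturation and velocity-saturated regions in one formula. The sketch below evaluates it; the parameter values are assumed for illustration and are not sky130 model values.

```python
def drain_current(vgs, vds, kn, vt, vdsat, lam=0.0):
    """Unified NMOS model: Id = kn*(Vgt*Vmin - Vmin**2/2)*(1 + lam*Vds)."""
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0                     # cut-off (sub-threshold current ignored)
    vmin = min(vgt, vds, vdsat)        # picks the active region of operation
    return kn * (vgt * vmin - vmin ** 2 / 2) * (1 + lam * vds)


KN, VT = 1e-4, 0.4                     # assumed A/V^2 and V

# Long channel: Vdsat is large, so Vmin = Vgt and Id = (KN/2)*Vgt**2
print(drain_current(1.8, 1.8, KN, VT, vdsat=10.0))
# Short channel: Vmin = Vdsat, so the current saturates earlier and lower
print(drain_current(1.8, 1.8, KN, VT, vdsat=0.6))
```

Because Vmin selects the smallest of the three voltages, no explicit region test is needed apart from cut-off.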

CMOS Voltage Transfer Characteristics(VTC)

Transistor

  • switch off when |Vgs| < |Vt|
  • switch on when |Vgs| > |Vt|

Assume CMOS Inverter in 0-2v Range

Screenshot 2023-03-10 121140

Spice File: day3_inv_vtc_W0p084_W0n084.spice

*Model Description
.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description


XM1 out in vdd vdd sky130_fd_pr__pfet_01v8 w=0.84 l=0.15
XM2 out in 0 0 sky130_fd_pr__nfet_01v8 w=0.84 l=0.15
Cload out 0 50fF
Vdd vdd 0 1.8V
Vin in 0 1.8V

*simulation commands

.op

.dc Vin 0 1.8 0.01

.control
run
setplot dc1
display
.endc

.end

VTC for identical (W/L) P/NMOS

Screenshot 2023-03-16 143922

CMOS Switching Threshold and Dynamic Simulations

  • Switching Threshold (Vm)

    • Vm is the point where Vin = Vout; it should be near the middle of the CMOS inverter characteristics.
    • Vm = R*Vdd/(1+R), where R = (Rp*(Wp/Lp)*Vdp)/(Rn*(Wn/Ln)*Vdn)

  • Transition Delay

    • Rise Delay, Input 0 and output 1
    • Fall Delay, Input 1 and output 0
    • From device physics PMOS (W/L) = 2.5(W/L)NMOS
    • Regular inverter/buffer is preferred for data-path.
  • Spice File: day3_inv_vtc_W0p084_W0n036.spice

*Model Description
.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description


XM1 out in vdd vdd sky130_fd_pr__pfet_01v8 w=0.84 l=0.15
XM2 out in 0 0 sky130_fd_pr__nfet_01v8 w=0.36 l=0.15
Cload out 0 50fF
Vdd vdd 0 1.8V
Vin in 0 1.8V

*simulation commands

.op

.dc Vin 0 1.8 0.01

.control
run
setplot dc1
display
.endc

.end

VTC for balanced PMOS/NMOS driving strength, i.e. (W/L)PMOS is about 2.5 * (W/L)NMOS

Screenshot 2023-03-16 145818
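The switching-threshold expression Vm = R*Vdd/(1+R) can be evaluated to see how the strength ratio R moves Vm. This is a minimal sketch; the R values are assumed, and only Vdd = 1.8 V comes from the netlists above.

```python
def switching_threshold(r, vdd):
    """Vm = R*Vdd/(1+R) for a given PMOS/NMOS strength ratio R."""
    return r * vdd / (1 + r)


VDD = 1.8
for r in (0.5, 1.0, 2.0):          # assumed strength ratios
    print(f"R = {r}: Vm = {switching_threshold(r, VDD):.2f} V")
```

With R = 1 the threshold sits at Vdd/2, and a stronger PMOS (larger R) shifts it towards Vdd.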

Spice File: day3_inv_tran_W0p084_W0n036.spice

*Model Description
.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description


XM1 out in vdd vdd sky130_fd_pr__pfet_01v8 w=0.84 l=0.15
XM2 out in 0 0 sky130_fd_pr__nfet_01v8 w=0.36 l=0.15
Cload out 0 50fF
Vdd vdd 0 1.8V
Vin in 0 PULSE(0V 1.8V 0 0.1ns 0.1ns 2ns 4ns)

*simulation commands

.tran 1n 10n

.control
run
.endc

.end

Screenshot 2023-03-10 120808

Rise-Delay = 0.33
Fall-Delay = 0.33

CMOS Noise Margin Robustness

  • Noise Margin
    • NMH = VOH - VIH
    • NML = VIL - VOL

Screenshot 2023-03-15 125520

  • Spice Simulation file: day4_inv_noisemargin_wp1_wn036.spice
*Model Description
.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description


XM1 out in vdd vdd sky130_fd_pr__pfet_01v8 w=1 l=0.15
XM2 out in 0 0 sky130_fd_pr__nfet_01v8 w=0.36 l=0.15
Cload out 0 50fF
Vdd vdd 0 1.8V
Vin in 0 1.8V

*simulation commands

.op

.dc Vin 0 1.8 0.01

.control
run
setplot dc1
display
.endc

.end
  • Inverter switching transition diagram

Screenshot 2023-03-15 122537

ITEM VOLTAGE
VOH 1.72069
VIH 0.973585
VIL 0.766038
VOL 0.106897
NMH 0.747105
NML 0.659141
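The two margins in the table follow directly from NMH = VOH - VIH and NML = VIL - VOL. The short sketch below recomputes them from the four measured voltages listed above; the function name is hypothetical.

```python
def noise_margins(voh, vih, vil, vol):
    """Return (NMH, NML) from the four characteristic voltages, in volts."""
    return voh - vih, vil - vol


nmh, nml = noise_margins(voh=1.72069, vih=0.973585,
                         vil=0.766038, vol=0.106897)
print(f"NMH = {nmh:.6f} V, NML = {nml:.6f} V")
```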

CMOS Power Supply and Device Variation Robustness

Power supply scaling means scaling(decreasing) the voltage at which the device is working.

  • Power supply scaling

    • |Gain| = |Vout(VIH)-Vout(VIL)|/|VIH-VIL|
    • Advantages
      • Increase in gain by 50%
      • Reduction in energy (E = 1/2*C*Vdd**2)
    • Disadvantage
      • Performance impact on dynamic transition (larger delay)
  • Process Variation

    • Due to Etching : Variation in the layout shape, not exact rectangular formation
    • Due to oxide thickness : Non-uniform oxide thickness in the oxide layer beneath the poly gate
  • Device Variation

    • shift in Vm
    • Variation in NMH/NML
    • Operation of Gate is intact
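The energy saving from supply scaling follows from E = 1/2*C*Vdd**2. The sketch below evaluates it for a 50 fF load, the Cload value used in the inverter netlists, over supplies from 1.8 V down to 0.8 V in 0.2 V steps; it covers the energy term only and says nothing about delay.

```python
C_LOAD = 50e-15                    # 50 fF, as Cload in the inverter netlists


def switching_energy(c_load, vdd):
    """Energy E = 1/2 * C * Vdd**2 in joules."""
    return 0.5 * c_load * vdd ** 2


supplies = [1.8, 1.6, 1.4, 1.2, 1.0, 0.8]
for vdd in supplies:
    e_fj = switching_energy(C_LOAD, vdd) * 1e15
    print(f"Vdd = {vdd:.1f} V -> E = {e_fj:.1f} fJ")
```

Because the energy depends on the square of Vdd, lowering the supply from 1.8 V to 0.8 V reduces it to roughly a fifth.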

Spice file for supply variation: day5_inv_supplyvariation_Wp1_Wn036.spice

*Model Description
.param temp=27


*Including sky130 library files
.lib "sky130_fd_pr/models/sky130.lib.spice" tt


*Netlist Description


XM1 out in vdd vdd sky130_fd_pr__pfet_01v8 w=1 l=0.15
XM2 out in 0 0 sky130_fd_pr__nfet_01v8 w=0.36 l=0.15
Cload out 0 50fF
Vdd vdd 0 1.8V
Vin in 0 1.8V

.control

let powersupply = 1.8
alter Vdd = powersupply
	let voltagesupplyvariation = 0
	dowhile voltagesupplyvariation < 6
	dc Vin 0 1.8 0.01
	let powersupply = powersupply - 0.2
	alter Vdd = powersupply
	let voltagesupplyvariation = voltagesupplyvariation + 1
      end
 
plot dc1.out vs in dc2.out vs in dc3.out vs in dc4.out vs in dc5.out vs in dc6.out vs in xlabel "input voltage(V)" ylabel "output voltage(V)" title "Inverter dc characteristics as a function of supply voltage"

.endc

.end

Screenshot 2023-03-15 175722

Screenshot 2023-03-15 175754

GAIN RATIO
dc1.out 7.7141
dc6.out 9.1845

Spice file for device variation: