Multivariate Adaptive Regression Splines in Standard Cell Characterization for Nanometer Technology in Semiconductor

Multivariate adaptive regression splines (MARSP) is a nonparametric regression method. It is an adaptive procedure that does not assume a predetermined regression model; instead, the model structure is constructed dynamically according to the information derived from the data. Because of its ability to capture essential nonlinearities and interactions, MARSP is considered a good fit for high-dimension problems. This chapter presents an application of MARSP in the semiconductor field, more specifically, in standard cell characterization. The objective of standard cell characterization is to create a set of high-quality models of a standard cell library that accurately and efficiently capture cell behaviors. In this chapter, the MARSP method is employed to characterize the gate delay as a function of many parameters, including process-voltage-temperature parameters. Due to its ability to capture essential nonlinearities and interactions, the MARSP method helps to achieve significant accuracy improvement.


Introduction
Multivariate adaptive regression splines (MARSP) was first proposed by Friedman [1] for solving regression-type problems. MARSP is widely used to predict the values of an outcome variable from a set of predictor variables. It is one of many model-fitting methods; others include linear regression (e.g., general linear model), nonlinear regression (e.g., generalized linear/nonlinear models), and regression trees (e.g., classification and regression trees). A neural network, which is very popular in the current era of artificial intelligence and machine learning, is also a modeling technique.
MARSP is a nonparametric regression procedure that makes no assumptions about the underlying functional relationships between dependent and independent variables. The form of MARSP and its coefficients are entirely derived from the regression data. The modeling strategy is called "divide and conquer," by which the input space is partitioned into a number of regions, with each region having its own regression equation. This makes MARSP particularly efficient for high-dimension problems, where other techniques most likely have accuracy issues.
As the name suggests, MARSP uses splines as its main component. Splines are piecewise curves built from polynomial functions. When different spline pieces are smoothly connected, the result is a flexible model that can handle both linear and nonlinear situations. The connection points between pieces are called knots; each knot marks the end of one region of data and the beginning of another.
The MARSP technique has been particularly popular in data mining because it does not require or assume any particular type or class of relationship (e.g., logistic, linear, etc.) between the outcome variable of interest and the predictor variables. Instead, MARSP derives useful models (i.e., models that yield accurate predictions) even in situations where the relationship between the outcome variable and the predictor variables is difficult to approximate with parametric models. For more information about MARSP and how it compares to other methods for nonlinear regression (or regression trees), please refer to Chapter 9 of [2].

Standard cell characterization in very large scale integration (VLSI) design
In semiconductor design, standard cell methodology is widely used for very large scale integration (VLSI) design, especially for digital logic circuits. It is a design abstraction in which the low-level circuit layout is encapsulated into abstract logic representations (e.g., NOR2, NAND2 cells). As a cell-based methodology, it enables one designer to focus on the high-level aspect (logical function) of a design, while another designer works on the implementation aspect (physical layout). As semiconductor fabrication technology progressed to the sub-10 nm regime, standard cell methodology enabled designers to scale application-specific integrated circuits (ASICs) from simple chips of several thousand cells to complex chips with hundreds of millions of cells.
A standard cell provides a Boolean logic function (e.g., AND, OR) or a storage function (latch or flip-flop). A standard cell can be as simple as an inverter, which consists of only two transistors. It can also be as complex as an adder or a multiplexer, which has tens of transistors. As a standard cell is a logic gate, "cell" and "gate" are often used interchangeably. A standard cell library is a collection of predefined cells which are usually fully customized to a specific technology and optimized

Introduction (problem formulation)
As mentioned above, one of the most important tasks in standard cell characterization is to find a model which can accurately capture the relationship between the cell propagation delay and the parameters that have impact on cell delay (as shown in the paragraph above). Here, the cell propagation delay is the response variable, and the impacting parameters (input transition time, output loads, VDD, and the process parameters) are the explanatory parameters.
We have not talked about the number of explanatory parameters yet. But as mentioned in Section 1, MARSP is suitable for the high-dimension problem while capturing essential nonlinearities and interactions. In the following subsections, we introduce the high-dimension parameter space when characterizing the delay models of standard cells, especially when the process variations and aging effect are included [3][4][5][6][7].

Process variations
When integrated circuits are fabricated, the parameters of individual transistors vary. The observed random distribution among identically drawn devices is caused by fabrication-process factors such as impurity concentration densities, oxide thicknesses, and diffusion depths. These physical variations change the electrical characteristics of the transistors, which eventually leads to variability in circuit performance. This is called process variation: the naturally occurring variation in the attributes of transistors (lengths, widths, oxide thicknesses) during chip fabrication. The scaling down of VLSI process technologies has increased process variations, especially in the sub-45 nm era.
Process variations can be generally categorized into two classes: inter-die and intra-die variations. Inter-die variations occur from one die (chip) to another, meaning that the same transistor in the design can get different features (channel lengths, threshold voltages, etc.) on different dies. Intra-die variations are variations in transistor features within a chip, meaning that transistors at different locations on the same die can get different features. Spatial correlations are often seen for intra-die variations, meaning that adjacent transistors have a higher probability of having similar features than transistors that are far apart. In this work, we consider not only inter-die and intra-die variations, but also intra-gate variations. Intra-gate variations are, in some sense, a part of intra-die variations: they are the variations within a gate (cell), meaning that the transistors within the same gate can have different features. While most works in the literature ignore intra-gate variations, our work includes them. As VLSI technology continues to scale down to sub-10 nm processes, intra-die variations (including intra-gate variations) are becoming more and more dominant.
The overall objective of standard cell characterization is to characterize a cell-delay model which is general and able to include inter-die, intra-die, and intra-gate variations with any kind of distribution and any correlation profile between different parameters. In this work, only process variations of standard cells are considered, meaning the variations in interconnect geometries are not considered.

Loading effect modeling (pi-model)
As technology scales down, the impact of interconnect on circuit timing cannot be neglected. In this work, we model interconnect as a resistive-capacitive (RC) network where all the capacitances are grounded.
A small patch of a gate-level circuit is illustrated in Figure 1(a), where a driving Buffer gate has two loading gates, a NOR2 gate and an inverter gate. Figure 1(b) replaces the two loading gates with corresponding input capacitances. The input capacitances of loading cells, together with the interconnect network, form the load of the previous driving cell. With loading gates modeled as corresponding input capacitances, circuit timing can be analyzed in the way that each stage contains a standard cell and its connecting load as Figure 1(b) shows. If the readers are interested in the input-capacitance modeling of the standard cells, they can refer to [8,9] for more details.
Reduced-order models are routinely used to replace the original large-order models. The Pi-model is the most popular reduced-order model to estimate the input admittance of RC interconnects. Figure 2 gives the structure of the Pi-model, where $Y(s)$ denotes the input admittance of the original network and $Y'(s)$ denotes the input admittance of the Pi-model. The values of $C_1$, $R$, and $C_2$ are obtained by equating the first, second, and third moments of the Pi-model to the corresponding moments of the original network.
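The moment matching described above has a well-known closed form. The following is a minimal sketch, assuming the input admittance of the original network is expanded as Y(s) = y1*s + y2*s^2 + y3*s^3 + ..., and assuming that C1 is the capacitor at the driving point and C2 is the capacitor behind R. Figure 2 is not reproduced here, so this labeling and the function names are assumptions for illustration.

```python
def pi_model_from_moments(y1, y2, y3):
    """Return (C1, R, C2) of a Pi-model whose first three admittance
    moments equal y1, y2, y3.

    Assumed topology: C1 at the driving point, then series R, then C2
    to ground, so that Y'(s) = s*C1 + s*C2 / (1 + s*R*C2).
    """
    if y2 == 0 or y3 == 0:
        raise ValueError("purely capacitive load: a single capacitor suffices")
    c2 = y2 ** 2 / y3            # far-end capacitance
    r = -(y3 ** 2) / (y2 ** 3)   # series resistance (y2 is negative for RC nets)
    c1 = y1 - c2                 # near-end capacitance
    return c1, r, c2


def pi_model_moments(c1, r, c2):
    """First three admittance moments of the Pi-model (used as a check)."""
    return c1 + c2, -r * c2 ** 2, r ** 2 * c2 ** 3


# Round trip on an arbitrary example load: 2 fF, 150 ohm, 5 fF.
y1, y2, y3 = pi_model_moments(2e-15, 150.0, 5e-15)
c1, r, c2 = pi_model_from_moments(y1, y2, y3)
```

The round trip recovers the three element values from the three moments, which is exactly the information the Pi-model retains about the interconnect.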
In the Pi-model, three parameters represent the loading effect of the whole RC interconnect. These three parameters, $C_1$, $R$, and $C_2$, together with the PVT parameters and the input transition time, construct the parameter space for standard cell characterization, which is introduced later in Section 3.
The shift in channel length (from the nominal value) is denoted as $\Delta L$, and the threshold voltage shift (from the nominal value) is denoted as $\Delta V_{th}$. The supply voltage and temperature of a gate are denoted as $\Delta V_{DD}$ and $\Delta T$, respectively, assuming that all the transistors within the same gate share the same voltage and temperature. The Pi-model which represents the load of a gate includes three parameters, namely $R_{pi}$, $C_{pi1}$, and $C_{pi2}$. The input slew time (Slope) is also included for each timing arc. Note that in this work the effect of multiple input switching (MIS) was not considered. For a cell which has N transistors, there are 2*N device parameters (i.e., $\Delta L$ and $\Delta V_{th}$ for each transistor within the cell) and six global parameters ($\Delta V_{DD}$, $\Delta T$, $R_{pi}$, $C_{pi1}$, $C_{pi2}$, Slope). This results in a total of (2*N + 6) parameters for a cell with N transistors. In our experiments with a commercial standard cell library, the highest value of N is 32, so (2*N + 6) reaches 70, which is a quite high-dimension parameter space for cell characterization.
At this point, we have not introduced the aging effect into the parameter space. If the characterized delay models need to be aging-aware, the aging parameters should be included in the parameter space. With aging parameters included, the dimension of the parameter space would be even higher. We discuss it in the following subsection.

High-dimensional parameter space in aging-aware standard cell characterization
For timing analysis, transistor aging is another source of variability besides PVT variations [10,11]. Our work considers the following wear-out mechanisms: bias temperature instability (BTI), hot carrier injection (HCI), and time-dependent dielectric breakdown (TDDB). The impacts of BTI and HCI are similar, as both cause the threshold voltage of aged transistors to increase, which decreases the driving strength and ultimately increases gate delay over time. TDDB degrades the drain current of the stressed devices, which also results in increased gate delay. Overall, BTI, HCI, and TDDB all cause the cell delay to increase over time. When the increased circuit delay exceeds the clock period, the degraded circuit fails to work. Therefore, the aging effect needs to be taken into account in circuit timing simulations, especially for high-reliability applications such as aviation, space, automotive [12], medical [13][14][15][16][17], and data center [18].
The variation of channel length and the variation of threshold voltage are denoted as $\Delta L$ and $\Delta V_{th}$, respectively. For channel length, the variation ($\Delta L$) comes only from process variation, while for transistor threshold voltage, the variation ($\Delta V_{th}$) comes from both process variation and the aging effect (BTI and HCI).
As the value of N is as high as 32 in our experiments with a commercial library, the value of (4*N + 6) can be as high as 134. Compared to 70, which is the value of (2*N + 6) for cell characterization without aging effect, the dimension of parameter space in the aging-aware cell characterization has nearly doubled.
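The dimension counts quoted above can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and the per-transistor multiplier of 4 for the aging-aware case is taken as given from the (4*N + 6) expression, since the individual aging parameters are not enumerated here.

```python
def parameter_count(n_transistors, aging_aware=False):
    """Size of the characterization parameter space for one cell."""
    # 2 device parameters per transistor without aging; the aging-aware
    # case uses 4 per transistor, as implied by the (4*N + 6) expression.
    per_device = 4 if aging_aware else 2
    global_params = 6  # VDD shift, temperature shift, R_pi, C_pi1, C_pi2, Slope
    return per_device * n_transistors + global_params


largest_without_aging = parameter_count(32)
largest_with_aging = parameter_count(32, aging_aware=True)
```

For the largest cell (N = 32), these evaluate to 70 and 134, matching the values in the text.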

Training data
We obtained our training data from Simulation Program with Integrated Circuit Emphasis (SPICE) simulations. A mixture of central composite design and random samples is used for the design of experiments. Table 1 shows the corners which are used for the central composite design.
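A mixed design of this kind can be sketched as follows, in coded units where each parameter is scaled to the range [-1, 1]. The function names, the rotatable choice of alpha, and the uniform random sampling are assumptions for illustration. Note that with 70 or more parameters a full set of 2^k corners is not practical, so a real flow would use only a subset of corners (such as those listed in Table 1); the sketch uses k = 3 purely to show the structure.

```python
import itertools
import random


def central_composite_design(k, alpha=None):
    """Coded CCD points for k factors: 2^k corners, 2k axial points, 1 center."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable design; an assumed choice
    corners = [list(p) for p in itertools.product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for offset in (-alpha, alpha):
            point = [0.0] * k
            point[i] = offset
            axial.append(point)
    center = [[0.0] * k]
    return corners + axial + center


def random_samples(k, n, rng, low=-1.0, high=1.0):
    """n uniformly distributed points in the k-dimensional coded box."""
    return [[rng.uniform(low, high) for _ in range(k)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sample set is reproducible
ccd_points = central_composite_design(3)
design = ccd_points + random_samples(3, 20, rng)
```

Each row of `design` would then be mapped back to physical parameter values and simulated once in SPICE to produce one training sample.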

Why is multivariate adaptive regression splines (MARSP) better
Why is MARSP better than other methods in our application of standard cell characterization? Traditional methods like response surface methodology (RSM) use the same model to cover the entire parameter space. In our application, where intra-gate variability is considered, the number of input parameters is high and the dimension of the parameter space is therefore particularly high. Using one single regression model to estimate gate delay (or slew time) over the whole parameter space is not sufficiently accurate, especially for a complex cell containing over 40 transistors. References [23,24] proposed a clustering method which categorized transistors into switching/non-switching devices and on-transition/off-transition/non-transition devices. This method requires manual intervention to 'filter out' the negligible devices for each of the switching scenarios, which is quite cumbersome. MARSP reduces this manual work because it automatically identifies the essential parameters while building the model.

MARSP for standard cell characterization
This chapter employs MARSP to characterize a fitted function between response variables (gate delay or slew time) and the explanatory parameters (process-voltage-temperature parameters, aging parameters, and RC loads). MARSP uses piecewise polynomial segments to capture essential nonlinearities and interactions, and it is particularly suitable for high-dimension problems. This piecewise nature allows MARSP models to split the whole parameter space into multiple subspaces, and each subspace can have a unique regression model. By using hinge functions, MARSP then inherently integrates the regression models of all the subspaces into a single general form. A hinge function has the form $(x - t)_+$ or $(t - x)_+$, both of which are shown in Figure 3. They are defined as:

$$(x - t)_+ = \max(0,\, x - t), \qquad (t - x)_+ = \max(0,\, t - x),$$

where $t$ is a constant called the knot. MARSP forms a collection of hinge-function pairs for each explanatory parameter $X_j$ with knots at each observed value $x_{ij}$ of that parameter:

$$C = \{ (X_j - t)_+,\ (t - X_j)_+ \}, \qquad t \in \{ x_{1j}, x_{2j}, \ldots, x_{Mj} \},$$

where $M$ is the number of experiments.
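MARSP models have the following form:

$$\hat{f}(X) = \sum_{i=1}^{k} c_i B_i(X),$$

where each $c_i$ is a constant coefficient and each $B_i(X)$ is a basis function. There are two phases in the process of constructing a MARSP model: the forward stepwise addition and the backward stepwise deletion.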
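To make the structure concrete, the sketch below evaluates a model built from hinge functions, using the standard definitions max(0, x - t) and max(0, t - x). The data layout (a list of coefficient/factor pairs) and the example model are hypothetical and are not taken from the characterized library.

```python
def hinge(x, knot, direction=+1):
    """direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x)."""
    return max(0.0, direction * (x - knot))


def basis_value(sample, factors):
    """Product of hinge factors; an empty factor list is the constant basis 1."""
    value = 1.0
    for var_index, knot, direction in factors:
        value *= hinge(sample[var_index], knot, direction)
    return value


def mars_predict(sample, terms):
    """Weighted sum of basis functions; terms is a list of (coefficient, factors)."""
    return sum(coef * basis_value(sample, factors) for coef, factors in terms)


# Hypothetical two-input model:
#   1.0 + 2.0*(x0 - 0.5)_+ - 3.0*(0.5 - x0)_+ * (x1 - 0.25)_+
terms = [
    (1.0, []),
    (2.0, [(0, 0.5, +1)]),
    (-3.0, [(0, 0.5, -1), (1, 0.25, +1)]),
]
right_of_knot = mars_predict([1.0, 0.0], terms)
left_of_knot = mars_predict([0.0, 1.25], terms)
```

Each `max` acts as a comparison that switches a term on or off, so different regions of the input space are effectively served by different equations even though the model is written as one expression.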
The first phase is the forward stepwise addition, where MARSP starts with a model consisting of only an intercept term. It then repeatedly adds basis functions to the model in pairs, step by step. At each step, MARSP finds the pair of basis functions which maximizes the reduction in the residual sum-of-squares error. The two basis functions in the pair are identical except that their hinge functions are mirrored. Each newly added basis function is constructed by multiplying a term that is already in the model (the constant 1 is also considered an existing term) by a new hinge function. The forward addition phase continues until the difference in residual error between two adjacent steps is smaller than a predefined threshold or until the number of terms in the model reaches the maximum.
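The forward phase can be sketched as a brute-force search. This is a simplified illustration, assuming NumPy is available; it refits the model by least squares for every candidate pair, does not limit the interaction degree, and uses an absolute threshold on the reduction in residual sum of squares. Production implementations use much faster update formulas, and all names here are hypothetical.

```python
import numpy as np


def rss(B, y):
    """Residual sum of squares of the least-squares fit of y on the columns of B."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    return float(resid @ resid)


def forward_pass(X, y, max_terms=7, threshold=1e-4):
    n, p = X.shape
    B = np.ones((n, 1))      # start from the intercept-only model
    terms = [()]             # each term is a tuple of (variable, knot, sign) factors
    current = rss(B, y)
    while B.shape[1] + 2 <= max_terms:
        best = None
        for parent in range(B.shape[1]):
            used = {var for var, _, _ in terms[parent]}
            for j in range(p):
                if j in used:            # no self-interactions
                    continue
                for t in np.unique(X[:, j]):
                    plus = B[:, parent] * np.maximum(0.0, X[:, j] - t)
                    minus = B[:, parent] * np.maximum(0.0, t - X[:, j])
                    trial = rss(np.column_stack([B, plus, minus]), y)
                    if best is None or trial < best[0]:
                        best = (trial, parent, j, t, plus, minus)
        if best is None or current - best[0] < threshold:
            break                        # improvement too small: stop adding
        trial, parent, j, t, plus, minus = best
        B = np.column_stack([B, plus, minus])
        terms.append(terms[parent] + ((j, float(t), +1),))
        terms.append(terms[parent] + ((j, float(t), -1),))
        current = trial
    return terms, B, current


# Toy data with a single kink at 0.5: the search should place its knot there.
x = np.arange(21) / 20.0
terms, B, final_rss = forward_pass(x.reshape(-1, 1), np.abs(x - 0.5))
```

On this toy data the first step adds the mirrored pair with a knot at 0.5, after which no further pair reduces the error enough and the loop stops.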
The model from the forward addition phase usually overfits the data. An overfitted model fits the training data used to build it well, but usually does not fit new test data. The second phase of MARSP, the backward stepwise deletion, aims to build a model that generalizes better to new data. The backward deletion phase prunes the model obtained from the forward addition phase. In this phase, generalized cross validation (GCV) is used to trade off goodness-of-fit against model complexity. The backward deletion phase repeatedly deletes the least important term (according to GCV) at each step until the model has only the intercept term left. At the end of the backward deletion phase, the model with the lowest GCV value is selected from among the "best" models of each size and is output as the final model.
MARSP is a nonparametric regression method, so there is no predetermined form of the model. Instead, the model is constructed adaptively according to the information extracted from the training data. It removes, without manual intervention, those negligible parameters that have limited impact on the to-be-modeled gate delay or output slew. Using MARSP for cell characterization can eliminate the need for clustering transistors into the categories of switching/non-switching devices and on-transition/off-transition/non-transition devices, as proposed in [23,24].
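The GCV formula itself is not given in this chapter. The sketch below uses the commonly cited form from Friedman's MARS formulation, in which the effective number of parameters is the number of terms plus a penalty for each knot; the penalty value of 3 and the candidate residual values are assumptions chosen only for illustration.

```python
def gcv(rss, n_samples, n_terms, penalty=3.0):
    """Generalized cross validation score; lower is better.

    n_terms counts all basis functions including the intercept. The
    effective number of parameters adds penalty * (n_terms - 1) / 2,
    which charges for the knots chosen during the forward phase.
    """
    effective = n_terms + penalty * (n_terms - 1) / 2.0
    if effective >= n_samples:
        return float("inf")  # too complex for the amount of data
    return (rss / n_samples) / (1.0 - effective / n_samples) ** 2


def select_by_gcv(candidates, n_samples, penalty=3.0):
    """candidates: list of (n_terms, rss) for the best model of each size."""
    return min(candidates, key=lambda c: gcv(c[1], n_samples, c[0], penalty))


# Illustrative numbers only: residual error barely improves beyond 3 terms.
candidates = [(1, 50.0), (3, 4.0), (5, 3.9), (7, 3.89)]
chosen = select_by_gcv(candidates, n_samples=100)
```

Here the 3-term model is selected: the larger models reduce the residual error slightly, but not enough to offset the complexity penalty.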
The MARSP model is piecewise in nature, so MARSP can split the whole parameter space (which is high-dimension in our application) into multiple subspaces, with each subspace getting its own model. Then the regression models of all the subspaces are integrated into one general expression using piecewise hinge functions. In this way, MARSP can characterize standard cells only once over the whole PVT space, without the need of splitting parameter space and characterizing every subspace.

Experimental results
The characterization variables are delay and transition time. Cell delay is the delay from the 50%-point at the cell input to the 50%-point at the cell output. Cell transition time is also called the output slew time, and it is the time between the 20%-point and the 80%-point at cell output (20-80% for rising transition and 80-20% for falling transition). The goal is to find a model that best fits the relationship between the gate delay (output slew) and the explanatory parameters.
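The two measurements defined above can be extracted from sampled waveforms as sketched below. The function names are hypothetical, the waveforms are assumed to be given as time/voltage lists on a common time axis, and crossings are located by linear interpolation between samples.

```python
def crossing_time(times, values, level, rising):
    """Time of the first crossing of `level`, by linear interpolation."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[i], times[i + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def cell_delay(times, v_in, v_out, vdd, in_rising, out_rising):
    """Delay from the 50% point at the input to the 50% point at the output."""
    t_in = crossing_time(times, v_in, 0.5 * vdd, in_rising)
    t_out = crossing_time(times, v_out, 0.5 * vdd, out_rising)
    return t_out - t_in


def output_slew(times, v_out, vdd, rising):
    """Time between the 20% and 80% points of the output transition."""
    t20 = crossing_time(times, v_out, 0.2 * vdd, rising)
    t80 = crossing_time(times, v_out, 0.8 * vdd, rising)
    return abs(t80 - t20)


# Idealized inverter-like example in arbitrary time units with VDD = 1.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 1.0, 1.0, 1.0, 1.0]    # rising input
v_out = [1.0, 1.0, 0.5, 0.0, 0.0]   # falling output
delay = cell_delay(times, v_in, v_out, 1.0, in_rising=True, out_rising=False)
slew = output_slew(times, v_out, 1.0, rising=False)
```

For this idealized waveform pair the delay is 1.5 time units and the falling slew is 1.2 time units.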
In our work, MARSP is implemented using a Matlab toolbox called ARESLab [25]. Some key settings for ARESLab are as follows: the maximum degree of interactions between explanatory parameters is 3; the maximum number of basis functions is 30; the threshold for the stopping criteria is set to $10^{-4}$. Please note that all the training data have been normalized.
A commercial library consisting of 247 standard cells was used, and every timing arc for every cell was characterized. The characterization results for some representative cells are shown in Table 2. The "4*N + 6" column in the table means the number of parameters in the MARSP model, and the "Time(s)" column means characterization time. The "Error" column ("Mean" and "S.D.") means the average value and standard deviation of the errors between MARSP and golden reference (SPICE), respectively.
The interconnect characterization is similar to the gate characterization, although only five parameters are considered in our work. The details of the reduced-order model of the interconnect transfer function are not covered in this chapter (please refer to [8,9] for more details). The interconnect results are shown in the last row of Table 2. Interconnect variability (spacing, width) is not included in our experiments. It is also worth noting that our methodology can support a higher-order H′(s)-model which matches more moments of the original H(s), at the expense of adding more parameters to the MARSP models.

Validation using test paths
Our framework was implemented in C++ and Perl, and the experiments were run on a Linux platform with a 2.27 GHz CPU and 1 GB of memory, without using multi-threading.
Our experiments are based on ISCAS85 benchmark circuits where temperature and supply voltage are considered as global parameters, meaning that all the transistors across the circuit have the same values of temperature and voltage. However, it is worth noting that our methodology can support a temperature profile from a thermal simulator and a voltage profile from an IR-drop simulator. For process variation, as mentioned earlier, we have considered inter-die, intra-die, and intra-gate variations. For channel lengths, we have considered inter-die and intra-die variation, and for threshold voltage, intra-gate variation is considered. This is because channel length is mostly impacted by lithography and etching, which exhibit strong spatial correlations, while threshold voltage is strongly affected by random dopant fluctuations. Again, please note that our methodology can work with any inter- and intra-die variation model and with any distributions and any correlation profiles.
We have shown that our MARSP models are accurate individually. Here we construct a framework to integrate our models and then verify its accuracy using test paths. We refer to our framework as GTSSTA hereafter. Two thousand Monte Carlo samples were run for 10 randomly selected test paths from the ISCAS85 benchmark. As shown in the framework above, the path delay is calculated for each sample. The obtained delay value is compared to the delay value from hSpice [26], using Eq. (6).
A quadratic delay model was also implemented and tested for comparison. This approach first generates the quadratic regression model of Eq. (7), in which $D$ denotes the gate delay, $X_i$ denotes the explanatory parameters, $d_0$ denotes the constant term, and $a_i$ and $b_i$ denote the coefficients of the first-order and second-order terms, respectively. Table 3 presents the results for our framework in comparison to hSpice using these 10 test paths. Figure 6 gives the histogram comparison between hSpice and GTSSTA for one of the paths. The results in Table 3 also show that the quadratic model has limited accuracy for the 10 test paths.

Runtime analysis
Experimental results show our framework consumes only ~2% more runtime than quadratic delay model but achieves much better accuracy.
The quadratic delay model in Eq. (7) has a fixed number of operations, that is, 120 multiplications and 66 additions for a one-input gate and 224 multiplications and 120 additions for a two-input gate. The number of operations using MARSP models is not fixed; it depends on which subspace the data sample falls into. Evaluating a MARSP model first performs comparisons and then, based on the comparison results, uses different equations (linear, quadratic, etc.) for the calculation. On average, the number of operations for the MARSP model is close to that of the quadratic delay model.
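The operation counts quoted above can be reproduced under one plausible reading, which is an assumption rather than something stated in the text: a full quadratic model with a constant, n first-order terms, and all n(n+1)/2 second-order terms (including cross products), evaluated term by term with one accumulation per term. Under that reading, the quoted numbers correspond to n = 10 and n = 14 inputs, which equal 2*N + 6 for N = 2 and N = 4 transistors.

```python
def quadratic_operation_count(n_inputs):
    """Operations for evaluating a full quadratic model term by term.

    Assumes (not stated in the text) a constant, n linear terms, and all
    n*(n+1)/2 second-order terms including cross products.
    """
    second_order = n_inputs * (n_inputs + 1) // 2
    # One multiplication per linear term (a * x) and two per
    # second-order term (b * x_i * x_j).
    multiplications = n_inputs + 2 * second_order
    # One accumulation per term, counting the constant term as well.
    additions = 1 + n_inputs + second_order
    return multiplications, additions


one_input_gate = quadratic_operation_count(10)
two_input_gate = quadratic_operation_count(14)
```

With these assumptions the function returns (120, 66) and (224, 120), matching the counts given for the one-input and two-input gates.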

Conclusion
This chapter discusses the technique called multivariate adaptive regression splines (MARSP). MARSP is a nonparametric regression method that does not take any pre-assumed form. Instead, it adaptively constructs the model according to the provided data. MARSP has been widely used in high-dimension problems and is particularly popular in data mining.
This chapter also presents an application of MARSP in the semiconductor field, more specifically, in standard cell characterization. The objective of standard cell characterization is to create a set of high-quality models of a standard cell library that accurately and efficiently model cell behavior. In this work, the MARSP method is employed to characterize the gate delay as a function of many parameters, including process-voltage-temperature parameters. Due to its ability to capture essential nonlinearities and interactions, the MARSP method helps to achieve significant accuracy improvement.
Future work that is worth investigating includes extending the aging-aware MARSP-based timing analyzer to 3D integrated circuits (ICs), which tend to have reliability challenges due to stronger heat issues. 3D ICs require more sophisticated thermal models [27][28][29] and more complicated power-grid analysis [30]. As mentioned earlier, the methodology in this chapter is general enough to support other thermal and IR-drop models.

Taizhi Liu
Address all correspondence to: taizhiliu88@gatech.edu Georgia Institute of Technology, Georgia, United States of America