Multiple Linear Solvers Introduced in SmartSpice

Acknowledging the need for more flexibility, SmartSpice now provides three numerical methods for linear system solution. The additional solvers provide for greater capacity by minimizing memory requirements and reducing the overall simulation time.

Historically, spice simulation tool vendors were locked into using the Berkeley Sparse linear system solver in their products due to its tight integration with the simulation engine, and sometimes even with the implementation of certain analyses and models.

Therefore, adding a better linear solver to a spice package has proven extremely difficult unless "reverse integration" is accomplished: the solver must be decoupled from the rest of the spice package so that it presents a very clean interface to the simulator. Silvaco has achieved this decoupling in SmartSpice without the runtime overhead that this kind of restructuring usually incurs, thanks to the approach it chose: instead of introducing a new interface layer between the solver and the simulator, Silvaco stripped away the old interface altogether and replaced it with a clean one.
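To make the idea concrete, here is a minimal Python sketch of a decoupled solver interface. The names `LinearSolver`, `factor`, `solve`, `DenseLUSolver`, and `solve_linearized_system` are hypothetical and do not describe SmartSpice's actual code. The toy dense LU (no pivoting, no sparsity) only stands in for a real plug-in; the point is that the simulator calls two methods and never touches the solver's internal data structures.

```python
from abc import ABC, abstractmethod


class LinearSolver(ABC):
    """Hypothetical clean interface between a simulator and a linear solver."""

    @abstractmethod
    def factor(self, a: list[list[float]]) -> None:
        """Factor the system matrix A (e.g. into L and U)."""

    @abstractmethod
    def solve(self, b: list[float]) -> list[float]:
        """Solve A x = b using the most recent factorization."""


class DenseLUSolver(LinearSolver):
    """Toy plug-in: dense Doolittle LU without pivoting (illustration only)."""

    def factor(self, a):
        n = len(a)
        self.lu = [row[:] for row in a]  # L (unit diagonal) and U stored in place
        for k in range(n):
            for i in range(k + 1, n):
                self.lu[i][k] /= self.lu[k][k]
                for j in range(k + 1, n):
                    self.lu[i][j] -= self.lu[i][k] * self.lu[k][j]

    def solve(self, b):
        n = len(b)
        y = b[:]
        for i in range(n):  # forward substitution with unit-diagonal L
            for j in range(i):
                y[i] -= self.lu[i][j] * y[j]
        x = y[:]
        for i in reversed(range(n)):  # back substitution with U
            for j in range(i + 1, n):
                x[i] -= self.lu[i][j] * x[j]
            x[i] /= self.lu[i][i]
        return x


def solve_linearized_system(solver: LinearSolver, matrix, rhs):
    """The simulator only sees the interface, never the solver's data layout."""
    solver.factor(matrix)
    return solver.solve(rhs)
```

Swapping solvers then amounts to passing a different `LinearSolver` subclass; the calling code does not change.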

Once this decoupling was in place, a multitude of different solvers could be plugged in through the common solver interface without disrupting what customers are used to from previous versions of SmartSpice. This new approach to solving spice linear systems yielded immediate results. The solver that is to become the SmartSpice default is an enhanced version of the Berkeley Sparse1.3 solver, which reduces overall simulation time by 10% on average. This gain is achieved merely by using a more compact data structure than the original one provided by Berkeley Sparse. The main objective is to improve memory access patterns by placing data that are accessed around the same time in memory locations close to one another, a property known as spatial locality. This technique was coupled with a cache blocking approach that increases the fraction of useful data loaded into the processor's cache with each request to main memory. Future versions of this solver library are expected to be considerably faster, as there is still room for improvement, for instance by exploiting temporal locality.
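The sketch below illustrates the spatial-locality idea in Python under stated assumptions. The `Element` node type (one allocation per nonzero, linked row by row) is a simplified, hypothetical stand-in for a pointer-based sparse matrix, and `pack_rows` copies it into contiguous compressed-row arrays. This is not SmartSpice's actual layout, cache blocking is not shown, and Python itself will not exhibit the cache effects; the sketch only shows the reorganization, after which the entries of one row sit next to each other in memory.

```python
from array import array
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical linked-list node: one separately allocated object per nonzero."""
    row: int
    col: int
    value: float
    next_in_row: Optional["Element"] = None


def pack_rows(row_heads: list[Optional[Element]]):
    """Copy linked row lists into contiguous compressed-row (CSR-style) arrays.

    After packing, a sweep over a row reads consecutive addresses instead of
    chasing pointers to scattered nodes.
    """
    row_ptr = array("i", [0])  # start offset of each row in col_idx/values
    col_idx = array("i")
    values = array("d")
    for head in row_heads:
        node = head
        while node is not None:
            col_idx.append(node.col)
            values.append(node.value)
            node = node.next_in_row
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def matvec(row_ptr, col_idx, values, x):
    """y = A x, reading each row's entries from one contiguous slice."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```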

Numerically well-conditioned circuits can also take advantage of Speeds, a much faster solver provided by SmartSpice.

The Berkeley Sparse1.3 solver is still available for completeness and backward compatibility. Our goal is to provide the matrix inversion method best suited to each type of circuit. We briefly discuss each method's speed improvement below and give broad guidelines on when each method is appropriate. The linear solver method is chosen from within the SmartSpice input deck with the statement:

.OPTION SOLVER=<method>

where <method> is one of:

- speeds: use the fast Speeds solver

- sparse: revert to Berkeley Sparse1.3

- default: use the default SmartSpice solver
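As an illustration only, the hypothetical Python helper below shows how the three documented SOLVER values might be picked out of a deck. The regular expression, the case-insensitive matching, and the "last option wins" rule are assumptions made for the sketch, not a description of how SmartSpice parses its input.

```python
import re

# The three values documented for .OPTION SOLVER=<method>.
VALID_SOLVERS = {"speeds", "sparse", "default"}

# Assumed syntax: SOLVER, optional spaces, '=', optional spaces, a word.
_SOLVER_RE = re.compile(r"\bsolver\s*=\s*(\w+)", re.IGNORECASE)


def solver_from_deck(lines: list[str]) -> str:
    """Return the solver requested by the last .OPTION SOLVER=... line (hypothetical)."""
    chosen = "default"  # no SOLVER option: use the default SmartSpice solver
    for line in lines:
        stripped = line.strip()
        if not stripped.lower().startswith(".option"):
            continue  # only .OPTION/.OPTIONS lines carry the setting
        match = _SOLVER_RE.search(stripped)
        if match:
            method = match.group(1).lower()
            if method not in VALID_SOLVERS:
                raise ValueError(f"unknown SOLVER value: {method}")
            chosen = method
    return chosen
```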

 

1) The Default SmartSpice Solver

While retaining the same numerical properties as Berkeley Sparse1.3, the default SmartSpice solver has been optimized for speed. It relies on the structure of the circuit matrix remaining stable in order to optimize the memory layout and data transfers. A special memory layout of the sparse matrix elements was devised to optimize data access during the linear solve; this layout is used throughout the simulation to minimize cache misses during the LU factorization and back-solve phases. Compared with the solver in previous releases of SmartSpice, the average gain in simulation time ranges between 5% and 15%. Whenever the matrix structure itself changes during the simulation, the layout must be re-evaluated to make sure it is still optimal. In the rare cases where this happens very often during a single simulation, the re-evaluation overhead becomes significant and the new solver shows little speed improvement.
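A minimal sketch of the reuse policy described above, assuming the matrix is handed over as a dictionary of nonzeros: the layout is computed once per sparsity pattern and reused while only the values change. The class name and `build_layout` (which here merely sorts positions into row-major order) are hypothetical placeholders for the real layout optimization.

```python
class LayoutCache:
    """Rebuild an (expensive) optimized layout only when the sparsity pattern changes."""

    def __init__(self):
        self.pattern = None   # frozenset of (row, col) positions last seen
        self.layout = None    # positions in the order values are stored
        self.rebuilds = 0     # how many times the layout was recomputed

    def build_layout(self, pattern):
        """Placeholder for the real optimization: row-major order of nonzeros."""
        self.rebuilds += 1
        return sorted(pattern)

    def get(self, entries: dict[tuple[int, int], float]) -> list[float]:
        pattern = frozenset(entries)  # structure only; values are ignored here
        if pattern != self.pattern:
            self.pattern = pattern
            self.layout = self.build_layout(pattern)
        # Gather values in layout order: the same contiguous order every time.
        return [entries[pos] for pos in self.layout]
```

If the pattern changed at almost every call, `build_layout` would run every time, which corresponds to the rare worst case mentioned above.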

 

2) The Speeds Solver

One of the most time-consuming operations in the default SmartSpice solver is the search for the best pivot element during the LU decomposition phase, at every iteration of the process. The pivot is chosen to minimize fill-in of the resulting matrix, which limits memory usage and also helps keep the LU decomposition stable. For some circuits this is overkill, which led Silvaco to implement a much faster solver that exploits the circuit's stability to speed up the simulation.
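For readers unfamiliar with this step, the simplified Python sketch below shows a Markowitz-style pivot search with a relative threshold, the classic fill-reducing strategy on which Sparse1.3-style solvers are built. Real implementations add singleton detection, diagonal preference, and incremental count updates; the function name and default thresholds here are assumptions. Note that the search examines every nonzero of the remaining submatrix, which is why repeating it is expensive.

```python
from collections import Counter


def markowitz_pivot(entries, rel_threshold=1e-3, abs_threshold=0.0):
    """Return the (row, col) pivot minimizing the Markowitz product.

    `entries` maps (row, col) -> value for the nonzeros of the not-yet-factored
    submatrix. A candidate must pass a threshold test against the largest entry
    in its column; among those, the smallest (r - 1) * (c - 1), an upper bound
    on the fill-in the pivot can create, wins, with ties going to the larger
    magnitude.
    """
    row_count = Counter(r for r, _ in entries)
    col_count = Counter(c for _, c in entries)
    col_max = {}
    for (r, c), v in entries.items():
        col_max[c] = max(col_max.get(c, 0.0), abs(v))

    best, best_key = None, None
    for (r, c), v in entries.items():
        mag = abs(v)
        if mag <= abs_threshold or mag < rel_threshold * col_max[c]:
            continue  # numerically unacceptable pivot
        key = ((row_count[r] - 1) * (col_count[c] - 1), -mag)
        if best_key is None or key < best_key:
            best, best_key = (r, c), key
    if best is None:
        raise ValueError("no acceptable pivot")
    return best
```

On an "arrow" matrix whose first row and column are dense, the search avoids the (0, 0) entry, whose elimination would fill the whole matrix, and picks a diagonal entry that creates no fill-in.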

With the Speeds solver, the "pivrel" and "pivtol" parameters are used only to check that each pivot is acceptable, bypassing the time-consuming pivot search. This solver proves efficient in two types of situations:

- circuits with a structurally stable and well-conditioned matrix;

- cases where matrix reordering and factorization take more than 50% of the total simulation time.
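A minimal sketch of a check-instead-of-search factorization, assuming pivots are taken in a fixed, previously chosen order (here simply the diagonal of a dense matrix). The acceptance test follows the usual meaning of the SPICE options: PIVTOL is the smallest acceptable pivot magnitude, and PIVREL is the smallest acceptable ratio between the pivot and the largest entry in its column. The default values are those commonly used in Berkeley SPICE3 and are assumptions here, as is the function name.

```python
def factor_without_search(a, pivrel=1e-3, pivtol=1e-13):
    """LU-factor a copy of `a`, taking pivots in diagonal order without searching.

    Each pivot is only *checked*: it must be at least `pivtol` in magnitude and
    at least `pivrel` times the largest magnitude remaining in its column.
    A failed check raises, so the caller can fall back to a pivot-searching solver.
    """
    n = len(a)
    lu = [row[:] for row in a]  # L (unit diagonal) and U stored in place
    for k in range(n):
        pivot = lu[k][k]
        col_max = max(abs(lu[i][k]) for i in range(k, n))
        if abs(pivot) < pivtol or abs(pivot) < pivrel * col_max:
            raise ArithmeticError(f"unacceptable pivot at step {k}: {pivot!r}")
        for i in range(k + 1, n):
            lu[i][k] /= pivot
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu
```

The per-step cost is a single column scan for the check rather than a search over the whole remaining submatrix.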

2-1) For general circuits with well-conditioned matrices, the time gain varies between 5% and 25% of the total simulation time. Here are some examples of execution times with the two solvers on two benchmarks.

Circuit        Speeds (sec.)   default (sec.)   Speeds/default (%)
bench_44.sp    28.88           37.36            77.30
bench_62.sp    326.50          378.01           86.37
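The ratio column is the Speeds time divided by the default time, and the time saved is 100% minus that ratio. The short Python check below recomputes both from the table: roughly 22.7% saved on bench_44.sp and 13.6% on bench_62.sp, inside the 5% to 25% range quoted above.

```python
# (Speeds time, default time) in seconds, copied from the table above.
BENCHMARKS = {
    "bench_44.sp": (28.88, 37.36),
    "bench_62.sp": (326.50, 378.01),
}


def ratio_and_saving(speeds: float, default: float) -> tuple[float, float]:
    """Return (Speeds/default in %, time saved in %)."""
    ratio = 100.0 * speeds / default
    return ratio, 100.0 - ratio


for name, (speeds, default) in BENCHMARKS.items():
    ratio, saving = ratio_and_saving(speeds, default)
    print(f"{name}: ratio {ratio:.2f}%, saved {saving:.2f}%")
```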

 

2-2) In some cases, the Speeds solver can provide dramatic time gains because it completely bypasses the matrix reordering phase. For circuits that spend more than 50% of simulation time in matrix reordering or factorization, this can reduce computation time dramatically. Such circuits are sometimes those with many independent voltage sources connected to many devices.

With an ".option acct" statement, one can determine which phase of the simulation takes the largest share of the total simulation time. Here is an example for a transient analysis:

<.option solver=default acct>

Total user time : 1993.330 seconds
Total system time : 1.220 seconds
<...>
equations (Circuit Equations) = 2540
loadtime (Load time) = 186.83
lutime (L-U decomposition time) = 310.83
reordertime (Matrix reordering time) = 1291.49
solvetime (Matrix solve time) = 23.36
transpoints (Transient timepoints) = 2061

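In this run, reordering and L-U decomposition together account for about 80% of the total user time (1291.49 + 310.83 = 1602.32 seconds out of 1993.33). With the Speeds solver, the reordering and factorization phases take only 20% of the total simulation time. As a side effect, the total number of Newton iterations is halved, which accounts for the halved "loadtime".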
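The 50% rule of thumb can be checked mechanically from such a report. The hypothetical Python helper below assumes the output format shown in the listing above (`name (description) = value` lines plus a `Total user time` line); real reports contain more lines, which it simply ignores. Applied to the default-solver run, it gives a reordering plus factorization share of about 80%, well above the 50% threshold at which the Speeds solver is worth trying.

```python
import re

# Assumed line formats, taken from the acct listing above.
STAT_RE = re.compile(r"^(\w+)\s*\(.*\)\s*=\s*([\d.]+)$")
TOTAL_RE = re.compile(r"^Total user time\s*:\s*([\d.]+)")


def parse_acct(report: str) -> dict[str, float]:
    """Pull numeric statistics out of .option acct output (assumed format)."""
    stats = {}
    for raw in report.splitlines():
        line = raw.strip()
        if m := TOTAL_RE.match(line):
            stats["total_user_time"] = float(m.group(1))
        elif m := STAT_RE.match(line):
            stats[m.group(1)] = float(m.group(2))
    return stats


def matrix_share(stats: dict[str, float]) -> float:
    """Fraction of user time spent in matrix reordering plus L-U decomposition."""
    return (stats["reordertime"] + stats["lutime"]) / stats["total_user_time"]


REPORT = """\
Total user time : 1993.330 seconds
Total system time : 1.220 seconds
loadtime (Load time) = 186.83
lutime (L-U decomposition time) = 310.83
reordertime (Matrix reordering time) = 1291.49
solvetime (Matrix solve time) = 23.36
"""

share = matrix_share(parse_acct(REPORT))
print(f"reorder + factor share: {share:.1%}")  # compare against the 50% guideline
```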

 

3) The Berkeley Sparse1.3 solver

The Berkeley Sparse1.3 solver is retained for cases where both of the following hold:

- reordering and factorization phases take more than 50% of total simulation time;

- the Speeds solver fails because the matrix is ill-conditioned.
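The guidelines in this article can be summarized as a small decision function. This is one reading of those guidelines, written as a hypothetical Python helper. Its inputs (the matrix-phase share from an acct report, whether the matrix is known to be structurally stable and well conditioned, and whether a Speeds run failed) are things the user determines; they are not values SmartSpice is assumed to compute.

```python
def recommended_solver(matrix_share: float,
                       stable_and_well_conditioned: bool,
                       speeds_failed: bool = False) -> str:
    """Return a value for .OPTION SOLVER=<method> following the article's guidelines.

    matrix_share: fraction of total time spent in reordering + factorization.
    """
    heavy_matrix_phase = matrix_share > 0.5
    if speeds_failed:
        # Ill-conditioned matrix: Speeds is ruled out.
        return "sparse" if heavy_matrix_phase else "default"
    if heavy_matrix_phase or stable_and_well_conditioned:
        return "speeds"
    return "default"
```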

 

Conclusion

SmartSpice now provides three numerical methods for matrix inversion, allowing greater flexibility in adapting the simulator to each input deck. In addition to the new default SmartSpice solver, stable circuits can benefit from a faster solver that bypasses the reordering phase of the sparse matrix. Extracting the linear solver functionality from the core spice simulator allows great flexibility for future developments and paves the way for other linear solvers that will scale much better with circuit size than current solvers. With proper preconditioning techniques, iterative solvers may well remove the bottleneck on the way to a true multi-million-transistor spice simulator.