Parallel DRC by Functional Decomposition of DRC Decks

1. Introduction

A new version of Savage DRC system taps the full power of multiprocessor computers by introducing simultaneous execution of DRC commands from a DRC deck. A DRC deck is a set of DRC rules to be performed for a given layout. For typical DRC decks, the described approach alone delivers high processor utilization and speed-up. Further improvements are possible using parallelization by decomposition of IC layout data and exploiting its hierarchy.

This article describes the approach used in the current implementation. The description provides guidelines for writing efficiently parallelizable DRC decks.

2. Algorithm

The DRC deck parallelization has to address two issues:

  • Every rule of deck is computationally independent from others (beneficial factor);
  • There is input/output data dependency between rules of a DRC deck (adverse factor).

The main idea of DRC parallelization comes from the fact that a DRC command can start as soon as all input data for the command is ready, it does not have to wait for all preceding commands in deck to finish. By comparing the inputs and outputs of all DRC commands in deck we can determine the data dependency between the execution of the commands and hence build a data dependency graph. The graph consists of a set of nodes. A node may correspond to one command or several consecutive commands in deck. Each node may have parent nodes and child nodes. Node is ready to be processed if all parent nodes have been processed. Each node is assigned a "weight", which is the number of child nodes it has. If there are several nodes ready to be executed, nodes with higher weights are submitted for processing first.


Figure 1. Data dependency graph of a DRC deck with weights of nodes.

The parallel algorithm is implemented by having one main thread of execution and N worker threads. The main thread analyzes DRC deck to be executed and builds the dependency graph object. Then it starts N worker threads and waits for all of them to exit. Each worker thread takes 1 task from the graph, submits it for execution to a DRC executable module, notifies the graph when the task is completed and asks it for new task to execute…

Here is an outline of the parallel algorithm:

I. Main thread :

  • Read design rules
  • Build dependency graph
  • Create N Worker threads
  • Wait for all threads to exit
  • Collect results of all performed commands (such as Error Database and Log file)


II. Worker thread :

While there are unprocessed tasks in graph DO


- Ask graph for a task to execute

- Call external DRC executable to perform corresponding command(s)

- Wait for executable to exit

- Notify graph of completion of the task


- Exit Thread


3. Results and Discussion

In Table 1 and Table 2 you can see the tables of runtime comparisons for 1-CPU vs. Multi-CPU DRC executions:

Table 1. PC: Windows NT 4.0, 2-CPU, 800Mhz.


Table 2. Sun: Solaris 5.7, 4-CPU, 400Mhz


Note that performance of parallel DRC depends on the structure of the DRC deck. In above-mentioned examples, the test "S00" shows comparatively low speedup on 4-CPU computer. A reason of the low speedup of the test "S00" is "bad parallelizable" structure of DRC deck for this example. That means its graph of data dependency contains several "bottleneck" commands that have to wait completing of all (or almost all) previous ones. These "bottleneck" commands often generate a situation when several CPUs are not used.


Figure 2. Bad parallelizable structure of graph (with node weights).


Here is an example of a script (Script1) that is not parallelizable at all:

// Script name : Script1
/* 1 */ Substrate: LayerR=⊂
/* 2 */ Select: Relation=ENCLOSE, Layer1=&SUB, 
Layer2=M5, LayerR=&TMP;
/* 3 */ And: Layer1=M4, Layer2=$TMP, LayerR=PJCT;
/* 4 */ Select: Relation=ENCLOSE, Layer1=&SUB, 
Layer2=M4, LayerR=&TMP;
/* 5 */ Dif: Layer1=M1, Layer2=$TMP, LayerR=METLDIFF;
/* 6 */ Or: Layer1=M1, Layer2=M2, LayerR=&TMP;
/* 7 */ OverSize: Value=0.005um, Layer=&TMP, 


Figure 3. Graph of data dependency for
Script1 (with command numbers).


Script1 has 3 independent parts that could be executed in parallel: (commands 1-3, 4-5, 6-7), but each independent part uses the same name for temporary layer (&TMP), which creates additional dependencies in graph. In order to use benefits of parallel DRC we can rewrite the same script using &TMP1, &TMP2, &TMP3