Advanced Pairwise Merging Algorithm for VLSI Floorplanning
Introduction
This paper concerns the problem of determining optimal placement of rectangular blocks within a rectangular area known as the packing or cuttingstock problem. This problem arises at then floorplanning stage of VLSI design.
Assume that one is given n different types of rectangular blocks R0, R1, ... Rn ? 1, where Ri is rectangle hi x wi. The goal is to find nonoverlapping placement of the blocks within rectangular area A so that uncovered part of the area is as small as possible. The rectangle A is referred to as target area. Without loss of generality one can assume that a block of any type Ri can fit in the target area. This problem is referred to as packing problem. The problem is also known as cutting stock problem. (It can be easily reformulated as problem of cutting rectangular sheet of material into rectangular pieces. We will use this name to refer to the problem as well.)
If some resultant placement P contains ci blocks of type i, square of a block of type i is Si = hi * wi and target area A is a rectangle with dimensions h and w, then square of uncovered part of A can be obtained from the following formula:
TWP = h * w  c0 * S0  c1 * S1  ?  cn  1 * Sn  1
Value TWP is called total waste of placement P. Placement P is considered optimal if and only if TWPTWQ for any other placement Q.
In general case optimal solution is very difficult to find and requires an algorithm based on exhaustive search approach. But in most practical cases a acceptable approximate solution can be obtained much faster and easier. There are number of purely heuristic algorithms designed to solve the packing problem which work very fast. There are also some heuristic modifications of exhaustive search algorithms, which substantially increase algorithms performance at cost of quality of result.
Pairwise Merging Algorithm
One of the popular approaches here is the pairwise merging technique. An algorithm based on this technique works in the following manner. Starting with initial blocks it generates new ones by merging them in pairwise manner. Blocks in each pair are attached to each other without overlapping at different relative positions thus creating a number of combined blocks. Such combined blocks are referred to as clusters. If cluster still fits in target area it is called feasible cluster and participates in subsequent mergers to form larger clusters and so on. Note that any feasible cluster as well as any initial block alone represents more or less satisfactory solution of the problem. When the algorithm is not able to generate any new cluster the best solution found so far is reported as final result. The following is the sketch of algorithm based on pairwise merging technique:

S, L and N ? sets of feasible clusters
feasible(X) ? functions which returns set of all feasible clusters from set X, feasible(X) X
merge(s, l) ? operation which generates finite set of new clusters by merging clusters s and l at different relative positions.
Step 1. S := ; L := {R1, R2, ..., Rn ? 1}
Step 2. N := feasible((sS, lL merge(s, l)) ( L, lL merge(s, l)))
Step 3. if N = goto step 5
Step 4. S := SL; L := N; goto step 2
Step 5. P* = arg minP TWP; output(P*)
To avoid generation of identical clusters again and again the algorithm performs merging (step 2) only in such pairs of clusters that consist of at least one cluster generated at previous iteration (set L). However, it doesn?t guarantee that each cluster will be built only once. It is easy to show that in the general case a cluster consisting of more than two blocks can be built by pairwise merging operation in several ways. By elimination of repeatedly generated clusters from set N on step 2 of the algorithm its performance can be substantially increased.
Operation merge
The main obstacle in the way of practical implementation of this algorithm is merging operation (merge is the sketch above). A number of problems complicate the implementation of this operation and the whole algorithm. The most interesting of them is the following.
Any two geometrical objects, even pair of rectangles, can be merged into a new one in an infinite number of ways. While initial blocks are rectangles and are relatively easy to deal with on first iterations of the algorithm, clusters that appear later can have very complex shapes. From the other side, it is easy to see that all initial blocks and intermediate clusters are isothetic objects. There are theoretical results which show that in order to find an optimal solution in this case we can limit ourselves to consideration of a finite number of relative positions of such objects ? so called contact concurrent positions [4]. However, the number of such positions can still be very large, which results in generation of large amounts of new feasible clusters. In practice the amount of intermediate data generated by the algorithm can easily overflow memory in any modern computer. Not to mention that algorithm?s iteration time grows at a very high rate.
A simple way to reduce the number of variants generated by merge function is the following. First, the particular class of geometrical shapes H is chosen and operation merge is defined as taking two parameters from class H and returning a subset of class H:
merge: (H, H) > 2
It means that each variant of merging of two objects form class
TWP = EWP + IWP
Now when clusters can contain internal waste the notion of cluster domination can be introduced. Consider two clusters s and t, each one consisting of cs0, cs1, ? , csn ? 1 and ct0, ct1, ? , ctn ? 1 blocks of each type respectively. Cluster s is said to be dominating over cluster t if and only if the following two conditions are met (see Figure 1):

1. , for
any i = 1, 2, ..., n;
2. cluster s can be enclosed within cluster t.
Figure 1. Domination: cluster s dominates over cluster
t
When some cluster s is generated at particular iteration of the algorithm, all previously built clusters t, such that cluster s dominates over cluster t, can be discarded without compromising the quality of final solution. In fact, in any placement that contains cluster t it can be naturally replaced with cluster s. Such replacement can only increase the quality of the solution. Note that according to definition of domination relation any cluster s dominates over itself. It means that by discarding dominated clusters the algorithm discards identical clusters as well. The discarding results in substantial decrease in number of clusters built by the algorithm thus reducing the number of variants to be checked and increasing the overall performance of the algorithm.
Heuristic Criteria for Cluster Discarding
Amount of internal waste in cluster can be used in two simple heuristic criteria which significantly reduce number of variants processed by the algorithm [2]. The main idea here is to detect and discard intermediate clusters that are unlikely to be part of optimal solution. The criterion is applied to clusters generated by merge operation. If a cluster meets the criterion it participates in subsequent mergers, otherwise the cluster is discarded because of high percentage of internal waste. The first criterion is formulated as follows:
IWc
Bs* H * W, 0Ba1
This criterion is set to be absolute criterion. Coefficient Ba is parameter of
the algorithm. The first criterion is formulated as follows:
IWc,<Br * square(C), 0 < Br < 1
This criterion is set to be relative criterion. While relative criterion discards "unpromising" clusters evenly on all iterations of the algorithm, absolute criterion tends to be more tolerant to clusters generated on first iterations.
Obviously, these criteria are heuristics, and on particular input data they can "cut off" optimal solution of the problem. But with algorithms described below they provide excellent results in terms of performance and quality of solution. There are also more complex and more precise criteria for discarding clusters that cannot be part of optimal solution [3]. Branchandbound technique can be naturally used in pairwise merging algorithms.
Wangs Algorithm
A good illustration to this approach is Wangs algorithm for twodimensional cutting stock problem [2]. The algorithm uses class of rectangles with their sides parallel to coordinate axes as class of geometrical shapes H. In this case operation merge is easy to define. Two rectangles are be combined in two different ways called horizontal and vertical composition and after that resultant shapes are approximated by their bounding boxes (see Figure 2).
Figure 2. Horizontal and vertical composition
The obvious disadvantage with Wang?s algorithm is that approximation with bounding box in many cases will substantially increase amount of internal waste in intermediate clusters generated by merge operation and, consequently, in final placement. Moreover, it is easy to show that generally this algorithm is not able to find optimal solution of the problem. All placements generated by Wang?s algorithm are so called guillotine ones. In terms of cutting problem, guillotine placement allows cutting of rectangular sheet of material into rectangular pieces by consequential sidetoside cuts. Placement shown on Figure 3 is optimal for corresponding problem but is not guillotine one. It cannot be built by Wang?s algorithm.
Figure 3. Nonguillotine placement.
Lblockbased Algorithm
In this paper an improved algorithm based on pairwise merging technique is introduced. The main difference in the algorithm is the class of so called Lblocks [1] is used as class H and operation merge is defined on class of Lblocks. There are five types of objects in this class (see Figure 4).
Figure 4. Five types of Lblocks.
In sequence of classes of isothetic convexity ICONVEX0, ICONVEX1, ICONVEX2, ? [5] class of rectangles and class of Lblocks occupy first two places (ICONVEX0 and ICONVEX1 respectively). Intuitively, the fact that these classes are found next to each other in this sequence means that Lblocks are "just a little more complex objects" than rectangles. Therefore one can expect that data structures and merge operation for Lblocks are not much more complex than for rectangles. Moreover, class of rectangles is included into class of Lblocks which means that quality of result of Lblockbased algorithm can only be better than quality of result of Wangs algorithm on the same input data. Another important advantage of the algorithm is its ability to process input consisting of Lblocks as well as rectangles.
Operation merge on class of Lblocks can be easily defined through five elementary operations:
 complement. This unary operation complements Lblock to its bounding box (which is Lblock too).
 *1, *2, +1, +2 ? four types of build. These binary operations combine two input Lblocks to form a new one (see [1] for detailed description of these operations).
Completeness of this system of operations is proved by the following theorem [1].
Theorem. Let L be class of Lblocks and f: (L, L) > L is any function that combines pair of Lblocks without overlapping into a new Lblock.
Then for any pair of Lblocks a and b, a, b L, either Lblock (A o B) or (B o A), where A {a, a}, B {b, b}, o {*1, *2, +1, +2}, can be enclosed within Lblock f(a, b) (see [1] for proof). Using this result one can build operation merge for the algorithm:
In general this operation generates 16 variants of merging of two Lblocks. But in each particular case number of generated Lblocks can be substantially less since some of them can be the same. For example, if Lblock a is rectangle then a = a, which means that for two rectangular input blocks operation merge will not produce more then four new blocks.
However, this Lblockbased algorithm is not guaranteed to find optimal solution of packing problem. As Wang?s algorithm is not able to generate nonguillotine placements, this algorithm is not able to produce placement shown below (see Figure 5). Moreover, counterexample can be built for any pairwise merging algorithm which works within one of the classes of isothetic convexity [6].
Figure 5. Counterexample for Lblockbased algorithm
Computational Results
A number of placements generated by Lblockbased algorithm are shown below (see Figure 6). The problem solved in all three cases is constrained cutting stock problem on input data from [2] and [3].
It is important to note that in examples a) and c) the Lblockbased algorithm produced nonguillotine placement. In this two cases quality of solutions is substantially better than that provided by Wang?s algorithm (see [2]).
Figure 6. Placements generated by Lblockbased algorithm.
References
 D. F. Wong, H. W. Leong, C. L. Liu. Simulated Annealing for VLSI Design. Kluwer Academic Publishers. 1988. p. 202.
 P. Y. Wang. Two Algorithms for Constrained TwoDimensional Cutting Stock Problems. // Operation Research, 1983, Vol. 31, No. 3, pp. 573586.
 J. F. Oliveira, J. S. Ferreira. An Improved Version of Wang’s Algorithm for TwoDimensional Cutting Problems. European Journal of Operational Research. 1990, vol. 44. pp. 256266.
 N. N. Metelski, V. S. Krikun. Method of Hierarchical Placement of Isothetic Blocks. Minsk, Institute of Mathematics of Belarussian Academy of Science. 1990.
 N. N. Metelski, V. N. Martinchik. Basic Classes of Generalized Convexity for Isothetic Regions. Minsk, Institute of Mathematics of Belarussian Academy of Science. 1991.
 Recursive Cutting of Rectangular Partitions for VLSI Floorplanning. Simulation Standard. 1998, vol. 9, n. 9, p. 6.