June 18, 2006

Dancing Links in Sudoku - source: Wikipedia

In computer science, Dancing Links, commonly known as DLX, is the technique suggested by Donald Knuth to efficiently implement his Algorithm X. Algorithm X is a recursive, nondeterministic, depth-first, brute-force algorithm that finds all solutions to the exact cover problem. Some of the better known exact cover problems include tiling, N-queens and Sudoku.

The name Dancing Links comes from the way the algorithm works, as iterations of the algorithm cause the links to "dance" with partner links so as to resemble an "exquisitely choreographed dance." Knuth credits Hitotumatu and Noshita with having invented the idea in 1979, but it is his paper which has popularized it.

As the remainder of this article discusses the details of an implementation technique for Algorithm X, the reader is strongly encouraged to read the Algorithm X article first.

The idea of DLX is based on the observation that in a circular doubly-linked list of nodes, then

x.left.right ← x.right;
x.right.left ← x.left;

will remove node x from the list, while

x.left.right ← x;
x.right.left ← x;

will restore x's position in the list. This works regardless of the number of elements in the list, even 1.

Knuth observed that a naive implementation of his Algorithm X would spend an inordinate amount of time searching for 1s. When selecting a column, the entire matrix had to be searched for 1s. When selecting a row, an entire column had to be searched for 1s. After selecting a row, that row and a number of columns had to be searched for 1s. To improve this search time from complexity O(n) to O(1), Knuth implemented a sparse matrix where only 1s are stored.

At all times, each node in the matrix will point to the adjacent nodes to the left and right (1s in the same row), above and below (1s in the same column), and the header for its column (described below). Each row and column in the matrix will consist of a circular doubly-linked list of nodes.

Each column will have a special node known as the "column header," which will be included in the column list, and will form a special row ("control row") consisting of all the columns which still exist in the matrix.

Finally, each column header may optionally track the number of nodes in its column, so that locating a column with the lowest number of nodes is of complexity O(n) rather than O(n * m) where n is the number of columns and m is the number of rows.

In Algorithm X, rows and columns are regularly eliminated from and restored to the matrix. Eliminations are determined by selecting a column and a row in that column. If a selected column doesn't have any rows, the current matrix is unsolvable and must be backtracked. When an elimination occurs, the selected row's column, other rows 'belonging' to that column, and other columns to which the selected row 'belongs' are all removed. These columns are removed because they have been filled, and these rows are removed because they conflict with the selected row. To perform the elimination, first remove the selected column's header. Next, for each row where the selected column contains a 1, traverse the row and remove it from other columns (this makes those rows inaccessible and is how conflicts are prevented). Finally, remove each column (other than the selected column, it has already been removed) in which the selected row has a 1 (they have been filled by the selected row). This order ensures that any removed node is removed exactly once and in a predictable order, so it can be backtracked appropriately. If the resulting matrix has no columns, then they have all been filled and the selected rows form the solution.

To backtrack, the above process must be reversed using the second algorithm stated above. A requirement of using that algorithm is that backtracking must be done as an exact reversal of eliminations. Knuth's paper gives a clear picture of these relationships and how the node removal and reinsertion works.

It is also possible to solve one-cover problems in which a particular constraint is optional, but can be satisfied no more than once. DLX accommodates these with primary columns which must be filled and secondary columns which are optional. This alters the algorithm's solution test from a matrix having no columns to a matrix having no primary columns, but doesn't require any further changes. Knuth discusses optional constraints as applied to the N-Queens problem. The chessboard diagonals represent optional constraints, as some diagonals may not be occupied. If a diagonal is occupied, it can only be occupied once.
[edit]

Application to Sudoku

Currently (January 2006), Sudoku is enjoying a surge of popularity amongst programmers and game enthusiasts alike. The following considers the popular 3x3-3x3 (3x3 grid of 3x3 boxes) Sudoku variant. To convert any other variant into an exact cover problem, all that is required is an adjustment of the constraint-column mapping.

The DLX matrix for Sudoku represents every possible move and the constraints those moves must satisfy in any valid solution.

Each row in a DLX matrix represents a possible move, i.e., placing a particular digit in a particular cell. Thus the DLX rows can be labeled , where d, r and c each range from 1 to 9. For example, there is a DLX row <2,5,7> for placing the digit 2 in the cell in row 5 and column 7. As there are 9 possible digits and 9x9=81 possible cells, there are 9x81=729 DLX rows.

Note: Although it is thus possible to place more than one digit in the same cell, constraints will assure that any valid solution has exactly one digit per cell.

Each column in a DLX matrix represents a constraint any valid solution must satisfy. Sudoku has four kinds of constraints:

1. Each cell in a row and column must contain exactly one digit: .
2. Each digit must occur in each row exactly once: .
3. Each digit must occur in each column exactly once: .
4. Each digit must occur in each box exactly once: .

For example, the Type 1 (row-column) constraint <5,7> is that the cell in row 5 and column 7 contains exactly one digit. The column in the DLX matrix for this constraint has a 1 in the DLX rows <1,5,7>, <2,5,7>, ..., <9,5,7>, which correspond to placing the digits 1 through 9 in the cell in row 5 and column 7. (The DLX column has a 0 in every other DLX row.) Since a valid solution will select only one of these 9 DLX rows, the cell in row 5 and column 7 will contain exactly one digit. As there are 9 possible rows and 9 possible columns, there are 9x9=81 Type 1 columns in the DLX matrix.

For example, the Type 2 (digit-row) constraint <2,5> is that the digit 2 must occur in row 5 exactly once. The column in the DLX matrix for this constraint has a 1 in the DLX rows <2,5,1>, <2,5,2>, ..., <2,5,9>, which correspond to placing the digit 2 in the cell in row 5 and columns 1 through 9. (The DLX column has a 0 in every other DLX row.) Since a valid solution will select only one of these 9 DLX rows, the digit 2 will occur in row 5 exactly once. As there are 9 possible digits and 9 possible rows, there are 9x9=81 Type 2 columns in the DLX matrix.

Similarly, there are 9x9=81 Type 3 (digit-column) constraints, one for each of 9 digits and 9 columns.

Similarly, there are 9x9=81 Type 4 (digit-box) constraints, one for each of 9 digits and 9 boxes.

Thus there are 81+81+81+81=324 DLX columns.

In summary, the DLX matrix has 729 rows and 324 columns.

Note that each DLX row contains exactly 4 1s (and 324-4=320 0s). For example, the DLX row <2,5,7>, which corresponds to placing the digit 2 in the cell in row 5 and column 7, has a 1 in the DLX columns for row-column constraint <5,7>, digit-row constraint <2,5>, digit-column constraint <2,7> and digit-box constraint <2,6> (and has a 0 in every other DLX column). (The cell in row 5 and column 7 occurs in box 6, when boxes are numbered from 1 to 9 left-to-right and top-to-bottom.)

Note that each DLX column contains exactly 9 1s (and 729-9=720 0s).

Thus the DLX matrix is sparse, and much efficiency is gained by maintaining nodes for only the 1s.

Note that the ordering of the rows and columns is irrelevant, so long as you can convert between the constraints satisfied by a move and the columns representing each constraint.

To solve a specific Sudoku problem, select the DLX rows representing the givens and then solve the DLX matrix. For example, if it is a given that the digit 2 occurs in the cell in row 5 and column 7, then the DLX row <2,5,7> is part of the solution.

No comments: