Byte Jar - Software Lessons Learned the Hard Way
A public service catalog of solutions to annoying software development problems, or
a sporadically updated rant chamber hosted by the code grunts of a tiny software company. Thanks for tuning in.
Gaps occur when rows are removed from the middle of a table with a known, increasing sequence. Finding missing IDs is invaluable during debugging and for in-field forensic investigation.
There are two ways we go about creating surrogate keys: AUTO_INCREMENT and sequence tables. AUTO_INCREMENT is probably the most popular among MySQL users, as it's a quick, transparent way to install the "next" ID into a column. We prefer to use sequence tables -- external tables, usually named <Table>_seq after the PEAR::DB tradition, that contain the "next" ID to store -- as that gives us flexibility in the formula for computing "next".
Occasionally, we'll run into a situation where we'd like to know if there are any gaps in the surrogate key ("ID") sequence. Recently, for example, a client reported a problem in our facilities management application, AERES where a record was mysteriously "deleted." Our review of the on-board audit logs showed deletion didn't occur, so we wanted to double-check the audit log by confirming there weren't any sequence gaps in the IDs for that table.
Not wanting to leave my MySQL shell, I whipped up a pure SQL solution:
SELECT A.ID + 1 AS 'Missing'
FROM theTable AS A
WHERE NOT EXISTS (
SELECT B.ID
FROM theTable AS B
WHERE A.ID + 1 = B.ID
)
GROUP BY A.ID
ORDER BY A.ID;
The logic is simple: find all IDs that do not have a next ID, then show what that next ID would be, in order. With that statement, I confirmed there wasn't any sequence gaps and that the record hadn't been deleted. (The rest of the story: based on a backup, we knew what the ID should have been, so we looked for what record had that ID. As it turns out, the original data had been overwritten owing to an improper initialization on a re-edit.)
Knowing that I'd need this functionality in the future, I set about to turn this into a method that takes the table and column, then gives back a list of missing IDs. The general solution is:
DELIMITER //
DROP PROCEDURE IF EXISTS showSequenceGaps //
CREATE PROCEDURE showSequenceGaps(IN tbl VARCHAR(255), IN col VARCHAR(255))
BEGIN
SET @stmt = CONCAT(
"SELECT A.", col, "+1 AS 'Missing' ",
"FROM ", tbl, " AS A ",
"WHERE NOT EXISTS (SELECT B.", col, " FROM ", tbl, " AS B WHERE A.", col, "+1=B.", col, ") ",
"GROUP BY A.", col, " ",
"ORDER BY A.", col
);
PREPARE stmt1 FROM @stmt;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
END //
DELIMITER ;
You can call this procedure by simply:
CALL showSequenceGaps("theTable", "ID");
Now, that stored procedure creates the needed SQL statement dynamically, then runs it and gives back the needed list. Future extension to this procedure would be to check the meta-data to ensure that the table and column exist, and to ensure that the column's type was integer-compatible.