PHPRO.ORG

Find Gaps And Islands In Sequence Auto Increment PgSQL MySQL

Find Gaps And Islands In Sequence Auto Increment PgSQL MySQL

The problem of finding gaps in sequential data, or auto-incremented values, or more commonly, Gaps an Islands problem can be resolved with some simple JOINs. There are solutions which use sub-queries, but these tend to be expensive.

To demonstrate simple database is required. In this example, two databases will be used, PgSQL and MySQL to show that the solution works well across both.

PgSQL

CREATE TABLE animals( id SERIAL PRIMARY KEY NOT NULL, name varchar(20) NOT NULL );

MySQL

CREATE TABLE animals( id INT PRIMARY KEY NOT NULL AUTO_INCREMENT name varchar(20) NOT NULL );

With the table created in either PgSQL or MySQL, some data is needed. Lets use a list of Australian Native animals to get started. If you are wondering which ones are dangerous, use the simple rule of wildlife in Australia.

If it is not you, it can kill you.

INSERT INTO animals ( name ) VALUES ('dingo'),('wombat'),('platypus'),('kangaroo'),('kookaburra'),('wallaby'),('echidna'),('koala'),('cassowary'),('kiwi'),('Funnel Web'),('emu'),('bandicoot'),('steve irwin'),('bilby'),('blue tongue lizard'),('tasmanian devil'),('cockatoo'),('quokka'),('frilled neck lizard'),('numbat'),('kowari'),('brolga'),('bogong moth'),('galah'),('yabby'),('wallaroo'),('goanna'),('thylacine'),('cuscus'),('phascogale ');

A simple query to show they are all in place.

SELECT * FROM animals;
id |        name         
----+---------------------
  1 | dingo
  2 | wombat
  3 | platypus
  4 | kangaroo
  5 | kookaburra
  6 | wallaby
  7 | echidna
  8 | koala
  9 | cassowary
 10 | kiwi
 11 | Funnel Web
 12 | emu
 13 | bandicoot
 14 | steve irwin
 15 | bilby
 16 | blue tongue lizard
 17 | tasmanian devil
 18 | cockatoo
 19 | quokka
 20 | frilled neck lizard
 21 | numbat
 22 | kowari
 23 | brolga
 24 | bogong moth
 25 | galah
 26 | yabby
 27 | wallaroo
 28 | goanna
 29 | thylacine
 30 | cuscus
 31 | phascogale 

Now lets create some gaps and islands

DELETE FROM animals WHERE id IN (3,5,7,15,16,17,26,27,28,29);

A quick query again to see what is left.

SELECT * FROM animals;
id |        name         
----+---------------------
  1 | dingo
  2 | wombat
  4 | kangaroo
  6 | wallaby
  8 | koala
  9 | cassowary
 10 | kiwi
 11 | Funnel Web
 12 | emu
 13 | bandicoot
 14 | steve irwin
 18 | cockatoo
 19 | quokka
 20 | frilled neck lizard
 21 | numbat
 22 | kowari
 23 | brolga
 24 | bogong moth
 25 | galah
 30 | cuscus
 31 | phascogale 

Now the job is to find gaps, or islands.

Using a subselect, the start and stop ID's of a gap can be found.

SELECT start, stop FROM (
  select m.id + 1 as start,
    (select min(id) - 1 from animals as x where x.id > m.id) as stop
  from animals as m
    left outer join animals as r on m.id = r.id - 1
  where r.id is null
) as x
WHERE stop IS NOT NULL;

The resulting records would show the starting and ending IDs of a gap.

 start | stop 
-------+------
     3 |    3
     5 |    5
     7 |    7
    15 |   17
    26 |   29

But the use of subqueries, and correlated subqueries is generally expensive, especially if your optimizer creates temp tables. So, the same result can be obtained using JOINs.

SELECT l.id + 1 AS start, min(fr.id) - 1 AS stop
FROM animals as l
    LEFT OUTER JOIN animals AS r ON l.id = r.id - 1
    LEFT OUTER JOIN animals AS fr ON l.id < fr.id
WHERE r.id IS NULL AND fr.id IS NOT NULL
GROUP BY l.id, r.id;

Another option

SELECT
  a.id+1 AS gap,
  MIN(b.id) - 1 AS To
FROM animals AS a, animals AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING a.id < MIN(b.id) - 1;