-
Notifications
You must be signed in to change notification settings - Fork 201
Description
After a few years of active updates, Repology may overflow package.id
32-bit auto-increment field because package modifications imply removing and adding a new package in most cases. When this happens, updates are no longer possible, and the only way out of this state is manually resetting id sequence to zero. However, since where still are packages which were not touched for a long time with small ids, it will fail again as soon as it hits such a package. So some time before we run out of ids, we should forcibly update all packages with small ids to make space at the start of id sequence.
The complete algorithm is roughly as follows, and it should be implemented in code so no manual action is required.
- If packages_id_seq is above 3/4 of max, gradually force update all packages with ids < 1/2 of max
- If packages_id_seq is close to max, reset it to 0
- If packages_id_seq is between 1/4 and 3/4 of max, gradually force update all packages with ids > 1/2 of max
The first step prepares to sequence wraparound by freeing up the lower half of ID space.
The third step frees up the higher half of ID space, allowing id to grow up to maximum again.
"Close to max" here can be somewhere near 2^31-1 - num_packages
, to be sure we don't overflow id with any update. 2^31-1-10000000
should work for foreseeable future given we have 4.6M packages currently.
"Gradually force update all packages" may be performed with the following query:
update project_hashes set hash = 0 where effname in (
select distinct effname from (
select effname from packages where id < 1073741823 order by id
) limit 5000
)
This is rather cheap as it uses index scan on packages + hashagg, and free when there's nothing left to move. Since it's limited, it won't cause noticeable slowdown on updates, yet it'll be able to move everything before the next threshold is reached.
Note also that we only start moving packages when we are in the middle of current range, that gives most active packages time to migrate by natural causes, e.g. an update, and save the overhead of moving them forcibly.