prune: Correctly count used/duplicate blobs for partially compressed repos

Counting the first occurrence of a duplicate blob as used and all other occurrences as duplicates, regardless of which instance of the blob is actually kept, is only accurate if all copies of the blob have the same size. This no longer holds for a repository that contains both compressed and uncompressed blobs. Thus, for duplicated blobs, first count all instances as duplicates and then, later on, subtract the instance that is actually used.
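The difference between the two counting strategies described above can be illustrated with a small sketch. It is not restic's code. `BlobInstance`, `naive_stats`, `corrected_stats` and `keep_smallest` are hypothetical names, and "keep the smallest copy" is only an assumed selection rule for illustration. Blob `a` exists as a 100-byte uncompressed copy and a 40-byte compressed copy. Only one of them is kept, so counting whichever copy happens to come first as "used" misstates the used size.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobInstance:
    blob_id: str
    size: int  # stored size in bytes; compressed and uncompressed copies differ


def naive_stats(instances):
    """Count the first occurrence as used and every later one as a duplicate."""
    seen = set()
    used = dup = 0
    for inst in instances:
        if inst.blob_id in seen:
            dup += inst.size
        else:
            seen.add(inst.blob_id)
            used += inst.size
    return used, dup


def corrected_stats(instances, keep):
    """Count every copy of a duplicated blob as a duplicate, then move the
    instance that is actually kept from 'duplicate' back to 'used'."""
    counts = Counter(inst.blob_id for inst in instances)
    used = dup = 0
    for inst in instances:
        if counts[inst.blob_id] > 1:
            dup += inst.size
        else:
            used += inst.size
    for blob_id, n in counts.items():
        if n > 1:
            kept = keep([i for i in instances if i.blob_id == blob_id])
            dup -= kept.size
            used += kept.size
    return used, dup


def keep_smallest(copies):
    # Hypothetical selection rule: keep the smallest (e.g. compressed) copy.
    return min(copies, key=lambda i: i.size)


instances = [BlobInstance("a", 100), BlobInstance("a", 40), BlobInstance("b", 50)]
print(naive_stats(instances))                      # (150, 40)
print(corrected_stats(instances, keep_smallest))   # (90, 100)
```

The naive variant reports 150 bytes as used although only 40 + 50 = 90 bytes survive. The corrected variant agrees with the instances that are actually kept, whichever copy the selection rule picks.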
2025-12-16 00:42:46 +00:00 · 2022-10-22 19:10:33 +02:00
parent b57d42905c
commit 05651d6d4f
2 changed files with 27 additions and 13 deletions
--- a/changelog/unreleased/issue-3918
+++ b/changelog/unreleased/issue-3918
@@ -0,0 +1,12 @@
+Bugfix: Correct prune statistics for partially compressed repositories
+
+In a partially compressed repository, one data blob can exist both in an
+uncompressed and a compressed version. This caused the prune statistics to
+become inaccurate and, for example, report a far too high value for the unused size:
+
+> unused size after prune: 16777215.991 TiB
+
+This has been fixed.
+
+https://github.com/restic/restic/issues/3918
+https://github.com/restic/restic/pull/3980
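The `16777215.991 TiB` figure quoted in the changelog lies just below 2^64 bytes (2^64 / 2^40 = 2^24 = 16777216 TiB). That is the kind of value an unsigned 64-bit counter shows once it is decremented below zero. The sketch below assumes the statistics are kept in Go-style `uint64` counters and mimics that wrap-around in Python. The 9.5 GiB deficit is an invented example value chosen to land on the reported figure, not data taken from the issue.

```python
MASK64 = 2**64 - 1
TIB = 2**40


def sub_uint64(a: int, b: int) -> int:
    """Subtract like Go's uint64 arithmetic: the result wraps modulo 2**64."""
    return (a - b) & MASK64


def format_tib(n: int) -> str:
    return f"{n / TIB:.3f} TiB"


# Hypothetical: a counter that ends up 9.5 GiB (19 * 2**29 bytes) below zero.
unused = sub_uint64(0, 19 * 2**29)
print(format_tib(unused))  # 16777215.991 TiB
```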