Compare commits

..

1 Commits

Author SHA1 Message Date
Alexander Neumann
7e72d638df Add patched chunker
Referenced in https://forum.restic.net/t/restic-slice-bounds-out-of-range/2617/20
2020-04-24 20:56:45 +02:00
355 changed files with 5544 additions and 11791 deletions

View File

@@ -39,8 +39,8 @@ Please describe the feature you'd like us to add here.
-->
What are you trying to do? What problem would this solve?
---------------------------------------------------------
What are you trying to do?
--------------------------
<!--
This section should contain a brief description what you're trying to do, which

View File

@@ -10,11 +10,11 @@ your time and add more commits. If you're done and ready for review, please
check the last box.
-->
What does this PR change? What problem does it solve?
-----------------------------------------------------
What is the purpose of this change? What does it change?
--------------------------------------------------------
<!--
Describe the changes and their purpose here, as detailed as needed.
Describe the changes here, as detailed as needed.
-->
Was the change discussed in an issue or in the forum before?
@@ -23,8 +23,8 @@ Was the change discussed in an issue or in the forum before?
<!--
Link issues and relevant forum posts here.
If this PR resolves an issue on GitHub, use "Closes #1234" so that the issue
is closed automatically when this PR is merged.
If this PR resolves an issue on GitHub, use "closes #1234" so that the issue is
closed automatically when this PR is merged.
-->
Checklist

1
.gitignore vendored
View File

@@ -1,3 +1,2 @@
/restic
/.vagrant
/.vscode

View File

@@ -3,6 +3,22 @@ sudo: false
matrix:
include:
- os: linux
go: "1.11.x"
env: RESTIC_TEST_FUSE=0 RESTIC_TEST_CLOUD_BACKENDS=0
cache:
directories:
- $HOME/.cache/go-build
- $HOME/gopath/pkg/mod
- os: linux
go: "1.12.x"
env: RESTIC_TEST_FUSE=0 RESTIC_TEST_CLOUD_BACKENDS=0
cache:
directories:
- $HOME/.cache/go-build
- $HOME/gopath/pkg/mod
- os: linux
go: "1.13.x"
env: RESTIC_TEST_FUSE=0 RESTIC_TEST_CLOUD_BACKENDS=0
@@ -11,17 +27,9 @@ matrix:
- $HOME/.cache/go-build
- $HOME/gopath/pkg/mod
- os: linux
go: "1.14.x"
env: RESTIC_TEST_FUSE=0 RESTIC_TEST_CLOUD_BACKENDS=0
cache:
directories:
- $HOME/.cache/go-build
- $HOME/gopath/pkg/mod
# only run fuse and cloud backends tests on Travis for the latest Go on Linux
- os: linux
go: "1.15.x"
go: "1.14.x"
sudo: true
cache:
directories:
@@ -29,7 +37,7 @@ matrix:
- $HOME/gopath/pkg/mod
- os: osx
go: "1.15.x"
go: "1.14.x"
env: RESTIC_TEST_FUSE=0 RESTIC_TEST_CLOUD_BACKENDS=0
cache:
directories:

View File

@@ -1,642 +1,3 @@
Changelog for restic 0.11.0 (2020-11-05)
=======================================
The following sections list the changes in restic 0.11.0 relevant to
restic users. The changes are ordered by importance.
Summary
-------
* Fix #1212: Restore timestamps and permissions on intermediate directories
* Fix #1756: Mark repository files as read-only when using the local backend
* Fix #2241: Hide password in REST backend repository URLs
* Fix #2319: Correctly dump directories into tar files
* Fix #2491: Don't require `self-update --output` placeholder file
* Fix #2834: Fix rare cases of backup command hanging forever
* Fix #2938: Fix manpage formatting
* Fix #2942: Make --exclude-larger-than handle disappearing files
* Fix #2951: Restic generate, help and self-update no longer check passwords
* Fix #2979: Make snapshots --json output [] instead of null when no snapshots
* Enh #2969: Optimize check for unchanged files during backup
* Enh #340: Add support for Volume Shadow Copy Service (VSS) on Windows
* Enh #2849: Authenticate to Google Cloud Storage with access token
* Enh #1458: New option --repository-file
* Enh #2978: Warn if parent snapshot cannot be loaded during backup
Details
-------
* Bugfix #1212: Restore timestamps and permissions on intermediate directories
When using the `--include` option of the restore command, restic restored timestamps and
permissions only on directories selected by the include pattern. Intermediate directories,
which are necessary to restore files located in sub- directories, were created with default
permissions. We've fixed the restore command to restore timestamps and permissions for these
directories as well.
https://github.com/restic/restic/issues/1212
https://github.com/restic/restic/issues/1402
https://github.com/restic/restic/pull/2906
* Bugfix #1756: Mark repository files as read-only when using the local backend
Files stored in a local repository were marked as writeable on the filesystem for non-Windows
systems, which did not prevent accidental file modifications outside of restic. In addition,
the local backend did not work with certain filesystems and network mounts which do not permit
modifications of file permissions.
Restic now marks files stored in a local repository as read-only on the filesystem on
non-Windows systems. The error handling is improved to support more filesystems.
https://github.com/restic/restic/issues/1756
https://github.com/restic/restic/issues/2157
https://github.com/restic/restic/pull/2989
* Bugfix #2241: Hide password in REST backend repository URLs
When using a password in the REST backend repository URL, the password could in some cases be
included in the output from restic, e.g. when initializing a repo or during an error.
The password is now replaced with "***" where applicable.
https://github.com/restic/restic/issues/2241
https://github.com/restic/restic/pull/2658
* Bugfix #2319: Correctly dump directories into tar files
The dump command previously wrote directories in a tar file in a way which can cause
compatibility problems. This caused, for example, 7zip on Windows to not open tar files
containing directories. In addition it was not possible to dump directories with extended
attributes. These compatibility problems are now corrected.
In addition, a tar file now includes the name of the owner and group of a file.
https://github.com/restic/restic/issues/2319
https://github.com/restic/restic/pull/3039
* Bugfix #2491: Don't require `self-update --output` placeholder file
`restic self-update --output /path/to/new-restic` used to require that new-restic was an
existing file, to be overwritten. Now it's possible to download an updated restic binary to a
new path, without first having to create a placeholder file.
https://github.com/restic/restic/issues/2491
https://github.com/restic/restic/pull/2937
* Bugfix #2834: Fix rare cases of backup command hanging forever
We've fixed an issue with the backup progress reporting which could cause restic to hang
forever right before finishing a backup.
https://github.com/restic/restic/issues/2834
https://github.com/restic/restic/pull/2963
* Bugfix #2938: Fix manpage formatting
The manpage formatting in restic v0.10.0 was garbled, which is fixed now.
https://github.com/restic/restic/issues/2938
https://github.com/restic/restic/pull/2977
* Bugfix #2942: Make --exclude-larger-than handle disappearing files
There was a small bug in the backup command's --exclude-larger-than option where files that
disappeared between scanning and actually backing them up to the repository caused a panic.
This is now fixed.
https://github.com/restic/restic/issues/2942
* Bugfix #2951: Restic generate, help and self-update no longer check passwords
The commands `restic cache`, `generate`, `help` and `self-update` don't need passwords, but
they previously did run the RESTIC_PASSWORD_COMMAND (if set in the environment), prompting
users to authenticate for no reason. They now skip running the password command.
https://github.com/restic/restic/issues/2951
https://github.com/restic/restic/pull/2987
* Bugfix #2979: Make snapshots --json output [] instead of null when no snapshots
Restic previously output `null` instead of `[]` for the `--json snapshots` command, when
there were no snapshots in the repository. This caused some minor problems when parsing the
output, but is now fixed such that `[]` is output when the list of snapshots is empty.
https://github.com/restic/restic/issues/2979
https://github.com/restic/restic/pull/2984
* Enhancement #2969: Optimize check for unchanged files during backup
During a backup restic skips processing files which have not changed since the last backup run.
Previously this required opening each file once which can be slow on network filesystems. The
backup command now checks for file changes before opening a file. This considerably reduces
the time to create a backup on network filesystems.
https://github.com/restic/restic/issues/2969
https://github.com/restic/restic/pull/2970
* Enhancement #340: Add support for Volume Shadow Copy Service (VSS) on Windows
Volume Shadow Copy Service allows read access to files that are locked by another process using
an exclusive lock through a filesystem snapshot. Restic was unable to backup those files
before. This update enables backing up these files.
This needs to be enabled explicitely using the --use-fs-snapshot option of the backup
command.
https://github.com/restic/restic/issues/340
https://github.com/restic/restic/pull/2274
* Enhancement #2849: Authenticate to Google Cloud Storage with access token
When using the GCS backend, it is now possible to authenticate with OAuth2 access tokens
instead of a credentials file by setting the GOOGLE_ACCESS_TOKEN environment variable.
https://github.com/restic/restic/pull/2849
* Enhancement #1458: New option --repository-file
We've added a new command-line option --repository-file as an alternative to -r. This allows
to read the repository URL from a file in order to prevent certain types of information leaks,
especially for URLs containing credentials.
https://github.com/restic/restic/issues/1458
https://github.com/restic/restic/issues/2900
https://github.com/restic/restic/pull/2910
* Enhancement #2978: Warn if parent snapshot cannot be loaded during backup
During a backup restic uses the parent snapshot to check whether a file was changed and has to be
backed up again. For this check the backup has to read the directories contained in the old
snapshot. If a tree blob cannot be loaded, restic now warns about this problem with the backup
repository.
https://github.com/restic/restic/pull/2978
Changelog for restic 0.10.0 (2020-09-19)
=======================================
The following sections list the changes in restic 0.10.0 relevant to
restic users. The changes are ordered by importance.
Summary
-------
* Fix #1863: Report correct number of directories processed by backup
* Fix #2254: Fix tar issues when dumping `/`
* Fix #2281: Handle format verbs like '%' properly in `find` output
* Fix #2298: Do not hang when run as a background job
* Fix #2389: Fix mangled json output of backup command
* Fix #2390: Refresh lock timestamp
* Fix #2429: Backup --json reports total_bytes_processed as 0
* Fix #2469: Fix incorrect bytes stats in `diff` command
* Fix #2518: Do not crash with Synology NAS sftp server
* Fix #2531: Fix incorrect size calculation in `stats --mode restore-size`
* Fix #2537: Fix incorrect file counts in `stats --mode restore-size`
* Fix #2592: SFTP backend supports IPv6 addresses
* Fix #2607: Honor RESTIC_CACHE_DIR environment variable on Mac and Windows
* Fix #2668: Don't abort the stats command when data blobs are missing
* Fix #2674: Add stricter prune error checks
* Fix #2899: Fix possible crash in the progress bar of check --read-data
* Chg #2482: Remove vendored dependencies
* Chg #2546: Return exit code 3 when failing to backup all source data
* Chg #2600: Update dependencies, require Go >= 1.13
* Chg #1597: Honor the --no-lock flag in the mount command
* Enh #1570: Support specifying multiple host flags for various commands
* Enh #1680: Optimize `restic mount`
* Enh #2072: Display snapshot date when using `restic find`
* Enh #2175: Allow specifying user and host when creating keys
* Enh #2277: Add support for ppc64le
* Enh #2395: Ignore sync errors when operation not supported by local filesystem
* Enh #2427: Add flag `--iexclude-file` to backup command
* Enh #2569: Support excluding files by their size
* Enh #2571: Self-heal missing file parts during backup of unchanged files
* Enh #2858: Support filtering snapshots by tag and path in the stats command
* Enh #323: Add command for copying snapshots between repositories
* Enh #551: Use optimized library for hash calculation of file chunks
* Enh #2195: Simplify and improve restore performance
* Enh #2328: Improve speed of check command
* Enh #2423: Support user@domain parsing as user
* Enh #2576: Improve the chunking algorithm
* Enh #2598: Improve speed of diff command
* Enh #2599: Slightly reduce memory usage of prune and stats commands
* Enh #2733: S3 backend: Add support for WebIdentityTokenFile
* Enh #2773: Optimize handling of new index entries
* Enh #2781: Reduce memory consumption of in-memory index
* Enh #2786: Optimize `list blobs` command
* Enh #2790: Optimized file access in restic mount
* Enh #2840: Speed-up file deletion in forget, prune and rebuild-index
Details
-------
* Bugfix #1863: Report correct number of directories processed by backup
The directory statistics calculation was fixed to report the actual number of processed
directories instead of always zero.
https://github.com/restic/restic/issues/1863
* Bugfix #2254: Fix tar issues when dumping `/`
We've fixed an issue with dumping either `/` or files on the first sublevel e.g. `/foo` to tar.
This also fixes tar dumping issues on Windows where this issue could also happen.
https://github.com/restic/restic/issues/2254
https://github.com/restic/restic/issues/2357
https://github.com/restic/restic/pull/2255
* Bugfix #2281: Handle format verbs like '%' properly in `find` output
The JSON or "normal" output of the `find` command can now deal with file names that contain
substrings which the Golang `fmt` package considers "format verbs" like `%s`.
https://github.com/restic/restic/issues/2281
* Bugfix #2298: Do not hang when run as a background job
Restic did hang on exit while restoring the terminal configuration when it was started as a
background job, for example using `restic ... &`. This has been fixed by only restoring the
terminal configuration when restic is interrupted while reading a password from the
terminal.
https://github.com/restic/restic/issues/2298
* Bugfix #2389: Fix mangled json output of backup command
We've fixed a race condition in the json output of the backup command that could cause multiple
lines to get mixed up. We've also ensured that the backup summary is printed last.
https://github.com/restic/restic/issues/2389
https://github.com/restic/restic/pull/2545
* Bugfix #2390: Refresh lock timestamp
Long-running operations did not refresh lock timestamp, resulting in locks becoming stale.
This is now fixed.
https://github.com/restic/restic/issues/2390
* Bugfix #2429: Backup --json reports total_bytes_processed as 0
We've fixed the json output of total_bytes_processed. The non-json output was already fixed
with pull request #2138 but left the json output untouched.
https://github.com/restic/restic/issues/2429
* Bugfix #2469: Fix incorrect bytes stats in `diff` command
In some cases, the wrong number of bytes (e.g. 16777215.998 TiB) were reported by the `diff`
command. This is now fixed.
https://github.com/restic/restic/issues/2469
* Bugfix #2518: Do not crash with Synology NAS sftp server
It was found that when restic is used to store data on an sftp server on a Synology NAS with a
relative path (one which does not start with a slash), it may go into an endless loop trying to
create directories on the server. We've fixed this bug by using a function in the sftp library
instead of our own implementation.
The bug was discovered because the Synology sftp server behaves erratic with non-absolute
path (e.g. `home/restic-repo`). This can be resolved by just using an absolute path instead
(`/home/restic-repo`). We've also added a paragraph in the FAQ.
https://github.com/restic/restic/issues/2518
https://github.com/restic/restic/issues/2363
https://github.com/restic/restic/pull/2530
* Bugfix #2531: Fix incorrect size calculation in `stats --mode restore-size`
The restore-size mode of stats was counting hard-linked files as if they were independent.
https://github.com/restic/restic/issues/2531
* Bugfix #2537: Fix incorrect file counts in `stats --mode restore-size`
The restore-size mode of stats was failing to count empty directories and some files with hard
links.
https://github.com/restic/restic/issues/2537
* Bugfix #2592: SFTP backend supports IPv6 addresses
The SFTP backend now supports IPv6 addresses natively, without relying on aliases in the
external SSH configuration.
https://github.com/restic/restic/pull/2592
* Bugfix #2607: Honor RESTIC_CACHE_DIR environment variable on Mac and Windows
On Mac and Windows, the RESTIC_CACHE_DIR environment variable was ignored. This variable can
now be used on all platforms to set the directory where restic stores caches.
https://github.com/restic/restic/pull/2607
* Bugfix #2668: Don't abort the stats command when data blobs are missing
Runing the stats command in the blobs-per-file mode on a repository with missing data blobs
previously resulted in a crash.
https://github.com/restic/restic/pull/2668
* Bugfix #2674: Add stricter prune error checks
Additional checks were added to the prune command in order to improve resiliency to backend,
hardware and/or networking issues. The checks now detect a few more cases where such outside
factors could potentially cause data loss.
https://github.com/restic/restic/pull/2674
* Bugfix #2899: Fix possible crash in the progress bar of check --read-data
We've fixed a possible crash while displaying the progress bar for the check --read-data
command. The crash occurred when the length of the progress bar status exceeded the terminal
width, which only happened for very narrow terminal windows.
https://github.com/restic/restic/pull/2899
https://forum.restic.net/t/restic-rclone-pcloud-connection-issues/2963/15
* Change #2482: Remove vendored dependencies
We've removed the vendored dependencies (in the subdir `vendor/`). When building restic, the
Go compiler automatically fetches the dependencies. It will also cryptographically verify
that the correct code has been fetched by using the hashes in `go.sum` (see the link to the
documentation below).
https://github.com/restic/restic/issues/2482
https://golang.org/cmd/go/#hdr-Module_downloading_and_verification
* Change #2546: Return exit code 3 when failing to backup all source data
The backup command used to return a zero exit code as long as a snapshot could be created
successfully, even if some of the source files could not be read (in which case the snapshot
would contain the rest of the files).
This made it hard for automation/scripts to detect failures/incomplete backups by looking at
the exit code. Restic now returns the following exit codes for the backup command:
- 0 when the command was successful - 1 when there was a fatal error (no snapshot created) - 3 when
some source data could not be read (incomplete snapshot created)
https://github.com/restic/restic/issues/956
https://github.com/restic/restic/issues/2064
https://github.com/restic/restic/issues/2526
https://github.com/restic/restic/issues/2364
https://github.com/restic/restic/pull/2546
* Change #2600: Update dependencies, require Go >= 1.13
Restic now requires Go to be at least 1.13. This allows simplifications in the build process and
removing workarounds.
This is also probably the last version of restic still supporting mounting repositories via
fuse on macOS. The library we're using for fuse does not support macOS any more and osxfuse is not
open source any more.
https://github.com/bazil/fuse/issues/224
https://github.com/osxfuse/osxfuse/issues/590
https://github.com/restic/restic/pull/2600
https://github.com/restic/restic/pull/2852
https://github.com/restic/restic/pull/2927
* Change #1597: Honor the --no-lock flag in the mount command
The mount command now does not lock the repository if given the --no-lock flag. This allows to
mount repositories which are archived on a read only backend/filesystem.
https://github.com/restic/restic/issues/1597
https://github.com/restic/restic/pull/2821
* Enhancement #1570: Support specifying multiple host flags for various commands
Previously commands didn't take more than one `--host` or `-H` argument into account, which
could be limiting with e.g. the `forget` command.
The `dump`, `find`, `forget`, `ls`, `mount`, `restore`, `snapshots`, `stats` and `tag`
commands will now take into account multiple `--host` and `-H` flags.
https://github.com/restic/restic/issues/1570
* Enhancement #1680: Optimize `restic mount`
We've optimized the FUSE implementation used within restic. `restic mount` is now more
responsive and uses less memory.
https://github.com/restic/restic/issues/1680
https://github.com/restic/restic/pull/2587
https://github.com/restic/restic/pull/2787
* Enhancement #2072: Display snapshot date when using `restic find`
Added the respective snapshot date to the output of `restic find`.
https://github.com/restic/restic/issues/2072
* Enhancement #2175: Allow specifying user and host when creating keys
When adding a new key to the repository, the username and hostname for the new key can be
specified on the command line. This allows overriding the defaults, for example if you would
prefer to use the FQDN to identify the host or if you want to add keys for several different hosts
without having to run the key add command on those hosts.
https://github.com/restic/restic/issues/2175
* Enhancement #2277: Add support for ppc64le
Adds support for ppc64le, the processor architecture from IBM.
https://github.com/restic/restic/issues/2277
* Enhancement #2395: Ignore sync errors when operation not supported by local filesystem
The local backend has been modified to work with filesystems which doesn't support the `sync`
operation. This operation is normally used by restic to ensure that data files are fully
written to disk before continuing.
For these limited filesystems, saving a file in the backend would previously fail with an
"operation not supported" error. This error is now ignored, which means that e.g. an SMB mount
on macOS can now be used as storage location for a repository.
https://github.com/restic/restic/issues/2395
https://forum.restic.net/t/sync-errors-on-mac-over-smb/1859
* Enhancement #2427: Add flag `--iexclude-file` to backup command
The backup command now supports the flag `--iexclude-file` which is a case-insensitive
version of `--exclude-file`.
https://github.com/restic/restic/issues/2427
https://github.com/restic/restic/pull/2898
* Enhancement #2569: Support excluding files by their size
The `backup` command now supports the `--exclude-larger-than` option to exclude files which
are larger than the specified maximum size. This can for example be useful to exclude
unimportant files with a large file size.
https://github.com/restic/restic/issues/2569
https://github.com/restic/restic/pull/2914
* Enhancement #2571: Self-heal missing file parts during backup of unchanged files
We've improved the resilience of restic to certain types of repository corruption.
For files that are unchanged since the parent snapshot, the backup command now verifies that
all parts of the files still exist in the repository. Parts that are missing, e.g. from a damaged
repository, are backed up again. This verification was already run for files that were
modified since the parent snapshot, but is now also done for unchanged files.
Note that restic will not backup file parts that are referenced in the index but where the actual
data is not present on disk, as this situation can only be detected by restic check. Please
ensure that you run `restic check` regularly.
https://github.com/restic/restic/issues/2571
https://github.com/restic/restic/pull/2827
* Enhancement #2858: Support filtering snapshots by tag and path in the stats command
We've added filtering snapshots by `--tag tagList` and by `--path path` to the `stats`
command. This includes filtering of only 'latest' snapshots or all snapshots in a repository.
https://github.com/restic/restic/issues/2858
https://github.com/restic/restic/pull/2859
https://forum.restic.net/t/stats-for-a-host-and-filtered-snapshots/3020
* Enhancement #323: Add command for copying snapshots between repositories
We've added a copy command, allowing you to copy snapshots from one repository to another.
Note that this process will have to read (download) and write (upload) the entire snapshot(s)
due to the different encryption keys used on the source and destination repository. Also, the
transferred files are not re-chunked, which may break deduplication between files already
stored in the destination repo and files copied there using this command.
To fully support deduplication between repositories when the copy command is used, the init
command now supports the `--copy-chunker-params` option, which initializes the new
repository with identical parameters for splitting files into chunks as an already existing
repository. This allows copied snapshots to be equally deduplicated in both repositories.
https://github.com/restic/restic/issues/323
https://github.com/restic/restic/pull/2606
https://github.com/restic/restic/pull/2928
* Enhancement #551: Use optimized library for hash calculation of file chunks
We've switched the library used to calculate the hashes of file chunks, which are used for
deduplication, to the optimized Minio SHA-256 implementation.
Depending on the CPU it improves the hashing throughput by 10-30%. Modern x86 CPUs with the SHA
Extension should be about two to three times faster.
https://github.com/restic/restic/issues/551
https://github.com/restic/restic/pull/2709
* Enhancement #2195: Simplify and improve restore performance
Significantly improves restore performance of large files (i.e. 50M+):
https://github.com/restic/restic/issues/2074
https://forum.restic.net/t/restore-using-rclone-gdrive-backend-is-slow/1112/8
https://forum.restic.net/t/degraded-restore-performance-s3-backend/1400
Fixes "not enough cache capacity" error during restore:
https://github.com/restic/restic/issues/2244
NOTE: This new implementation does not guarantee order in which blobs are written to the target
files and, for example, the last blob of a file can be written to the file before any of the
preceeding file blobs. It is therefore possible to have gaps in the data written to the target
files if restore fails or interrupted by the user.
The implementation will try to preallocate space for the restored files on the filesystem to
prevent file fragmentation. This ensures good read performance for large files, like for
example VM images. If preallocating space is not supported by the filesystem, then this step is
silently skipped.
https://github.com/restic/restic/pull/2195
https://github.com/restic/restic/pull/2893
* Enhancement #2328: Improve speed of check command
We've improved the check command to traverse trees only once independent of whether they are
contained in multiple snapshots. The check command is now much faster for repositories with a
large number of snapshots.
https://github.com/restic/restic/issues/2284
https://github.com/restic/restic/pull/2328
* Enhancement #2423: Support user@domain parsing as user
Added the ability for user@domain-like users to be authenticated over SFTP servers.
https://github.com/restic/restic/pull/2423
* Enhancement #2576: Improve the chunking algorithm
We've updated the chunker library responsible for splitting files into smaller blocks. It
should improve the chunking throughput by 5-15% depending on the CPU.
https://github.com/restic/restic/issues/2820
https://github.com/restic/restic/pull/2576
https://github.com/restic/restic/pull/2845
* Enhancement #2598: Improve speed of diff command
We've improved the performance of the diff command when comparing snapshots with similar
content. It should run up to twice as fast as before.
https://github.com/restic/restic/pull/2598
* Enhancement #2599: Slightly reduce memory usage of prune and stats commands
The prune and the stats command kept directory identifiers in memory twice while searching for
used blobs.
https://github.com/restic/restic/pull/2599
* Enhancement #2733: S3 backend: Add support for WebIdentityTokenFile
We've added support for EKS IAM roles for service accounts feature to the S3 backend.
https://github.com/restic/restic/issues/2703
https://github.com/restic/restic/pull/2733
* Enhancement #2773: Optimize handling of new index entries
Restic now uses less memory for backups which add a lot of data, e.g. large initial backups. In
addition, we've improved the stability in some edge cases.
https://github.com/restic/restic/pull/2773
* Enhancement #2781: Reduce memory consumption of in-memory index
We've improved how the index is stored in memory. This change can reduce memory usage for large
repositories by up to 50% (depending on the operation).
https://github.com/restic/restic/pull/2781
https://github.com/restic/restic/pull/2812
* Enhancement #2786: Optimize `list blobs` command
We've changed the implementation of `list blobs` which should be now a bit faster and consume
almost no memory even for large repositories.
https://github.com/restic/restic/pull/2786
* Enhancement #2790: Optimized file access in restic mount
Reading large (> 100GiB) files from restic mountpoints is now faster, and the speedup is
greater for larger files.
https://github.com/restic/restic/pull/2790
* Enhancement #2840: Speed-up file deletion in forget, prune and rebuild-index
We've sped up the file deletion for the commands forget, prune and rebuild-index, especially
for remote repositories. Deletion was sequential before and is now run in parallel.
https://github.com/restic/restic/pull/2840
Changelog for restic 0.9.6 (2019-11-22)
=======================================
@@ -2000,10 +1361,10 @@ Details
Exploiting the vulnerability requires a Linux/Unix system which saves backups via restic and
a Windows systems which restores files from the repo. In addition, the attackers need to be able
to create files with arbitrary names which are then saved to the restic repo. For example, by
creating a file named "..\test.txt" (which is a perfectly legal filename on Linux) and
restoring a snapshot containing this file on Windows, it would be written to the parent of the
target directory.
to create create files with arbitrary names which are then saved to the restic repo. For
example, by creating a file named "..\test.txt" (which is a perfectly legal filename on Linux)
and restoring a snapshot containing this file on Windows, it would be written to the parent of
the target directory.
We'd like to thank Tyler Spivey for reporting this responsibly!

View File

@@ -16,8 +16,7 @@ help also.
The restic project uses the GitHub infrastructure (see the
[project page](https://github.com/restic/restic)) for all related discussions
as well as the [forum](https://forum.restic.net/) and the `#restic` channel
on [irc.freenode.net](https://kiwiirc.com/nextclient/irc.freenode.net/restic).
as well as the `#restic` channel on `irc.freenode.net`.
If you want to find an area that currently needs improving have a look at the
open issues listed at the
@@ -26,10 +25,7 @@ for discussing enhancement to the restic tools.
If you are unsure what to do, please have a look at the issues, especially
those tagged
[minor complexity](https://github.com/restic/restic/labels/help%3A%20minor%20complexity)
or [good first issue](https://github.com/restic/restic/labels/help%3A%20good%20first%20issue).
If you are already a bit experienced with the restic internals, take a look
at the issues tagged as [help wanted](https://github.com/restic/restic/labels/help%3A%20wanted).
[minor complexity](https://github.com/restic/restic/labels/minor%20complexity).
Reporting Bugs
@@ -64,11 +60,16 @@ uploading it somewhere or post only the parts that are really relevant.
Development Environment
=======================
The repository contains the code written for restic in the directories
`cmd/` and `internal/`.
The repository contains several sets of directories with code: `cmd/` and
`internal/` contain the code written for restic, whereas `vendor/` contains
copies of libraries restic depends on. The libraries are managed with the
command `go mod vendor`.
Restic requires Go version 1.13 or later for compiling. Clone the repo (without
having `$GOPATH` set) and `cd` into the directory:
Go >= 1.11
----------
For Go version 1.11 or later, you should clone the repo (without having
`$GOPATH` set) and `cd` into the directory:
$ unset GOPATH
$ git clone https://github.com/restic/restic
@@ -78,12 +79,40 @@ Then use the `go` tool to build restic:
$ go build ./cmd/restic
$ ./restic version
restic 0.10.0-dev (compiled manually) compiled with go1.15.2 on linux/amd64
restic 0.9.2-dev (compiled manually) compiled with go1.11 on linux/amd64
You can run all tests with the following command:
$ go test ./...
Go < 1.11
---------
In order to compile restic with Go before 1.11, it needs to be checked out at
the right path within a `GOPATH`. The concept of a `GOPATH` is explained in
["How to write Go code"](https://golang.org/doc/code.html).
If you do not have a directory with Go code yet, executing the following
instructions in your shell will create one for you and check out the restic
repo:
$ export GOPATH="$HOME/go"
$ mkdir -p "$GOPATH/src/github.com/restic"
$ cd "$GOPATH/src/github.com/restic"
$ git clone https://github.com/restic/restic
$ cd restic
You can then build restic as follows:
$ go build ./cmd/restic
$ ./restic version
restic compiled manually
compiled with go1.8.3 on linux/amd64
The following commands can be used to run all the tests:
$ go test ./...
Providing Patches
=================
@@ -96,14 +125,15 @@ down to the following steps:
GitHub. For a new feature, please add an issue before starting to work on
it, so that duplicate work is prevented.
1. Next, fork our project on GitHub if you haven't done so already.
1. First we would kindly ask you to fork our project on GitHub if you haven't
done so already.
2. Clone your fork of the repository locally and **create a new branch** for
your changes. If you are working on the code itself, please set up the
development environment as described in the previous section.
2. Clone the repository locally and create a new branch. If you are working on
the code itself, please set up the development environment as described in
the previous section.
3. Commit your changes to the new branch as fine grained as possible, as
smaller patches, for individual changes, are easier to discuss and merge.
3. Then commit your changes as fine grained as possible, as smaller patches,
that handle one and only one issue are easier to discuss and merge.
4. Push the new branch with your changes to your fork of the repository.
@@ -116,19 +146,20 @@ down to the following steps:
existing commit, use common sense to decide which is better), they will be
automatically added to the pull request.
7. If your pull request changes anything that users should be aware of
(a bugfix, a new feature, ...) please add an entry as a new file in
`changelog/unreleased` including the issue number in the filename (e.g.
`issue-8756`). Use the template in `changelog/TEMPLATE` for the content.
It will be used in the announcement of the next stable release. While
writing, ask yourself: If I were the user, what would I need to be aware
of with this change?
7. If your pull request changes anything that users should be aware
of (a bugfix, a new feature, ...) please add an entry as a new
file in `changelog/unreleased` including the issue number in the
filename (e.g. `issue-8756`). Use the template in
`changelog/TEMPLATE` for the content. It will be used in the
announcement of the next stable release. While writing, ask
yourself: If I were the user, what would I need to be aware of
with this change.
8. Once your code looks good and passes all the tests, we'll merge it. Thanks
a lot for your contribution!
Please provide the patches for each bug or feature in a separate branch and
open up a pull request for each, as this simplifies discussion and merging.
open up a pull request for each.
The restic project uses the `gofmt` tool for Go source indentation, so please
run

114
README.md
View File

@@ -1,114 +0,0 @@
[![Documentation](https://readthedocs.org/projects/restic/badge/?version=latest)](https://restic.readthedocs.io/en/latest/?badge=latest)
[![Build Status Travis](https://travis-ci.com/restic/restic.svg?branch=master)](https://travis-ci.com/restic/restic)
[![Build Status AppVeyor](https://ci.appveyor.com/api/projects/status/nuy4lfbgfbytw92q/branch/master?svg=true)](https://ci.appveyor.com/project/fd0/restic/branch/master)
[![Go Report Card](https://goreportcard.com/badge/github.com/restic/restic)](https://goreportcard.com/report/github.com/restic/restic)
# Introduction
restic is a backup program that is fast, efficient and secure. It supports the three major operating systems (Linux, macOS, Windows) and a few smaller ones (FreeBSD, OpenBSD).
For detailed usage and installation instructions check out the [documentation](https://restic.readthedocs.io/en/latest).
You can ask questions in our [Discourse forum](https://forum.restic.net).
Quick start
-----------
Once you've [installed](https://restic.readthedocs.io/en/latest/020_installation.html) restic, start
off with creating a repository for your backups:
$ restic init --repo /tmp/backup
enter password for new backend:
enter password again:
created restic backend 085b3c76b9 at /tmp/backup
Please note that knowledge of your password is required to access the repository.
Losing your password means that your data is irrecoverably lost.
and add some data:
$ restic --repo /tmp/backup backup ~/work
enter password for repository:
scan [/home/user/work]
scanned 764 directories, 1816 files in 0:00
[0:29] 100.00% 54.732 MiB/s 1.582 GiB / 1.582 GiB 2580 / 2580 items 0 errors ETA 0:00
duration: 0:29, 54.47MiB/s
snapshot 40dc1520 saved
Next you can either use `restic restore` to restore files or use `restic
mount` to mount the repository via fuse and browse the files from previous
snapshots.
For more options check out the [online documentation](https://restic.readthedocs.io/en/latest/).
# Backends
Saving a backup on the same machine is nice but not a real backup strategy.
Therefore, restic supports the following backends for storing backups natively:
- [Local directory](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#local)
- [sftp server (via SSH)](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#sftp)
- [HTTP REST server](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#rest-server) ([protocol](doc/100_references.rst#rest-backend), [rest-server](https://github.com/restic/rest-server))
- [AWS S3](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#amazon-s3) (either from Amazon or using the [Minio](https://minio.io) server)
- [OpenStack Swift](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#openstack-swift)
- [BackBlaze B2](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#backblaze-b2)
- [Microsoft Azure Blob Storage](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#microsoft-azure-blob-storage)
- [Google Cloud Storage](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#google-cloud-storage)
- And many other services via the [rclone](https://rclone.org) [Backend](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#other-services-via-rclone)
# Design Principles
Restic is a program that does backups right and was designed with the
following principles in mind:
- **Easy:** Doing backups should be a frictionless process, otherwise
you might be tempted to skip it. Restic should be easy to configure
and use, so that, in the event of a data loss, you can just restore
it. Likewise, restoring data should not be complicated.
- **Fast**: Backing up your data with restic should only be limited by
your network or hard disk bandwidth so that you can backup your files
every day. Nobody does backups if it takes too much time. Restoring
backups should only transfer data that is needed for the files that
are to be restored, so that this process is also fast.
- **Verifiable**: Much more important than backup is restore, so restic
enables you to easily verify that all data can be restored.
- **Secure**: Restic uses cryptography to guarantee confidentiality and
integrity of your data. The location the backup data is stored is
assumed not to be a trusted environment (e.g. a shared space where
others like system administrators are able to access your backups).
Restic is built to secure your data against such attackers.
- **Efficient**: With the growth of data, additional snapshots should
only take the storage of the actual increment. Even more, duplicate
data should be de-duplicated before it is actually written to the
storage back end to save precious backup space.
# Reproducible Builds
The binaries released with each restic version starting at 0.6.1 are
[reproducible](https://reproducible-builds.org/), which means that you can
reproduce a byte identical version from the source code for that
release. Instructions on how to do that are contained in the
[builder repository](https://github.com/restic/builder).
News
----
You can follow the restic project on Twitter [@resticbackup](https://twitter.com/resticbackup) or by subscribing to
the [project blog](https://restic.net/blog/).
License
-------
Restic is licensed under [BSD 2-Clause License](https://opensource.org/licenses/BSD-2-Clause). You can find the
complete text in [``LICENSE``](LICENSE).
Sponsorship
-----------
Backend integration tests for Google Cloud Storage and Microsoft Azure Blob
Storage are sponsored by [AppsCode](https://appscode.com)!
[![Sponsored by AppsCode](https://cdn.appscode.com/images/logo/appscode/ac-logo-color.png)](https://appscode.com)

135
README.rst Normal file
View File

@@ -0,0 +1,135 @@
|Documentation| |Build Status| |Build status| |Report Card| |Say Thanks| |Reviewed by Hound|
Introduction
------------
restic is a backup program that is fast, efficient and secure. It supports the three major operating systems (Linux, macOS, Windows) and a few smaller ones (FreeBSD, OpenBSD).
For detailed usage and installation instructions check out the `documentation <https://restic.readthedocs.io/en/latest>`__.
You can ask questions in our `Discourse forum <https://forum.restic.net>`__.
Quick start
-----------
Once you've `installed
<https://restic.readthedocs.io/en/latest/020_installation.html>`__ restic, start
off with creating a repository for your backups:
.. code-block:: console
$ restic init --repo /tmp/backup
enter password for new backend:
enter password again:
created restic backend 085b3c76b9 at /tmp/backup
Please note that knowledge of your password is required to access the repository.
Losing your password means that your data is irrecoverably lost.
and add some data:
.. code-block:: console
$ restic --repo /tmp/backup backup ~/work
enter password for repository:
scan [/home/user/work]
scanned 764 directories, 1816 files in 0:00
[0:29] 100.00% 54.732 MiB/s 1.582 GiB / 1.582 GiB 2580 / 2580 items 0 errors ETA 0:00
duration: 0:29, 54.47MiB/s
snapshot 40dc1520 saved
Next you can either use ``restic restore`` to restore files or use ``restic
mount`` to mount the repository via fuse and browse the files from previous
snapshots.
For more options check out the `online documentation <https://restic.readthedocs.io/en/latest/>`__.
Backends
--------
Saving a backup on the same machine is nice but not a real backup strategy.
Therefore, restic supports the following backends for storing backups natively:
- `Local directory <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#local>`__
- `sftp server (via SSH) <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#sftp>`__
- `HTTP REST server <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#rest-server>`__ (`protocol <doc/100_references.rst#rest-backend>`__ `rest-server <https://github.com/restic/rest-server>`__)
- `AWS S3 <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#amazon-s3>`__ (either from Amazon or using the `Minio <https://minio.io>`__ server)
- `OpenStack Swift <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#openstack-swift>`__
- `BackBlaze B2 <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#backblaze-b2>`__
- `Microsoft Azure Blob Storage <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#microsoft-azure-blob-storage>`__
- `Google Cloud Storage <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#google-cloud-storage>`__
- And many other services via the `rclone <https://rclone.org>`__ `Backend <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#other-services-via-rclone>`__
Design Principles
-----------------
Restic is a program that does backups right and was designed with the
following principles in mind:
- **Easy:** Doing backups should be a frictionless process, otherwise
you might be tempted to skip it. Restic should be easy to configure
and use, so that, in the event of a data loss, you can just restore
it. Likewise, restoring data should not be complicated.
- **Fast**: Backing up your data with restic should only be limited by
your network or hard disk bandwidth so that you can backup your files
every day. Nobody does backups if it takes too much time. Restoring
backups should only transfer data that is needed for the files that
are to be restored, so that this process is also fast.
- **Verifiable**: Much more important than backup is restore, so restic
enables you to easily verify that all data can be restored.
- **Secure**: Restic uses cryptography to guarantee confidentiality and
integrity of your data. The location the backup data is stored is
assumed not to be a trusted environment (e.g. a shared space where
others like system administrators are able to access your backups).
Restic is built to secure your data against such attackers.
- **Efficient**: With the growth of data, additional snapshots should
only take the storage of the actual increment. Even more, duplicate
data should be de-duplicated before it is actually written to the
storage back end to save precious backup space.
Reproducible Builds
-------------------
The binaries released with each restic version starting at 0.6.1 are
`reproducible <https://reproducible-builds.org/>`__, which means that you can
easily reproduce a byte identical version from the source code for that
release. Instructions on how to do that are contained in the
`builder repository <https://github.com/restic/builder>`__.
News
----
You can follow the restic project on Twitter `@resticbackup <https://twitter.com/resticbackup>`__ or by subscribing to
the `development blog <https://restic.net/blog/>`__.
License
-------
Restic is licensed under `BSD 2-Clause License <https://opensource.org/licenses/BSD-2-Clause>`__. You can find the
complete text in ``LICENSE``.
Sponsorship
-----------
Backend integration tests for Google Cloud Storage and Microsoft Azure Blob
Storage are sponsored by `AppsCode <https://appscode.com>`__!
|AppsCode|
.. |Documentation| image:: https://readthedocs.org/projects/restic/badge/?version=latest
:target: https://restic.readthedocs.io/en/latest/?badge=latest
.. |Build Status| image:: https://travis-ci.com/restic/restic.svg?branch=master
:target: https://travis-ci.com/restic/restic
.. |Build status| image:: https://ci.appveyor.com/api/projects/status/nuy4lfbgfbytw92q/branch/master?svg=true
:target: https://ci.appveyor.com/project/fd0/restic/branch/master
.. |Report Card| image:: https://goreportcard.com/badge/github.com/restic/restic
:target: https://goreportcard.com/report/github.com/restic/restic
.. |Say Thanks| image:: https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg
:target: https://saythanks.io/to/restic
.. |AppsCode| image:: https://cdn.appscode.com/images/logo/appscode/ac-logo-color.png
:target: https://appscode.com
.. |Reviewed by Hound| image:: https://img.shields.io/badge/Reviewed_by-Hound-8E64B0.svg
:target: https://houndci.com

View File

@@ -1 +1 @@
0.11.0
0.9.6

View File

@@ -20,11 +20,11 @@ init:
install:
- rmdir c:\go /s /q
- appveyor DownloadFile https://dl.google.com/go/go1.15.2.windows-amd64.msi
- msiexec /i go1.15.2.windows-amd64.msi /q
- appveyor DownloadFile https://dl.google.com/go/go1.14.windows-amd64.msi
- msiexec /i go1.14.windows-amd64.msi /q
- go version
- go env
- appveyor DownloadFile https://sourceforge.netcologne.de/project/gnuwin32/tar/1.13-1/tar-1.13-1-bin.zip -FileName tar.zip
- appveyor DownloadFile http://sourceforge.netcologne.de/project/gnuwin32/tar/1.13-1/tar-1.13-1-bin.zip -FileName tar.zip
- 7z x tar.zip bin/tar.exe
- set PATH=bin/;%PATH%

View File

@@ -3,7 +3,7 @@
// This program aims to make building Go programs for end users easier by just
// calling it with `go run`, without having to setup a GOPATH.
//
// This program needs Go >= 1.12. It'll use Go modules for compilation. It
// This program needs Go >= 1.11. It'll use Go modules for compilation. It
// builds the package configured as Main in the Config struct.
// BSD 2-Clause License
@@ -327,8 +327,8 @@ func (v GoVersion) String() string {
}
func main() {
if !goVersion.AtLeast(GoVersion{1, 12, 0}) {
die("Go version (%v) is too old, restic requires Go >= 1.12\n", goVersion)
if !goVersion.AtLeast(GoVersion{1, 11, 0}) {
die("Go version (%v) is too old, Go <= 1.11 does not support Go Modules\n", goVersion)
}
if !goVersion.AtLeast(config.MinVersion) {

View File

@@ -1,8 +0,0 @@
Enhancement: Optimize `restic mount`
We've optimized the FUSE implementation used within restic.
`restic mount` is now more responsive and uses less memory.
https://github.com/restic/restic/issues/1680
https://github.com/restic/restic/pull/2587
https://github.com/restic/restic/pull/2787

View File

@@ -1,6 +0,0 @@
Bugfix: Report correct number of directories processed by backup
The directory statistics calculation was fixed to report the actual number
of processed directories instead of always zero.
https://github.com/restic/restic/issues/1863

View File

@@ -1,9 +0,0 @@
Enhancement: Allow specifying user and host when creating keys
When adding a new key to the repository, the username and hostname for the new
key can be specified on the command line. This allows overriding the defaults,
for example if you would prefer to use the FQDN to identify the host or if you
want to add keys for several different hosts without having to run the key add
command on those hosts.
https://github.com/restic/restic/issues/2175

View File

@@ -1,9 +0,0 @@
Bugfix: Fix tar issues when dumping `/`
We've fixed an issue with dumping either `/` or files on the first sublevel
e.g. `/foo` to tar. This also fixes tar dumping issues on Windows where this
issue could also happen.
https://github.com/restic/restic/issues/2254
https://github.com/restic/restic/issues/2357
https://github.com/restic/restic/pull/2255

View File

@@ -1,12 +0,0 @@
Enhancement: Ignore sync errors when operation not supported by local filesystem
The local backend has been modified to work with filesystems which doesn't support
the `sync` operation. This operation is normally used by restic to ensure that data
files are fully written to disk before continuing.
For these limited filesystems, saving a file in the backend would previously fail with
an "operation not supported" error. This error is now ignored, which means that e.g.
an SMB mount on macOS can now be used as storage location for a repository.
https://github.com/restic/restic/issues/2395
https://forum.restic.net/t/sync-errors-on-mac-over-smb/1859

View File

@@ -1,7 +0,0 @@
Enhancement: Add flag `--iexclude-file` to backup command
The backup command now supports the flag `--iexclude-file` which is a
case-insensitive version of `--exclude-file`.
https://github.com/restic/restic/issues/2427
https://github.com/restic/restic/pull/2898

View File

@@ -1,8 +0,0 @@
Enhancement: Support excluding files by their size
The `backup` command now supports the `--exclude-larger-than` option to exclude files which are
larger than the specified maximum size. This can for example be useful to exclude unimportant
files with a large file size.
https://github.com/restic/restic/issues/2569
https://github.com/restic/restic/pull/2914

View File

@@ -1,16 +0,0 @@
Enhancement: Self-heal missing file parts during backup of unchanged files
We've improved the resilience of restic to certain types of repository corruption.
For files that are unchanged since the parent snapshot, the backup command now
verifies that all parts of the files still exist in the repository. Parts that are
missing, e.g. from a damaged repository, are backed up again. This verification
was already run for files that were modified since the parent snapshot, but is
now also done for unchanged files.
Note that restic will not backup file parts that are referenced in the index but
where the actual data is not present on disk, as this situation can only be
detected by restic check. Please ensure that you run `restic check` regularly.
https://github.com/restic/restic/issues/2571
https://github.com/restic/restic/pull/2827

View File

@@ -1,9 +0,0 @@
Enhancement: Support filtering snapshots by tag and path in the stats command
We've added filtering snapshots by `--tag tagList` and by `--path path` to
the `stats` command. This includes filtering of only 'latest' snapshots or
all snapshots in a repository.
https://github.com/restic/restic/issues/2858
https://github.com/restic/restic/pull/2859
https://forum.restic.net/t/stats-for-a-host-and-filtered-snapshots/3020

View File

@@ -1,20 +0,0 @@
Enhancement: Add command for copying snapshots between repositories
We've added a copy command, allowing you to copy snapshots from one
repository to another.
Note that this process will have to read (download) and write (upload) the
entire snapshot(s) due to the different encryption keys used on the source
and destination repository. Also, the transferred files are not re-chunked,
which may break deduplication between files already stored in the
destination repo and files copied there using this command.
To fully support deduplication between repositories when the copy command is
used, the init command now supports the `--copy-chunker-params` option,
which initializes the new repository with identical parameters for splitting
files into chunks as an already existing repository. This allows copied
snapshots to be equally deduplicated in both repositories.
https://github.com/restic/restic/issues/323
https://github.com/restic/restic/pull/2606
https://github.com/restic/restic/pull/2928

View File

@@ -1,10 +0,0 @@
Enhancement: Use optimized library for hash calculation of file chunks
We've switched the library used to calculate the hashes of file chunks, which
are used for deduplication, to the optimized Minio SHA-256 implementation.
Depending on the CPU it improves the hashing throughput by 10-30%. Modern x86
CPUs with the SHA Extension should be about two to three times faster.
https://github.com/restic/restic/issues/551
https://github.com/restic/restic/pull/2709

View File

@@ -1,8 +0,0 @@
Enhancement: Improve speed of check command
We've improved the check command to traverse trees only once independent of
whether they are contained in multiple snapshots. The check command is now much
faster for repositories with a large number of snapshots.
https://github.com/restic/restic/pull/2328
https://github.com/restic/restic/issues/2284

View File

@@ -1,19 +0,0 @@
Change: Return exit code 3 when failing to backup all source data
The backup command used to return a zero exit code as long as a snapshot
could be created successfully, even if some of the source files could not
be read (in which case the snapshot would contain the rest of the files).
This made it hard for automation/scripts to detect failures/incomplete
backups by looking at the exit code. Restic now returns the following exit
codes for the backup command:
- 0 when the command was successful
- 1 when there was a fatal error (no snapshot created)
- 3 when some source data could not be read (incomplete snapshot created)
https://github.com/restic/restic/pull/2546
https://github.com/restic/restic/issues/956
https://github.com/restic/restic/issues/2064
https://github.com/restic/restic/issues/2526
https://github.com/restic/restic/issues/2364

View File

@@ -1,6 +0,0 @@
Enhancement: Improve speed of diff command
We've improved the performance of the diff command when comparing snapshots
with similar content. It should run up to twice as fast as before.
https://github.com/restic/restic/pull/2598

View File

@@ -1,6 +0,0 @@
Enhancement: Slightly reduce memory usage of prune and stats commands
The prune and the stats command kept directory identifiers in memory twice
while searching for used blobs.
https://github.com/restic/restic/pull/2599

View File

@@ -1,14 +0,0 @@
Change: Update dependencies, require Go >= 1.13
Restic now requires Go to be at least 1.13. This allows simplifications in the
build process and removing workarounds.
This is also probably the last version of restic still supporting mounting
repositories via fuse on macOS. The library we're using for fuse does not
support macOS any more and osxfuse is not open source any more.
https://github.com/restic/restic/pull/2600
https://github.com/restic/restic/pull/2852
https://github.com/restic/restic/pull/2927
https://github.com/bazil/fuse/issues/224
https://github.com/osxfuse/osxfuse/issues/590

View File

@@ -1,8 +0,0 @@
Bugfix: Add stricter prune error checks
Additional checks were added to the prune command in order to improve
resiliency to backend, hardware and/or networking issues. The checks now
detect a few more cases where such outside factors could potentially cause
data loss.
https://github.com/restic/restic/pull/2674

View File

@@ -1,6 +0,0 @@
Enhancement: S3 backend: Add support for WebIdentityTokenFile
We've added support for EKS IAM roles for service accounts feature to the S3 backend.
https://github.com/restic/restic/pull/2733
https://github.com/restic/restic/issues/2703

View File

@@ -1,6 +0,0 @@
Enhancement: Optimize handling of new index entries
Restic now uses less memory for backups which add a lot of data, e.g. large initial backups.
In addition, we've improved the stability in some edge cases.
https://github.com/restic/restic/pull/2773

View File

@@ -1,8 +0,0 @@
Enhancement: Reduce memory consumption of in-memory index
We've improved how the index is stored in memory.
This change can reduce memory usage for large repositories by up to 50%
(depending on the operation).
https://github.com/restic/restic/pull/2781
https://github.com/restic/restic/pull/2812

View File

@@ -1,6 +0,0 @@
Enhancement: Optimize `list blobs` command
We've changed the implementation of `list blobs` which should be now a bit faster
and consume almost no memory even for large repositories.
https://github.com/restic/restic/pull/2786

View File

@@ -1,6 +0,0 @@
Enhancement: Optimized file access in restic mount
Reading large (> 100GiB) files from restic mountpoints is now faster,
and the speedup is greater for larger files.
https://github.com/restic/restic/pull/2790

View File

@@ -1,8 +0,0 @@
Change: Honor the --no-lock flag in the mount command
The mount command now does not lock the repository if given the
--no-lock flag. This allows to mount repositories which are archived
on a read only backend/filesystem.
https://github.com/restic/restic/issues/1597
https://github.com/restic/restic/pull/2821

View File

@@ -1,7 +0,0 @@
Enhancement: Speed-up file deletion in forget, prune and rebuild-index
We've sped up the file deletion for the commands forget, prune and
rebuild-index, especially for remote repositories.
Deletion was sequential before and is now run in parallel.
https://github.com/restic/restic/pull/2840

View File

@@ -1,9 +0,0 @@
Bugfix: Fix possible crash in the progress bar of check --read-data
We've fixed a possible crash while displaying the progress bar for the
check --read-data command. The crash occurred when the length of the
progress bar status exceeded the terminal width, which only happened for
very narrow terminal windows.
https://github.com/restic/restic/pull/2899
https://forum.restic.net/t/restic-rclone-pcloud-connection-issues/2963/15

View File

@@ -1,11 +0,0 @@
Bugfix: Restore timestamps and permissions on intermediate directories
When using the `--include` option of the restore command, restic restored
timestamps and permissions only on directories selected by the include pattern.
Intermediate directories, which are necessary to restore files located in sub-
directories, were created with default permissions. We've fixed the restore
command to restore timestamps and permissions for these directories as well.
https://github.com/restic/restic/issues/1212
https://github.com/restic/restic/issues/1402
https://github.com/restic/restic/pull/2906

View File

@@ -1,15 +0,0 @@
Bugfix: Mark repository files as read-only when using the local backend
Files stored in a local repository were marked as writeable on the
filesystem for non-Windows systems, which did not prevent accidental file
modifications outside of restic. In addition, the local backend did not work
with certain filesystems and network mounts which do not permit modifications
of file permissions.
restic now marks files stored in a local repository as read-only on the
filesystem on non-Windows systems. The error handling is improved to support
more filesystems.
https://github.com/restic/restic/issues/1756
https://github.com/restic/restic/issues/2157
https://github.com/restic/restic/pull/2989

View File

@@ -1,10 +0,0 @@
Bugfix: Hide password in REST backend repository URLs
When using a password in the REST backend repository URL,
the password could in some cases be included in the output
from restic, e.g. when initializing a repo or during an error.
The password is now replaced with "***" where applicable.
https://github.com/restic/restic/issues/2241
https://github.com/restic/restic/pull/2658

View File

@@ -1,12 +0,0 @@
Bugfix: Correctly dump directories into tar files
The dump command previously wrote directories in a tar file in a way which
can cause compatibility problems. This caused, for example, 7zip on Windows
to not open tar files containing directories. In addition it was not possible
to dump directories with extended attributes. These compatibility problems
are now corrected.
In addition, a tar file now includes the name of the owner and group of a file.
https://github.com/restic/restic/issues/2319
https://github.com/restic/restic/pull/3039

View File

@@ -1,9 +0,0 @@
Bugfix: Don't require `self-update --output` placeholder file
`restic self-update --output /path/to/new-restic` used to require that
new-restic was an existing file, to be overwritten. Now it's possible
to download an updated restic binary to a new path, without first
having to create a placeholder file.
https://github.com/restic/restic/issues/2491
https://github.com/restic/restic/pull/2937

View File

@@ -1,7 +0,0 @@
Bugfix: Fix rare cases of backup command hanging forever
We've fixed an issue with the backup progress reporting which could cause
restic to hang forever right before finishing a backup.
https://github.com/restic/restic/issues/2834
https://github.com/restic/restic/pull/2963

View File

@@ -1,6 +0,0 @@
Bugfix: Fix manpage formatting
The manpage formatting in restic v0.10.0 was garbled, which is fixed now.
https://github.com/restic/restic/issues/2938
https://github.com/restic/restic/pull/2977

View File

@@ -1,7 +0,0 @@
Bugfix: Make --exclude-larger-than handle disappearing files
There was a small bug in the backup command's --exclude-larger-than
option where files that disappeared between scanning and actually
backing them up to the repository caused a panic. This is now fixed.
https://github.com/restic/restic/issues/2942

View File

@@ -1,9 +0,0 @@
Bugfix: restic generate, help and self-update no longer check passwords
The commands `restic cache`, `generate`, `help` and `self-update` don't need
passwords, but they previously did run the RESTIC_PASSWORD_COMMAND (if set in
the environment), prompting users to authenticate for no reason. They now skip
running the password command.
https://github.com/restic/restic/issues/2951
https://github.com/restic/restic/pull/2987

View File

@@ -1,9 +0,0 @@
Enhancement: Optimize check for unchanged files during backup
During a backup restic skips processing files which have not changed since the last backup run.
Previously this required opening each file once which can be slow on network filesystems. The
backup command now checks for file changes before opening a file. This considerably reduces
the time to create a backup on network filesystems.
https://github.com/restic/restic/issues/2969
https://github.com/restic/restic/pull/2970

View File

@@ -1,9 +0,0 @@
Bugfix: Make snapshots --json output [] instead of null when no snapshots
Restic previously output `null` instead of `[]` for the `--json snapshots`
command, when there were no snapshots in the repository. This caused some
minor problems when parsing the output, but is now fixed such that `[]` is
output when the list of snapshots is empty.
https://github.com/restic/restic/issues/2979
https://github.com/restic/restic/pull/2984

View File

@@ -1,12 +0,0 @@
Enhancement: Add support for Volume Shadow Copy Service (VSS) on Windows
Volume Shadow Copy Service allows read access to files that are locked by
another process using an exclusive lock through a filesystem snapshot. Restic
was unable to backup those files before. This update enables backing up these
files.
This needs to be enabled explicitely using the --use-fs-snapshot option of the
backup command.
https://github.com/restic/restic/issues/340
https://github.com/restic/restic/pull/2274

View File

@@ -1,7 +0,0 @@
Enhancement: Authenticate to Google Cloud Storage with access token
When using the GCS backend, it is now possible to authenticate with OAuth2
access tokens instead of a credentials file by setting the GOOGLE_ACCESS_TOKEN
environment variable.
https://github.com/restic/restic/pull/2849

View File

@@ -1,10 +0,0 @@
Enhancement: New option --repository-file
We've added a new command-line option --repository-file as an alternative
to -r. This allows to read the repository URL from a file in order to
prevent certain types of information leaks, especially for URLs containing
credentials.
https://github.com/restic/restic/issues/1458
https://github.com/restic/restic/issues/2900
https://github.com/restic/restic/pull/2910

View File

@@ -1,8 +0,0 @@
Enhancement: Warn if parent snapshot cannot be loaded during backup
During a backup restic uses the parent snapshot to check whether a file was
changed and has to be backed up again. For this check the backup has to read
the directories contained in the old snapshot. If a tree blob cannot be
loaded, restic now warns about this problem with the backup repository.
https://github.com/restic/restic/pull/2978

View File

@@ -7,7 +7,7 @@ vulnerability, but urge all users to upgrade to the latest version of restic.
Exploiting the vulnerability requires a Linux/Unix system which saves backups
via restic and a Windows systems which restores files from the repo. In
addition, the attackers need to be able to create files with arbitrary
addition, the attackers need to be able to create create files with arbitrary
names which are then saved to the restic repo. For example, by creating a file
named "..\test.txt" (which is a perfectly legal filename on Linux) and
restoring a snapshot containing this file on Windows, it would be written to

View File

@@ -1,21 +1,11 @@
# The first line must start with Bugfix:, Enhancement: or Change:,
# including the colon. Use present use. Remove lines starting with '#'
# from this template.
Enhancement: Allow custom bar in the foo command
Bugfix: Fix behavior for foobar (in present tense)
# Describe the problem in the past tense, the new behavior in the present
# tense. Mention the affected commands, backends, operating systems, etc.
# Focus on user-facing behavior, not the implementation.
We've fixed the behavior for foobar, a long-standing annoyance for restic
users.
Restic foo always used the system-wide bar when deciding how to frob an
item in the baz backend. It now permits selecting the bar with --bar or
the environment variable RESTIC_BAR. The system-wide bar is still the
default.
# The last section is a list of issue, PR and forum URLs.
# The first issue ID determines the filename for the changelog entry:
# changelog/unreleased/issue-1234. If there are no relevant issue links,
# use the PR ID and call the file pull-55555.
The text in the paragraphs is written in past tense. The last section is a list
of issue URLs, PR URLs and other URLs. The first issue ID (or the first PR ID,
in case there aren't any issue links) is used as the primary ID.
https://github.com/restic/restic/issues/1234
https://github.com/restic/restic/pull/55555

View File

@@ -1,2 +0,0 @@
# this file is here so the unreleased/ directory is tracked within git, even if
# no other changelog files are present.

View File

View File

@@ -1,11 +0,0 @@
Bugfix: Restore timestamps and permissions on intermediate directories
When using the `--include` option of the restore command, restic restored
timestamps and permissions only on directories selected by the include pattern.
Intermediate directories, which are necessary to restore files located in sub-
directories, were created with default permissions. We've fixed the restore
command to restore timestamps and permissions for these directories as well.
https://github.com/restic/restic/issues/1212
https://github.com/restic/restic/issues/1402
https://github.com/restic/restic/pull/2906

View File

@@ -1,15 +0,0 @@
Bugfix: Mark repository files as read-only when using the local backend
Files stored in a local repository were marked as writeable on the
filesystem for non-Windows systems, which did not prevent accidental file
modifications outside of restic. In addition, the local backend did not work
with certain filesystems and network mounts which do not permit modifications
of file permissions.
restic now marks files stored in a local repository as read-only on the
filesystem on non-Windows systems. The error handling is improved to support
more filesystems.
https://github.com/restic/restic/issues/1756
https://github.com/restic/restic/issues/2157
https://github.com/restic/restic/pull/2989

View File

@@ -1,10 +0,0 @@
Bugfix: Hide password in REST backend repository URLs
When using a password in the REST backend repository URL,
the password could in some cases be included in the output
from restic, e.g. when initializing a repo or during an error.
The password is now replaced with "***" where applicable.
https://github.com/restic/restic/issues/2241
https://github.com/restic/restic/pull/2658

View File

@@ -1,12 +0,0 @@
Bugfix: Correctly dump directories into tar files
The dump command previously wrote directories in a tar file in a way which
can cause compatibility problems. This caused, for example, 7zip on Windows
to not open tar files containing directories. In addition it was not possible
to dump directories with extended attributes. These compatibility problems
are now corrected.
In addition, a tar file now includes the name of the owner and group of a file.
https://github.com/restic/restic/issues/2319
https://github.com/restic/restic/pull/3039

View File

@@ -1,9 +0,0 @@
Bugfix: Don't require `self-update --output` placeholder file
`restic self-update --output /path/to/new-restic` used to require that
new-restic was an existing file, to be overwritten. Now it's possible
to download an updated restic binary to a new path, without first
having to create a placeholder file.
https://github.com/restic/restic/issues/2491
https://github.com/restic/restic/pull/2937

View File

@@ -1,7 +0,0 @@
Bugfix: Fix rare cases of backup command hanging forever
We've fixed an issue with the backup progress reporting which could cause
restic to hang forever right before finishing a backup.
https://github.com/restic/restic/issues/2834
https://github.com/restic/restic/pull/2963

View File

@@ -1,6 +0,0 @@
Bugfix: Fix manpage formatting
The manpage formatting in restic v0.10.0 was garbled, which is fixed now.
https://github.com/restic/restic/issues/2938
https://github.com/restic/restic/pull/2977

View File

@@ -1,7 +0,0 @@
Bugfix: Make --exclude-larger-than handle disappearing files
There was a small bug in the backup command's --exclude-larger-than
option where files that disappeared between scanning and actually
backing them up to the repository caused a panic. This is now fixed.
https://github.com/restic/restic/issues/2942

View File

@@ -1,9 +0,0 @@
Bugfix: restic generate, help and self-update no longer check passwords
The commands `restic cache`, `generate`, `help` and `self-update` don't need
passwords, but they previously did run the RESTIC_PASSWORD_COMMAND (if set in
the environment), prompting users to authenticate for no reason. They now skip
running the password command.
https://github.com/restic/restic/issues/2951
https://github.com/restic/restic/pull/2987

View File

@@ -1,9 +0,0 @@
Enhancement: Optimize check for unchanged files during backup
During a backup restic skips processing files which have not changed since the last backup run.
Previously this required opening each file once which can be slow on network filesystems. The
backup command now checks for file changes before opening a file. This considerably reduces
the time to create a backup on network filesystems.
https://github.com/restic/restic/issues/2969
https://github.com/restic/restic/pull/2970

View File

@@ -1,9 +0,0 @@
Bugfix: Make snapshots --json output [] instead of null when no snapshots
Restic previously output `null` instead of `[]` for the `--json snapshots`
command, when there were no snapshots in the repository. This caused some
minor problems when parsing the output, but is now fixed such that `[]` is
output when the list of snapshots is empty.
https://github.com/restic/restic/issues/2979
https://github.com/restic/restic/pull/2984

View File

@@ -1,12 +0,0 @@
Enhancement: Add support for Volume Shadow Copy Service (VSS) on Windows
Volume Shadow Copy Service allows read access to files that are locked by
another process using an exclusive lock through a filesystem snapshot. Restic
was unable to backup those files before. This update enables backing up these
files.
This needs to be enabled explicitely using the --use-fs-snapshot option of the
backup command.
https://github.com/restic/restic/issues/340
https://github.com/restic/restic/pull/2274

View File

@@ -14,10 +14,4 @@ file can be written to the file before any of the preceeding file blobs.
It is therefore possible to have gaps in the data written to the target
files if restore fails or interrupted by the user.
The implementation will try to preallocate space for the restored files
on the filesystem to prevent file fragmentation. This ensures good read
performance for large files, like for example VM images. If preallocating
space is not supported by the filesystem, then this step is silently skipped.
https://github.com/restic/restic/pull/2195
https://github.com/restic/restic/pull/2893

View File

@@ -1,9 +1,7 @@
Enhancement: Improve the chunking algorithm
We've updated the chunker library responsible for splitting files into smaller
blocks. It should improve the chunking throughput by 5-15% depending on the
blocks. It should improve the chunking throughput by 5-10% depending on the
CPU.
https://github.com/restic/restic/pull/2576
https://github.com/restic/restic/pull/2845
https://github.com/restic/restic/issues/2820

View File

@@ -0,0 +1,6 @@
Change: Require Go >= 1.11
Restic now requires Go to be at least 1.11. This allows simplifications in the
build process and removing workarounds.
https://github.com/restic/restic/pull/2600

View File

@@ -1,7 +0,0 @@
Enhancement: Authenticate to Google Cloud Storage with access token
When using the GCS backend, it is now possible to authenticate with OAuth2
access tokens instead of a credentials file by setting the GOOGLE_ACCESS_TOKEN
environment variable.
https://github.com/restic/restic/pull/2849

View File

@@ -1,10 +0,0 @@
Enhancement: New option --repository-file
We've added a new command-line option --repository-file as an alternative
to -r. This allows to read the repository URL from a file in order to
prevent certain types of information leaks, especially for URLs containing
credentials.
https://github.com/restic/restic/issues/1458
https://github.com/restic/restic/issues/2900
https://github.com/restic/restic/pull/2910

View File

@@ -1,8 +0,0 @@
Enhancement: Warn if parent snapshot cannot be loaded during backup
During a backup restic uses the parent snapshot to check whether a file was
changed and has to be backed up again. For this check the backup has to read
the directories contained in the old snapshot. If a tree blob cannot be
loaded, restic now warns about this problem with the backup repository.
https://github.com/restic/restic/pull/2978

25
chunker/.travis.yml Normal file
View File

@@ -0,0 +1,25 @@
language: go
sudo: false
matrix:
include:
- os: linux
go: "1.9.x"
- os: linux
go: "1.10.x"
- os: linux
go: "tip"
- os: osx
go: "1.10.x"
install:
- go get -t ./...
- go get -u golang.org/x/lint/golint
- go get -u golang.org/x/tools/cmd/goimports
script:
- go vet ./...
- go test -v -cpu=2 ./...
- go test -v -cpu=1,2,4 -short -race ./...
- diff -au <(goimports -d .) <(printf "")
- diff -au <(golint ./...) <(printf "")

23
chunker/LICENSE Normal file
View File

@@ -0,0 +1,23 @@
Copyright (c) 2014, Alexander Neumann <alexander@bumpern.de>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

12
chunker/README.md Normal file
View File

@@ -0,0 +1,12 @@
[![GoDoc](https://godoc.org/github.com/restic/chunker?status.svg)](http://godoc.org/github.com/restic/chunker)
[![Build Status](https://travis-ci.com/restic/chunker.svg?branch=master)](https://travis-ci.com/restic/chunker)
The package `chunker` implements content-defined-chunking (CDC) based on a
rolling Rabin Hash. The library is part of the [restic backup
program](https://github.com/restic/restic).
An introduction to Content Defined Chunking can be found in the restic blog
post [Foundation - Introducing Content Defined Chunking (CDC)](https://restic.github.io/blog/2015-09-12/restic-foundation1-cdc).
You can find the API documentation at
https://godoc.org/github.com/restic/chunker

383
chunker/chunker.go Normal file
View File

@@ -0,0 +1,383 @@
package chunker
import (
"errors"
"fmt"
"io"
"sync"
)
const (
kiB = 1024
miB = 1024 * kiB
// WindowSize is the size of the sliding window.
windowSize = 64
// MinSize is the default minimal size of a chunk.
MinSize = 512 * kiB
// MaxSize is the default maximal size of a chunk.
MaxSize = 8 * miB
chunkerBufSize = 512 * kiB
)
type tables struct {
out [256]Pol
mod [256]Pol
}
// cache precomputed tables, these are read-only anyway
var cache struct {
entries map[Pol]tables
sync.Mutex
}
func init() {
cache.entries = make(map[Pol]tables)
}
// Chunk is one content-dependent chunk of bytes whose end was cut when the
// Rabin Fingerprint had the value stored in Cut.
type Chunk struct {
Start uint
Length uint
Cut uint64
Data []byte
}
type chunkerState struct {
window [windowSize]byte
wpos uint
buf []byte
bpos uint
bmax uint
start uint
count uint
pos uint
pre uint // wait for this many bytes before start calculating an new chunk
digest uint64
}
type chunkerConfig struct {
MinSize, MaxSize uint
pol Pol
polShift uint
tables tables
tablesInitialized bool
splitmask uint64
rd io.Reader
closed bool
}
// Chunker splits content with Rabin Fingerprints.
type Chunker struct {
chunkerConfig
chunkerState
}
// SetAverageBits allows to control the frequency of chunk discovery:
// the lower averageBits, the higher amount of chunks will be identified.
// The default value is 20 bits, so chunks will be of 1MiB size on average.
func (c *Chunker) SetAverageBits(averageBits int) {
c.splitmask = (1 << uint64(averageBits)) - 1
}
// New returns a new Chunker based on polynomial p that reads from rd.
func New(rd io.Reader, pol Pol) *Chunker {
return NewWithBoundaries(rd, pol, MinSize, MaxSize)
}
// NewWithBoundaries returns a new Chunker based on polynomial p that reads from
// rd and custom min and max size boundaries.
func NewWithBoundaries(rd io.Reader, pol Pol, min, max uint) *Chunker {
c := &Chunker{
chunkerState: chunkerState{
buf: make([]byte, chunkerBufSize),
},
chunkerConfig: chunkerConfig{
pol: pol,
rd: rd,
MinSize: min,
MaxSize: max,
splitmask: (1 << 20) - 1, // aim to create chunks of 20 bits or about 1MiB on average.
},
}
c.reset()
return c
}
// Reset reinitializes the chunker with a new reader and polynomial.
func (c *Chunker) Reset(rd io.Reader, pol Pol) {
c.ResetWithBoundaries(rd, pol, MinSize, MaxSize)
}
// ResetWithBoundaries reinitializes the chunker with a new reader, polynomial
// and custom min and max size boundaries.
func (c *Chunker) ResetWithBoundaries(rd io.Reader, pol Pol, min, max uint) {
*c = Chunker{
chunkerState: chunkerState{
buf: c.buf,
},
chunkerConfig: chunkerConfig{
pol: pol,
rd: rd,
MinSize: min,
MaxSize: max,
splitmask: (1 << 20) - 1,
},
}
c.reset()
}
func (c *Chunker) reset() {
c.polShift = uint(c.pol.Deg() - 8)
c.fillTables()
for i := 0; i < windowSize; i++ {
c.window[i] = 0
}
c.closed = false
c.digest = 0
c.wpos = 0
c.count = 0
c.digest = c.slide(c.digest, 1)
c.start = c.pos
// do not start a new chunk unless at least MinSize bytes have been read
c.pre = c.MinSize - windowSize
}
// fillTables calculates out_table and mod_table for optimization. This
// implementation uses a cache in the global variable cache.
func (c *Chunker) fillTables() {
// if polynomial hasn't been specified, do not compute anything for now
if c.pol == 0 {
return
}
c.tablesInitialized = true
// test if the tables are cached for this polynomial
cache.Lock()
defer cache.Unlock()
if t, ok := cache.entries[c.pol]; ok {
c.tables = t
return
}
// calculate table for sliding out bytes. The byte to slide out is used as
// the index for the table, the value contains the following:
// out_table[b] = Hash(b || 0 || ... || 0)
// \ windowsize-1 zero bytes /
// To slide out byte b_0 for window size w with known hash
// H := H(b_0 || ... || b_w), it is sufficient to add out_table[b_0]:
// H(b_0 || ... || b_w) + H(b_0 || 0 || ... || 0)
// = H(b_0 + b_0 || b_1 + 0 || ... || b_w + 0)
// = H( 0 || b_1 || ... || b_w)
//
// Afterwards a new byte can be shifted in.
for b := 0; b < 256; b++ {
var h Pol
h = appendByte(h, byte(b), c.pol)
for i := 0; i < windowSize-1; i++ {
h = appendByte(h, 0, c.pol)
}
c.tables.out[b] = h
}
// calculate table for reduction mod Polynomial
k := c.pol.Deg()
for b := 0; b < 256; b++ {
// mod_table[b] = A | B, where A = (b(x) * x^k mod pol) and B = b(x) * x^k
//
// The 8 bits above deg(Polynomial) determine what happens next and so
// these bits are used as a lookup to this table. The value is split in
// two parts: Part A contains the result of the modulus operation, part
// B is used to cancel out the 8 top bits so that one XOR operation is
// enough to reduce modulo Polynomial
c.tables.mod[b] = Pol(uint64(b)<<uint(k)).Mod(c.pol) | (Pol(b) << uint(k))
}
cache.entries[c.pol] = c.tables
}
// Next returns the position and length of the next chunk of data. If an error
// occurs while reading, the error is returned. Afterwards, the state of the
// current chunk is undefined. When the last chunk has been returned, all
// subsequent calls yield an io.EOF error.
func (c *Chunker) Next(data []byte) (Chunk, error) {
data = data[:0]
if !c.tablesInitialized {
return Chunk{}, errors.New("tables for polynomial computation not initialized")
}
tabout := c.tables.out
tabmod := c.tables.mod
polShift := c.polShift
// go guarantees the expected behavior for bit shifts even for shift counts
// larger than the value width. Bounding the value of polShift allows the compiler
// to optimize the code for 'digest >> polShift'
if polShift > 53-8 {
return Chunk{}, errors.New("the polynomial must have a degree less than or equal 53")
}
minSize := c.MinSize
maxSize := c.MaxSize
buf := c.buf
for {
if c.bpos >= c.bmax {
n, err := io.ReadFull(c.rd, buf[:])
if err == io.ErrUnexpectedEOF {
err = nil
}
// io.ReadFull only returns io.EOF when no bytes could be read. If
// this is the case and we're in this branch, there are no more
// bytes to buffer, so this was the last chunk. If a different
// error has occurred, return that error and abandon the current
// chunk.
if err == io.EOF && !c.closed {
c.closed = true
// return current chunk, if any bytes have been processed
if c.count > 0 {
return Chunk{
Start: c.start,
Length: c.count,
Cut: c.digest,
Data: data,
}, nil
}
}
if err != nil {
return Chunk{}, err
}
if n < 0 {
return Chunk{}, fmt.Errorf("ReadFull returned negative number of bytes read: %v", n)
}
c.bpos = 0
c.bmax = uint(n)
}
// check if bytes have to be dismissed before starting a new chunk
if c.pre > 0 {
n := c.bmax - c.bpos
if c.pre > uint(n) {
c.pre -= uint(n)
data = append(data, buf[c.bpos:c.bmax]...)
c.count += uint(n)
c.pos += uint(n)
c.bpos = c.bmax
continue
}
data = append(data, buf[c.bpos:c.bpos+c.pre]...)
c.bpos += c.pre
c.count += c.pre
c.pos += c.pre
c.pre = 0
}
add := c.count
digest := c.digest
win := c.window
wpos := c.wpos
for _, b := range buf[c.bpos:c.bmax] {
// slide(b)
// limit wpos before to elide array bound checks
wpos = wpos % windowSize
out := win[wpos]
win[wpos] = b
digest ^= uint64(tabout[out])
wpos++
// updateDigest
index := byte(digest >> polShift)
digest <<= 8
digest |= uint64(b)
digest ^= uint64(tabmod[index])
// end manual inline
add++
if add < minSize {
continue
}
if (digest&c.splitmask) == 0 || add >= maxSize {
i := add - c.count - 1
data = append(data, c.buf[c.bpos:c.bpos+uint(i)+1]...)
c.count = add
c.pos += uint(i) + 1
c.bpos += uint(i) + 1
c.buf = buf
chunk := Chunk{
Start: c.start,
Length: c.count,
Cut: digest,
Data: data,
}
c.reset()
return chunk, nil
}
}
c.digest = digest
c.window = win
c.wpos = wpos % windowSize
steps := c.bmax - c.bpos
if steps > 0 {
data = append(data, c.buf[c.bpos:c.bpos+steps]...)
}
c.count += steps
c.pos += steps
c.bpos = c.bmax
}
}
func updateDigest(digest uint64, polShift uint, tab tables, b byte) (newDigest uint64) {
index := digest >> polShift
digest <<= 8
digest |= uint64(b)
digest ^= uint64(tab.mod[index])
return digest
}
func (c *Chunker) slide(digest uint64, b byte) (newDigest uint64) {
out := c.window[c.wpos]
c.window[c.wpos] = b
digest ^= uint64(c.tables.out[out])
c.wpos = (c.wpos + 1) % windowSize
digest = updateDigest(digest, c.polShift, c.tables, b)
return digest
}
func appendByte(hash Pol, b byte, pol Pol) Pol {
hash <<= 8
hash |= Pol(b)
return hash.Mod(pol)
}

347
chunker/chunker_test.go Normal file
View File

@@ -0,0 +1,347 @@
package chunker
import (
"bytes"
"crypto/sha256"
"encoding/hex"
"io"
"math/rand"
"testing"
"time"
)
func parseDigest(s string) []byte {
d, err := hex.DecodeString(s)
if err != nil {
panic(err)
}
return d
}
type chunk struct {
Length uint
CutFP uint64
Digest []byte
}
// polynomial used for all the tests below
const testPol = Pol(0x3DA3358B4DC173)
// created for 32MB of random data out of math/rand's Uint32() seeded by
// constant 23
//
// chunking configuration:
// window size 64, avg chunksize 1<<20, min chunksize 1<<19, max chunksize 1<<23
// polynom 0x3DA3358B4DC173
var chunks1 = []chunk{
chunk{2163460, 0x000b98d4cdf00000, parseDigest("4b94cb2cf293855ea43bf766731c74969b91aa6bf3c078719aabdd19860d590d")},
chunk{643703, 0x000d4e8364d00000, parseDigest("5727a63c0964f365ab8ed2ccf604912f2ea7be29759a2b53ede4d6841e397407")},
chunk{1528956, 0x0015a25c2ef00000, parseDigest("a73759636a1e7a2758767791c69e81b69fb49236c6929e5d1b654e06e37674ba")},
chunk{1955808, 0x00102a8242e00000, parseDigest("c955fb059409b25f07e5ae09defbbc2aadf117c97a3724e06ad4abd2787e6824")},
chunk{2222372, 0x00045da878000000, parseDigest("6ba5e9f7e1b310722be3627716cf469be941f7f3e39a4c3bcefea492ec31ee56")},
chunk{2538687, 0x00198a8179900000, parseDigest("8687937412f654b5cfe4a82b08f28393a0c040f77c6f95e26742c2fc4254bfde")},
chunk{609606, 0x001d4e8d17100000, parseDigest("5da820742ff5feb3369112938d3095785487456f65a8efc4b96dac4be7ebb259")},
chunk{1205738, 0x000a7204dd600000, parseDigest("cc70d8fad5472beb031b1aca356bcab86c7368f40faa24fe5f8922c6c268c299")},
chunk{959742, 0x00183e71e1400000, parseDigest("4065bdd778f95676c92b38ac265d361f81bff17d76e5d9452cf985a2ea5a4e39")},
chunk{4036109, 0x001fec043c700000, parseDigest("b9cf166e75200eb4993fc9b6e22300a6790c75e6b0fc8f3f29b68a752d42f275")},
chunk{1525894, 0x000b1574b1500000, parseDigest("2f238180e4ca1f7520a05f3d6059233926341090f9236ce677690c1823eccab3")},
chunk{1352720, 0x00018965f2e00000, parseDigest("afd12f13286a3901430de816e62b85cc62468c059295ce5888b76b3af9028d84")},
chunk{811884, 0x00155628aa100000, parseDigest("42d0cdb1ee7c48e552705d18e061abb70ae7957027db8ae8db37ec756472a70a")},
chunk{1282314, 0x001909a0a1400000, parseDigest("819721c2457426eb4f4c7565050c44c32076a56fa9b4515a1c7796441730eb58")},
chunk{1318021, 0x001cceb980000000, parseDigest("842eb53543db55bacac5e25cb91e43cc2e310fe5f9acc1aee86bdf5e91389374")},
chunk{948640, 0x0011f7a470a00000, parseDigest("b8e36bf7019bb96ac3fb7867659d2167d9d3b3148c09fe0de45850b8fe577185")},
chunk{645464, 0x00030ce2d9400000, parseDigest("5584bd27982191c3329f01ed846bfd266e96548dfa87018f745c33cfc240211d")},
chunk{533758, 0x0004435c53c00000, parseDigest("4da778a25b72a9a0d53529eccfe2e5865a789116cb1800f470d8df685a8ab05d")},
chunk{1128303, 0x0000c48517800000, parseDigest("08c6b0b38095b348d80300f0be4c5184d2744a17147c2cba5cc4315abf4c048f")},
chunk{800374, 0x000968473f900000, parseDigest("820284d2c8fd243429674c996d8eb8d3450cbc32421f43113e980f516282c7bf")},
chunk{2453512, 0x001e197c92600000, parseDigest("5fa870ed107c67704258e5e50abe67509fb73562caf77caa843b5f243425d853")},
chunk{2651975, 0x000ae6c868000000, parseDigest("181347d2bbec32bef77ad5e9001e6af80f6abcf3576549384d334ee00c1988d8")},
chunk{237392, 0x0000000000000001, parseDigest("fcd567f5d866357a8e299fd5b2359bb2c8157c30395229c4e9b0a353944a7978")},
}
// test if nullbytes are correctly split, even if length is a multiple of MinSize.
var chunks2 = []chunk{
chunk{MinSize, 0, parseDigest("07854d2fef297a06ba81685e660c332de36d5d18d546927d30daad6d7fda1541")},
chunk{MinSize, 0, parseDigest("07854d2fef297a06ba81685e660c332de36d5d18d546927d30daad6d7fda1541")},
chunk{MinSize, 0, parseDigest("07854d2fef297a06ba81685e660c332de36d5d18d546927d30daad6d7fda1541")},
chunk{MinSize, 0, parseDigest("07854d2fef297a06ba81685e660c332de36d5d18d546927d30daad6d7fda1541")},
}
// the same as chunks1, but avg chunksize is 1<<19
var chunks3 = []chunk{
chunk{1491586, 0x00023e586ea80000, parseDigest("4c008237df602048039287427171cef568a6cb965d1b5ca28dc80504a24bb061")},
chunk{671874, 0x000b98d4cdf00000, parseDigest("fa8a42321b90c3d4ce9dd850562b2fd0c0fe4bdd26cf01a24f22046a224225d3")},
chunk{643703, 0x000d4e8364d00000, parseDigest("5727a63c0964f365ab8ed2ccf604912f2ea7be29759a2b53ede4d6841e397407")},
chunk{1284146, 0x0012b527e4780000, parseDigest("16d04cafecbeae9eaedd49da14c7ad7cdc2b1cc8569e5c16c32c9fb045aa899a")},
chunk{823366, 0x000d1d6752180000, parseDigest("48662c118514817825ad4761e8e2e5f28f9bd8281b07e95dcafc6d02e0aa45c3")},
chunk{810134, 0x0016071b6e180000, parseDigest("f629581aa05562f97f2c359890734c8574c5575da32f9289c5ba70bfd05f3f46")},
chunk{567118, 0x00102a8242e00000, parseDigest("d4f0797c56c60d01bac33bfd49957a4816b6c067fc155b026de8a214cab4d70a")},
chunk{821315, 0x001b3e42c8180000, parseDigest("8ebd0fd5db0293bd19140da936eb8b1bbd3cd6ffbec487385b956790014751ca")},
chunk{1401057, 0x00045da878000000, parseDigest("001360af59adf4871ef138cfa2bb49007e86edaf5ac2d6f0b3d3014510991848")},
chunk{2311122, 0x0005cbd885380000, parseDigest("8276d489b566086d9da95dc5c5fe6fc7d72646dd3308ced6b5b6ddb8595f0aa1")},
chunk{608723, 0x001cfcd86f280000, parseDigest("518db33ba6a79d4f3720946f3785c05b9611082586d47ea58390fc2f6de9449e")},
chunk{980456, 0x0013edb7a7f80000, parseDigest("0121b1690738395e15fecba1410cd0bf13fde02225160cad148829f77e7b6c99")},
chunk{1140278, 0x0001f9f017e80000, parseDigest("28ca7c74804b5075d4f5eeb11f0845d99f62e8ea3a42b9a05c7bd5f2fca619dd")},
chunk{2015542, 0x00097bf5d8180000, parseDigest("6fe8291f427d48650a5f0f944305d3a2dbc649bd401d2655fc0bdd42e890ca5a")},
chunk{904752, 0x000e1863eff80000, parseDigest("62af1f1eb3f588d18aff28473303cc4731fc3cafcc52ce818fee3c4c2820854d")},
chunk{713072, 0x001f3bb1b9b80000, parseDigest("4bda9dc2e3031d004d87a5cc93fe5207c4b0843186481b8f31597dc6ffa1496c")},
chunk{675937, 0x001fec043c700000, parseDigest("5299c8c5acec1b90bb020cd75718aab5e12abb9bf66291465fd10e6a823a8b4a")},
chunk{1525894, 0x000b1574b1500000, parseDigest("2f238180e4ca1f7520a05f3d6059233926341090f9236ce677690c1823eccab3")},
chunk{1352720, 0x00018965f2e00000, parseDigest("afd12f13286a3901430de816e62b85cc62468c059295ce5888b76b3af9028d84")},
chunk{811884, 0x00155628aa100000, parseDigest("42d0cdb1ee7c48e552705d18e061abb70ae7957027db8ae8db37ec756472a70a")},
chunk{1282314, 0x001909a0a1400000, parseDigest("819721c2457426eb4f4c7565050c44c32076a56fa9b4515a1c7796441730eb58")},
chunk{1093738, 0x0017f5d048880000, parseDigest("5dddfa7a241b68f65d267744bdb082ee865f3c2f0d8b946ea0ee47868a01bbff")},
chunk{962003, 0x000b921f7ef80000, parseDigest("0cb5c9ebba196b441c715c8d805f6e7143a81cd5b0d2c65c6aacf59ca9124af9")},
chunk{856384, 0x00030ce2d9400000, parseDigest("7734b206d46f3f387e8661e81edf5b1a91ea681867beb5831c18aaa86632d7fb")},
chunk{533758, 0x0004435c53c00000, parseDigest("4da778a25b72a9a0d53529eccfe2e5865a789116cb1800f470d8df685a8ab05d")},
chunk{1128303, 0x0000c48517800000, parseDigest("08c6b0b38095b348d80300f0be4c5184d2744a17147c2cba5cc4315abf4c048f")},
chunk{800374, 0x000968473f900000, parseDigest("820284d2c8fd243429674c996d8eb8d3450cbc32421f43113e980f516282c7bf")},
chunk{2453512, 0x001e197c92600000, parseDigest("5fa870ed107c67704258e5e50abe67509fb73562caf77caa843b5f243425d853")},
chunk{665901, 0x00118c842cb80000, parseDigest("deceec26163842fdef6560311c69bf8a9871a56e16d719e2c4b7e4d668ceb61f")},
chunk{1986074, 0x000ae6c868000000, parseDigest("64cd64bf3c3bc389eb20df8310f0427d1c36ab2eaaf09e346bfa7f0453fc1a18")},
chunk{237392, 0x0000000000000001, parseDigest("fcd567f5d866357a8e299fd5b2359bb2c8157c30395229c4e9b0a353944a7978")},
}
func testWithData(t *testing.T, chnker *Chunker, testChunks []chunk, checkDigest bool) []Chunk {
chunks := []Chunk{}
pos := uint(0)
for i, chunk := range testChunks {
c, err := chnker.Next(nil)
if err != nil {
t.Fatalf("Error returned with chunk %d: %v", i, err)
}
if c.Start != pos {
t.Fatalf("Start for chunk %d does not match: expected %d, got %d",
i, pos, c.Start)
}
if c.Length != chunk.Length {
t.Fatalf("Length for chunk %d does not match: expected %d, got %d",
i, chunk.Length, c.Length)
}
if c.Cut != chunk.CutFP {
t.Fatalf("Cut fingerprint for chunk %d/%d does not match: expected %016x, got %016x",
i, len(chunks)-1, chunk.CutFP, c.Cut)
}
if checkDigest {
digest := hashData(c.Data)
if !bytes.Equal(chunk.Digest, digest) {
t.Fatalf("Digest fingerprint for chunk %d/%d does not match: expected %02x, got %02x",
i, len(chunks)-1, chunk.Digest, digest)
}
}
pos += c.Length
chunks = append(chunks, c)
}
_, err := chnker.Next(nil)
if err != io.EOF {
t.Fatal("Wrong error returned after last chunk")
}
if len(chunks) != len(testChunks) {
t.Fatal("Amounts of test and resulting chunks do not match")
}
return chunks
}
func getRandom(seed int64, count int) []byte {
buf := make([]byte, count)
rnd := rand.New(rand.NewSource(seed))
for i := 0; i < count; i += 4 {
r := rnd.Uint32()
buf[i] = byte(r)
buf[i+1] = byte(r >> 8)
buf[i+2] = byte(r >> 16)
buf[i+3] = byte(r >> 24)
}
return buf
}
func hashData(d []byte) []byte {
h := sha256.New()
h.Write(d)
return h.Sum(nil)
}
func TestChunker(t *testing.T) {
// setup data source
buf := getRandom(23, 32*1024*1024)
ch := New(bytes.NewReader(buf), testPol)
testWithData(t, ch, chunks1, true)
// setup nullbyte data source
buf = bytes.Repeat([]byte{0}, len(chunks2)*MinSize)
ch = New(bytes.NewReader(buf), testPol)
testWithData(t, ch, chunks2, true)
}
func TestChunkerWithCustomAverageBits(t *testing.T) {
buf := getRandom(23, 32*1024*1024)
ch := New(bytes.NewReader(buf), testPol)
// sligthly decrease averageBits to get more chunks
ch.SetAverageBits(19)
testWithData(t, ch, chunks3, true)
}
func TestChunkerReset(t *testing.T) {
buf := getRandom(23, 32*1024*1024)
ch := New(bytes.NewReader(buf), testPol)
testWithData(t, ch, chunks1, true)
ch.Reset(bytes.NewReader(buf), testPol)
testWithData(t, ch, chunks1, true)
}
func TestChunkerWithRandomPolynomial(t *testing.T) {
// setup data source
buf := getRandom(23, 32*1024*1024)
// generate a new random polynomial
start := time.Now()
p, err := RandomPolynomial()
if err != nil {
t.Fatal(err)
}
t.Logf("generating random polynomial took %v", time.Since(start))
start = time.Now()
ch := New(bytes.NewReader(buf), p)
t.Logf("creating chunker took %v", time.Since(start))
// make sure that first chunk is different
c, err := ch.Next(nil)
if err != nil {
t.Fatal(err.Error())
}
if c.Cut == chunks1[0].CutFP {
t.Fatal("Cut point is the same")
}
if c.Length == chunks1[0].Length {
t.Fatal("Length is the same")
}
if bytes.Equal(hashData(c.Data), chunks1[0].Digest) {
t.Fatal("Digest is the same")
}
}
func TestChunkerWithoutHash(t *testing.T) {
// setup data source
buf := getRandom(23, 32*1024*1024)
ch := New(bytes.NewReader(buf), testPol)
chunks := testWithData(t, ch, chunks1, false)
// test reader
for i, c := range chunks {
if uint(len(c.Data)) != chunks1[i].Length {
t.Fatalf("reader returned wrong number of bytes: expected %d, got %d",
chunks1[i].Length, len(c.Data))
}
if !bytes.Equal(buf[c.Start:c.Start+c.Length], c.Data) {
t.Fatalf("invalid data for chunk returned: expected %02x, got %02x",
buf[c.Start:c.Start+c.Length], c.Data)
}
}
// setup nullbyte data source
buf = bytes.Repeat([]byte{0}, len(chunks2)*MinSize)
ch = New(bytes.NewReader(buf), testPol)
testWithData(t, ch, chunks2, false)
}
func benchmarkChunker(b *testing.B, checkDigest bool) {
size := 32 * 1024 * 1024
rd := bytes.NewReader(getRandom(23, size))
ch := New(rd, testPol)
buf := make([]byte, MaxSize)
b.ResetTimer()
b.SetBytes(int64(size))
var chunks int
for i := 0; i < b.N; i++ {
chunks = 0
_, err := rd.Seek(0, 0)
if err != nil {
b.Fatalf("Seek() return error %v", err)
}
ch.Reset(rd, testPol)
cur := 0
for {
chunk, err := ch.Next(buf)
if err == io.EOF {
break
}
if err != nil {
b.Fatalf("Unexpected error occurred: %v", err)
}
if chunk.Length != chunks1[cur].Length {
b.Errorf("wrong chunk length, want %d, got %d",
chunks1[cur].Length, chunk.Length)
}
if chunk.Cut != chunks1[cur].CutFP {
b.Errorf("wrong cut fingerprint, want 0x%x, got 0x%x",
chunks1[cur].CutFP, chunk.Cut)
}
if checkDigest {
h := hashData(chunk.Data)
if !bytes.Equal(h, chunks1[cur].Digest) {
b.Errorf("wrong digest, want %x, got %x",
chunks1[cur].Digest, h)
}
}
chunks++
cur++
}
}
b.Logf("%d chunks, average chunk size: %d bytes", chunks, size/chunks)
}
func BenchmarkChunkerWithSHA256(b *testing.B) {
benchmarkChunker(b, true)
}
func BenchmarkChunker(b *testing.B) {
benchmarkChunker(b, false)
}
func BenchmarkNewChunker(b *testing.B) {
p, err := RandomPolynomial()
if err != nil {
b.Fatal(err)
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
New(bytes.NewBuffer(nil), p)
}
}

82
chunker/doc.go Normal file
View File

@@ -0,0 +1,82 @@
// Copyright 2014 Alexander Neumann. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
/*
Package chunker implements Content Defined Chunking (CDC) based on a rolling
Rabin Checksum.
Choosing a Random Irreducible Polynomial
The function RandomPolynomial() returns a new random polynomial of degree 53
for use with the chunker. The degree 53 is chosen because it is the largest
prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for
optimising calculations in the chunker.
A random polynomial is chosen selecting 64 random bits, masking away bits
64..54 and setting bit 53 to one (otherwise the polynomial is not of the
desired degree) and bit 0 to one (otherwise the polynomial is trivially
reducible), so that 51 bits are chosen at random.
This process is repeated until Irreducible() returns true, then this
polynomials is returned. If this doesn't happen after 1 million tries, the
function returns an error. The probability for selecting an irreducible
polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability
that no irreducible polynomial has been found after 100 tries is lower than
0.04%.
Verifying Irreducible Polynomials
During development the results have been verified using the computational
discrete algebra system GAP, which can be obtained from the website at
http://www.gap-system.org/.
For filtering a given list of polynomials in hexadecimal coefficient notation,
the following script can be used:
# create x over F_2 = GF(2)
x := Indeterminate(GF(2), "x");
# test if polynomial is irreducible, i.e. the number of factors is one
IrredPoly := function (poly)
return (Length(Factors(poly)) = 1);
end;;
# create a polynomial in x from the hexadecimal representation of the
# coefficients
Hex2Poly := function (s)
return ValuePol(CoefficientsQadic(IntHexString(s), 2), x);
end;;
# list of candidates, in hex
candidates := [ "3DA3358B4DC173" ];
# create real polynomials
L := List(candidates, Hex2Poly);
# filter and display the list of irreducible polynomials contained in L
Display(Filtered(L, x -> (IrredPoly(x))));
All irreducible polynomials from the list are written to the output.
Background Literature
An introduction to Rabin Fingerprints/Checksums can be found in the following articles:
Michael O. Rabin (1981): "Fingerprinting by Random Polynomials"
http://www.xmailserver.org/rabin.pdf
Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms"
http://www.zlib.net/crc_v3.txt
Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method"
http://www.xmailserver.org/rabin_apps.pdf
Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields"
http://www.math.clemson.edu/~sgao/papers/GP97a.pdf
Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget"
http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
*/
package chunker

39
chunker/example_test.go Normal file
View File

@@ -0,0 +1,39 @@
package chunker
import (
"bytes"
"crypto/sha256"
"fmt"
"io"
)
func ExampleChunker() {
// generate 32MiB of deterministic pseudo-random data
data := getRandom(23, 32*1024*1024)
// create a chunker
chunker := New(bytes.NewReader(data), Pol(0x3DA3358B4DC173))
// reuse this buffer
buf := make([]byte, 8*1024*1024)
for i := 0; i < 5; i++ {
chunk, err := chunker.Next(buf)
if err == io.EOF {
break
}
if err != nil {
panic(err)
}
fmt.Printf("%d %02x\n", chunk.Length, sha256.Sum256(chunk.Data))
}
// Output:
// 2163460 4b94cb2cf293855ea43bf766731c74969b91aa6bf3c078719aabdd19860d590d
// 643703 5727a63c0964f365ab8ed2ccf604912f2ea7be29759a2b53ede4d6841e397407
// 1528956 a73759636a1e7a2758767791c69e81b69fb49236c6929e5d1b654e06e37674ba
// 1955808 c955fb059409b25f07e5ae09defbbc2aadf117c97a3724e06ad4abd2787e6824
// 2222372 6ba5e9f7e1b310722be3627716cf469be941f7f3e39a4c3bcefea492ec31ee56
}

3
chunker/go.mod Normal file
View File

@@ -0,0 +1,3 @@
module github.com/restic/chunker
go 1.14

310
chunker/polynomials.go Normal file
View File

@@ -0,0 +1,310 @@
package chunker
import (
"crypto/rand"
"encoding/binary"
"errors"
"fmt"
"io"
"strconv"
)
// Pol is a polynomial from F_2[X].
type Pol uint64
// Add returns x+y.
func (x Pol) Add(y Pol) Pol {
r := Pol(uint64(x) ^ uint64(y))
return r
}
// mulOverflows returns true if the multiplication would overflow uint64.
// Code by Rob Pike, see
// https://groups.google.com/d/msg/golang-nuts/h5oSN5t3Au4/KaNQREhZh0QJ
func mulOverflows(a, b Pol) bool {
if a <= 1 || b <= 1 {
return false
}
c := a.mul(b)
d := c.Div(b)
if d != a {
return true
}
return false
}
func (x Pol) mul(y Pol) Pol {
if x == 0 || y == 0 {
return 0
}
var res Pol
for i := 0; i <= y.Deg(); i++ {
if (y & (1 << uint(i))) > 0 {
res = res.Add(x << uint(i))
}
}
return res
}
// Mul returns x*y. When an overflow occurs, Mul panics.
func (x Pol) Mul(y Pol) Pol {
if mulOverflows(x, y) {
panic("multiplication would overflow uint64")
}
return x.mul(y)
}
// Deg returns the degree of the polynomial x. If x is zero, -1 is returned.
func (x Pol) Deg() int {
// the degree of 0 is -1
if x == 0 {
return -1
}
// see https://graphics.stanford.edu/~seander/bithacks.html#IntegerLog
r := 0
if uint64(x)&0xffffffff00000000 > 0 {
x >>= 32
r |= 32
}
if uint64(x)&0xffff0000 > 0 {
x >>= 16
r |= 16
}
if uint64(x)&0xff00 > 0 {
x >>= 8
r |= 8
}
if uint64(x)&0xf0 > 0 {
x >>= 4
r |= 4
}
if uint64(x)&0xc > 0 {
x >>= 2
r |= 2
}
if uint64(x)&0x2 > 0 {
r |= 1
}
return r
}
// String returns the coefficients in hex.
func (x Pol) String() string {
return "0x" + strconv.FormatUint(uint64(x), 16)
}
// Expand returns the string representation of the polynomial x.
func (x Pol) Expand() string {
if x == 0 {
return "0"
}
s := ""
for i := x.Deg(); i > 1; i-- {
if x&(1<<uint(i)) > 0 {
s += fmt.Sprintf("+x^%d", i)
}
}
if x&2 > 0 {
s += "+x"
}
if x&1 > 0 {
s += "+1"
}
return s[1:]
}
// DivMod returns x / d = q, and remainder r,
// see https://en.wikipedia.org/wiki/Division_algorithm
func (x Pol) DivMod(d Pol) (Pol, Pol) {
if x == 0 {
return 0, 0
}
if d == 0 {
panic("division by zero")
}
D := d.Deg()
diff := x.Deg() - D
if diff < 0 {
return 0, x
}
var q Pol
for diff >= 0 {
m := d << uint(diff)
q |= (1 << uint(diff))
x = x.Add(m)
diff = x.Deg() - D
}
return q, x
}
// Div returns the integer division result x / d.
func (x Pol) Div(d Pol) Pol {
q, _ := x.DivMod(d)
return q
}
// Mod returns the remainder of x / d
func (x Pol) Mod(d Pol) Pol {
_, r := x.DivMod(d)
return r
}
// I really dislike having a function that does not terminate, so specify a
// really large upper bound for finding a new irreducible polynomial, and
// return an error when no irreducible polynomial has been found within
// randPolMaxTries.
const randPolMaxTries = 1e6
// RandomPolynomial returns a new random irreducible polynomial
// of degree 53 using the default System CSPRNG as source.
// It is equivalent to calling DerivePolynomial(rand.Reader).
func RandomPolynomial() (Pol, error) {
return DerivePolynomial(rand.Reader)
}
// DerivePolynomial returns an irreducible polynomial of degree 53
// (largest prime number below 64-8) by reading bytes from source.
// There are (2^53-2/53) irreducible polynomials of degree 53 in
// F_2[X], c.f. Michael O. Rabin (1981): "Fingerprinting by Random
// Polynomials", page 4. If no polynomial could be found in one
// million tries, an error is returned.
func DerivePolynomial(source io.Reader) (Pol, error) {
for i := 0; i < randPolMaxTries; i++ {
var f Pol
// choose polynomial at (pseudo)random
err := binary.Read(source, binary.LittleEndian, &f)
if err != nil {
return 0, err
}
// mask away bits above bit 53
f &= Pol((1 << 54) - 1)
// set highest and lowest bit so that the degree is 53 and the
// polynomial is not trivially reducible
f |= (1 << 53) | 1
// test if f is irreducible
if f.Irreducible() {
return f, nil
}
}
// If this is reached, we haven't found an irreducible polynomial in
// randPolMaxTries. This error is very unlikely to occur.
return 0, errors.New("unable to find new random irreducible polynomial")
}
// GCD computes the Greatest Common Divisor x and f.
func (x Pol) GCD(f Pol) Pol {
if f == 0 {
return x
}
if x == 0 {
return f
}
if x.Deg() < f.Deg() {
x, f = f, x
}
return f.GCD(x.Mod(f))
}
// Irreducible returns true iff x is irreducible over F_2. This function
// uses Ben Or's reducibility test.
//
// For details see "Tests and Constructions of Irreducible Polynomials over
// Finite Fields".
func (x Pol) Irreducible() bool {
for i := 1; i <= x.Deg()/2; i++ {
if x.GCD(qp(uint(i), x)) != 1 {
return false
}
}
return true
}
// MulMod computes x*f mod g
func (x Pol) MulMod(f, g Pol) Pol {
if x == 0 || f == 0 {
return 0
}
var res Pol
for i := 0; i <= f.Deg(); i++ {
if (f & (1 << uint(i))) > 0 {
a := x
for j := 0; j < i; j++ {
a = a.Mul(2).Mod(g)
}
res = res.Add(a).Mod(g)
}
}
return res
}
// qp computes the polynomial (x^(2^p)-x) mod g. This is needed for the
// reducibility test.
func qp(p uint, g Pol) Pol {
num := (1 << p)
i := 1
// start with x
res := Pol(2)
for i < num {
// repeatedly square res
res = res.MulMod(res, g)
i *= 2
}
// add x
return res.Add(2).Mod(g)
}
// MarshalJSON returns the JSON representation of the Pol.
func (x Pol) MarshalJSON() ([]byte, error) {
buf := strconv.AppendUint([]byte{'"'}, uint64(x), 16)
buf = append(buf, '"')
return buf, nil
}
// UnmarshalJSON parses a Pol from the JSON data.
func (x *Pol) UnmarshalJSON(data []byte) error {
if len(data) < 2 {
return errors.New("invalid string for polynomial")
}
n, err := strconv.ParseUint(string(data[1:len(data)-1]), 16, 64)
if err != nil {
return err
}
*x = Pol(n)
return nil
}

424
chunker/polynomials_test.go Normal file
View File

@@ -0,0 +1,424 @@
package chunker
import (
"strconv"
"testing"
)
var polAddTests = []struct {
x, y Pol
sum Pol
}{
{23, 16, 23 ^ 16},
{0x9a7e30d1e855e0a0, 0x670102a1f4bcd414, 0xfd7f32701ce934b4},
{0x9a7e30d1e855e0a0, 0x9a7e30d1e855e0a0, 0},
}
func TestPolAdd(t *testing.T) {
for i, test := range polAddTests {
if test.sum != test.x.Add(test.y) {
t.Errorf("test %d failed: sum != x+y", i)
}
if test.sum != test.y.Add(test.x) {
t.Errorf("test %d failed: sum != y+x", i)
}
}
}
func parseBin(s string) Pol {
i, err := strconv.ParseUint(s, 2, 64)
if err != nil {
panic(err)
}
return Pol(i)
}
var polMulTests = []struct {
x, y Pol
res Pol
}{
{1, 2, 2},
{
parseBin("1101"),
parseBin("10"),
parseBin("11010"),
},
{
parseBin("1101"),
parseBin("11"),
parseBin("10111"),
},
{
0x40000000,
0x40000000,
0x1000000000000000,
},
{
parseBin("1010"),
parseBin("100100"),
parseBin("101101000"),
},
{
parseBin("100"),
parseBin("11"),
parseBin("1100"),
},
{
parseBin("11"),
parseBin("110101"),
parseBin("1011111"),
},
{
parseBin("10011"),
parseBin("110101"),
parseBin("1100001111"),
},
}
func TestPolMul(t *testing.T) {
for i, test := range polMulTests {
m := test.x.Mul(test.y)
if test.res != m {
t.Errorf("TestPolMul failed for test %d: %v * %v: want %v, got %v",
i, test.x, test.y, test.res, m)
}
m = test.y.Mul(test.x)
if test.res != test.y.Mul(test.x) {
t.Errorf("TestPolMul failed for %d: %v * %v: want %v, got %v",
i, test.x, test.y, test.res, m)
}
}
}
func TestPolMulOverflow(t *testing.T) {
defer func() {
// try to recover overflow error
err := recover()
if e, ok := err.(string); ok && e == "multiplication would overflow uint64" {
return
}
t.Logf("invalid error raised: %v", err)
// re-raise error if not overflow
panic(err)
}()
x := Pol(1 << 63)
x.Mul(2)
t.Fatal("overflow test did not panic")
}
var polDivTests = []struct {
x, y Pol
res Pol
}{
{10, 50, 0},
{0, 1, 0},
{
parseBin("101101000"), // 0x168
parseBin("1010"), // 0xa
parseBin("100100"), // 0x24
},
{2, 2, 1},
{
0x8000000000000000,
0x8000000000000000,
1,
},
{
parseBin("1100"),
parseBin("100"),
parseBin("11"),
},
{
parseBin("1100001111"),
parseBin("10011"),
parseBin("110101"),
},
}
func TestPolDiv(t *testing.T) {
for i, test := range polDivTests {
m := test.x.Div(test.y)
if test.res != m {
t.Errorf("TestPolDiv failed for test %d: %v * %v: want %v, got %v",
i, test.x, test.y, test.res, m)
}
}
}
func TestPolDeg(t *testing.T) {
var x Pol
if x.Deg() != -1 {
t.Errorf("deg(0) is not -1: %v", x.Deg())
}
x = 1
if x.Deg() != 0 {
t.Errorf("deg(1) is not 0: %v", x.Deg())
}
for i := 0; i < 64; i++ {
x = 1 << uint(i)
if x.Deg() != i {
t.Errorf("deg(1<<%d) is not %d: %v", i, i, x.Deg())
}
}
}
var polModTests = []struct {
x, y Pol
res Pol
}{
{10, 50, 10},
{0, 1, 0},
{
parseBin("101101001"),
parseBin("1010"),
parseBin("1"),
},
{2, 2, 0},
{
0x8000000000000000,
0x8000000000000000,
0,
},
{
parseBin("1100"),
parseBin("100"),
parseBin("0"),
},
{
parseBin("1100001111"),
parseBin("10011"),
parseBin("0"),
},
}
func TestPolModt(t *testing.T) {
for i, test := range polModTests {
res := test.x.Mod(test.y)
if test.res != res {
t.Errorf("test %d failed: want %v, got %v", i, test.res, res)
}
}
}
func BenchmarkPolDivMod(t *testing.B) {
f := Pol(0x2482734cacca49)
g := Pol(0x3af4b284899)
for i := 0; i < t.N; i++ {
g.DivMod(f)
}
}
func BenchmarkPolDiv(t *testing.B) {
f := Pol(0x2482734cacca49)
g := Pol(0x3af4b284899)
for i := 0; i < t.N; i++ {
g.Div(f)
}
}
func BenchmarkPolMod(t *testing.B) {
f := Pol(0x2482734cacca49)
g := Pol(0x3af4b284899)
for i := 0; i < t.N; i++ {
g.Mod(f)
}
}
func BenchmarkPolDeg(t *testing.B) {
f := Pol(0x3af4b284899)
d := f.Deg()
if d != 41 {
t.Fatalf("BenchmalPolDeg: Wrong degree %d returned, expected %d",
d, 41)
}
for i := 0; i < t.N; i++ {
f.Deg()
}
}
func TestRandomPolynomial(t *testing.T) {
_, err := RandomPolynomial()
if err != nil {
t.Fatal(err)
}
}
func BenchmarkRandomPolynomial(t *testing.B) {
for i := 0; i < t.N; i++ {
_, err := RandomPolynomial()
if err != nil {
t.Fatal(err)
}
}
}
func TestExpandPolynomial(t *testing.T) {
pol := Pol(0x3DA3358B4DC173)
s := pol.Expand()
if s != "x^53+x^52+x^51+x^50+x^48+x^47+x^45+x^41+x^40+x^37+x^36+x^34+x^32+x^31+x^27+x^25+x^24+x^22+x^19+x^18+x^16+x^15+x^14+x^8+x^6+x^5+x^4+x+1" {
t.Fatal("wrong result")
}
}
var polIrredTests = []struct {
f Pol
irred bool
}{
{0x38f1e565e288df, false},
{0x3DA3358B4DC173, true},
{0x30a8295b9d5c91, false},
{0x255f4350b962cb, false},
{0x267f776110a235, false},
{0x2f4dae10d41227, false},
{0x2482734cacca49, true},
{0x312daf4b284899, false},
{0x29dfb6553d01d1, false},
{0x3548245eb26257, false},
{0x3199e7ef4211b3, false},
{0x362f39017dae8b, false},
{0x200d57aa6fdacb, false},
{0x35e0a4efa1d275, false},
{0x2ced55b026577f, false},
{0x260b012010893d, false},
{0x2df29cbcd59e9d, false},
{0x3f2ac7488bd429, false},
{0x3e5cb1711669fb, false},
{0x226d8de57a9959, false},
{0x3c8de80aaf5835, false},
{0x2026a59efb219b, false},
{0x39dfa4d13fb231, false},
{0x3143d0464b3299, false},
}
func TestPolIrreducible(t *testing.T) {
for _, test := range polIrredTests {
if test.f.Irreducible() != test.irred {
t.Errorf("Irreducibility test for Polynomial %v failed: got %v, wanted %v",
test.f, test.f.Irreducible(), test.irred)
}
}
}
func BenchmarkPolIrreducible(b *testing.B) {
// find first irreducible polynomial
var pol Pol
for _, test := range polIrredTests {
if test.irred {
pol = test.f
break
}
}
for i := 0; i < b.N; i++ {
if !pol.Irreducible() {
b.Errorf("Irreducibility test for Polynomial %v failed", pol)
}
}
}
var polGCDTests = []struct {
f1 Pol
f2 Pol
gcd Pol
}{
{10, 50, 2},
{0, 1, 1},
{
parseBin("101101001"),
parseBin("1010"),
parseBin("1"),
},
{2, 2, 2},
{
parseBin("1010"),
parseBin("11"),
parseBin("11"),
},
{
0x8000000000000000,
0x8000000000000000,
0x8000000000000000,
},
{
parseBin("1100"),
parseBin("101"),
parseBin("11"),
},
{
parseBin("1100001111"),
parseBin("10011"),
parseBin("10011"),
},
{
0x3DA3358B4DC173,
0x3DA3358B4DC173,
0x3DA3358B4DC173,
},
{
0x3DA3358B4DC173,
0x230d2259defd,
1,
},
{
0x230d2259defd,
0x51b492b3eff2,
parseBin("10011"),
},
}
func TestPolGCD(t *testing.T) {
for i, test := range polGCDTests {
gcd := test.f1.GCD(test.f2)
if test.gcd != gcd {
t.Errorf("GCD test %d (%+v) failed: got %v, wanted %v",
i, test, gcd, test.gcd)
}
gcd = test.f2.GCD(test.f1)
if test.gcd != gcd {
t.Errorf("GCD test %d (%+v) failed: got %v, wanted %v",
i, test, gcd, test.gcd)
}
}
}
var polMulModTests = []struct {
f1 Pol
f2 Pol
g Pol
mod Pol
}{
{
0x1230,
0x230,
0x55,
0x22,
},
{
0x0eae8c07dbbb3026,
0xd5d6db9de04771de,
0xdd2bda3b77c9,
0x425ae8595b7a,
},
}
func TestPolMulMod(t *testing.T) {
for i, test := range polMulModTests {
mod := test.f1.MulMod(test.f2, test.g)
if mod != test.mod {
t.Errorf("MulMod test %d (%+v) failed: got %v, wanted %v",
i, test, mod, test.mod)
}
}
}

Some files were not shown because too many files have changed in this diff Show More