duplicacy

A new generation cloud backup tool

  • Owner: gilbertchen/duplicacy
  • License: Other


Duplicacy: A lock-free deduplication cloud backup tool

Duplicacy is a new generation cross-platform cloud backup tool based on the idea of Lock-Free Deduplication.

This repository hosts source code, design documents, and binary releases of the command line version of Duplicacy. There is also a Duplicacy GUI frontend built for Windows and Mac OS X available from https://duplicacy.com.

There is a special edition of Duplicacy developed for VMware vSphere (ESXi) named Vertical Backup that can back up virtual machine files on ESXi to local drives, network storage, or cloud storage.

Features

Duplicacy has three core advantages over other open-source and commercial backup tools:

  • Duplicacy is the only cloud backup tool that allows multiple computers to back up to the same cloud storage, taking advantage of cross-computer deduplication whenever possible, without direct communication among them. This feature literally turns any cloud storage server supporting only a basic set of file operations into a sophisticated deduplication-aware server.

  • Unlike other chunk-based backup tools where chunks are grouped into pack files and a chunk database is used to track which chunks are stored inside each pack file, Duplicacy takes a database-less approach where every chunk is saved independently using its hash as the file name to facilitate quick lookups. The lack of a centralized chunk database not only makes the implementation less error-prone, but also produces a highly maintainable piece of software with plenty of room for development of new features and usability enhancements.

  • Duplicacy is fast. While performance wasn't the top design priority, Duplicacy has been shown to outperform other backup tools by a considerable margin, as indicated by the following results from a benchmarking experiment that backed up the Linux code base using Duplicacy and 3 other open-source backup tools.

Comparison of Duplicacy, restic, Attic, duplicity
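
The database-less idea described in the feature list can be illustrated with a minimal Go sketch. It is not Duplicacy's actual code: the `Storage` interface, the in-memory `memStorage` backend, the `uploadChunk` helper, and the choice of SHA-256 for naming are all assumptions made for this example. It only shows how naming each chunk after its hash lets a client decide whether to upload by asking the storage if a file with that name already exists.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Storage is a hypothetical minimal backend. It offers only two basic
// file operations: an existence check and a write.
type Storage interface {
	Exists(name string) bool
	Put(name string, data []byte)
}

// memStorage is an in-memory stand-in for a cloud bucket or a directory.
type memStorage map[string][]byte

func (m memStorage) Exists(name string) bool {
	_, ok := m[name]
	return ok
}

func (m memStorage) Put(name string, data []byte) {
	m[name] = append([]byte(nil), data...) // store a private copy
}

// chunkName derives the file name from the chunk content
// (SHA-256 is an assumption of this sketch).
func chunkName(chunk []byte) string {
	sum := sha256.Sum256(chunk)
	return hex.EncodeToString(sum[:])
}

// uploadChunk stores a chunk under its hash and reports whether an
// upload was actually needed.
func uploadChunk(s Storage, chunk []byte) (name string, uploaded bool) {
	name = chunkName(chunk)
	if s.Exists(name) {
		return name, false // some client already stored identical content
	}
	s.Put(name, chunk)
	return name, true
}

func main() {
	bucket := memStorage{}
	// Two "computers" back up the same data to the same storage.
	_, first := uploadChunk(bucket, []byte("shared system file"))
	_, second := uploadChunk(bucket, []byte("shared system file"))
	fmt.Println(first, second, len(bucket)) // prints "true false 1"
}
```

In this sketch no index has to be kept in sync between clients: the set of file names on the storage is the index. If two clients race on the same chunk, both write identical bytes under the same name, so the outcome is the same either way.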

Getting Started

Storages

Duplicacy currently provides the following storage backends:

  • Local disk
  • SFTP
  • Dropbox
  • Amazon S3
  • Wasabi
  • DigitalOcean Spaces
  • Google Cloud Storage
  • Microsoft Azure
  • Backblaze B2
  • Google Drive
  • Microsoft OneDrive
  • Hubic
  • OpenStack Swift
  • WebDAV (under beta testing)
  • pCloud (via WebDAV)
  • Box.com (via WebDAV)

Please consult the wiki page on how to set up Duplicacy to work with each cloud storage.

For reference, the following chart shows the running times (in seconds) of backing up the Linux code base to each of those supported storages:

Comparison of Cloud Storages

For complete benchmark results please visit https://github.com/gilbertchen/cloud-storage-comparison.

Comparison with Other Backup Tools

duplicity works by applying the rsync algorithm (or more specifically, the librsync library) to find the differences from previous backups and then uploading only those differences. It is the only existing backup tool with extensive cloud support -- the long list of storage backends covers almost every cloud provider one can think of. However, duplicity's biggest flaw lies in its incremental model -- a chain of dependent backups starts with a full backup followed by a number of incremental ones, and ends when another full backup is uploaded. Deleting one backup will render all the subsequent backups on the same chain useless. Periodic full backups are required in order to make previous backups disposable.

bup also uses librsync to split files into chunks but saves the chunks in the git packfile format. It supports neither cloud storage nor deletion of old backups.

Duplicati is one of the first backup tools to adopt the chunk-based approach, in which files are split into chunks that are then uploaded to the storage. The chunk-based approach got the incremental backup model right in the sense that every incremental backup is actually a full snapshot. However, because Duplicati splits files into fixed-size chunks, inserting or deleting even a few bytes shifts every subsequent chunk boundary and foils the deduplication. Cloud support is extensive, but multiple clients can't back up to the same storage location.
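
The weakness of fixed-size chunking can be seen in a small Go sketch. This is an illustration only, not Duplicati's code: the helper names `fixedChunks` and `sharedChunks`, the 4-byte chunk size, and the sample data are assumptions chosen to keep the example tiny. It splits a buffer into fixed-size pieces, inserts a single byte at the front, and counts how many chunks of the edited buffer still match a chunk of the original.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fixedChunks splits data into size-byte pieces (the last may be shorter).
func fixedChunks(data []byte, size int) [][]byte {
	var chunks [][]byte
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// sharedChunks counts the chunks of b whose hash also occurs among the
// chunks of a, i.e. the chunks that would not need to be uploaded again.
func sharedChunks(a, b [][]byte) int {
	seen := make(map[[32]byte]bool)
	for _, c := range a {
		seen[sha256.Sum256(c)] = true
	}
	n := 0
	for _, c := range b {
		if seen[sha256.Sum256(c)] {
			n++
		}
	}
	return n
}

func main() {
	original := []byte("abcdefghijklmnopqrstuvwxyz0123456789")
	edited := append([]byte("X"), original...) // one byte inserted at the front

	a := fixedChunks(original, 4)
	b := fixedChunks(edited, 4)
	fmt.Println(len(a), len(b), sharedChunks(a, b)) // prints "9 10 0"
}
```

Although 36 of the 37 bytes are unchanged, every chunk boundary has moved by one byte, so no chunk of the edited data matches a chunk of the original.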

Attic has been acclaimed by some as the Holy Grail of backups. It follows the same incremental backup model as Duplicati, but embraces the variable-size chunk algorithm for better performance and higher deduplication efficiency (it is no longer susceptible to byte insertion and deletion). Deletion of old backups is also supported. However, no cloud backends are implemented. Although concurrent backups from multiple clients to the same storage are in theory possible through the use of locking, this is not recommended by the developer because chunk indices are kept in a local cache. Concurrent access is not only a convenience; it is a necessity for better deduplication. For instance, if multiple machines with the same OS installed can back up their entire drives to the same storage, only one copy of the system files needs to be stored, greatly reducing the storage space regardless of the number of machines. Attic still adopts the traditional approach of using a centralized indexing database to manage chunks, and relies heavily on caching to improve performance. The presence of exclusive locking makes it hard to extend to cloud storage.
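
To show why variable-size (content-defined) chunking tolerates insertions, here is a minimal Go sketch. It is a toy, not the algorithm used by Attic or Duplicacy: the function names, the 4-byte window, the divisor 7, and the plain rolling sum (standing in for a real rolling hash) are all assumptions chosen so the result can be checked by hand. A chunk ends wherever the sum of the last few bytes meets a fixed condition, so boundaries depend on nearby content rather than on absolute offsets.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// contentChunks ends a chunk after every position where the sum of the
// last `window` bytes is divisible by `divisor`.
func contentChunks(data []byte, window int, divisor uint32) [][]byte {
	var chunks [][]byte
	var sum uint32
	start := 0
	for i, b := range data {
		sum += uint32(b)
		if i >= window {
			sum -= uint32(data[i-window]) // slide the window forward
		}
		// The decision uses only the last `window` bytes.
		if i+1 >= window && sum%divisor == 0 {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // trailing partial chunk
	}
	return chunks
}

// sharedChunks counts the chunks of b whose hash also occurs in a.
func sharedChunks(a, b [][]byte) int {
	seen := make(map[[32]byte]bool)
	for _, c := range a {
		seen[sha256.Sum256(c)] = true
	}
	n := 0
	for _, c := range b {
		if seen[sha256.Sum256(c)] {
			n++
		}
	}
	return n
}

func main() {
	// Sample data: the byte values 1, 2, ..., 24.
	var original []byte
	for v := 1; v <= 24; v++ {
		original = append(original, byte(v))
	}
	edited := append([]byte{100}, original...) // one byte inserted at the front

	a := contentChunks(original, 4, 7)
	b := contentChunks(edited, 4, 7)
	fmt.Println(len(a), len(b), sharedChunks(a, b)) // prints "4 4 3"
}
```

Only the first chunk, the one containing the inserted byte, changes; the boundaries after it fall at the same content as before, so the remaining three chunks deduplicate. Practical implementations typically also enforce minimum and maximum chunk sizes, which this sketch omits.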

restic is a more recent addition. It uses a format similar to the git packfile format. Multiple clients backing up to the same storage are still guarded by locks, and because a chunk database is used, deduplication isn't real-time (different clients sharing the same files will upload different copies of the same chunks). A prune operation will completely block all other clients connected to the storage from doing their regular backups. Moreover, since most cloud storage services do not provide a locking service, the best effort is to use some basic file operations to simulate a lock, but distributed locking is known to be a hard problem and it is unclear how reliable restic's lock implementation is. A faulty implementation may cause a prune operation to accidentally delete data still in use, resulting in unrecoverable data loss. This is the exact problem that we avoided by taking the lock-free approach.

The following table compares the feature lists of all these backup tools:

| Feature/Tool       | duplicity | bup | Duplicati | Attic           | restic            | Duplicacy |
|:------------------:|:---------:|:---:|:---------:|:---------------:|:-----------------:|:---------:|
| Incremental Backup | Yes       | Yes | Yes       | Yes             | Yes               | Yes       |
| Full Snapshot      | No        | Yes | Yes       | Yes             | Yes               | Yes       |
| Compression        | Yes       | Yes | Yes       | Yes             | No                | Yes       |
| Deduplication      | Weak      | Yes | Weak      | Yes             | Yes               | Yes       |
| Encryption         | Yes       | Yes | Yes       | Yes             | Yes               | Yes       |
| Deletion           | No        | No  | Yes       | Yes             | No                | Yes       |
| Concurrent Access  | No        | No  | No        | Not recommended | Exclusive locking | Lock-free |
| Cloud Support      | Extensive | No  | Extensive | No              | Limited           | Extensive |
| Snapshot Migration | No        | No  | No        | No              | No                | Yes       |

License

  • Free for personal use or commercial trial
  • Non-trial commercial use requires per-computer CLI licenses available from duplicacy.com at a cost of $50 per year
  • The computer with a valid commercial license for the GUI version may run the CLI version without a CLI license
  • CLI licenses are not required to restore or manage backups; only the backup command requires valid CLI licenses
  • Modification and redistribution are permitted, but commercial use of derivative works is subject to the same requirements of this license

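Key Metrics

Overview
Name and owner: gilbertchen/duplicacy
Primary language: Go
Languages: Go (2 languages in total)
License: Other

Owner activity
Created: 2016-02-23 01:28:10
Last push: 2025-05-03 02:37:15
Last commit: 2025-05-02 15:39:59
Releases: 57
Latest release: v3.2.5
First release: v0.1.1

User engagement
Stars: 5.5k
Watchers: 94
Forks: 345
Commits: 830
Issues: 524
Open issues: 292
Pull requests: 84
Open pull requests: 30
Closed pull requests: 37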