Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

text

Table of Contents
outlinetrue

DVC

DVC stands for Data Version Control and it does exactly that. It is built on top of Git and so it uses very similar commands, which can be both easy and confusing at the same time. DVC ensures your large files themselves are not tracked by Git. Instead, it keeps the large files in its own cache and writes simple text files, which store the unique "version code" (a hash), each time you commit to DVC. These textfiles then have to be committed to Git. This may seem a bit cumbersome, as you have to commit data first to DVC, and then its version code to Git, but this allows the user to utilize the full potential of both Github as well as Cloud Storages (Like Amazon S3, Google Cloud, Microsoft Azure etc.). The version history is all stored in Git and can be pushed to Github or Gitlab, but the data itsself can be stored on different platforms more specialized in handling large files.


Find more on how to install and use DVC here