When coding software, running models or creating extensive configurations, your code space, project folder or configuration set will most likely contain a mix of text files, data files and software binaries. All these files need to be placed in a repository under version control and managed as a whole. For the text files you will want to compare versions in order to understand what has changed over time. This does not apply to binary files or very large data files, as humans are generally not well equipped to compare bits and bytes.
In the past, SVN was an ideal place to store your whole repository in one place. SVN is currently being phased out and GITHUB has been introduced as its replacement. The advantages and disadvantages of both systems are not discussed here. Instead, we focus on how to set up your GITHUB repository to include both your text-based files and your larger binaries.
The problem with GIT (and therefore also GITHUB) is that it is not designed to handle large and/or binary files. To overcome this problem, GIT Large File Storage (LFS) was introduced. The basic idea behind GIT LFS is that the actual binary file is not stored in your GITHUB repository. Instead, only a reference to this file is stored. The actual binary file is stored in an Object Storage location (S3 bucket).
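To make the "reference" idea concrete, here is a minimal Python sketch of the small text pointer that Git LFS commits in place of a large file, following the public Git LFS pointer format (a version line, a SHA-256 object id and the size in bytes). The function name `lfs_pointer` is made up for this illustration. In normal use Git LFS creates the pointer for you.

```python
import hashlib

# Version line used by Git LFS pointer files (spec v1).
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer for the given file content.

    The pointer, not the content, is what ends up in the Git repository.
    """
    oid = hashlib.sha256(content).hexdigest()  # identifies the object in storage
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(content)}\n"


if __name__ == "__main__":
    # Hypothetical payload standing in for a large binary file.
    print(lfs_pointer(b"example binary payload"), end="")
```

The pointer is only three short lines, which is why the Git repository stays small regardless of the size of the tracked file.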
Although GITHUB offers LFS out of the box, using its own cloud-based storage facilities, Deltares has chosen not to use this. The reasons are the costs of storing data on the GITHUB servers and the costs of uploading and downloading data to and from these servers. Instead, Deltares has chosen to host its own Object Storage in the form of a MinIO server.
In short: you will have a GIT repository in one of the two Deltares GITHUB organizations, Deltares or Deltares-research, which will contain only your text-based files and small data files. Your large files and binary files will be stored on the Deltares MinIO server.
To manage all your text, large and binary files as a single project, you have three options to connect your GITHUB repository to the Deltares MinIO object store: GIT-LFS, DVC or custom scripts.
Prerequisite: The guides below assume that you have a basic understanding of GIT and its related commands.
How to set up GIT-LFS?
Set up your GITHUB repository:
First you must set up your GITHUB repository by cloning it to your computer. If you do not yet have a GITHUB repository, you can request one via the 'Request a repository' page. Once your GITHUB repository is in place and up to date, continue with the GIT-LFS instructions.

Set up GIT-LFS:
Go to the GIT-LFS website and follow the instructions on how to download and install GIT-LFS. A good starting point is the 'Getting Started' section on the home page.

GIT-LFS configuration files:
.gitattributes: Stored in the root folder of your repository. This file contains patterns of all files that GIT-LFS should track and manage as 'large files'. Useful examples for .gitattributes can be found here.
.lfsconfig: Stored in the root folder of your repository. This file is necessary to point GIT-LFS to the MinIO server of Deltares instead of the default GITHUB LFS.
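As a rough illustration of what these two files could contain, the Python sketch below generates them. The tracked patterns (`*.nc`, `*.exe`, `*.zip`) and the URL `http://localhost:8080` are placeholders chosen for this example. The actual URL path expected by the Deltares proxy is not given in this guide, so treat it as an assumption. The `.gitattributes` line format is the one that `git lfs track` writes.

```python
from pathlib import Path
from typing import Iterable


def gitattributes_text(patterns: Iterable[str]) -> str:
    """One line per pattern, in the form written by `git lfs track`."""
    # Note: patterns containing spaces would need escaping; not handled here.
    return "".join(f"{p} filter=lfs diff=lfs merge=lfs -text\n" for p in patterns)


def lfsconfig_text(proxy_url: str) -> str:
    """Git-config style content that sets lfs.url to the given endpoint."""
    return f"[lfs]\n\turl = {proxy_url}\n"


def write_lfs_files(repo_root: str, patterns: Iterable[str], proxy_url: str) -> None:
    """Write both files into the root folder of the repository."""
    root = Path(repo_root)
    (root / ".gitattributes").write_text(gitattributes_text(patterns), encoding="utf-8")
    (root / ".lfsconfig").write_text(lfsconfig_text(proxy_url), encoding="utf-8")


if __name__ == "__main__":
    # Placeholder patterns and URL, shown on screen rather than written to disk.
    print(gitattributes_text(["*.nc", "*.exe", "*.zip"]), end="")
    print(lfsconfig_text("http://localhost:8080"), end="")
```

Both files are ordinary text files and should be committed to the repository, so that everyone who clones it uses the same tracking rules and the same LFS endpoint.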
The URL points to the GIT-LFS Proxy endpoint. In this example the proxy listens on port 8080, but this is configurable in the proxy configuration file config.json.

Set up GIT-LFS API Proxy:
Besides installing GIT-LFS you will also need to install the GIT LFS Proxy that is shown in the figure below. This proxy is needed to seamlessly integrate GIT-LFS with the S3-based MinIO API. It serves as a bridge between GIT-LFS and the S3 storage protocol and translates Git LFS API calls into S3 API calls, ensuring that files tracked by Git LFS are correctly stored on S3. How this works in practice is as follows:
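The translation step can be pictured with the simplified Python sketch below. It is not the code of the Deltares proxy, whose internals are not described in this guide. It takes a request in the shape of the public Git LFS Batch API and answers with one URL per object that points at object storage. The `example_presign` helper, the host `minio.example.org` and the bucket `my-bucket` are hypothetical stand-ins for a real S3 presigned-URL call.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def example_presign(operation: str, oid: str) -> str:
    """Hypothetical stand-in for an S3 presigned URL (PUT for upload, GET for download)."""
    return f"https://minio.example.org/my-bucket/{oid}?op={operation}"


def batch_response(request: dict,
                   presign: Callable[[str, str], str],
                   ttl_seconds: int = 900,
                   now: Optional[datetime] = None) -> dict:
    """Answer a Git LFS batch request with links into object storage.

    Simplification: every requested object gets an action; a real server
    would omit the action for uploads it already has and report errors
    for missing downloads.
    """
    operation = request["operation"]  # "upload" or "download"
    now = now or datetime.now(timezone.utc)
    expires_at = (now + timedelta(seconds=ttl_seconds)).strftime("%Y-%m-%dT%H:%M:%SZ")
    objects = []
    for obj in request["objects"]:
        objects.append({
            "oid": obj["oid"],
            "size": obj["size"],
            "actions": {
                operation: {"href": presign(operation, obj["oid"]),
                            "expires_at": expires_at},
            },
        })
    return {"transfer": "basic", "objects": objects}


if __name__ == "__main__":
    demo = {"operation": "upload", "transfers": ["basic"],
            "objects": [{"oid": "abc123", "size": 42}]}
    print(batch_response(demo, example_presign))
```

The Git LFS client then transfers the file content directly to or from the returned URL, so the large file never passes through the GITHUB repository.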
The proxy installation files can be found here:
The proxy installation can be placed anywhere on your computer. You can start the proxy by opening a command window and executing git-lfs-minio.exe. A more robust solution is to run the executable as a Windows Service.

How to install a Windows Service:

GIT-LFS Proxy configuration files:
config.json: Stored outside of the root folder of your repository. This file contains the MinIO endpoint server of Deltares.
Set up credentials:
(THIS NEEDS TO BE UPDATED: The GIT-LFS credentials are separate from the GITHUB credentials.)
In order for GIT-LFS to log in to the MinIO API, you must provide credentials. These can be configured in the Windows Credential Manager on your (Windows) computer. The MinIO access key and secret key can be created in the MinIO console: https://s3-console.deltares.nl. You will also require a MinIO account and a bucket. You can request these together with your request for a GITHUB repository. Open the Credential Manager tool from the Control Panel and add the credentials for the MinIO API as a new Generic Credentials entry.
How to set up DVC?
Set up your GITHUB repository:
First you must set up your GITHUB repository by cloning it to your computer. If you do not yet have a GITHUB repository, you can request one via the 'Request a repository' page. Once your GITHUB repository is in place and up to date, continue with the DVC instructions.

Set up DVC:
Go to the DVC website and follow the instructions on how to download and install DVC. A good starting point is the 'Get Started' page.

DVC configuration files:
.dvcignore: Stored in the root folder of your DVC project. This file contains patterns of all files that DVC should ignore.
.dvc/.gitignore: Stored in the .dvc folder. This file is similar to the .dvcignore file; however, it contains patterns of all files that GIT should ignore.
.dvc/config: Stored in the .dvc folder. Contains all DVC configuration that can be shared and can be uploaded into your repository.
.dvc/config.local: Stored in the .dvc folder. Contains all DVC configuration that cannot be shared or uploaded into your repository.
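The split between .dvc/config and .dvc/config.local can be illustrated with the DVC command line: settings written without `--local` end up in the shared config, while settings written with `--local` (typically the access key and secret key) end up in config.local. The Python sketch below only assembles the commands. The remote name `minio`, the bucket URL, the endpoint URL and the keys are placeholders for this example, and it assumes `dvc init` has already been run in the repository.

```python
import subprocess
from typing import List


def dvc_remote_commands(remote: str, bucket_url: str, endpoint_url: str,
                        access_key: str, secret_key: str) -> List[List[str]]:
    """Return the DVC CLI calls that configure an S3-compatible remote."""
    return [
        # Shared settings: stored in .dvc/config and safe to commit.
        ["dvc", "remote", "add", "-d", remote, bucket_url],
        ["dvc", "remote", "modify", remote, "endpointurl", endpoint_url],
        # Secrets: --local stores them in .dvc/config.local, which is not committed.
        ["dvc", "remote", "modify", "--local", remote, "access_key_id", access_key],
        ["dvc", "remote", "modify", "--local", remote, "secret_access_key", secret_key],
    ]


def run_all(commands: List[List[str]], cwd: str = ".") -> None:
    """Execute the commands in order; requires DVC to be installed."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Placeholder values; printed instead of executed.
    for argv in dvc_remote_commands("minio", "s3://my-bucket/dvcstore",
                                    "https://minio.example.org",
                                    "ACCESS_KEY", "SECRET_KEY"):
        print(" ".join(argv))
```

Keeping the keys in config.local means each user supplies their own MinIO credentials, while the remote definition itself travels with the repository.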
How to set up custom scripts?
How you set up your scripting environment will strongly depend on the programming language of the source code in your GITHUB repository. In all cases, however, you can take advantage of the REWIND functionality of MinIO. This functionality allows you to restore your data folder or files to a given point in time. For Python an example can be found here: https://github.com/robin-deltares/minio-py-rewind/blob/main/minio_rewind.py
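The core of such a rewind can be sketched independently of any SDK: given all stored versions of the objects in a versioned bucket, pick for each object the newest version that existed at the chosen moment and skip objects that were deleted by then. The Python sketch below is not taken from the linked script. The `ObjectVersion` record and the function `versions_at` are made up for this illustration. In a real script the records could presumably be filled from the MinIO Python SDK by listing objects with versions included; that wiring is an assumption and is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable


@dataclass(frozen=True)
class ObjectVersion:
    """One stored version of an object in a versioned bucket."""
    name: str
    version_id: str
    last_modified: datetime
    is_delete_marker: bool = False


def versions_at(versions: Iterable[ObjectVersion], when: datetime) -> Dict[str, str]:
    """Map each object name to the version id that was current at `when`."""
    current: Dict[str, ObjectVersion] = {}
    for v in versions:
        if v.last_modified > when:
            continue  # written after the rewind point, so ignore it
        best = current.get(v.name)
        if best is None or v.last_modified > best.last_modified:
            current[v.name] = v
    # An object whose newest version is a delete marker did not exist at `when`.
    return {name: v.version_id for name, v in current.items()
            if not v.is_delete_marker}


if __name__ == "__main__":
    utc = timezone.utc
    history = [
        ObjectVersion("model/a.bin", "v1", datetime(2024, 1, 1, tzinfo=utc)),
        ObjectVersion("model/a.bin", "v2", datetime(2024, 3, 1, tzinfo=utc)),
    ]
    print(versions_at(history, datetime(2024, 2, 1, tzinfo=utc)))
```

The resulting mapping tells a download script exactly which version of each file to fetch in order to reproduce the data folder as it was at that time.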
Choosing between the above solutions
GIT-LFS
git-lfs is intended to be transparent to git, and therefore requires a customized server. Its learning curve is short: after a few configuration commands it is running, storing large files independently of the git repository. That is its only function, and it does it well. Having an additional server is not a drawback but a requirement for such transparency. Once configured, files are simply handled by git, by means of git hooks (endpoints that are activated after git operations). Limitations of GIT-LFS are documented here.

DVC
dvc is intended to give the end user independent management of large files. What dvc basically does is this: it makes git ignore the files that you wish to control (adding them to …). Some comparisons with related technologies can be found here.

Custom scripts
Scripting is the most flexible way to go. It allows you to access all of MinIO's API functionality in the language of your preference. To help you get started, MinIO offers a variety of SDKs. Scripting does imply that you, as developer, have enough coding skills and are also the maintainer. However, with enough real-world examples this option should not be too difficult. Note to developers: Please provide your examples to github-support@deltares.nl so we can incorporate them into this manual.