Baby Steps Towards De-Sci (Decentralized Science) with Datalad + git annex + IPFS (InterPlanetary File System)

Project info


Baby Steps Towards a Decentralized Science (De-Sci) with Datalad + git annex + InterPlanetary File System and a Dash of Ethereum


Project lead: Shady El Damaty - @hebbianloop

Project collaborators: @nkhalsa

Registered Brainhack Global 2020 Event: Brainhack DC

Project Description: A substantial barrier to open science practice is the sharing and accessibility of datasets. Often datasets are stored in a centralized location such as a lab’s server or in costly enterprise cloud systems.

There are multiple problems associated with centralized data storage: 1) outages may make data temporarily unavailable, 2) data can disappear forever if the central location suffers a failure, and 3) centralized storage enables censorship and can limit accessibility.

The datalad version control software takes steps to address this by including git annex in the back-end to support multiple types of “special remotes” for downloading and publishing datasets. However, there has been no attempt to bridge a decentralized file storage protocol into the datalad suite of supported remotes.
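As a rough illustration of how a special remote is registered, the sketch below builds the `git annex initremote` command for an external special remote. With `type=external`, git annex looks for a program named `git-annex-remote-<externaltype>` on the PATH. The remote name `ipfs`, the helper names `initremote_command` and `run_in_repo`, and the choice of `encryption=none` are illustrative assumptions, not part of this project's code.

```python
import shlex
import subprocess


def initremote_command(name, externaltype, encryption="none"):
    """Build the argv for `git annex initremote` with an external special remote."""
    return [
        "git", "annex", "initremote", name,
        "type=external",
        f"externaltype={externaltype}",  # git annex runs git-annex-remote-<externaltype>
        f"encryption={encryption}",
    ]


def run_in_repo(argv, repo_path, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        # Assumes repo_path is an initialized git annex repository.
        subprocess.run(argv, cwd=repo_path, check=True)


if __name__ == "__main__":
    # Dry run by default: shows the command without touching any repository.
    run_in_repo(initremote_command("ipfs", "ipfs"), repo_path=".")
```

Keeping the command construction separate from its execution makes it easy to inspect what would be run before pointing it at a real dataset.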

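The InterPlanetary File System (IPFS) is a peer-to-peer protocol for sharing and storing data across a distributed network of nodes rather than on a single server. It is similar in spirit to BitTorrent, and it is supported by services such as Filecoin (incentivized storage) and Cloudflare (a public gateway).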
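IPFS identifies data by a hash of its content rather than by its location, which is what lets any peer serve a file and any client verify it. The sketch below shows the idea with a plain SHA-256 hex digest and an in-memory dictionary. Real IPFS content identifiers use a different encoding and chunk large files, so `content_address` and `ContentStore` are hypothetical names used only to illustrate the principle.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the bytes themselves (simplified; not a real IPFS CID)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy content-addressed store: a dict stands in for a network of peers."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blocks[address] = data  # identical content always maps to the same address
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # The address doubles as an integrity check on whatever a peer returns.
        if content_address(data) != address:
            raise ValueError("stored block does not match its address")
        return data


if __name__ == "__main__":
    store = ContentStore()
    address = store.put(b"example dataset bytes")
    print(address, store.get(address))
```

Because the address depends only on the content, two labs that add the same file end up with the same identifier, and a downloaded copy can be checked without trusting the host that served it.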

Data storage on these distributed networks also enables tokenization of individual datasets on the Ethereum blockchain and is an important first step for establishing data marketplaces for the peer-to-peer exchange of data and models.

The current project aims to explore the requirements and feasibility of upgrading datalad to support IPFS by including wrapper code that defines an IPFS special remote. Once implemented, the project will satisfy the requirements for the tools needed to automate the tokenization of datasets on the Ethereum blockchain.

What we are doing: Adding IPFS special remote capability to datalad

For whom? For Decentralized Science!

Why? Centralized data storage is not sustainable in the era of Web 3.0

Resources: Git Annex IPFS, Datalad FAQ, IPFS, Infura (IPFS API), A tokenized brain

Data to use: OpenNeuro

Link to project repository/sources:

Goals for Brainhack Global 2020:

Good first issues:

  1. How does datalad work with special remotes under the hood? Can you set up your own FTP/SSH special remote?
  2. Demonstrate a git annex special remote with IPFS.
  3. Add a special remote wrapper/plugin to datalad core.
  4. Test on multiple machines/environments.
  5. Open a pull request on the datalad repository.
  6. Create a tokenized dataset on the Ethereum blockchain.
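For the special remote steps above, it may help to see what git annex expects from an external special remote: a program that speaks a line-based protocol over stdin and stdout. The sketch below handles a minimal subset of that protocol (`INITREMOTE`, `PREPARE`, `TRANSFER`, `CHECKPRESENT`, `REMOVE`). It is not this project's implementation: the class `InMemoryRemote` and the function `serve` are hypothetical names, and a Python dictionary stands in for IPFS so that the message flow is visible without a running node.

```python
class InMemoryRemote:
    """Toy backend: a dict stands in for IPFS (an assumption for illustration)."""

    def __init__(self):
        self.store = {}  # annex key -> file contents

    def handle(self, line):
        """Return the reply for one request line sent by git annex."""
        parts = line.split(" ", 3)  # file paths may contain spaces, so cap the split
        command = parts[0]
        if command == "INITREMOTE":
            return "INITREMOTE-SUCCESS"
        if command == "PREPARE":
            return "PREPARE-SUCCESS"
        if command == "TRANSFER" and len(parts) == 4:
            return self._transfer(parts[1], parts[2], parts[3])
        if command == "CHECKPRESENT" and len(parts) == 2:
            status = "SUCCESS" if parts[1] in self.store else "FAILURE"
            return f"CHECKPRESENT-{status} {parts[1]}"
        if command == "REMOVE" and len(parts) == 2:
            self.store.pop(parts[1], None)  # removing an absent key still succeeds
            return f"REMOVE-SUCCESS {parts[1]}"
        return "UNSUPPORTED-REQUEST"

    def _transfer(self, direction, key, path):
        try:
            if direction == "STORE":
                with open(path, "rb") as f:
                    self.store[key] = f.read()
            elif direction == "RETRIEVE":
                data = self.store[key]  # look up first so a miss creates no file
                with open(path, "wb") as f:
                    f.write(data)
            else:
                return "UNSUPPORTED-REQUEST"
        except (OSError, KeyError) as err:
            return f"TRANSFER-FAILURE {direction} {key} {err}"
        return f"TRANSFER-SUCCESS {direction} {key}"


def serve(stdin, stdout, remote=None):
    """Announce the protocol version, then answer requests line by line."""
    if remote is None:
        remote = InMemoryRemote()
    print("VERSION 1", file=stdout, flush=True)
    for line in stdin:
        print(remote.handle(line.rstrip("\n")), file=stdout, flush=True)
```

A real remote would be an executable named `git-annex-remote-<externaltype>` that calls something like `serve(sys.stdin, sys.stdout)`, with the dictionary replaced by calls to an IPFS node and the returned content identifier recorded for each key.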

Skills: You don’t need much background beyond familiarity with the terminal and the command line in a unix-y environment. We will work together and research how to add the special remote. Familiarity with git is highly recommended.

Tools/Software/Methods to Use: git, git annex, datalad, python

Communication channels:
