Title:
Project lead: Shady El Damaty - @hebbianloop
Project collaborators: @nkhalsa
Registered Brainhack Global 2020 Event: Brainhack DC
Project Description: A substantial barrier to open science practice is the sharing and accessibility of datasets. Often datasets are stored in a centralized location such as a lab’s server or in costly enterprise cloud systems.
There are multiple problems associated with centralized data storage: 1) outages may make data temporarily unavailable, 2) data can disappear forever if the central location suffers failure, 3) centralized data storage enables censorship and can limit accessibility.
The datalad version control software takes steps to address this by including git annex in the back-end to support multiple types of “special remotes” for downloading and publishing datasets. However, there has been no attempt to bridge a decentralized file storage protocol into the datalad suite of supported remotes.
The InterPlanetary File System (IPFS) allows peer-to-peer sharing of data and storage on distributed networks such as BitTorrent, Filecoin and Cloudflare.
Data storage on these distributed networks also enables tokenization of individual datasets on the Ethereum blockchain and is an important first step for establishing data marketplaces for the peer-to-peer exchange of data and models.
The current project aims to explore the requirements and feasibility of upgrading datalad to support IPFS by including wrapper code that defines an IPFS special remote. Once implemented, the project will satisfy requirements for the tools needed to automate the tokenization of datasets on the Ethereum blockchain.
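To make the idea of "wrapper code for a special remote" concrete, here is a minimal sketch of the line-based protocol that git annex speaks to an external special remote over stdin and stdout. It is not the project's implementation: `InMemoryBackend`, `respond` and `serve` are hypothetical names, and the dictionary stands in for an IPFS client so the sketch runs without a network. Only a handful of protocol messages are handled; everything else is answered with `UNSUPPORTED-REQUEST`.

```python
import io
import os
import tempfile


class InMemoryBackend:
    """Hypothetical stand-in for an IPFS client: keeps key -> bytes in a dict."""

    def __init__(self):
        self.blobs = {}


def respond(request, backend):
    """Return the reply line for a single git-annex request line."""
    # Keys contain no whitespace, but file paths may, so split at most 3 times.
    parts = request.rstrip("\n").split(" ", 3)
    command = parts[0]
    if command == "INITREMOTE":
        return "INITREMOTE-SUCCESS"
    if command == "PREPARE":
        return "PREPARE-SUCCESS"
    if command == "TRANSFER" and len(parts) == 4:
        direction, key, path = parts[1:]
        try:
            if direction == "STORE":
                with open(path, "rb") as source:
                    backend.blobs[key] = source.read()
            elif direction == "RETRIEVE":
                data = backend.blobs[key]  # look up first, so no empty file is left behind
                with open(path, "wb") as target:
                    target.write(data)
            else:
                return "UNSUPPORTED-REQUEST"
        except (OSError, KeyError) as error:
            return f"TRANSFER-FAILURE {direction} {key} {type(error).__name__}"
        return f"TRANSFER-SUCCESS {direction} {key}"
    if command == "CHECKPRESENT" and len(parts) == 2:
        state = "SUCCESS" if parts[1] in backend.blobs else "FAILURE"
        return f"CHECKPRESENT-{state} {parts[1]}"
    if command == "REMOVE" and len(parts) == 2:
        backend.blobs.pop(parts[1], None)
        return f"REMOVE-SUCCESS {parts[1]}"
    return "UNSUPPORTED-REQUEST"


def serve(requests, replies, backend):
    """Announce the protocol version, then answer requests line by line."""
    replies.write("VERSION 1\n")
    for request in requests:
        replies.write(respond(request, backend) + "\n")
        replies.flush()


# Simulated session; a real remote would call serve(sys.stdin, sys.stdout, ...).
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "scan.nii")
    copy_path = os.path.join(workdir, "copy.nii")
    with open(source_path, "wb") as handle:
        handle.write(b"example bytes")
    session = io.StringIO(
        "INITREMOTE\nPREPARE\n"
        f"TRANSFER STORE KEY1 {source_path}\n"
        "CHECKPRESENT KEY1\n"
        f"TRANSFER RETRIEVE KEY1 {copy_path}\n"
        "REMOVE KEY1\nCHECKPRESENT KEY1\n"
    )
    output = io.StringIO()
    serve(session, output, InMemoryBackend())
    with open(copy_path, "rb") as handle:
        round_trip = handle.read()
print(output.getvalue())
```

In a real remote the dictionary reads and writes would be replaced by calls to an IPFS node, and the remote would also need to record where each key ended up, which this sketch leaves out.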
What we are doing: Adding IPFS special remote capability to datalad
For Who? For Decentralized Science!
Why? Centralized data storage is not sustainable in the era of web 3.0
Resources: Git Annex IPFS, Datalad FAQ, IPFS, Infura (IPFS API), A tokenized brain
Data to use: Open Neuro
Link to project repository/sources:
Goals for Brainhack Global 2020:
Create test IPFS repository with git annex
Research the datalad CLI and outline a strategy for modifying special remotes. Open a well-documented and clear issue on the datalad GitHub repository.
Implement modification to datalad for special remotes with ipfs
Test modification with open source data and host on IPFS
Tokenize an example dataset on the Ethereum blockchain
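As a starting point for the first goal above, the sketch below only assembles the shell commands one might run to create a test repository and push a file to an IPFS remote; it does not execute anything. It assumes the existing external IPFS special remote for git annex (the `externaltype=ipfs` helper) is installed and an IPFS node is reachable. The function name, the repository directory and the file name are placeholders chosen for illustration.

```python
import shlex


def plan_ipfs_test_repo(repo_dir, data_file, remote_name="ipfs"):
    """Return the shell commands, in order, without running any of them."""
    git = ["git", "-C", repo_dir]
    commands = [
        ["git", "init", repo_dir],
        git + ["annex", "init", "ipfs test"],
        # Assumes the external IPFS special remote helper is on PATH.
        git + ["annex", "initremote", remote_name,
               "type=external", "externaltype=ipfs", "encryption=none"],
        git + ["annex", "add", data_file],
        git + ["commit", "-m", "Add test data"],
        git + ["annex", "copy", data_file, "--to", remote_name],
    ]
    return [shlex.join(command) for command in commands]


# Placeholder names; nothing is executed, the plan is only printed.
plan = plan_ipfs_test_repo("demo-repo", "sub-01_T1w.nii.gz")
for line in plan:
    print(line)
```

Printing the plan first makes it easy to review each step, or paste it into the issue opened on the datalad repository, before running anything against real data.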
Good first issues:
Skills: You don’t need much background beyond familiarity with the terminal and working on the command line in a unix-y environment. We will work together and research how to add the special remote. Familiarity with git is highly recommended.
Tools/Software/Methods to Use: git, git annex, datalad, python
Communication channels: https://mattermost.brainhack.org/brainhack/channels/bhg-washingtondc
Project labels
Type of project: #coding_methods, #data_management
Project development status: #0_concept_no_content
Topic of the project: #reproducible_scientific_methods
Tools used in the project: #BIDS, #Datalad, #Jupyter
Tools skill level required to enter the project (more than one possible): #familiar, #no_skills_required
Programming language used in the project: #Python, R, #shell_scripting, #Unix_command_line, #Web, workflows
Modalities involved in the project (if any): none
Git skills required to enter the project (more than one possible): #2_branches_PRs
I added all of the labels I want to associate with my project