Downloading public metabolomics data¶
This tutorial explains how to download public mass spectral raw data from MetaboLights, Metabolomics Workbench, or MassIVE/GNPS through GNPS tooling.
There are three main ways to download data.
Download raw data file-by-file in the GNPS2 Dashboard/Pan-ReDU Metadata Dashboard¶
Download raw data file-by-file in the GNPS2 Dataset Explorer¶
Download raw data in batch through our public data downloader¶
This can be done through our public data downloader, which requires just a few command line steps.
The downloadpublicdata tool lets you download several types of data, using an MRI reference to each public file. Specifically, you are able to download:
- MRIs of open formats, e.g. mzML, mzXML, MGF
- MRIs of vendor raw data, e.g. .raw or .d, while maintaining the full folder structure for formats like .d
- MRIs of vendor raw data automatically converted to the mzML open format
Using Downloader Steps¶
- Clone the repository through your terminal by running:
git clone https://github.com/Wang-Bioinformatics-Lab/downloadpublicdata.git
- Navigate to the directory in your terminal with:
cd downloadpublicdata
- Install required packages with:
pip install -r requirements.txt
- Test whether the installation works with:
python ./bin/download_public_data_usi.py ./data/test_download.tsv ./data/ ./data/summary.tsv
- Replace ./data/test_download.tsv with the path to a TSV file containing the USIs you want to download. An example file can be found here. This should download the raw data into the folder ./data/.
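If you already have your USIs as a list, the input TSV can be generated with a short script. The following is a minimal sketch, not part of the downloader itself: the helper name write_usi_tsv is hypothetical, and the header name "usi" is an assumption that should be checked against the example file before use.

```python
import csv
from pathlib import Path


def write_usi_tsv(usis, tsv_path, column="usi"):
    """Write one USI per row under a single header column.

    The default header name "usi" is an assumption; compare it with the
    example TSV shipped in the repository and adjust if it differs.
    """
    tsv_path = Path(tsv_path)
    # Create the target folder if it does not exist yet.
    tsv_path.parent.mkdir(parents=True, exist_ok=True)
    with tsv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([column])
        for usi in usis:
            writer.writerow([usi])
    return tsv_path


if __name__ == "__main__":
    # Placeholder identifiers for illustration only; replace them with
    # the real USIs of the files you want to download.
    my_usis = [
        "mzspec:EXAMPLE_DATASET:path/to/file_a.mzML",
        "mzspec:EXAMPLE_DATASET:path/to/file_b.mzML",
    ]
    print(write_usi_tsv(my_usis, "./data/my_download.tsv"))
```

The resulting file path can then be passed as the first argument of download_public_data_usi.py in place of ./data/test_download.tsv.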
Note: By default, files are converted to .mzML format before the download. If you wish to download without conversion, you can use the --noconversion flag:
python ./bin/download_public_data_usi.py ./data/test_download.tsv ./data/ ./data/summary.tsv --noconversion
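If you want to call the downloader from another Python script rather than typing the command by hand, the two invocations shown in this tutorial can be wrapped as follows. This is a sketch under stated assumptions: the helper names build_download_command and run_download are hypothetical, only the three positional arguments and the --noconversion flag shown above are used, and the script must be run from the root of the cloned repository.

```python
import subprocess
import sys


def build_download_command(input_tsv, output_dir, summary_tsv, convert=True,
                           script="./bin/download_public_data_usi.py"):
    """Assemble the argument list for the downloader script.

    Mirrors the commands shown in this tutorial: three positional
    arguments, plus --noconversion when conversion to mzML is not wanted.
    """
    command = [sys.executable, script, input_tsv, output_dir, summary_tsv]
    if not convert:
        command.append("--noconversion")
    return command


def run_download(input_tsv, output_dir, summary_tsv, convert=True):
    """Run the downloader and raise CalledProcessError if it exits non-zero."""
    command = build_download_command(input_tsv, output_dir, summary_tsv, convert)
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command that would be executed for the test file,
    # keeping the vendor files unconverted.
    print(build_download_command(
        "./data/test_download.tsv", "./data/", "./data/summary.tsv",
        convert=False,
    ))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.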
Further details can be found in the GitHub README.