Getting Started with ModiFinder
ModiFinder is your Swiss Army knife for mass spectrometry data analysis in Python. This guide will get you up and running quickly.
Installation
ModiFinder requires Python 3.9 or above.
This tutorial assumes you have already set up the Python environment you want to work in and intend to install modifinder inside it. If you want
to create and work with Python virtual environments, please follow the instructions
on `venv <https://docs.python.org/3/library/venv.html>`_ and `virtual environments <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_, or use `Conda/Mamba <https://github.com/mamba-org/mamba>`_ to create a new environment.
First, make sure you have the latest version of pip installed; see the `pip documentation <https://pip.pypa.io/en/stable/installing/>`_.
Using pip (Recommended)
pip install modifinder
From Source
git clone https://github.com/your-repo/modifinder.git
cd modifinder
pip install -e .
Verify Installation
import modifinder
print(modifinder.__version__)
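If the import above fails, it can help to check the environment without importing the package at all. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are made up for this example and are not part of ModiFinder.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # minimum version stated in this guide


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= tuple(minimum)


def installed_version(distribution="modifinder"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(f"Python OK: {python_is_supported()}")
    print(f"modifinder version: {installed_version() or 'not installed'}")
```

A result of `not installed` usually means pip installed the package into a different environment than the one running the script.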
Core Concepts
ModiFinder revolves around two main classes:
Spectrum: Represents MS/MS spectral data
Compound: Represents a chemical compound with its structure and spectrum
These classes can be created from various sources: GNPS identifiers, raw data arrays, MGF files, SMILES strings, and more.
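To picture how the two classes relate, here is a toy model built from plain dataclasses. It is only an illustration: `ToySpectrum` and `ToyCompound` are hypothetical stand-ins, the values are made up, and the real ModiFinder classes carry more fields and behavior than shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySpectrum:
    """Illustrative stand-in, not ModiFinder's Spectrum class."""
    mz: List[float]
    intensity: List[float]
    precursor_mz: float

    def __post_init__(self):
        # Each peak is an (m/z, intensity) pair, so the two lists must align.
        if len(self.mz) != len(self.intensity):
            raise ValueError("mz and intensity must have the same length")


@dataclass
class ToyCompound:
    """Illustrative stand-in, not ModiFinder's Compound class."""
    name: str
    smiles: str            # structure, kept here as a plain SMILES string
    spectrum: ToySpectrum  # a compound owns exactly one spectrum in this toy


# Made-up values, for illustration only.
spectrum = ToySpectrum(mz=[50.0, 100.0, 150.0],
                       intensity=[3.0, 4.0, 1.0],
                       precursor_mz=151.0)
compound = ToyCompound(name="example", smiles="CCO", spectrum=spectrum)
print(f"{compound.name}: {len(compound.spectrum.mz)} peaks")
```

The point of the nesting is that peak data is always reached through the compound (`compound.spectrum.mz`), which is the access pattern the examples in this guide use.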
Your First Steps
1. Fetch and Visualize a Compound
from modifinder import Compound
from modifinder.utilities import visualizer as viz
import matplotlib.pyplot as plt
# Get compound from GNPS
compound = Compound("CCMSLIB00010113829")
# Draw the molecule
img = viz.draw_molecule(compound.structure, label=compound.name)
plt.figure(figsize=(8, 8))
plt.imshow(img)
plt.axis('off')
plt.show()
print(f"Compound: {compound.name}")
print(f"Formula: {compound.formula if hasattr(compound, 'formula') else 'N/A'}")
print(f"Peaks: {len(compound.spectrum.mz)}")
2. Work with Spectra
from modifinder import Spectrum
# Create from GNPS
spectrum = Spectrum("mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00010113829")
# Access data
print(f"Precursor m/z: {spectrum.precursor_mz}")
print(f"Number of peaks: {len(spectrum.mz)}")
print(f"Base peak m/z: {spectrum.mz[spectrum.intensity.argmax()]}")
# Normalize
spectrum.normalize()
print(f"Max intensity after normalization: {max(spectrum.intensity)}")
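"Normalize" can mean different things for spectra, and this guide does not say which convention `spectrum.normalize()` applies. The sketch below shows two common ones on plain lists so you can recognize the result you get: scaling so the base peak is 1, and scaling to unit Euclidean norm. The function names are hypothetical and the peak values are made up.

```python
import math


def base_peak(mz, intensity):
    """Return (m/z, intensity) of the most intense peak."""
    idx = max(range(len(intensity)), key=intensity.__getitem__)
    return mz[idx], intensity[idx]


def normalize_to_max(intensity):
    """Scale intensities so the largest value becomes 1."""
    top = max(intensity)
    if top <= 0:
        raise ValueError("need at least one positive intensity")
    return [value / top for value in intensity]


def normalize_to_unit_norm(intensity):
    """Scale intensities so their Euclidean (L2) norm becomes 1."""
    norm = math.sqrt(sum(value * value for value in intensity))
    if norm == 0:
        raise ValueError("need at least one non-zero intensity")
    return [value / norm for value in intensity]


# Made-up peaks, for illustration only.
mz = [50.0, 100.0, 150.0]
intensity = [3.0, 4.0, 0.0]

print(base_peak(mz, intensity))
print(normalize_to_max(intensity))
print(normalize_to_unit_norm(intensity))
```

If the maximum intensity after normalization is exactly 1, the first convention is in use; if it is below 1 for a spectrum with more than one peak, a norm-based convention is more likely.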
3. Process MGF Files
from modifinder.utilities import general_utils as gu
# Read MGF file
df = gu.read_mgf("data.mgf")
print(f"Loaded {len(df)} spectra")
print(f"Columns: {df.columns.tolist()}")
# Filter by precursor mass
filtered = df[(df['precursor_mz'] >= 200) & (df['precursor_mz'] <= 500)]
print(f"Filtered to {len(filtered)} spectra")
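If you have not seen an MGF file before, the following sketch shows what the format looks like and how the same precursor filter works on plain Python data. It is a deliberately small parser written for this example, not how `read_mgf` is implemented, and it handles only `BEGIN IONS` / `END IONS` blocks with `KEY=value` lines and whitespace-separated peaks.

```python
def parse_mgf(text):
    """Parse MGF text into a list of dicts with parameters and peak lists."""
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        if line == "BEGIN IONS":
            current = {"params": {}, "mz": [], "intensity": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z followed by an optional intensity
                parts = line.split()
                current["mz"].append(float(parts[0]))
                current["intensity"].append(float(parts[1]) if len(parts) > 1 else 0.0)
    return spectra


def precursor_mz(spectrum):
    """PEPMASS may be 'mz' or 'mz intensity'; keep the first number."""
    return float(spectrum["params"]["PEPMASS"].split()[0])


# Made-up example file content.
SAMPLE = """\
BEGIN IONS
TITLE=spec_a
PEPMASS=250.12
CHARGE=1+
50.1 10
90.2 200
END IONS

BEGIN IONS
TITLE=spec_b
PEPMASS=612.33 1500
100.5 30
END IONS
"""

spectra = parse_mgf(SAMPLE)
filtered = [s for s in spectra if 200 <= precursor_mz(s) <= 500]
print(f"Loaded {len(spectra)} spectra, kept {len(filtered)}")
```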
4. Compare Structures
from modifinder import Compound
from modifinder.utilities import visualizer as viz
import matplotlib.pyplot as plt
# Two related compounds
compound1 = Compound("CCMSLIB00010113829")
compound2 = Compound("CCMSLIB00010125628")
# Visualize the difference
img = viz.draw_modifications(
    compound1.structure,
    compound2.structure,
    show_legend=True
)
plt.figure(figsize=(12, 6))
plt.imshow(img)
plt.axis('off')
plt.title("Structural Comparison")
plt.show()
5. Find Modification Sites
Now for ModiFinder’s main purpose:
from modifinder import ModiFinder, Compound
import matplotlib.pyplot as plt
# Known and modified compounds
known = Compound("CCMSLIB00010113829")
modified = Compound("CCMSLIB00010125628")
# Run ModiFinder
mf = ModiFinder(known, modified, mz_tolerance=0.01, ppm_tolerance=40)
probabilities = mf.generate_probabilities()
# Visualize prediction
img = mf.draw_prediction(
    probabilities,
    known.id,
    show_legend=True,
    show_labels=True
)
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis('off')
plt.title("Predicted Modification Sites")
plt.show()
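The example above passes both an absolute tolerance (`mz_tolerance=0.01`) and a relative one (`ppm_tolerance=40`). How ModiFinder combines the two is not covered in this guide, but the arithmetic relating them is standard and worth having at hand when choosing values. The helper names below are hypothetical.

```python
def ppm_to_da(mz, ppm):
    """Width in Da of a ppm tolerance at a given m/z."""
    return mz * ppm / 1_000_000


def ppm_error(observed_mz, reference_mz):
    """Absolute mass error between two m/z values, expressed in ppm."""
    return abs(observed_mz - reference_mz) / reference_mz * 1_000_000


# A 40 ppm window grows with m/z, while 0.01 Da stays fixed.
for mz in (100.0, 250.0, 1000.0):
    print(f"m/z {mz:7.1f}: 40 ppm = {ppm_to_da(mz, 40):.4f} Da")
```

With these values the two tolerances coincide at m/z 250: below that, 40 ppm is the tighter window; above it, 0.01 Da is.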
Common Use Cases
Extract Data from GNPS
from modifinder.utilities import network
# Fetch data
data = network.get_data("CCMSLIB00010113829")
print(f"Name: {data['compound_name']}")
print(f"SMILES: {data['smiles']}")
print(f"Precursor: {data['precursor_mz']}")
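The examples in this guide use two identifier forms for the same library entry: a bare accession (`CCMSLIB00010113829`) and a USI (`mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00010113829`). A small sketch for converting between them follows; the function names are made up for this example, and the conversion covers only the GNPS library form shown in this guide.

```python
GNPS_LIBRARY_PREFIX = "mzspec:GNPS:GNPS-LIBRARY:accession:"


def accession_to_usi(accession):
    """Build the GNPS library USI for a bare accession."""
    return GNPS_LIBRARY_PREFIX + accession


def usi_to_accession(identifier):
    """Return the bare accession, whether given a USI or an accession."""
    if not identifier.startswith("mzspec:"):
        return identifier
    fields = identifier.split(":")
    # Expected layout: mzspec : collection : run : index type : index value
    if len(fields) < 5 or fields[3] != "accession":
        raise ValueError(f"not an accession-style USI: {identifier}")
    return fields[4]


usi = accession_to_usi("CCMSLIB00010113829")
print(usi)
print(usi_to_accession(usi))
```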
Batch Process Multiple Compounds
from modifinder import Compound
accessions = ["CCMSLIB00010113829", "CCMSLIB00010125628", "CCMSLIB00010114304"]
compounds = []
for acc in accessions:
    try:
        compound = Compound(acc)
        compounds.append(compound)
        print(f"✓ {acc}: {compound.name}")
    except Exception as e:
        print(f"✗ {acc}: {e}")
print(f"\nSuccessfully loaded {len(compounds)} compounds")
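When a batch grows beyond a handful of entries, it is usually more useful to collect failures than to print them. Here is one way to sketch that pattern. `load_many` is a hypothetical helper, and `fake_loader` stands in for `Compound(...)` so the example runs without network access.

```python
def load_many(identifiers, loader):
    """Call loader on each identifier; return (loaded, failures) dicts."""
    loaded, failures = {}, {}
    for identifier in identifiers:
        try:
            loaded[identifier] = loader(identifier)
        except Exception as error:  # broad on purpose, like the loop above
            failures[identifier] = str(error)
    return loaded, failures


def fake_loader(identifier):
    """Stand-in for Compound(...) so this sketch runs offline."""
    if not identifier.startswith("CCMSLIB"):
        raise ValueError(f"unrecognized accession: {identifier}")
    return {"id": identifier}


loaded, failures = load_many(["CCMSLIB00010113829", "bad-id"], fake_loader)
print(f"Loaded {len(loaded)}, failed {len(failures)}")
for identifier, reason in failures.items():
    print(f"  {identifier}: {reason}")
```

In real use you would pass `Compound` itself as the loader, then retry or report the entries in `failures` once the batch has finished.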
Create Publication-Quality Figures
from modifinder import Compound
from modifinder.utilities import visualizer as viz
import matplotlib.pyplot as plt
compound = Compound("CCMSLIB00010113829")
# High-resolution molecule image
img_mol = viz.draw_molecule(
    compound.structure,
    size=(1200, 1200),
    label=compound.name,
    label_font_size=48
)
# High-resolution spectrum
img_spec = viz.draw_spectrum(
    compound.spectrum,
    title=f"MS/MS Spectrum - m/z {compound.precursor_mz:.2f}"
)
# Place both images side by side and save at 300 DPI
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
axes[0].imshow(img_mol)
axes[0].axis('off')
axes[1].imshow(img_spec)
axes[1].axis('off')
plt.savefig("publication_figure.png", dpi=300, bbox_inches='tight')
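Journals often specify figure sizes in pixels or in inches at a given DPI, so it helps to know what the settings above produce. The arithmetic is just inches times DPI; `pixel_size` below is a hypothetical helper for this example. Note that `bbox_inches='tight'` crops surrounding whitespace, so the saved file can differ from this nominal size.

```python
def pixel_size(figsize_inches, dpi):
    """Nominal pixel dimensions of a figure before any tight-bbox cropping."""
    width_in, height_in = figsize_inches
    return round(width_in * dpi), round(height_in * dpi)


width_px, height_px = pixel_size((16, 6), 300)
print(f"{width_px} x {height_px} pixels")  # 4800 x 1800 pixels
```

Each of the two panels is therefore roughly 2400 pixels wide, about twice the 1200-pixel molecule image, so requesting a larger `size` for the molecule may give a sharper result at this DPI.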
What’s Next?
Explore the detailed tutorials:
Working with USIs: Fetch and process data from GNPS
Working with MGF Files: Read, filter, and batch process MGF files
Visualization: Create beautiful figures of molecules and spectra
ModiFinder Basics: Deep dive into modification site prediction
Customization: Advanced ModiFinder configuration
Need Help?
Check the API Reference for detailed function documentation
See example notebooks in the docs/source/tutorials/ directory
Report issues on GitHub
Tips for Success
Start Simple: Begin with GNPS identifiers before working with custom data
Validate Data: Always check that objects were created successfully
Visualize Often: Use the drawing tools to understand your data
Process in Batches: For large datasets, process data in manageable chunks
Preserve Metadata: Keep track of IDs and other metadata throughout your workflow
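For the "Process in Batches" tip, a small generator is often all you need. This is a generic sketch using only the standard library; `chunked` is a name chosen for this example, not a ModiFinder function.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


accessions = ["CCMSLIB00010113829", "CCMSLIB00010125628", "CCMSLIB00010114304"]
for batch_number, batch in enumerate(chunked(accessions, 2), start=1):
    print(f"Batch {batch_number}: {batch}")
```

Processing one chunk at a time keeps memory use bounded and gives you a natural place to save intermediate results along with the IDs they belong to.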
Happy analyzing! 🚀