128-dim GraphConv model trained for edge prediction with a dot-product head

This model was trained using Napistu-Torch, a PyTorch framework for training graph neural networks on biological pathway networks.

The model was trained on the 8-source "Octopus" human consensus network, which integrates pathway data from STRING, OmniPath, Reactome, and other sources. The network comprises ~50K genes, metabolites, and complexes connected by ~8M interactions.

Task

This model performs edge prediction on biological pathway networks. Given node embeddings, the model predicts the likelihood of edges (interactions) between biological entities such as genes, proteins, and metabolites. This is useful for:

  • Discovering novel biological interactions
  • Validating experimentally observed interactions
  • Completing incomplete pathway databases
  • Predicting functional relationships between genes/proteins

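The model scores each candidate edge from the learned embeddings of its source and target nodes. Heads can optionally incorporate relation types for relation-aware prediction; this checkpoint uses a plain dot-product head that is not relation-aware (see Model Description).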
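
As a rough illustration of dot-product edge scoring, the sketch below uses plain Python and made-up 4-dimensional embeddings (the trained encoder produces 128-dimensional ones). The function names, the toy vectors, and the sigmoid mapping to a probability are assumptions for illustration, not the Napistu-Torch API.

```python
import math

def dot_product_score(source_embedding, target_embedding):
    """Raw edge score: the inner product of the two node embeddings."""
    if len(source_embedding) != len(target_embedding):
        raise ValueError("embeddings must have the same dimension")
    return sum(s * t for s, t in zip(source_embedding, target_embedding))

def edge_probability(source_embedding, target_embedding):
    """Squash the raw score into (0, 1) with a logistic sigmoid (assumed here)."""
    score = dot_product_score(source_embedding, target_embedding)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical toy embeddings; real ones come from the trained encoder.
gene_a = [0.5, -1.0, 0.25, 2.0]
gene_b = [1.0, 0.5, -2.0, 1.0]
metabolite_c = [-1.0, 1.0, 0.0, -1.5]

print(edge_probability(gene_a, gene_b))        # positive raw score, above 0.5
print(edge_probability(gene_a, metabolite_c))  # negative raw score, below 0.5
```

Because the dot product is symmetric, this kind of head assigns the same score to (a, b) and (b, a), so it cannot by itself distinguish edge direction.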

Model Description

  • Encoder
    • Type: graph_conv
    • Hidden Channels: 128
    • Number of Layers: 3
    • Dropout: 0.2
    • Edge Encoder: ✓ (dim=32)
  • Head
    • Type: dot_product
    • Relation-Aware: ✗

Training Date: 2025-12-04

For detailed experiment and training settings see this repository's config.json file.

Performance

Metric          Value
Validation AUC  0.7957
Test AUC        0.7964
Validation AP   0.7938
Test AP         0.7947
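
For readers unfamiliar with these metrics, the sketch below computes ROC AUC and average precision (AP) from scratch on a tiny, made-up set of labels and scores. It is not the evaluation code used for this model; the function names and the toy data are hypothetical, and in practice a library implementation would normally be used.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def average_precision(labels, scores):
    """Mean of precision@k at the rank k of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits

# Hypothetical toy data: 1 = true edge, 0 = non-edge.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print(roc_auc(labels, scores))
print(average_precision(labels, scores))
```

On this toy data, 7 of the 9 positive/negative pairs are ranked correctly, so AUC is 7/9, and AP is the mean of 1, 2/3, and 3/4. A random scorer would give an AUC near 0.5.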

Usage

1. Setup Environment

To reproduce the environment used for training, run the following commands:

pip install torch==2.8.0
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install 'napistu==0.8.2'
pip install 'napistu-torch[pyg,lightning]==0.2.13'
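
After installing, it can help to confirm that the pinned versions are the ones actually present. The following is a minimal sketch using only the standard library; the `check_versions` helper is hypothetical (not part of Napistu-Torch), and the distribution names are assumed to match the names used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
EXPECTED_VERSIONS = {
    "torch": "2.8.0",
    "napistu": "0.8.2",
    "napistu-torch": "0.2.13",
}

def check_versions(expected):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cpu" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (installed, wanted, matches)
    return report

for package, (installed, wanted, ok) in check_versions(EXPECTED_VERSIONS).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{package}: installed={installed} expected={wanted} [{status}]")
```

A package that is not installed is reported with `installed=None` rather than raising, so the check can run before or after the install step.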

2. Setup Data Store

First, download the Octopus consensus network data to create a local NapistuDataStore:

from napistu_torch.load.gcs import gcs_model_to_store

# Download data and create store
napistu_data_store = gcs_model_to_store(
    napistu_data_dir="path/to/napistu_data",
    store_dir="path/to/store",
    asset_name="human_consensus",
    # Pin to stable version for reproducibility
    asset_version="20250923"
)

3. Load Pretrained Model from HuggingFace Hub

from napistu_torch.ml.hugging_face import HuggingFaceLoader

# Load checkpoint
loader = HuggingFaceLoader("seanhacks/edge_prediction_dotprod_128e")
checkpoint = loader.load_checkpoint()

# Load config to reproduce experiment
experiment_config = loader.load_config()

4. Use Pretrained Model for Training

You can use this pretrained model as initialization for training via the CLI:

# Create a training config that uses the pretrained model
cat > my_config.yaml << EOF
name: my_finetuned_model

model:
  use_pretrained_model: true
  pretrained_model_source: huggingface
  pretrained_model_path: seanhacks/edge_prediction_dotprod_128e
  pretrained_model_freeze_encoder_weights: false  # Allow fine-tuning

data:
  sbml_dfs_path: path/to/sbml_dfs.pkl
  napistu_graph_path: path/to/graph.pkl
  napistu_data_name: edge_prediction

training:
  epochs: 100
  lr: 0.001
EOF

# Train with pretrained weights
napistu-torch train my_config.yaml

Citation

If you use this model, please cite:

@software{napistu_torch,
  title = {Napistu-Torch: Graph Neural Networks for Biological Pathway Analysis},
  author = {Hackett, Sean R.},
  url = {https://github.com/napistu/Napistu-Torch},
  year = {2025},
  note = {Model: graph_conv-dot_product_h128_l3_edge_prediction}
}

License

MIT License - See LICENSE for details.
