Explainability Analysis of ppiGPT Interaction Predictions in Prochlorococcus MED4

This repository hosts large result files and the model checkpoint used in the explainability analyses described in Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of Prochlorococcus MED4." Analysis code and source data are in the companion GitHub repository.

What This Repository Contains

Explainability Results

These files are the outputs of interpretability analyses applied to ppiGPT predictions:

| File | Size | Description |
| --- | --- | --- |
| results/deeplift_motif_analysis_results.pkl | 78 MB | Captum DeepLift per-residue attribution scores, motif discovery results, and position-wise statistics for all 2,168 protein pairs (1,084 PRS + 1,084 RRS) |
| results/integrated_gradients_random_ppi_per_token_attributions.csv | 174 MB | Captum Integrated Gradients per-token attribution scores for the 1,084 random reference set pairs |
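The internal layout of these two files is not documented here, so a reasonable first step is to inspect them without assuming any column names or dictionary keys. The sketch below uses only the Python standard library; the helper names `peek_csv` and `describe_pickle` are hypothetical and not part of the repository. Unpickling may also require whichever libraries produced the stored objects (possibly NumPy or PyTorch) to be installed, and pickle files should only be loaded from sources you trust.

```python
import csv
import pickle
from itertools import islice


def peek_csv(path, n_rows=5):
    """Return the header and the first n_rows data rows of a CSV file.

    Reads lazily, so the 174 MB attribution file is not loaded in full.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows


def describe_pickle(path):
    """Load a trusted pickle and summarize its top-level structure.

    Returns a mapping of key -> type name for dicts, or a single
    "<root>" entry for any other object type.
    """
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    if isinstance(obj, dict):
        return {str(key): type(value).__name__ for key, value in obj.items()}
    return {"<root>": type(obj).__name__}


# Example calls, assuming the files were downloaded to the paths listed above:
# header, rows = peek_csv(
#     "results/integrated_gradients_random_ppi_per_token_attributions.csv")
# summary = describe_pickle("results/deeplift_motif_analysis_results.pkl")
```

Inspecting the header and top-level keys first keeps any downstream analysis tied to what the files actually contain rather than to guessed field names.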

ppiGPT Model Checkpoint (for reproducibility)

The ppiGPT model was created by Kourosh Salehi-Ashtiani and is included here solely to enable reproduction of the explainability analyses. It is not a product of the explainability work.

| File | Size | Description |
| --- | --- | --- |
| model/out_3e/ckpt.pt | 1.0 GB | ppiGPT model checkpoint (3 epochs) |
| model/data/meta.pkl | 343 B | Character-level tokenizer metadata (29-token vocabulary) |
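As an illustration of how character-level tokenizer metadata of this kind is typically used, the sketch below builds encode and decode functions from a pickle file. It assumes that `meta.pkl` stores a dictionary with `stoi` (character to id) and `itos` (id to character) mappings, a common convention for character-level GPT training setups; this layout is an assumption and is not confirmed by this repository, and `load_char_tokenizer` is a hypothetical helper name. If the keys differ, inspect the loaded object first.

```python
import pickle


def load_char_tokenizer(meta_path):
    """Build encode/decode functions from character-level tokenizer metadata.

    ASSUMPTION: the pickle holds a dict with "stoi" and "itos" mappings.
    Only unpickle files from a trusted source.
    """
    with open(meta_path, "rb") as handle:
        meta = pickle.load(handle)
    stoi = meta["stoi"]  # assumed key: character -> integer id
    itos = meta["itos"]  # assumed key: integer id -> character

    def encode(text):
        # Raises KeyError for characters outside the vocabulary.
        return [stoi[ch] for ch in text]

    def decode(ids):
        return "".join(itos[i] for i in ids)

    return encode, decode, len(stoi)


# Example call, assuming the file sits at the path listed above:
# encode, decode, vocab_size = load_char_tokenizer("model/data/meta.pkl")
```

Under that assumption, `vocab_size` would be expected to equal the 29 tokens stated above, which gives a quick consistency check before running any attribution analysis.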

ppiGPT architecture: GPT-2 decoder-only transformer with 12 layers, 12 attention heads, and an embedding dimension of 768 (~84.98M parameters). The model was trained from scratch on Prochlorococcus MED4 protein sequences with a 29-token character-level vocabulary (20 amino acids + 9 special tokens).
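The stated parameter count can be sanity-checked from the architecture figures alone. The sketch below makes several assumptions that are not stated in this README: linear and layer-norm layers carry no bias terms, the feed-forward hidden size is 4x the embedding dimension, the output projection shares weights with the token embedding, and position embeddings are excluded from the reported total. These assumptions were chosen because they are common in compact GPT-2 implementations; the function name `gpt_param_count` is hypothetical.

```python
def gpt_param_count(n_layer, n_embd, vocab_size):
    """Count parameters of a GPT-2-style decoder under stated assumptions.

    ASSUMPTIONS: no bias terms, 4x feed-forward expansion, tied input and
    output embeddings, position embeddings not counted.
    """
    attention = 4 * n_embd * n_embd      # Q, K, V projections plus output projection
    feed_forward = 8 * n_embd * n_embd   # n_embd -> 4*n_embd -> n_embd
    layer_norms = 2 * n_embd             # two weight-only layer norms per block
    per_block = attention + feed_forward + layer_norms
    final_norm = n_embd
    token_embedding = vocab_size * n_embd
    return n_layer * per_block + final_norm + token_embedding


total = gpt_param_count(n_layer=12, n_embd=768, vocab_size=29)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
# 84,976,128 parameters (~84.98M)
```

Under these assumptions the total matches the ~84.98M figure given above. Almost all of it sits in the transformer blocks, since the 29-token vocabulary contributes only 22,272 embedding weights.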

Code Repository

Analysis scripts, source datasets, publication figures, and documentation: https://github.com/olympus-terminal/Prochlorococcus_interactome_model_explainability

Citation

This repository is part of:

Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of Prochlorococcus MED4."

License

MIT
