---
license: cc-by-nc-nd-4.0
extra_gated_prompt: "By submitting any personal information (e.g., name, contact details), you agree to the collection and processing of this data
for the purpose of evaluating access requests for this model. Repository authors will store this data securely and will not share it with third parties
without your explicit consent. You retain all rights to your personal information and may request its deletion at any time.\n\n
By accessing the repository you agree not to use this model in experiments which may result in harm to human or animal subjects.
"
extra_gated_fields:
  Date of Agreement: date_picker
  I accept the terms of the license and I agree not to use this model for commercial purposes or profit generation: checkbox
tags:
  - molecular-generation
  - diffusion-models
  - cheminformatics
  - 3D-conformer
  - rdkit
  - non-commercial
language: en
library_name: mlconfgen
datasets:
  - ChEMBL
metrics:
  - shape-tanimoto
  - validity
  - uniqueness
  - novelty
  - Fréchet Distance
model-index:
  - name: ML Conformer Generator
    results:
      - task:
          type: molecular-generation
          name: 3D Conformer Generation
        dataset:
          name: ChEMBL (filtered)
          type: molecules
        metrics:
          - name: Valid molecules
            type: validity
            value: 48-93%
          - name: Chemical novelty
            type: novelty
            value: 99.84%
          - name: Shape Tanimoto Similarity (avg)
            type: shape-tanimoto
            value: 53.32%
          - name: Shape Tanimoto Similarity (max)
            type: shape-tanimoto
            value: 99.69%
          - name: Average Synthetic Accessibility score
            type: sa_score
            value: 3.18
          - name: Unique molecules
            type: uniqueness
            value: 99.94%
          - name: Fréchet Fingerprint Distance
            type: Fréchet Distance
            value: 4.13
---

# ML Conformer Generator

[DOI: 10.1039/D5DD00318K](https://doi.org/10.1039/D5DD00318K)

<img src="./mlconfgen_logo.png" width="200" style="display: block; margin: 0 10%;">

**ML Conformer Generator** is a shape-constrained molecule generation model that combines
an Equivariant Diffusion Model (EDM) and a Graph Convolutional Network (GCN). It generates 3D conformations
that are chemically valid and geometrically aligned with a reference shape.

---

## 📦 Model Summary

- **Architecture**: Equivariant Diffusion Model (EDM) + Graph Convolutional Network (GCN)
- **Training Data**: 1.6 million ChEMBL compounds, filtered for molecules with 15–39 heavy atoms
- **Post-Processing**: Deterministic standardization pipeline using RDKit with constrained MMFF94 geometry optimization
- **Primary Metric**: Shape Tanimoto Similarity
- **Developed by**: Denis Sapegin

---

## 🚀 Intended Use

- Non-commercial research in 3D molecular generation
- Academic and educational use
- Generation of molecules similar to a reference conformer
- Generation of molecules similar to an arbitrary reference shape

---

## 🚫 Out of Scope / Limitations

- **Commercial Use**: Not licensed for commercial use without explicit permission.
- **Training Bias**: Trained on ChEMBL data, so results may be biased toward drug-like molecules and chemistries.
- **Elements Supported**: Only the following elements are supported for generation: `H`, `C`, `N`, `O`, `F`, `P`, `S`, `Cl`, `Br`.
- **Molecular Size Limitations**:
  - Trained on molecules containing **15–39 heavy atoms**.
  - By architectural design, the model can **only generate molecules with up to 42 heavy atoms**.
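
The element and size limits listed here can be checked before a molecule is used as a reference. The following is a minimal sketch and not part of `mlconfgen`: the helper `check_reference_atoms` and its constants are hypothetical names, and it assumes the element symbols have already been extracted (with RDKit, for example, via `[atom.GetSymbol() for atom in mol.GetAtoms()]`). Whether a reference outside the 15–39 training range is acceptable is a judgment call, so the sketch only reports it as a warning.

```python
SUPPORTED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "P", "S", "Cl", "Br"})
TRAINED_HEAVY_ATOM_RANGE = (15, 39)  # heavy-atom range of the training set
MAX_HEAVY_ATOMS = 42                 # upper bound imposed by the architecture


def check_reference_atoms(symbols):
    """Return a list of warnings for a molecule given as element symbols."""
    symbols = list(symbols)
    warnings = []

    # Elements the model cannot generate.
    unsupported = sorted(set(symbols) - SUPPORTED_ELEMENTS)
    if unsupported:
        warnings.append(f"unsupported elements: {', '.join(unsupported)}")

    # Heavy atoms are all atoms other than hydrogen.
    heavy_atoms = sum(1 for symbol in symbols if symbol != "H")
    low, high = TRAINED_HEAVY_ATOM_RANGE
    if heavy_atoms > MAX_HEAVY_ATOMS:
        warnings.append(f"{heavy_atoms} heavy atoms exceeds the limit of {MAX_HEAVY_ATOMS}")
    elif not low <= heavy_atoms <= high:
        warnings.append(f"{heavy_atoms} heavy atoms is outside the trained range {low}-{high}")

    return warnings


# Example: ethanol (C2H6O) is far smaller than the training molecules.
print(check_reference_atoms(["C", "C", "O"] + ["H"] * 6))
```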

---

## 🧪 Evaluation Metrics (100,000 requested samples, 100 denoising steps)

- ✅ **Valid molecules (post-standardization, % of requested)**: 48%
- 🧬 **Chemical novelty**: 99.84%
- 📐 **Avg Shape Tanimoto**: 53.32%
- 🎯 **Max Shape Tanimoto**: 99.69%
- 🔁 **Unique molecules**: 99.94%
- ⚡ **Generation speed**: 4.18 valid molecules/sec (NVIDIA H100)
- 💾 **Memory (per thread)**: up to 4.0 GB
- 🧬 **Fréchet Fingerprint Distance (to ChEMBL)**: 4.13
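
As an illustration of how validity, uniqueness and novelty percentages of this kind are commonly derived, here is a minimal sketch. The denominators are assumptions: validity follows the "% of requested" wording in the list, while uniqueness is taken over valid molecules and novelty over unique ones, a common convention that this card does not confirm. `summarize_generation` is a hypothetical helper, and molecules are represented by canonical identifier strings such as canonical SMILES.

```python
def summarize_generation(n_requested, valid_ids, training_ids):
    """Compute validity, uniqueness and novelty as fractions in [0, 1].

    valid_ids: canonical identifiers of molecules that survived standardization
    training_ids: canonical identifiers of the training-set molecules
    """
    valid_ids = list(valid_ids)
    unique_ids = set(valid_ids)
    novel_ids = unique_ids - set(training_ids)
    return {
        # Assumed denominators: requested, valid, and unique counts respectively.
        "validity": len(valid_ids) / n_requested if n_requested else 0.0,
        "uniqueness": len(unique_ids) / len(valid_ids) if valid_ids else 0.0,
        "novelty": len(novel_ids) / len(unique_ids) if unique_ids else 0.0,
    }


# Toy run: 5 requested, 4 valid (one duplicate), 1 already in the training set.
stats = summarize_generation(
    n_requested=5,
    valid_ids=["CCO", "CCO", "CCN", "c1ccccc1"],
    training_ids={"c1ccccc1"},
)
print(stats)
```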

---

## 🧠 How It Works

### Core Components:
- **EDM** generates atom coordinates and types under shape constraints
- **GCN** predicts adjacency matrices (bonding)
- **RDKit** pipeline enforces valence, performs sanitization, and optimizes geometry

### Shape Alignment:
Shape alignment is evaluated using **Gaussian molecular volume overlap** and **Shape Tanimoto Similarity**.

Hydrogens are excluded from the similarity computation.
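
To make the Shape Tanimoto score concrete, here is a minimal sketch of a first-order Gaussian overlap calculation. It is not the `mlconfgen` implementation: it assumes every heavy atom is an identical spherical Gaussian with an illustrative width `alpha`, ignores higher-order overlap corrections, and expects the two coordinate sets to be aligned already. The function names are hypothetical.

```python
import math


def gaussian_overlap(coords_a, coords_b, alpha=0.8):
    """First-order overlap volume of two sets of identical spherical Gaussians.

    Each heavy atom is modelled as exp(-alpha * |r - R|^2); two such Gaussians
    at distance d overlap by (pi / (2 * alpha))**1.5 * exp(-alpha * d**2 / 2).
    """
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for a in coords_a:
        for b in coords_b:
            total += prefactor * math.exp(-0.5 * alpha * math.dist(a, b) ** 2)
    return total


def shape_tanimoto(coords_a, coords_b, alpha=0.8):
    """Shape Tanimoto = V_ab / (V_aa + V_bb - V_ab) for pre-aligned coordinates."""
    v_ab = gaussian_overlap(coords_a, coords_b, alpha)
    v_aa = gaussian_overlap(coords_a, coords_a, alpha)
    v_bb = gaussian_overlap(coords_b, coords_b, alpha)
    return v_ab / (v_aa + v_bb - v_ab)


# Heavy-atom coordinates in angstroms (hydrogens already removed).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 0.5, y, z) for x, y, z in reference]

print(shape_tanimoto(reference, reference))  # identical shapes score 1.0
print(shape_tanimoto(reference, shifted))    # partial overlap scores below 1
```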

---

## 💾 Access & Licensing

The **Python package and inference code are available on GitHub** under the Apache 2.0 License:
> https://github.com/Membrizard/ml_conformer_generator

The trained model **weights** are available at

> https://huggingface.co/Membrizard/ml_conformer_generator

and are licensed under CC BY-NC-ND 4.0.

Use of the trained weights for any profit-generating activity is restricted.

For commercial licensing and inference-as-a-service, contact:
[Denis Sapegin](https://github.com/Membrizard)

---

## Citation

If you use **MLConfGen** in your research, please cite:

Denis Sapegin, Fedor Bakharev, Dmitry Krupenya, Azamat Gafurov, Konstantin Pildish, and Joseph C. Bear.
*Moment of inertia as a simple shape descriptor for diffusion-based shape-constrained molecular generation.*
Digital Discovery, 2025.
DOI: [10.1039/D5DD00318K](https://doi.org/10.1039/D5DD00318K)

---

## Installation

1. Install the package:

`pip install mlconfgen`

2. Download the weights from Hugging Face:
> https://huggingface.co/Membrizard/ml_conformer_generator

**PyTorch**

`edm_moi_chembl_15_39.pt`

`adj_mat_seer_chembl_15_39.pt`

**ONNX**

`edm_moi_chembl_15_39.onnx`

`adj_mat_seer_chembl_15_39.onnx`
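
The weights can also be fetched programmatically. The sketch below uses `hf_hub_download` from the `huggingface_hub` package, which is an additional dependency not mentioned in this card. It assumes the files sit at the top level of the repository under the names listed in this section, and, because access to the repository is gated, that you have accepted the terms on the model page and are authenticated with a Hugging Face access token. `download_weights` is a hypothetical helper name.

```python
from pathlib import Path

# Repository id taken from the URL in this section; file names as listed there.
REPO_ID = "Membrizard/ml_conformer_generator"
PYTORCH_WEIGHTS = ("edm_moi_chembl_15_39.pt", "adj_mat_seer_chembl_15_39.pt")


def download_weights(filenames=PYTORCH_WEIGHTS, target_dir="."):
    """Download the given weight files and return their local paths."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target))
        for name in filenames
    ]


# Usage (requires network access and an authenticated session):
# edm_path, adj_mat_path = download_weights(target_dir="./weights")
```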

---

## 🐍 Python API

**PyTorch**

```python
from rdkit import Chem
from mlconfgen import MLConformerGenerator, evaluate_samples

# Load the PyTorch weights downloaded during installation.
model = MLConformerGenerator(
    edm_weights="./edm_moi_chembl_15_39.pt",
    adj_mat_seer_weights="./adj_mat_seer_chembl_15_39.pt",
    diffusion_steps=100,
)

# Reference conformer whose shape the generated molecules should match.
reference = Chem.MolFromMolFile('ceyyag.mol')

# Request 20 samples constrained to the reference shape.
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)

# Evaluate the generated samples against the reference.
aligned_reference, std_samples = evaluate_samples(reference, samples)
```

---

**ONNX**

```python
from rdkit import Chem
from mlconfgen import MLConformerGeneratorONNX

# Point these paths at the ONNX files downloaded during installation.
model = MLConformerGeneratorONNX(
    egnn_onnx="./egnn_chembl_15_39.onnx",
    adj_mat_seer_onnx="./adj_mat_seer_chembl_15_39.onnx",
    diffusion_steps=100,
)

# Reference conformer whose shape the generated molecules should match.
reference = Chem.MolFromMolFile('ceyyag.mol')
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)
```