DROPPS protein builder
======================

The ``pdb2dps`` tool converts an **amino-acid sequence** (optionally with PTMs)
into **coarse-grained CGPS topology and PDB structure files**.

It supports:

- chain generation from **sequence alone**,
- backbone extraction from an **all-atom PDB**,
- post-translational modifications (PTMs),
- terminal charge patches (NTD/CTD),
- random-coil ensemble generation with overlap avoidance,
- forcefield-aware bond/parameter assignment.

This tool replaces the functionality of *pdb2gmx + seq-builder* for the
CGPS/HPS-based coarse-grained model used in DROPPS.

Implemented in ``pdb2cgps.py``.

Purpose
-------

``pdb2dps`` generates:

**1. Coarse-grained conformation(s)** (each bead is one residue),
written to ``.pdb``

**2. Coarse-grained topology** with:

- atom types
- per-residue masses, charges, sigma, lambda
- bonded terms (bonds)
- PTM-aware residue types
- NTD/CTD patches

written to ``.itp``

The output is immediately compatible with:

- ``dps genmesh``
- ``dps grompp``
- ``dps mdrun``
- CG/HPS DROPPS workflows.

Usage
-----

Typical usage (generate both PDB and ITP):

.. code-block:: bash

   dps pdb2dps \
       -s MDGVGAPKT \
       -oc peptide.pdb \
       -op peptide \
       -on PEPTIDE \
       -ff hps

Include a PTM:

.. code-block:: bash

   dps pdb2dps \
       -s MDGVSSKT \
       -ptm S5SMP \
       -oc out.pdb \
       -op out \
       -ff hps

Extract coordinates from an all-atom PDB:

.. code-block:: bash

   dps pdb2dps \
       -s MTDGVAKE \
       -f input_allatom.pdb \
       -oc cg.pdb -op cg -ff hps

Arguments
---------

Required
~~~~~~~~

.. option:: -s, --sequence SEQ

   Protein sequence (1-letter abbreviations of CGPS forcefield AA types).

Optional
~~~~~~~~

.. option:: -f, --input-pdb FILE

   All-atom PDB file. When provided:

   - CA atoms are extracted (converted Å → nm)
   - CG coordinates follow the PDB shape, not a random coil
   - The number of CA atoms must match the sequence length.

.. option:: -ri, --residue-index INT

   Residue index of the first residue (default: 1).
   Affects numbering in the output PDB/ITP.

.. option:: -ptm, --post-translational-modification STRING

   Add or mutate residues using the format:

   ::

      OriginalAA ResidueNumber MutatedType

   Example: ``S129SMP``

   Multiple ``-ptm`` entries are allowed.

.. option:: -cNTD, --charged-NTD

   Add +1 charge to the N-terminal residue (via ``_N`` suffix).

.. option:: -cCTD, --charged-CTD

   Add −1 charge to the C-terminal residue (via ``_C`` suffix).

.. option:: -r, --radius FLOAT

   Maximum spatial extent allowed for generated conformations (nm).
   Prevents unrealistic expansion.

   Default: ``2.0``.

.. option:: -n, --number INT

   Number of conformations to generate. Useful for ensembles.

   Default: ``1``.

.. option:: -e, --degree-extend FLOAT

   Controls chain stiffness. Range: ``0 – 1``

   - 0 → fully random
   - 1 → fully extended

   Default: ``0.5``.

.. option:: -ff, --forcefield NAME

   Select the CGPS/HPS forcefield. Must correspond to a ``*.ff`` file.

.. option:: -oc, --output-conformation FILE

   Output PDB filename or prefix.
   If more than one structure is generated:
   ``prefix_1.pdb``, ``prefix_2.pdb`` …

.. option:: -op, --output-topology FILE

   Output topology file prefix (``.itp`` auto-added).

.. option:: -on, --output-name NAME

   Topology molecule name (default: ``MOL``).

PTMs (Post-Translational Modifications)
---------------------------------------

PTM strings are parsed via regex:

::

   ([A-Za-z]+)(\d+)([A-Za-z]+)

Given ``S129SMP``:

- original residue = S
- target index = 129
- mutated → SMP

Validation rules:

- the index must lie inside the chain
- the original AA must match the sequence
- the mutated type must exist in the selected forcefield
- terminal residues are allowed except where the forcefield forbids them

Conformation Generation
-----------------------

Two modes are available:

1. Using input all-atom PDB (``--input-pdb``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Extract CA atom coordinates
- Convert Å → nm
- Ensure the number of CAs equals the sequence length
- Write as CG beads with proper residue names

2. Random-coil generator (default)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The function ``generate_chain_conformation_single()``:

- Places residue 1 at the origin
- Places residue 2 using random spherical angles
- For residues 3..N:

  - preserves directionality with the ``degree_extend`` blend
  - enforces the correct bond length from the forcefield
  - avoids bead–bead overlap
  - rejects positions outside ``--radius``
  - allows up to 50 attempts per bead

- Ensures the chain is continuous and physically plausible

Output: PDB File
----------------

Each generated conformation produces lines like:

::

   ATOM 1 ALA A 1 x y z

Key features:

- Coordinates written in Å
- CRYST1 box size = ``radius × 2``
- Residue names from the forcefield mapping
- Atom names from the coarse-grained abbreviation
- Element symbol determined automatically
- Residue numbers begin at ``--residue-index``

Output: ITP Topology
--------------------

The generated topology contains:

``[moleculetype]``
~~~~~~~~~~~~~~~~~~

.. code-block:: text

   [ moleculetype ]
   MOL 1

``[atomtypes]``
~~~~~~~~~~~~~~~

Generated for **each unique CGPS bead abbreviation**.

Fields include:

- atom abbreviation
- GROMACS-style atom name
- sigma
- lambda
- T0, T1, T2 temperature coefficients (HPS forcefield)

``[atoms]``
~~~~~~~~~~~

For each sequence position:

- ID
- atom name
- residue name
- residue index
- mass
- charge
- cumulative ``qtot`` comment

``[bonds]``
~~~~~~~~~~~

Bond parameters come directly from:

.. code-block:: python

   forcefield.bond2param[
       forcefield.abbr2bondtype[f"{AAi}-{AAj}"]
   ]

Bond length and force constant are written per residue pair.

Forcefield Handling
-------------------

Forcefields are searched for in two locations:

1. internal: ``dropps/share/forcefields/*.ff``
2. working directory: ``*.ff``

If both locations contain the same filename, an error is raised:

::

   ERROR: Conflict detected! The following .ff file(s) exist in both ...

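The two-location search and the duplicate check can be pictured with a minimal sketch. It is not the DROPPS implementation: the function name ``find_forcefields`` and its two directory arguments are hypothetical, and raising ``ValueError`` stands in for whatever the tool does when it prints the conflict message.

```python
from pathlib import Path


def find_forcefields(internal_dir, working_dir):
    """Collect *.ff entries from both locations (hypothetical helper).

    Returns a dict mapping forcefield name (filename without ``.ff``)
    to its path, and refuses to continue if a filename is present in
    both locations.
    """
    internal = {p.name: p for p in Path(internal_dir).glob("*.ff")}
    local = {p.name: p for p in Path(working_dir).glob("*.ff")}

    # A filename present in both places is ambiguous, so stop early.
    conflicts = sorted(internal.keys() & local.keys())
    if conflicts:
        raise ValueError(
            "Conflict detected! The following .ff file(s) exist in both "
            "locations: " + ", ".join(conflicts)
        )

    merged = {**internal, **local}
    return {Path(name).stem: path for name, path in merged.items()}
```

Refusing to pick a winner keeps the parameter source unambiguous: the user has to rename or remove one copy before the run proceeds.
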
The user may choose the forcefield via:

- explicit ``--forcefield``
- interactive prompt

Forcefield functions used:

- ``getff()``
- mapping dictionaries:

  - ``abbr2aa``
  - ``abbr2charge``
  - ``abbr2mass``
  - ``abbr2sigma``
  - ``abbr2lambda``
  - ``abbr2tempcoff``
  - ``abbr2bondtype``
  - ``abbr2bondtypeindex``

Error Messages
--------------

**Residue index out of range**

::

   ERROR: Mutated residue %d out of range ...

**PTM mismatch**

::

   ERROR ... residue %d is not %s

**PDB/sequence mismatch**

::

   ERROR: There are %d CA atoms but sequence length is %d

**Invalid degree of extension**

``--degree-extend`` must lie in the range 0–1.

**Forcefield conflict**

Printed when duplicate ``.ff`` files are found.

Summary
-------

``dps pdb2dps`` is the core **sequence-to-CGPS model builder** in DROPPS.
It provides:

- PTM insertion
- terminal charge patches
- sequence-only or PDB-guided CG structure generation
- collision-free random-coil algorithm
- forcefield-aware ITP generation
- exact bond, mass, charge, sigma/lambda assignments
- multi-conformation output for ensemble simulations

It is the recommended starting point for building CGPS systems for LLPS and
disordered-protein simulations.
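
As a worked illustration of the PTM string format and validation rules described in the PTMs section, the sketch below parses a string such as ``S5SMP`` with the documented regex. The function ``parse_ptm`` and its parameters are hypothetical. Two details are assumptions rather than documented behaviour: the whole string is required to match the pattern, and the residue number is interpreted relative to ``--residue-index``.

```python
import re

# Pattern quoted in the PTMs section: original AA, residue number, new type.
PTM_PATTERN = re.compile(r"([A-Za-z]+)(\d+)([A-Za-z]+)")


def parse_ptm(ptm, sequence, known_types, first_index=1):
    """Validate one PTM string (hypothetical helper, not the DROPPS code).

    Returns (zero-based position in the sequence, mutated residue type).
    """
    match = PTM_PATTERN.fullmatch(ptm)  # assumption: whole string must match
    if match is None:
        raise ValueError(f"Cannot parse PTM string {ptm!r}")

    original = match.group(1)
    number = int(match.group(2))
    mutated = match.group(3)

    # Assumption: numbering starts at the --residue-index value.
    position = number - first_index
    if not 0 <= position < len(sequence):
        raise ValueError(f"Mutated residue {number} out of range")
    if sequence[position] != original:
        raise ValueError(f"residue {number} is not {original}")
    if mutated not in known_types:
        raise ValueError(f"{mutated} is not defined in the forcefield")
    return position, mutated


# Mirrors the usage example: the fifth residue of MDGVSSKT is S.
position, new_type = parse_ptm("S5SMP", "MDGVSSKT", {"SMP"})
```

Here ``known_types`` stands in for the set of residue types defined by the selected forcefield.
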