DROPPS protein builder

The pdb2dps tool converts an amino-acid sequence (optionally with PTMs) into coarse-grained CGPS topology and PDB structure files. It supports:

  • chain generation from sequence alone,

  • backbone extraction from an all-atom PDB,

  • post-translational modifications (PTMs),

  • terminal charge patches (NTD/CTD),

  • random-coil ensemble generation with overlap avoidance,

  • forcefield-aware bond/parameter assignment.

This tool replaces the functionality of pdb2gmx + seq-builder for the CGPS/HPS-based coarse-grained model used in DROPPS.

Implemented in pdb2cgps.py.

Purpose

pdb2dps generates:

1. Coarse-grained conformation(s), with one bead per residue, written to .pdb

2. Coarse-grained topology, written to .itp, containing:

  • atom types

  • per-residue masses, charges, sigma, lambda

  • bonded terms (bonds)

  • PTM-aware residue types

  • NTD/CTD patches

The output is immediately compatible with:

  • dps genmesh

  • dps grompp

  • dps mdrun

  • CG/HPS DROPPS workflows.

Usage

Typical usage (generate both PDB & ITP):

dps pdb2dps \
    -s MDGVGAPKT \
    -oc peptide.pdb \
    -op peptide \
    -on PEPTIDE \
    -ff hps

Include PTM:

dps pdb2dps \
    -s MDGVSSKT \
    -ptm S5SMP \
    -oc out.pdb \
    -op out \
    -ff hps

Extract coordinates from all-atom PDB:

dps pdb2dps \
    -s MTDGVAKE \
    -f input_allatom.pdb \
    -oc cg.pdb -op cg -ff hps

Arguments

Required

-s, --sequence SEQ

Protein sequence (1-letter abbreviations of CGPS forcefield AA types).

Optional

-f, --input-pdb FILE

All-atom PDB file. When provided:

  • CA atoms are extracted (converted Å → nm),

  • CG coordinates follow the PDB shape rather than a random coil,

  • the number of CA atoms must match the sequence length.

-ri, --residue-index INT

Residue index of the first residue (default: 1). Affects numbering in output PDB/ITP.

-ptm, --post-translational-modification STRING

Modify a residue using the format <OriginalAA><ResidueNumber><MutatedType>, written without separators.

Example: S129SMP

Multiple -ptm entries are allowed.

-cNTD, --charged-NTD

Add +1 charge to the N-terminal residue (via _N suffix).

-cCTD, --charged-CTD

Add −1 charge to the C-terminal residue (via _C suffix).

-r, --radius FLOAT

Maximum spatial extent allowed for generated conformations (nm). Prevents unrealistic expansion. Default: 2.0.

-n, --number INT

Number of conformations to generate. Useful for ensembles. Default: 1.

-e, --degree-extend FLOAT

Controls chain stiffness. Range: 0–1.

  • 0 → fully random

  • 1 → fully extended

Default: 0.5.

-ff, --forcefield NAME

Select CGPS/HPS forcefield. Must correspond to a *.ff file.

-oc, --output-conformation FILE

Output PDB filename or prefix. If more than one structure is generated, the files are numbered: prefix_1.pdb, prefix_2.pdb, and so on.

-op, --output-topology FILE

Output topology file prefix (.itp auto-added).

-on, --output-name NAME

Topology molecule name (default: MOL).

PTMs (Post-Translational Modifications)

PTM strings are parsed via regex:

([A-Za-z]+)(\d+)([A-Za-z]+)

Given S129SMP:

  • original residue = S

  • target index = 129

  • mutated → SMP

Validation rules:

  • index must lie inside chain

  • original AA must match the sequence

  • mutated type must exist in the selected forcefield

  • terminal residues may be modified unless the forcefield forbids it
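The parsing and validation rules above can be sketched as follows. This is a minimal illustration, not the code in pdb2cgps.py: the function name `parse_ptm`, the `known_types` set standing in for the forcefield's residue types, and the assumption that the residue number is a 1-based position in the sequence are all hypothetical.

```python
import re

# Pattern quoted in the text: original residue, residue number, mutated type.
PTM_PATTERN = re.compile(r"([A-Za-z]+)(\d+)([A-Za-z]+)")


def parse_ptm(ptm, sequence, known_types):
    """Return (zero-based position, mutated type) or raise ValueError.

    Assumes the residue number is a 1-based position in `sequence` and that
    `known_types` lists the residue types defined by the chosen forcefield.
    """
    match = PTM_PATTERN.fullmatch(ptm)
    if match is None:
        raise ValueError(f"cannot parse PTM string {ptm!r}")
    original = match.group(1)
    number = int(match.group(2))
    mutated = match.group(3)

    position = number - 1
    if not 0 <= position < len(sequence):
        raise ValueError(f"mutated residue {number} out of range")
    if sequence[position] != original:
        raise ValueError(f"residue {number} is not {original}")
    if mutated not in known_types:
        raise ValueError(f"type {mutated} is not defined in the forcefield")
    return position, mutated
```

For the usage example earlier in this document, `parse_ptm("S5SMP", "MDGVSSKT", {"SMP"})` accepts the modification because residue 5 of that sequence is S.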

Conformation Generation

Two modes are available:

### 1. Using input all-atom PDB (--input-pdb)

  • Extract CA atom coordinates

  • Convert Å → nm

  • Ensure number of CAs equals sequence length

  • Write as CG beads with proper residue names
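The extraction steps above can be illustrated with a short sketch. It assumes standard fixed-column PDB ATOM records (atom name in columns 13-16, coordinates in columns 31-54); the function name `extract_ca_coordinates_nm` is hypothetical and the actual implementation may differ.

```python
def extract_ca_coordinates_nm(pdb_lines, sequence):
    """Collect CA coordinates from ATOM records and convert Å to nm."""
    coords = []
    for line in pdb_lines:
        # Only ATOM records are considered, so calcium ions stored as
        # HETATM records named "CA" are not picked up.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x / 10.0, y / 10.0, z / 10.0))  # Å -> nm
    if len(coords) != len(sequence):
        raise ValueError(
            "There are %d CA atoms but sequence length is %d"
            % (len(coords), len(sequence))
        )
    return coords
```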

### 2. Random-coil generator (default)

The function generate_chain_conformation_single():

  • Places residue 1 at origin

  • Places residue 2 using random spherical angles

  • For residues 3..N:

      - preserves directionality using the degree_extend blend

      - enforces the correct bond length from the forcefield

      - avoids bead–bead overlap

      - rejects positions that fall outside --radius

      - allows up to 50 attempts per bead

  • Ensures the chain is continuous and physically plausible
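A chain-growth procedure of this kind could look roughly like the sketch below. It is not the body of generate_chain_conformation_single(): the function name `grow_chain`, the default bond length (0.38 nm), the overlap cutoff `min_dist`, the linear blend of the previous direction with a random one, and measuring --radius from the origin are all assumptions made for illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)


def grow_chain(n_beads, bond=0.38, min_dist=0.30, radius=2.0,
               degree_extend=0.5, max_attempts=50, seed=None):
    """Grow one chain bead by bead; all lengths are in nm."""
    if not 0.0 <= degree_extend <= 1.0:
        raise ValueError("degree_extend must be in the range 0-1")
    rng = random.Random(seed)
    origin = (0.0, 0.0, 0.0)
    coords = [origin]        # residue 1 sits at the origin
    direction = None         # no previous bond direction yet
    for _ in range(1, n_beads):
        for _attempt in range(max_attempts):
            step = random_unit_vector(rng)
            if direction is not None:
                # Blend previous direction with a random one:
                # 0 -> fully random, 1 -> fully extended.
                mixed = tuple(degree_extend * d + (1.0 - degree_extend) * s
                              for d, s in zip(direction, step))
                norm = math.sqrt(sum(c * c for c in mixed))
                if norm < 1e-12:
                    continue
                step = tuple(c / norm for c in mixed)
            trial = tuple(p + bond * s for p, s in zip(coords[-1], step))
            if math.dist(trial, origin) > radius:
                continue     # outside the allowed extent
            if any(math.dist(trial, c) < min_dist for c in coords[:-1]):
                continue     # overlaps a non-bonded bead
            coords.append(trial)
            direction = step
            break
        else:
            raise RuntimeError("could not place bead within max_attempts")
    return coords
```

Calling `grow_chain(6, seed=1)` returns six coordinates with every consecutive pair separated by exactly the bond length; setting `degree_extend=1.0` yields a straight rod.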

Output: PDB File

Each generated conformation produces lines like:

ATOM      1 ALA A   1    x y z

Key features:

  • Coordinates written in Å

  • CRYST1 box size = radius × 2

  • Residue names from forcefield mapping

  • Atom names from the coarse-grained abbreviation

  • Element symbol determined automatically

  • Residue numbers begin at --residue-index
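As an illustration of the coordinate handling, the sketch below formats one coarse-grained bead as a fixed-column PDB ATOM record, converting nm to Å on output. The function name `format_cg_atom` and the default chain identifier are hypothetical, and the occupancy, B-factor, and element columns are omitted for brevity.

```python
def format_cg_atom(serial, atom_name, res_name, res_index, xyz_nm, chain="A"):
    """Format one bead as a PDB ATOM record (coordinates converted nm -> Å)."""
    x, y, z = (10.0 * c for c in xyz_nm)
    # Columns: record name, serial, atom name, residue name, chain,
    # residue number, then x/y/z as three 8.3f fields starting at column 31.
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, atom_name, res_name, chain, res_index, x, y, z
    )
```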

Output: ITP Topology

Generated topology contains:

### [moleculetype]

[ moleculetype ]
MOL     1

### [atomtypes]

One entry is generated for each unique CGPS bead abbreviation. Fields include:

  • atom abbreviation

  • GROMACS-style atom name

  • sigma

  • lambda

  • T0, T1, T2 temperature coefficients (HPS forcefield)

### [atoms]

For each sequence position:

  • ID

  • atom name

  • residue name

  • residue index

  • mass

  • charge

  • cumulative qtot comment
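The running-charge comment can be sketched as below. The column order simply follows the field list above and the widths are arbitrary; the exact layout written by the tool is not specified here, so treat both, along with the function name `atoms_section`, as assumptions.

```python
def atoms_section(beads):
    """Build [ atoms ] lines with a cumulative charge comment.

    `beads` is an iterable of (atom_name, res_name, res_index, mass, charge).
    """
    lines = ["[ atoms ]"]
    qtot = 0.0
    for atom_id, (atom, res, resnr, mass, charge) in enumerate(beads, start=1):
        qtot += charge  # running total up to and including this bead
        lines.append(
            f"{atom_id:6d} {atom:>6s} {res:>6s} {resnr:6d} "
            f"{mass:10.3f} {charge:8.3f} ; qtot {qtot:.2f}"
        )
    return lines
```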

### [bonds]

Bond parameters come directly from:

forcefield.bond2param[ forcefield.abbr2bondtype[ f"{AAi}-{AAj}" ] ]

Bond length and force constant are written per residue pair.
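The lookup chain can be exercised with a toy forcefield object, as in the sketch below. Only the attribute names `abbr2bondtype` and `bond2param` come from the expression above; the dictionary contents, the (length, force constant) tuple shape, the numeric values, and the function name `bond_rows` are placeholders chosen for illustration.

```python
from types import SimpleNamespace

# Toy stand-in for a loaded forcefield; all values are illustrative only.
toy_ff = SimpleNamespace(
    abbr2bondtype={"M-D": "bb", "D-G": "bb"},
    bond2param={"bb": (0.38, 1000.0)},  # assumed (length, force constant)
)


def bond_rows(sequence, forcefield):
    """Return (i, j, length, force_constant) for each consecutive pair."""
    rows = []
    for i in range(len(sequence) - 1):
        key = f"{sequence[i]}-{sequence[i + 1]}"
        length, k = forcefield.bond2param[forcefield.abbr2bondtype[key]]
        rows.append((i + 1, i + 2, length, k))  # 1-based atom IDs
    return rows
```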

Forcefield Handling

Forcefields are searched for in two locations:

  1. internal: dropps/share/forcefields/*.ff

  2. working directory: *.ff

If both locations contain a file with the same name, the tool stops with an error:

ERROR: Conflict detected! The following .ff file(s) exist in both ...
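A duplicate check of this kind could be written as in the sketch below. The function name `collect_forcefields`, its arguments, and the wording after the first part of the message are hypothetical; only the two search locations and the conflict rule come from the text.

```python
from pathlib import Path


def collect_forcefields(internal_dir, working_dir):
    """Map *.ff names to paths, refusing names present in both locations."""
    internal = {p.name: p for p in Path(internal_dir).glob("*.ff")}
    local = {p.name: p for p in Path(working_dir).glob("*.ff")}
    conflicts = sorted(internal.keys() & local.keys())
    if conflicts:
        raise RuntimeError(
            "Conflict detected! Duplicate .ff name(s): " + ", ".join(conflicts)
        )
    return {**internal, **local}
```

Raising on a conflict rather than silently preferring one location keeps it unambiguous which parameter set a topology was built from.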

The user may choose a forcefield via:

  • explicit --forcefield

  • interactive prompt

Forcefield functions used:

  • getff()

  • mapping dictionaries:

      - abbr2aa

      - abbr2charge

      - abbr2mass

      - abbr2sigma

      - abbr2lambda

      - abbr2tempcoff

      - abbr2bondtype

      - abbr2bondtypeindex

Error Messages

Residue index out of range

ERROR: Mutated residue %d out of range ...

PTM mismatch

ERROR ... residue %d is not %s

PDB/sequence mismatch

ERROR: There are %d CA atoms but sequence length is %d

Invalid --degree-extend value

Range must be 0–1.

Forcefield conflict

Printed when duplicate .ff files are found.

Summary

dps pdb2dps (implemented in pdb2cgps.py) is the core sequence-to-CGPS model builder in DROPPS. It provides:

  • PTM insertion

  • terminal charge patches

  • sequence-only or PDB-guided CG structure generation

  • collision-free random-coil algorithm

  • forcefield-aware ITP generation

  • exact bond, mass, charge, sigma/lambda assignments

  • multi-conformation output for ensemble simulations

It is the recommended starting point for building CGPS systems for LLPS and disordered-protein simulations.