DROPPS protein builder

The pdb2dps tool converts an amino-acid sequence (optionally with PTMs) into coarse-grained CGPS topology and PDB structure files. It supports:

chain generation from sequence alone,
backbone extraction from an all-atom PDB,
post-translational modifications (PTMs),
terminal charge patches (NTD/CTD),
random-coil ensemble generation with overlap avoidance,
forcefield-aware bond/parameter assignment.

This tool replaces the functionality of pdb2gmx + seq-builder for the CGPS/HPS-based coarse-grained model used in DROPPS.

Implemented in pdb2cgps.py.

Purpose

pdb2dps generates:

1. Coarse-grained conformation(s), with one bead per residue, written to .pdb
2. Coarse-grained topology, written to .itp, containing:

atom types
per-residue masses, charges, sigma, lambda
bonded terms (bonds)
PTM-aware residue types
NTD/CTD patches

The output is immediately compatible with:

dps genmesh
dps grompp
dps mdrun
CG/HPS DROPPS workflows.

Usage

Typical usage (generate both PDB & ITP):

dps pdb2dps \
    -s MDGVGAPKT \
    -oc peptide.pdb \
    -op peptide \
    -on PEPTIDE \
    -ff hps

Include PTM:

dps pdb2dps \
    -s MDGVSSKT \
    -ptm S5SMP \
    -oc out.pdb \
    -op out \
    -ff hps

Extract coordinates from all-atom PDB:

dps pdb2dps \
    -s MTDGVAKE \
    -f input_allatom.pdb \
    -oc cg.pdb -op cg -ff hps

Arguments

Required

-s, --sequence SEQ: Protein sequence (1-letter abbreviations of CGPS forcefield AA types).

Optional

-f, --input-pdb FILE: All-atom PDB file. When provided:

- CA atoms are extracted (converted Å → nm)
- CG coordinates follow the PDB shape, not random coil
- Number of CA atoms must match sequence length

-ri, --residue-index INT: Residue index of the first residue (default: 1). Affects numbering in output PDB/ITP.

-ptm, --post-translational-modification STRING

Add or mutate residues using the following format, with the three parts concatenated without separators:

OriginalAA ResidueNumber MutatedType
Example: S129SMP

Multiple -ptm entries are allowed.

-cNTD, --charged-NTD: Add +1 charge to the N-terminal residue (via _N suffix).

-cCTD, --charged-CTD: Add −1 charge to the C-terminal residue (via _C suffix).

-r, --radius FLOAT: Maximum spatial extent allowed for generated conformations (nm). Prevents unrealistic expansion. Default: 2.0.

-n, --number INT: Number of conformations to generate. Useful for ensembles. Default: 1.

-e, --degree-extend FLOAT: Controls chain stiffness. Range: 0 – 1. Default: 0.5.

- 0 → fully random
- 1 → fully extended

-ff, --forcefield NAME: Select CGPS/HPS forcefield. Must correspond to a *.ff file.

-oc, --output-conformation FILE: Output PDB filename or prefix. If more than one structure is generated: prefix_1.pdb, prefix_2.pdb …

-op, --output-topology FILE: Output topology file prefix (.itp auto-added).

-on, --output-name NAME: Topology molecule name (default: MOL).

PTMs (Post-Translational Modifications)

PTM strings are parsed via regex:

([A-Za-z]+)(\d+)([A-Za-z]+)

Given S129SMP:

original residue = S
target index = 129
mutated type = SMP

Validation rules:

index must lie inside chain
original AA must match the sequence
mutated type must exist in the selected forcefield
terminal residues are allowed unless the forcefield forbids the modification
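
The parsing and the first three validation rules above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `parse_ptm`, the use of `fullmatch`, the 1-based position counting, and the `known_types` set standing in for the forcefield's residue types are all assumptions.

```python
import re

# Regex quoted in the documentation above.
PTM_PATTERN = re.compile(r"([A-Za-z]+)(\d+)([A-Za-z]+)")


def parse_ptm(ptm, sequence, known_types):
    """Return (zero-based position, mutated type) for a string such as 'S129SMP'.

    Hypothetical helper: positions are assumed to be counted from 1, and
    known_types stands in for the residue types of the selected forcefield.
    """
    match = PTM_PATTERN.fullmatch(ptm)
    if match is None:
        raise ValueError(f"Cannot parse PTM string {ptm!r}")
    original, number, mutated = match.group(1), int(match.group(2)), match.group(3)
    position = number - 1
    if not 0 <= position < len(sequence):
        raise ValueError(f"Mutated residue {number} out of range")
    if sequence[position] != original:
        raise ValueError(f"Residue {number} is not {original}")
    if mutated not in known_types:
        raise ValueError(f"Type {mutated} is not defined in the forcefield")
    return position, mutated
```

With the sequence from the usage example, `parse_ptm("S5SMP", "MDGVSSKT", {"SMP"})` selects the fifth residue for replacement by SMP.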

Conformation Generation

Two modes are available:

### 1. Using input all-atom PDB (--input-pdb)

Extract CA atom coordinates
Convert Å → nm
Ensure number of CAs equals sequence length
Write as CG beads with proper residue names
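
A minimal sketch of the extraction steps, assuming standard fixed-column PDB ATOM records. The function name `extract_ca_coordinates_nm` is hypothetical, and the real tool may select atoms differently (for example with respect to alternate locations or multiple chains).

```python
def extract_ca_coordinates_nm(pdb_lines, sequence):
    """Collect CA coordinates from ATOM records and convert them from Å to nm."""
    coords = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name; columns 31-54 hold x, y, z in Å.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x / 10.0, y / 10.0, z / 10.0))
    if len(coords) != len(sequence):
        raise ValueError(
            f"There are {len(coords)} CA atoms but sequence length is {len(sequence)}"
        )
    return coords
```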

### 2. Random-coil generator (default)

The function generate_chain_conformation_single():

Places residue 1 at origin
Places residue 2 using random spherical angles
For residues 3..N:
- preserve directionality with degree_extend blend
- enforce correct bond length from forcefield
- avoid bead–bead overlap
- reject positions if outside --radius
- allow up to 50 attempts per bead
Ensures the chain is continuous and physically plausible
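
The growth procedure can be illustrated with a short, self-contained sketch. It is not the code of generate_chain_conformation_single(): the linear blend of the previous bond direction with a random direction, the fixed `bond_length` and `min_distance` values, measuring `radius` from the origin, and raising an error when all attempts fail are assumptions made for illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def grow_chain(n_beads, bond_length=0.38, min_distance=0.30, radius=2.0,
               degree_extend=0.5, max_attempts=50, seed=None):
    """Grow a bead chain (nm). bond_length and min_distance are placeholders."""
    if not 0.0 <= degree_extend <= 1.0:
        raise ValueError("degree_extend must lie in the range 0-1")
    rng = random.Random(seed)
    origin = (0.0, 0.0, 0.0)
    coords = [origin]        # residue 1 at the origin
    direction = None         # no previous bond yet, so residue 2 is fully random
    while len(coords) < n_beads:
        previous = coords[-1]
        for _ in range(max_attempts):
            trial = random_unit_vector(rng)
            if direction is not None:
                # Assumed blend: 0 gives a random step, 1 repeats the last direction.
                trial = tuple(degree_extend * d + (1.0 - degree_extend) * t
                              for d, t in zip(direction, trial))
            norm = math.sqrt(sum(c * c for c in trial))
            if norm < 1e-12:
                continue     # degenerate blend, draw again
            step = tuple(c / norm for c in trial)
            candidate = tuple(p + bond_length * s for p, s in zip(previous, step))
            if math.dist(candidate, origin) > radius:
                continue     # outside the allowed extent
            if any(math.dist(candidate, other) < min_distance
                   for other in coords[:-1]):
                continue     # overlaps a non-bonded bead
            coords.append(candidate)
            direction = step
            break
        else:
            raise RuntimeError(f"Could not place bead {len(coords) + 1}")
    return coords
```

Normalising the blended vector before scaling by `bond_length` keeps every bond at the same length regardless of `degree_extend`.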

Output: PDB File

Each generated conformation produces lines like:

ATOM      1 ALA A   1    x y z

Key features:

Coordinates written in Å
CRYST1 box size = radius × 2
Residue names from forcefield mapping
Atom names from the coarse-grained abbreviation
Element symbol determined automatically
Residue numbers begin at --residue-index
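
As an illustration of these features, the sketch below writes standard fixed-column CRYST1 and ATOM records from coordinates given in nm. The function name `format_cg_pdb` is hypothetical, the box edge is assumed to be 2 × radius expressed in Å, and the exact columns emitted by the tool (occupancy, element symbol, chain ID) may differ.

```python
def format_cg_pdb(coords_nm, residue_names, atom_names, radius_nm,
                  first_residue=1, chain_id="A"):
    """Return PDB lines for one conformation, converting nm to Å."""
    box = 2.0 * radius_nm * 10.0  # assumption: box edge = 2 x radius, in Å
    lines = [
        f"CRYST1{box:9.3f}{box:9.3f}{box:9.3f}"
        f"{90.0:7.2f}{90.0:7.2f}{90.0:7.2f} {'P 1':<11s}{1:>4d}"
    ]
    records = zip(coords_nm, residue_names, atom_names)
    for i, ((x, y, z), resname, name) in enumerate(records):
        lines.append(
            f"ATOM  {i + 1:5d} {name:<4s} {resname:>3s} "
            f"{chain_id}{first_residue + i:4d}    "
            f"{x * 10.0:8.3f}{y * 10.0:8.3f}{z * 10.0:8.3f}"
        )
    lines.append("END")
    return lines
```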

Output: ITP Topology

Generated topology contains:

### [moleculetype]

[ moleculetype ]
MOL     1

### [atomtypes]

Generated for each unique CGPS bead abbreviation. Fields include:

atom abbreviation
GROMACS-style atom name
sigma
lambda
T0, T1, T2 temperature coefficients (HPS forcefield)

### [atoms]

For each sequence position:

ID
atom name
residue name
residue index
mass
charge
cumulative qtot comment
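
The running qtot comment is simply the sum of the charges up to and including the current atom. A minimal sketch, assuming the field order listed above; the function name `format_atoms_section` and the column widths are hypothetical, and the real file may carry additional columns.

```python
def format_atoms_section(atom_names, residue_names, masses, charges, first_residue=1):
    """Build [ atoms ] lines with a cumulative-charge comment on each row."""
    lines = ["[ atoms ]", ";   id   atom residue  resnr       mass   charge"]
    qtot = 0.0
    rows = zip(atom_names, residue_names, masses, charges)
    for i, (atom, residue, mass, charge) in enumerate(rows):
        qtot += charge  # running total of the net charge
        lines.append(
            f"{i + 1:6d} {atom:>6s} {residue:>7s} {first_residue + i:6d} "
            f"{mass:10.3f} {charge:8.3f} ; qtot {qtot:.3f}"
        )
    return lines
```

The last qtot value gives the net charge of the molecule, which is a quick check that PTMs and terminal patches were applied as intended.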

### [bonds]

Bond parameters come directly from:

forcefield.bond2param[ forcefield.abbr2bondtype[ f"{AAi}-{AAj}" ] ]

Bond length and force constant are written per residue pair.
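
The two-step lookup can be sketched with a toy forcefield object. The dictionary contents, the numeric values, and the assumption that each parameter entry is a (length, force constant) pair are placeholders for illustration only; only the attribute names abbr2bondtype and bond2param come from the expression above.

```python
from types import SimpleNamespace

# Toy stand-in for the loaded forcefield; all values are placeholders.
forcefield = SimpleNamespace(
    abbr2bondtype={"M-D": "backbone", "D-G": "backbone"},
    bond2param={"backbone": (0.38, 1000.0)},  # assumed (length, force constant)
)


def bonds_for_sequence(abbreviations, forcefield):
    """Return (i, j, length, force_constant) for each consecutive residue pair."""
    bonds = []
    for i in range(len(abbreviations) - 1):
        key = f"{abbreviations[i]}-{abbreviations[i + 1]}"
        bond_type = forcefield.abbr2bondtype[key]
        length, force_constant = forcefield.bond2param[bond_type]
        bonds.append((i + 1, i + 2, length, force_constant))  # 1-based atom IDs
    return bonds
```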

Forcefield Handling

Forcefields are searched for in two locations:

internal: dropps/share/forcefields/*.ff
working directory: *.ff

If both locations contain a file with the same name, an error is raised:

ERROR: Conflict detected! The following .ff file(s) exist in both ...
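
A minimal sketch of this search-and-conflict check, using only the standard library. The function name `find_forcefields`, the returned mapping, and the wording of the raised message are assumptions; the tool's own error text is the one quoted above.

```python
from pathlib import Path


def find_forcefields(internal_dir, working_dir):
    """Map forcefield filename to path, refusing names present in both locations."""
    internal = {p.name: p for p in Path(internal_dir).glob("*.ff")}
    local = {p.name: p for p in Path(working_dir).glob("*.ff")}
    conflicts = sorted(internal.keys() & local.keys())
    if conflicts:
        raise RuntimeError(
            "Forcefield name(s) found in both locations: " + ", ".join(conflicts)
        )
    return {**internal, **local}
```

Refusing to guess which copy should win avoids silently running a simulation with an unintended parameter set.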

The user may choose a forcefield via:

explicit --forcefield
interactive prompt

Forcefield functions used:

getff()
mapping dictionaries:
- abbr2aa
- abbr2charge
- abbr2mass
- abbr2sigma
- abbr2lambda
- abbr2tempcoff
- abbr2bondtype
- abbr2bondtypeindex

Error Messages

Residue index out of range

ERROR: Mutated residue %d out of range ...

PTM mismatch

ERROR ... residue %d is not %s

PDB/sequence mismatch

ERROR: There are %d CA atoms but sequence length is %d

Invalid degree of extension

The --degree-extend value must lie in the range 0–1.

Forcefield conflict

Printed when duplicate .ff files are found.

Summary

dps pdb2dps is the core sequence-to-CGPS model builder in DROPPS. It provides:

PTM insertion
terminal charge patches
sequence-only or PDB-guided CG structure generation
collision-free random-coil algorithm
forcefield-aware ITP generation
exact bond, mass, charge, sigma/lambda assignments
multi-conformation output for ensemble simulations

It is the recommended starting point for building CGPS systems for LLPS and disordered-protein simulations.