DROPPS protein builder
The pdb2dps tool converts an amino-acid sequence (optionally with PTMs)
into coarse-grained CGPS topology and PDB structure files.
It supports:
chain generation from sequence alone,
backbone extraction from an all-atom PDB,
post-translational modifications (PTMs),
terminal charge patches (NTD/CTD),
random-coil ensemble generation with overlap avoidance,
forcefield-aware bond/parameter assignment.
This tool replaces the functionality of pdb2gmx + seq-builder for the CGPS/HPS-based coarse-grained model used in DROPPS.
Implemented in pdb2cgps.py.
Purpose
pdb2dps generates:
1. Coarse-grained conformation(s) (one bead per residue), written to .pdb
2. A coarse-grained topology, written to .itp, containing:
  - atom types
  - per-residue masses, charges, sigma and lambda
  - bonded terms (bonds)
  - PTM-aware residue types
  - NTD/CTD patches
The output can be used directly with dps genmesh, dps grompp, dps mdrun and other CG/HPS DROPPS workflows.
Usage
Typical usage (generate both PDB & ITP):
dps pdb2dps \
-s MDGVGAPKT \
-oc peptide.pdb \
-op peptide \
-on PEPTIDE \
-ff hps
Include a PTM:
dps pdb2dps \
-s MDGVSSKT \
-ptm S5SMP \
-oc out.pdb \
-op out \
-ff hps
Extract coordinates from an all-atom PDB:
dps pdb2dps \
-s MTDGVAKE \
-f input_allatom.pdb \
-oc cg.pdb -op cg -ff hps
Arguments
Required
- -s, --sequence SEQ
Protein sequence, written with the one-letter abbreviations of the CGPS forcefield amino-acid types.
Optional
- -f, --input-pdb FILE
All-atom PDB file. When provided:
  - CA atoms are extracted and converted from Å to nm
  - CG coordinates follow the shape of the PDB structure instead of a random coil
  - the number of CA atoms must match the sequence length
- -ri, --residue-index INT
Residue index of the first residue (default: 1). Sets the residue numbering in the output PDB and ITP.
- -ptm, --post-translational-modification STRING
Add or mutate residues using the format <OriginalAA><ResidueNumber><MutatedType>, e.g. S129SMP. Multiple -ptm entries are allowed.
- -cNTD, --charged-NTD
Add a +1 charge to the N-terminal residue (via the _N suffix).
- -cCTD, --charged-CTD
Add a −1 charge to the C-terminal residue (via the _C suffix).
- -r, --radius FLOAT
Maximum spatial extent allowed for generated conformations (nm); prevents unrealistic expansion. Default: 2.0.
- -n, --number INT
Number of conformations to generate, e.g. for ensembles. Default: 1.
- -e, --degree-extend FLOAT
Controls chain stiffness, in the range 0–1 (0 → fully random, 1 → fully extended). Default: 0.5.
- -ff, --forcefield NAME
Selects the CGPS/HPS forcefield; must correspond to a *.ff file.
- -oc, --output-conformation FILE
Output PDB filename or prefix. If more than one structure is generated, the files are named prefix_1.pdb, prefix_2.pdb, …
- -op, --output-topology FILE
Output topology file prefix (.itp is appended automatically).
- -on, --output-name NAME
Topology molecule name (default: MOL).
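To make the interplay of -ptm, -cNTD and -cCTD concrete, here is a minimal sketch of how a per-residue type list could be assembled. It assumes 1-based residue numbers (as in the S5SMP example), PTMs that have already been validated, and _N/_C suffixes appended to the bead name; residue_types is a hypothetical helper, not a function from pdb2cgps.py.

```python
def residue_types(sequence, ptms=(), charged_ntd=False, charged_ctd=False):
    """Per-residue CG type names after PTMs and terminal patches (a sketch).

    ptms: iterable of (residue_number, mutated_type), 1-based, already validated.
    The _N/_C suffix convention is an assumption based on the -cNTD/-cCTD text.
    """
    types = list(sequence)  # one bead per residue, named by its one-letter code
    for number, mutated in ptms:
        types[number - 1] = mutated  # e.g. S at position 5 -> SMP
    if types and charged_ntd:
        types[0] += "_N"  # +1 N-terminal patch
    if types and charged_ctd:
        types[-1] += "_C"  # -1 C-terminal patch
    return types


print(residue_types("MDGVSSKT", [(5, "SMP")], charged_ntd=True, charged_ctd=True))
```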
PTMs (Post-Translational Modifications)
PTM strings are parsed with the regular expression:
([A-Za-z]+)(\d+)([A-Za-z]+)
Given S129SMP:
original residue = S
target index = 129
mutated → SMP
Validation rules:
index must lie inside chain
original AA must match the sequence
mutated type must exist in the selected forcefield
terminal residues may be modified unless the forcefield forbids it
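The parsing and validation rules can be sketched with Python's re module. This assumes 1-based residue numbers (consistent with S5SMP for MDGVSSKT), requires the whole string to match the pattern, and uses a plain set of type names in place of the selected forcefield; parse_ptm is hypothetical, its error texts are illustrative, and the terminal-residue rule is not modeled.

```python
import re

# Pattern quoted above: original type, residue number, mutated type.
PTM_PATTERN = re.compile(r"([A-Za-z]+)(\d+)([A-Za-z]+)")


def parse_ptm(ptm, sequence, forcefield_types):
    """Split a PTM string such as 'S129SMP' and check it against the sequence.

    forcefield_types stands in for the residue types of the selected forcefield.
    """
    match = PTM_PATTERN.fullmatch(ptm)
    if match is None:
        raise ValueError(f"cannot parse PTM string {ptm!r}")
    original, number, mutated = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= number <= len(sequence):  # index must lie inside the chain
        raise ValueError(f"mutated residue {number} out of range")
    if sequence[number - 1] != original:  # original AA must match the sequence
        raise ValueError(f"residue {number} is not {original}")
    if mutated not in forcefield_types:  # target type must exist in the forcefield
        raise ValueError(f"type {mutated} is not defined in the forcefield")
    return original, number, mutated


print(parse_ptm("S5SMP", "MDGVSSKT", {"SMP"}))
```

Because the first group is greedy but cannot consume digits, "S129SMP" splits cleanly into S, 129 and SMP.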
Conformation Generation
Two modes are available:
### 1. Using input all-atom PDB (--input-pdb)
Extract CA atom coordinates
Convert Å → nm
Ensure number of CAs equals sequence length
Write as CG beads with proper residue names
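The CA-extraction step can be sketched with the fixed-column PDB layout (atom name in columns 13–16, x/y/z in columns 31–54). extract_ca_coordinates_nm is a hypothetical name; this simplified version reads only ATOM records and ignores alternate locations and multiple models, which a real implementation may need to handle.

```python
def extract_ca_coordinates_nm(pdb_lines, sequence_length):
    """Return CA coordinates in nm from all-atom PDB lines (a sketch)."""
    coords = []
    for line in pdb_lines:
        # Fixed-column PDB: record name in cols 1-6, atom name in cols 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            coords.append((x / 10.0, y / 10.0, z / 10.0))  # 1 nm = 10 Å
    if len(coords) != sequence_length:
        raise ValueError(
            f"ERROR: There are {len(coords)} CA atoms but sequence length is {sequence_length}"
        )
    return coords
```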
### 2. Random-coil generator (default)
The function generate_chain_conformation_single():
Places residue 1 at origin
Places residue 2 using random spherical angles
For residues 3..N:
- preserve directionality with a degree_extend blend
- enforce the correct bond length from the forcefield
- avoid bead–bead overlap
- reject positions outside --radius
- allow up to 50 attempts per bead
This ensures the chain is continuous and physically plausible.
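The growth loop above can be sketched as follows. This is a simplified stand-in for generate_chain_conformation_single(), not its actual code: it assumes one constant bond length (0.38 nm here, whereas the tool takes each bond length from the forcefield), a hypothetical minimum non-bonded distance of 0.4 nm, a radius measured from the first bead, and that failing to place a bead after 50 attempts raises an error.

```python
import math
import random


def _unit(v):
    """Scale v to unit length; return None for a (near-)zero vector."""
    norm = math.sqrt(sum(c * c for c in v))
    return None if norm < 1e-12 else tuple(c / norm for c in v)


def _random_direction(rng):
    """Uniformly distributed direction from random spherical angles."""
    cos_theta = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def random_coil(n_beads, bond_length=0.38, degree_extend=0.5, radius=2.0,
                min_distance=0.4, max_attempts=50, seed=None):
    """Grow a CG chain bead by bead; all lengths are in nm (a sketch)."""
    if not 0.0 <= degree_extend <= 1.0:
        raise ValueError("degree_extend must lie in the range 0-1")
    rng = random.Random(seed)
    origin = (0.0, 0.0, 0.0)
    coords = [origin]  # residue 1 at the origin
    if n_beads > 1:  # residue 2: random direction, one bond length away
        coords.append(tuple(bond_length * c for c in _random_direction(rng)))
    for _ in range(2, n_beads):
        last = coords[-1]
        prev_dir = _unit(tuple(a - b for a, b in zip(last, coords[-2])))
        for _attempt in range(max_attempts):
            rand_dir = _random_direction(rng)
            # degree_extend = 1 keeps the previous direction, 0 is fully random.
            step = _unit(tuple(degree_extend * p + (1.0 - degree_extend) * r
                               for p, r in zip(prev_dir, rand_dir)))
            if step is None:
                continue
            cand = tuple(a + bond_length * s for a, s in zip(last, step))
            if math.dist(cand, origin) > radius:
                continue  # outside the allowed extent
            if any(math.dist(cand, c) < min_distance for c in coords[:-1]):
                continue  # overlaps an earlier non-bonded bead
            coords.append(cand)
            break
        else:  # no break: every attempt was rejected
            raise RuntimeError(f"could not place bead {len(coords) + 1}")
    return coords
```

Blending the previous bond direction with a fresh random direction is one simple way to realize a stiffness knob; other persistence schemes, such as bounded bond angles, would produce different chain statistics.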
Output: PDB File
Each generated conformation produces lines like:
ATOM 1 ALA A 1 x y z
Key features:
Coordinates written in Å
CRYST1 box size = radius × 2
Residue names from the forcefield mapping
Atom names from the coarse-grained abbreviation
Element symbol determined automatically
Residue numbers begin at --residue-index
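Writing one bead per residue as fixed-column PDB records might look like the sketch below. It assumes a cubic CRYST1 cell whose edge is 2 × radius converted from nm to Å like the coordinates, space group P 1, occupancy 1.00 and B-factor 0.00, and it takes the element symbol as an argument instead of inferring it. pdb_atom_line, pdb_cryst1_line and conformation_to_pdb are hypothetical helpers.

```python
def pdb_atom_line(serial, atom_name, res_name, res_seq, xyz_nm, chain="A", element="C"):
    """One fixed-column ATOM record; coordinates converted nm -> Å."""
    x, y, z = (10.0 * c for c in xyz_nm)
    return (f"ATOM  {serial:5d} {atom_name:<4s} {res_name:>3s} {chain:1s}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


def pdb_cryst1_line(radius_nm):
    """Cubic cell with edge 2 * radius, written in Å (assumed convention)."""
    edge = 2.0 * radius_nm * 10.0
    return f"CRYST1{edge:9.3f}{edge:9.3f}{edge:9.3f}{90.0:7.2f}{90.0:7.2f}{90.0:7.2f} {'P 1':<11s}{1:4d}"


def conformation_to_pdb(coords_nm, beads, radius_nm=2.0, first_res=1):
    """beads: list of (atom_name, res_name, element), one per residue."""
    lines = [pdb_cryst1_line(radius_nm)]
    for i, (xyz, (atom_name, res_name, element)) in enumerate(zip(coords_nm, beads)):
        # Residue numbering starts at first_res, mirroring --residue-index.
        lines.append(pdb_atom_line(i + 1, atom_name, res_name, first_res + i, xyz, element=element))
    lines.append("END")
    return "\n".join(lines) + "\n"


print(conformation_to_pdb([(0.0, 0.0, 0.0), (0.38, 0.0, 0.0)],
                          [("K", "LYS", "C"), ("D", "ASP", "C")], first_res=10), end="")
```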
Output: ITP Topology
Generated topology contains:
### [moleculetype]
[ moleculetype ]
MOL 1
Here MOL is the molecule name (set with --output-name) and 1 is the nrexcl value.
### [atomtypes]
Generated for each unique CGPS bead abbreviation:
Fields include:
atom abbreviation
GROMACS-style atom name
sigma
lambda
T0, T1, T2 temperature coefficients (HPS forcefield)
### [atoms]
For each sequence position:
ID
atom name
residue name
residue index
mass
charge
cumulative qtot comment
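A sketch of how the [atoms] rows and the running qtot comment could be produced. It assumes the standard GROMACS [atoms] column order (nr, type, resnr, residue, atom, cgnr, charge, mass) and uses placeholder masses and charges; atoms_section is hypothetical, and the tool itself reads these values from abbr2mass and abbr2charge.

```python
def atoms_section(beads, first_res=1):
    """Format an ITP [ atoms ] section with a cumulative charge comment.

    beads: list of (abbr, res_name, mass, charge), one entry per residue.
    first_res: residue number of the first bead (cf. --residue-index).
    """
    lines = ["[ atoms ]",
             ";   nr  type  resnr  residue  atom   cgnr    charge      mass"]
    qtot = 0.0
    for nr, (abbr, res_name, mass, charge) in enumerate(beads, start=1):
        qtot += charge  # running total, written as a trailing comment
        lines.append(
            f"{nr:6d} {abbr:>5s} {first_res + nr - 1:6d} {res_name:>8s} "
            f"{abbr:>5s} {nr:6d} {charge:9.3f} {mass:9.3f} ; qtot {qtot:.3f}"
        )
    return "\n".join(lines)


# Placeholder masses/charges for illustration only.
print(atoms_section([("K", "LYS", 128.2, 1.0), ("D", "ASP", 115.1, -1.0), ("G", "GLY", 57.05, 0.0)]))
```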
### [bonds]
Bond parameters come directly from:
forcefield.bond2param[ forcefield.abbr2bondtype[ f"{AAi}-{AAj}" ] ]
Bond length and force constant are written per residue pair.
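The lookup expression above can be wrapped into a small [bonds] writer. The sketch uses a stand-in forcefield object built with types.SimpleNamespace; it assumes bond2param maps a bond type to a (b0 in nm, kb) pair and that bonds are written with GROMACS function type 1 (harmonic). bonds_section is hypothetical and the numbers are placeholders, not DROPPS parameters.

```python
from types import SimpleNamespace


def bonds_section(abbrs, forcefield):
    """Write an ITP [ bonds ] section for consecutive beads.

    abbrs: CG bead abbreviations in chain order.
    forcefield: object exposing abbr2bondtype and bond2param dictionaries.
    """
    lines = ["[ bonds ]", ";   ai     aj  funct         b0           kb"]
    for i in range(len(abbrs) - 1):
        key = f"{abbrs[i]}-{abbrs[i + 1]}"  # same key shape as f"{AAi}-{AAj}"
        b0, kb = forcefield.bond2param[forcefield.abbr2bondtype[key]]
        # funct 1 = harmonic bond in GROMACS (assumed here)
        lines.append(f"{i + 1:6d} {i + 2:6d} {1:6d} {b0:10.4f} {kb:12.2f}")
    return "\n".join(lines)


# Stand-in forcefield with placeholder numbers.
toy_ff = SimpleNamespace(
    abbr2bondtype={"K-D": "bb", "D-G": "bb"},
    bond2param={"bb": (0.38, 8000.0)},
)
print(bonds_section(["K", "D", "G"], toy_ff))
```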
Forcefield Handling
Forcefields are searched for in two locations:
internal: dropps/share/forcefields/*.ff
working directory: *.ff
If both locations contain the same filename, an error is raised:
ERROR: Conflict detected! The following .ff file(s) exist in both ...
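The two-location search and conflict check can be sketched as below. merge_forcefields and find_forcefields are hypothetical names; the internal directory is passed in rather than resolved from the installed dropps package, and the error text is illustrative rather than the tool's exact message. Keeping the merge separate from the globbing makes the conflict rule easy to test without touching the filesystem.

```python
from pathlib import Path


def merge_forcefields(internal_paths, local_paths):
    """Index .ff paths by file name; refuse names present in both locations."""
    internal = {Path(p).name: Path(p) for p in internal_paths}
    local = {Path(p).name: Path(p) for p in local_paths}
    conflicts = sorted(internal.keys() & local.keys())
    if conflicts:
        raise RuntimeError("Conflict detected! .ff file(s) present in both locations: "
                           + ", ".join(conflicts))
    return {**internal, **local}


def find_forcefields(internal_dir, working_dir="."):
    """Collect *.ff entries from the internal directory and the working directory."""
    return merge_forcefields(Path(internal_dir).glob("*.ff"), Path(working_dir).glob("*.ff"))
```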
The user may choose the forcefield via:
an explicit --forcefield argument
an interactive prompt
Forcefield functions used:
getff()
mapping dictionaries: abbr2aa, abbr2charge, abbr2mass, abbr2sigma, abbr2lambda, abbr2tempcoff, abbr2bondtype, abbr2bondtypeindex
Error Messages
Residue index out of range
ERROR: Mutated residue %d out of range ...
PTM mismatch
ERROR ... residue %d is not %s
PDB/sequence mismatch
ERROR: There are %d CA atoms but sequence length is %d
Invalid --degree-extend value
The value must lie in the range 0–1.
Forcefield conflict
Printed when duplicate .ff files are found.
Summary
dps pdb2dps is the core sequence-to-CGPS model builder in DROPPS.
It provides:
PTM insertion
terminal charge patches
sequence-only or PDB-guided CG structure generation
overlap-avoiding random-coil algorithm
forcefield-aware ITP generation
exact bond, mass, charge, sigma/lambda assignments
multi-conformation output for ensemble simulations
It is the recommended starting point for building CGPS systems for LLPS and disordered-protein simulations.