Command Line Usage

Basic Usage

Command line usage for peptides is simple, and takes the following form:

pygen-structures SEQUENCE -o OUTPUT_PREFIX

Sequences are specified using the one-letter protein code by default. Terminal patches can be supplied using hyphens as delimiters (e.g. NNEU-AFK-CT2); note that if terminal patches are given, both termini must be supplied. D-amino acids are written by preceding the residue letter with a lowercase ‘d’ (e.g. dA for D-alanine).
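
The sequence syntax described above can be sketched in a few lines of Python. This is only an illustration of the notation: split_sequence is a hypothetical helper, not the parser used by pygen-structures, and it assumes that a sequence is either a bare one-letter string or exactly three hyphen-delimited parts.

```python
def split_sequence(sequence):
    """Split a one-letter SEQUENCE into (first_patch, residues, last_patch).

    Hypothetical helper for illustration only; this is not the parser
    used by pygen-structures.
    """
    parts = sequence.split("-")
    if len(parts) == 1:
        # No terminal patches were given.
        first_patch, body, last_patch = None, parts[0], None
    elif len(parts) == 3:
        # Both terminal patches were given, e.g. NNEU-AFK-CT2.
        first_patch, body, last_patch = parts
    else:
        raise ValueError("expected SEQUENCE or FIRST-SEQUENCE-LAST")

    residues = []
    i = 0
    while i < len(body):
        if body[i] == "d":
            # A lowercase 'd' marks the next letter as a D-amino acid.
            if i + 1 >= len(body):
                raise ValueError("'d' must be followed by a residue letter")
            residues.append(body[i:i + 2])
            i += 2
        else:
            residues.append(body[i])
            i += 1
    return first_patch, residues, last_patch


print(split_sequence("NNEU-AFK-CT2"))  # ('NNEU', ['A', 'F', 'K'], 'CT2')
print(split_sequence("AdP"))           # (None, ['A', 'dP'], None)
```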

OUTPUT_PREFIX.psf and OUTPUT_PREFIX.pdb are created. If -o is not specified, the content of the .pdb file is written to stdout and no .psf file is generated.

The histidine form used can be set using --histidine, and defaults to HSE (neutral form with the proton on Nε).

Patches can be supplied using --patches, followed by the name of the patch and the 0-based indices of the residues the patch is to be applied to (or the strings 'FIRST'/'LAST' to refer to the first and last residue). See the note below on the difference between FIRST and 0 / LAST and -1.

To generate more complex structures, such as sugars, the residue names should be supplied (hyphen delimited) and the -u/--use-charmm-names option selected.

--name and --segid control the name given in the COMPND record and the segment ID (4 characters max) respectively.

What’s Special About FIRST and LAST?

If a patch is specified as being applied to 'FIRST' or 'LAST' rather than 0 or -1, the default first/last patch is not applied. This distinguishes cases where a different terminal patch is being applied (such as the protein patch CT2 in place of CTER) from cases where a patch just happens to affect the first or last residue.
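
The distinction can be made concrete with a small sketch. resolve_patch_target is a hypothetical helper, not part of pygen-structures; it assumes that negative indices count back from the end of the chain, as the use of -1 for the last residue suggests.

```python
def resolve_patch_target(token, n_residues):
    """Return (index, replaces_default_terminal_patch) for one target token.

    Hypothetical helper for illustration only. 'FIRST' and 'LAST' resolve
    to the same residues as 0 and -1, but also signal that the default
    terminal patch should not be applied.
    """
    if token == "FIRST":
        return 0, True
    if token == "LAST":
        return n_residues - 1, True

    index = int(token)
    if index < 0:
        # Assumed: negative indices count from the end of the chain.
        index += n_residues
    if not 0 <= index < n_residues:
        raise IndexError(f"residue index {token} out of range")
    return index, False


print(resolve_patch_target("LAST", 4))  # (3, True)
print(resolve_patch_target("-1", 4))    # (3, False)
```

Both calls point at the same residue; only the flag differs, which is what decides whether the default terminal patch is kept.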

Examples


To produce a simple peptide sequence, the one letter code can be used. To produce the peptide HIS-GLU-TYR, creating HEY.psf and HEY.pdb:

pygen-structures HEY -o HEY

If we think that the histidine should be protonated, we can change its protonation state by specifying a different histidine form:

pygen-structures HEY -o HEY --histidine HSP

Or simply use the three letter residue codes by using the -u flag:

pygen-structures -u HSP-GLU-TYR -o HEY

Looking at non-protein examples, we could create the trisaccharide raffinose. This requires the use of the residue codes and a patch. The default segment ID, PROT, is less applicable here, so we can change it with the --segid option, and set the name in the COMPND header using --name. The following command produces RAFF.psf and RAFF.pdb:

pygen-structures -u AGLC-BFRU-AGAL --patches RAFF 0 1 2 --segid RAFF --name Raffinose -o RAFF
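
When structures are generated from a script, it can be convenient to assemble the argument list programmatically. The sketch below rebuilds the raffinose command above. build_command is a hypothetical wrapper, not part of pygen-structures; it only uses the flags shown in this document and does not run anything itself.

```python
import shlex


def build_command(sequence, output_prefix=None, use_charmm_names=False,
                  patches=(), segid=None, name=None):
    """Assemble a pygen-structures argument list.

    Hypothetical wrapper for illustration only. `patches` is a sequence
    of (patch_name, indices) pairs.
    """
    cmd = ["pygen-structures"]
    if use_charmm_names:
        cmd.append("-u")
    cmd.append(sequence)
    if patches:
        cmd.append("--patches")
        for patch_name, indices in patches:
            cmd.append(patch_name)
            cmd.extend(str(index) for index in indices)
    if segid is not None:
        cmd += ["--segid", segid]
    if name is not None:
        cmd += ["--name", name]
    if output_prefix is not None:
        cmd += ["-o", output_prefix]
    return cmd


raffinose = build_command(
    "AGLC-BFRU-AGAL",
    output_prefix="RAFF",
    use_charmm_names=True,
    patches=[("RAFF", [0, 1, 2])],
    segid="RAFF",
    name="Raffinose",
)
print(shlex.join(raffinose))  # requires Python 3.8+
```

The resulting list could then be passed to subprocess.run(raffinose, check=True), assuming pygen-structures is installed and on the PATH.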

We can also make glycopeptides. To link alpha-glucose to an asparagine residue (in this case, from an ALA-ASN-ALA peptide), we can use the NGLA patch. Note that because the last protein residue is not the last residue in the chain, we have to apply the C-terminus patch manually:

pygen-structures -u ALA-ASN-ALA-AGLC --patches CTER -2 NGLA 1 -1 -o ANA-NAGLC
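
The --patches argument above carries two patches in one flat list: CTER applied to residue -2, and NGLA applied to residues 1 and -1. One way to read such a list is sketched below. group_patch_tokens is a hypothetical helper, and the rule it uses (integers and FIRST/LAST are targets, anything else starts a new patch) is an assumption made for illustration, not necessarily how pygen-structures parses its arguments.

```python
def group_patch_tokens(tokens):
    """Group a flat --patches token list into (patch_name, targets) pairs.

    Hypothetical helper for illustration only. Assumes that a token is a
    target if it is an integer or one of 'FIRST'/'LAST', and a patch name
    otherwise.
    """
    def is_target(token):
        if token in ("FIRST", "LAST"):
            return True
        try:
            int(token)
        except ValueError:
            return False
        return True

    groups = []
    for token in tokens:
        if is_target(token):
            if not groups:
                raise ValueError("target given before any patch name")
            groups[-1][1].append(token)
        else:
            # A non-target token starts a new patch.
            groups.append((token, []))
    return groups


print(group_patch_tokens("CTER -2 NGLA 1 -1".split()))
# [('CTER', ['-2']), ('NGLA', ['1', '-1'])]
```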

By default, if parameters are missing, the files are not created and the missing parameters are written to stdout. Using the -v flag disables this verification:

$ # Note that this is fixed in v0.2.3, and will now pass verification
$ pygen-structures AdP -o AdP
Missing parameters:
bonds {('CPD1', 'CC')}
$ pygen-structures -v AdP -o AdP
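
If the tool is driven from a script, the report above can be turned into Python data. The following is a minimal sketch that assumes the output always looks like the sample shown (a "Missing parameters:" line followed by lines of the form "KIND {tuples}"). parse_missing_parameters is a hypothetical helper, and the format is inferred from this one example only.

```python
import ast


def parse_missing_parameters(text):
    """Parse a 'Missing parameters:' report into {kind: set_of_tuples}.

    Hypothetical helper for illustration only; the format is inferred
    from the single sample in this document.
    """
    missing = {}
    lines = text.splitlines()
    try:
        start = lines.index("Missing parameters:") + 1
    except ValueError:
        # No report present: nothing is missing.
        return missing
    for line in lines[start:]:
        if not line.strip():
            break
        kind, _, literal = line.partition(" ")
        # The right-hand side is a Python set literal of atom-type tuples.
        missing[kind] = ast.literal_eval(literal)
    return missing


sample = "Missing parameters:\nbonds {('CPD1', 'CC')}\n"
print(parse_missing_parameters(sample))  # {'bonds': {('CPD1', 'CC')}}
```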

A different CHARMM distribution can be loaded using the -t option, followed by the path to the folder. pygen-structures ships with the July 2019 CHARMM distribution (the latest at the time of writing), with some modifications to correct the D-amino acid parameters (these modifications are highlighted in the toppar README). The function which parses the folder picks the latest versions of the parameter and topology files (36 over 27, 36m over 36). If you plan on using an older version of the forcefield (this is not recommended), you will have to remove the newer versions and change the file extensions to match the current conventions (.rtf for topology files and .prm for parameter files).
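
The version preference described above (36 over 27, 36m over 36) can be expressed as a sort key. This sketch is not the function used by pygen-structures. version_key and pick_latest are hypothetical names, and the file names are assumed to follow the usual CHARMM toppar pattern, such as top_all36_prot.rtf.

```python
import re

# Matches e.g. 'all27', 'all36' or 'all36m' inside a toppar file name.
_VERSION = re.compile(r"all(\d+)(m?)")


def version_key(filename):
    """Return a sortable (number, is_m_variant) key for a toppar file name.

    Hypothetical helper for illustration only.
    """
    match = _VERSION.search(filename)
    if match is None:
        # Files without a recognisable version sort first.
        return (-1, False)
    return (int(match.group(1)), match.group(2) == "m")


def pick_latest(filenames):
    """Pick the file with the highest version, preferring 'm' variants."""
    return max(filenames, key=version_key)


print(pick_latest(["top_all27_prot_na.rtf", "top_all36_prot.rtf"]))
# top_all36_prot.rtf
print(pick_latest(["par_all36_prot.prm", "par_all36m_prot.prm"]))
# par_all36m_prot.prm
```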