CHARMM General Force Field (CGenFF): A force field for drug-like
molecules compatible with the CHARMM all-atom additive
biological force fields
K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Darian, O.
Guvench, P. Lopes, I. Vorobyov, and A. D. MacKerell Jr.*
Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore,
Maryland 21201
Abstract
The widely used CHARMM additive all-atom force field includes parameters for proteins, nucleic
acids, lipids and carbohydrates. In the present paper an extension of the CHARMM force field to
drug-like molecules is presented. The resulting CHARMM General Force Field (CGenFF) covers a
wide range of chemical groups present in biomolecules and drug-like molecules, including a large
number of heterocyclic scaffolds. The parametrization philosophy behind the force field focuses on
quality at the expense of transferability, with the implementation concentrating on an extensible force
field. Statistics related to the quality of the parametrization with a focus on experimental validation
are presented. Additionally, the parametrization procedure, described fully in the present paper in
the context of the model systems, pyrrolidine, and 3-phenoxymethylpyrrolidine will allow users to
readily extend the force field to chemical groups that are not explicitly covered in the force field as
well as add functional groups to and link together molecules already available in the force field.
CGenFF thus makes it possible to perform “all-CHARMM” simulations on drug-target interactions
thereby extending the utility of CHARMM force fields to medicinally relevant systems.
Introduction
Background
Computational biochemistry and biophysics is an ever growing field that is being applied to a
wide range of heterogeneous systems of increasing size and complexity. Recent examples
include simulations of the nucleosome,1 ion channels2 and the ribosome.3 While greater
computational resources, including massively parallel architectures, and efficient codes such
as NAMD4 and Desmond5 have made important contributions to these advances, the quality
of the force fields that act as the framework of computational biochemistry and biophysics
have improved to the point that stable simulations of these systems are possible. Towards this
goal the CHARMM additive, all-atom force field6 has made an important contribution. Apart
from proteins,7 it supports nucleic acids8–10 and lipids,11,12 and has limited support for
carbohydrates,13–15 with a more complete carbohydrate extension in preparation,16 allowing
simulations on all commonly encountered motifs in biological systems. However, its coverage
of the wide range of chemical space required for the field of computational medicinal chemistry
is limited.
*To whom correspondence should be addressed: alex@outerbanks.umaryland.edu.
Supplementary material
A listing of target data and corresponding CGenFF values is available as supplementary material. Additionally, the most recent CGenFF
topology and parameter files are available at http://mackerell.umaryland.edu/.
NIH Public Access
Author Manuscript
J Comput Chem. Author manuscript; available in PMC 2011 March 1.
Published in final edited form as:
J Comput Chem. 2010 March ; 31(4): 671–690. doi:10.1002/jcc.21367.
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
To date, a number of force fields with wide coverage of drug-like molecules are available.
Here, we will limit our discussion to force fields that were parametrized to reproduce condensed
phase properties. Indeed, as classical additive force fields do not account for polarizability, a
given additive force field will only perform well in a given dielectric medium. As a special
case, Allinger’s most recent MM4 force field accurately predicts gas-phase conformational
energetics of organic molecules and includes terms that account for polarizability in an
approximate way.17 Nevertheless, this force field has not been evaluated in a condensed phase,
high-dielectric medium, and may be assumed to be unsuitable for condensed phase studies of,
for example, drug-protein interactions. The Merck Molecular Force Field (MMFF94), on the
other hand, was developed with the explicit goal of performing MD simulations on
pharmaceutically relevant systems in the condensed phase. Indeed, it was originally conceived
as “a combined organic/protein force field that is equally applicable to proteins and other
systems of biological significance”.18 One drawback of this approach is that the general nature
of the force field inherently compromises its accuracy in representing classes of molecules in
a well-defined chemical space, such as proteins. To overcome this problem, efforts were started
to create general force fields that are meant to be used with existing, highly optimized and
tested biomolecular force fields. As a result of this effort, Amber19,20 now includes the General
Amber Force Field (GAFF)21 and the Antechamber toolkit,22 which allow the user to generate
an Amber force field model for an arbitrary input molecule. Another example is OPLS-AA,
whose optimizations emphasized condensed phase properties of small molecules and has been
extended to cover a diverse set of small molecule model compounds.23,24 However, none of
these force fields is expected to be applicable in combination with the CHARMM force field,
as interactions between molecules are dominated by a delicate balance of parameters. Indeed,
while it is tempting to combine a biomolecular force field (like CHARMM) for a system’s
biological part with an organic force field (such as MMFF94) for its drug-like part, it is unlikely
that this will yield properly balanced intermolecular interactions, because the nonbonded
parameters are developed using different strategies for different force fields.25 To cure this
problem, the CHARMM General Force Field (CGenFF) was created. CGenFF is an organic
force field explicitly aimed at simulating drug-like molecules in a biological environment
represented by the CHARMM additive biomolecular force fields. Consequently, CGenFF uses
the same class I potential energy function as the other CHARMM force fields,6 and more
generally, all the conventions and recommendations for usage of the biomolecular CHARMM
force fields apply to CGenFF as well. The form of the CHARMM potential energy function
used to calculate the energy, V(r), were r represents the Cartesian coordinates of the system,
is shown in equation 1.
(1)
The intramolecular portion of the potential energy function includes terms for the bonds,
valence angles, torsion or dihedral angles, improper dihedral angles and a Urey-Bradley 1,3-
term, where bo, θo, ψo and r1,3,o are the bond, angle, improper and Urey-Bradley equilibrium
terms, respectively, n and δ are the dihedral multiplicity and phase and the K’s are the respective
Vanommeslaeghe et al. Page 2
J Comput Chem. Author manuscript; available in PMC 2011 March 1.
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
force constants. The intermolecular terms include electrostatic and van der Waals (vdW)
interactions, where qi and qj is the partial atomic charge of atom i and j, respectively, εij is the
well depth, Rmin,ij is the radius in the Lennard-Jones (LJ) 6–12 term used to treat the vdW
interactions, and rij is the distance between i and j. In addition, the energy function in equation
1 has been extended to include a 2D dihedral energy correction map, referred to as CMAP,
which has been used to improve the treatment of the conformational properties of the φ, ψ
terms in the peptide backbone,26,27 though it may be applied to other systems. More details of
the CHARMM potential energy function may be obtained from reference 25.
Parametrization Philosophy
The main challenge in creating a general force field is to cover enough of the vast chemical
space occupied by drug-like molecules to make the model of utility. To attain this, a systematic
optimization protocol is required that is 1) of a level of accuracy appropriate for the proposed
application of the force field, 2) fast enough to parametrize large numbers of model compounds
and 3) simple enough to enable CHARMM users to easily extend the force field to chemical
groups of their interest. For these reasons, the standard CHARMM optimization procedure for
biomolecular force fields has been simplified for the purpose of creating CGenFF, with a
stronger emphasis on quantum mechanical (QM) calculations than with the biomolecular
CHARMM force fields. Nevertheless, those simplifications are designed to maintain the
consistency of the force field required for computational studies of heterogeneous systems.
Indeed, improvements in computer power and QM methodology allow for QM calculations of
a sufficiently high level of theory to attain the required accuracy to be performed in a timely
fashion.
In order to build a successful force field for drug-like molecules, it is essential to select
appropriate model compounds. For CGenFF, two classes of model compounds were targeted.
The first class comprises a wide range of heterocycles. As this class of compounds acts as the
scaffold for the majority of pharmaceuticals, a comprehensive set of heterocycles would act
as building blocks for a diverse range of compounds. Accordingly, emphasis was placed on
accurate optimization of the intermolecular parameters in these molecules as well as
reproduction of target geometries, vibrational spectra and conformational properties. The
second class consists of simple functional groups. A wide variety of chemical groups occur in
drug-like compounds, fulfilling roles that range from linking different parts of the molecule to
being substituents on aromatic and heterocyclic groups, often in orientations that involve direct
interactions with the biomacromolecular binding partner. Accordingly, a large number of
functional groups were explicitly parametrized when building CGenFF.
Given the availability of a wide range of heterocycles and functional groups, it is anticipated
that eventually the majority of effort to extend the force field will involve linking those groups
together to create the molecule of interest. This procedure requires the selection of the
appropriate fragments from the palette of molecules in CGenFF, applying standard approaches
for covalently linking those rings and functional groups, followed by evaluation of the model.
The force field includes a wide range of default internal parameters for many of the links, such
that once the user creates the topology for their molecule, they will generally be able to perform
molecular mechanics calculations. However, it is strongly suggested that the user perform a
series of well-defined QM calculations to test the conformational properties of the linkers
between the rings comprising their molecule, compare the molecular mechanical (MM) and
QM conformational properties and perform optimization of the appropriate (typically dihedral)
parameters as described below. Accordingly, this manuscript focuses on presentation of the
parameter optimization methodology, a detailed description of the methods required to extend
the force field, and validation of the obtained model for ring systems and functional groups.
Efforts to automate many of these tasks are in progress and will be presented in future works.
Vanommeslaeghe et al. Page 3
J Comput Chem. Author manuscript; available in PMC 2011 March 1.
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
Methodology
Principles
Class I additive force fields (see equation 1), which do not explicitly treat electronic
polarization, have been designed for use in polar environments typically found in proteins and
in solution. To achieve this, the use of experimental target data, supplemented by QM data,
was strongly emphasized during optimization of the nonbonded parameters in the biomolecular
CHARMM force fields, in order to ensure physical behavior in the bulk phase. However,
reproducing experimental data requires molecular dynamics (MD) simulations, which have to
be set up carefully and repeated multiple times in the course of the parametrization, making
the usage of experimental target data non-trivial and time-consuming. In addition, for many
functional groups that may occur in drug-like molecules experimental data may not be
available. Due to this lack of data, and since one of the main goals of CGenFF is easy and fast
extensibility, a slightly different philosophy was adapted, with more emphasis on QM results
as target data for parameter optimization. This is possible due to the wide range of
functionalities already available whose parameters were optimized based largely on
experimental data, along with the establishment of empirical scaling factors that can be applied
to QM data in order to make them relevant for the bulk phase.
The only cases where experimental data would be required are situations where novel atom
types are present for which LJ parameters are not already available in CGenFF. These cases
would require optimization of the LJ parameters, supplemented with Hartree-Fock (HF) model
compound-water minimum interaction energies and distances (see step 2.a under “Generation
of target data for parameter validation and optimization” and step 1 under “Parametrization
procedure”), based on the reproduction of bulk phase properties, typically pure solvent
molecular volumes and heats of vaporization or crystal lattice parameters and heats of
sublimation. Descriptions of the optimization protocol have been published previously.7,9,25
However, it should be noted that CGenFF has been designed to cover the majority of atom
types in pharmaceutical compounds, such that optimization of LJ parameters is typically not
required.
The remainder of this section includes 1) the procedure to add new model compounds and
chemical groups to the force field, 2) the procedure for generating the QM target data, and 3)
the procedure for application of the QM information to parametrize new molecules. To put
these procedures in better context, example systems including pyrollidine, the addition of
substituents to pyrollidine and the development of a linker between pyrollidine and benzene
are presented.
Addition of model compounds
Model compounds are added in the context of a hierarchical and extensible force field.
Accordingly, step one of extending CGenFF involves identifying available compounds that
are chemically similar to the compound of interest. These available compounds are then used
as the starting point to create an initial topology entry; information from the available
compounds in CGenFF should be used to select the appropriate atom types and initial estimates
of the partial atomic charges. At this stage the molecule, along with its bonded and nonbonded
lists, can be generated in CHARMM,28,29 which will return a list of missing bonded parameters
as an error message. These parameters may be either 1) treated with wildcards or 2) explicitly
added to the force field via their inclusion in the parameter file. In case 1, energies as well as
the full array of calculations available in CHARMM may readily be performed; however, it is
recommended that the user perform validation calculations to determine that the parameters
are yielding satisfactory geometries and conformational energies. This involves performing
the appropriate QM calculations followed by the analogous MM calculations and checking
Vanommeslaeghe et al. Page 4
J Comput Chem. Author manuscript; available in PMC 2011 March 1.
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
that the target properties are adequately reproduced. In case 2 the user would undertake the
parameter optimization procedure on only the new parameters required for that molecule. This
includes the QM calculations, followed by the analogous MM calculations and parameter
optimization to minimize the difference between the MM and QM data (see below). It should
be emphasized that in the context of a drug optimization project these procedures typically
need to be performed only once on the parent compound as the addition of various functional
groups, as required to create a congeneric series, may be performed without additional
parameter optimization.
Addition of functional groups to parent compounds/linking rings
Following the assignment of parameters and their optimization for the new parent molecule,
standard functional groups such as acids, amines, and alkyl moieties, may readily be added.
The same approach is applicable both to adding functional groups to available compounds in
CGenFF and to linking ring systems to create more complex chemical species. The first step
of this procedure consists of removing hydrogen atoms from the heavy (ie. non-hydrogen)
atoms between which the new covalent bond will be created. Next, taking advantage of the
modular nature of the distribution of charges in CHARMM, the charges on those hydrogens
are summed into their original parent heavy atoms, thereby preserving the integer charge of
the molecule. In the case of adding functional groups to an existing molecule, typically, no
additional parameters are required and the molecule is ready for study. However, if new
parameters are required, which is often the case when more complex functional groups or
linkers are being added, it will be necessary to assign wildcard or add explicit parameters
followed by the appropriate level of validation and optimization.
Generation of target data for parameter validation and optimization
Central to the development of the CHARMM additive force fields was the use of a consistent
optimization protocol, especially with the nonbonded parameters. To maintain this consistency,
it is necessary to generate the appropriate target data. As discussed above, all the target data
required for CGenFF extensions is obtained from QM calculations, with the exception of cases
where explicit optimization of LJ parameters is required. Details of target data generation
follow.
1. Internal parameters: Target data for the internal or bonded parameters include
geometries, vibrational spectra and conformational energies. While not mutually
exclusive, these may be partitioned into three aspects.
a. Bond, valence angle, and Urey-Bradley equilibrium terms and the dihedral
phase and multiplicity are based on geometries optimized at the MP2/6–31G
(d) (MP2/6–31+G(d) in the case of anions) level. This level of theory has
been shown to yield satisfactory agreement with experimental geometries
for complex systems such as nucleic acid bases and sugars9,30,31 while being
computationally accessible.
b. Bond, valence angle, Urey-Bradley, improper and torsion force constants
are based on MP2/6–31G(d) vibrational spectra. The F matrix is scaled by
a factor 0.89, which corresponds to scaling all frequencies by a factor
0.943,32 and a symbolic potential energy distribution (PED) analysis is
performed in the “local internal valence coordinate” space that was first
proposed by Pulay et al.33 The PED analysis may be performed using the
MOLVIB34 module in CHARMM.28,29
c. Force constants for torsions comprised of only non-hydrogen atoms are
usually optimized based on (relaxed) one-dimensional potential energy
scans performed at the MP2/6–31G(d) level. This level of theory has been
Vanommeslaeghe et al. Page 5
J Comput Chem. Author manuscript; available in PMC 2011 March 1.
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
N
IH
-PA Author M
anuscript
shown to yield energy surfaces consistent with distributions of dihedrals in
oligonucleotides9,30,31. However, for systems in which dispersion
dominates the conformation properties, like alkanes,35 higher levels of
theory may be required. Single point calculations at th