Parsing code output and using containers¶
Available parsers¶
A core feature of pwtools is a set of parsers for commonly used atomistic
simulation codes. The parsers are located in parse
. These
parsers are available:
The parsers called *OutputFile
are for parsing simulation code output. The
others parse structure files (cif and pdb).
All parsers have a common API and can be used like
>>> pp = PwSCFOutputFile('pw.out')
>>> pp.get_cell()
>>> # or call all get_*() methods at once and access attributes
>>> pp.parse()
>>> pp.cell
The problem with using the parsers directly is that there are of course
differences between the codes. For instance, each one uses different units
(Bohr, Angstrom, Ry, Ha, eV, …). Also, not all structure attributes are
contained in the output of all codes. For example,
PwSCFOutputFile
will parse Cartesian atomic coordinates into
pp.coords
and the unit cell into pp.cell
. However, often we need to use
the fractional coordinates pp.coords_frac
. This quantity is not present in
the PWscf output and thus pp.coords_frac
will be None. However, we know
that we can manually calculate it from coords
and cell
.
Container classes Structure
and Trajectory
¶
In order to abstract away the differences between codes as much as possible, we
have implemented unified container classes. These classes are
Structure
and Trajectory
. The
former is used to represent a single crystal structure (unit cell, atom
coordinates, total energy, stress tensor, …). The latter represents a
sequence of structures, for instance an MD or relaxation run.
They have two important features:
A defined set of units (eV, Angstrom,…), to which all quantities are converted.
Calculate all missing attributes automatically and thus provide a unified API.
Note that the latter is a convenience feature and will also produce some redundant data. You may want to turn it off for parsing/storing big data.
The auto-calculation of missing properties in Trajectory
and Structure
is done by trying to calculate all
properties for which there is a get_*
method. For example, if a parser
finds coords
and cell
in the MD data, then in
Trajectory
coords_frac
is calculated from that.
You can of course use these classes to build new structures and trajectories by
hand (just as with ase.Atoms
, or you use atoms2struct()
):
>>> st = crys.Structure(coords_frac=np.array([[0]*3, [.5]*3]),
cryst_const=np.array([3.0]*3 + [60]*3),
symbols=['Al','N'])
>>> tr = crys.Trajectory(coords_frac=rand(1000,20,3),
cell=rand(1000,3,3),
symbols=['H']*20)
By doing this, the set_all()
method is
automatically called, which will calculate all possible attributes from the
input data (for example st.coords
, st.cell
).
However, some attributes may be undefined. For example, the st
above will
have no etot
or stress
attribute (they are None), since that was not
defined in the input and there is no ways to calculate it, of course, whereas a
Structure returned by read_pw_scf()
will have that.
By using the dump()
method, you can store
the object as binary file [using Python’s pickle
module] for fast
re-loading later:
>>> st.dump('struck.pk')
>>> st_loaded = io.read_pickle('struck.pk')
A Trajectory object can be viewed a list of Structure instances [even though it is implemented differently due to efficiency: we use 3d numpy arrays], it supports iteration and slicing, for example:
>>> # extract first and last Structure objects
>>> st_first = tr[0]
>>> st_last = tr[-1]
>>> # slice out a part of the trajectory
>>> tr_middle = tr[100:500]
>>> # use every 5th step
>>> tr[::5]
Structure and Trajectory objects can also be freely concatenated into a new Trajectory:
>>> tr_new = crys.concatenate((st1, st2))
>>> tr_new = crys.concatenate((st, tr))
>>> tr_new = crys.concatenate((tr1, tr2, st))
High-level parsing functions¶
The most simple way to parse code output and get a container class is to use
the high-level functions in io
.
- These return a
Structure
: - These return a
Trajectory
:
For example:
>>> st = io.read_pw_scf('pw.out') # SCF run
>>> st.etot
-258.58148870118305
>>> st.cell
array([[-2.71536701, 0. , 2.71536701],
[ 0. , 2.71536701, 2.71536701],
[-2.71536701, 2.71536701, 0. ]])
>>> tr = io.read_pw_md('pw.out') # MD/relax run
>>> tr.etot[:5]
>>> array([-17812.03671913, -17810.74831561, -17811.38315829,
-17808.90413645, -17807.96259264])
>>> plot(tr.etot)
These functions use the appropriate parser class and transform the result of
the parsing to a Structure
or
Trajectory
. For example, what is essentially done is
simply:
>>> # same as tr=io.read_pw_md('pw.out')
>>> pp = parse.PwMDOutputFile('pw.out')
>>> tr = pp.get_traj()
>>> # same as st=io.read_cp2k_scf('cp2k.out')
>>> pp = parse.Cp2kSCFOutputFile('cp2k.out')
>>> st = pp.get_struct()
It is important to note that Structure and Trajectory instances built by hand
can be used in exactly the same way as those obtained by using one of the
io.read_*()
functions.
Units¶
Each parser will (try to) return the “natural” units of each code:
property |
PWscf |
CPMD |
CP2K |
LAMMPS (metal units) |
---|---|---|---|---|
length |
Bohr |
Bohr |
Angstrom |
Angstrom |
energy |
Ry |
Ha |
Ha |
eV |
forces |
Ry/Bohr |
Ha/Bohr |
Ha/Bohr |
eV/Angstrom |
stress |
kbar |
kbar |
bar[MD], GPa[SCF] |
bar |
temperature |
K |
K |
K |
K |
velocity |
Bohr/thart (?) |
Bohr/thart |
Angstrom/ps |
|
time |
tryd |
thart |
thart |
ps |
See constants
for thart and tryd.
For PWscf, we also detect things like “ATOMIC_POSITIONS crystal | alat | bohr” and transform accordingly. Nevertheless, always verify that the units you get are the ones you expect!
In Structure
and Trajectory
, we have
units eV, Angstrom,…
property |
unit |
SI |
---|---|---|
length |
Angstrom |
(1e-10 m) |
energy |
eV |
(1.602176487e-19 J) |
forces |
eV / Angstrom |
|
stress |
GPa |
(not eV/Angstrom**3) |
temperature |
K |
|
velocity |
Angstrom / fs |
|
time |
fs |
(1e-15 s) |
mass |
amu |
(1.6605387820000001e-27 kg) |