Parsing code output and using containers

Available parsers

A core feature of pwtools is a set of parsers for commonly used atomistic simulation codes. The parsers are located in parse. These parsers are available:

The parsers called *OutputFile are for parsing simulation code output. The others parse structure files (cif and pdb).

All parsers have a common API and can be used like

>>> pp = PwSCFOutputFile('pw.out')
>>> pp.get_cell()
>>> # or call all get_*() methods at once and access attributes
>>> pp.parse()
>>> pp.cell

The problem with using the parsers directly is that there are of course differences between the codes. For instance, each one uses different units (Bohr, Angstrom, Ry, Ha, eV, …). Also, not all structure attributes are contained in the output of all codes. For example, PwSCFOutputFile will parse Cartesian atomic coordinates into pp.coords and the unit cell into pp.cell. However, often we need to use the fractional coordinates pp.coords_frac. This quantity is not present in the PWscf output and thus pp.coords_frac will be None. However, we know that we can manually calculate it from coords and cell.

Container classes Structure and Trajectory

In order to abstract away the differences between codes as much as possible, we have implemented unified container classes. These classes are Structure and Trajectory. The former is used to represent a single crystal structure (unit cell, atom coordinates, total energy, stress tensor, …). The latter represents a sequence of structures, for instance an MD or relaxation run.

They have two important features:

  • A defined set of units (eV, Angstrom,…), to which all quantities are converted.

  • Calculate all missing attributes automatically and thus provide a unified API.

Note that the latter is a convenience feature and will also produce some redundant data. You may want to turn it off for parsing/storing big data.

The auto-calculation of missing properties in Trajectory and Structure is done by trying to calculate all properties for which there is a get_* method. For example, if a parser finds coords and cell in the MD data, then in Trajectory coords_frac is calculated from that.

You can of course use these classes to build new structures and trajectories by hand (just as with ase.Atoms, or you use atoms2struct()):

>>> st = crys.Structure(coords_frac=np.array([[0]*3, [.5]*3]),
                        cryst_const=np.array([3.0]*3 + [60]*3),
                        symbols=['Al','N'])
>>> tr = crys.Trajectory(coords_frac=rand(1000,20,3),
                         cell=rand(1000,3,3),
                         symbols=['H']*20)

By doing this, the set_all() method is automatically called, which will calculate all possible attributes from the input data (for example st.coords, st.cell).

However, some attributes may be undefined. For example, the st above will have no etot or stress attribute (they are None), since that was not defined in the input and there is no ways to calculate it, of course, whereas a Structure returned by read_pw_scf() will have that.

By using the dump() method, you can store the object as binary file [using Python’s pickle module] for fast re-loading later:

>>> st.dump('struck.pk')
>>> st_loaded = io.read_pickle('struck.pk')

A Trajectory object can be viewed a list of Structure instances [even though it is implemented differently due to efficiency: we use 3d numpy arrays], it supports iteration and slicing, for example:

>>> # extract first and last Structure objects
>>> st_first = tr[0]
>>> st_last = tr[-1]
>>> # slice out a part of the trajectory
>>> tr_middle = tr[100:500]
>>> # use every 5th step
>>> tr[::5]

Structure and Trajectory objects can also be freely concatenated into a new Trajectory:

>>> tr_new = crys.concatenate((st1, st2))
>>> tr_new = crys.concatenate((st, tr))
>>> tr_new = crys.concatenate((tr1, tr2, st))

High-level parsing functions

The most simple way to parse code output and get a container class is to use the high-level functions in io.

These return a Structure:
These return a Trajectory:

For example:

>>> st = io.read_pw_scf('pw.out') # SCF run
>>> st.etot
-258.58148870118305
>>> st.cell
array([[-2.71536701,  0.        ,  2.71536701],
       [ 0.        ,  2.71536701,  2.71536701],
       [-2.71536701,  2.71536701,  0.        ]])
>>> tr = io.read_pw_md('pw.out') # MD/relax run
>>> tr.etot[:5]
>>> array([-17812.03671913, -17810.74831561, -17811.38315829,
           -17808.90413645, -17807.96259264])
>>> plot(tr.etot)

These functions use the appropriate parser class and transform the result of the parsing to a Structure or Trajectory. For example, what is essentially done is simply:

>>> # same as tr=io.read_pw_md('pw.out')
>>> pp = parse.PwMDOutputFile('pw.out')
>>> tr = pp.get_traj()

>>> # same as st=io.read_cp2k_scf('cp2k.out')
>>> pp = parse.Cp2kSCFOutputFile('cp2k.out')
>>> st = pp.get_struct()

It is important to note that Structure and Trajectory instances built by hand can be used in exactly the same way as those obtained by using one of the io.read_*() functions.

Units

Each parser will (try to) return the “natural” units of each code:

property

PWscf

CPMD

CP2K

LAMMPS (metal units)

length

Bohr

Bohr

Angstrom

Angstrom

energy

Ry

Ha

Ha

eV

forces

Ry/Bohr

Ha/Bohr

Ha/Bohr

eV/Angstrom

stress

kbar

kbar

bar[MD], GPa[SCF]

bar

temperature

K

K

K

K

velocity

Bohr/thart (?)

Bohr/thart

Angstrom/ps

time

tryd

thart

thart

ps

See constants for thart and tryd.

For PWscf, we also detect things like “ATOMIC_POSITIONS crystal | alat | bohr” and transform accordingly. Nevertheless, always verify that the units you get are the ones you expect!

In Structure and Trajectory, we have units eV, Angstrom,…

property

unit

SI

length

Angstrom

(1e-10 m)

energy

eV

(1.602176487e-19 J)

forces

eV / Angstrom

stress

GPa

(not eV/Angstrom**3)

temperature

K

velocity

Angstrom / fs

time

fs

(1e-15 s)

mass

amu

(1.6605387820000001e-27 kg)