Parsing code output and using containers¶

Available parsers¶

A core feature of pwtools is a set of parsers for commonly used atomistic simulation codes. The parsers are located in parse. These parsers are available:

CifFile

PDBFile

PwSCFOutputFile

PwMDOutputFile

PwVCMDOutputFile

CpmdSCFOutputFile

CpmdMDOutputFile

Cp2kSCFOutputFile

Cp2kMDOutputFile

Cp2kDcdMDOutputFile

Cp2kRelaxOutputFile

LammpsTextMDOutputFile

LammpsDcdMDOutputFile

The parsers called *OutputFile are for parsing simulation code output. The others parse structure files (cif and pdb).

All parsers have a common API and can be used like

>>> pp = PwSCFOutputFile('pw.out')
>>> pp.get_cell()
>>> # or call all get_*() methods at once and access attributes
>>> pp.parse()
>>> pp.cell

The problem with using the parsers directly is that there are of course differences between the codes. For instance, each one uses different units (Bohr, Angstrom, Ry, Ha, eV, …). Also, not all structure attributes are contained in the output of all codes. For example, PwSCFOutputFile will parse Cartesian atomic coordinates into pp.coords and the unit cell into pp.cell. However, often we need to use the fractional coordinates pp.coords_frac. This quantity is not present in the PWscf output and thus pp.coords_frac will be None. However, we know that we can manually calculate it from coords and cell.

Container classes `Structure` and `Trajectory`¶

In order to abstract away the differences between codes as much as possible, we have implemented unified container classes. These classes are Structure and Trajectory. The former is used to represent a single crystal structure (unit cell, atom coordinates, total energy, stress tensor, …). The latter represents a sequence of structures, for instance an MD or relaxation run.

They have two important features:

A defined set of units (eV, Angstrom,…), to which all quantities are converted.
Calculate all missing attributes automatically and thus provide a unified API.

Note that the latter is a convenience feature and will also produce some redundant data. You may want to turn it off for parsing/storing big data.

The auto-calculation of missing properties in Trajectory and Structure is done by trying to calculate all properties for which there is a get_* method. For example, if a parser finds coords and cell in the MD data, then in Trajectory coords_frac is calculated from that.

You can of course use these classes to build new structures and trajectories by hand (just as with ase.Atoms, or you use atoms2struct()):

>>> st = crys.Structure(coords_frac=np.array([[0]*3, [.5]*3]),
                        cryst_const=np.array([3.0]*3 + [60]*3),
                        symbols=['Al','N'])
>>> tr = crys.Trajectory(coords_frac=rand(1000,20,3),
                         cell=rand(1000,3,3),
                         symbols=['H']*20)

By doing this, the set_all() method is automatically called, which will calculate all possible attributes from the input data (for example st.coords, st.cell).

However, some attributes may be undefined. For example, the st above will have no etot or stress attribute (they are None), since that was not defined in the input and there is no ways to calculate it, of course, whereas a Structure returned by read_pw_scf() will have that.

By using the dump() method, you can store the object as binary file [using Python’s pickle module] for fast re-loading later:

>>> st.dump('struck.pk')
>>> st_loaded = io.read_pickle('struck.pk')

A Trajectory object can be viewed a list of Structure instances [even though it is implemented differently due to efficiency: we use 3d numpy arrays], it supports iteration and slicing, for example:

>>> # extract first and last Structure objects
>>> st_first = tr[0]
>>> st_last = tr[-1]
>>> # slice out a part of the trajectory
>>> tr_middle = tr[100:500]
>>> # use every 5th step
>>> tr[::5]

Structure and Trajectory objects can also be freely concatenated into a new Trajectory:

>>> tr_new = crys.concatenate((st1, st2))
>>> tr_new = crys.concatenate((st, tr))
>>> tr_new = crys.concatenate((tr1, tr2, st))

High-level parsing functions¶

The most simple way to parse code output and get a container class is to use the high-level functions in io.

These return a Structure:: read_cif()

read_pdb()

read_pw_scf()

read_cpmd_scf()

read_cp2k_scf()
These return a Trajectory:: read_pw_md()

read_pw_vcmd()

read_cpmd_md()

read_cp2k_md()

read_cp2k_md_dcd()

read_cp2k_relax()

read_lammps_md_txt()

read_lammps_md_dcd()

For example:

>>> st = io.read_pw_scf('pw.out') # SCF run
>>> st.etot
-258.58148870118305
>>> st.cell
array([[-2.71536701,  0.        ,  2.71536701],
       [ 0.        ,  2.71536701,  2.71536701],
       [-2.71536701,  2.71536701,  0.        ]])
>>> tr = io.read_pw_md('pw.out') # MD/relax run
>>> tr.etot[:5]
>>> array([-17812.03671913, -17810.74831561, -17811.38315829,
           -17808.90413645, -17807.96259264])
>>> plot(tr.etot)

These functions use the appropriate parser class and transform the result of the parsing to a Structure or Trajectory. For example, what is essentially done is simply:

>>> # same as tr=io.read_pw_md('pw.out')
>>> pp = parse.PwMDOutputFile('pw.out')
>>> tr = pp.get_traj()

>>> # same as st=io.read_cp2k_scf('cp2k.out')
>>> pp = parse.Cp2kSCFOutputFile('cp2k.out')
>>> st = pp.get_struct()

It is important to note that Structure and Trajectory instances built by hand can be used in exactly the same way as those obtained by using one of the io.read_*() functions.

Units¶

Each parser will (try to) return the “natural” units of each code:

property	PWscf	CPMD	CP2K	LAMMPS (metal units)
length	Bohr	Bohr	Angstrom	Angstrom
energy	Ry	Ha	Ha	eV
forces	Ry/Bohr	Ha/Bohr	Ha/Bohr	eV/Angstrom
stress	kbar	kbar	bar[MD], GPa[SCF]	bar
temperature	K	K	K	K
velocity		Bohr/thart (?)	Bohr/thart	Angstrom/ps
time	tryd	thart	thart	ps

See constants for thart and tryd.

For PWscf, we also detect things like “ATOMIC_POSITIONS crystal | alat | bohr” and transform accordingly. Nevertheless, always verify that the units you get are the ones you expect!

In Structure and Trajectory, we have units eV, Angstrom,…

property	unit	SI
length	Angstrom	(1e-10 m)
energy	eV	(1.602176487e-19 J)
forces	eV / Angstrom
stress	GPa	(not eV/Angstrom**3)
temperature	K
velocity	Angstrom / fs
time	fs	(1e-15 s)
mass	amu	(1.6605387820000001e-27 kg)

pwtools

Navigation

Related Topics

Parsing code output and using containers¶

Available parsers¶

Container classes `Structure` and `Trajectory`¶

High-level parsing functions¶

Units¶

Parsing code output and using containers¶

Available parsers¶

Container classes Structure and Trajectory¶

High-level parsing functions¶

Units¶

Container classes `Structure` and `Trajectory`¶