
Parser classes for different file formats. Input and output files.

We need the following basic Unix tools installed:

awk (better mawk)

The tested egrep versions don’t support the \s character class for whitespace, unlike sed, Perl, Python or any other sane regex implementation. Use [ ] instead.
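To illustrate the difference between the two spellings, here is a small sketch using Python's re module rather than egrep. The sample line is made up. Note that [ ] matches only literal spaces, while \s also matches tabs and newlines, so the two are not strictly equivalent.

```python
import re

# Made-up sample line: several spaces after the keyword, a tab before "foo".
line = "ATOMIC_POSITIONS   crystal\tfoo"

# Portable spelling: a bracket expression containing a literal space.
portable = re.compile(r"ATOMIC_POSITIONS[ ]+crystal")

# Perl-style spelling: \s matches spaces, tabs and newlines.
perl_style = re.compile(r"crystal\s+foo")

# The bracket expression does not match the tab between "crystal" and "foo".
spaces_only = re.compile(r"crystal[ ]+foo")

found_portable = portable.search(line) is not None
found_perl_style = perl_style.search(line) is not None
found_spaces_only = spaces_only.search(line) is not None
```

If the files to be parsed may contain tabs, a bracket expression with both a space and a tab is the closer substitute for \s.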

Using Parsing classes

All parsing classes are derived from FlexibleGetters -> UnitsHandler -> {Structure,Trajectory}FileParser

As a general rule: If a getter (self.get_<attr>() or self._get_<attr>_raw()) cannot find anything in the file, it returns None. All getters which depend on it will also return None.
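The None rule can be sketched with a toy class. MiniParser, its "cell:" text format and get_volume() are hypothetical and only mimic the naming convention; they are not the pwtools implementation.

```python
import re


class MiniParser:
    """Hypothetical stand-in for a parsing class (not part of pwtools)."""

    def __init__(self, text):
        self.text = text

    def _get_cell_raw(self):
        # Raw getter: return the matched text, or None if nothing is found.
        match = re.search(r"cell:[ ]*([-0-9.eE+ ]+)", self.text)
        return None if match is None else match.group(1)

    def get_cell(self):
        # Depends on the raw getter, so None is passed through.
        raw = self._get_cell_raw()
        if raw is None:
            return None
        return [float(x) for x in raw.split()]

    def get_volume(self):
        # Depends on get_cell(); treats the three numbers as box lengths.
        cell = self.get_cell()
        if cell is None:
            return None
        a, b, c = cell
        return a * b * c


found = MiniParser("cell: 1.0 2.0 3.0\n")
missing = MiniParser("no such keyword here\n")
```

Here missing.get_cell() and missing.get_volume() both return None, because the raw getter found nothing.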

  • After initialization

    pp = SomeParsingClass(<filename>)

    all attrs whose name is in pp.attr_lst will be set to None.

  • parse() will invoke self.try_set_attr(<attr>), which does

    self.<attr> = self.get_<attr>()

    for each <attr> in self.attr_lst, thus setting self.<attr> to a defined value: None if nothing was found in the file, or a value other than None otherwise.

  • All getters get_<attr>() will do their parsing, possibly looking for a file self.filename, regardless of the fact that the attribute self.<attr> may already be defined (e.g. if parse() has been called before).

  • For interactive use (you need <attr> only once), prefer get_<attr>() over parse().

  • Use dump() only for temporary storage and fast re-reading. For reading such a file back, see also the *FileParser.load() docstring.

  • Use relative paths in <filename>.

  • If loading a dump()’ed pickle file from disk, then use direct attr access (pp.<attr>) instead of the getter (pp.get_<attr>()), because the latter would simply parse self.filename again.
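The life cycle described in the list above can be sketched as follows. LifecycleSketch is a hypothetical toy class that copies only the names attr_lst, parse(), try_set_attr() and get_<attr>(). The getters return fixed values instead of reading a file, and a counter records how often "parsing" happens.

```python
class LifecycleSketch:
    """Hypothetical toy model of the attr_lst / parse() mechanism."""

    def __init__(self, filename=None):
        self.filename = filename
        self.attr_lst = ["natoms", "etot"]
        self.ncalls = 0
        # After initialization, every attribute in attr_lst is None.
        for name in self.attr_lst:
            setattr(self, name, None)

    def try_set_attr(self, name):
        # Equivalent to: self.<attr> = self.get_<attr>()
        setattr(self, name, getattr(self, "get_" + name)())

    def parse(self):
        for name in self.attr_lst:
            self.try_set_attr(name)

    def get_natoms(self):
        self.ncalls += 1  # stands in for the cost of reading the file
        return 2

    def get_etot(self):
        self.ncalls += 1
        return None  # pretend nothing was found in the file


pp = LifecycleSketch("some_file.out")
before = (pp.natoms, pp.etot)   # both None
pp.parse()                      # calls every getter once
after = (pp.natoms, pp.etot)    # natoms is set, etot stays None
calls_after_parse = pp.ncalls

value = pp.natoms               # direct access: no parsing
calls_after_access = pp.ncalls
value = pp.get_natoms()         # getter: parses again
calls_after_getter = pp.ncalls
```

Direct attribute access leaves the counter unchanged, while calling the getter again increments it. This is why attribute access is preferable once the object has been parsed or loaded.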

For debugging, we still have many getters which produce redundant information, e.g.

cell + cryst_const
_<attr>_raw + <attr> (where <attr> = cell, forces, …)

especially in MD parsers, not so much in StructureFileParser derived classes. If parse() is used, all this information is retrieved and stored.

  • All parsers try to return the default units of the program output, e.g. Ry, Bohr, tryd for PWscf; Ha, Bohr, thart for Abinit and CPMD.

  • Use get_struct() / get_traj() to get a Structure / Trajectory object with pwtools standard units (eV, Ang, fs).
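As a rough illustration of what converting program units to eV, Ang and fs involves, the sketch below applies approximate CODATA conversion factors by hand. The TO_STANDARD table, the to_standard() helper and the sample numbers are assumptions made for this example; they are not the pwtools API, which performs the conversion inside get_struct() / get_traj().

```python
# Approximate CODATA conversion factors to eV, Angstrom and fs.
TO_STANDARD = {
    "Ry": 13.605693,     # Rydberg -> eV
    "Ha": 27.211386,     # Hartree -> eV
    "Bohr": 0.529177,    # Bohr -> Angstrom
    "tryd": 0.0483777,   # Rydberg atomic time unit -> fs
    "thart": 0.0241888,  # Hartree atomic time unit -> fs
}


def to_standard(value, unit):
    """Convert `value` given in `unit` to eV, Angstrom or fs (hypothetical helper)."""
    return value * TO_STANDARD[unit]


# Made-up PWscf-like numbers: energy in Ry, length in Bohr, time step in tryd.
etot_ev = to_standard(-10.0, "Ry")
alat_ang = to_standard(10.0, "Bohr")
dt_fs = to_standard(20.0, "tryd")
```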

Using parse():

Pro:

  • Simplicity. All getters are called when parse() is invoked. You get it all.

  • In theory, you can delete the file pointed to by self.filename, assuming all getters have extracted all information that you need.

Con:

  • The object is full of (potentially big) arrays holding redundant information. Thus, the dump()’ed file may be large. Use the compress() method.

  • Parsing may be slow if each getter (of possibly many) is called.

Using get_<attr>():

Pro:

  • You only parse what you really need.

Con:

  • self.<attr> will NOT be set, since get_<attr>() only returns <attr> but doesn’t set self.<attr> = self.get_<attr>(), so dump() would save an “empty” file.
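The trade-off between the two approaches can be made concrete with a toy class. TradeoffSketch and its stored() helper are hypothetical; stored() only lists the attribute values that a dump of the object would contain and does not model dump() itself. Assigning the getter result by hand is one possible way, assumed here, to store just the attributes you need.

```python
class TradeoffSketch:
    """Hypothetical stand-in for a parsing class (not the pwtools code)."""

    attr_lst = ["etot", "forces"]

    def __init__(self):
        for name in self.attr_lst:
            setattr(self, name, None)

    def get_etot(self):
        return -1.5

    def get_forces(self):
        return [[0.0, 0.0, 0.1]]

    def parse(self):
        for name in self.attr_lst:
            setattr(self, name, getattr(self, "get_" + name)())

    def stored(self):
        # Attribute values that would end up in a dump of this object.
        return {name: getattr(self, name) for name in self.attr_lst}


pp = TradeoffSketch()

etot = pp.get_etot()          # returns the value, sets nothing
after_getter = pp.stored()    # still all None: an "empty" object

pp.etot = etot                # store only what is needed
after_assign = pp.stored()

pp.parse()                    # store everything
after_parse = pp.stored()
```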


CifFile([filename, block])

Parse Cif file.

Cp2kDcdMDOutputFile(*args, **kwds)

Same as Cp2kMDOutputFile (all PROJECT* files are text), only that the coordinates file is a dcd format binary file PROJECT-pos-1.dcd.

Cp2kMDOutputFile(*args, **kwds)

CP2K MD output parser.

Cp2kRelaxOutputFile(*args, **kwds)

Parse cp2k global/run_type cell_opt.

Cp2kSCFOutputFile(*args, **kwds)

CP2K SCF output parser ("global/run_type energy_force,print_level low").

CpmdMDOutputFile(*args, **kwds)

Parse CPMD MD output.

CpmdSCFOutputFile(*args, **kwds)

Parse output from a CPMD "single point calculation" (wave function optimization).


Base class which implements dcd file reading.


LammpsDcdMDOutputFile(*args, **kwds)

Parse Lammps DCD binary output + log.lammps text output.

LammpsTextMDOutputFile([filename, order])

Parse LAMMPS text output.


Very very simple pdb file parser.

PwMDOutputFile([filename, use_alat])

Parse pw.x MD-like output.

PwSCFOutputFile([filename, use_alat])

Parse a pw.x SCF output file (calculation='scf').

PwVCMDOutputFile(*args, **kwds)

Parse only calculation='vc-md'.


StructureFileParser([filename, units])

Base class for single-structure parsers.

TrajectoryFileParser([filename, units])

Base class for MD-like parsers.

arr1d_from_txt(txt[, dtype])

arr2d_from_txt(txt[, dtype])

axis_lens(seq[, axis])

Return length of axis of all arrays in seq.


traj_from_txt(txt, shape[, axis, dtype, sep])

Used for 3d trajectories where the exact shape of the array as written by the MD code must be known, e.g. (nstep,N,3) where N=3 (cell, stress) or N=natoms (coords, forces, ...).
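The sketch below shows why the shape must be known in advance: the text is a flat stream of numbers, and only the shape says where one time step ends and the next begins. traj_from_txt_sketch is a hypothetical pure-Python illustration, not the pwtools function. It ignores the axis, dtype and sep arguments and returns nested lists instead of an array.

```python
def traj_from_txt_sketch(txt, shape):
    """Reshape whitespace-separated numbers into nested lists of `shape`.

    Hypothetical illustration; `shape` is (nstep, N, ncol), e.g. (nstep, natoms, 3).
    """
    nstep, n, ncol = shape
    values = [float(x) for x in txt.split()]
    if len(values) != nstep * n * ncol:
        raise ValueError(
            "expected %d numbers for shape %r, got %d"
            % (nstep * n * ncol, shape, len(values))
        )
    it = iter(values)
    # Fill in row-major order: step, then row, then column.
    return [[[next(it) for _ in range(ncol)] for _ in range(n)]
            for _ in range(nstep)]


# Made-up output of an MD code: 2 steps, 2 atoms, 3 coordinates each.
txt = """
1 2 3
4 5 6
7 8 9
10 11 12
"""
traj = traj_from_txt_sketch(txt, (2, 2, 3))
first_atom_second_step = traj[1][0]
```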