parse

Parser classes for different file formats, covering both input and output files.

We need the following basic Unix tools installed:

grep/egrep
sed
awk (preferably mawk)
tail
wc

The tested egrep versions do not support the \s character class for whitespace, unlike sed, Perl, Python or any other sane regex implementation. Use [ ] instead.
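As a minimal illustration of this substitution, the two pattern styles can be compared side by side. Python's re module is used here only so the comparison is runnable; the sample line and both patterns are made up for this example and are not taken from any parser in this module.

```python
import re

# Hypothetical output line, made up for illustration.
line = "!    total energy              =     -23.45678901 Ry"

# Portable form: a literal space in a bracket expression, as egrep needs.
portable = r"^![ ]+total energy[ ]+=[ ]+([-0-9.]+)"
# Form that relies on \s, which the tested egrep versions do not support.
with_backslash_s = r"^!\s+total energy\s+=\s+([-0-9.]+)"

m_portable = re.search(portable, line)
m_backslash_s = re.search(with_backslash_s, line)

# Both patterns extract the same number from this space-separated line.
energy = float(m_portable.group(1))
print(energy)
```

Note that [ ] matches only the space character, so it is a drop-in replacement for \s only when the output is separated by spaces rather than tabs.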

Using Parsing classes

All parsing classes:

Pw*OutputFile
Cpmd*OutputFile
Cp2k*OutputFile
Lammps*OutputFile

are derived from FlexibleGetters -> UnitsHandler -> {Structure,Trajectory}FileParser

As a general rule: If a getter (self.get_<attr>() or self._get_<attr>_raw()) cannot find anything in the file, it returns None. All getters which depend on it will also return None.
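The rule above can be sketched with a toy class. ToyParser, its text argument and the "forces:" line format are hypothetical and exist only for this illustration; the real classes read from self.filename and are implemented differently.

```python
import re


class ToyParser:
    """Hypothetical stand-in for a parsing class; not the pwtools code."""

    def __init__(self, text):
        # The real classes read self.filename; a string keeps this runnable.
        self.text = text

    def _get_forces_raw(self):
        # Return the matched text, or None if the pattern is absent.
        m = re.search(r"forces:[ ]*([-0-9. ]+)", self.text)
        return None if m is None else m.group(1)

    def get_forces(self):
        # Depends on the raw getter, so None propagates upward.
        raw = self._get_forces_raw()
        if raw is None:
            return None
        return [float(x) for x in raw.split()]


found = ToyParser("forces: 0.1 -0.2 0.3").get_forces()
missing = ToyParser("no such section").get_forces()
```

Here found is a list of floats, while missing is None because the raw getter found nothing.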

  • After initialization

    pp = SomeParsingClass(<filename>)

    all attrs whose names are in pp.attr_lst will be set to None.

  • parse() will invoke self.try_set_attr(<attr>), which does

    self.<attr> = self.get_<attr>()

    for each <attr> in self.attr_lst, thus setting self.<attr> to a defined value: None if nothing was found in the file, or a value other than None otherwise.

  • All getters get_<attr>() will do their parsing, possibly looking for a file self.filename, regardless of the fact that the attribute self.<attr> may already be defined (e.g. if parse() has been called before).

  • For interactive use (you need <attr> only once), prefer get_<attr>() over parse().

  • Use dump('foo.pk') only for temporary storage and fast re-reading. Read the file back with pwtools.io.read_pickle('foo.pk'). See also the *FileParser.load() docstring.

  • Use relative paths in <filename>.

  • If loading a dump()’ed pickle file from disk,

    pp=io.read_pickle(…)

    then use direct attr access

    pp.<attr>

    instead of

    pp.get_<attr>()

    because the latter would simply parse self.filename again.
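The interplay of attr_lst, parse(), try_set_attr() and the getters described in the list above can be sketched as follows. This is a toy model under stated assumptions, not the pwtools implementation: ToyParser, the key=value text format, the _find() helper and the ncalls counter are all hypothetical.

```python
class ToyParser:
    """Hypothetical illustration of the attr_lst / parse() pattern."""

    def __init__(self, text):
        self.text = text  # stands in for the content of self.filename
        self.attr_lst = ["natoms", "etot"]
        self.ncalls = 0  # counts lookups, for demonstration only
        for name in self.attr_lst:
            setattr(self, name, None)  # all attrs start out as None

    def _find(self, key):
        self.ncalls += 1
        for line in self.text.splitlines():
            if line.startswith(key + "="):
                return line.split("=", 1)[1]
        return None  # nothing found

    def get_natoms(self):
        val = self._find("natoms")
        return None if val is None else int(val)

    def get_etot(self):
        val = self._find("etot")
        return None if val is None else float(val)

    def try_set_attr(self, name):
        # Equivalent to: self.<attr> = self.get_<attr>()
        setattr(self, name, getattr(self, "get_" + name)())

    def parse(self):
        for name in self.attr_lst:
            self.try_set_attr(name)


pp = ToyParser("natoms=2\n")
before = (pp.natoms, pp.etot)  # state right after initialization
pp.parse()
after = (pp.natoms, pp.etot)   # etot stays None: it is not in the text

calls_after_parse = pp.ncalls
pp.get_natoms()                # parses again although pp.natoms is set
reparsed = pp.ncalls > calls_after_parse
```

Once parse() has run (or the object was loaded from a pickle), reading pp.natoms directly avoids the repeated lookup that pp.get_natoms() triggers.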

For debugging, we still have many getters which produce redundant information, e.g.

cell + cryst_const
_<attr>_raw + <attr> (where <attr> = cell, forces, …)

especially in MD parsers, not so much in StructureFileParser derived classes. If parse() is used, all this information is retrieved and stored.

  • All parsers try to return the default units of the program output, e.g. Ry, Bohr, tryd for PWscf; Ha, Bohr, thart for Abinit and CPMD.

  • Use get_struct() / get_traj() to get a Structure / Trajectory object with pwtools standard units (eV, Ang, fs).
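To make the unit difference concrete, here is the arithmetic for going from PWscf-style units (Ry, Bohr) to eV and Ang. This is only a sketch: the helper names are hypothetical, the factors are rounded CODATA values written out by hand, and in practice get_struct() / get_traj() take care of the conversion.

```python
# Rounded CODATA values, written out here for illustration only.
RY_IN_EV = 13.605693     # 1 Ry in eV
BOHR_IN_ANG = 0.529177   # 1 Bohr in Angstrom


def energy_ry_to_ev(e_ry):
    return e_ry * RY_IN_EV


def length_bohr_to_ang(x_bohr):
    return x_bohr * BOHR_IN_ANG


def force_ry_bohr_to_ev_ang(f):
    # Force is energy per length, so both factors enter.
    return f * RY_IN_EV / BOHR_IN_ANG


etot_ev = energy_ry_to_ev(-2.0)
alat_ang = length_bohr_to_ang(10.0)
force_ev_ang = force_ry_bohr_to_ev_ang(0.01)
```

The force conversion shows why mixing raw parser output with converted quantities is error-prone: a derived quantity picks up one factor per unit it contains.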

Using parse():

Pro:

  • Simplicity. All getters are called when parse() is invoked. You get it all.

  • In theory, you can delete the file pointed to by self.filename, assuming all getters have extracted all information that you need.

Con:

  • The object is full of (potentially big) arrays holding redundant information. Thus, the dump()’ed file may be large. Use the compress() method.

  • Parsing may be slow if each getter (of possibly many) is called.

Using get_<attr>():

Pro:

  • You only parse what you really need.

Con:

  • self.<attr> will NOT be set, since get_<attr>() only returns <attr> but doesn’t set self.<attr> = self.get_<attr>(), so dump() would save an “empty” file.
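The last point can be demonstrated with a toy class. Everything here is hypothetical: ToyParser only mimics the behaviour described above, and dump_state() is a stand-in that pickles the instance attributes, not the real dump() method.

```python
import pickle


class ToyParser:
    """Hypothetical parser; only mimics the behaviour described above."""

    attr_lst = ["etot"]

    def __init__(self):
        for name in self.attr_lst:
            setattr(self, name, None)

    def get_etot(self):
        return -27.2  # pretend this value was parsed from a file

    def parse(self):
        for name in self.attr_lst:
            setattr(self, name, getattr(self, "get_" + name)())


def dump_state(obj):
    # Stand-in for dump(): pickle only the instance attributes.
    return pickle.dumps(vars(obj))


pp = ToyParser()
etot = pp.get_etot()  # the value is returned but not stored on pp
state_getter_only = pickle.loads(dump_state(pp))

pp.parse()            # parse() stores the value on pp
state_after_parse = pickle.loads(dump_state(pp))
```

After calling only the getter, the saved state still holds None for etot; after parse(), it holds the parsed value.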

AWK

String constant.

Ang

Float constant (length unit).

Angstrom

Float constant (length unit).

Bohr

Float constant (length unit).

CifFile([filename, block])

Parse Cif file.

Cp2kDcdMDOutputFile(*args, **kwds)

Same as Cp2kMDOutputFile (all PROJECT* files are text), only that the coordinates file is a dcd format binary file PROJECT-pos-1.dcd.

Cp2kMDOutputFile(*args, **kwds)

CP2K MD output parser.

Cp2kRelaxOutputFile(*args, **kwds)

Parse cp2k global/run_type cell_opt.

Cp2kSCFOutputFile(*args, **kwds)

CP2K SCF output parser ("global/run_type energy_force,print_level low").

CpmdMDOutputFile(*args, **kwds)

Parse CPMD MD output.

CpmdSCFOutputFile(*args, **kwds)

Parse output from a CPMD "single point calculation" (wave function optimization).

DcdOutputFile()

Base class which implements dcd file reading.

Ha

Float constant (energy unit).

LammpsDcdMDOutputFile(*args, **kwds)

Parse Lammps DCD binary output + log.lammps text output.

LammpsTextMDOutputFile([filename, order])

Parse LAMMPS text output.

PDBFile([filename])

Very simple PDB file parser.

PwMDOutputFile([filename, use_alat])

Parse pw.x MD-like output.

PwSCFOutputFile([filename, use_alat])

Parse a pw.x SCF output file (calculation='scf').

PwVCMDOutputFile(*args, **kwds)

Parse only calculation='vc-md'.

Ry

Float constant (energy unit).

StructureFileParser([filename, units])

Base class for single-structure parsers.

TrajectoryFileParser([filename, units])

Base class for MD-like parsers.

arr1d_from_txt(txt[, dtype])

arr2d_from_txt(txt[, dtype])

axis_lens(seq[, axis])

Return length of axis of all arrays in seq.

eV

Float constant (energy unit).

float_from_txt(txt)

fs

Float constant (time unit).

int_from_txt(txt)

nstep_from_txt(txt)

pi

Float constant.

ps

Float constant (time unit).

thart

Float constant (time unit).

traj_from_txt(txt, shape[, axis, dtype, sep])

Used for 3d trajectories where the exact shape of the array as written by the MD code must be known, e.g. (nstep,N,3) where N=3 (cell, stress) or N=natoms (coords, forces, ...).
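Why the shape must be known can be seen from a pure-Python sketch: the text written by the MD code is just a flat stream of numbers, and only the shape tells where one time step ends and the next begins. reshape_flat_text() is a hypothetical helper that returns nested lists; it does not model the axis, dtype or sep parameters of traj_from_txt() and is not its implementation.

```python
def reshape_flat_text(txt, shape):
    """Hypothetical helper: nested lists of shape (nstep, N, 3) from text."""
    nstep, n, ncomp = shape
    values = [float(tok) for tok in txt.split()]
    if len(values) != nstep * n * ncomp:
        raise ValueError("text does not match shape %r" % (shape,))
    it = iter(values)
    # Consume the flat values in row-major order: step, then row, then column.
    return [[[next(it) for _ in range(ncomp)] for _ in range(n)]
            for _ in range(nstep)]


# Two steps of a 3x3 cell, written as made-up sample text.
txt = """
1 0 0
0 1 0
0 0 1
2 0 0
0 2 0
0 0 2
"""
cell = reshape_flat_text(txt, (2, 3, 3))
```

With shape (2, 3, 3), cell[0] is the cell of the first step and cell[1] that of the second; a wrong shape raises ValueError instead of silently misaligning the steps.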