psweep.psweep module#
- exception psweep.psweep.PsweepHashError[source]#
Bases: TypeError
- add_note()#
Exception.add_note(note) – add a note to the exception
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- psweep.psweep.capture_logs_wrapper(pset, func, capture_logs, db_field='_logs')[source]#
Capture and redirect stdout and stderr produced in func().
Note the limitations mentioned in [1]:
Note that the global side effect on sys.stdout means that this context manager is not suitable for use in library code and most threaded applications. It also has no effect on the output of subprocesses. However, it is still a useful approach for many utility scripts.
So if users rely on playing with sys.stdout/stderr in func(), then they should not use this feature and take care of logging themselves.
[1] https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout
- Return type:
dict
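The capture mechanism described above can be sketched with `contextlib.redirect_stdout`/`redirect_stderr` directly. The helper below mirrors the documented `db_field="_logs"` default, but its body is an illustrative assumption, not psweep's actual implementation:

```python
import contextlib
import io

def capture_logs_sketch(pset, func, db_field="_logs"):
    # Redirect stdout/stderr produced by func(pset) into a buffer,
    # then store the captured text under db_field in the result dict.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
        result = func(pset)
    result[db_field] = buf.getvalue()
    return result

def my_func(pset):
    print("hello from func")  # goes into the captured buffer, not the terminal
    return {"result_": pset["a"] * 2}

out = capture_logs_sketch({"a": 1}, my_func)
```

As noted above, this has no effect on subprocess output and is not thread-safe, since `sys.stdout` is swapped globally.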
- psweep.psweep.check_calc_dir(calc_dir, df)[source]#
Check calc dir for consistency with database.
Assuming dirs are named
<calc_dir>/<pset_id1>
<calc_dir>/<pset_id2>
...
check if we have dirs matching the _pset_id values in the database.
- psweep.psweep.df_ensure_dtypes(df, fill_value=<NA>)[source]#
Make sure that df's dtype is object. Convert any pd.isna() values to fill_value.
This is part of our attempt to prevent pandas from doing type inference and conversion.
- psweep.psweep.df_extract_dicts(df, py_types=False)[source]#
Convert df’s rows to dicts.
- Parameters:
df (DataFrame)
py_types (bool) – If True, let Pandas (Series.to_dict()) decide types. It tries to return Python native types (e.g. it converts pd.NA to None). Else, try to preserve types as they are in df.
- Return type:
Sequence[dict]
- psweep.psweep.df_extract_params(df, py_types=False)[source]#
Extract params (list of psets) from df.
Same as df_extract_dicts(), but limit columns to kind="pset" (see filter_cols()). This will reproduce the params fed to run() when following the prefix/postfix convention (see _get_col_filter()), meaning that the pset hashes will be the same.
- Parameters:
df (DataFrame)
py_types (bool) – See df_extract_dicts()
- Return type:
Sequence[dict]
Examples
>>> import psweep as ps
>>> from numpy.random import rand
>>> params = ps.pgrid(ps.plist("a", [1,2,3]), ps.plist("b", [77,88]))
>>> params
[{'a': 1, 'b': 77}, {'a': 1, 'b': 88}, {'a': 2, 'b': 77}, {'a': 2, 'b': 88}, {'a': 3, 'b': 77}, {'a': 3, 'b': 88}]
>>> df = ps.run(func=lambda pset: dict(result_=rand()), params=params, save=False)
>>> ps.df_extract_params(df)
[{'a': 1, 'b': 77}, {'a': 1, 'b': 88}, {'a': 2, 'b': 77}, {'a': 2, 'b': 88}, {'a': 3, 'b': 77}, {'a': 3, 'b': 88}]
- psweep.psweep.df_extract_pset(df, pset_id, py_types=False)[source]#
Extract a single pset dict for pset_id from df.
This is a convenience function doing just
>>> df_extract_row(df, pset_id=pset_id, py_types=py_types, kind="pset")
- Parameters:
df (DataFrame)
pset_id (str)
py_types (bool) – See df_extract_dicts()
- Return type:
dict
- psweep.psweep.df_extract_row(df, pset_id, kind=None, py_types=False)[source]#
Extract a single row dict for pset_id from df.
When kind is given, limit columns to this (see filter_cols()).
- Parameters:
df (DataFrame)
pset_id (str)
py_types (bool) – See df_extract_dicts()
kind (str) – See filter_cols()
- Return type:
dict
- psweep.psweep.df_filter_conds(df, conds, op='and')[source]#
Filter DataFrame using bool arrays/Series/DataFrames in conds.
Fuse all bool sequences in conds using op. For instance, if op="and", then we logical-and them, which is equal to
>>> df[conds[0] & conds[1] & conds[2] & ...]
but conds can be programmatically generated while the expression above would need to be changed by hand if conds changes.
- Parameters:
df (DataFrame) – DataFrame
conds (Sequence[Sequence[bool]]) – Sequence of bool masks, each of length len(df).
op (str) – Bool operator, used as numpy.logical_{op}, e.g. "and", "or", "xor".
- Return type:
DataFrame
Examples
>>> df = pd.DataFrame({'a': arange(10), 'b': arange(10)+4})
>>> c1 = df.a > 3
>>> c2 = df.b < 9
>>> c3 = df.a % 2 == 0
>>> df[c1 & c2 & c3]
   a  b
4  4  8
>>> ps.df_filter_conds(df, [c1, c2, c3])
   a  b
4  4  8
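The programmatic fusing described above reduces to a `functools.reduce` over `numpy.logical_{op}`; a minimal sketch in plain pandas/numpy (not psweep internals):

```python
import functools

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) + 4})
conds = [df.a > 3, df.b < 9, df.a % 2 == 0]

# Fuse all masks with the named numpy logical op, then index the frame.
# conds can grow or shrink without changing this expression.
op = "and"
mask = functools.reduce(getattr(np, f"logical_{op}"), conds)
sub = df[mask]
```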
- psweep.psweep.df_print(df, index=False, special_cols=None, prefix_cols=False, cols=[], skip_cols=[])[source]#
Print DataFrame, by default without the index and prefix columns such as _pset_id.
Similar logic as in bin/psweep-db2table, w/o tabulate support but more features (skip_cols for instance).
Column names are always sorted, so the order of names in e.g. cols doesn’t matter.
- Parameters:
df (DataFrame)
index (bool) – include DataFrame index
prefix_cols (bool) – include all prefix columns (_pset_id etc.); we don't support skipping user-added postfix columns (e.g. result_)
cols (Sequence[str]) – explicit sequence of columns, overrides prefix_cols when prefix columns are specified
skip_cols (Sequence[str]) – skip those columns instead of selecting them (like cols would), use either this or cols; overrides prefix_cols when prefix columns are specified
Examples
>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=rand(3), b=rand(3), _c=rand(3)))
>>> df
          a         b        _c
0  0.373534  0.304302  0.161799
1  0.698738  0.589642  0.557172
2  0.343316  0.186595  0.822023
>>> ps.df_print(df)
       a        b
0.373534 0.304302
0.698738 0.589642
0.343316 0.186595
>>> ps.df_print(df, prefix_cols=True)
       a        b       _c
0.373534 0.304302 0.161799
0.698738 0.589642 0.557172
0.343316 0.186595 0.822023
>>> ps.df_print(df, index=True)
          a         b
0  0.373534  0.304302
1  0.698738  0.589642
2  0.343316  0.186595
>>> ps.df_print(df, cols=["a"])
       a
0.373534
0.698738
0.343316
>>> ps.df_print(df, cols=["a"], prefix_cols=True)
       a       _c
0.373534 0.161799
0.698738 0.557172
0.343316 0.822023
>>> ps.df_print(df, cols=["a", "_c"])
       a       _c
0.373534 0.161799
0.698738 0.557172
0.343316 0.822023
>>> ps.df_print(df, skip_cols=["a"])
       b
0.304302
0.589642
0.186595
- psweep.psweep.df_read(fn, fmt='pickle', **kwds)[source]#
Read DataFrame from file fn. See df_write().
- psweep.psweep.df_to_json(df, **kwds)[source]#
Like df.to_json but with defaults for orient, date_unit, date_format, double_precision.
- Parameters:
df (DataFrame) – DataFrame to convert
kwds – passed to df.to_json()
- Return type:
str
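The corresponding plain pandas call looks like the sketch below. The orient and double_precision values shown here are illustrative, not necessarily psweep's actual defaults:

```python
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# Plain pandas to_json with explicit keyword choices; df_to_json
# presumably fills in defaults like these so callers don't have to.
s = df.to_json(orient="records", double_precision=15)
records = json.loads(s)
```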
- psweep.psweep.df_update_pset_cols(df, pset_cols, fill_value=<NA>, copy=False)[source]#
Make sure that df has at least pset_cols columns. If not, add missing columns, filled with fill_value. Always refresh _pset_hash.
- Return type:
DataFrame
- psweep.psweep.df_update_pset_hash(df, copy=False)[source]#
Add or update the _pset_hash column.
- Return type:
DataFrame
- psweep.psweep.df_write(fn, df, fmt='pickle', **kwds)[source]#
Write DataFrame to disk.
- Parameters:
fn (str) – filename
df (DataFrame) – DataFrame to write
fmt – {'pickle', 'json'}
kwds – passed to pickle.dump() or df_to_json()
- Return type:
None
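For fmt='pickle', a write/read roundtrip presumably boils down to pickling the DataFrame; a minimal sketch using stdlib pickle directly (not the psweep functions themselves):

```python
import os
import pickle
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
fn = os.path.join(tempfile.mkdtemp(), "database.pk")

# write the DataFrame as a pickle file ...
with open(fn, "wb") as fh:
    pickle.dump(df, fh)

# ... and read it back
with open(fn, "rb") as fh:
    df2 = pickle.load(fh)
```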
- psweep.psweep.filter_cols(cols, kind='pset')[source]#
Filter database field names (“columns”) by type.
Here we use the default package-wide naming convention (see _get_col_filter()).
- Parameters:
kind (str) –
pre, prefix: things like _pset_id
post, postfix: results like result_
pset: neither of the above, params like a, b
- Return type:
Sequence[str]
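Based on the convention above (prefix cols such as _pset_id start with an underscore, postfix cols such as result_ end with one), the filtering can be sketched as below. This is a hypothetical reimplementation for illustration, not psweep's code:

```python
def filter_cols_sketch(cols, kind="pset"):
    # prefix cols (book-keeping) start with "_", postfix cols (results) end with "_"
    if kind in ("pre", "prefix"):
        return [c for c in cols if c.startswith("_")]
    if kind in ("post", "postfix"):
        return [c for c in cols if c.endswith("_")]
    # "pset": everything else, i.e. the actual parameters
    return [c for c in cols if not c.startswith("_") and not c.endswith("_")]

cols = ["_pset_id", "a", "b", "result_"]
```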
- psweep.psweep.filter_params_dup_hash(params, hashes)[source]#
Return params with psets whose hash is not in hashes.
Use pset["_pset_hash"] if present, else calculate hash on the fly.
- Parameters:
params (Sequence[dict])
hashes (Sequence[str])
- Return type:
Sequence[dict]
- psweep.psweep.filter_params_unique(params)[source]#
Reduce params to unique psets.
Use pset["_pset_hash"] if present, else calculate hash on the fly.
- Parameters:
params (Sequence[dict])
- Return type:
Sequence[dict]
- psweep.psweep.flatten_dict(dct, join_str='_')[source]#
Flatten nested dict.
Will string-convert keys and join using join_str.
- Return type:
dict
Examples
>>> ps.flatten_dict(dict(a=1, b=dict(c=2, d={23: 42})))
{'a': 1, 'b_c': 2, 'b_d_23': 42}
- psweep.psweep.func_wrapper(pset, func, *, tmpsave=False, verbose=False, simulate=False)[source]#
Add those prefix fields (e.g. _time_utc) to pset which can be determined at call time.
Call func on exactly one pset. Return the updated pset built from pset.update(func(pset)). Do verbose printing.
- Return type:
dict
- psweep.psweep.intspace(*args, dtype=<class 'numpy.int64'>, **kwds)[source]#
Like np.linspace but round to integers.
The length of the returned array may be lower than specified by num if rounding to ints results in duplicates.
- Parameters:
*args – Same as np.linspace
**kwds – Same as np.linspace
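The documented behavior (linspace, round to ints, drop duplicates created by rounding) can be sketched as follows. The exact rounding rule psweep uses is an assumption here; np.round uses round-half-to-even:

```python
import numpy as np

def intspace_sketch(*args, dtype=np.int64, **kwds):
    # linspace, round to ints, then drop duplicates created by the rounding,
    # so the result may be shorter than num
    return np.unique(np.round(np.linspace(*args, **kwds)).astype(dtype))

x = intspace_sketch(0, 4, num=9)  # rounding collapses 9 points to 5 unique ints
```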
- psweep.psweep.itr(func)[source]#
Wrap func to allow passing args not as sequence.
Assuming func() requires a sequence as input: func([a,b,c]), allow passing func(a,b,c).
- Return type:
Callable
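The wrapping can be sketched as below; the single-argument heuristic is an illustrative assumption, not necessarily how psweep distinguishes the two call styles:

```python
def itr_sketch(func):
    # Wrap func so that both func([a, b, c]) and func(a, b, c) work,
    # assuming func itself expects a single sequence argument.
    def wrapper(*args):
        # a single argument is assumed to already be the sequence
        return func(args[0] if len(args) == 1 else list(args))
    return wrapper

wrapped_sum = itr_sketch(sum)
```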
- psweep.psweep.itr2params(loops)[source]#
Transform the (possibly nested) result of a loop over plists (or whatever has been used to create psets) to a proper list of psets by flattening and merging dicts.
Examples
>>> a = ps.plist('a', [1,2])
>>> b = ps.plist('b', [77,88])
>>> c = ps.plist('c', ['const'])
>>> # result of loops
>>> list(itertools.product(a,b,c))
[({'a': 1}, {'b': 77}, {'c': 'const'}), ({'a': 1}, {'b': 88}, {'c': 'const'}), ({'a': 2}, {'b': 77}, {'c': 'const'}), ({'a': 2}, {'b': 88}, {'c': 'const'})]
>>> # flatten into list of psets
>>> ps.itr2params(itertools.product(a,b,c))
[{'a': 1, 'b': 77, 'c': 'const'}, {'a': 1, 'b': 88, 'c': 'const'}, {'a': 2, 'b': 77, 'c': 'const'}, {'a': 2, 'b': 88, 'c': 'const'}]
>>> # also more nested stuff is no problem
>>> list(itertools.product(zip(a,b),c))
[(({'a': 1}, {'b': 77}), {'c': 'const'}), (({'a': 2}, {'b': 88}), {'c': 'const'})]
>>> ps.itr2params(itertools.product(zip(a,b),c))
[{'a': 1, 'b': 77, 'c': 'const'}, {'a': 2, 'b': 88, 'c': 'const'}]
Notes
When merging dicts, we don't allow dicts to have the same keys, as in
>>> [{'a': 1, 'b': 23}, {'a': 77, 'c': 'const'}]
since this would lead to unexpected results, for example when people use pgrid() to merge together params of previous studies to create params for a new study.
- psweep.psweep.logspace(start, stop, num=50, offset=0, log_func=<ufunc 'log10'>, **kwds)[source]#
Like numpy.logspace but:
- start and stop are not exponents but the actual bounds
- tuneable log scale strength
Control the strength of the log scale by offset, where we use by default log_func=np.log10 and base=10 and return np.logspace(np.log10(start + offset), np.log10(stop + offset)) - offset. offset=0 is equal to np.logspace(np.log10(start), np.log10(stop)). Higher offset values result in more evenly spaced points.
- Parameters:
start – same as in np.logspace
stop – same as in np.logspace
num – same as in np.logspace
**kwds – same as in np.logspace
offset – Control strength of log scale.
log_func (Callable) – Must match base (pass that as part of **kwds). Default is base=10 as in np.logspace and so log_func=np.log10. If you want a different base, also provide a matching log_func, e.g. base=e, log_func=np.log.
Examples
Effect of different offset values:
>>> from matplotlib import pyplot as plt
>>> from psweep import logspace
>>> import numpy as np
>>> for ii, offset in enumerate([1e-16, 1e-3, 1, 2, 3]):
...     x = logspace(0, 2, 20, offset=offset)
...     plt.plot(x, np.ones_like(x)*ii, "o-", label=f"{offset=}")
>>> plt.legend()
- psweep.psweep.makedirs(path)[source]#
Create path recursively, no questions asked.
- Return type:
None
- psweep.psweep.merge_dicts(args, *, allow_dup_keys=True)[source]#
Start with an empty dict and update with each arg dict left-to-right.
- Parameters:
args (Sequence[dict]) – dicts to merge
allow_dup_keys – Whether to allow later dicts to overwrite entries of former ones.
- Return type:
dict
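The documented behavior can be sketched in a few lines of plain Python; this is an illustrative reimplementation, not psweep's code:

```python
def merge_dicts_sketch(args, *, allow_dup_keys=True):
    # start with an empty dict, update with each arg left-to-right;
    # optionally refuse key collisions instead of overwriting
    out = {}
    for dct in args:
        if not allow_dup_keys and set(out) & set(dct):
            raise ValueError(f"duplicate keys: {set(out) & set(dct)}")
        out.update(dct)
    return out

merged = merge_dicts_sketch([{"a": 1}, {"b": 2}, {"a": 99}])  # later dicts win
```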
- psweep.psweep.pgrid(plists)[source]#
Convenience function for the most common loop: nested loops with itertools.product: ps.itr2params(itertools.product(a,b,c,...)).
- Parameters:
plists (Sequence[Sequence[dict]]) – List of plist() results. If more than one, you can also provide plists as args, so pgrid(a,b,c) instead of pgrid([a,b,c]).
- Return type:
Sequence[dict]
Examples
>>> a = ps.plist('a', [1,2])
>>> b = ps.plist('b', [77,88])
>>> c = ps.plist('c', ['const'])
>>> # same as pgrid([a,b,c])
>>> ps.pgrid(a,b,c)
[{'a': 1, 'b': 77, 'c': 'const'}, {'a': 1, 'b': 88, 'c': 'const'}, {'a': 2, 'b': 77, 'c': 'const'}, {'a': 2, 'b': 88, 'c': 'const'}]
>>> ps.pgrid(zip(a,b),c)
[{'a': 1, 'b': 77, 'c': 'const'}, {'a': 2, 'b': 88, 'c': 'const'}]
Notes
For a single plist arg, you have to use pgrid([a]); pgrid(a) won't work. However, this edge case (passing one plist to pgrid) is not super useful, since
>>> a = ps.plist("a", [1,2,3])
>>> a
[{'a': 1}, {'a': 2}, {'a': 3}]
>>> ps.pgrid([a])
[{'a': 1}, {'a': 2}, {'a': 3}]
When merging dicts in itr2params(), we don't allow dicts to have the same keys, as in
>>> [{'a': 1, 'b': 23}, {'a': 77, 'c': 'const'}]
since this would lead to unexpected results, for example when people use pgrid() to merge together params of previous studies to create params for a new study.
- psweep.psweep.plist(name, seq)[source]#
Create a list of single-item dicts holding the parameter name and a value.
>>> plist('a', [1,2,3])
[{'a': 1}, {'a': 2}, {'a': 3}]
- psweep.psweep.prep_batch(params, *, calc_templ_dir='templates/calc', machine_templ_dir='templates/machines', git=False, write_pset=False, template_mode='jinja', **kwds)[source]#
Write files based on templates.
- Parameters:
params (Sequence[dict]) – See run()
calc_templ_dir (str) – Dir with calculation templates.
machine_templ_dir (str) – Dir with machine templates.
git (bool) – Use git to commit local changes.
write_pset (bool) – Write the input pset to <calc_dir>/<pset_id>/pset.pk.
template_mode (str) – 'dollar' or 'jinja'
**kwds – Passed to run().
- Returns:
The database built from params.
- Return type:
DataFrame
- psweep.psweep.pset_hash(dct, method='sha1', raise_error=True, **kwds)[source]#
Reproducible hash of a dict for usage in database (hash of a pset).
We implement the convention to ignore prefix fields (book-keeping) and postfix fields (results). You can pass skip_prefix_cols / skip_postfix_cols to change that (see _get_col_filter()).
- psweep.psweep.run(func, params, df=None, poolsize=None, dask_client=None, save=True, tmpsave=False, verbose=False, calc_dir='calc', simulate=False, database_basename='database.pk', backup=False, git=False, skip_dups=False, capture_logs=None, fill_value=<NA>)[source]#
Call func for each pset in params. Populate a DataFrame with rows from each call func(pset).
- Parameters:
func (Callable) – must accept one parameter: pset (a dict {'a': 1, 'b': 'foo', ...}), return either an update to pset or a new dict; the result will be processed as pset.update(func(pset))
params (Sequence[dict]) – each dict is a pset {'a': 1, 'b': 'foo', ...}
df (DataFrame) – append rows to this DataFrame, if None then either create a new one or read an existing database file from disk if found
poolsize (int) –
None : use serial execution
int : use multiprocessing.Pool (even for poolsize=1)
dask_client – A dask client. Use this or poolsize.
save (bool) – save the final DataFrame to <calc_dir>/<database_basename> (pickle format only), default: "calc/database.pk", see also calc_dir and database_basename
tmpsave (bool) – save the result dict from each pset.update(func(pset)) for each pset to <calc_dir>/tmpsave/<run_id>/<pset_id>.pk (pickle format only); the data is a dict, not a DataFrame row
verbose (Union[bool, Sequence[str]]) –
bool : print the current DataFrame row
sequence : list of DataFrame column names, print the row but only those columns
calc_dir (str) – Dir where calculation artifacts can be saved if needed, such as dirs per pset <calc_dir>/<pset_id>. Will be added to the database in the _calc_dir field.
simulate (bool) – run everything in <calc_dir>.simulate, don't call func, i.e. save what the run would create, but without the results from func; useful to check if params are correct before starting a production run
database_basename (str) – <calc_dir>/<database_basename>, default: "database.pk"
backup (bool) – Make a backup of <calc_dir> to <calc_dir>.bak_<timestamp>_run_id_<run_id> where <run_id> is the latest _run_id present in df
git (bool) – Use git to commit all files written and changed by the current run (_run_id). Make sure to create a .gitignore manually before if needed.
skip_dups (bool) – Skip psets whose hash is already present in df. Useful when repeating (parts of) a study.
capture_logs (str) – {'db', 'file', 'db+file', None} – Redirect stdout and stderr generated in func() to the database column _logs ('db'), the file <calc_dir>/<pset_id>/logs.txt ('file'), or both. If None then do nothing (default). Useful for capturing per-pset log text, e.g. print() calls in func will be captured.
fill_value – NA value used for missing values in the database DataFrame.
- Returns:
The database built from params.
- Return type:
DataFrame
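The core of the serial code path can be sketched as a plain loop in pandas; this toy version ignores all book-keeping fields (_pset_id, _run_id, hashes, timestamps) and exists only to illustrate the pset.update(func(pset)) contract:

```python
import pandas as pd

def toy_run(func, params):
    # stripped-down sketch of run()'s core serial loop: call func on each
    # pset, merge the result via pset.update(func(pset)), collect rows
    rows = []
    for pset in params:
        row = dict(pset)        # don't mutate the caller's pset
        row.update(func(row))   # result processed as pset.update(func(pset))
        rows.append(row)
    return pd.DataFrame(rows)

df = toy_run(lambda pset: {"result_": pset["a"] ** 2}, [{"a": 1}, {"a": 2}, {"a": 3}])
```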
- psweep.psweep.stargrid(const, vary, vary_labels=None, vary_label_col='_vary', skip_dups=True)[source]#
Helper to create a specific param sampling pattern.
Vary params in a “star” pattern (and not a full pgrid) around constant values (middle of the “star”).
- Parameters:
const (dict) – constant params
vary (Sequence[Sequence[dict]]) – list of plists
vary_labels (Sequence[str]) – database col names for parameters in vary
skip_dups – filter duplicate psets (see Notes below)
- Return type:
Sequence[dict]
Notes
skip_dups: When creating a star pattern, duplicate psets can occur. By default we try to filter them out (using filter_params_unique()) but ignore hash calculation errors and return non-reduced params in that case.
Examples
>>> import psweep as ps
>>> const = dict(a=1, b=77, c=11)
>>> a = ps.plist("a", [1,2,3,4])
>>> b = ps.plist("b", [77,88,99])
>>> c = ps.plist("c", [11,22,33,44])
>>> ps.stargrid(const, vary=[a, b])
[{'a': 1, 'b': 77, 'c': 11}, {'a': 2, 'b': 77, 'c': 11}, {'a': 3, 'b': 77, 'c': 11}, {'a': 4, 'b': 77, 'c': 11}, {'a': 1, 'b': 88, 'c': 11}, {'a': 1, 'b': 99, 'c': 11}]
>>> ps.stargrid(const, vary=[a, b], skip_dups=False)
[{'a': 1, 'b': 77, 'c': 11}, {'a': 2, 'b': 77, 'c': 11}, {'a': 3, 'b': 77, 'c': 11}, {'a': 4, 'b': 77, 'c': 11}, {'a': 1, 'b': 77, 'c': 11}, {'a': 1, 'b': 88, 'c': 11}, {'a': 1, 'b': 99, 'c': 11}]
>>> ps.stargrid(const, vary=[a, b], vary_labels=["a", "b"])
[{'a': 1, 'b': 77, 'c': 11, '_vary': 'a'}, {'a': 2, 'b': 77, 'c': 11, '_vary': 'a'}, {'a': 3, 'b': 77, 'c': 11, '_vary': 'a'}, {'a': 4, 'b': 77, 'c': 11, '_vary': 'a'}, {'a': 1, 'b': 88, 'c': 11, '_vary': 'b'}, {'a': 1, 'b': 99, 'c': 11, '_vary': 'b'}]
>>> ps.stargrid(const, vary=[ps.itr2params(zip(a,c)), b], vary_labels=["a+c", "b"])
[{'a': 1, 'b': 77, 'c': 11, '_vary': 'a+c'}, {'a': 2, 'b': 77, 'c': 22, '_vary': 'a+c'}, {'a': 3, 'b': 77, 'c': 33, '_vary': 'a+c'}, {'a': 4, 'b': 77, 'c': 44, '_vary': 'a+c'}, {'a': 1, 'b': 88, 'c': 11, '_vary': 'b'}, {'a': 1, 'b': 99, 'c': 11, '_vary': 'b'}]
>>> ps.stargrid(const, vary=[ps.pgrid([zip(a,c)]), b], vary_labels=["a+c", "b"])
[{'a': 1, 'b': 77, 'c': 11, '_vary': 'a+c'}, {'a': 2, 'b': 77, 'c': 22, '_vary': 'a+c'}, {'a': 3, 'b': 77, 'c': 33, '_vary': 'a+c'}, {'a': 4, 'b': 77, 'c': 44, '_vary': 'a+c'}, {'a': 1, 'b': 88, 'c': 11, '_vary': 'b'}, {'a': 1, 'b': 99, 'c': 11, '_vary': 'b'}]