psweep.psweep.run
- psweep.psweep.run(func, params, df=None, poolsize=None, dask_client=None, save=True, tmpsave=False, verbose=False, calc_dir='calc', simulate=False, database_basename='database.pk', backup=False, git=False, skip_dups=False, capture_logs=None, fill_value=<NA>)
Call func for each pset in params. Populate a DataFrame with rows from each call
func(pset).
- Parameters:
func (Callable) – must accept one parameter: pset (a dict {'a': 1, 'b': 'foo', ...}) and return either an update to pset or a new dict; the result is processed as pset.update(func(pset))
params (Sequence[dict]) – each dict is a pset {'a': 1, 'b': 'foo', ...}
df (DataFrame) – append rows to this DataFrame; if None, either create a new one or read an existing database file from disk if found
poolsize (int) –
  None : use serial execution
  int : use multiprocessing.Pool (even for poolsize=1)
dask_client – A dask client. Use this or poolsize.
save (bool) – save the final DataFrame to <calc_dir>/<database_basename> (pickle format only), default: “calc/database.pk”, see also calc_dir and database_basename
tmpsave (bool) – save the result dict of pset.update(func(pset)) for each pset to <calc_dir>/tmpsave/<run_id>/<pset_id>.pk (pickle format only); the data is a dict, not a DataFrame row
verbose (Union[bool, Sequence[str]]) –
  bool : print the current DataFrame row
  sequence : list of DataFrame column names, print the row but only those columns
calc_dir (str) – Dir where calculation artifacts can be saved if needed, such as dirs per pset <calc_dir>/<pset_id>. Will be added to the database in the _calc_dir field.
simulate (bool) – run everything in <calc_dir>.simulate, don’t call func, i.e. save what the run would create, but without the results from func; useful to check if params are correct before starting a production run
database_basename (str) – <calc_dir>/<database_basename>, default: “database.pk”
backup (bool) – Make backup of <calc_dir> to <calc_dir>.bak_<timestamp>_run_id_<run_id> where <run_id> is the latest _run_id present in df
git (bool) – Use git to commit all files written and changed by the current run (_run_id). Make sure to create a .gitignore manually before if needed.
skip_dups (bool) – Skip psets whose hash is already present in df. Useful when repeating (parts of) a study.
capture_logs (str) – {‘db’, ‘file’, ‘db+file’, None} Redirect stdout and stderr generated in func() to database (‘db’) column _logs, file <calc_dir>/<pset_id>/logs.txt, or both. If None then do nothing (default). Useful for capturing per-pset log text, e.g. print() calls in func will be captured.
fill_value – NA value used for missing values in the database DataFrame.
- Return type:
DataFrame
- Returns:
df – The database built from params.
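
The merge rule described for func, where each result is folded into its pset and becomes one row, can be illustrated with a small standard-library sketch. This is not psweep's implementation: run_sketch is a hypothetical stand-in that returns a plain list of dicts instead of a DataFrame, and it omits the bookkeeping fields (such as _run_id and _calc_dir) as well as saving, pooling, and logging.

```python
from typing import Callable, List, Sequence


def run_sketch(func: Callable[[dict], dict], params: Sequence[dict]) -> List[dict]:
    """Hypothetical stand-in for the core loop; NOT psweep's actual code."""
    rows = []
    for pset in params:
        row = dict(pset)        # copy so the caller's pset stays untouched
        row.update(func(pset))  # fold func's result into the pset
        rows.append(row)        # one row per pset
    return rows


def func(pset: dict) -> dict:
    # Return only the new fields; they are merged into the pset.
    return {"result": pset["a"] * 10}


# Each dict is one pset, as in the params argument.
params = [{"a": 1}, {"a": 2}, {"a": 3}]
rows = run_sketch(func, params)
print(rows)
```

Because func may return either an update or a complete new dict, keys it returns overwrite same-named keys in the pset, which is the usual dict.update behaviour.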