psweep.psweep.run

psweep.psweep.run(worker, params, df=None, poolsize=None, dask_client=None, save=True, tmpsave=False, verbose=False, calc_dir='calc', simulate=False, database_dir=None, database_basename='database.pk', backup=False, git=False, skip_dups=False, capture_logs=None)

Call worker for each pset in params. Populate a DataFrame with rows from each call worker(pset).
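
The call pattern described above can be modeled in a few lines of plain Python. This is a minimal sketch of the documented semantics only, not psweep's implementation: run_sketch is a hypothetical name, rows are collected in a list rather than a DataFrame, and the bookkeeping fields mentioned in the parameter descriptions (such as _run_id and _calc_dir) are omitted.

```python
from copy import deepcopy


def run_sketch(worker, params):
    """Hypothetical stand-in: call worker(pset) for each pset and collect rows."""
    rows = []
    for pset in params:
        row = deepcopy(pset)      # leave the caller's pset untouched
        row.update(worker(row))   # documented merge step: pset.update(worker(pset))
        rows.append(row)
    return rows


def worker(pset):
    # Return only the new fields; they are merged into the pset.
    return {"result": pset["a"] * 10}


params = [{"a": 1}, {"a": 2}, {"a": 3}]
rows = run_sketch(worker, params)
```

Each entry of rows then holds the input parameters plus whatever worker returned, which is the shape of one DataFrame row in the real function.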

Parameters:
  • worker (Callable) – must accept one parameter, pset (a dict such as {'a': 1, 'b': 'foo', ...}), and return either an update to pset or a new dict. The result is processed as pset.update(worker(pset))

  • params (Sequence[dict]) – each dict is a pset {'a': 1, 'b': 'foo', ...}

  • df (Optional[DataFrame]) – append rows to this DataFrame. If None, read an existing database file from disk if one is found, otherwise create a new DataFrame

  • poolsize (Optional[int]) –

    • None : use serial execution

    • int : use multiprocessing.Pool (even for poolsize=1)

  • dask_client – A dask client. Use this or poolsize.

  • save (bool) – save final DataFrame to <database_dir>/<database_basename> (pickle format only), default: “calc/database.pk”, see also database_dir, calc_dir and database_basename

  • tmpsave (bool) – save the result dict of pset.update(worker(pset)) for each pset to <calc_dir>/tmpsave/<run_id>/<pset_id>.pk (pickle format only). The saved data is a dict, not a DataFrame row

  • verbose (Union[bool, Sequence[str]]) –

    • bool : print the current DataFrame row

    • sequence : list of DataFrame column names, print the row but only those columns

  • calc_dir (str) – Dir where calculation artifacts can be saved if needed, such as dirs per pset <calc_dir>/<pset_id>. Will be added to the database in _calc_dir field.

  • simulate (bool) – run everything in <calc_dir>.simulate without calling worker, i.e. save what the run would create but without the results from worker. Useful for checking that params are correct before starting a production run

  • database_dir (Optional[str]) – Path for the database. Default is <calc_dir>.

  • database_basename (str) – <database_dir>/<database_basename>, default: “database.pk”

  • backup (bool) – Make backup of <calc_dir> to <calc_dir>.bak_<timestamp>_run_id_<run_id> where <run_id> is the latest _run_id present in df

  • git (bool) – Use git to commit all files written and changed by the current run (_run_id). Create a .gitignore manually beforehand if needed.

  • skip_dups (bool) – Skip psets whose hash is already present in df. Useful when repeating (parts of) a study.

  • capture_logs (Optional[str]) – {‘db’, ‘file’, ‘db+file’, None} Redirect stdout and stderr generated in worker() to database (‘db’) column _logs, file <calc_dir>/<pset_id>/logs.txt, or both. If None then do nothing (default). Useful for capturing per-pset log text, e.g. print() calls in worker will be captured.
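
The redirection that capture_logs describes can be illustrated with the standard library. The following is a sketch under assumptions, not psweep's code: call_with_captured_logs and noisy_worker are hypothetical names, and storing the text under the _logs key only mimics what the 'db' option is documented to do.

```python
import contextlib
import io


def call_with_captured_logs(worker, pset):
    """Hypothetical helper: run worker(pset), return (result, captured text)."""
    buf = io.StringIO()
    # Send both stdout and stderr of the worker call to the same buffer.
    with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
        result = worker(pset)
    return result, buf.getvalue()


def noisy_worker(pset):
    print(f"processing a={pset['a']}")  # this text is captured, not shown
    return {"result": pset["a"] + 1}


pset = {"a": 1}
result, logs = call_with_captured_logs(noisy_worker, pset)
pset.update(result)
pset["_logs"] = logs  # analogous to the 'db' option and its _logs column
```

Writing logs to a file instead (the 'file' option) would amount to saving the same captured string under <calc_dir>/<pset_id>/logs.txt.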

Return type:

DataFrame

Returns:

df – The database built from params.
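
To summarize the file locations named in the parameter descriptions (save, tmpsave, capture_logs), the helper below assembles them from the documented defaults. It is an illustrative sketch: expected_paths is a hypothetical function, the IDs "run-0" and "pset-0" are placeholders rather than real psweep identifiers, and the simulate case is left out.

```python
import posixpath


def expected_paths(run_id, pset_id, calc_dir="calc", database_dir=None,
                   database_basename="database.pk"):
    """Hypothetical helper: build the paths named in the parameter docs."""
    # database_dir defaults to calc_dir, as stated for that parameter.
    database_dir = calc_dir if database_dir is None else database_dir
    return {
        # save=True: <database_dir>/<database_basename>
        "database": posixpath.join(database_dir, database_basename),
        # tmpsave=True: <calc_dir>/tmpsave/<run_id>/<pset_id>.pk
        "tmpsave": posixpath.join(calc_dir, "tmpsave", run_id, pset_id + ".pk"),
        # capture_logs='file': <calc_dir>/<pset_id>/logs.txt
        "logs": posixpath.join(calc_dir, pset_id, "logs.txt"),
    }


paths = expected_paths(run_id="run-0", pset_id="pset-0")
```

With the defaults, the database entry resolves to calc/database.pk, matching the default stated for save.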