psweep.psweep.run
- psweep.psweep.run(worker, params, df=None, poolsize=None, dask_client=None, save=True, tmpsave=False, verbose=False, calc_dir='calc', simulate=False, database_dir=None, database_basename='database.pk', backup=False, git=False, skip_dups=False, capture_logs=None)
Call worker for each pset in params. Populate a DataFrame with rows from each call worker(pset).
- Parameters:
worker (Callable) – must accept one parameter, pset (a dict such as {'a': 1, 'b': 'foo', ...}), and return either an update to pset or a new dict. The result is processed as pset.update(worker(pset)).
params (Sequence[dict]) – each dict is a pset, e.g. {'a': 1, 'b': 'foo', ...}
df (Optional[DataFrame]) – append rows to this DataFrame. If None, then either create a new one or read an existing database file from disk, if found.
poolsize (Optional[int]) –
None : use serial execution
int : use multiprocessing.Pool (even for poolsize=1)
dask_client – A dask client. Use this or poolsize.
save (bool) – save the final DataFrame to <database_dir>/<database_basename> (pickle format only), default: “calc/database.pk”, see also database_dir, calc_dir and database_basename
tmpsave (bool) – save the result dict from each pset.update(worker(pset)) to <calc_dir>/tmpsave/<run_id>/<pset_id>.pk (pickle format only); the data is a dict, not a DataFrame row
verbose (Union[bool, Sequence[str]]) –
bool : print the current DataFrame row
sequence : list of DataFrame column names, print the row but only those columns
calc_dir (str) – Dir where calculation artifacts can be saved if needed, such as dirs per pset <calc_dir>/<pset_id>. Will be added to the database in the _calc_dir field.
simulate (bool) – run everything in <calc_dir>.simulate, don’t call worker, i.e. save what the run would create, but without the results from worker. Useful to check if params are correct before starting a production run.
database_dir (Optional[str]) – Path for the database. Default is <calc_dir>.
database_basename (str) – <database_dir>/<database_basename>, default: “database.pk”
backup (bool) – Make backup of <calc_dir> to <calc_dir>.bak_<timestamp>_run_id_<run_id> where <run_id> is the latest _run_id present in df
git (bool) – Use git to commit all files written and changed by the current run (_run_id). Make sure to create a .gitignore manually before if needed.
skip_dups (bool) – Skip psets whose hash is already present in df. Useful when repeating (parts of) a study.
capture_logs (Optional[str]) – {‘db’, ‘file’, ‘db+file’, None} Redirect stdout and stderr generated in worker() to database (‘db’) column _logs, file <calc_dir>/<pset_id>/logs.txt, or both. If None then do nothing (default). Useful for capturing per-pset log text, e.g. print() calls in worker will be captured.
- Return type:
DataFrame
- Returns:
df – The database built from params.
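The merge rule described for worker, pset.update(worker(pset)), can be illustrated with a small standalone sketch. This is not psweep's implementation: run_sketch is a hypothetical stand-in that uses only the standard library, returns a list of dicts instead of a DataFrame, and omits the bookkeeping fields (such as _run_id and _calc_dir), saving, and parallel execution. Copying each pset before updating it is also an assumption made here for clarity.

```python
from typing import Callable, Sequence


def run_sketch(worker: Callable[[dict], dict], params: Sequence[dict]) -> list:
    """Hypothetical, simplified stand-in for the per-pset loop (serial only)."""
    rows = []
    for pset in params:
        row = dict(pset)         # copy, so the caller's pset stays unchanged (assumption)
        row.update(worker(row))  # the documented merge: pset.update(worker(pset))
        rows.append(row)
    return rows


def worker(pset: dict) -> dict:
    # Return only the new field; it is merged into the pset by the caller.
    return {"result": pset["a"] * 10}


params = [{"a": 1, "b": "foo"}, {"a": 2, "b": "bar"}]
rows = run_sketch(worker, params)

for row in rows:
    print(row)
```

Each resulting row contains the input parameters plus whatever worker returned. Going by the signature above, the corresponding real call would be psweep.psweep.run(worker, params), which returns a DataFrame and, with the default save=True, also writes the database file.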