This repository was archived by the owner on Jun 22, 2022. It is now read-only.
planning 2018.05.07
Kamil A. Kaczmarek edited this page May 14, 2018 · 17 revisions
- API design
- docstrings (all in `steps` dir) and readthedocs page
- pip package
- assertions
- Steps Use Cases
- desired logic of `Step` behavior and its interface
- What is a pipeline? (definition)
- What problem does Steps solve? Useful reading for this:
- new repo name: steps, ml-steps, vulcan
- How do we handle partitioning into training and test datasets? (just use separate data nodes?)
- Why do we have to define a caching directory for each and every transformer? Is there a better way to do it (Per project? Context handler? Something else?)
- Input has complicated notation:

      data = {
          'input': {
              'X': X_train,
              'y': y_train,
          }
      }

- https://github.com/minerva-ml/steps/issues/1
- https://github.com/minerva-ml/steps/issues/16
- https://github.com/minerva-ml/steps/issues/24
- https://github.com/minerva-ml/steps/issues/28
- https://github.com/minerva-ml/steps/issues/30
- https://github.com/minerva-ml/steps/issues/31
Installation: remove graphviz from the project -> use plotly (refactor). steps-core: only a few dependencies.
API design: nested dicts in input -> simplify interface -> DataStep should merge input_step and input_data into one API piece.
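
To make the proposed simplification concrete, here is a minimal sketch of merging raw data and upstream step outputs into one flat dictionary. It is not the DataStep implementation; `merge_inputs` and the sample step name `scaler` are hypothetical, chosen only to illustrate the idea.

```python
def merge_inputs(raw_data, upstream_outputs):
    """Merge raw data and upstream step outputs into one flat dict.

    Raises on key collisions so that no value is silently overwritten.
    """
    merged = {}
    sources = list(raw_data.items()) + list(upstream_outputs.items())
    for source_name, values in sources:
        for key, value in values.items():
            if key in merged:
                raise KeyError(
                    f"'{key}' is provided by more than one source "
                    f"(last: {source_name})"
                )
            merged[key] = value
    return merged


# Nested form, as in the notation discussed in these notes.
raw_data = {"input": {"X": [[0.0], [1.0]], "y": [0, 1]}}
# Hypothetical output of an upstream step named "scaler".
upstream_outputs = {"scaler": {"X_scaled": [[-1.0], [1.0]]}}

flat = merge_inputs(raw_data, upstream_outputs)
print(sorted(flat))  # ['X', 'X_scaled', 'y']
```

Failing loudly on duplicate keys is one design choice; an alternative would be to prefix keys with the source name, at the cost of a less flat interface.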
Write down the difference and relation between Step and transformer (part of Step).
You add an input that is never used -> you see it on the graph.
cache_transformer -> persisted_transformer
Do Not transform twice!!!
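
One reading of the two notes above is that a step's output should be persisted so that a second call does not recompute it. The sketch below assumes that reading; `PersistedOutput` is a hypothetical wrapper, not a Steps class, and it keys the saved result by name only, ignoring the input data.

```python
import pickle
import tempfile
from pathlib import Path


class PersistedOutput:
    """Wrap a transform function and save its result to disk on first call."""

    def __init__(self, name, transform_fn, cache_dir):
        self.transform_fn = transform_fn
        self.path = Path(cache_dir) / f"{name}.pkl"
        self.calls = 0  # counts real computations, for demonstration only

    def transform(self, data):
        # Reuse the persisted result if one exists.
        if self.path.exists():
            with self.path.open("rb") as f:
                return pickle.load(f)
        self.calls += 1
        result = self.transform_fn(data)
        with self.path.open("wb") as f:
            pickle.dump(result, f)
        return result


with tempfile.TemporaryDirectory() as tmp:
    doubler = PersistedOutput("doubler", lambda xs: [2 * x for x in xs], tmp)
    first = doubler.transform([1, 2, 3])
    second = doubler.transform([1, 2, 3])  # loaded from disk, not recomputed

print(first, second, doubler.calls)  # [2, 4, 6] [2, 4, 6] 1
```

Because the saved result is looked up by name alone, changing the input without clearing the directory would return stale output; a real design would need an invalidation rule.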
Question: add a single step as input to ensembling?