This page describes the different config files used in ProLoaF.
ProLoaF’s configuration file is written using JSON. As such, whitespace is allowed and ignored in your syntax.
We mostly use strings, booleans, and null to specify the paths and parameters of our targeted forecasting project.
Numbers and booleans should be unquoted.
For better readability, the examples below assume that your working directory is set to the main project path of the cloned repository. There is also a template config file available in the targets folder.
For this description, we split the config into several thematic parts and discuss all available options.
Through the config file, the user specifies the data source location and the directories for logging, for exporting performance analyses and, most importantly, for the trained RNN model binary.
Path Specs:
"data_path": "./data/<FILE-NAME>.csv",
"evaluation_path": "./oracles/eval_<MODEL-NAME>/",
"output_path": "./oracles/",
"exploration_path": "./targets/opsd/tuning.json",
"log_path": "./logs/"
The output, exploration and log paths may stay unchanged, but the data path and the evaluation path must be specified.
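As a minimal sketch of the rule above, the following Python snippet checks that the two mandatory paths are set. The helper missing_required_paths and the tuple REQUIRED_PATH_KEYS are made up for this illustration and are not part of ProLoaF. Treating a null value like a missing key is also an assumption.

```python
import json

# Keys that this page says must always be specified.
# Assumption: a missing key and a null value both count as "not specified".
REQUIRED_PATH_KEYS = ("data_path", "evaluation_path")


def missing_required_paths(config: dict) -> list:
    """Return the required path keys that are absent or null in config."""
    return [key for key in REQUIRED_PATH_KEYS if config.get(key) is None]


# The config text would normally be read from a config.json file.
config = json.loads("""
{
    "data_path": "./data/<FILE-NAME>.csv",
    "evaluation_path": null,
    "output_path": "./oracles/",
    "log_path": "./logs/"
}
""")
print(missing_required_paths(config))  # ['evaluation_path']
```

A check like this could run right after loading the config, so that a missing path is reported before any training starts.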
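The evaluation path is the directory to which the performance analyses are exported when evaluation.py is run.
If exploration = false (see below), the exploration path can be omitted (or set to null).

These settings describe the data and how it should be used.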
Data Specs
"start_date": null,
"history_horizon": 147,
"forecast_horizon": 24,
"frequency": "1h",
"train_split": 0.6,
"validation_split": 0.8,
"periodicity": 24,
"feature_groups": [
{
"name": "main",
"scaler": [
"minmax",
0.0,
1.0
],
"features": [
"AT_load_actual_entsoe_transparency",
"DE_load_actual_entsoe_transparency",
"DE_temperature",
"DE_radiation_direct_horizontal",
"DE_radiation_diffuse_horizontal"
]
},
{
"name": "aux",
"scaler": null,
"features": [
"hour_sin",
"weekday_sin",
"mnth_sin"
]
}
],
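The following Python sketch illustrates one plausible reading of the ratio-based splits in the example above: rows before train_split are training data, rows between train_split and validation_split are validation data, and the remainder is evaluation data. The function split_bounds is hypothetical, and ProLoaF's own splitting code may differ in detail, for example in how it rounds.

```python
def split_bounds(n_rows: int, train_split: float, validation_split: float):
    """Return (train_end, validation_end) row indices for ratio-based splits.

    Assumed reading: rows [0, train_end) are training data,
    rows [train_end, validation_end) are validation data,
    and rows [validation_end, n_rows) are evaluation data.
    """
    if not 0.0 < train_split < validation_split <= 1.0:
        raise ValueError("expected 0 < train_split < validation_split <= 1")
    return int(n_rows * train_split), int(n_rows * validation_split)


# Ratios taken from the example config: 0.6 and 0.8.
train_end, validation_end = split_bounds(1000, train_split=0.6, validation_split=0.8)
print(train_end, validation_end)  # 600 800
```

With 1000 rows, this reading gives 600 training rows, 200 validation rows and 200 evaluation rows.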
"1h"
for quarter-hourly "15min"
.validation_split
. Gives the ratio of the whole dataset is to be used for training as opposed to validation and evaluataion. It is also possible to specify the split by time in the iso 8601 (YYYY-MM-DD HH:MM:SS
) format.train_split
. Defines the second split between validation data and evaluation data. It is also possible to specify the split by time in the iso 8601 (YYYY-MM-DD HH:MM:SS
) format.These settings describe the model itself, including inputs, outputs and model structure.
Model Specs
"model_name": "opsd_recurrent",
"target_id": [
"DE_load_actual_entsoe_transparency",
"AT_load_actual_entsoe_transparency"
],
"encoder_features": [
"AT_load_actual_entsoe_transparency",
"DE_load_actual_entsoe_transparency",
"DE_temperature",
"DE_radiation_direct_horizontal",
"DE_radiation_diffuse_horizontal"
],
"decoder_features": [
"DE_temperature"
],
"aux_features": [
"hour_sin",
"weekday_sin",
"mnth_sin"
],
"model_class": "recurrent",
"model_parameters": {
"recurrent": {
"core_net": "torch.nn.LSTM",
"core_layers": 1,
"dropout_fc": 0.4,
"dropout_core": 0.3,
"rel_linear_hidden_size": 1.0,
"rel_core_hidden_size": 1.0,
"relu_leak": 0.1
},
"simple_transformer": {
"num_layers": 3,
"dropout": 0.4,
"n_heads": 6
}
}
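The relation between "model_class" and "model_parameters" described on this page can be sketched in a few lines of Python. The function name active_model_parameters is an assumption made for this example and is not ProLoaF's API.

```python
def active_model_parameters(config: dict) -> dict:
    """Return the model_parameters entry selected by model_class."""
    model_class = config["model_class"]
    parameters = config["model_parameters"]
    if model_class not in parameters:
        raise KeyError(f"model_parameters has no entry for {model_class!r}")
    return parameters[model_class]


# Shortened version of the example config above.
config = {
    "model_class": "recurrent",
    "model_parameters": {
        "recurrent": {"core_net": "torch.nn.LSTM", "core_layers": 1},
        "simple_transformer": {"num_layers": 3, "dropout": 0.4, "n_heads": 6},
    },
}
print(active_model_parameters(config))
```

Switching "model_class" to "simple_transformer" would return the other entry without any further change to the config.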
"recurrent"
, "simple_transformer"
, "autoencoder"
, "dualmodel"
. This also selects which parameters are used from model_parameters
(see below) For more information the models see.model_class
setting. Multiple keys are possible to easily switch between different setups but only the one select using model_class
is ever active and required. For a description of the available model classes seeThese settings, configure how the training is conducted.
"optimizer_name": "adam",
"exploration": false,
"cuda_id": null,
"max_epochs": 50,
"batch_size": 28,
"learning_rate": 9.9027931032814e-05,
"early_stopping_patience": 7,
"early_stopping_margin": 0.0,
"adam"
(default), "sgd"
(stochastic gradient decent), "adagrad"
, "adamax"
, "rmsprop"
. For more information consult the pytorch documentationtrue
a an exploration_path
needs to be provided in the path settingsThe tuning config manages the exploration hyper-parameter. To use exploration, activate it in the main config and set the exploration path. In this config specify the tumber of test runs that should be conducted using the attribute
"number_of_tests": 5
In each test a model is trained, and the resulting models are evaluated against each other.
The remainder of the config is under the key "settings".
Its value is an object that mirrors the structure of the main config, but the values work differently. None of the parameters is required; the main config is used for absent parameters.
"batch_size": {
"function": "suggest_int",
"kwargs": {
"name": "batch_size",
"low": 12,
"high": 120
}
},
The value of each parameter specifies a function and a map ("kwargs") with the inputs to that function. Eligible functions are the methods of optuna trials. Only functions whose output corresponds to valid values in the main config can be used.
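The mechanism described above (a function name plus "kwargs", with the main config as fallback) could look roughly like the following Python sketch. The names apply_settings, is_suggestion and MidpointTrial are hypothetical and exist only for this illustration. The stand-in trial class lets the example run without optuna installed; with optuna, an optuna trial object would be passed instead, whose suggest_int method accepts the same name, low and high arguments.

```python
import copy


def is_suggestion(value) -> bool:
    """A tuning entry is a dict holding a "function" name and its "kwargs"."""
    return isinstance(value, dict) and "function" in value and "kwargs" in value


def apply_settings(main_config: dict, settings: dict, trial) -> dict:
    """Return a copy of main_config with tuned values taken from settings."""
    result = copy.deepcopy(main_config)
    for key, value in settings.items():
        if is_suggestion(value):
            # Look up the trial method by name, e.g. trial.suggest_int(...).
            result[key] = getattr(trial, value["function"])(**value["kwargs"])
        elif isinstance(value, dict):
            # Nested section: recurse so the structure mirrors the main config.
            result[key] = apply_settings(result.get(key, {}), value, trial)
        else:
            raise ValueError(f"unsupported tuning entry for {key!r}")
    return result


class MidpointTrial:
    """Hypothetical stand-in for an optuna trial; always picks the midpoint."""

    def suggest_int(self, name, low, high):
        return (low + high) // 2


main_config = {"batch_size": 28, "max_epochs": 50}
settings = {
    "batch_size": {
        "function": "suggest_int",
        "kwargs": {"name": "batch_size", "low": 12, "high": 120},
    }
}
tuned = apply_settings(main_config, settings, MidpointTrial())
print(tuned)  # {'batch_size': 66, 'max_epochs': 50}
```

Parameters that do not appear under "settings", such as max_epochs here, keep their value from the main config.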
The saliency config dictates how the generation of saliency maps is conducted.
{
"rel_interpretation_path": "oracles/interpretation",
"date": "27.07.2019 00:00:00",
"ref_batch_size": 40,
"max_epochs": 3000,
"n_trials": 5,
"lr_low": 1e-3,
"lr_high": 0.1,
"relative_errors": true,
"lambda": 1,
"cuda_id" : null
}
The meaning of "lambda" depends on "relative_errors". If "relative_errors" is true, it is a relative weighting: values > 1 put more weight on applying as much noise as possible, while values < 1 increase the focus on not negatively impacting the prediction quality. If "relative_errors" is false, the weighting is absolute; suitable values might vary greatly (e.g. "lambda": 1e-4 might be reasonable) and are highly dependent on the use case.

The default location of the main configuration file is ./targets/ or better ./targets/<STATION>.
The best practice is to generate sub-directories for each forecasting exercise, i.e. a new station.
As the project originated from electrical load forecasting on substation-level,
the term station or target-station is used to refer to the location or substation identifier
from which the measurement data is originating.
Most of the example scripts in ProLoaF use the config file for training and evaluation,
as it serves as a central place for parametrization.
At this stage you should have the main config file for your forecasting project: ./targets/<STATION>/config.json.
ProLoaF comes with basic functions to parse,
edit and store the config file. We make use of this when calling e.g. our example training script:
$ python src/train.py -s opsd
The flag -s allows us to specify the station name (= target directory) through the string that follows, i.e. opsd. The train script will expect and parse the config.json given in the target directory. You can also manually specify the path to the config file by adding -c <CONFIG_PATH> to the above-mentioned statement. The equivalent statement would be:
$ python src/train.py -c ./targets/opsd/config.json
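To make the equivalence of the two calls concrete, here is a small Python sketch of how the -s and -c flags could be resolved to a config path with argparse. It is not ProLoaF's own parsing code: the function resolve_config_path, the long option names --station and --config, and the choice to make the two flags mutually exclusive are all assumptions for this example.

```python
import argparse


def resolve_config_path(argv: list) -> str:
    """Map either -s <STATION> or -c <CONFIG_PATH> to a config file path."""
    parser = argparse.ArgumentParser(description="Resolve the main config path.")
    # Assumption: exactly one of the two flags has to be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-s", "--station", help="station name (= target directory)")
    group.add_argument("-c", "--config", help="explicit path to the config file")
    args = parser.parse_args(argv)
    if args.config is not None:
        return args.config
    # The station name selects the sub-directory below ./targets/.
    return f"./targets/{args.station}/config.json"


print(resolve_config_path(["-s", "opsd"]))
print(resolve_config_path(["-c", "./targets/opsd/config.json"]))
```

Both calls print ./targets/opsd/config.json, which mirrors the two equivalent command lines shown above.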