preprocess.py
Basic information on the script for data preparation.
Overview
Preprocess your input data for use with ProLoaF
This script converts the raw data for all stations into a common format: a pandas.DataFrame saved as a csv file.
Note:
This script can load xlsx or csv files.
If your data does not match these criteria, you can use a custom script instead. It should save your data as a pandas.DataFrame with a DatetimeIndex to a csv file that uses “;” as the separator, which accomplishes the same thing.
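The custom-script route from the note can be sketched as follows. This is a minimal example, not part of ProLoaF: the helper name `save_for_proloaf`, the column names, and the sample values are hypothetical, and the data is written to an in-memory buffer so the snippet runs without touching the file system.

```python
import io

import pandas as pd


def save_for_proloaf(df: pd.DataFrame, target) -> None:
    """Write a DataFrame with a DatetimeIndex to a ';'-separated csv.

    `target` may be a file path or any writable text buffer.
    (Hypothetical helper, shown for illustration only.)
    """
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("expected a DataFrame indexed by a pandas.DatetimeIndex")
    # Sort by time so the rows are written in chronological order.
    df.sort_index().to_csv(target, sep=";")


# Hypothetical hourly load readings; replace with your own data source.
raw = pd.DataFrame(
    {
        "timestamp": ["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00"],
        "load": [41.5, 39.8, 38.2],
    }
)
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
frame = raw.set_index("timestamp")

buffer = io.StringIO()
save_for_proloaf(frame, buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass a path instead of the buffer, for example `save_for_proloaf(frame, "custom_data.csv")`.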
config
The prep config defines 4 parameters:
- “data_path”: (str) path to the output file, relative to the main directory of the project folder.
- “raw_path”: (str) directory where all the raw data files are located, relative to the project folder.
- “weather_files”: (list) of dicts, one for each weather file. Each dict should define:
    - “file_name”: (str) full name of the file.
    - “date_column”: (str) name of the column which contains the date.
    - “dayfirst”: (boolean) whether the date format writes the day first or not.
    - “sep”: (str) separator used in the raw data csv.
    - “combine”: (boolean) all files with this set to true are appended to each other, w.r.t. time.
    - “use_columns”: (list or null) list of columns to use from the file; uses all columns if null.
- “load_files”: (list) of dicts, one for each load file. Each dict should define:
    - “file_name”: (str) full name of the file.
    - “date_column”: (str) name of the column which contains the date.
    - “time_zone”: (str) short name of the time zone the data is recorded in.
    - “sheet_name”: (int, str, list, null) sheet(s) to load: a number starting at 0, a sheet name, or a list of those; null corresponds to all sheets.
    - “combine”: (boolean) all files with this set to true are appended to each other, w.r.t. time.
    - “start_column”: (str) name of the first column affected by data_abs.
    - “end_column”: (str) name of the first column no longer affected by data_abs.
    - “data_abs”: (boolean) whether data_abs is applied to the columns from start_column up to, but not including, end_column.
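To illustrate how these parameters fit together, here is a sketch of a prep config built as a Python dict and serialized to JSON. Every value in it (file names, column names, paths, the time zone string) is a hypothetical placeholder rather than a default shipped with ProLoaF; only the keys come from the list above.

```python
import json

# All file names, column names, and paths below are hypothetical placeholders.
prep_config = {
    "data_path": "data/prepared.csv",
    "raw_path": "raw_data/",
    "weather_files": [
        {
            "file_name": "weather_2021.csv",
            "date_column": "timestamp",
            "dayfirst": False,
            "sep": ",",
            "combine": True,
            "use_columns": None,  # serialized as null: use all columns
        }
    ],
    "load_files": [
        {
            "file_name": "load_2021.xlsx",
            "date_column": "Date",
            "time_zone": "CET",
            "sheet_name": 0,  # first sheet; None (null) would load all sheets
            "combine": True,
            "start_column": "station_a",
            "end_column": "comment",
            "data_abs": True,
        }
    ],
}

# Serialize to the JSON text that would go into the config file.
config_text = json.dumps(prep_config, indent=4)
print(config_text)
```

Python's `None`, `True`, and `False` become `null`, `true`, and `false` in the resulting JSON, matching the types named in the parameter list.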
Inputs
- historical data provided by the user
Outputs
- the output file, written to the data_path defined in the config.json that was used
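For downstream use, the output can be loaded back into pandas. The sketch below assumes the output has the layout described in the note above (a “;”-separated csv whose first column is the datetime index); the column names and values are made up, and an in-memory buffer stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for the file written to data_path; the contents are made up.
sample_output = io.StringIO(
    "Time;load;temperature\n"
    "2021-01-01 00:00:00;41.5;2.1\n"
    "2021-01-01 01:00:00;39.8;1.7\n"
)

# In practice, pass the data_path from your config instead of the buffer.
prepared = pd.read_csv(sample_output, sep=";", index_col=0, parse_dates=True)
print(prepared.index.dtype)
```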
Reference Documentation
If you need more details, please take a look at the docs for this script.
Last modified April 2, 2022: update links in docs (0f90685)