Datasets and shipped artifacts
elcf4R package
Source:vignettes/elcf4R-datasets-vignette.Rmd
elcf4R-datasets-vignette.RmdOverview
elcf4R now supports four household-oriented public data
sources through a common normalized panel schema:
The package also ships compact example panels and saved benchmark
results so the main vignettes run without external downloads. Raw source
files stay in data-raw/ and are not redistributed through
the package unless a compact derived artifact has been explicitly built
and saved.
Two additional unshipped scaffolds are also available:
-
elcf4r_download_ideal()/elcf4r_read_ideal()for aggregate-electricity hourly summaries from IDEAL. -
elcf4r_download_gx()/elcf4r_read_gx()for the GX transformer/community-level dataset.
Supported dataset matrix
The current dataset surface is:
| Dataset | Reader | Resolution | Temperature in normalized panel | Shipped example | Shipped benchmark |
|---|---|---|---|---|---|
| iFlex | elcf4r_read_iflex() |
hourly | yes | elcf4r_iflex_example |
elcf4r_iflex_benchmark_results |
StoreNet (H6_W) |
elcf4r_read_storenet() |
1 minute | optional, source-dependent | elcf4r_storenet_example |
elcf4r_storenet_benchmark_results |
| Low Carbon London | elcf4r_read_lcl() |
30 minutes | no | elcf4r_lcl_example |
elcf4r_lcl_benchmark_results |
| REFIT | elcf4r_read_refit() |
user-selected resample | no | elcf4r_refit_example |
elcf4r_refit_benchmark_results |
| ELMAS | not part of the common household reader set | hourly | no | elcf4r_elmas_toy |
none |
All four household readers return the same core columns:
datasetentity_idtimestampdatetime_indexytempdowmonthresolution_minutes
Dataset-specific metadata columns are preserved when available.
Scaffolded, unshipped datasets
IDEAL and GX are intentionally documented
separately from the core shipped household matrix.
| Dataset | Helper surface | Level | Current scaffold scope | Shipped example | Shipped benchmark | Licence note |
|---|---|---|---|---|---|---|
| IDEAL |
elcf4r_download_ideal(),
elcf4r_read_ideal()
|
household | aggregate-electricity hourly summaries from
auxiliarydata.zip
|
no | no | the current Edinburgh DataShare record states
CC BY 4.0
|
| GX |
elcf4r_download_gx(),
elcf4r_read_gx()
|
transformer/community | SQLite or flat-export normalization to the common panel schema | no | no | treat licence terms as dataset-record specific and recheck before redistribution |
Notes:
- IDEAL support in this release is limited to aggregate electricity and does not attempt to parse the raw 1 Hz stream.
- GX is not an individual-household dataset. It is useful as a secondary benchmark source for weather and community-level demand behavior, but it is not folded into the package’s core household benchmark claims.
Shipped example panels
The shipped examples are small normalized panels intended for package examples and vignette code.
example_sizes <- Filter(
Negate(is.null),
list(
.elcf4r_example_size_row("elcf4r_iflex_example", "iflex"),
.elcf4r_example_size_row("elcf4r_storenet_example", "storenet"),
.elcf4r_example_size_row("elcf4r_lcl_example", "lcl"),
.elcf4r_example_size_row("elcf4r_refit_example", "refit")
)
)
if (length(example_sizes) == 0L) {
data.frame()
} else {
do.call(rbind, example_sizes)
}
#> data frame with 0 columns and 0 rowsThese objects can be passed directly to:
Shipped benchmark result datasets
Each supported household dataset now has a saved benchmark-result object built from a fixed local cohort and a deterministic rolling-origin design.
benchmark_summary <- Filter(
Negate(is.null),
list(
.elcf4r_benchmark_summary("elcf4r_iflex_benchmark_results", "iflex"),
.elcf4r_benchmark_summary("elcf4r_storenet_benchmark_results", "storenet"),
.elcf4r_benchmark_summary("elcf4r_lcl_benchmark_results", "lcl"),
.elcf4r_benchmark_summary("elcf4r_refit_benchmark_results", "refit")
)
)
if (length(benchmark_summary) == 0L) {
data.frame()
} else {
benchmark_summary <- do.call(rbind, benchmark_summary)
benchmark_summary[, c("dataset", "method", "nmae", "nrmse", "smape", "mase")]
}
#> data frame with 0 columns and 0 rowsThese shipped benchmark tables are poster-style artifacts. They are not intended to replace full local benchmarking on the raw datasets.
Rebuilding the shipped artifacts
Each shipped dataset is reproducible from a data-raw/
script:
data-raw/elcf4r_iflex_subsets.Rdata-raw/elcf4r_iflex_benchmark_results.Rdata-raw/elcf4r_storenet_artifacts.Rdata-raw/elcf4r_lcl_artifacts.Rdata-raw/elcf4r_refit_artifacts.R
The general pattern is:
- Place the original raw files in
data-raw/. - Read them through
elcf4r_read_*(). - Build a normalized day index with
elcf4r_build_benchmark_index(). - Save a compact example panel.
- Run
elcf4r_benchmark()on a fixed cohort and save the result table.
This keeps the package lightweight while making the shipped examples and benchmark summaries reproducible.
Example: daily segments from a shipped panel
if (exists("elcf4r_iflex_example", inherits = FALSE)) {
iflex_segments <- elcf4r_build_daily_segments(
elcf4r_iflex_example,
carry_cols = c("dataset", "participation_phase", "price_signal")
)
dim(iflex_segments$segments)
head(iflex_segments$covariates[, c("entity_id", "date", "temp_mean", "price_signal")])
} else {
data.frame()
}
#> data frame with 0 columns and 0 rowsExample: rerun a tiny benchmark locally
if (exists("elcf4r_lcl_example", inherits = FALSE)) {
tiny_index <- elcf4r_build_benchmark_index(
elcf4r_lcl_example,
carry_cols = "dataset"
)
tiny_benchmark <- elcf4r_benchmark(
panel = elcf4r_lcl_example,
benchmark_index = tiny_index,
methods = c("gam", "kwf"),
cohort_size = 1,
train_days = 10,
test_days = 2,
include_predictions = FALSE
)
tiny_benchmark$results[, c("entity_id", "method", "test_date", "nmae", "mase")]
} else {
data.frame()
}
#> data frame with 0 columns and 0 rows