Title: | Curve Linear Regression via Dimension Reduction |
---|---|
Description: | A new methodology for linear regression with both curve response and curve regressors, which is described in Cho, Goude, Brossat and Yao (2013) <doi:10.1080/01621459.2012.722900> and (2015) <doi:10.1007/978-3-319-18732-7_3>. The key idea behind this methodology is dimension reduction based on a singular value decomposition in a Hilbert space, which reduces the curve regression problem to several scalar linear regression problems. |
Authors: | Amandine Pierrot with contributions and/or help from Qiwei Yao, Haeran Cho, Yannig Goude and Tony Aldon. |
Maintainer: | Amandine Pierrot <[email protected]> |
License: | LGPL (>= 2.0) |
Version: | 0.1.2.9000 |
Built: | 2024-11-21 03:19:23 UTC |
Source: | https://github.com/apierrot/clr |
clr
provides functions for curve linear regression via dimension
reduction.
The package implements a new methodology for linear regression with both curve response and curve regressors, which is described in Cho et al. (2013) and Cho et al. (2015). The CLR model performs a data-driven dimension reduction, based on a singular value decomposition in a Hilbert Space, as well as a data transformation so that the relationship between the transformed data is linear and can be captured by simple regression models.
Amandine Pierrot <[email protected]>
with contributions and help from Qiwei Yao, Haeran Cho, Yannig Goude and Tony Aldon.
These provide details for the underlying clr
methods.
Cho, H., Y. Goude, X. Brossat, and Q. Yao (2013) Modelling and Forecasting Daily Electricity Load Curves: A Hybrid Approach. Journal of the American Statistical Association 108: 7-21.
Cho, H., Y. Goude, X. Brossat, and Q. Yao (2015) Modelling and Forecasting Daily Electricity Load via Curve Linear Regression. In Modeling and Stochastic Learning for Forecasting in High Dimension, edited by Anestis Antoniadis and Xavier Brossat, 35-54, Springer.
Fits a curve linear regression (CLR) model to data, using dimension reduction based on singular value decomposition.
clr(Y, X, clust = NULL, qx_estimation = list(method = "pctvar", param = 0.999), ortho_Y = TRUE, qy_estimation = list(method = "pctvar", param = 0.999), d_estimation = list(method = "cor", param = 0.5))
clr(Y, X, clust = NULL, qx_estimation = list(method = "pctvar", param = 0.999), ortho_Y = TRUE, qy_estimation = list(method = "pctvar", param = 0.999), d_estimation = list(method = "cor", param = 0.5))
Y |
An object of class |
X |
An object of class |
clust |
If needed, a list of row indices for each cluster, to obtain (approximately) homogeneous dependence structure inside each cluster. |
qx_estimation |
A list containing both values for 'method' (among 'ratio', 'ratioM', 'pctvar', 'fixed') and for 'param' (depending on the selected method), in order to choose how to estimate the dimension of X (in the sense that its Karhunen-Lo\'eve decomposition has qx terms only. |
ortho_Y |
If TRUE then Y is orthogonalized. |
qy_estimation |
Same as for qx_estimation, if ortho_Y is set to TRUE. |
d_estimation |
A list containing both values for 'method' (among 'ratio', 'pctvar', 'cor') and for 'param' (depending on the selected method), in order to choose how to estimate the correlation dimension. |
An object of class clr
, which can be used to compute
predictions.
This clr
object is a list of lists: one list by cluster of data, each
list including:
residuals |
The matrix of the residuals of d_hat simple linear regressions. |
b_hat |
The vector of the estimated coefficient of the d_hat simple straight line regressions. |
eta |
The matrix of the projections of X. |
xi |
The matrix of the projections of Y. |
qx_hat |
The estimated dimension of X. |
qy_hat |
The estimated dimension of Y. |
d_hat |
The estimated correlation dimension. |
X_mean |
The mean of the regressor curves. |
X_sd |
The standard deviation of the regressor curves. |
Y_mean |
The mean of the response curves. |
ortho_Y |
The value which was selected for ortho_Y. |
GAMMA |
The standardized transformation for X. |
INV_DELTA |
The standardized transformation for Y to predict if ortho_Y was set to TRUE. |
phi |
The eigenvectors for Y to predict if ortho_Y was set to FALSE. |
idx |
The indices of the rows selected from X and Y for the current cluster. |
clr-package
, clrdata
and
predict.clr
.
library(clr) data(gb_load) data(clust_train) clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND, order_by = gb_load$TIMESTAMP, support_grid = 1:48) ## data cleaning: replace zeros with NA clr_load[rowSums((clr_load == 0) * 1) > 0, ] <- NA matplot(t(clr_load), ylab = 'Daily loads', type = 'l') Y <- clr_load[2:nrow(clr_load), ] X <- clr_load[1:(nrow(clr_load) - 1), ] begin_pred <- which(substr(rownames(Y), 1, 4) == '2016')[1] Y_train <- Y[1:(begin_pred - 1), ] X_train <- X[1:(begin_pred - 1), ] ## Example without any cluster model <- clr(Y = Y_train, X = X_train) ## Example with clusters model <- clr(Y = Y_train, X = X_train, clust = clust_train)
library(clr) data(gb_load) data(clust_train) clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND, order_by = gb_load$TIMESTAMP, support_grid = 1:48) ## data cleaning: replace zeros with NA clr_load[rowSums((clr_load == 0) * 1) > 0, ] <- NA matplot(t(clr_load), ylab = 'Daily loads', type = 'l') Y <- clr_load[2:nrow(clr_load), ] X <- clr_load[1:(nrow(clr_load) - 1), ] begin_pred <- which(substr(rownames(Y), 1, 4) == '2016')[1] Y_train <- Y[1:(begin_pred - 1), ] X_train <- X[1:(begin_pred - 1), ] ## Example without any cluster model <- clr(Y = Y_train, X = X_train) ## Example with clusters model <- clr(Y = Y_train, X = X_train, clust = clust_train)
clrdata
clrdata
is used to create a clrdata
object from raw data
inputs.
clrdata(x, order_by, support_grid)
clrdata(x, order_by, support_grid)
x |
A vector containing the time series values |
order_by |
A corresponding vector of unique time-dates - must be of class 'POSIXct' |
support_grid |
A vector corresponding to the support grid of functional data |
An object of class clrdata
with one function a row. As it
inherits the matrix
class, all matrix
methods remain valid.
If time-dates are missing in x, corresponding NA functions are added by
clrdata
so that time sequence is preserved between successive rows.
library(clr) data(gb_load) clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND, order_by = gb_load$TIMESTAMP, support_grid = 1:48) head(clr_load) dim(clr_load) summary(clr_load) matplot(t(clr_load), ylab = 'Daily loads', type = 'l') lines(colMeans(clr_load, na.rm = TRUE), col = 'black', lwd = 2) clr_weather <- clrdata(x = gb_load$TEMPERATURE, order_by = gb_load$TIMESTAMP, support_grid = 1:48) summary(clr_weather) plot(1:48, colMeans(clr_weather, na.rm = TRUE), xlab = 'Instant', ylab = 'Mean of temperatures', type = 'l', col = 'cornflowerblue')
library(clr) data(gb_load) clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND, order_by = gb_load$TIMESTAMP, support_grid = 1:48) head(clr_load) dim(clr_load) summary(clr_load) matplot(t(clr_load), ylab = 'Daily loads', type = 'l') lines(colMeans(clr_load, na.rm = TRUE), col = 'black', lwd = 2) clr_weather <- clrdata(x = gb_load$TEMPERATURE, order_by = gb_load$TIMESTAMP, support_grid = 1:48) summary(clr_weather) plot(1:48, colMeans(clr_weather, na.rm = TRUE), xlab = 'Instant', ylab = 'Mean of temperatures', type = 'l', col = 'cornflowerblue')
A list with observations by cluster for prediction
clust_test
clust_test
A list of length 14:
14 clusters of loads, depending on both daily and seasonal classification, banking holidays being removed
Amandine Pierrot <[email protected]>
A list with observations by cluster for fitting
clust_train
clust_train
A list of length 14:
14 clusters of loads, depending on both daily and seasonal classification, banking holidays being removed
Amandine Pierrot <[email protected]>
A dataset containing half-hourly electricity load from Great Britain from 2011 to 2016, together with observed temperatures. Temperatures are computed from weather stations all over the country. It is a weighted averaged temperature depending on population geographical distribution.
gb_load
gb_load
A data frame with 105216 rows and 7 variables:
date, the time zone being Europe/London
time of the day
date-time, the time zone being Europe/London
British electric load, measured in MW, on average over the half hour
observed temperature in Celsius
percentage of missing values when averaging over weather stations, depending on the weight of the station
type of the day of the week, from 1 for Sunday to 7 for Saturday, 8 being banking holidays
Amandine Pierrot <[email protected]>
National Grid
National Centers for Environmental Information
Takes a fitted clr
object produced by clr()
and produces
predictions given a new set of functions or the original values used for
the model fit.
## S3 method for class 'clr' predict(object, newX = NULL, newclust = NULL, newXmean = NULL, simplify = FALSE, ...)
## S3 method for class 'clr' predict(object, newX = NULL, newclust = NULL, newXmean = NULL, simplify = FALSE, ...)
object |
A fitted |
newX |
An object of class |
newclust |
A new list of indices to obtain (approximately) homogeneous dependence structure inside each cluster of functions. |
newXmean |
To complete when done |
simplify |
If TRUE, one matrix of predicted functions is returned instead of a list of matrices (one matrix by cluster). In the final matrix, rows are sorted by increasing row numbers. |
... |
Further arguments are ignored. |
predicted functions
library(clr) data(gb_load) clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND, order_by = gb_load$TIMESTAMP, support_grid = 1:48) # data cleaning: replace zeros with NA clr_load[rowSums((clr_load == 0) * 1) > 0, ] <- NA Y <- clr_load[2:nrow(clr_load), ] X <- clr_load[1:(nrow(clr_load) - 1), ] begin_pred <- which(substr(rownames(Y), 1, 4) == '2016')[1] Y_train <- Y[1:(begin_pred - 1), ] X_train <- X[1:(begin_pred - 1), ] Y_test <- Y[begin_pred:nrow(Y), ] X_test <- X[begin_pred:nrow(X), ] ## Example without any cluster model <- clr(Y = Y_train, X = X_train) pred_on_train <- predict(model) head(pred_on_train[[1]]) pred_on_test <- predict(model, newX = X_test) head(pred_on_test[[1]]) ## Example with clusters model <- clr(Y = Y_train, X = X_train, clust = clust_train) pred_on_train <- predict(model) str(pred_on_train) head(pred_on_train[[1]]) pred_on_test <- predict(model, newX = X_test, newclust = clust_test, simplify = TRUE) str(pred_on_test) head(pred_on_test) # With dates as row names rownames(pred_on_test) <- rownames(Y_test)[as.numeric(rownames(pred_on_test))]
library(clr) data(gb_load) clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND, order_by = gb_load$TIMESTAMP, support_grid = 1:48) # data cleaning: replace zeros with NA clr_load[rowSums((clr_load == 0) * 1) > 0, ] <- NA Y <- clr_load[2:nrow(clr_load), ] X <- clr_load[1:(nrow(clr_load) - 1), ] begin_pred <- which(substr(rownames(Y), 1, 4) == '2016')[1] Y_train <- Y[1:(begin_pred - 1), ] X_train <- X[1:(begin_pred - 1), ] Y_test <- Y[begin_pred:nrow(Y), ] X_test <- X[begin_pred:nrow(X), ] ## Example without any cluster model <- clr(Y = Y_train, X = X_train) pred_on_train <- predict(model) head(pred_on_train[[1]]) pred_on_test <- predict(model, newX = X_test) head(pred_on_test[[1]]) ## Example with clusters model <- clr(Y = Y_train, X = X_train, clust = clust_train) pred_on_train <- predict(model) str(pred_on_train) head(pred_on_train[[1]]) pred_on_test <- predict(model, newX = X_test, newclust = clust_test, simplify = TRUE) str(pred_on_test) head(pred_on_test) # With dates as row names rownames(pred_on_test) <- rownames(Y_test)[as.numeric(rownames(pred_on_test))]