Package 'clr' reference manual

Title:	Curve Linear Regression via Dimension Reduction
Description:	A new methodology for linear regression with both curve response and curve regressors, which is described in Cho, Goude, Brossat and Yao (2013) <doi:10.1080/01621459.2012.722900> and (2015) <doi:10.1007/978-3-319-18732-7_3>. The key idea behind this methodology is dimension reduction based on a singular value decomposition in a Hilbert space, which reduces the curve regression problem to several scalar linear regression problems.
Authors:	Amandine Pierrot with contributions and/or help from Qiwei Yao, Haeran Cho, Yannig Goude and Tony Aldon.
Maintainer:	Amandine Pierrot <[email protected]>
License:	LGPL (>= 2.0)
Version:	0.1.2.9000
Built:	2025-03-21 03:17:01 UTC
Source:	https://github.com/apierrot/clr

Curve Linear Regression

Description

clr provides functions for curve linear regression via dimension reduction.

Details

The package implements a new methodology for linear regression with both curve response and curve regressors, which is described in Cho et al. (2013) and Cho et al. (2015). The CLR model performs a data-driven dimension reduction, based on a singular value decomposition in a Hilbert space, together with a data transformation, so that the relationship between the transformed data is linear and can be captured by simple regression models.
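The idea of reducing a curve regression to a few scalar regressions can be illustrated with a minimal base-R sketch. It is not the package's implementation: it works on simulated curves, skips the standardization step and the data-driven choice of dimensions, and fixes the number of retained pairs `d` by hand. All object names are chosen for illustration only.

```r
# Illustrative sketch in base R; NOT the clr package's implementation.
set.seed(1)
n <- 200                      # number of curves
p <- 24                       # grid points per curve
grid <- seq(0, 1, length.out = p)

# Two latent scalar factors drive both the regressor and response curves
z1 <- rnorm(n)
z2 <- rnorm(n)
noise <- function() matrix(rnorm(n * p, sd = 0.1), n, p)
X <- outer(z1, sin(2 * pi * grid)) + outer(z2, cos(2 * pi * grid)) + noise()
Y <- outer(2 * z1, grid) + outer(-z2, grid^2) + noise()

# Centre the curves (one curve per row)
X_mean <- colMeans(X)
Y_mean <- colMeans(Y)
Xc <- sweep(X, 2, X_mean)
Yc <- sweep(Y, 2, Y_mean)

# SVD of the empirical cross-covariance between Y and X
sv <- svd(crossprod(Yc, Xc) / n)
d <- 2                        # number of retained pairs (fixed by hand here)
xi  <- Yc %*% sv$u[, 1:d]     # projections of Y
eta <- Xc %*% sv$v[, 1:d]     # projections of X

# One scalar regression through the origin per retained pair
b_hat <- vapply(seq_len(d), function(j) {
  unname(coef(lm(xi[, j] ~ 0 + eta[, j])))
}, numeric(1))

# Map the fitted scores back to curves
Y_fit <- sweep(sweep(eta, 2, b_hat, `*`) %*% t(sv$u[, 1:d]), 2, Y_mean, `+`)

# Cross-covariance of the projections: diagonal by construction of the SVD
cross_cov <- crossprod(xi, eta) / n
```

Because the projections come from the singular vectors, `cross_cov` is diagonal up to rounding error: each projection of Y is paired with exactly one projection of X, which is what makes pairwise scalar regressions a natural fit.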

Author(s)

Amandine Pierrot <[email protected]>

with contributions and help from Qiwei Yao, Haeran Cho, Yannig Goude and Tony Aldon.

References

These provide details for the underlying clr methods.

Cho, H., Y. Goude, X. Brossat, and Q. Yao (2013) Modelling and Forecasting Daily Electricity Load Curves: A Hybrid Approach. Journal of the American Statistical Association 108: 7-21.

Cho, H., Y. Goude, X. Brossat, and Q. Yao (2015) Modelling and Forecasting Daily Electricity Load via Curve Linear Regression. In Modeling and Stochastic Learning for Forecasting in High Dimension, edited by Anestis Antoniadis and Xavier Brossat, 35-54, Springer.

Curve Linear Regression via dimension reduction

Description

Fits a curve linear regression (CLR) model to data, using dimension reduction based on singular value decomposition.

Usage

clr(Y, X, clust = NULL, qx_estimation = list(method = "pctvar", param =
  0.999), ortho_Y = TRUE, qy_estimation = list(method = "pctvar", param
  = 0.999), d_estimation = list(method = "cor", param = 0.5))

Arguments

`Y`	An object of class `clrdata` or `matrix`, containing the response curves (one curve per row).
`X`	An object of class `clrdata` or `matrix`, containing the regressor curves (one curve per row).
`clust`	If needed, a list of row indices for each cluster, to obtain (approximately) homogeneous dependence structure inside each cluster.
`qx_estimation`	A list containing values for both 'method' (one of 'ratio', 'ratioM', 'pctvar', 'fixed') and 'param' (depending on the selected method), which determines how the dimension of X is estimated (in the sense that its Karhunen-Loève decomposition has only qx terms).
`ortho_Y`	If TRUE then Y is orthogonalized.
`qy_estimation`	Same as for qx_estimation, if ortho_Y is set to TRUE.
`d_estimation`	A list containing both values for 'method' (among 'ratio', 'pctvar', 'cor') and for 'param' (depending on the selected method), in order to choose how to estimate the correlation dimension.
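As an illustration of a percentage-of-variance rule such as the default `list(method = "pctvar", param = 0.999)`, the sketch below picks the smallest number of components whose cumulative share of variance reaches `param`. This reading of 'pctvar' is an assumption, and `pctvar_dim` is a hypothetical helper that is not part of the package.

```r
# Hypothetical helper, not part of clr: smallest number of components whose
# cumulative share of variance reaches `param` (assumed meaning of 'pctvar').
pctvar_dim <- function(eigvals, param = 0.999) {
  share <- cumsum(eigvals) / sum(eigvals)
  which(share >= param)[1]
}

# Toy spectrum with cumulative shares 0.5, 0.75, 0.875, 0.9375, 1
eigvals <- c(8, 4, 2, 1, 1)
q_90 <- pctvar_dim(eigvals, param = 0.9)

# On data, eigenvalues can be taken as the squared singular values
# of the centred curve matrix (one curve per row).
set.seed(42)
M <- matrix(rnorm(50 * 10), nrow = 50)
M_eig <- svd(sweep(M, 2, colMeans(M)))$d^2
q_data <- pctvar_dim(M_eig, param = 0.999)
```

With the toy spectrum, four components are needed to reach 90% of the variance, since the first three only reach 87.5%.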

Value

An object of class clr, which can be used to compute predictions. This clr object is a list of lists: one list per cluster of data, each list including:

`residuals`	The matrix of the residuals of d_hat simple linear regressions.
`b_hat`	The vector of the estimated coefficients of the d_hat simple linear regressions.
`eta`	The matrix of the projections of X.
`xi`	The matrix of the projections of Y.
`qx_hat`	The estimated dimension of X.
`qy_hat`	The estimated dimension of Y.
`d_hat`	The estimated correlation dimension.
`X_mean`	The mean of the regressor curves.
`X_sd`	The standard deviation of the regressor curves.
`Y_mean`	The mean of the response curves.
`ortho_Y`	The value which was selected for ortho_Y.
`GAMMA`	The standardized transformation for X.
`INV_DELTA`	The standardized transformation for Y, used for prediction if ortho_Y was set to TRUE.
`phi`	The eigenvectors for Y, used for prediction if ortho_Y was set to FALSE.
`idx`	The indices of the rows selected from X and Y for the current cluster.

Examples

library(clr)
data(gb_load)
data(clust_train)

clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND,
                    order_by = gb_load$TIMESTAMP,
                    support_grid = 1:48)

## data cleaning: replace zeros with NA
clr_load[rowSums((clr_load == 0) * 1) > 0, ] <- NA
matplot(t(clr_load), ylab = 'Daily loads', type = 'l')

Y <- clr_load[2:nrow(clr_load), ]
X <- clr_load[1:(nrow(clr_load) - 1), ]

begin_pred <- which(substr(rownames(Y), 1, 4) == '2016')[1]
Y_train <- Y[1:(begin_pred - 1), ]
X_train <- X[1:(begin_pred - 1), ]

## Example without any cluster
model <- clr(Y = Y_train, X = X_train)

## Example with clusters
model <- clr(Y = Y_train, X = X_train, clust = clust_train)

Create an object of `clrdata`

Description

clrdata is used to create a clrdata object from raw data inputs.

Usage

clrdata(x, order_by, support_grid)

Arguments

`x`	A vector containing the time series values
`order_by`	A corresponding vector of unique date-times, which must be of class 'POSIXct'
`support_grid`	A vector corresponding to the support grid of functional data

Value

An object of class clrdata with one function per row. As it inherits from the matrix class, all matrix methods remain valid. If date-times are missing in x, the corresponding NA functions are added by clrdata so that the time sequence is preserved between successive rows.
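The layout described above (one function per row, with NA rows for missing days) can be sketched in base R as follows. This is only an illustration on a hypothetical half-hourly series in UTC, not the code used by `clrdata()`; the names `ts`, `slot` and `curves` are made up for the example.

```r
# Illustrative reshaping in base R; NOT the clrdata() implementation.
# Three days of half-hourly timestamps (UTC avoids daylight-saving gaps)
ts <- seq(as.POSIXct("2016-01-01 00:00", tz = "UTC"),
          by = "30 min", length.out = 48 * 3)
x <- seq_along(ts)                      # dummy series values 1..144

# Drop the whole second day to mimic missing date-times
keep <- as.Date(ts, tz = "UTC") != as.Date("2016-01-02")
ts <- ts[keep]
x <- x[keep]

# Row = calendar day, column = half-hour slot (1..48)
days <- as.Date(ts, tz = "UTC")
all_days <- seq(min(days), max(days), by = "day")
slot <- as.integer(format(ts, "%H", tz = "UTC")) * 2L +
  as.integer(format(ts, "%M", tz = "UTC")) %/% 30L + 1L

# Start from an all-NA matrix so that missing days stay as NA rows
curves <- matrix(NA_real_, nrow = length(all_days), ncol = 48,
                 dimnames = list(as.character(all_days), NULL))
curves[cbind(match(days, all_days), slot)] <- x
```

The second row of `curves` is entirely NA, so consecutive rows still correspond to consecutive days.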

Examples

library(clr)
data(gb_load)

clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND,
                    order_by = gb_load$TIMESTAMP,
                    support_grid = 1:48)

head(clr_load)
dim(clr_load)
summary(clr_load)

matplot(t(clr_load), ylab = 'Daily loads', type = 'l')
lines(colMeans(clr_load, na.rm = TRUE),
      col = 'black', lwd = 2)


clr_weather <- clrdata(x = gb_load$TEMPERATURE,
                       order_by = gb_load$TIMESTAMP,
                       support_grid = 1:48)
summary(clr_weather)
plot(1:48,
     colMeans(clr_weather, na.rm = TRUE),
     xlab = 'Instant', ylab = 'Mean of temperatures',
     type = 'l', col = 'cornflowerblue')

Electricity load example: clusters on test set

Description

A list with observations by cluster for prediction

Usage

clust_test

Format

A list of length 14:

14 clusters of loads, depending on both daily and seasonal classification, with bank holidays removed

Author(s)

Amandine Pierrot <[email protected]>

Electricity load example: clusters on train set

Description

A list with observations by cluster for fitting

Usage

clust_train

Format

A list of length 14:

14 clusters of loads, depending on both daily and seasonal classification, with bank holidays removed
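A list of row indices of this kind can be built with `split()`. The sketch below groups a hypothetical daily calendar by day of the week and by a two-season split; the season definition is an assumption, bank holidays are not removed, and nothing here is claimed to reproduce how `clust_train` was actually built.

```r
# Hypothetical daily calendar, one row per day
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")

# Day of the week coded from 1 (Sunday) to 7 (Saturday)
day_type <- as.POSIXlt(dates)$wday + 1L

# Assumed two-season split: April to September is "summer"
month <- as.POSIXlt(dates)$mon + 1L
season <- ifelse(month %in% 4:9, "summer", "winter")

# One vector of row indices per (day type, season) combination
clust <- split(seq_along(dates), list(day_type, season), drop = TRUE)
```

Each row index appears in exactly one element of `clust`, so the result is a list of row-index vectors, the shape the `clust` argument of `clr()` is documented to take.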

Author(s)

Amandine Pierrot <[email protected]>

Electricity load from Great Britain

Description

A dataset containing half-hourly electricity load from Great Britain from 2011 to 2016, together with observed temperatures. Temperatures are computed from weather stations all over the country, as a weighted average whose weights depend on the geographical distribution of the population.

Usage

gb_load

Format

A data frame with 105216 rows and 7 variables:

SETTLEMENT_DATE: date, the time zone being Europe/London
SETTLEMENT_PERIOD: time of the day
TIMESTAMP: date-time, the time zone being Europe/London
ENGLAND_WALES_DEMAND: British electric load, measured in MW, on average over the half hour
TEMPERATURE: observed temperature in Celsius
MV: percentage of missing values when averaging over weather stations, depending on the weight of the station
DAY_TYPE: type of the day of the week, from 1 for Sunday to 7 for Saturday, 8 being bank holidays
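To make the TEMPERATURE and MV columns concrete, the sketch below computes a weighted average over stations and the weighted share of missing stations. The station names, readings and weights are hypothetical, and the exact formula used to build the dataset is not documented here, so this is only one plausible reading.

```r
# Hypothetical station readings (Celsius) and population-based weights
station_temp   <- c(london = 10, manchester = NA, glasgow = 14)
station_weight <- c(london = 0.5, manchester = 0.3, glasgow = 0.2)

available <- !is.na(station_temp)

# Weighted average over the stations that reported a value
temperature <- weighted.mean(station_temp, station_weight, na.rm = TRUE)

# Weighted percentage of missing stations (assumed meaning of MV)
mv <- 100 * sum(station_weight[!available]) / sum(station_weight)
```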

Author(s)

Amandine Pierrot <[email protected]>

Source

National Grid
National Centers for Environmental Information

Prediction from fitted CLR model(s)

Description

Takes a fitted clr object produced by clr() and produces predictions given a new set of functions or the original values used for the model fit.

Usage

## S3 method for class 'clr'
predict(object, newX = NULL, newclust = NULL,
  newXmean = NULL, simplify = FALSE, ...)

Arguments

`object`	A fitted `clr` object produced by `clr()`.
`newX`	An object of class `clrdata` or a matrix with one function per row. If this is not provided, predictions corresponding to the original data are returned. If `newX` is provided, it should contain the same type of functions as the original ones (same dimension, same clusters if any, ...).
`newclust`	A new list of indices to obtain (approximately) homogeneous dependence structure inside each cluster of functions.
`newXmean`	To complete when done
`simplify`	If TRUE, one matrix of predicted functions is returned instead of a list of matrices (one matrix per cluster). In the final matrix, rows are sorted by increasing row numbers.
`...`	Further arguments are ignored.
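The effect described for `simplify = TRUE` can be mimicked in base R: stack the per-cluster matrices and reorder the rows by their original row number. The sketch assumes that row names hold the row numbers, as the example at the end of this page suggests; `pred_by_cluster` is a made-up stand-in for a list of per-cluster predictions, not output from `predict()`.

```r
# Made-up per-cluster predictions; row names are the original row numbers
pred_by_cluster <- list(
  matrix(c(1, 3), nrow = 2, ncol = 3, dimnames = list(c("1", "3"), NULL)),
  matrix(2, nrow = 1, ncol = 3, dimnames = list("2", NULL))
)

# Stack the clusters, then sort the rows by increasing row number
combined <- do.call(rbind, pred_by_cluster)
combined <- combined[order(as.numeric(rownames(combined))), , drop = FALSE]
```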

Value

The predicted functions: a list of matrices (one matrix per cluster), or a single matrix if simplify is TRUE.

Examples

library(clr)
data(gb_load)

clr_load <- clrdata(x = gb_load$ENGLAND_WALES_DEMAND,
                    order_by = gb_load$TIMESTAMP,
                    support_grid = 1:48)

# data cleaning: replace zeros with NA
clr_load[rowSums((clr_load == 0) * 1) > 0, ] <- NA

Y <- clr_load[2:nrow(clr_load), ]
X <- clr_load[1:(nrow(clr_load) - 1), ]

begin_pred <- which(substr(rownames(Y), 1, 4) == '2016')[1]
Y_train <- Y[1:(begin_pred - 1), ]
X_train <- X[1:(begin_pred - 1), ]
Y_test <- Y[begin_pred:nrow(Y), ]
X_test <- X[begin_pred:nrow(X), ]


## Example without any cluster
model <- clr(Y = Y_train, X = X_train)

pred_on_train <- predict(model)
head(pred_on_train[[1]])

pred_on_test <- predict(model, newX = X_test)
head(pred_on_test[[1]])


## Example with clusters
model <- clr(Y = Y_train, X = X_train, clust = clust_train)

pred_on_train <- predict(model)
str(pred_on_train)
head(pred_on_train[[1]])

pred_on_test <- predict(model, newX = X_test, newclust = clust_test,
                        simplify = TRUE)
str(pred_on_test)
head(pred_on_test)

# With dates as row names
rownames(pred_on_test) <- rownames(Y_test)[as.numeric(rownames(pred_on_test))]
