Short-Term Load Forecasting Review

Elliott Piercy
School of Informatics
University of [email protected].ac.uk

Abstract: Improvements in energy demand prediction techniques benefit not only energy companies but also the environment. This paper focuses on methods used in the literature for short-term load forecasting. Short-term load forecasting attempts to predict energy demand given a set of variables such as temperature, wind speed and humidity. Throughout the decades, methods have been improved and refined, with new machine learning techniques taking the place of older statistical methods. This review traces the evolution of the algorithms used.

There has been considerable improvement, but there is still no single algorithm which performs best, as many different factors can affect the results of the systems used.

I. INTRODUCTION

The rise of competitive energy markets and the increasing threat from global warming have increased the need for accurate, robust energy load prediction models.

There are many types of energy load prediction, from short-term predictions, which range from minutes to days in advance, to long-term predictions, which can attempt to estimate energy load up to years in advance. These long-term estimates can be used by energy companies as an indicator of how to invest in future infrastructure. It is not necessarily the case, however, that a larger lead time leads to a higher forecasting error [1]. In this review short-term load forecasting will be the main focus. Generally, when forecasting energy load, the MAPE (mean absolute percentage error) is used:

$$\mathrm{MAPE} = \frac{100}{N} \sum_{n=1}^{N} \left| \frac{A_n - F_n}{A_n} \right| \tag{1}$$

where $A_n$ is the actual value, $F_n$ is the forecast value and $N$ is the number of forecasts. The reason for this is that if the MPE (mean percentage error) were used, the average error could appear much lower due to over- and under-estimates cancelling each other out:

$$\mathrm{MPE} = \frac{100}{N} \sum_{n=1}^{N} \frac{A_n - F_n}{A_n} \tag{2}$$

where $A_n$ is the actual value and $F_n$ is the forecast value.
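The cancellation effect can be illustrated with a minimal sketch. The function names and the two load values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from any of the reviewed papers.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (Eq. 1)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def mpe(actual, forecast):
    """Mean percentage error, in percent (Eq. 2); signed errors can cancel."""
    n = len(actual)
    return 100.0 / n * sum((a - f) / a for a, f in zip(actual, forecast))


# Hypothetical hourly loads in MW: one over-forecast and one under-forecast.
actual = [100.0, 100.0]
forecast = [110.0, 90.0]

print(mape(actual, forecast))  # each forecast is off by 10%
print(mpe(actual, forecast))   # the +10% and -10% errors cancel
```

Here both forecasts miss by the same relative amount in opposite directions, so the MPE suggests a perfect model while the MAPE reports the true average miss.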

In 1984, a survey of the UK power system concluded that an increase in forecasting error of 1% would cost around £10 million more in operating costs [2]. This prediction is now 33 years old, with more modern estimates taking its place [3], [4]. These predictions show the need for a robust and efficient method for load forecasting.

There has been an explosion in artificial intelligence applications in the energy sector in recent decades. The use of algorithms that are largely non-linear and can adapt to suit multiple types of problems is vital for energy systems to predict load and flow.

These systems can be used on long- to short-term load forecasting problems, each with their relative advantages and disadvantages.

Initially, modeling of STLF (short-term load forecasting) was attempted using an ARIMA (auto-regressive integrated moving average) model, which was popularized by Box and Jenkins [5]. ANNs (artificial neural networks) were then first used by Park et al. [6], with some promising results which were built upon in the following decades.

As technology has advanced and more computationally intensive algorithms have been created, different algorithms have been used to solve this problem. These include, but are not limited to, PNARIMA (periodic non-linear ARIMA) [7], SVM (support vector machines) [8], fuzzy logic [9], [10], multiple regression techniques, and models in combination with genetic algorithms [11] and PCA (principal component analysis) [12].

In the first part of this review, STLF will be described along with the difficulties faced by researchers.

Then well-used models will be discussed, along with techniques for finding relevant variables and methods to separate them from redundant variables. Relevant papers with state-of-the-art methods shall then be discussed, focusing on the reasons behind each model and showing their relative advantages and disadvantages. Finally, the results will be stated and critiqued, with conclusions drawn from them.

II. SHORT-TERM LOAD FORECASTING

Short-term load forecasting is the process of using models to predict energy load from minutes to days in advance. It has many economic and environmental advantages, ranging from reducing unnecessary cost to reducing the creation of by-products from energy production which can damage the environment.

Much literature states that there is a noticeable difference in load on Saturday, Sunday and Monday [13].

These days follow the Christian weekly calendar, whereas in countries which practice other religions these increased energy consumption days can be seen on different days of the week. This behavior is seen in datasets from Iran [14], whose dominant religion is Islam.

Energy companies usually provide energy to different types of customers.

These include industrial, commercial and residential customers, each with a different load curve. Figure 1 [15] shows the broad differences in these load curves over the course of 24 hours.

Fig. 1. Energy demand curve over 24 hours based on sector.

It would be expected that the residential curve dips at night and starts to rise in the morning, with the highest peak in the evening. The commercial curve would show a sudden peak due to work commencing at the same time every day, and this tails off towards the night as employees go home for the evening. The industrial curve is very different from the other load curves because it demands a fairly consistent energy supply. Some industrial services, such as sewage plants, run 24 hours a day.

The only increase that would be seen in the energy demand would be a slight rise during daylight hours due to a higher volume of people using their services.

Fan et al. [1] state that anomalous days need to be treated with different schemes. Sports events, faults, extreme storms or holidays cause anomalous behaviour. To enlarge the dataset for anomalous days, such days can be gathered into a single group on which a dedicated model is trained, which should produce better results. It is not advised to aggregate data from far in the past, as energy load has seen an upwards trend, so data that is too old is no longer representative. This applies to both anomalous and non-anomalous days.

The models created for STLF can be applied to other areas in the energy sector.

As the algorithms take weather conditions into account, they are already well suited to predicting energy production from renewable energy sources such as wind and solar farms.

III. BACKGROUND

In this section, algorithms which have become popular will be discussed.

A. Load prediction algorithms

As previously mentioned, the ARIMA model was popularized by Box and Jenkins [5] in the 1970s. The ARIMA model computes a new value by calculating a linear combination of previous values.
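The idea of forecasting from a linear combination of previous values can be sketched with a purely autoregressive fit. This is only the AR part of ARIMA (no differencing and no moving-average terms), and the helper names and the synthetic series are assumptions made for illustration. For this restricted case the coefficients follow from a single least-squares solve.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar(series, p):
    """Least-squares AR(p) coefficients: x[t] ~ sum_k phi[k] * x[t-1-k]."""
    rows = [[series[t - 1 - k] for k in range(p)] for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    # Normal equations (X^T X) phi = X^T y, solved in a single step.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_next(series, phi):
    """One-step-ahead forecast as a linear combination of the latest values."""
    return sum(coef * series[-1 - k] for k, coef in enumerate(phi))


# Hypothetical noise-free series generated by x[t] = 0.6 x[t-1] + 0.3 x[t-2].
series = [1.0, 2.0]
for _ in range(20):
    series.append(0.6 * series[-1] + 0.3 * series[-2])

phi = fit_ar(series, p=2)
print(phi)                        # close to [0.6, 0.3]
print(predict_next(series, phi))  # forecast for the next, unseen step
```

A full ARIMA implementation would add differencing and moving-average terms on top of this autoregressive core.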

This model is a generalized extension of the IMA (integrated moving average) model. ARIMA exponentially decays the weights of older readings over time. ARIMA parameter tuning is done in one step, unlike in an ANN, where it is done over multiple iterations. The advantage of this is that the ARIMA model always converges to a unique solution [16] and doesn't get caught in a local minimum.

An ANN is a network which is modeled on the human brain. It is composed of units, or neurons, which are highly connected.

Each unit receives an input and transforms the data based on the activation function used, passing this output to other connected neurons until the whole network has been traversed and an output is produced. These networks can have a varying number of layers.

ANNs have become much more popular in recent years. Conventional algorithms attempt to find results by using models that create a linear combination of variables such as temperature and humidity. The forecasted results are dependent on spatio-temporal elements [6]. This increase in the use of ANNs is in line with greater research into the field and the increase in computational power.
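The way a unit transforms its inputs and passes the result on can be sketched as a small forward pass. The layer sizes, weights and input values below are hypothetical and hand-picked for readability; a real forecasting network would learn its weights from load data.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    """Linear activation, used here for the output unit."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Each unit takes a weighted sum of all inputs, then applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through every layer in turn and return the final output."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense_layer(values, weights, biases, activation)
    return values


# Hypothetical weights: 2 inputs -> 2 sigmoid hidden units -> 1 linear output.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0], sigmoid),
    ([[1.0, 1.0]], [0.0], identity),
]

# Inputs could be scaled temperature and humidity readings (illustrative values).
output = forward([0.0, 0.0], layers)
print(output)  # both hidden units give sigmoid(0) = 0.5, so the output is [1.0]
```

Adding more tuples to `layers` deepens the network without changing the traversal logic.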

Some ANNs can produce a low MAPE but have massive outliers. It is important not only to look at the overall MAPE but also to assess these outliers, because if these wild predictions were used it would be a huge detriment to the system [17].

As ANNs have the ability to model non-linear problems, they are an obvious choice for STLF. After Lee et al. [13] first introduced the use of ANNs to this problem, more researchers have advanced their models by creating multi-layered models and models combined with other algorithms which pre-process the data or work in tandem with the ANN.

Regression has been used for the STLF problem; an early example of this is [18]. Separate models were used for each hour of the day. Following this attempt, which garnered respectable results, more papers [19], [10] have been produced using variations on this method.

Expert systems were an early attempt to create intelligent systems. These systems used the knowledge of experts in the field to infer and explain new facts. Attempts were made to use expert systems in combination with other models for STLF [9], [20], but the results were not as promising as those of more modern methods.

These systems do not scale as effectively as newer techniques, meaning they are rarely used to tackle this problem.

GPs (Gaussian processes) [21] have not been used much in STLF. The ability to model uncertainty is very important, as it allows the model to be used with a known level of deviation instead of a hard prediction as in other methods. Mori et al. [22] used a GP model with stated assumptions to model data from a real Japanese data company. The model showed that there was a 100% probability that the actual reading was within three standard deviations of the prediction and a 93.67% probability of the actual reading being within two standard deviations of the prediction.
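Coverage figures of this kind can be computed from any model that outputs a predictive mean and standard deviation. The sketch below is an assumption about how such a percentage might be calculated, with hypothetical numbers; it does not reproduce the data or procedure of Mori et al.

```python
def coverage(actual, means, stds, k):
    """Percentage of actual readings within k predictive standard deviations."""
    hits = sum(
        1 for a, m, s in zip(actual, means, stds) if abs(a - m) <= k * s
    )
    return 100.0 * hits / len(actual)


# Hypothetical predictive means and standard deviations (MW) from some
# probabilistic model, and the loads that were later observed.
means = [100.0, 120.0, 90.0, 110.0]
stds = [5.0, 5.0, 4.0, 6.0]
actual = [103.0, 131.0, 89.0, 111.0]

print(coverage(actual, means, stds, k=2))  # 75.0: the second reading is 2.2 sigma out
print(coverage(actual, means, stds, k=3))  # 100.0
```

Reporting coverage alongside MAPE shows whether the stated uncertainty is trustworthy, not just whether the point forecast is close.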

It is stated that the GP model results in lower errors than other methods such as ANN and SVR.

B. Identification of redundant data

Research has been done on using genetic algorithms or PCA to remove redundant variables. This reduces the training data set to only the variables which have the greatest effect on the system. Removing these redundant features will also increase the speed at which models can be computed.

PCA is used to find the variables in the data which have the highest variance. When this technique is applied to data which includes temperature, humidity, wind speed, etc., it can identify the most varying data. The variables which have low variance will not impact the results much. The removal of these variables can improve the model and increase efficiency [12]. ACO (ant colony optimization) has also been applied to data to achieve the same result [11]. The ACO algorithm is iterative and requires more computational power than PCA [23].
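A minimal sketch of the variance-finding step in PCA is given below, using power iteration on the covariance matrix to recover the first principal component. Strictly, PCA finds directions (combinations of variables) rather than individual variables, and in practice the variables are usually standardised first; the readings here are hypothetical and left unscaled to keep the example short.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of `rows` (one row per observation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=200):
    """Power iteration: returns (variance along the component, unit direction)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return variance, v


# Hypothetical readings: [temperature, humidity, wind speed].
# The wind-speed column barely varies.
rows = [
    [10.0, 80.0, 5.0],
    [15.0, 70.0, 5.1],
    [20.0, 60.0, 4.9],
    [25.0, 50.0, 5.0],
    [30.0, 40.0, 5.1],
]

cov = covariance_matrix(rows)
variance, direction = first_principal_component(cov)
total = sum(cov[i][i] for i in range(len(cov)))
print(variance / total)  # share of total variance captured by the first component
print(direction)         # loadings: wind speed contributes almost nothing
```

In this toy data a single component captures nearly all of the variance, so the remaining directions could be dropped with little loss.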

Plotting demand against different variables [24] is another way to show whether they are highly correlated or not. This technique can show if a variable is important to the prediction.

C. Genetic Algorithms

Genetic algorithms have been used in combination with more advanced models to improve forecasting accuracy. This type of algorithm has been used to choose architectures and training parameters [17], [26].

Using genetic algorithms over other parameter optimization techniques, such as gradient-based methods, can be advantageous. This is because gradient-based methods are liable to get stuck in local minima and their performance is reliant on the initial values used. Genetic algorithms largely avoid these issues.

D. Types of variables

Many different elements which can have an effect on energy load have been studied throughout the decades.

These can be weather-based, such as wind speed, humidity and temperature, or they can be related to the time of day, day of the week or national holidays. Some weather variables, such as wind speed, can be highly stochastic, so the average reading is used [1]. There is some debate over whether temperature plays a large role in the variability of energy load. Early papers state that there is a strong correlation between energy load and temperature [6], while more recent papers suggest that temperature does not strongly affect the load [8]. These differences in opinion could be caused by the different geographical locations of the datasets. If the location is near the equator and the temperature is consistent all year round, then the variance of this variable is small, resulting in a small change in the energy load.

IV. METHODS IN LITERATURE

In this section specific papers will be discussed and the different models used will be explained.

Fan et al. [1] used a hybrid SOM (self-organizing map) and SVM model trained on data from the New York System Operator. A SOM is a type of ANN which builds a representation of the relationship between vectors of unlabeled data. The low-dimensional output was formed from clusters of the input data which had similar properties. At this stage regular days and anomalous days were filtered into two separate groups. This model was used along with 24 SVR (support vector regression) models to predict the energy load over the next 24 hours.

SVR uses a convex cost function, meaning that a unique global optimum can be found. The next load predictions are computed by using the previous time-series readings as inputs. When forecasting a load, if a previous load which is intended to be used as an input is from an anomalous day, then the forecasted load is used instead to simulate a regular day, so as not to create anomalous predictions. Each SVR model solves an optimisation problem subject to pre-set constraints. Each segment can be accurately modeled using the same regression model, as they all have similar properties. Suitable parameters for the SVM were selected using cross-validation.

The advantage of this model is that, by identifying and separating the anomalous and regular days, the authors were able to tailor the model more specifically to deal with unexpected loads. They could also create a more accurate model for the non-anomalous days.

Park et al. [6] state that conventional regression approaches cannot address the variability of elements and tend to take an average of these variables, resulting in increased inaccuracies in the model's predictions. ANNs are more adaptable models which can address these small changes. A multi-layered network was trained using the back-propagation algorithm.

This was iteratively used with the generalized delta rule as the weight update function, which used momentum to speed up convergence [27], [28]. Sigmoid hidden layers were used to reduce the error through each iteration. It is concluded that an ANN is more flexible and accurate than regression methods. An average error of 2.50% was reported. For an early attempt using this model, these results are very promising. The limited computational power available at the time would have limited the size of the model.
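The effect of momentum on convergence can be sketched on a simple error surface. The quadratic error function, learning rate and momentum value below are hypothetical stand-ins, not the network or settings used by Park et al.; the update is the usual form in which each weight change adds a fraction of the previous change.

```python
def gradient_descent_momentum(grad, w0, learning_rate, momentum, steps):
    """Delta rule with momentum: dw(t) = -lr * grad(w) + momentum * dw(t-1)."""
    w = list(w0)
    delta = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        delta = [-learning_rate * gi + momentum * di for gi, di in zip(g, delta)]
        w = [wi + di for wi, di in zip(w, delta)]
    return w


def error(w):
    """Hypothetical error surface, steep in w[1] and shallow in w[0]."""
    return 0.5 * (w[0] ** 2 + 25.0 * w[1] ** 2)


def error_gradient(w):
    """Gradient of the error surface above."""
    return [w[0], 25.0 * w[1]]


start = [10.0, 1.0]
plain = gradient_descent_momentum(error_gradient, start, 0.03, 0.0, 50)
with_momentum = gradient_descent_momentum(error_gradient, start, 0.03, 0.8, 50)

print(error(plain))          # error left after 50 plain gradient steps
print(error(with_momentum))  # smaller: momentum speeds up the shallow direction
```

Setting `momentum` to zero recovers plain gradient descent, which makes the two runs directly comparable.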

The results outlined in Table I have been taken from the methods discussed in Section IV.

Taieb et al. [24] used gradient boosting and efficient data pre-processing to create a competitive model. The model was created for a Kaggle competition in which the team finished in a respectable 5th place out of 105. The competition focused on load forecasting for 20 geographical zones. For each zone, the hourly electricity load for nine different weeks needed to be predicted without knowing the locations of the zones or stations.

Gradient boosting [25] was used to predict future loads. Data pre-processing was a crucial part of the paper by Taieb et al. By plotting demand against current temperature, it was evident that temperature was an important predictor of demand. The leap years were then removed and the log of the demand was used to stabilize the variance. In addition, outliers were replaced with the data from the same period which was closer to the mean.

The careful pre-processing of data in this case can reduce unexpected anomalies when using the forecasting method. Two models were used: the first was estimated in a linear forward fashion, and the second estimated the load in reverse. A weighted combination of the two models was then used.

The authors state that using data analysis steps helped them to identify useful variables to model.

TABLE I: COMPARATIVE RESULTS

Asber et al. [19] used a non-parametric regression approach to the problem. This means that no assumptions are made about the form of the data distribution. A linear combination time-series model was used to model the previous loads along with multiple variables. A Parzen window estimator [29] approach was taken to estimate the distribution of the dataset.
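One common way to turn a Parzen window into a forecast is kernel-weighted averaging of past observations (the Nadaraya-Watson estimator). Whether the reviewed paper uses exactly this form is an assumption; the Gaussian kernel, the bandwidth and the temperature and load pairs below are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Gaussian Parzen window; other smooth, non-negative kernels could be swapped in."""
    return math.exp(-0.5 * u * u)


def kernel_regression(query, history, bandwidth):
    """Nadaraya-Watson estimate: kernel-weighted average of past loads.

    `history` is a list of (feature, load) pairs. Every prediction scans
    the full history, so the cost grows with the size of the dataset.
    """
    weights = [gaussian_kernel((query - x) / bandwidth) for x, _ in history]
    total = sum(weights)
    return sum(w * load for w, (_, load) in zip(weights, history)) / total


# Hypothetical (temperature in deg C, load in MW) pairs from past days.
history = [(0.0, 900.0), (5.0, 800.0), (10.0, 700.0), (15.0, 650.0), (20.0, 700.0)]

near_ten = kernel_regression(10.0, history, bandwidth=2.0)
midpoint = kernel_regression(12.5, history, bandwidth=2.0)
print(near_ten)  # close to the 700 MW seen on the 10 deg C day
print(midpoint)  # between the loads of the 10 and 15 deg C days
```

No parameters are fitted in advance; the stored observations are the model, which is what makes the approach non-parametric.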

As the STLF data is a time series, the Parzen window moves through the data, giving weights to smooth the dataset. The results for this paper were promising, attaining a MAPE as low as 1.5%, but this type of model has a major disadvantage. The model holds the data from previous load readings, so to find similar days the data must be searched each time. If the dataset is large then this can become very costly and inefficient.

In this section, a wide range of different techniques was mentioned. Each method has its own advantages and disadvantages, which need to be assessed by the intended users.

Computational power, ease of use and geographical location, along with many more factors, should be taken into account when choosing a load prediction method.

V. LITERATURE RESULT ANALYSIS

The papers discussed throughout this review use different datasets. As mentioned previously, the choice of dataset can have a large impact on the results. If efficient pre-processing of the data was conducted, then it could have been assumed that variables which did not have much impact were weighted highly.

| Paper | Algorithm | Dataset | MAPE (%) |
| Charytoniuk et al. [25] | Non-parametric regression | Residential load | 1.50 |
| Charytoniuk et al. [25] | Non-parametric regression | Commercial load | 0.87 |
| Taieb et al. [31] | Gradient boosting | Anomalous days | 2.61 |
| Taieb et al. [31] | Gradient boosting | Regular days | 1.80 |
| Fan et al. [11] | Hybrid SOM/SVR network | New York city demand | 2.50 |
| Park et al. [1] | ANN | Korean electric power company | 2.00 |

Park and Lee's ANN didn't yield better results than their adaptive analytical method [30], which resulted in a percent relative error of 1.4%, an improvement on the 2% seen using the ANN. Future applications of neural networks created more complex models which include a wider range of variables and more modern data pre-processing.

The results discussed in this section are promising. It should not be assumed, however, that the model with the lowest MAPE should simply be taken and used. The model used should depend on the data at hand. Asber et al. [?] performed well with a low MAPE, but due to the need to search through the dataset for every prediction this technique may not be feasible for applications with low computational power.

As previously mentioned, predicting commercial demand is much easier than predicting residential demand.

This difference in the ease of computing accurate results should be taken into account when critiquing results.

VI. CONCLUSION

As seen in this review, there is no single widely accepted best algorithm or method for STLF. Careful pre-processing of data and combinations of models can be used to lower the error of the system. The early paper from Park et al. [6] shows that an ANN can tackle this problem and that good results can be achieved. This idea is still used today in larger and more complex systems, as shown in [1]. Many other models have been used throughout the decades, with only some becoming popular due to their respectable results.

There are still many aspects of load forecasting which should be addressed. There is an overall assumption that the main variable affecting load is the weather. This is a possible oversight, as other data which isn't being considered could be more influential on energy demand than initially thought. Whether or not this is the case, more data, especially about anomalous days, would be beneficial and could lower the MAPE.

Future work in the field should involve algorithms which can model differing degrees of uncertainty, such as Gaussian processes. Only some research has been done into this algorithm for STLF in recent years [22].

Deep ANNs, such as recurrent ANNs, which are useful for time-series analysis, could also be used along with more modern learning rules such as RMSProp and Adam.