Hyper-sphere support vector machine based on collaborative knowledge mining and SMO optimization in power load forecasting

Background: Under the influence of power market reform policies, the behaviour of power loads has become increasingly complicated: current load forecasting methods require long computation times and forecast volatile loads inaccurately, so accurate prediction of the electrical load has become both harder and more important. Against this background, this paper applies collaborative knowledge mining and SMO to solve a prediction model based on the hyper-sphere support vector machine (CKM/SMO-SVM). Methods: The study first analyzes the influence of historical data and of the different parameters on the samples; for power load prediction, both the sample data and the parameter values have a significant impact on the results. Second, weakened entropy theory is applied in collaborative knowledge mining to preprocess the sample data and historical information. Third, a short-term power load forecasting model based on the hyper-sphere support vector machine is established and solved with SMO. Finally, an SVM model and a BP model are used for prediction to verify the new model. Results: The root-mean-square relative error of the CKM/SMO-SVM model is only 2.32%, which is 0.67 and 1.56 percentage points lower than the SVM and BP models respectively, and its optimization speed is faster. Conclusions: The proposed model uses a hyper-sphere SVM suited to the Gaussian kernel to achieve faster and more accurate load forecasting, which can support energy spot transactions and energy scheduling plans.


INTRODUCTION
Short-term load forecasting of a power system plays an important role in its reliable and economic operation; with the development of the power market, short-term load forecasting has attracted more and more attention [1]. SMO (sequential minimal optimization) was put forward by Platt and applied to train support vector machines (SVM) [2]. Zhou addressed the shortcomings of conventional algorithms and simplified the SMO algorithm for non-positive-definite kernels, so that only two Lagrange multipliers need to be considered at a time [3]. Zhao, Jiang, Jin and Liu et al. (2008) improved on the slow training speed and unstable training results [4]. Aiming at this problem, Men and Wang introduced a parameter selection and optimization method for support vector machines [5].
Wang, Liu and Wang combined the SMO algorithm with the differential evolution algorithm, applied it to soft measurement of the flue gas oxygen content, and showed excellent performance on multimodal function optimization problems; the method achieved good results [6]. References [7][8] discuss how SMO can effectively reduce the sampling range.
Compared with the algorithms mentioned above, this paper combines collaborative mining association technology with weakened entropy theory, clusters the early samples and historical information, and then performs correlation pretreatment between the subsets. Hyper-sphere support vector machine theory, solved with the SMO algorithm, is finally used to forecast the power load, and a good prediction effect is achieved.

Data clustering mining
At present, weather is the most important influence factor on the short-term daily load curve; to significantly improve daily load forecasting accuracy, meteorological factors must be considered. In addition, time factors (season, whether the day is a holiday, whether it is a weekend) also exert a tremendous influence on the power load. Among the weather factors, daily maximum temperature, daily minimum temperature, daily average temperature, rainfall, humidity, cloud cover and wind speed have a great influence on the load and can be obtained from the weather forecast. These influences are summarized as ten factors: daily maximum temperature, daily minimum temperature, daily average temperature, rainfall, humidity, cloud cover, wind speed, season, whether a holiday, and whether a weekend, designated symbolically as [Z1, Z2, Z3, Z4, Z5, Z6, Z7, Z8, Z9, Z10]. Z1 to Z3 and Z5 to Z7 are each divided into three comparative values, [low, medium, high], assigned as [1, 2, 3]; Z4 is divided into four comparative values by fuzzy classification, [no rain, light rain, moderate rain, heavy rain], assigned as [0, 1, 2, 3]; Z8 is classified as spring, summer, autumn and winter, assigned as [1, 2, 3, 4]; Z9 and Z10 are classified as yes or no, with yes evaluated as 0 and no as 1 (Niu, Wang, Wu, 2010). The historical daily load can then be classified as follows. The low, medium and high thresholds for the meteorological factors can be determined from the actual situation in the region or by a fuzzy clustering method. The historical daily load curve is entered into the database every day, together with the fuzzy classification tags of that day's influence factors. In this way each day has a fuzzy classification, and the corresponding database is formed.
If, according to the date and the weather forecast, we know that the category of the prediction day is (3, 2, 1, 1, 2, 3, 3, 1, 1, 1), we extract from the historical load database all historical days with the category characteristics (3, 2, 1, 1, 2, 3, 3, 1, 1, 1) to form a fuzzy classification database; every day in this database has those characteristics. The new database, extracted from the data source and containing the mining theme, is called the mining subject database. The mining subject attribute is no longer contained in each of its data records.
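As a concrete sketch (with a hypothetical data layout: each historical day stored as a pair of its 10-element category label [Z1..Z10] and its load curve), the extraction of the mining subject database can be written as:

```python
# Extract all historical days whose fuzzy classification matches the
# forecast day's category, forming the "mining subject database".

def extract_subject_database(history, target_label):
    """history: list of (label, load_curve) pairs; label is a 10-tuple."""
    return [curve for label, curve in history if label == target_label]

# Example: two historical days, one matching the forecast day's category.
history = [
    ((3, 2, 1, 1, 2, 3, 3, 1, 1, 1), [510.2, 498.7, 480.1]),  # matches
    ((1, 1, 2, 0, 1, 2, 2, 2, 0, 1), [430.5, 421.9, 415.3]),  # does not
]
target = (3, 2, 1, 1, 2, 3, 3, 1, 1, 1)
subject_db = extract_subject_database(history, target)
```

Note that, as the text says, the matched records carry only the load curve; the mining subject attribute itself is no longer stored in them.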

Data correlation
For each mining subject database, calculate the support (s) and the confidence (c) with respect to the universal set. Support measures the importance of an association rule: it shows how representative the rule is of the universal set as a whole. Obviously, the larger the support, the more representative the rule. In practice support values are generally small because of the abundance of data. Confidence measures the accuracy of an association rule.
We define the correlation as the product of support and confidence, r_i = s × c, and use it to measure the correlation between the mining subject database and the universal set.
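A minimal illustration of how support, confidence and the correlation r_i = s × c can be computed over a toy universal set of transactions:

```python
# Support = fraction of all transactions containing the itemset;
# confidence = fraction of transactions containing the antecedent
# that also contain the consequent; correlation r_i = s * c.

def support(transactions, itemset):
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    ante = sum(1 for t in transactions if antecedent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return both / ante if ante else 0.0

transactions = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
s = support(transactions, {"A", "B"})        # 2 of 4 transactions
c = confidence(transactions, {"A"}, {"B"})   # 2 of 3 "A" transactions
r = s * c                                    # correlation r_i = s * c
```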

Weakened entropy theory
A few small-probability events may appear in the new database; to prevent them from distorting the mined association rules, we import weakened entropy theory. Entropy measures the uncertainty, suddenness and randomness in a data set: for a discrete distribution with probabilities p_1, p_2, ..., p_n, the entropy is H = -Σ_i p_i log p_i. This paper uses the weakening technique to operate only on the probabilities of the individual small-probability elements.
The weakening operation is given in eq. (2), and the subsequent derivation follows from an analysis of the mathematical characteristics of eq. (2).
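A small sketch of the entropy computation together with a hypothetical weakening rule (the paper's exact eq. (2) is not reproduced here, so the threshold-and-renormalize rule below is only an assumed stand-in for it):

```python
import math

# Shannon entropy of a discrete distribution; the "weakened entropy"
# idea only adjusts the probabilities of individual small-probability
# elements, leaving the rest of the distribution untouched.

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def weaken(p, threshold=0.05):
    # Hypothetical weakening rule: drop events below the threshold and
    # renormalize, so rare events no longer distort the mined rules.
    kept = [pi for pi in p if pi >= threshold]
    total = sum(kept)
    return [pi / total for pi in kept]

p = [0.6, 0.3, 0.08, 0.02]
h_before = entropy(p)           # entropy with the rare event present
h_after = entropy(weaken(p))    # entropy after weakening
```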

Correlation analysis for coordinated mining
The data are analyzed according to the above model, and modified association rules are then obtained. The new database differs essentially from the previous database: each attribute (item) of the maximum frequent itemsets that form the association rules appears with approximately equal probability in the new database, and its frequency is the highest.
Hyper-sphere support vector machine (SVM) regression theory

Hyper-sphere support vector machine
This paper uses the hyper-sphere OC-SVM algorithm. The idea is to find a hyper-sphere whose radius is as small as possible while containing as many of the training samples as possible.
This leads to the optimization problem

min R^2 + C Σ_i ζ_i,  subject to  ||x_i − a||^2 ≤ R^2 + ζ_i,  ζ_i ≥ 0,  i = 1, ..., l,

where R is the radius of the sphere, a is its center, the ζ_i are slack variables, and l is the number of training samples. C is the error penalty factor, which controls the degree of punishment of the outlying samples and achieves a compromise between the size of the ball and the number of samples it contains.
When the sample points are not linearly separable, the samples are mapped into a high-dimensional feature space through a nonlinear mapping, and the Lagrange function is derived from the above formulation.
Transforming to the dual problem gives

max Σ_i α_i K(x_i, x_i) − Σ_{i,j} α_i α_j K(x_i, x_j),  subject to  Σ_i α_i = 1,  0 ≤ α_i ≤ C.

Solving this QP problem yields the Lagrange multipliers α_i, from which the center and radius are determined. The points lying on the sphere play the key role in determining it and are called support vectors.
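Assuming the standard hyper-sphere dual above, the center and radius can be recovered directly from the multipliers; the toy example below uses a linear kernel and hand-picked feasible multipliers for two points:

```python
# Given dual multipliers alpha, the centre is a = sum_i alpha_i * x_i and
# the radius is the distance from the centre to any support vector
# (a point with 0 < alpha_i < C). Linear kernel, 2-D points.

x = [[0.0, 0.0], [2.0, 0.0]]     # two training points
alpha = [0.5, 0.5]               # feasible: sum(alpha) == 1, each < C

a = [sum(al * xi[d] for al, xi in zip(alpha, x)) for d in range(2)]
R = sum((x[0][d] - a[d]) ** 2 for d in range(2)) ** 0.5
```

With equal multipliers the center is the midpoint of the two points and the radius is half their distance, as geometry requires.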
Statistical learning theory, the foundation of support vector machine theory, has been studied in detail abroad, and the support vector machine is its core content, yet its applications still fall short of the theory's expected effect. One reason is that the computational efficiency of SVM algorithms is insufficient; another is the difficulty of choosing the kernel function and the SVM parameters. Therefore, in this paper the SMO optimization algorithm is combined with the support vector machine to find the relationship between prediction accuracy and the parameters, which greatly reduces the computational complexity.

The influence of penalty parameter C
The function of the penalty parameter C is to adjust, within the determined data subspace, the ratio between the confidence range of the learning machine and the empirical risk, so as to give the learning machine the best generalization ability. The optimal C differs between data subspaces. Within a given data subspace, a small C means the empirical error is penalized lightly, the complexity of the learning machine is low and the empirical risk is large, and vice versa; the former case is called "under-learning" and the latter "over-learning". When C exceeds a certain value, the complexity of the SVM reaches the maximum the data subspace allows, after which the empirical risk and the generalization ability hardly change. Each data subspace has at least one suitable C that yields the best generalization ability, as shown in Fig. 1: when C is small the error is large and decreases as C increases; as C continues to grow the error stabilizes for a certain stage; but when C becomes too large the error increases again with C.

The influence of estimation accuracy δ
The slack variables ζ_i control the width of the insensitive band and affect the number of support vectors. If the band is too narrow, the regression estimation accuracy is high but the number of support vectors increases, which increases the complexity of the SMO algorithm; if it is too wide, the estimation precision drops, the number of support vectors falls, and the complexity of the SMO algorithm is reduced. The estimation accuracy δ has an effect on the system similar to that of the slack variables ζ. Therefore, in the standard support vector machine, the parameters δ and C control the complexity of the model in different ways.

The influence of the kernel parameter σ
The Gaussian kernel parameter σ reflects the distribution or range of the training sample data and determines the width of the local neighborhood; a larger σ means a lower variance. As shown in Fig. 2, when σ → 0 the training error is small but the test error is large, an over-learning phenomenon in which the algorithm lacks generalization ability; when σ → ∞ both training and test errors are large, an under-learning phenomenon. When σ falls below a certain value the classification accuracy decreases, which shows that in practice σ should not tend to 0 in the mathematical sense. Choosing σ well is therefore of great significance for improving the classification accuracy.
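The effect of σ on the Gaussian kernel K(x, x') = exp(−||x − x'||² / (2σ²)) can be seen directly; with a tiny σ even nearby points get a near-zero kernel value (over-learning), while a huge σ makes almost everything look "near" (under-learning):

```python
import math

# Gaussian (RBF) kernel; sigma sets the width of the local neighborhood.
def rbf(x, y, sigma):
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

k_narrow = rbf([0.0], [1.0], sigma=0.1)   # ~0: neighborhood too tight
k_wide = rbf([0.0], [1.0], sigma=10.0)    # ~1: almost everything is "near"
```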
In SMO, the two Lagrange multipliers α selected for joint optimization are designated λ_1 and λ_2. Because the equality constraint Σ_i α_i = 1 must be preserved, λ_1 + λ_2 is held constant during an update, so the feasible region of λ_1 and λ_2 is the segment [max(0, λ_1 + λ_2 − C), min(C, λ_1 + λ_2)], and the optimized values of α_1 and α_2 are obtained by maximizing the dual objective over this segment. To decompose the training set into different subsets, points satisfying 0 < α_i < C are called in-bound points and the remainder boundary points.

Steps of algorithm
Step 1: Initialize α_i = 0 and R = 0 for the training set, and calculate d_i² according to eq. (9);
Step 2: Search all in-bound points for a point x_1 that violates the KKT conditions; if such a point is found, jump to Step 4;
Step 3: Search all boundary points for a point x_1 that violates the KKT conditions; if no such point is found, jump to Step 7;
Step 4: Select the point x_2 that achieves the maximum value of the selection criterion;
Step 5: Jointly optimize the two multipliers within their feasible region and update R and d_i²;
Step 6: Return to Step 2;
Step 7: The algorithm has converged; training is over.
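The steps above can be sketched as a toy SMO solver for the hyper-sphere dual. This sketch deliberately simplifies: it uses a linear kernel and full pair sweeps instead of the KKT-violation working-set selection of Steps 2–4, so it is illustrative only:

```python
# Toy SMO for the hyper-sphere dual:
#   max sum_i a_i*K[i][i] - sum_ij a_i*a_j*K[i][j],
#   s.t. sum(a) = 1, 0 <= a_i <= C.
# Each pair update keeps a_i + a_j constant (the equality constraint),
# takes a Newton step on the 1-D quadratic, and clips to the feasible box.

def svdd_smo(K, C, sweeps=200):
    n = len(K)
    a = [1.0 / n] * n                       # feasible starting point
    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                s = a[i] + a[j]             # held constant by the update
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue
                g = [sum(K[p][q] * a[q] for q in range(n)) for p in range(n)]
                grad = K[i][i] - K[j][j] - 2.0 * (g[i] - g[j])
                ai = a[i] + grad / (2.0 * eta)                  # Newton step
                ai = min(max(ai, max(0.0, s - C)), min(C, s))   # clip
                a[i], a[j] = ai, s - ai
    return a

# Four 1-D points with a linear kernel: the smallest enclosing sphere is
# centred at 1.5, so the two extreme points become the support vectors.
x = [0.0, 1.0, 2.0, 3.0]
K = [[xi * xj for xj in x] for xi in x]
alpha = svdd_smo(K, C=1.0)
center = sum(ai * xi for ai, xi in zip(alpha, x))
```

On this example the sweep converges to multipliers concentrated on the two extreme points, recovering the expected center.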

Load forecasting based on CKM/SMO-SVM optimization program
Load forecasting based on CKM/SMO-SVM optimization proceeds as follows. First, the sample data and the relevant historical information are combined using the collaborative mining association technology together with weakened entropy theory: the prior samples and historical information are fuzzily classified, clustering and correlation analysis are carried out, and the pretreatment of the input information is completed.

The parameter selection method is as follows. First, the SMO algorithm is trained with a set of parameter values chosen from experience; then another set is chosen according to the target value and training is repeated until a satisfactory model is obtained. This process is well suited to a bi-level optimization algorithm with two layers of computation: the upper layer searches for the optimal set of parameters, i.e. the set that leads to the best model formed through SMO training, while the lower layer obtains the trained model with the SMO algorithm under the parameters supplied by the upper layer.

Figure 3. Forecasting flowchart of the SMO-SVM
In this part, the concept of bi-level optimization and the method of function fitting are first introduced; then the structure and application steps of using bi-level optimization to deal with function-fitting problems are described.
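The two-layer parameter search can be sketched as an outer grid search over (C, σ, δ) with an inner training-and-scoring step; `train_and_score` below is a hypothetical stand-in for an SMO training run, with an assumed error surface chosen only for illustration:

```python
import itertools

# Outer layer: enumerate candidate (C, sigma, delta) triples.
# Inner layer: "train" a model under those parameters and report its
# validation error (here a placeholder surface with a single minimum).

def train_and_score(C, sigma, delta):
    # Placeholder: pretend the best parameters are (10, 0.5, 0.01).
    return abs(C - 10) + abs(sigma - 0.5) + abs(delta - 0.01)

grid = itertools.product([1, 10, 100], [0.1, 0.5, 2.0], [0.01, 0.1])
best = min(grid, key=lambda p: train_and_score(*p))
```

In the actual method the inner call would run SMO training and the outer layer would stop once a satisfactory model is obtained.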
The load forecasting steps based on CKM/SMO-SVM optimization (see Fig. 3) are as follows. Combine the sample data and relevant historical information using the collaborative mining association technology with weakened entropy theory; obtain the fuzzy classification of the prior samples and historical information; perform clustering and correlation analysis; and complete the preliminary data preprocessing.
Then, take the radial basis function as the kernel function, with the penalty coefficient C, the kernel parameter σ and the estimation precision δ as the objects of optimization. The basic steps are as follows: (1) Read the sample data and assign classifications to the information on weather, temperature, holidays and season, using the collaborative mining technique to form a database. Extract all data records containing the mining subject from the data source to form a new database, the mining subject database; the mining subject attribute is no longer contained in each of its data records. To test the accuracy of the model, we first predicted the 24-hour load for 7/1/2014, then calculated the relative error and the root-mean-square relative error and compared the prediction accuracy with SVM and the BP neural network. The results are shown in Table 1 and Fig. 4.
The test samples are predicted afterwards; their relative errors are calculated and shown in Fig. 5. The main goal is to analyze the stability of the model.

Analysis of predicted results
Relative error and root-mean-square relative error are used as the final evaluation metrics:

e_i = (ŷ_i − y_i) / y_i × 100%,  RMSRE = sqrt((1/n) Σ_i e_i²),

where y_i is the actual load and ŷ_i the predicted load. The root-mean-square relative error is used to contrast the prediction effect of the different models; training speeds are compared in Table 2 (error analysis with different models). (1) In terms of optimization speed: after the collaborative mining association technology pretreats the prior samples and historical data and SMO optimizes the parameters, the SVM training time for power load forecasting is only 23 seconds with a prediction error of 2.32%, while the training time using SVM alone is 161 seconds, more than six times that of the CKM/SMO-SVM algorithm. The BP neural network trains faster still, but its advantage over the CKM/SMO-SVM algorithm is not obvious, and its prediction error is 1.56 percentage points higher.
(2) The comparison in Fig. 4 shows that the approach is scientific and rational. Relevant data [24] indicate that if the absolute value of the relative error is less than 3%, the model has a reasonable level of accuracy. Measured against this 3% threshold, 22 of the 24 points of the CKM/SMO-SVM model have absolute relative errors below 3%, with the points evenly distributed close to zero. For the SVM prediction model, 8 error values are above 3%; of the other 16 points, 9 lie within 2% and the remaining 7 lie between 2% and 3%. The BP predictions have only 14 points with absolute relative error within 3%, most of them close to 3%, with the largest error reaching 4.77%.
(3) Comparing root-mean-square relative errors, that of the CKM/SMO-SVM algorithm is only 2.32%, which is 0.67 and 1.56 percentage points lower than those of the SVM model and the BP neural network. Thus, whether judged by forecast error or by load forecasting precision, the prediction accuracy of the proposed CKM/SMO-SVM model is higher than that of the plain SVM model and of the traditional BP neural network model.

Figure 5. Forecast error of test set

(4) Analyzing the relative errors of prediction over the 12 days of the test set, the prediction results of the CKM/SMO-SVM algorithm show good stability: the relative errors are all less than 4%, most of them less than 3%, and the root-mean-square relative error is 2.37%, with no single day showing markedly poor accuracy, which further proves the feasibility of the algorithm.
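The two evaluation metrics used above, relative error and root-mean-square relative error, can be computed as follows (toy numbers, not the paper's data):

```python
import math

# Relative error of each point and RMSRE across the forecast horizon.

def relative_errors(actual, predicted):
    return [(p - a) / a for a, p in zip(actual, predicted)]

def rmsre(actual, predicted):
    e = relative_errors(actual, predicted)
    return math.sqrt(sum(x * x for x in e) / len(e))

actual = [100.0, 200.0, 400.0]        # illustrative loads
predicted = [102.0, 196.0, 404.0]     # illustrative forecasts
score = rmsre(actual, predicted)      # fraction; multiply by 100 for %
```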

Conclusion
In this paper, a CKM/SMO-SVM-optimized support vector machine prediction model is analyzed in detail: collaborative data mining is combined with weakened entropy theory to preprocess the samples and historical information, the influence of the support vector machine parameters on the prediction results is analyzed, and a hyper-sphere support vector machine prediction model based on the collaborative mining association technique and SMO optimization is established, yielding a method for obtaining high-accuracy predictions. Comparing the proposed method with the SVM model and the BP neural network on actual short-term power load forecasting leads to the following conclusions: (1) In load forecasting, unpretreated early data strongly affect the subsequent training speed and, later, the prediction accuracy, because the input information is complex and quantitative and qualitative information are strongly intertwined. This paper therefore uses the collaborative mining association technology for fuzzy clustering and correlation analysis.
(2) Analysis of how the choice of support vector machine parameter values affects prediction accuracy shows that parameter selection plays a key role when using support vector machines for prediction: unreasonable parameter choices tend to cause under-learning or over-learning, which directly affects the prediction accuracy and the running time.
(3) Over the course of the prediction period, the CKM/SMO-SVM-optimized support vector machine model not only achieves higher prediction precision with relatively high stability, but also greatly improves training speed, and thus has strong practicability.