Modelling of Input Parameters for Power Generation using Regression Models

In this study, multiple linear regression models were employed to correlate gas supply and power generation, using a gas power plant in the Niger Delta, Nigeria, as a case study. From the analysis based on outlier detection, reliability analysis and test of homogeneity, it was observed that the independent variable data (ambient temperature, gas pressure and compressed temperature difference) failed the normality test. Therefore, the use of a linear model for either analysis or modelling of the data was not acceptable. In the reliability analysis, gas pressure and compressed temperature difference were positively correlated with power generation, having covariance values of 0.639 and 113.148 respectively. The ambient temperature was negatively correlated with power generation, having a covariance value of 14.564. A positive value shows that the two variables increase and decrease together, while a negative value shows that an increase in one variable is accompanied by a decrease in the other, and vice versa.


Introduction
Nigeria's natural gas reserves, estimated at about 188 trillion standard cubic feet, are the largest in Africa and are known to be substantially larger than its oil resources (Nwokeji, 2007; Izuwan, 2017). According to Oyedepo (2012), the largest single consumer of natural gas in Nigeria before its privatization in November 2013 was the Power Holding Company of Nigeria (PHCN), which accounted for over 70% of the gas used in operating electricity-generating gas plants in the country. Given the current reserves and the rate of exploitation for power generation (about 900 mmscfd), the expected life span of Nigerian natural gas is over 1000 years, which makes it a good source for power generation. As a result, the gas produced in Nigeria is used mostly in the power sector and for export as liquefied petroleum gas (LPG) (Sambo et al., 2010). Electricity plays a vital role in economic growth and social welfare, so it is essential to have accessible and reliable electricity under safe conditions (Luis et al., 2019). Rapu et al. (2015) put the average electricity generation capacity in Nigeria as fluctuating between 2,623.1 MW/hr in 2007 and 3,485.5 MW/hr in 2014, against an estimated demand of 10,000 MW per day. Enete and Alabi (2011) estimated the distribution of household final energy consumption by type in Nigeria to be 4% electricity, 13% kerosene, 1% LPG and 82% wood and others. Currently, over 70% of Nigeria's power generation comes from natural gas utilization in power plants. There are over 15 power stations in Nigeria with a total generation capacity of over 15,000 MW, but they currently generate about 7,000 MW (Onochie et al., 2015). This shortfall is attributed to various factors, including gas supply, transmission, grid capacity, plant ambient temperature and operating conditions (such as gas pressure and compressed temperature difference).
Orogun (2015) discussed gas pipeline vandalisation as one of the major challenges militating against the efforts of the Federal Government of Nigeria to utilize the country's natural gas sustainably for power generation. This challenge is compounded by the inadequacy of natural gas transmission and distribution infrastructure. The power station considered in this study is situated in Benin City, Edo State (6°24'20"N 5°41'00"E). It operates a simple cycle gas turbine with over 450 MW capacity, and its gas is supplied through the Escravos-Lagos pipeline system.
Some studies, such as Oricha and Olarinoye (2012) and Iwuamadi and Dike (2012), have shown that poor plant maintenance, operational policies and power transmission issues also affect adequate power generation in Nigeria. However, field assessments of these factors in some gas power plants show that they have less impact on plant capacity performance and economic viability than the gas supply to the plants. This paper therefore attempts to correlate gas supply and power generation using statistical regression models.

Materials and Methods
Questionnaires were designed from the data obtained from the National Integrated Power Project (NIPP) power stations and some relevant literature. Forty-three (43) variables were considered in the questionnaires, which were scaled on Rensis Likert's five (5) point attitudinal scale and administered to 150 respondents. The respondents' responses were transposed into metric variables. Gas pressure, ambient temperature, compressed temperature difference (CTD) and power generation have been identified as the relevant variables (Oyedepo et al., 2015) and were used as research parameters in this study. The correlation of gas supply and power generation of the gas power plant was examined using multiple linear regression models, supported by an assessment of data quality, a test of normality and a diagnostic analysis of the data.

Assessment of data quality
To assess the quality of the data, three tests were conducted: outlier detection, data fitness using reliability analysis, and a test of homogeneity.
Detection of outliers using the labelling rule
In this study, the labelling rule method was employed to detect the presence of outliers. The labelling rule is a statistical method of detecting outliers in a data set using the 25th percentile ($Q_1$) and the 75th percentile ($Q_3$). The lower and upper bounds are computed as follows:

$$\text{Lower bound} = Q_1 - [2.2 \times (Q_3 - Q_1)] \tag{1}$$

$$\text{Upper bound} = Q_3 + [2.2 \times (Q_3 - Q_1)] \tag{2}$$

At the 0.05 level, any data value lower than the lower bound or greater than the upper bound was considered an outlier and was removed before analysis (Levi et al., 2009).
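The labelling rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually run in the study: the function names `labelling_rule_bounds` and `flag_outliers` and the sample readings are hypothetical, while the quartiles are the CTD values reported later in the results.

```python
def labelling_rule_bounds(q1, q3, g=2.2):
    """Return (lower, upper) bounds: Q1 - g*(Q3 - Q1) and Q3 + g*(Q3 - Q1)."""
    spread = g * (q3 - q1)
    return q1 - spread, q3 + spread


def flag_outliers(values, q1, q3, g=2.2):
    """Return the values that fall outside the labelling-rule bounds."""
    lower, upper = labelling_rule_bounds(q1, q3, g)
    return [v for v in values if v < lower or v > upper]


# Quartiles reported for CTD in the results of this study.
ctd_lower, ctd_upper = labelling_rule_bounds(335.650, 350.750)
print(round(ctd_lower, 2), round(ctd_upper, 2))

# Hypothetical CTD readings, for illustration only (not study data).
sample = [300.0, 340.0, 345.5, 390.0]
print(flag_outliers(sample, 335.650, 350.750))
```

With the reported CTD quartiles, the sketch reproduces the bounds of 302.43 and 383.97 given in the results, and it flags the two hypothetical readings that lie outside them.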
Data fitness using reliability analysis
Reliability analysis of the data was done to ascertain the fitness of the data for the selected analysis. Descriptive analysis of the reliability test was based on the data scale (measured in terms of weight and order of distribution). Summary statistics were computed to obtain the data means, variances, covariances and correlations using the intra-class correlation coefficient.

Test of homogeneity
A homogeneity test was carried out to establish that the data used for the analysis (i.e. gas pressure, ambient temperature, CTD and power generation) were from the same power plant (same population). The homogeneity test is based on the cumulative deviation from the mean, expressed as follows (Raes et al., 2006):

$$S_k = \sum_{i=1}^{k} \left(X_i - \bar{X}\right), \quad k = 1, \ldots, n \tag{3}$$

where $X_i$ = the record for the series $X_1, X_2, \ldots, X_n$, $\bar{X}$ = the mean, and the $S_k$'s = the residual mass curve.
For a homogeneous record, one may expect the $S_k$'s to fluctuate around the zero-centre line in the residual mass curve, since there is no systematic pattern in the deviations of the $X_i$'s from the average value $\bar{X}$. To perform the homogeneity test, Rainbow, a software package for analysing time series data (Raes et al., 2006), was used.
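The cumulative deviation from the mean can be illustrated with a short Python sketch. It only computes the residual mass curve; the formal homogeneity statistics produced by Rainbow are not reproduced here. The function name `residual_mass_curve` and the readings are hypothetical.

```python
def residual_mass_curve(series):
    """Return [S_0, S_1, ..., S_n], where S_k sums the first k deviations from the mean."""
    mean = sum(series) / len(series)
    curve = [0.0]
    for value in series:
        curve.append(curve[-1] + (value - mean))
    return curve


# Hypothetical readings, for illustration only (not study data).
readings = [2.0, 4.0, 6.0, 4.0, 4.0]
curve = residual_mass_curve(readings)
print(curve)  # [0.0, -2.0, -2.0, 0.0, 0.0, 0.0]
```

The curve starts and ends at zero by construction. A record with no systematic shift stays close to the zero line in between, whereas a sustained run above or below it would suggest a change in the mean.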

Assessment of normality
In this study, the Jarque-Bera (JB) test for normality was employed because the sample size was large (i.e. >1000). Mathematically, the JB test is defined (Bowman and Shenton, 1975) as follows:

$$JB = n\left[\frac{\left(\sqrt{b_1}\right)^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \tag{4}$$

where $n$ = sample size, $\sqrt{b_1}$ = sample skewness, and $b_2$ = kurtosis coefficient.
The hypotheses for the JB test are:
$H_0$: Data follow a normal distribution.
$H_1$: Data do not follow a normal distribution.
In general, a large JB value indicates that the data are not normally distributed. A JB value greater than 10 means that the null hypothesis is rejected at the 5% significance level; in other words, the data do not come from a normal distribution. A JB value between 0 and 10 indicates that the data are normally distributed (Das and Imon, 2016).
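As an illustration of how the JB statistic is obtained from raw data, the following Python sketch computes the sample skewness, the kurtosis coefficient and the JB value from moment estimates. It is a sketch under stated assumptions, not the software used in the study: the function name `jarque_bera` and the sample are hypothetical, and the moments are the simple (biased) estimators that divide by n.

```python
def jarque_bera(data):
    """Return (skewness, kurtosis, JB) using moment estimators that divide by n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)
    return skewness, kurtosis, jb


# Hypothetical sample, for illustration only (not study data).
skewness, kurtosis, jb = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
print(skewness, round(kurtosis, 3), round(jb, 4))
```

For this small symmetric sample the skewness is zero and the kurtosis is 1.7, so the JB value is small; a sample with heavy tails or strong skew gives a much larger value.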

Diagnostic analysis of data
Diagnostic statistics were conducted to verify the statistical properties of the overall regression model. The selected diagnostic statistics include:
i. Heteroskedasticity test using Breusch-Pagan-Godfrey
ii. Serial correlation test using Breusch-Godfrey
iii. Variance Inflation Factor (VIF)
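The three diagnostics listed above can each be expressed through an auxiliary least squares regression. The sketch below is a minimal pure-Python illustration and not the software output reported in the study. It assumes the LM form n × R² for both the Breusch-Pagan and Breusch-Godfrey tests (with pre-sample residuals set to zero), and VIF = 1/(1 − R²). All function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(rows, y):
    """Regress y on rows (an intercept is added); return (R^2, fitted values)."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean = sum(y) / len(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - mean) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot, fitted


def residuals(rows, y):
    """OLS residuals of y regressed on rows."""
    return [v - f for v, f in zip(y, ols_fit(rows, y)[1])]


def breusch_pagan_lm(rows, y):
    """LM = n * R^2 from regressing squared residuals on the regressors."""
    squared = [e * e for e in residuals(rows, y)]
    return len(y) * ols_fit(rows, squared)[0]


def breusch_godfrey_lm(rows, y, lags=1):
    """LM = n * R^2 from regressing residuals on regressors and lagged residuals."""
    e = residuals(rows, y)
    augmented = [list(row) + [e[t - l] if t >= l else 0.0 for l in range(1, lags + 1)]
                 for t, row in enumerate(rows)]
    return len(y) * ols_fit(augmented, e)[0]


def vif(rows):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the remaining columns."""
    factors = []
    for j in range(len(rows[0])):
        others = [[v for i, v in enumerate(row) if i != j] for row in rows]
        target = [row[j] for row in rows]
        factors.append(1.0 / (1.0 - ols_fit(others, target)[0]))
    return factors


# Hypothetical data: the error alternates in sign and grows with x.
x = [float(i) for i in range(1, 13)]
y = [2.0 * v + (0.5 * v if i % 2 else -0.5 * v) for i, v in enumerate(x)]
rows = [[v] for v in x]
bp = breusch_pagan_lm(rows, y)
bg = breusch_godfrey_lm(rows, y)

# Hypothetical pair of correlated predictors for the VIF.
predictors = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
factors = vif(predictors)
print(bp > 3.841, bg > 3.841, [round(v, 3) for v in factors])
```

Each LM value is compared with a chi-square critical value (3.841 at the 5% level for one degree of freedom). The hypothetical data are built so that both the heteroskedasticity and serial correlation statistics exceed it, while the two predictors give a VIF well below the threshold of 10 used in this study.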

Data quality assessment results
The results of the outlier detection test, the data fitness test using reliability analysis, and the test of homogeneity are discussed below.
Data fitness test result using the labelling rule
Results of the computed percentiles for both the dependent and independent variables are presented in Table 1. Using the weighted average shown in Table 1, the 25th percentile ($Q_1$) for gas pressure was observed to be 20.600 while the 75th percentile ($Q_3$) was observed to be 21.500. Substituting into Eqns. (1) and (2), the lower and upper bound statistics were computed to be 18.62 and 23.48 respectively (Levi et al., 2009).
From Table 1, the 25th percentile ($Q_1$) for ambient temperature was observed to be 27.000 while the 75th percentile ($Q_3$) was observed to be 29.300. The lower and upper bounds were again obtained using Eqns. (1) and (2).
The extreme value statistics of CTD are shown in Table 4. The 25th percentile ($Q_1$) for CTD was seen to be 335.650 while the 75th percentile ($Q_3$) was seen to be 350.750. The lower and upper bound statistics were computed as 302.43 and 383.97 using Eqns. (1) and (2).
Table 6 shows the summary statistics of the data means, variances, covariances and correlations using the intra-class correlation coefficient. Fisher's probability test (F-test) was used for the analysis, and the results obtained are presented in Table 7.
Table 7 shows that gas pressure and compressed temperature difference were positively correlated with power generation, with covariance values of 0.639 and 113.148 respectively. The large covariance value of compressed temperature difference indicates that this variable has an overriding influence on power generation compared to gas pressure and ambient temperature. Ambient temperature, by contrast, was seen to be negatively correlated with power generation. The computed correlation coefficients of 0.051 for gas pressure, -0.063 for ambient temperature and 0.311 for compressed temperature difference were observed to be relatively weak, which is indicative of the absence of a collinearity problem in the regression variables. The highest coefficient (+0.311), between compressed temperature difference and power generation, did not pose any challenge of multicollinearity. Hence, we can conclude that there is no issue of multicollinearity and that the regression variables are correlated with the dependent variable. This is evident in the intra-class correlation coefficient presented in Table 8.
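To make the distinction between covariance and correlation concrete, the following Python sketch computes both for hypothetical readings. The function names and data are illustrative assumptions, not study data. It shows that the sign of either measure gives the direction of the relationship, while the size of a covariance depends on the units of the variables and the correlation coefficient does not.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation coefficient: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical readings, for illustration only (not study data).
output = [1.0, 2.0, 3.0, 4.0, 5.0]
driver = [2.0, 4.0, 6.0, 8.0, 10.0]
driver_rescaled = [100.0 * v for v in driver]  # same quantity in smaller units
opposing = [10.0, 8.0, 6.0, 4.0, 2.0]

print(covariance(output, driver), covariance(output, driver_rescaled))
print(correlation(output, driver), correlation(output, driver_rescaled))
print(covariance(output, opposing), correlation(output, opposing))
```

Rescaling the driver multiplies the covariance by 100 but leaves the correlation unchanged, which is why the correlation coefficients are the scale-free quantities to compare across variables measured in different units.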
Again, we observed from Table 8 that the single-measure and average-measure intra-class correlation coefficients are relatively weak (0.046 and 0.161), which is indicative of the absence of multicollinearity. To ascertain the reliability of the data, a one-way analysis of variance (ANOVA) was generated and is presented in Table 9. At the 0.05 significance level, with a computed p-value of 0.000 as observed in Table 9, the null hypothesis was accepted and it was concluded that the data are good and can be employed for further analysis.
For the homogeneity test of the gas pressure data, the null and alternate hypotheses were tested at the 90%, 95% and 99% confidence levels (i.e. significance levels of 0.1, 0.05 and 0.01), as shown in Figure 1. The gas pressure data fluctuated around the zero-centre line of the residual mass curve in Figure 1, an indication that the data were statistically homogeneous. A further test of homogeneity was done using the homogeneity statistics to check the strength of the null hypothesis over the alternate hypothesis. Based on the result obtained, the null hypothesis ($H_0$) was accepted, and we concluded that the gas pressure data were statistically homogeneous at the 90%, 95% and 99% confidence levels.
The homogeneity test hypotheses for ambient temperature are as follows:
$H_0$: Data are statistically homogeneous.
$H_1$: Data are not homogeneous.

Figure 2: Homogeneity test of ambient temperature data
The ambient temperature data (Figure 2) fluctuated around the zero-centre line of the residual mass curve, an indication that the data were statistically homogeneous. The homogeneity statistics were used to check the strength of the null hypothesis over the alternate hypothesis. Based on the result obtained, the null hypothesis ($H_0$) was accepted, and it was concluded that the ambient temperature data were statistically homogeneous at the 90%, 95% and 99% confidence levels.
The hypotheses of the homogeneity test of the compressed temperature difference data are:
$H_0$: Data are statistically homogeneous.
$H_1$: Data are not homogeneous.

Figure 3: Homogeneity test of compressed temperature difference data
From Figure 3, the compressed temperature difference data fluctuated around the zero-centre line of the residual mass curve, an indication that the data were statistically homogeneous. The homogeneity statistics were used to check the strength of the null hypothesis over the alternate hypothesis. Based on the result obtained, the null hypothesis ($H_0$) was accepted, and it was concluded that the compressed temperature difference data were statistically homogeneous at the 90%, 95% and 99% confidence levels.
The hypotheses of the homogeneity test of the power generation data are:
$H_0$: Data are statistically homogeneous.
$H_1$: Data are not homogeneous.
The null and alternate hypotheses were tested at the 90%, 95% and 99% confidence levels (i.e. significance levels of 0.1, 0.05 and 0.01), as shown in Figure 4. Figure 4 showed that the power generation data fluctuated around the zero-centre line of the residual mass curve, an indication that the data are statistically homogeneous. The homogeneity statistics were used to check the strength of the null hypothesis over the alternate hypothesis. Based on the result obtained, the null hypothesis ($H_0$) was accepted, and it was concluded that the power generation data were statistically homogeneous at the 90%, 95% and 99% confidence levels.

Normality test results
The normality test was done for the one dependent and three independent variables using the JB test for normality. Figure 5 shows the results of the normality test of gas pressure. The skewness and kurtosis values in Figure 5 indicate that the gas pressure data are not normally distributed. For normality, the skewness coefficient should not be greater than 1 and the kurtosis should not be greater than 3 (Bai and Ng, 2005). The JB value of 93752.18 and the probability (p-value) of 0.00% observed in Figure 5 also indicate that the gas pressure data are not normally distributed. A JB value >10 means that the null hypothesis is rejected at that level of significance (Das and Imon, 2016), meaning that the data did not come from a normal distribution. Since the JB test value is greater than 10 and the p-value is less than the 5% significance level, the null hypothesis was rejected and it was concluded that the data are not from a normal distribution.
The normality test result of ambient temperature is shown in Figure 6. The results in Figure 6 indicate that the ambient temperature data are not normally distributed. The JB value of 16603284 and the probability (p-value) of 0.00% observed in Figure 6 indicate that the ambient temperature data are not normally distributed. Since the JB test value is greater than 10 and the p-value is less than the 5% significance level, the null hypothesis was rejected and it was concluded that the data are not from a normal distribution. Figure 7 shows the result of the normality test of compressed temperature difference.

Figure 7: Normality test of compressed temperature difference data
A skewness coefficient of -5.370987 shows that the data are negatively skewed, an indication that the data are not normally distributed. The kurtosis value of 59.20195 observed in Figure 7 is also an indication that the data are not from a normal population distribution. The JB value of 153198.2 and the probability (p-value) of 0.00% also indicated that the compressed temperature difference data were not normally distributed. Since the JB test value is greater than 10 and the p-value is less than the 5% significance level, the null hypothesis was rejected and it was concluded that the data are not from a normal distribution. Figure 8 shows the result of the normality test of power generation data.

Figure 8: Normality test of power generation data
A skewness coefficient of -0.243075 shows that the data are negatively skewed, an indication that the data are not normally distributed. The kurtosis value of 5.118130 observed in Figure 8 is also an indication that the data are not from a normal population distribution. The JB value of 220.9885 and the probability (p-value) of 0.00% observed in Figure 8 also indicate that the power generation data are not normally distributed. Since the JB test value is greater than 10 and the p-value is less than the 5% significance level, the null hypothesis was rejected and it was concluded that the data are not from a normal distribution.
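The reported JB values can be cross-checked against the reported skewness and kurtosis coefficients using the standard JB formula. The sketch below is an arithmetic check only. The sample size is not stated in the text beyond being larger than 1000, so `ASSUMED_N = 1123` is an assumption, chosen because it brings the computed values close to the reported ones.

```python
def jb_from_moments(n, skewness, kurtosis):
    """JB statistic from the sample size, skewness and kurtosis coefficient."""
    return n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)


ASSUMED_N = 1123  # assumption: not stated in the source, only that n > 1000

# Skewness and kurtosis as reported for Figures 7 and 8.
jb_ctd = jb_from_moments(ASSUMED_N, -5.370987, 59.20195)
jb_power = jb_from_moments(ASSUMED_N, -0.243075, 5.118130)
print(jb_ctd, jb_power)
```

Under this assumed sample size, the computed values agree closely with the reported JB values of 153198.2 for compressed temperature difference and 220.9885 for power generation. The kurtosis term dominates in both cases.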

Results of the Diagnostic Analysis of Data
The diagnostic statistical analyses done in this study include: Heteroskedasticity test using Breusch-Pagan Godfrey; Serial Correlation test using Breusch Godfrey; and Variance Inflation Factor (VIF).

Heteroskedasticity test
The result of the heteroskedasticity test using the Breusch-Pagan-Godfrey method showed that (i) the calculated p-value based on the F-statistic is 0.0000 and (ii) the calculated p-value based on the Lagrange multiplier (LM) is 0.0000. Since both computed p-values are less than 0.05 (P < 0.05), we rejected the null hypothesis of homoskedasticity and concluded that heteroskedasticity is present in the data (Astivia and Zumbo, 2019).

Serial correlation test result
The result of the serial correlation LM test using the Breusch-Godfrey method indicated that (i) the calculated p-value based on the F-statistic is 0.0000 and (ii) the calculated p-value based on the LM is 0.0000.
Since both computed p-values are less than 0.05 (P < 0.05), we rejected the null hypothesis of no serial correlation and concluded that serial correlation is present in the data.

The Variance Inflation Factor (VIF) result
The calculated VIF for each of the selected variables was observed to be less than 10. Since the computed variance inflation factors (centred VIF) for the selected independent variables were less than 10, it was concluded that there was no multicollinearity among them (Montgomery, 2005). The output of the regression analysis is presented in Table 10. Finally, the reliance of the dependent variable on the selected independent variables was evaluated using the coded least squares regression equation shown in Eqn. (5).
Log function method
The ANOVA in Table 12 indicated that the log function model is significant at the 0.05 level. The log function equation was developed using the unstandardized coefficients. The summary of the developed mathematical models is shown in Table 13, and these models can be used to predict power generation.

Conclusions
The output of the gas power plant and the quantitative variables were not linearly related, and the relationship could best be described by a non-linear regression model. The study provided a systematic process for identifying factors that are capable of influencing the generation of power in gas power plants. One dependent variable (power generation) and three independent variables (gas pressure, ambient temperature and compressed temperature difference) were used for this analysis. The three numeric variables were observed to play a key role in assessing the relationship between input and output parameters in a gas power plant. The study also showed that ambient temperature and compressed temperature difference had a very strong influence on the dependent variable when compared to gas pressure.