Intelligent Modeling Approach to Predict Effluent Quality of Wastewater Treatment Process

Monitoring effluent quality remains a challenge in the wastewater treatment process (WWTP). To provide a reliable tool for the online monitoring of effluent quality, an intelligent modeling approach, which consists of online sensors and an effluent quality predicting plant, is developed in this chapter. The approach is based on a self-organizing fuzzy neural network (SOFNN) and enhances the modeling performance by organizing the structure and adjusting the parameters simultaneously. Experimental studies of the intelligent modeling approach have been performed on several systems to verify its effectiveness. Comparisons with other existing methods demonstrate that the intelligent modeling approach performs better.


Introduction
In recent years, as the wastewater treatment situation has become increasingly severe, more and more stringent effluent limits and regulations have been implemented to reduce the negative impact on water bodies and the environment [1][2][3]. It is therefore important and desirable to predict the effluent quality in real time, since infrequent and inaccurate measurements of the effluent parameters may lead to poor system performance, large operational costs and wrong management decisions [4][5][6]. Designing a predictor that can monitor the effluent accurately and adapt to dynamic operating conditions is still challenging [4,7].
Conventionally, the effluent quality indices are measured by off-line or online instruments [8,9]. However, such measurements are slow, requiring several minutes to hours [10,11]. The dynamic conditions in biological treatment processes, such as the complex activated sludge process, make the measurement challenging [12]. Therefore, prediction modeling methods based on online sensors have attracted great attention. Wen et al. used an equation derived from the material balance to calculate the suspended solid concentration, which was then employed to predict the treatment results through the sludge [13]. Yu et al. proposed two mechanism models, based on linear regression analyses of experimental results from two anaerobic filters, to predict the effect of recirculation on the effluent quality of anaerobic filters [14]. The prediction ability was verified by several experiments, and superior results were obtained. Bhowmick et al. presented a mathematical model based on the dynamic wave method to simulate the effluent quality of the treatment system [15]. The abovementioned methods have realized the online prediction of the effluent quality. However, considering the complexity and nonlinearity of WWTP, it is reasonable to design an adaptive prediction model to improve the accuracy of the online prediction.
To improve the adaptive ability of the online prediction model, intelligent methods based on data-driven approaches have attracted extensive attention [16,17]. Zhao et al. presented a partial least-squares-based extreme learning machine to enhance the estimation performance in terms of accuracy and reliability for effluent quality indices [18]. The experimental results showed that the proposed prediction model could effectively capture the input-output relationship with favorable performance. Pai et al. applied five types of gray models to predict suspended solids, chemical oxygen demand and pH in the effluent from a wastewater treatment plant [19]. The results revealed that the gray models could predict the industrial effluent variation successfully. To improve the model accuracy, Perendeci et al. used a neural fuzzy model, based on an adaptive network-based fuzzy inference system, to estimate the effluent chemical oxygen demand from the related process variables [20]. An acceptable correlation coefficient (0.8354) and root mean square error (0.1247) were found between the estimated and measured values of the system output variable, effluent chemical oxygen demand. However, considering the dynamic properties of WWTP, it is difficult to determine reasonable fuzzy rules in this adaptive network-based fuzzy inference system. To address this problem, Han et al. designed a flexible structure radial basis function neural network (FS-RBFNN) and applied it to estimate the water quality [21]. This FS-RBFNN could vary its structure dynamically in order to maintain the prediction accuracy, but it had poor interpretability.
Considering the learning ability of neural networks and the interpretability of rule-based fuzzy systems, an intelligent method based on a self-organizing fuzzy neural network (SOFNN) is developed to realize the online prediction of the effluent indices. The main advantages of this prediction model are summarized as follows. First, an efficient second-order algorithm is designed to adjust the parameters of SOFNN, which improves the learning capability. Second, the structure of SOFNN can be self-organized based on the relative importance index of each rule in the learning process. The fuzzy rules can be generated or pruned automatically to reduce the computational complexity and improve the generalization power of SOFNN.

Wastewater treatment process
WWTP is a large nonlinear system subject to large perturbations in influent flow rate and pollutant load, together with uncertainties concerning the composition of the incoming wastewater. It is also a complex reaction process, which involves biological, physical and chemical reactions. The most popular technology for wastewater treatment is the activated sludge process (ASP). The simplified flow chart of ASP is shown in Figure 1; it consists of a primary sedimentation tank, a biochemical reaction tank and a secondary sedimentation tank. First, the dynamically changing influent flows into the primary sedimentation tank, where the suspended solids are removed. Then, the wastewater is further processed in the biochemical reaction unit, where nitrification and denitrification are combined to achieve biological nitrogen removal. After that, the treated water that meets the standard is discharged from the top of the secondary sedimentation tank, and the sludge is returned to the biochemical reaction unit from the bottom of the secondary sedimentation tank. During the reaction process, numerous process variables influence the treatment performance.
Effluent quality, an important performance measure that reflects the treatment results, can provide a basis for water treatment plant management decisions to minimize the microbial risks and optimize the treatment operation. Standard effluent quality requires that the effluent indices, such as effluent ammonia nitrogen, effluent total nitrogen and effluent suspended solids, remain within the required limits. Although the effluent quality indices can be measured directly by laboratory analysis, a significant time delay, which may range from a matter of minutes to a few days, is always unavoidable. This lack of suitable real-time process variable information limits the effective control of effluent quality. Therefore, an online prediction model is essential for estimating water quality parameters. Since an approach based on neural networks does not make any assumptions about the functional relationship between the dependent and independent variables, it is suitable for capturing functional relationships between bacterial levels and other variables.

Intelligent modeling approach based on SOFNN
An intelligent method based on SOFNN is proposed to predict the effluent ammonia nitrogen (S_NH) in urban WWTP. The main challenges are the selection of the principal process variables, the construction of the model structure and the adjustment of the model parameters.

Selection of principal process variables
To determine the principal process variables of the effluent S_NH, mechanism analysis is first applied to identify the related process variables, and then principal component analysis (PCA) is introduced to reduce the dimension of the original process variables. This method has the advantage of extracting the important information from the coupled process variables and reducing the computational complexity of prediction models.
For the effluent S_NH, the mechanism models are described by Eqs. (1)-(8), where Y_A is the autotrophic bacteria yield coefficient of chemical oxygen demand (COD), i_{N,BM}, i_{N,S1}, i_{N,XS} and i_{N,X1} are the parameters of nitrogen content, f_{s1} is the proportion of inert COD in granular matrix, f_{X1} is the proportion of inert COD in oxide, K_h is the hydrolysis rate constant, μ_AUT is the maximum growth rate, X_S is the slowly biodegradable substrate, K_NH4 is the autotrophic bacteria half-saturation coefficient of nitrogen, K_O2 is the heterotrophic bacteria half-saturation coefficient of oxygen, K_NO3 is the heterotrophic bacteria half-saturation coefficient of nitrate, K_S is the heterotrophic bacteria half-saturation coefficient of COD, K_P is the phosphorus storage saturation coefficient, X_P is the particulate products arising from biomass decay, X_H is the heterotrophic biomass, b_AUT is the decay rate, K_ALK is the growth factor of alkalinity, S_O2 is the dissolved oxygen, S_NO3 is the nitrate, S_PO4 is the total phosphorus, S_ALK is the alkalinity and X_AUT is the autotrophic concentration.
According to the mechanism models in Eqs. (1)-(8), the process variables related to the effluent S_NH are S_NO3, X_S, S_O2, S_PO4, S_ALK, X_AUT and X_H. Combined with the real data collected from urban WWTP, oxidation-reduction potential (ORP), total suspended solids (TSS), temperature (T), pH, influent ammonia nitrogen (S_NH,i) and effluent nitrate nitrogen (S_NO,e) are also considered as influencing variables of the effluent S_NH. Then, PCA is utilized to select the principal variables from the 13 related variables.
To reduce the dimension of the process variables, the first step is to remove abnormal data using the standard deviation of each variable, where σ_i is the standard deviation and ū_i is the average value of the ith column of sample data. A sample whose error with respect to the average value exceeds the limit set by σ_i is considered abnormal and is removed. Because the 13 columns of process variables have different magnitudes, the data are then normalized as u_inorm = (u_i − u_imin)/(u_imax − u_imin), where u_inorm is the value after normalization and u_imin and u_imax are the minimum and maximum of the ith column of sample data, respectively. After normalization, all the sample data lie within [0, 1]. It is worth noting that the testing outputs should be denormalized to the original ranges.
Then, the covariance matrix S, whose elements r_{m,m} are correlation coefficients, is calculated and decomposed as S = V Λ V^T, where Λ is a diagonal matrix of the eigenvalues associated with the eigenvectors contained in the columns of matrix V. The contribution rate of each component is calculated from Λ, and the principal component factor loading matrix P is then calculated from Λ and V. The data are projected onto the new space to give the projected matrix T, and the matrix E is used to detect misbehavior in the modeling process.
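The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: the threshold `k = 3.0` (a three-sigma rule) is an assumption, since the exact rejection limit is not reproduced in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev

def remove_outliers(rows, k=3.0):
    """Drop every row that has a value more than k standard deviations
    from its column mean. k = 3.0 is an assumed threshold."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return [
        list(row) for row in rows
        if all(abs(v - m) <= k * s for v, m, s in zip(row, means, stds))
    ]

def fit_min_max(rows):
    """Return the per-column minima and maxima used for scaling."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def normalize(rows, mins, maxs):
    """Min-max scale each column into [0, 1]; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

def denormalize(rows, mins, maxs):
    """Map scaled values back to their original ranges."""
    return [
        [v * (hi - lo) + lo for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]
```

The minima and maxima are fitted once and reused, so that model outputs obtained on testing data can be mapped back to the original range with the same constants.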
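For orientation, a standard PCA formulation consistent with this description is sketched below. It is an assumption about the form of the missing equations, not a reproduction of them: X denotes the normalized, column-centered data matrix with n samples and m = 13 variables, λ_i are the diagonal entries of Λ, η_i is the contribution rate, and P is taken here as the first q columns of V (the chapter computes its loading matrix from both Λ and V, so its scaling may differ).

```latex
\begin{align}
S &= \frac{1}{n-1} X^{T} X = V \Lambda V^{T}, \\
\eta_i &= \frac{\lambda_i}{\sum_{j=1}^{m} \lambda_j}, \qquad i = 1, \dots, m, \\
T &= X P, \qquad X = T P^{T} + E .
\end{align}
```

Under these assumptions, the number of retained components q is chosen so that the cumulative contribution rate of the first q components is large enough, and E collects the part of X that the retained components do not explain.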

Self-organizing fuzzy neural network
To predict the effluent S_NH from the principal process variables, a multi-input and single-output SOFNN is developed. The structure of the fuzzy neural network is shown in Figure 2.
The mathematical description of this multi-input and single-output fuzzy neural network is as follows. The output of the output layer, ĝ, is the weighted sum of the outputs of the normalized layer, where W = [w_1, w_2, …, w_P] are the weights between the output layer and the normalized layer, P is the number of neurons in the normalized layer and v = [v_1, v_2, …, v_P]^T is the output of the normalized layer, with v_l the output of the lth normalized neuron. The number of neurons in the radial basis function (RBF) layer is equal to the number of neurons in the normalized layer. ϕ_j is the output value of the jth RBF neuron, and c_j = [c_1j, c_2j, …, c_kj] and σ_j = [σ_1j, σ_2j, …, σ_kj] are the vectors of centers and widths of the jth RBF neuron, respectively. Here x = [x_1, x_2, …, x_k] is the input vector of the input layer and U = [u_1, u_2, …, u_k] is the input of the RBF layer.
Following the computation procedure of the Levenberg-Marquardt algorithm, the update rule of the adaptive second-order algorithm for the parameters of the fuzzy neural network is given by Eq. (20), where Ψ(t) is the quasi-Hessian matrix, Ω(t) is the gradient vector, I is the identity matrix, which is employed to avoid ill-conditioning when the matrix is inverted, and λ(t) is the adaptive learning rate. The learning rate is defined in terms of τ_max(t) and τ_min(t), the maximum and minimum eigenvalues of Ψ(t), respectively (0 < τ_min(t) < τ_max(t), 0 < λ(t) < 1). The variable vector Θ(t) contains three kinds of variables: the output parameter matrix W, the center vector c and the width vector σ. In this adaptive second-order optimization algorithm, W, c and σ can be optimized simultaneously. The quasi-Hessian matrix Ψ(t) and the gradient vector Ω(t) are accumulated as the sum of related submatrices and vectors.
Here e(t) is the error between the network output and the real output at time t, and j(t) is the Jacobian vector. With Eqs. (28)-(32), all the elements of the Jacobian vector j(t) can be calculated. Then, the quasi-Hessian matrix Ψ(t) and the gradient vector Ω(t) are obtained from Eqs. (24)-(25), so that the update rule (20) can be applied to parameter adjustment.
To grow or prune the structure of the fuzzy neural network, a relative importance index is utilized. The values of the relative importance index can be used to determine the proportion of output values in a multiple regression equation. A relative importance index is defined for each neuron in the normalized layer, where R_k(t) is the relative importance index of the kth normalized neuron at time t. The regression coefficients B(t) = [b_1(t), b_2(t), …, b_P(t)]^T and A(t) = [a_1(t), a_2(t), …, a_P(t)] (a_l = [a_1l(t), …, a_Pl(t)]^T) are calculated from the eigenvectors of (Î(t))^T Î(t) and from Δ(t), the singular value matrix of Î(t), where Î_l(t) = [w_l(t) × v_l(x(t)), w_l(t) × v_l(x(t−1)), …, w_l(t) × v_l(x(t−T+1))]^T and T is the preset number of samples. The relative importance index of each normalized neuron represents the contribution of each normalized neuron to each output neuron.
Before introducing the self-organizing mechanism, the output error E(Θ(t)) is defined in terms of y(t) and ĝ(t), the desired and actual output values, respectively.
The procedure of the proposed self-organizing mechanism is given as follows:
1. Growing phase.
If E(Θ(t)) is larger than E(Θ(t−1)), a new neuron is inserted into the normalized layer. The parameters of the new normalized neuron are determined from the normalized neuron with the largest relative importance index, R_m(t), where m denotes that neuron. For the new normalized neuron, c_new(t) and σ_new(t) are the center vector and width vector, respectively, and w_new(t) is the weight; c_m(t) and σ_m(t) are the center vector and width vector of the mth normalized neuron, respectively.
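The forward computation of such a network can be sketched as below. This is a hedged illustration rather than the chapter's exact equations: it assumes Gaussian membership functions of the form exp(−(u_i − c_ij)² / (2σ_ij²)) multiplied across inputs, and it takes the RBF-layer input u to be equal to x. The function name `fnn_forward` is hypothetical.

```python
import math

def fnn_forward(x, centers, widths, weights):
    """Forward pass of a multi-input, single-output fuzzy neural network.

    x       : list of k inputs
    centers : P lists of k centers (c_j)
    widths  : P lists of k widths (sigma_j)
    weights : P output weights (w_l)
    Returns (g_hat, v), the network output and the normalized-layer outputs.
    """
    # RBF layer: assumed Gaussian form, one firing strength per rule.
    phi = []
    for c_j, s_j in zip(centers, widths):
        exponent = sum(
            (x_i - c) ** 2 / (2.0 * s ** 2)
            for x_i, c, s in zip(x, c_j, s_j)
        )
        phi.append(math.exp(-exponent))

    # Normalized layer: each firing strength divided by their sum.
    total = sum(phi)
    if total == 0.0:  # all rules underflowed; fall back to equal shares
        v = [1.0 / len(phi)] * len(phi)
    else:
        v = [p / total for p in phi]

    # Output layer: weighted sum of the normalized outputs.
    g_hat = sum(w * v_l for w, v_l in zip(weights, v))
    return g_hat, v
```

Because the normalized outputs sum to one, the prediction is a convex combination of the weights, which is what makes each rule's contribution easy to interpret.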
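For reference, the generic Levenberg-Marquardt form on which such an update rule is built can be written as follows. This is a sketch under stated assumptions, not the chapter's Eq. (20): it takes e_p = ĝ_p − y_p as the error of sample p, j_p as the row vector of partial derivatives of e_p with respect to the parameters, and N as the number of accumulated samples. With the opposite sign convention for the error, the sign of the correction term flips. The definition of the adaptive learning rate λ(t) is not reconstructed here.

```latex
\begin{align}
\Theta(t+1) &= \Theta(t) - \bigl(\Psi(t) + \lambda(t) I\bigr)^{-1} \Omega(t), \\
\Psi(t) &= \sum_{p=1}^{N} j_p^{T}(t)\, j_p(t), \qquad
\Omega(t) = \sum_{p=1}^{N} j_p^{T}(t)\, e_p(t), \\
j_p(t) &= \left[ \frac{\partial e_p}{\partial w_1}, \dots, \frac{\partial e_p}{\partial w_P},\;
\frac{\partial e_p}{\partial c_{11}}, \dots, \frac{\partial e_p}{\partial c_{kP}},\;
\frac{\partial e_p}{\partial \sigma_{11}}, \dots, \frac{\partial e_p}{\partial \sigma_{kP}} \right].
\end{align}
```

Accumulating Ψ and Ω sample by sample avoids storing the full Jacobian matrix, and the term λ(t)I keeps the matrix to be inverted well conditioned.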

2. Pruning phase.
In the training process, if E(Θ(t)) is less than E(Θ(t−1)) and the pruning condition on the relative importance index is satisfied, the hth normalized neuron is pruned and the parameters of the remaining normalized neurons are updated. Here the h'th normalized neuron is the one nearest to the hth normalized neuron, that is, the one with the smallest Euclidean distance; w′_h(t) and w′_h'(t) are the hth weight and the h'th weight after pruning the hth normalized neuron, respectively; c′_h(t) and σ′_h(t) are the center vector and width vector of the hth normalized neuron after the neuron is pruned, respectively; and c′_h'(t) and σ′_h'(t) are the center vector and width vector of the h'th normalized neuron after the neuron is pruned, respectively.
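The control flow of the two phases can be sketched as follows. Only the decision logic follows the text; the details are illustrative assumptions: the new neuron's center is placed midway between c_m and the current input, its width and weight are copied from neuron m, the pruning test is a fixed `prune_threshold` on the smallest relative importance index, and the pruned weight is simply added to the nearest neuron. The relative importance indices are passed in precomputed, and all names are hypothetical.

```python
import math

def self_organize(rules, importance, err_now, err_prev, x, prune_threshold=0.01):
    """Grow or prune the rule base once; returns a new list of rules.

    rules      : list of dicts with keys 'c' (centers), 's' (widths), 'w' (weight)
    importance : relative importance index of each rule (precomputed)
    x          : current input vector
    """
    rules = [dict(r) for r in rules]  # copy so the caller's rules stay intact

    if err_now > err_prev:
        # Growing phase: derive the new rule from the most important rule m.
        m = max(range(len(rules)), key=lambda i: importance[i])
        rules.append({
            "c": [0.5 * (c + x_i) for c, x_i in zip(rules[m]["c"], x)],  # assumed
            "s": list(rules[m]["s"]),                                    # assumed
            "w": rules[m]["w"],                                          # assumed
        })
    elif err_now < err_prev and len(rules) > 1:
        # Pruning phase: candidate h is the least important rule.
        h = min(range(len(rules)), key=lambda i: importance[i])
        if importance[h] < prune_threshold:  # assumed pruning condition
            # Nearest remaining rule h' by Euclidean distance between centers.
            nearest = min(
                (i for i in range(len(rules)) if i != h),
                key=lambda i: math.dist(rules[i]["c"], rules[h]["c"]),
            )
            rules[nearest]["w"] += rules[h]["w"]  # assumed compensation
            del rules[h]
    return rules
```

When the error is unchanged, the structure is left as it is and only the parameter update is applied.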

Simulation results and analysis
In this section, the effectiveness of the proposed intelligent modeling method based on SOFNN is evaluated. A brief introduction to experimental setup is provided before the experimental results are detailed.

Experimental setup
The performance of the online prediction for the effluent S_NH depends heavily on the choice of input variables. Based on the PCA analysis and the working experience of experts in urban WWTP, five process variables have been chosen as the input variables to develop the intelligent method: S_PO4, ORP, S_O2, TSS and pH. S_PO4 is an important index of the effluent, ORP reflects the concentration of oxide, S_O2 is an important indicator of the growth of organic matter and the nitrification reaction, TSS stands for the degree of wastewater treatment and pH stands for the acid-base property of the wastewater. The input variables determined for the effluent S_NH are listed in Table 1, together with the online measurement instruments used for obtaining the process values. The detailed selection and analysis process is given in [22].
CHM-301 is the S_PO4 detector, AODJ-QX6530 is the portable ORP probe, WTW oxi/340i is the portable S_O2 probe, 7110 MTF-FG is the TSS analyzer and pH 700 is the pH detector.
Building on the abovementioned analysis, an experimental hardware platform is set up. An anaerobic-anoxic-oxic (A²/O) treatment process with the online sensors is employed in urban WWTP (shown in Figure 3). The figure schematically shows the online sensors and the effluent S_NH model based on the fuzzy neural network.
The online sensors consist of five parts: the S_PO4 (total phosphorus, TP) detector, ORP probe, S_O2 probe, TSS analyzer and pH detector. The output signals from the sensors are integrated and connected to a programmable logic controller (PLC, S7-200) for transmitting the primary indicators. The PLC system is interfaced with the sensors and collects reliable data in the form of 4-20 mA electrical signals with a fast response time. Moreover, the PLC system is connected through a serial port (RS-232, Siemens AG) to the host computer, which uses the real-time data to calculate the values of key variables and also stores the data in a local file. The sensors are operated in continuous/online measurement mode, and the historical process data are routinely acquired and stored in the data acquisition system. The process data are periodically collected from the reactor to check whether the system is operating as scheduled during the experiments. Then, after preprocessing, the data are applied to the proposed SOFNN method. In SOFNN, five neurons are used in the input layer, corresponding to the related process variables S_PO4, ORP, S_O2, TSS and pH. Following the advice of experienced experts, the RBF layer and the normalized layer each start with 10 neurons, and the neurons in the normalized layer are then self-organized based on the relative importance index to guarantee the prediction accuracy. There is one output neuron, which represents the predicted effluent S_NH.
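As a small illustration of how a 4-20 mA signal is usually turned into a process value on the host computer, the sketch below applies the standard linear scaling between the lower and upper ends of a sensor's measuring range. The measuring ranges are assumptions chosen for illustration (for example 0 to 14 for a pH sensor); the actual ranges of the instruments listed above are not given in the text.

```python
def current_to_value(current_ma, range_min, range_max):
    """Convert a 4-20 mA loop current to engineering units by linear scaling.

    4 mA maps to range_min and 20 mA maps to range_max. A reading outside
    4-20 mA usually indicates a wiring or sensor fault, so it is rejected.
    """
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError(f"loop current out of range: {current_ma} mA")
    return range_min + (current_ma - 4.0) / 16.0 * (range_max - range_min)

# Hypothetical pH sensor spanning 0 to 14: 12 mA is mid-scale.
ph_value = current_to_value(12.0, 0.0, 14.0)
```

Rejecting out-of-range currents at this stage keeps sensor faults from reaching the abnormal-data removal step as apparently valid readings.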

Experimental results
The intelligent modeling method based on the proposed SOFNN is used to predict the effluent S_NH concentration from the determined principal process variables. All data were collected on a daily basis and covered all four seasons. The daily frequency of measurements is considered sufficient because of the long residence times in WWTP. To guarantee the efficiency of this soft-computing method, all variables are normalized before application and denormalized afterwards using their maximum and minimum values. The input-output water quality data were collected from a real-world wastewater treatment plant (Beijing, China) over the year 2014. After deleting the abnormal data, 280 samples were obtained and normalized; 140 samples from 1/5/2014 to 30/9/2014 were taken as the training data, while the remaining 140 samples from 1/10/2014 to 30/11/2014 were employed as testing data.
The error measures for the effluent NH4 are 0.1 mg/L confidence limits. Both the mean testing RMSE, $\sqrt{\sum_{n=1}^{N}\left(y_n(t)-\hat{g}_n(t)\right)^2/N}$, and the mean predicting accuracy, $\left(\sum_{t=1}^{Days}\left(1-e(t)/\hat{g}(t)\right)\right)/Days$, are utilized as the performance indices to assess the modeling performance, where N is the number of samples. The predicting results and the predicting error of the effluent S_NH concentration are shown in Figures 4-6. Additionally, to show the performance of SOFNN clearly, Table 2 shows the network structure, the mean testing RMSE and the mean accuracy in comparison with other methods.
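The two performance indices can be computed as in the following sketch. One assumption is made explicit: the error e(t) in the accuracy index is taken as the absolute difference between the measured and predicted values, so that the accuracy cannot exceed one; the function names are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over N samples."""
    n = len(y_true)
    return math.sqrt(sum((y - g) ** 2 for y, g in zip(y_true, y_pred)) / n)

def mean_accuracy(y_true, y_pred):
    """Mean of 1 - |e(t)| / |g_hat(t)| over all days.

    Taking the absolute error is an assumption; predictions must be nonzero.
    """
    days = len(y_true)
    return sum(
        1.0 - abs(y - g) / abs(g) for y, g in zip(y_true, y_pred)
    ) / days

# Made-up values for illustration only (not plant data).
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
test_rmse = rmse(measured, predicted)               # sqrt(1/3)
test_accuracy = mean_accuracy(measured, predicted)  # (1 + 1 + 0.75) / 3
```

Both indices should be computed on denormalized values so that the RMSE is expressed in the original concentration units.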
The prediction results of the effluent S_NH based on SOFNN are displayed in Figures 4-6. The training RMSE of the effluent S_NH is shown in Figure 4; the final value reaches 0.02. Figure 5 displays both the SOFNN outputs and the real outputs; the predicted outputs based on SOFNN approximate the real outputs with small errors. The errors, displayed in Figure 6, remain in the range of ±0.3. These results show that the proposed adaptive fuzzy neural network has superior prediction ability when S_PO4, ORP, S_O2, TSS and pH are used as the inputs.
In addition, the results of SOFNN are compared with other modeling methods: SOFNN with fixed learning rate, the self-organizing fuzzy neural network with adaptive computation algorithm (SOFNN-ACA) [23], the fast and accurate online self-organizing fuzzy neural network (FAOS-PFNN) [24], the growing-and-pruning fuzzy neural network (GP-FNN) [25] and the mathematical model [12]. Table 2 indicates that the proposed SOFNN achieves a more compact structure than the other compared methods; the number of final normalized neurons is 13. The proposed SOFNN with adaptive learning rate also attains a mean accuracy of 97.94%, which is higher than that of SOFNN-ACA [23], FAOS-PFNN [24], GP-FNN [25] and the mathematical model [12]. This means that this proposed SOFNN with

Conclusion
In this chapter, an intelligent method is designed to realize the online prediction of the effluent S_NH. Based on SOFNN, the proposed model can capture the correlation between the effluent S_NH and the principal process variables and construct the modeling structure automatically. The effectiveness of the proposed intelligent modeling method is evaluated in a WWTP. Experimental simulations and results analysis are provided to show the superior prediction performance.