The Novel Applications of Deep Reservoir Computing in Cyber-Security and Wireless Communication

This chapter introduces novel applications of deep reservoir computing (RC) systems in cyber-security and wireless communication. RC systems are a new class of recurrent neural networks (RNNs). Traditional RNNs are challenging to train because of vanishing and exploding gradients, whereas RC systems are easier to train and have shown similar or even better performance. Spatio-temporal correlations are essential in both the cyber-security and wireless communication domains, and RC models are well suited to exploring them. In this chapter, we explore the applications and performance of delayed feedback reservoirs (DFRs) and echo state networks (ESNs), two different types of RC models, in the cyber-security of smart grids and in symbol detection in MIMO-OFDM systems, respectively. We also introduce a spiking structure for DFRs, because spiking artificial neural networks are more energy efficient and more biologically plausible.


Introduction
Smart grids are a new generation of power grids that provide more intelligent and efficient power transmission and distribution. However, smart grids are vulnerable to security threats unless properly protected. False data injection (FDI) attacks are the first and most common type of attack in smart grids. Two major types of FDI attacks are known: single-period (opportunistic) attacks and multi-period (dynamic) attacks. In a single-period attack, the adversary waits for an opportunity and then launches the attack instantaneously. In a dynamic attack, by contrast, the adversary launches the attack gradually, steering the system over time toward its desired state. Single-period attacks are widely studied in the literature and are more easily detected by supervisory control and data acquisition (SCADA) systems. In this chapter, we focus on multi-period (dynamic) attacks [1][2][3][4][5].
State vector estimation (SVE) was the first technique used to tackle FDI detection in smart grids. However, SVE fails to detect stealthy FDI attacks with low magnitudes.
In recent years, both supervised and unsupervised machine learning (ML) approaches have been proposed for FDI detection in smart grids. In general, ML-based techniques have shown better performance than SVE. However, the ML techniques proposed so far cannot capture the rich spatio-temporal correlations that exist between different components of smart grids. Therefore, in this chapter, we introduce spiking delayed feedback reservoirs (DFRs) for FDI detection in smart grids, as they are very energy efficient and can capture these spatio-temporal correlations. DFRs are an energy-efficient class of reservoir computing systems [6][7][8].
Figure 1 shows the structure of a reservoir computing (RC) system. An RC system has three layers: the input layer, the reservoir layer, and the output layer. The architecture of RC systems is based on recurrent neural networks (RNNs). However, unlike in conventional RNNs, the weights of the hidden (reservoir) layer are fixed and are not trained. The reservoir weights must be initialized so that the echo state property is satisfied. In practice, this means that, for the reservoir to form a fading memory of its inputs, the spectral radius of the reservoir weight matrix (the largest absolute value of its eigenvalues) has to be less than 1. The spectral radius is a design parameter and plays an important role in the performance of RC systems. DFRs, echo state networks (ESNs), and liquid state machines (LSMs) are three different categories of RC systems. In each of them, the recurrent dynamics that give RNNs their strength are provided by the reservoir (or liquid) states, whose synaptic weights are fixed and do not require any training. The output weights are the only weights that require training in RC models, which reduces the computational complexity of RC models compared with traditional RNNs [9][10][11][12].
Intelligent System and Computing

Equation (1) expresses the states of the reservoir nodes,

$$s(t) = f\left(W_{res}^{res}\, s(t-1) + W_{in}^{res}\, x(t-1)\right), \tag{1}$$

where $s(t)$ is the state of the reservoir nodes at time $t$; $f$ is the nonlinear activation function of the reservoir neurons; $x(t-1)$ is the input signal at time $t-1$; and $W_{res}^{res}$ and $W_{in}^{res}$ are the weights of the randomly generated reservoir and input connections, respectively. The estimated output $\hat{y}$ is expressed in terms of the input and the weight connections, where $W_{res}^{out}$ are the output weights of the neurons that form the reservoir layer; $W_{in}^{out}$ are the feedback weights from the output layer to the reservoir layer; and $W_{bias}^{out}$ is the set of trained bias weights. The nonlinear mapping is performed by the neurons in the reservoir layer. These neurons have two major properties: (1) high dimensionality and (2) a short-term memory in which spatio-temporal patterns can be stored. Several studies have shown that these two properties are satisfied only if the neurons in the reservoir layer operate at the edge of chaos. Satisfying the echo state property is the key to making the reservoir neurons operate at the edge of chaos. The lower computational complexity and the flexible reservoir implementation of RC models make them very suitable for unconventional computing paradigms.
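As an illustration of Eq. (1) and of the echo state property, the following minimal Python sketch draws fixed random weights, rescales the reservoir matrix, and iterates the state update. Several details are assumptions made for this example and are not taken from the chapter: the activation f is taken to be tanh, the function names (make_reservoir, step, run) are hypothetical, and the reservoir is rescaled by its largest absolute row sum, which is a conservative bound that keeps the spectral radius below 1 without computing eigenvalues.

```python
import math
import random


def make_reservoir(n_nodes, n_inputs, rho=0.9, seed=0):
    """Draw fixed random weights and rescale the reservoir matrix.

    Assumption: rows are scaled so that the largest absolute row sum equals
    rho. That norm is an upper bound on the spectral radius, so rho < 1 is a
    sufficient (conservative) way to keep the spectral radius below 1.
    """
    rng = random.Random(seed)
    w_res = [[rng.uniform(-1, 1) for _ in range(n_nodes)] for _ in range(n_nodes)]
    w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_nodes)]
    max_row_sum = max(sum(abs(w) for w in row) for row in w_res)
    w_res = [[rho * w / max_row_sum for w in row] for row in w_res]
    return w_res, w_in


def step(w_res, w_in, s_prev, x_prev):
    """One update s(t) = f(W_res s(t-1) + W_in x(t-1)), with f = tanh (assumed)."""
    return [
        math.tanh(
            sum(w * s for w, s in zip(res_row, s_prev))
            + sum(w * x for w, x in zip(in_row, x_prev))
        )
        for res_row, in_row in zip(w_res, w_in)
    ]


def run(w_res, w_in, inputs, s0):
    """Drive the reservoir with a sequence of input vectors; return all states."""
    states, s = [], s0
    for x in inputs:
        s = step(w_res, w_in, s, x)
        states.append(s)
    return states


# The reservoir weights are never trained; only a readout would be.
w_res, w_in = make_reservoir(n_nodes=20, n_inputs=1)
inputs = [[math.sin(0.1 * t)] for t in range(200)]

# Same input, two very different initial states.
final_a = run(w_res, w_in, inputs, [1.0] * 20)[-1]
final_b = run(w_res, w_in, inputs, [-1.0] * 20)[-1]
gap = max(abs(a - b) for a, b in zip(final_a, final_b))
print(f"largest state difference after 200 steps: {gap:.2e}")
```

Two runs that start from different initial states but receive the same input end up in nearly the same state, which is the fading-memory behavior that the echo state property describes.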
The DFR is a ring-topology RC system in which a single artificial neuron and a delay loop together form the reservoir layer. There are multiple choices for the single artificial neuron of the DFR. In this chapter, we use a spiking neuron as the single nonlinear node of the DFR. Spiking neuron models are among the mathematical models introduced to describe biological neurons. Spikes are the main signals that neurons in the brain use for communication; hence, representing neuron activity as spikes tends to be more biologically plausible. Energy efficiency is another motivation for using spiking neurons. The TrueNorth chip consumes only 70 milliwatts (mW) to run 1 million spiking neurons with 256 million synapses [13][14][15]. The energy efficiency of spiking neural networks (SNNs) also makes them a suitable choice for hardware implementations of artificial neurons [16,17].
So far, several spiking neuron models, including the leaky integrate-and-fire (LIF) model and the Hodgkin-Huxley model, have been proposed to mimic the behavior of the brain's neurons [18]. LIF models have been used more commonly than other spiking neuron models because of their simplicity and ease of hardware implementation [19,20]. A spiking neuron fires a spike as soon as the stimulating current applied to its membrane drives the membrane voltage above a certain threshold value. The relationship between the stimulating current and the membrane voltage is expressed as follows:

$$\tau_m \frac{dV_m}{dt} = -(V_m - E) + R_m\,(I_s + I_{noise}), \tag{3}$$

where $V_m$ is the membrane voltage; $\tau_m = R_m C_m$ is the neuron's time constant; $C_m$ and $R_m$ are the capacitance and the resistance of the membrane, respectively; $E$ is the resting voltage; $I_{noise}$ is the noise current; and $I_s$ is the stimulus current [21]. We set $R_m = 1$ MΩ and $C_m = 10$ nF, which gives $\tau_m = 10$ ms.
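The LIF membrane dynamics described above can be sketched with a forward-Euler integration. The membrane resistance and capacitance below are the chapter's values; everything else is an assumption made for illustration: the standard LIF form tau_m dV/dt = -(V - E) + R_m I_s, a resting voltage of -70 mV, a threshold of -54 mV, a reset to the resting voltage after each spike, a 0.1 ms time step, and no noise current. The function name simulate_lif is hypothetical.

```python
def simulate_lif(current_amps, dt=1e-4, r_m=1e6, c_m=10e-9,
                 e_rest=-70e-3, v_th=-54e-3, v_reset=-70e-3):
    """Integrate a leaky integrate-and-fire neuron with forward Euler.

    current_amps: stimulus current samples in amperes, one per time step.
    r_m, c_m:     membrane resistance (ohm) and capacitance (farad),
                  set to the chapter's values (1 megaohm, 10 nF).
    e_rest, v_th, v_reset, dt: assumed values, not taken from the chapter.
    Returns the list of spike times in seconds.
    """
    tau_m = r_m * c_m              # 10 ms with the values above
    v = e_rest
    spike_times = []
    for k, i_s in enumerate(current_amps):
        # tau_m * dV/dt = -(V - E) + R_m * I_s   (noise current omitted)
        v += dt / tau_m * (-(v - e_rest) + r_m * i_s)
        if v >= v_th:              # threshold crossing: emit a spike, reset
            spike_times.append((k + 1) * dt)
            v = v_reset
    return spike_times


# 100 ms of constant stimulus: 20 nA drives the membrane 20 mV above rest,
# which exceeds the assumed 16 mV gap to threshold; 10 nA does not.
spikes = simulate_lif([20e-9] * 1000)
silent = simulate_lif([10e-9] * 1000)
print(len(spikes), "spikes; first at", spikes[0], "s")
print(len(silent), "spikes for the weaker stimulus")
```

Under these assumptions, a stronger stimulus shortens the time to threshold, so the stimulus magnitude is reflected in the spike timing rather than only in the spike count.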
Figure 2 shows the topology of our proposed spiking DFR. The structure consists of multiple blocks. The input block receives the smart grid measurements, which must be encoded before being processed by the DFR. There are two major types of encoding schemes for spiking neurons, namely rate encoding and temporal encoding [22]. In rate encoding, the number of spikes fired by the neuron encodes the stimulus; in temporal encoding, the exact time at which each spike fires carries the information. Rate encoding has been studied extensively in the literature. However, recent studies have shown that temporal encoding schemes are more efficient than rate encoding schemes.
Several experiments have shown that temporal encoding is more likely to be the scheme used by biological neurons. Neurons in the lateral geniculate nucleus, the retina, and the visual cortex respond to stimuli with millisecond (ms) precision. The computational complexity of temporal encoding schemes also favors them over rate encoding approaches [23]. Therefore, in this chapter, we focus on temporal encoding schemes.
After the smart grid measurements are encoded, the encoded data are converted to an analog current. This current is then fed into the nonlinear node, which in our case is an LIF neuron. For each current signal, the LIF neuron generates the corresponding spike train, which passes through a delay loop. The delay loop and the LIF neuron together form the reservoir layer of the DFR. We repeat this process until the reservoir states corresponding to every smart grid measurement have been generated. The interspike intervals (ISIs) of each spike train are used as the training features for the readout layer [24]. In this chapter, a multi-layer perceptron (MLP) is used as the readout layer, and it is trained on the features extracted in the reservoir layer. Each sample is labeled according to its class: 1 for compromised samples and 0 for uncompromised samples.
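The ISI features mentioned above are simply the differences between consecutive spike times. The short sketch below shows one way to turn a spike train into a fixed-length feature vector with its class label. The zero-padding and truncation to a fixed length, and the function names, are assumptions made for this example; the chapter does not specify how spike trains of different lengths are handled.

```python
def interspike_intervals(spike_times):
    """Differences between consecutive spike times (same unit as the input)."""
    return [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]


def isi_features(spike_times, n_features):
    """Fixed-length ISI vector: truncate or zero-pad (an assumed convention)."""
    isis = interspike_intervals(spike_times)[:n_features]
    return isis + [0.0] * (n_features - len(isis))


# One spike train per measurement sample, with its class label
# (1 = compromised, 0 = uncompromised, as in the chapter).
spike_train = [0.010, 0.025, 0.045]   # spike times in seconds (made-up values)
label = 1
features = isi_features(spike_train, n_features=4)
print(features, label)
```

Each (features, label) pair would then be one training example for the MLP readout.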
Equation (4) expresses the governing equation for the DFR, where $F$ is a differentiable nonlinear function; $\tau$ is the delay of the loop, a hyperparameter that requires tuning; $x(t)$ denotes the reservoir states of the DFR; and $I(t)$ is the input stimulus current signal combined with a masking scheme. The total delay time $\tau$ is divided into $N$ equidistant delay units within the delay loop:

$$\tau = N\theta, \tag{5}$$

where $\theta$ is the time interval between the virtual nodes of the reservoir. Unlike the conventional RC model, the DFR needs far fewer nonlinear nodes because of its ring topology. The weights of the output MLP layer are the only weights that undergo training [16].
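The division of the delay loop into N virtual nodes can be illustrated with a simplified discrete-time sketch. This is not the chapter's spiking DFR: it assumes a tanh nonlinearity in place of the LIF neuron, a random binary mask, and hypothetical feedback and input gains (eta, gamma), and it treats each virtual node as being updated from its own value one full delay earlier. In this simplification the delay value enters only through the node spacing theta.

```python
import math
import random


def dfr_states(inputs, n_virtual=10, tau=40e-3, eta=0.5, gamma=1.0, seed=0):
    """Simplified discrete-time delayed feedback reservoir (assumed model).

    inputs:    one scalar input sample per delay period tau.
    n_virtual: number N of virtual nodes along the delay loop.
    tau:       total loop delay in seconds; theta = tau / N is the node spacing.
    eta/gamma: feedback and input gains (hypothetical values).
    Returns (theta, states) where states[k][i] is virtual node i at period k.
    """
    theta = tau / n_virtual
    rng = random.Random(seed)
    mask = [rng.choice((-1.0, 1.0)) for _ in range(n_virtual)]  # fixed mask
    delayed = [0.0] * n_virtual        # contents of the delay loop
    states = []
    for u in inputs:
        # Time multiplexing: the same nonlinear node is visited N times per
        # period, each time with a differently masked copy of the input and
        # with the value it produced one delay tau earlier.
        current = [
            math.tanh(gamma * mask[i] * u + eta * delayed[i])
            for i in range(n_virtual)
        ]
        delayed = current
        states.append(current)
    return theta, states


samples = [math.sin(0.3 * k) for k in range(50)]
theta, states = dfr_states(samples)
print("theta =", theta, "s;", len(states), "periods x", len(states[0]), "virtual nodes")
```

One physical nonlinear node thus produces an N-dimensional state per input sample, which is how the ring topology keeps the number of nonlinear nodes small.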
DFRs have drawn considerable attention because of their capability to map data from a low-dimensional space to a high-dimensional space. As shown in Figure 3, this mapping makes data that are not linearly separable in the low-dimensional space linearly separable in the high-dimensional space. Chaos theory, through Lyapunov analysis, has shown that delay systems can exhibit high-dimensional behavior if the delay value is tuned so that the system operates at the edge of chaos. The Lyapunov dimension of a chaotic delay system is directly determined by the delay value [25]. In this chapter, we examine the effect of the delay value on the performance of the DFR in detecting dynamic hidden attacks in smart grids.
In this chapter, we also look at symbol detection in multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. In wireless communication systems, multicarrier access techniques are realized through OFDM, which converts a frequency-selective fading channel into multiple flat-fading subchannels [26][27][28]. Spectral efficiency, transceiver structure, channel capacity, and robustness against interference all improve as a result of applying OFDM in wireless communication systems [29][30][31][32][33]. MIMO systems are also used extensively in wireless communication systems, including HSPA+ (3G), WiMAX (4G), and Long Term Evolution (4G LTE). With MIMO, the capacity of the wireless link is improved by transmitting symbols over multiple paths. The combination of MIMO and OFDM is called a MIMO-OFDM system [34][35][36][37][38], and it has been shown to be very effective in exploiting the benefits of both.
To detect the transmitted symbols accurately at the receiver (Rx), it is essential to estimate the wireless channel state information (CSI) precisely [39][40][41]. CSI estimation is one of the major challenges of MIMO-OFDM systems. There are generally two major approaches to CSI estimation. The first approach uses blind channel estimation to obtain the statistical properties of the channel [42]. The second approach is based on training symbols sent by the transmitter (Tx) and received by the Rx [29,43,44]. Training-based CSI estimation techniques have been adopted in many advanced communication systems, including 3GPP LTE/LTE-Advanced. In the former category, no computational overhead is incurred, but the techniques are suitable only for channels that vary very slowly over time [45]. The latter, training-based category can be applied to any channel regardless of its statistical properties. Therefore, learning-based techniques, including artificial neural networks, have been studied extensively in the literature [46][47][48] as wireless channel estimation mechanisms. RNNs have also been studied in [49][50][51][52] for CSI estimation and symbol detection. Because conventional RNNs are difficult to train, we introduce echo state networks (ESNs) for symbol detection and CSI estimation in MIMO-OFDM wireless communication systems.

Problem formulation of smart grids attack detection
The state and topology of smart grids are the two major targets manipulated by adversaries [53]. The state of the smart grid is the key factor in determining the measurement values; a linear function $H$ and the environment noise are the other two factors:

$$z = Hx + n, \tag{6}$$

where $z$ is the measurement vector that represents the real parts of the line flows and bus injections; $H$ is a linear function; $x$ is the state vector; and $n$ is the environment noise [53]. If the meters are compromised by an adversary, Eq. (6) can be written as

$$z = Hx + n + a, \tag{7}$$

where $a$ is the attack vector. The attack represented in Eq. (7) is an observable attack. The attack can also be hidden by the attacker. In this chapter, we consider hidden dynamic attacks. A hidden attack is defined as $a = Hc$, and Eq. (6) is reformulated as

$$z = H(x + c) + n, \tag{8}$$

where $c$ is the shift toward the adversary's desired state: the attacker drifts the normal state of the smart grid toward its desired state by hiding the attack in the $H$ matrix. Hidden attacks are more challenging to detect. Adversaries launch dynamic attacks so that the state of the smart grid drifts gradually toward their desired state. Dynamic attacks are therefore defined as a function of time. In single-period attacks, the variations of the attack magnitude are sudden and abrupt, and they are more easily detected. The formulation of the dynamic attack used in this chapter is

$$z(t) = Hx(t) + n + a(t). \tag{9}$$

The dynamic attack $a(t)$ is time dependent, and we also assume that the adversary has access to the $H$ matrix. Thus, the attack can be performed as a hidden (unobservable) attack. In hidden attacks, $a(t)$ can be expressed as $a(t) = Hc(t)$, and $c(t)$ is defined in terms of the following quantities: $A$ is the magnitude of the attack; $\cos$ is the cosine function; $f_c$ is the frequency of the attack, which we set to 1 in this chapter; and $N(0,1)$ is a normally distributed vector with zero mean and unit variance.
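To see why a hidden attack of the form a = Hc is hard to detect with estimation-based checks, the following sketch compares the least-squares residual of clean measurements, of measurements under a hidden attack, and of measurements under an observable attack. It is a toy setting chosen for brevity, not the chapter's IEEE 14-bus system: the state is a single scalar, H is a column of four made-up meter gains, and all numeric values and function names are hypothetical.

```python
import math


def estimate_state(h, z):
    """Least-squares estimate of a scalar state x from z = h * x + n."""
    return sum(hi * zi for hi, zi in zip(h, z)) / sum(hi * hi for hi in h)


def residual_norm(h, z):
    """Euclidean norm of z - h * x_hat, the quantity a residual test inspects."""
    x_hat = estimate_state(h, z)
    return math.sqrt(sum((zi - hi * x_hat) ** 2 for hi, zi in zip(h, z)))


h = [1.0, 2.0, -1.0, 0.5]             # measurement function (made-up values)
x = 3.0                               # true state
noise = [0.01, -0.02, 0.015, 0.0]     # environment noise n
z_clean = [hi * x + ni for hi, ni in zip(h, noise)]

c = 0.8                               # shift desired by the adversary
z_hidden = [zi + hi * c for hi, zi in zip(h, z_clean)]          # a = H c
z_observable = [zi + ai for zi, ai in zip(z_clean, [1.0, 0.0, 0.0, 0.0])]

r_clean = residual_norm(h, z_clean)
r_hidden = residual_norm(h, z_hidden)
r_observable = residual_norm(h, z_observable)
shift = estimate_state(h, z_hidden) - estimate_state(h, z_clean)

print(f"residual clean      : {r_clean:.4f}")
print(f"residual hidden     : {r_hidden:.4f}  (estimate shifted by {shift:.2f})")
print(f"residual observable : {r_observable:.4f}")
```

The hidden attack moves the state estimate by c while leaving the residual unchanged, whereas the observable attack inflates the residual. This is the motivation for detectors that learn spatio-temporal patterns in the measurements instead of relying on the residual alone.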
MATPOWER is a publicly available toolbox [54] that can be used to simulate smart grids. In this chapter, we use MATPOWER to simulate the meters of a smart grid with 14 buses. There are 34 meters in total in the IEEE 14-bus smart grid. We assume that the adversary's level of access, defined as the number of meters that the attacker can compromise, ranges from 0 to 34. The dataset that we use for training, testing, and validation is assumed to be unbalanced.
A dataset is called unbalanced when the numbers of compromised and uncompromised samples are not equal. In this chapter, it is assumed that 80% of the samples are uncompromised and 20% are compromised. In total, 10,000 samples for training and 10,000 samples for testing and validation are generated using MATPOWER.

Attack detection performance of DFR
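The performance metrics used for evaluation are accuracy and F1, defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$

where $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \frac{TP}{TP + FN}$, and TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative samples, respectively.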
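A short sketch of these metrics shows why F1 is reported alongside accuracy on an unbalanced dataset. The confusion-matrix counts below are made-up values for a 10,000-sample set with the 80/20 class split described earlier, and the convention of returning 0 when a denominator is zero is an assumption of this example.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (assumed convention).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# A detector that labels every sample "uncompromised" on an 80/20 split.
always_negative = detection_metrics(tp=0, tn=8000, fp=0, fn=2000)

# A detector that finds 1500 of the 2000 compromised samples.
useful_detector = detection_metrics(tp=1500, tn=7500, fp=500, fn=500)

print("always negative:", always_negative)
print("useful detector:", useful_detector)
```

The trivial detector reaches 80% accuracy without detecting a single attack, and its F1 of 0 exposes this, which is why both metrics are needed here.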
To evaluate the performance of our proposed spiking DFR model, we compare our results with those of an MLP and an SNN. The MLP is trained using the backpropagation algorithm, and the SNN is trained using the precise spike driven (PSD) algorithm. PSD, introduced in [21], uses temporal encoding as the encoding scheme and learns the hetero-associations that exist in spatio-temporal spike patterns. As shown in Figures 4 and 5, the spiking DFR + MLP outperforms both the MLP and the SNN in terms of accuracy and F1. This is because the spiking DFR + MLP maps the data from a low-dimensional space to a high-dimensional space and also captures the spatio-temporal correlations that exist between different components of smart grids. Based on our simulation results, the average accuracy of attack detection reaches 94.6% when the combination of spiking neurons, DFR, and MLP is realized in a single platform. This improvement is observed for all attack magnitudes and numbers of compromised measurements. In our baseline model, where only SNNs are used, the average accuracy is 77.92%, so the hybrid spiking DFR and MLP model improves the average accuracy by about 17 percentage points. The F1 measure shows an even larger improvement. The F1 achieved by the combination of spiking neurons, DFR, and MLP is 78%, whereas the F1 achieved by the SNN with the PSD algorithm for dynamic attack detection is about 25%, which means that our model increases the F1 by about 53 percentage points.

Delay effect on the performance of DFR
As mentioned in Section 1, DFRs cannot show high-dimensional behavior unless the delay value is tuned so that the DFR operates at the edge of chaos. In this section, we show that the delay value can significantly affect the performance of the DFR in detecting hidden dynamic attacks on smart grids. Figure 6 shows the performance of the DFR for different delay values. For a delay of 40 milliseconds (ms), the spiking DFR + MLP achieves its highest F1 and accuracy, whereas a delay of 10 ms gives the lowest performance. This observation implies that the spiking DFR + MLP operates at the edge of chaos and shows high-dimensional behavior only for a proper delay value. The phase portrait of the DFR for varying delay times is shown in Figure 7. Phase portraits make it possible to track the dynamic behavior of delay systems and to reveal whether the system behaves chaotically or periodically. It is suggested in [25] that a dynamic system can show high-dimensional behavior if its delay is tuned properly. We also investigate the solution of the delay differential equation (DDE) to further explore the dynamic behavior of our model. As shown in Figure 7, the DDE is used to model the dynamic behavior of the nonlinear function as the delay varies.
Figure 7 shows that varying the delay value can shift the behavior of the delay system from periodic, to the edge of chaos, to completely chaotic.

Complexity analysis
In this section, the complexity of our approach is analyzed in terms of training time. The computational complexity of the introduced spiking DFR + MLP is associated with calculating the state of the reservoir layer and updating the weights of the readout layer during training. In the introduced spiking DFR model, the weights of the input and reservoir layers are fixed and do not undergo any training. This makes DFRs significantly more computationally efficient than other types of RNNs. In traditional RNNs, all the hidden layers must be trained, which makes these networks very difficult to train. Complexity is measured as the total number of floating-point operations (FLOPs). The training time of RC-based learning techniques corresponds to the complexity of the model as well [55]. To evaluate the computational complexity of our proposed model, we compare its training time with that of the baseline approaches, i.e., MLP and SNN. Table 1 presents the training times (complexity) of spiking DFR + MLP, MLP, and SNN.
As shown in Table 1, the SNN trained with the PSD algorithm has the highest computational complexity. The spiking DFR + MLP and the MLP rank second and third, respectively. As shown in Figure 2, the spiking DFR + MLP contains several building blocks, namely the temporal encoding, spike-to-current, and reservoir blocks. Therefore, its computational complexity is higher than that of a simple MLP. However, the superior performance of our model justifies its use as the attack detection platform in smart grids.

Table 1. Computational complexity analysis.

We assume there are $N_r$ antennas at the Rx and $N_t$ antennas at the Tx. The received signal can be expressed as

$$y(t) = \sum_{i=1}^{N_t} h_i(t) \circledast x_i(t) + n(t), \tag{14}$$

where $n(t)$ is the additive noise; $\circledast$ stands for the convolution operation; $h_i(t) \in \mathbb{C}^{N_r \times 1}$ is the channel from the $i$th Tx antenna to the Rx; and $x_i(t)$ is the associated transmitted signal, which is defined in terms of the following quantities: $n$ is the subcarrier index; $p$ is the time-instance index; $f_c$ is the carrier frequency; $s[n,p]$ are the modulation symbols; $f_0$ is the frequency spacing between adjacent subcarriers; $N_c$ is the number of subcarriers; and $g(t)$ is the waveform function with finite time support.

The channel model is defined according to the ray-tracing principle, where $k$ is the index of the channel taps; $\theta_k$ is the direction of arrival (DoA); $\alpha_k$ is the associated path gain; and $\tau_k$ is the delay parameter.

Symbol detection framework
In symbol detection, we aim to estimate $s[n,p]$ for all transmit antennas and channel uses; the general framework is shown in Figure 8. For this problem, the interference among different antennas and OFDM symbols needs to be canceled. Rather than estimating the underlying channel information, our approach applies the reservoir computing network (RC) to $y(t)$ to retrieve the transmitted waveform. At the learning stage, the objective is to minimize a loss function $L$. By learning the output weights of the RC, the network performs interference cancellation and recovers the transmitted signals. This, however, relies on symbol-level synchronization among multiple antennas. Alternatively, symbol detection can be learned in a decomposed manner.
Following this approach, we can rewrite the received signal model (14) as

$$y(t) = h_k(t) \circledast x_k(t) + \sum_{i \neq k} h_i(t) \circledast x_i(t) + n(t),$$

where $k$ is the index of the user of interest, and the remaining terms are treated as interference to the $k$th user. Given a user index $k$, symbol detection is conducted by learning an RC that minimizes the corresponding loss. Symbol detection thus requires learning one RC for each user index $k$. The trained RCs generate the estimated symbols for each stream independently.
Moreover, an input buffer can be incorporated to further improve symbol detection performance, as proposed in [31]. To this end, the input of the RC at time $t_0$ is the batch $\{y(t)\}_{t=t_0}^{t_0+T}$, where $T$ is the length of the buffer.
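The input buffer can be sketched as a sliding window over the received samples: the RC input at time t0 is the concatenation of y(t0) through y(t0 + T). In the example below, each received sample is a short list with one value per receive antenna, real numbers stand in for complex baseband samples, and windows that would run past the end of the sequence are dropped. These choices and the function name windowed_inputs are assumptions made for illustration.

```python
def windowed_inputs(received, buffer_len):
    """Build buffered RC inputs from a sequence of received sample vectors.

    received:   list of per-time-step vectors y(t), one entry per Rx antenna.
    buffer_len: buffer length T; each window covers y(t0), ..., y(t0 + T).
    Returns one flattened window per admissible start time t0.
    """
    windows = []
    for t0 in range(len(received) - buffer_len):
        window = []
        for sample in received[t0:t0 + buffer_len + 1]:
            window.extend(sample)          # concatenate antenna values
        windows.append(window)
    return windows


# Three time steps, two receive antennas (made-up real-valued samples).
y = [[1, 10], [2, 20], [3, 30]]
buffered = windowed_inputs(y, buffer_len=1)
print(buffered)
```

Each window multiplies the input dimension of the RC by T + 1, which is the price paid for letting the network see neighboring samples when it cancels interference.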

One layer learning
We consider the special case in which the output consists of only one layer. The inner states generated by the dynamic equation are denoted $\{s(t)\}_{t=0}^{T_a-1}$, where $T_a$ is the length of the sequence. Collecting these states as the columns of a matrix $S$, the output layer is learned by solving

$$\min_{W} \; \lVert W S - X \rVert_F^2, \tag{20}$$

where $X \in \mathbb{C}^{N \times T_a}$ is the target waveform at the transmitter side, in which $N$ denotes the number of streams, and $W$ is the output layer to be learned. Accordingly, the target waveform $X$ can be chosen as the time-domain representation of scattered pilots or comb pilots. For scattered pilots, the $(i,t)$th entry of $X$ is defined in terms of $\Omega_p$, the index set of the subcarriers selected as pilots in the $p$th OFDM symbol. In particular, for comb pilots, $\Omega_p$ is defined as all the subcarriers at a certain OFDM symbol or several subcarriers across all OFDM symbols.
To solve problem (20), $W$ can be calculated once the whole batch of training data has been collected, through the pseudo-inverse operation

$$W = X S^{+},$$

where $S^{+}$ denotes the Moore-Penrose pseudo-inverse of $S$, or through an online version such as gradient descent or recursive least squares [57]. For multiple output layers, learning follows the same method as for multi-layer feedforward neural networks, i.e., the forward-backward propagation procedure [58].
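As a small illustration of the online alternative mentioned above, the sketch below learns a single-output linear readout by stochastic gradient descent on the squared error (the least-mean-squares update). It is deliberately simplified: the reservoir states and targets are synthetic real-valued data rather than complex OFDM waveforms, there is one output stream, and the step size, data sizes, and names are hypothetical.

```python
import random


def lms_readout(states, targets, step_size=0.1, epochs=1):
    """Learn readout weights w so that w . s(t) approximates the target x(t).

    Each sample triggers one gradient step on 0.5 * (x - w . s)^2:
        w <- w + step_size * (x - w . s) * s
    """
    w = [0.0] * len(states[0])
    for _ in range(epochs):
        for s, x in zip(states, targets):
            error = x - sum(wi * si for wi, si in zip(w, s))
            w = [wi + step_size * error * si for wi, si in zip(w, s)]
    return w


# Synthetic check: targets are generated by a known readout, so the learned
# weights should approach it. Only the readout is trained; the states are
# treated as given, as they would be for a fixed reservoir.
rng = random.Random(0)
true_w = [0.5, -1.0, 2.0]
states = [[rng.uniform(-1, 1) for _ in true_w] for _ in range(2000)]
targets = [sum(wi * si for wi, si in zip(true_w, s)) for s in states]

learned_w = lms_readout(states, targets)
print([round(w, 6) for w in learned_w])
```

The batch pseudo-inverse solution needs all training states at once, while an update of this kind processes one sample at a time, which suits streaming pilots at the cost of a step-size parameter to tune.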

Simulation results
Figure 9 compares the bit error rate (BER) performance of the reservoir computing-based symbol detection methods, namely the simple echo state network (ESN) and the echo state network with windows (WESN), with that of the conventional methods, namely linear minimum mean squared error (LMMSE) detection and sphere decoding (SD). For the conventional methods, the CSI is obtained by LMMSE channel estimation [59,60]. Here, we also consider the impact of power amplifier (PA) nonlinearity at the transmitter side. When the transmitted signal enters the nonlinear region of the PA, it suffers strong distortion, which can lead to poor BER performance. From this figure, we can observe that the learning-based methods perform best in the low-SNR regime and in the nonlinear PA region. This is because the conventional methods rely on accurate CSI, which cannot be obtained in these two cases, whereas the learning-based methods are robust to such model mismatch.

Conclusion
In this chapter, emerging applications of spiking DFRs and ESNs were explored. We introduced the combination of spiking neurons, DFRs, and MLPs as a platform for detecting FDI attacks in smart grids. Our simulation results showed that the spiking DFR + MLP outperforms both the SNN and the MLP in terms of accuracy and F1. The combination of DFRs and spiking neurons is capable of mapping the data to a high-dimensional space and capturing the spatio-temporal correlations that exist between different components of smart grids. The effect of the delay value on the performance of the DFR was also studied. We showed that DFRs exhibit high-dimensional behavior only for delay values that make them operate at the edge of chaos. The computational complexity of our model was also studied. In the use case of ESNs for MIMO-OFDM symbol detection, this learning-based framework can perform better than conventional channel-model-based methods when the obtained channel information is imperfect or model mismatch exists. The cost of learning is low, i.e., it does not require a large number of pilots, which permits the application of this technique in practical systems.

Figure 3. High dimensional mapping of data using DFR.

Figure 4. Accuracy of attack detection for three different methods and magnitude of attacks, A = 0.1, 1, and 10.

Figure 5. F1 of attack detection for three different methods and magnitude of attacks, A = 0.1, 1, and 10.

Figure 6. Effects of different values of the delay on the performance when A = 1.

Figure 9. BER comparison of reservoir computing-based symbol detection methods (ESN and WESN) to conventional methods (LMMSE and sphere decoding).