Figure 3. Plots of readings from several diagnostic probes in a representative shot of the C-2U experiment (this shot includes staircase behavior, as can be seen in the upper left plot). The time axis is identical for each subplot. For a full list of probe types and descriptions, refer to the beginning of section 2.
guarantees [14] that even a small neural network is capable of approximating, to an arbitrary degree of error, any continuous function on R^n (given enough data). However, despite this theoretical guarantee, a basic ANN may require an infeasible number of sample points to learn the underlying distribution of a dataset. Hence, more complicated neural network models exist. These models are built in a way which assumes (and therefore takes advantage of) known structure in the data; this idea is referred to as the 'inductive bias' of a machine learning model [15]. We discuss some particular examples of such specialized ANN models later in this section, paying specific attention to the variants used in the numerical experiments later in the paper. For a more detailed explanation of neural networks and their variants, we refer the reader to [16] and [17].
At its most basic, a neural network consists of k layers of the form²:

a(l) = W(l) · x(l−1) + b(l),
x(l) = σ(a(l)),

where:
• x(i) is understood to be the vector of neuron values of layer i;
• W(i) and b(i) are, respectively, a matrix and vector of variables usually referred to as the 'weight matrix' and 'bias vector';
• and σ is a non-linear function (logistic sigmoid and tanh are common choices).

Given a training instance consisting of an input x(0) and a target output y, a neural network is trained by first performing a feed-forward pass to compute all layers, including the final output ŷ = x(k). An error E is computed between the output ŷ and the target y (the specific error function will vary according to application). Then, the gradient of the error is backpropagated [18] through the network (using the chain rule of multivariate calculus) to find the gradient of the error with respect to each variable. Gradient descent is performed, with an appropriate step size, to adjust these variables. Typically, elements of the dataset are not processed one by one but in batches of size 1 < b ≪ N, where N is the size of the dataset. Many variants of this basic idea and model structure exist, many of which are designed to leverage known structure about a given dataset (see below).
² Superscripts in these equations represent the variable which indexes layers of the network, and are parenthesized to disambiguate them from exponentiation.
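As a concrete illustration of these layer equations and of a single gradient-descent update, the following is a minimal NumPy sketch; it is not the implementation used in this work, and the layer sizes, tanh activation, squared-error E, and learning rate are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 8 input features, one hidden layer of 16 neurons, scalar output.
sizes = [8, 16, 1]
W = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

def forward(x0):
    """Feed-forward pass: returns activations x(0..k) and pre-activations a(1..k)."""
    xs, pre = [x0], []
    for Wl, bl in zip(W, b):
        a = Wl @ xs[-1] + bl          # a(l) = W(l) . x(l-1) + b(l)
        pre.append(a)
        xs.append(np.tanh(a))         # x(l) = sigma(a(l)), with sigma = tanh here
    return xs, pre

def gradient_step(x0, y, lr=1e-2):
    """One gradient-descent update on the squared error E = 0.5 * ||y_hat - y||^2."""
    xs, pre = forward(x0)
    delta = (xs[-1] - y) * (1.0 - np.tanh(pre[-1]) ** 2)   # dE/da(k)
    for l in reversed(range(len(W))):
        dW, db = np.outer(delta, xs[l]), delta             # dE/dW(l), dE/db(l)
        if l > 0:                                           # chain rule: propagate error to layer l-1
            delta = (W[l].T @ delta) * (1.0 - np.tanh(pre[l - 1]) ** 2)
        W[l] -= lr * dW                                     # gradient-descent update
        b[l] -= lr * db

# One (input, target) training instance.
x0, y = rng.normal(size=8), np.array([0.3])
gradient_step(x0, y)

In practice the update would be averaged over a batch of size 1 < b ≪ N and repeated over many passes through the training set, with a deep-learning framework automating the backward pass.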
When training neural networks (or any other machine learning model), it is typical to split the data into a training dataset (used to adjust the parameters of the model) and a separate testing dataset. The testing dataset is only used to evaluate the model; this provides some statistical guarantee that the model will generalize to unseen data. See [19], chapter 11, for details on techniques for model evaluation and validation. Additionally, all neural network models discussed in this work were trained with a regularization technique called dropout [20], which reduces overfitting by randomly masking, once per training batch, a percentage (the 'dropout percentage') of neurons in a given layer to have zero output. Neurons are only dropped out during training, not during inference or evaluation.
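As a small illustration of both points, the sketch below shuffles a toy dataset into training and testing subsets and applies a dropout mask only in training mode. It is illustrative only: the 80/20 split ratio, the dropout percentage of 0.3, and the rescaling by 1/(1 − p) are assumptions, not details taken from this work.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical train/test split of a toy dataset with N = 100 samples.
data = rng.normal(size=(100, 8))
idx = rng.permutation(len(data))
train, test = data[idx[:80]], data[idx[80:]]       # test set is held out for evaluation only

def dropout(x, p=0.5, training=True):
    """Zero each neuron's output with probability p during training; identity otherwise."""
    if not training:
        return x                                   # no masking at inference/evaluation time
    mask = rng.random(x.shape) >= p                # keep each neuron with probability 1 - p
    return x * mask / (1.0 - p)                    # rescale so the expected output is unchanged

hidden = np.tanh(rng.normal(size=16))              # some layer's activations
train_out = dropout(hidden, p=0.3, training=True)  # roughly 30% of entries zeroed
eval_out = dropout(hidden, training=False)         # unchanged during evaluation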
Convolutional neural networks (CNNs) assume that the data is structured spatially; that is, nodes which are 'near' each other, in some sense, are related. For example, pixels in a natural image are 'near' their immediate neighbors. CNNs process data structured this way more efficiently by computing weighted functions of each pixel's neighborhood (like an