ab.layers¶
Network layers and utilities.
-
class
aboleth.layers.
Activation
(h=<function Activation.<lambda>>)¶ Bases:
aboleth.baselayers.Layer
Activation function layer.
Parameters: h (callable) – the element-wise activation function. -
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.layers.
Conv2D
(filters, kernel_size, strides=(1, 1), padding='SAME', l1_reg=0.0, l2_reg=0.0, use_bias=True, init_fn='glorot_trunc')¶ Bases:
aboleth.layers.SampleLayer
A 2D convolution layer.
This layer uses maximum likelihood or maximum a-posteriori inference to learn the convolutional kernels and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.
Parameters: - filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
- kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
- strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions.
- padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
- l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
- l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
- use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
- init_fn (str, callable) – The function to use to initialise the weights. The default is ‘glorot_trunc’, the truncated normal glorot function. If supplied, the callable takes a shape (input_dim, output_dim) as an argument and returns the weight matrix.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
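The effect of strides and padding on the spatial output size can be sketched with the rule TensorFlow uses for its convolution and pooling ops. This is a minimal illustration, not part of aboleth; the helper name conv_output_size is hypothetical.

```python
import math


def conv_output_size(in_size, kernel_size, stride, padding="SAME"):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == "SAME":
        # Input is zero-padded so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the window must fit entirely inside the input.
        return math.ceil((in_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


same = conv_output_size(28, kernel_size=3, stride=1, padding="SAME")
valid = conv_output_size(28, kernel_size=3, stride=2, padding="VALID")
print(same, valid)  # 28 13
```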
-
class
aboleth.layers.
Conv2DVariational
(filters, kernel_size, strides=(1, 1), padding='SAME', prior_std='glorot', learn_prior=False, use_bias=True)¶ Bases:
aboleth.layers.SampleLayer
A 2D convolution layer, with variational inference.
(Does not currently support full covariance weights.)
Parameters: - filters (int) – the dimension of the output of this layer (i.e. the number of filters in the convolution).
- kernel_size (int, tuple or list) – width and height of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
- strides (int, tuple or list) – the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions.
- padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
- prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) above). The user can also provide a string to specify an initialisation function. Defaults to ‘glorot’. If a string, must be one of ‘glorot’ or ‘autonorm’.
- learn_prior (bool, optional) – Whether to learn the prior standard deviation.
- use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
Dense
(output_dim, l1_reg=0.0, l2_reg=0.0, use_bias=True, init_fn='glorot')¶ Bases:
aboleth.layers.SampleLayer
Dense (fully connected) linear layer.
This implements a linear layer, and when called returns
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]where \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). This layer uses maximum likelihood or maximum a-posteriori inference to learn the weights and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.
Parameters: - output_dim (int) – the dimension of the output of this layer
- l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
- l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
- use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
- init_fn (str, callable) – The function to use to initialise the weights. The default is ‘glorot’, the uniform glorot function. If supplied, the callable takes a shape (input_dim, output_dim) as an argument and returns the weight matrix.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
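As a rough NumPy sketch of what the Dense layer computes (not aboleth's TensorFlow implementation), the linear map and the l1/l2 penalties from the parameter descriptions can be written as follows. The shapes, the random values and the helper name penalty are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, N, D_in, D_out = 3, 5, 4, 2
l1_reg, l2_reg = 0.1, 0.01

# Rank 3 input: (n_samples, N, D_in), as produced by an input layer.
X = rng.normal(size=(n_samples, N, D_in))
W = rng.normal(size=(D_in, D_out))
b = rng.normal(size=D_out)

# f(X) = X W + b, broadcast over the leading sample dimension.
net = X @ W + b


def penalty(P):
    """l1_reg * ||P||_1 + 0.5 * l2_reg * ||P||_2^2 for one parameter array."""
    return l1_reg * np.abs(P).sum() + 0.5 * l2_reg * (P ** 2).sum()


# The regularizer returned alongside the layer output covers weights and biases.
reg = penalty(W) + penalty(b)
print(net.shape, reg)
```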
-
class
aboleth.layers.
DenseNCP
(output_dim, prior_std=1.0, learn_prior=False, use_bias=True, latent_mean=0.0, latent_std=1.0)¶ Bases:
aboleth.layers.DenseVariational
A DenseVariational layer with Noise Contrastive Prior.
This is basically just a DenseVariational layer, but with an added Kullback Leibler penalty on the latent function, as derived in Equation (6) in “Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors” https://arxiv.org/abs/1807.09289.
This should be the last layer in a network, and needs to be used in conjunction with NCPContinuousPerturb and/or NCPCategoricalPerturb layers (after an input layer). For example:
    net = (
        ab.InputLayer(name="X", n_samples=n_samples_) >>
        ab.NCPContinuousPerturb() >>
        ab.Dense(output_dim=32) >>
        ab.Activation(tf.nn.selu) >>
        ...
        ab.Dense(output_dim=8) >>
        ab.Activation(tf.nn.selu) >>
        ab.DenseNCP(output_dim=1)
    )
As you can see from this example, we have only made the last layer probabilistic/Bayesian (DenseNCP), and have left the rest of the network maximum likelihood/MAP. This is also how the original authors of the algorithm have implemented it. While this layer also works with DenseVariational layers (etc.) this is not how it has been originally implemented, and the contribution of uncertainty from these layers to the latent function will not be accounted for in this layer. This is because the nonlinear activations between layers make evaluating this density intractable, unless we had something like normalising flows.
Parameters: - output_dim (int) – the dimension of the output of this layer
- prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) in the DenseVariational prior). The user can also provide a string to specify an initialisation function. Defaults to 1.0. If a string, must be one of ‘glorot’ or ‘autonorm’.
- learn_prior (bool, optional) – Whether to learn the prior on the weights.
- use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
- latent_mean (float) – The prior mean over the latent function(s) on the output of this layer. This specifies what value the latent function should take away from the support of the training data.
- latent_std (float) – The prior standard deviation over the latent function(s) on the output of this layer. This controls the strength of the regularisation away from the latent mean.
Note
This implementation is inspired by: https://github.com/brain-research/ncp/blob/master/ncp/models/bbb_ncp.py
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
DenseVariational
(output_dim, prior_std=1.0, learn_prior=False, full=False, use_bias=True)¶ Bases:
aboleth.layers.SampleLayer3
A dense (fully connected) linear layer, with variational inference.
This implements a dense linear layer,
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights and also the biases. Here \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). By default, the same Normal prior is placed on each of the layer weights and biases,
\[w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),\]and a different Normal posterior is learned for each of the layer weights and biases,
\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).\]We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,
\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]where \(\mathbf{m}_j \in \mathbb{R}^{D_{in}}\) and \(\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}\).
This layer will use variational inference to learn the posterior parameters, and optionally the prior_std parameter can be learned if learn_prior is set to True. In that case the given value is used for initialization.
Whenever this layer is called, it will return the result,
\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}\]with samples from the posteriors, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and \(\mathbf{b}^{(s)} \sim q(\mathbf{b})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.
Parameters: - output_dim (int) – the dimension of the output of this layer
- prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) above). The user can also provide a string to specify an initialisation function. Defaults to 1.0. If a string, must be one of ‘glorot’ or ‘autonorm’.
- learn_prior (bool, optional) – Whether to learn the prior
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
- use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
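The sampling step and the KL term for the default (diagonal) case can be sketched in NumPy. This is an illustration under stated assumptions rather than aboleth's code: \(c_{ij}\) is taken to be a variance, the bias is omitted for brevity, and the helper name kl_diag_normal is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, N, D_in, D_out = 4, 6, 3, 2
prior_std = 1.0

# Tile X on a new first dimension: (N, D_in) -> (n_samples, N, D_in).
X = np.tile(rng.normal(size=(N, D_in)), (n_samples, 1, 1))

# Posterior parameters of q(W): means m_ij and variances c_ij (assumed values).
m = rng.normal(scale=0.1, size=(D_in, D_out))
c = np.full((D_in, D_out), 0.5)

# One weight sample per leading index: W^(s) = m + sqrt(c) * eps, eps ~ N(0, 1).
eps = rng.normal(size=(n_samples, D_in, D_out))
W_s = m + np.sqrt(c) * eps

# f^(s)(X) = X W^(s), a batched matrix product over the sample dimension.
f_s = X @ W_s


def kl_diag_normal(m, c, prior_std):
    """KL[q || p] for q = N(m, c) (c are variances) and p = N(0, prior_std**2)."""
    var_ratio = c / prior_std ** 2
    return 0.5 * np.sum(var_ratio + m ** 2 / prior_std ** 2 - 1.0 - np.log(var_ratio))


kl = kl_diag_normal(m, c, prior_std)
print(f_s.shape, kl)
```

The KL term is zero only when the posterior equals the prior, which is why it acts as a complexity penalty on the learned weights.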
-
class
aboleth.layers.
DropOut
(keep_prob, independent=True, observation_axis=1, alpha=False)¶ Bases:
aboleth.baselayers.Layer
Dropout layer, Bernoulli probability of not setting an input to zero.
This is just a thin wrapper around tf.nn.dropout.
Parameters: - keep_prob (float, Tensor) – the probability of keeping an input. See tf.nn.dropout.
- independent (bool) – Use independently sampled dropout for each observation if True. This may dramatically improve convergence, but will no longer only sample the latent function.
- observation_axis (int) – The axis that indexes the observations (N). The default assumes the observations are on the second axis, i.e. (n_samples, N, ...). This is so we can repeat the dropout pattern over observations, which has the effect of dropping out weights consistently, thereby sampling the “latent function” of the layer. This is only active if independent is set to False.
- alpha (bool) – Use alpha dropout (tf.contrib.nn.alpha_dropout) that maintains the self normalising property of SNNs.
Note
If a more complex noise shape, or some other modification to dropout is required, you can use an Activation layer, e.g. ab.Activation(lambda x: tf.nn.dropout(x, **your_args)).
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
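The difference between independent=True and independent=False can be pictured as a difference in the shape of the dropout mask. The NumPy sketch below is an assumption based on the parameter descriptions, not aboleth's implementation; it uses the default observation_axis=1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, N, D = 2, 4, 3
keep_prob = 0.8
X = np.ones((n_samples, N, D))

# independent=True: a separate Bernoulli draw for every element.
mask_ind = rng.random((n_samples, N, D)) < keep_prob
out_ind = X * mask_ind / keep_prob

# independent=False: one mask per sample, with size 1 on the observation
# axis so the same pattern is repeated (broadcast) over all N observations.
mask_shared = rng.random((n_samples, 1, D)) < keep_prob
out_shared = X * mask_shared / keep_prob

print(out_shared[0])  # every row (observation) shows the same dropped columns
```

Dividing by keep_prob keeps the expected value of each input unchanged, which is the same scaling tf.nn.dropout applies.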
-
class
aboleth.layers.
Embed
(output_dim, n_categories, l1_reg=0.0, l2_reg=0.0, init_fn='glorot')¶ Bases:
aboleth.layers.SampleLayer3
Dense (fully connected) embedding layer.
This layer works directly on inputs of K category indices rather than one-hot representations, for efficiency. Note, this only works on a single column, see the PerFeature layer to embed multiple columns. E.g.
    cat_layers = [Embed(10, k) for k in x_categories]
    net = (
        ab.InputLayer(name="X", n_samples=n_samples_) >>
        ab.PerFeature(*cat_layers) >>
        ab.Activation(tf.nn.selu) >>
        ...
    )
It is a dense linear layer,
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W}\]Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\). Though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). This layer uses maximum likelihood or maximum a-posteriori inference to learn the weights and so also returns complexity penalties (l1 or l2) for the weights.
Parameters: - output_dim (int) – the dimension of the output (embedding) of this layer
- n_categories (int) – the number of categories in the input variable
- l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
- l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
- init_fn (str, callable) – The function to use to initialise the weights. The default is ‘glorot’, the uniform glorot function. If supplied, the callable takes a shape (input_dim, output_dim) as an argument and returns the weight matrix.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
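The equivalence between the one-hot product \(\mathbf{X} \mathbf{W}\) and a direct lookup by category index can be checked with a small NumPy sketch. The sizes and random values are illustrative assumptions, and this is not aboleth's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D_out = 5, 4, 3

# A single column of category indices, shape (N, 1), with values in 0..K-1.
idx = rng.integers(0, K, size=(N, 1))
W = rng.normal(size=(K, D_out))

# Mathematical view: one-hot rows times the weight matrix.
one_hot = np.eye(K)[idx[:, 0]]  # (N, K)
via_matmul = one_hot @ W        # (N, D_out)

# Efficient view: select rows of W directly by index.
via_lookup = W[idx[:, 0]]       # (N, D_out)

print(np.allclose(via_matmul, via_lookup))  # True
```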
-
class
aboleth.layers.
EmbedVariational
(output_dim, n_categories, prior_std=1.0, learn_prior=False, full=False)¶ Bases:
aboleth.layers.DenseVariational
Dense (fully connected) embedding layer, with variational inference.
This layer works directly on inputs of K category indices rather than one-hot representations, for efficiency. Note, this only works on a single column, see the PerFeature layer to embed multiple columns. E.g.
    cat_layers = [EmbedVariational(10, k) for k in x_categories]
    net = (
        ab.InputLayer(name="X", n_samples=n_samples_) >>
        ab.PerFeature(*cat_layers) >>
        ab.Activation(tf.nn.selu) >>
        ...
    )
This layer is effectively a
DenseVariational
layer,\[f(\mathbf{X}) = \mathbf{X} \mathbf{W},\]where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\) distributions are placed on the weights. Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\). Though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). By default, the same Normal prior is placed on each of the layer weights,
\[w_{ij} \sim \mathcal{N}(0, \sigma^2),\]and a different Normal posterior is learned for each of the layer weights,
\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).\]We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,
\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]where \(\mathbf{m}_j \in \mathbb{R}^{K}\) and \(\mathbf{C}_j \in \mathbb{R}^{K \times K}\).
This layer will use variational inference to learn the posterior parameters, and optionally the prior_std parameter can be learned if learn_prior is set to True. The prior_std value given will be used for initialization.
Whenever this layer is called, it will return the result,
\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}\]with samples from the posterior, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.
Parameters: - output_dim (int) – the dimension of the output (embedding) of this layer
- n_categories (int) – the number of categories in the input variable
- prior_std (str, float) – the value of the weight prior standard deviation (\(\sigma\) above). The user can also provide a string to specify an initialisation function. Defaults to 1.0. If a string, must be one of ‘glorot’ or ‘autonorm’.
- learn_prior (bool, optional) – Whether to learn the prior
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
Flatten
¶ Bases:
aboleth.baselayers.Layer
Flattening layer.
Reshape and output a tensor to be always rank 3 (keeps first dimension which is samples, and second dimension which is observations).
I.e. if X.shape is (3, 100, 5, 5, 3) this will flatten the last dimensions to (3, 100, 75).
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
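The shape arithmetic in the example above can be reproduced with a NumPy reshape. This is only an illustration of the described behaviour, not the layer's TensorFlow code.

```python
import numpy as np

# (n_samples, N, height, width, channels)
X = np.zeros((3, 100, 5, 5, 3))

# Keep the sample and observation dimensions, collapse everything else.
flat = X.reshape(X.shape[0], X.shape[1], -1)

print(flat.shape)  # (3, 100, 75)
```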
-
-
class
aboleth.layers.
InputLayer
(name, n_samples=1)¶ Bases:
aboleth.baselayers.MultiLayer
Create an input layer.
This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a tensor of shape (N, ...). The input is tiled along a new first axis creating a (n_samples, N, ...) tensor for propagating samples through a variational deep net.
Parameters: - name (string) – The name of the input. Used as the argument for input into the net.
- n_samples (int, Tensor) – The number of samples to propagate through the network. We recommend making this a tf.placeholder so you can vary it as required.
Note
We recommend making n_samples a tf.placeholder so it can be varied between training and prediction!
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
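The tiling that turns an (N, ...) input into an (n_samples, N, ...) tensor can be sketched in NumPy as below. The array values are illustrative, and the real layer performs the equivalent operation on TensorFlow tensors.

```python
import numpy as np

n_samples = 5
X = np.arange(6.0).reshape(3, 2)  # (N, D) = (3, 2)

# Add a new leading axis and repeat the input n_samples times along it.
tiled = np.tile(X[np.newaxis], (n_samples, 1, 1))

print(tiled.shape)  # (5, 3, 2)
```

Every slice tiled[s] is an identical copy of X; the copies only start to differ once they pass through a stochastic layer.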
-
class
aboleth.layers.
MaxPool2D
(pool_size, strides, padding='SAME')¶ Bases:
aboleth.baselayers.Layer
Max pooling layer for 2D inputs (e.g. images).
This is just a thin wrapper around tf.nn.max_pool
Parameters: - pool_size (tuple or list of 2 ints) – width and height of the pooling window.
- strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
- padding (str) – One of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’. The type of padding algorithm to use.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
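For intuition, non-overlapping max pooling with pool_size=(2, 2) and strides=(2, 2) can be written as a reshape followed by a maximum. This NumPy sketch handles a single-channel image only and is not how tf.nn.max_pool is implemented.

```python
import numpy as np

# A single 4x4 image with one channel, holding the values 0..15.
image = np.arange(16).reshape(4, 4)

# Split each spatial axis into (block index, position within block),
# then take the maximum over the two within-block axes.
pooled = image.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(pooled)
# [[ 5  7]
#  [13 15]]
```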
-
class
aboleth.layers.
NCPCategoricalPerturb
(n_categories, flip_prob=0.1)¶ Bases:
aboleth.layers.SampleLayer
Noise Contrastive Prior categorical variable perturbation layer.
This layer doubles the number of samples going through the model, and randomly flips the categories in the second set of samples. This implements (the categorical version of) Equation 3 in “Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors” https://arxiv.org/abs/1807.09289.
The choice to randomly flip a category is drawn from a Bernoulli distribution per sample (with probability flip_prob), then the new category is randomly chosen with probability 1 / n_categories.
This should be the first layer in a network after an input layer, and needs to be used in conjunction with DenseNCP. Also, like the embedding layers, this only applies to one column of categorical inputs, so we advise you use it with the PerFeature layer. For example:
    cat_layers = [
        (NCPCategoricalPerturb(k) >> Embed(10, k))
        for k in x_categories
    ]
    net = (
        ab.InputLayer(name="X", n_samples=n_samples_) >>
        ab.PerFeature(*cat_layers) >>
        ab.Activation(tf.nn.selu) >>
        ab.Dense(output_dim=32) >>
        ab.Activation(tf.nn.selu) >>
        ...
        ab.Dense(output_dim=8) >>
        ab.Activation(tf.nn.selu) >>
        ab.DenseNCP(output_dim=1)
    )
Parameters: - n_categories (int) – the number of categories in the input variable.
- flip_prob (float) – the probability of flipping a category, drawn per sample.
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
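The doubling and flipping described for this layer can be sketched in NumPy. This follows the prose description only (a Bernoulli flip decision, then a uniformly chosen replacement category) and is an assumption about the behaviour, not aboleth's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, N = 3, 6
n_categories, flip_prob = 5, 0.1

# One categorical column, tiled over samples: (n_samples, N, 1).
X = np.tile(rng.integers(0, n_categories, size=(N, 1)), (n_samples, 1, 1))

# Bernoulli(flip_prob) decision per element, then a uniform replacement
# category (each category has probability 1 / n_categories).
flip = rng.random(X.shape) < flip_prob
replacement = rng.integers(0, n_categories, size=X.shape)
X_perturbed = np.where(flip, replacement, X)

# The output holds the original samples followed by the perturbed ones.
out = np.concatenate([X, X_perturbed], axis=0)
print(out.shape)  # (6, 6, 1)
```

Because the replacement is drawn from all categories, a flipped entry can land on its original category again.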
-
-
class
aboleth.layers.
NCPContinuousPerturb
(input_noise=1.0)¶ Bases:
aboleth.layers.SampleLayer
Noise Contrastive Prior continuous variable perturbation layer.
This layer doubles the number of samples going through the model, and adds a random normal perturbation to the second set of samples. This implements Equation 3 in “Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors” https://arxiv.org/abs/1807.09289.
This should be the first layer in a network after an input layer, and needs to be used in conjunction with DenseNCP. For example:
    net = (
        ab.InputLayer(name="X", n_samples=n_samples_) >>
        ab.NCPContinuousPerturb() >>
        ab.Dense(output_dim=32) >>
        ab.Activation(tf.nn.selu) >>
        ...
        ab.Dense(output_dim=8) >>
        ab.Activation(tf.nn.selu) >>
        ab.DenseNCP(output_dim=1)
    )
Parameters: input_noise (float, tf.Tensor, tf.Variable) – The standard deviation of the random perturbation to add to the inputs. -
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
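A NumPy sketch of the perturbation described for this layer: the incoming samples are kept, and a second copy with added zero-mean normal noise of standard deviation input_noise is appended. The shapes and values are illustrative assumptions, not aboleth's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, N, D = 3, 4, 2
input_noise = 1.0

# Tiled continuous inputs: (n_samples, N, D).
X = np.tile(rng.normal(size=(N, D)), (n_samples, 1, 1))

# Second set of samples: the same inputs plus N(0, input_noise**2) noise.
X_perturbed = X + input_noise * rng.normal(size=X.shape)

# The number of samples on the first axis is doubled.
out = np.concatenate([X, X_perturbed], axis=0)
print(out.shape)  # (6, 4, 2)
```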
-
-
class
aboleth.layers.
RandomArcCosine
(n_features, lenscale=None, p=1, variational=False, learn_lenscale=False)¶ Bases:
aboleth.layers.RandomFourier
Random arc-cosine kernel layer.
Parameters: - n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
- lenscale (float, ndarray, optional) – The length scales of the arc-cosine kernel. This can be a scalar for an isotropic kernel, or a vector of shape (input_dim,) for an automatic relevance determination (ARD) kernel. If not provided, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit learn SVM with a RBF kernel). If learn_lenscale is True, lenscale will be its initial value.
- p (int) – The order of the arc-cosine kernel, this must be an integer greater than, or equal to zero. 0 will lead to sigmoid-like kernels, 1 will lead to relu-like kernels, 2 quadratic-relu kernels etc.
- variational (bool) – use variational features instead of random features, (i.e. VAR-FIXED in [2]).
- learn_lenscale (bool) – Whether to learn the length scale. If True, the lenscale value provided is used for initialisation.
Note
This should be followed by a dense layer to properly implement a kernel approximation.
See also
- [1] Cho, Youngmin, and Lawrence K. Saul. “Analysis and extension of arc-cosine kernels for large margin classification.” arXiv preprint arXiv:1112.3712 (2011).
- [2] Cutajar, K., Bonilla, E., Michiardi, P., Filippone, M. “Random Feature Expansions for Deep Gaussian Processes.” In ICML, 2017.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
RandomFourier
(n_features, kernel)¶ Bases:
aboleth.layers.SampleLayer3
Random Fourier feature (RFF) kernel approximation layer.
Parameters: - n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
- kernel (kernels.ShiftInvariant) – the kernel object that yields the random samples from the Fourier spectrum of a particular kernel to approximate. See the ab.kernels module.
Note
This should be followed by a dense layer to properly implement a kernel approximation.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
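To see why the output dimension is 2 * n_features and why a dense layer on top gives a kernel approximation, here is a NumPy sketch of random Fourier features. It assumes an RBF kernel, whose spectrum is Gaussian; in aboleth the spectrum samples come from the kernel object instead, so treat this as an illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, n_features, lenscale = 5, 3, 2000, 1.0
X = rng.normal(size=(N, D))

# Random frequencies from the RBF kernel's spectrum: N(0, 1 / lenscale**2).
P = rng.normal(size=(D, n_features)) / lenscale

# Cosine and sine features are concatenated, hence 2 * n_features outputs.
XP = X @ P
Phi = np.hstack([np.cos(XP), np.sin(XP)]) / np.sqrt(n_features)

# Inner products of the features approximate the exact kernel matrix.
K_approx = Phi @ Phi.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-0.5 * sq_dists / lenscale ** 2)

print(Phi.shape, np.abs(K_approx - K_exact).max())
```

A linear (dense) layer applied to Phi then behaves like a model with the approximated kernel, which is the reason for the note above.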
-
class
aboleth.layers.
SampleLayer
¶ Bases:
aboleth.baselayers.Layer
Sample Layer base class.
This is the base class for layers that build upon stochastic (variational) nets. These expect rank >= 3 input Tensors, where the first dimension indexes the random samples of the stochastic net.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.layers.
SampleLayer3
¶ Bases:
aboleth.layers.SampleLayer
Special case of SampleLayer restricted to rank == 3 input Tensors.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-