ab.layers
Network layers and utilities.
class aboleth.layers.Activation(h=<function Activation.<lambda>>)
Bases: aboleth.baselayers.Layer
Activation function layer.
Parameters: h (callable) – the element-wise activation function.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
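For illustration, a minimal usage sketch (not from the original docs; it assumes aboleth is imported as ab, and that any element-wise TensorFlow function can serve as h):

    import tensorflow as tf
    import aboleth as ab

    # Wrap an element-wise function as a layer; tf.tanh is arbitrary here.
    act = ab.Activation(h=tf.tanh)

    X = tf.placeholder(tf.float32, shape=(None, 10))
    Net, KL = act(X)  # Net = tf.tanh(X); KL should be zero, this layer has no parameters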
class aboleth.layers.DenseMAP(output_dim, l1_reg=1.0, l2_reg=1.0, use_bias=True)
Bases: aboleth.layers.SampleLayer
Dense (fully connected) linear layer, with MAP inference.
This implements a linear layer, and when called returns
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]
where \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). This layer uses maximum a-posteriori inference to learn the weights and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.
Parameters: - output_dim (int) – the dimension of the output of this layer
- l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
- l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
- use_bias (bool) – If true, also learn a bias weight, i.e. a constant offset weight.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
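A minimal usage sketch (an illustration under assumptions, not from the original docs: it presumes aboleth is imported as ab, that layers compose with >>, and that an InputLayer with n_samples=1 feeds the net, since SampleLayers expect a samples axis):

    import tensorflow as tf
    import aboleth as ab

    # Two MAP layers; the returned "KL" term here is the l1/l2 complexity
    # penalty on the weights, to be added to the data-fit term in the loss.
    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.DenseMAP(output_dim=32, l1_reg=0., l2_reg=0.1) >>
        ab.Activation(h=tf.nn.relu) >>
        ab.DenseMAP(output_dim=1, l1_reg=0., l2_reg=0.1)
    )

    X = tf.placeholder(tf.float32, shape=(None, 20))
    Net, penalty = net(X=X)  # Net has shape (1, N, 1); penalty is a scalar Tensor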
class aboleth.layers.DenseVariational(output_dim, std=1.0, full=False, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)
Bases: aboleth.layers.SampleLayer3
A dense (fully connected) linear layer, with variational inference.
This implements a dense linear layer,
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]
where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\), distributions are placed on the weights and also the biases. Here \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). By default, the same Normal prior is placed on each of the layer weights and biases,
\[w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),\]
and a different Normal posterior is learned for each of the layer weights and biases,
\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).\]
We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,
\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]
where \(\mathbf{m}_j \in \mathbb{R}^{D_{in}}\) and \(\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}\).
This layer will use variational inference to learn all of the non-zero prior and posterior parameters.
Whenever this layer is called, it will return the result,
\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}\]
with samples from the posteriors, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and \(\mathbf{b}^{(s)} \sim q(\mathbf{b})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.
Parameters: - output_dim (int) – the dimension of the output of this layer
- std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above), this is optimized a la maximum likelihood type II.
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
- use_bias (bool) – If true, also learn a bias weight, i.e. a constant offset weight.
- prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
- prior_b (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the std and use_bias parameters.
- post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
- post_b (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the use_bias parameter. See also distributions.norm_posterior.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
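A minimal sketch of a two-layer variational net (assuming aboleth is imported as ab and that layers compose with >>; see InputLayer below for how n_samples creates the samples axis):

    import tensorflow as tf
    import aboleth as ab

    n_samples = 5  # number of posterior weight samples, s

    net = (
        ab.InputLayer(name="X", n_samples=n_samples) >>
        ab.DenseVariational(output_dim=64, std=1.) >>
        ab.Activation(h=tf.nn.relu) >>
        ab.DenseVariational(output_dim=1, std=1.)
    )

    X = tf.placeholder(tf.float32, shape=(None, 20))
    Net, KL = net(X=X)  # Net has shape (n_samples, N, 1); KL sums KL[q||p] over layers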
class aboleth.layers.DropOut(keep_prob, observation_axis=-2)
Bases: aboleth.baselayers.Layer
Dropout layer, Bernoulli probability of not setting an input to zero.
This is just a thin wrapper around tf.nn.dropout.
Parameters: - keep_prob (float, Tensor) – the probability of keeping an input. See tf.nn.dropout.
- observation_axis (int) – The axis that indexes the observations. This will assume the observations are on the second last axis, i.e. (..., N, D). This is so we can repeat the dropout pattern over observations, which has the effect of dropping out weights consistently, thereby sampling the “latent function” of the layer.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
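A minimal sketch of dropout between variational layers (same assumptions as the examples above; keep_prob=0.9 is arbitrary):

    import aboleth as ab

    # The dropout pattern is repeated over the observation axis (N), so the
    # same inputs are dropped for every observation within a sample.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.DenseVariational(output_dim=64) >>
        ab.DropOut(keep_prob=0.9) >>
        ab.DenseVariational(output_dim=1)
    )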
class aboleth.layers.EmbedVariational(output_dim, n_categories, std=1.0, full=False, prior_W=None, post_W=None)
Bases: aboleth.layers.DenseVariational
Dense (fully connected) embedding layer, with variational inference.
This layer works directly on shape (N, 1) inputs of K category indices rather than one-hot representations, for efficiency, and is a dense linear layer,
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W},\]
where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\), distributions are placed on the weights. Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\), though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). By default, the same Normal prior is placed on each of the layer weights,
\[w_{ij} \sim \mathcal{N}(0, \sigma^2),\]
and a different Normal posterior is learned for each of the layer weights,
\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).\]
We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,
\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]
where \(\mathbf{m}_j \in \mathbb{R}^{K}\) and \(\mathbf{C}_j \in \mathbb{R}^{K \times K}\).
This layer will use variational inference to learn all of the non-zero prior and posterior parameters.
Whenever this layer is called, it will return the result,
\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}\]
with samples from the posterior, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.
Parameters: - output_dim (int) – the dimension of the output (embedding) of this layer
- n_categories (int) – the number of categories in the input variable
- std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above), this is optimized a la maximum likelihood type II.
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
- prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
- post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
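A minimal sketch of embedding a categorical input (assumptions as in the earlier examples; the integer dtype for the index column is also an assumption):

    import tensorflow as tf
    import aboleth as ab

    # Embed a column of 1000 category indices into 10 dimensions.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.EmbedVariational(output_dim=10, n_categories=1000)
    )

    X_cat = tf.placeholder(tf.int32, shape=(None, 1))  # (N, 1) category indices
    Net, KL = net(X=X_cat)  # Net has shape (5, N, 10)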
class aboleth.layers.InputLayer(name, n_samples=None)
Bases: aboleth.baselayers.MultiLayer
Create an input layer.
This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a 2D tensor of shape (N, D). If n_samples is specified, the input is tiled along a new first axis, creating a (n_samples, N, D) tensor for propagating samples through a variational deep net.
Parameters: - name (string) – The name of the input. Used as the argument for input into the net.
- n_samples (int > 0) – The number of samples.
__call__(**kwargs)
Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors)
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
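For example (a minimal sketch; the placeholder shape is arbitrary):

    import tensorflow as tf
    import aboleth as ab

    X = tf.placeholder(tf.float32, shape=(None, 20))

    in_layer = ab.InputLayer(name="X", n_samples=5)
    Net, KL = in_layer(X=X)  # Net has shape (5, N, 20): X tiled on a new first axis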
class aboleth.layers.MaxPool2D(pool_size, strides, padding='SAME')
Bases: aboleth.baselayers.Layer
Max pooling layer for 2D inputs (e.g. images).
This is just a thin wrapper around tf.nn.max_pool
Parameters: - pool_size (tuple or list of 2 ints) – width and height of the pooling window.
- strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
- padding (str) – The type of padding algorithm to use; one of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
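A minimal sketch (assuming image-like inputs with shape (..., height, width, channels), as tf.nn.max_pool expects):

    import aboleth as ab

    # 2x2 pooling windows with stride 2 halve the spatial dimensions.
    pool = ab.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding='SAME')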
class aboleth.layers.RandomArcCosine(n_features, lenscale=1.0, p=1, variational=False, lenscale_posterior=None)
Bases: aboleth.layers.RandomFourier
Random arc-cosine kernel layer.
NOTE: This should be followed by a dense layer to properly implement a kernel approximation.
Parameters: - n_features (int) – the number of unique random features; the actual output dimension of this layer will be 2 * n_features.
- lenscale (float, ndarray, Tensor) – the length scales of the arc-cosine kernel. This can be a scalar for an isotropic kernel, or a vector for an automatic relevance detection (ARD) kernel.
- p (int) – The order of the arc-cosine kernel; this must be an integer greater than, or equal to, zero. 0 will lead to sigmoid-like kernels, 1 will lead to relu-like kernels, 2 to quadratic-relu kernels, etc.
- variational (bool) – use variational features instead of random features (i.e. VAR-FIXED in [2]).
- lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale. This is only used if variational==True. This can be a scalar or vector (different initial value per input dimension). If this is left as None, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit-learn SVM with an RBF kernel).
See also
[1] Cho, Youngmin, and Lawrence K. Saul. “Analysis and extension of arc-cosine kernels for large margin classification.” arXiv preprint arXiv:1112.3712 (2011).
[2] Cutajar, K., Bonilla, E., Michiardi, P., Filippone, M. Random Feature Expansions for Deep Gaussian Processes. In ICML, 2017.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
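A minimal sketch pairing these features with a dense layer, per the note above (assumptions as in the earlier examples):

    import aboleth as ab

    # Approximate GP regression with an order-1 (relu-like) arc-cosine kernel.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.RandomArcCosine(n_features=100, lenscale=1., p=1) >>
        ab.DenseVariational(output_dim=1)
    )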
class aboleth.layers.RandomFourier(n_features, kernel)
Bases: aboleth.layers.SampleLayer3
Random Fourier feature (RFF) kernel approximation layer.
NOTE: This should be followed by a dense layer to properly implement a kernel approximation.
Parameters: - n_features (int) – the number of unique random features; the actual output dimension of this layer will be 2 * n_features.
- kernel (kernels.ShiftInvariant) – the kernel object that yields the random samples from the Fourier spectrum of a particular kernel to approximate. See the ab.kernels module.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
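A minimal sketch pairing RFFs with a dense layer, per the note above (the RBF kernel object and its lenscale argument are assumed from the ab.kernels module):

    import aboleth as ab

    # Approximate GP regression with an RBF kernel.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.RandomFourier(n_features=100, kernel=ab.kernels.RBF(lenscale=1.)) >>
        ab.DenseVariational(output_dim=1)
    )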
class aboleth.layers.Reshape(target_shape)
Bases: aboleth.baselayers.Layer
Reshape layer.
Reshape and output a tensor to a specified shape.
Parameters: target_shape (tuple of ints) – the target shape; does not include the samples or batch axes.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
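A minimal sketch of flattening convolutional features before a dense layer (the feature-map shape is arbitrary):

    import aboleth as ab

    # Flatten (..., 8, 8, 16) feature maps to (..., 1024).
    flatten = ab.Reshape(target_shape=(8 * 8 * 16,))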
class aboleth.layers.SampleLayer
Bases: aboleth.baselayers.Layer
Sample Layer base class.
This is the base class for layers that build upon stochastic (variational) nets. These expect rank >= 3 input Tensors, where the first dimension indexes the random samples of the stochastic net.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.SampleLayer3
Bases: aboleth.layers.SampleLayer
Special case of SampleLayer restricted to rank == 3 input Tensors.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.