ab.layers
Network layers and utilities.
class aboleth.layers.Activation(h=<function Activation.<lambda>>)
Bases: aboleth.baselayers.Layer
Activation function layer.
Parameters: h (callable) – the element-wise activation function.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
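For illustration, a minimal usage sketch (not from the original docs; it assumes aboleth is imported as ab, and that any element-wise TensorFlow function can serve as h):

    import tensorflow as tf
    import aboleth as ab

    # Wrap an element-wise function as a layer; tf.tanh is arbitrary here.
    act = ab.Activation(h=tf.tanh)

    X = tf.placeholder(tf.float32, shape=(None, 10))
    Net, KL = act(X)  # Net = tf.tanh(X); KL should be zero, this layer has no parameters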
class aboleth.layers.DenseMAP(output_dim, l1_reg=1.0, l2_reg=1.0, use_bias=True)
Bases: aboleth.layers.SampleLayer
Dense (fully connected) linear layer, with MAP inference.
This implements a linear layer, and when called returns
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]
where \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). This layer uses maximum a-posteriori inference to learn the weights and biases, and so also returns complexity penalties (l1 or l2) for the weights and biases.
Parameters: - output_dim (int) – the dimension of the output of this layer
- l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
- l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
- use_bias (bool) – If true, also learn a bias weight, i.e. a constant offset weight.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
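A minimal usage sketch (an illustration under assumptions, not from the original docs: it presumes aboleth is imported as ab, that layers compose with >>, and that an InputLayer with n_samples=1 feeds the net, since SampleLayers expect a samples axis):

    import tensorflow as tf
    import aboleth as ab

    # Two MAP layers; the returned "KL" term here is the l1/l2 complexity
    # penalty on the weights, to be added to the data-fit term in the loss.
    net = (
        ab.InputLayer(name="X", n_samples=1) >>
        ab.DenseMAP(output_dim=32, l1_reg=0., l2_reg=0.1) >>
        ab.Activation(h=tf.nn.relu) >>
        ab.DenseMAP(output_dim=1, l1_reg=0., l2_reg=0.1)
    )

    X = tf.placeholder(tf.float32, shape=(None, 20))
    Net, penalty = net(X=X)  # Net has shape (1, N, 1); penalty is a scalar Tensor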
class aboleth.layers.DenseVariational(output_dim, std=1.0, full=False, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)
Bases: aboleth.layers.SampleLayer3
A dense (fully connected) linear layer, with variational inference.
This implements a dense linear layer,
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W} + \mathbf{b}\]
where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\), distributions are placed on the weights and also the biases. Here \(\mathbf{X} \in \mathbb{R}^{N \times D_{in}}\), \(\mathbf{W} \in \mathbb{R}^{D_{in} \times D_{out}}\) and \(\mathbf{b} \in \mathbb{R}^{D_{out}}\). By default, the same Normal prior is placed on each of the layer weights and biases,
\[w_{ij} \sim \mathcal{N}(0, \sigma^2), \quad b_{j} \sim \mathcal{N}(0, \sigma^2),\]
and a different Normal posterior is learned for each of the layer weights and biases,
\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}), \quad b_{j} \sim \mathcal{N}(l_{j}, o_{j}).\]
We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,
\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]
where \(\mathbf{m}_j \in \mathbb{R}^{D_{in}}\) and \(\mathbf{C}_j \in \mathbb{R}^{D_{in} \times D_{in}}\).
This layer will use variational inference to learn all of the non-zero prior and posterior parameters.
Whenever this layer is called, it will return the result,
\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)} + \mathbf{b}^{(s)}\]
with samples from the posteriors, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\) and \(\mathbf{b}^{(s)} \sim q(\mathbf{b})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.
Parameters: - output_dim (int) – the dimension of the output of this layer
- std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above), this is optimized a la maximum likelihood type II.
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
- use_bias (bool) – If true, also learn a bias weight, i.e. a constant offset weight.
- prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
- prior_b (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the std and use_bias parameters.
- post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
- post_b (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer intercept. It must have parameters compatible with (output_dim,) shaped weights. This ignores the use_bias parameter. See also distributions.norm_posterior.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
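A minimal sketch of a two-layer variational net (assuming aboleth is imported as ab and that layers compose with >>; see InputLayer below for how n_samples creates the samples axis):

    import tensorflow as tf
    import aboleth as ab

    n_samples = 5  # number of posterior weight samples, s

    net = (
        ab.InputLayer(name="X", n_samples=n_samples) >>
        ab.DenseVariational(output_dim=64, std=1.) >>
        ab.Activation(h=tf.nn.relu) >>
        ab.DenseVariational(output_dim=1, std=1.)
    )

    X = tf.placeholder(tf.float32, shape=(None, 20))
    Net, KL = net(X=X)  # Net has shape (n_samples, N, 1); KL sums KL[q||p] over layers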
class aboleth.layers.DropOut(keep_prob, observation_axis=-2)
Bases: aboleth.baselayers.Layer
Dropout layer, Bernoulli probability of not setting an input to zero.
This is just a thin wrapper around tf.nn.dropout.
Parameters: - keep_prob (float, Tensor) – the probability of keeping an input. See tf.nn.dropout.
- observation_axis (int) – The axis that indexes the observations. This will assume the observations are on the second last axis, i.e. (..., N, D). This is so we can repeat the dropout pattern over observations, which has the effect of dropping out weights consistently, thereby sampling the “latent function” of the layer.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
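A minimal sketch of dropout between variational layers (same assumptions as the examples above; keep_prob=0.9 is arbitrary):

    import aboleth as ab

    # The dropout pattern is repeated over the observation axis (N), so the
    # same inputs are dropped for every observation within a sample.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.DenseVariational(output_dim=64) >>
        ab.DropOut(keep_prob=0.9) >>
        ab.DenseVariational(output_dim=1)
    )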
class aboleth.layers.EmbedVariational(output_dim, n_categories, std=1.0, full=False, prior_W=None, post_W=None)
Bases: aboleth.layers.DenseVariational
Dense (fully connected) embedding layer, with variational inference.
This layer works directly on shape (N, 1) inputs of K category indices rather than one-hot representations, for efficiency, and is a dense linear layer,
\[f(\mathbf{X}) = \mathbf{X} \mathbf{W},\]
where prior, \(p(\cdot)\), and approximate posterior, \(q(\cdot)\), distributions are placed on the weights. Here \(\mathbf{X} \in \mathbb{N}_2^{N \times K}\) and \(\mathbf{W} \in \mathbb{R}^{K \times D_{out}}\), though in code we represent \(\mathbf{X}\) as a vector of indices in \(\mathbb{N}_K^{N \times 1}\). By default, the same Normal prior is placed on each of the layer weights,
\[w_{ij} \sim \mathcal{N}(0, \sigma^2),\]
and a different Normal posterior is learned for each of the layer weights,
\[w_{ij} \sim \mathcal{N}(m_{ij}, c_{ij}).\]
We also have the option of placing full-covariance Gaussian posteriors on the input dimension of the weights,
\[\mathbf{w}_{j} \sim \mathcal{N}(\mathbf{m}_{j}, \mathbf{C}_{j}),\]
where \(\mathbf{m}_j \in \mathbb{R}^{K}\) and \(\mathbf{C}_j \in \mathbb{R}^{K \times K}\).
This layer will use variational inference to learn all of the non-zero prior and posterior parameters.
Whenever this layer is called, it will return the result,
\[f^{(s)}(\mathbf{X}) = \mathbf{X} \mathbf{W}^{(s)}\]
with samples from the posterior, \(\mathbf{W}^{(s)} \sim q(\mathbf{W})\). The number of samples, s, can be controlled by using the n_samples argument in an InputLayer used to feed the first layer of a model, or by tiling \(\mathbf{X}\) on the first dimension. This layer also returns the result of \(\text{KL}[q\|p]\) for all parameters.
Parameters: - output_dim (int) – the dimension of the output (embedding) of this layer
- n_categories (int) – the number of categories in the input variable
- std (float) – the initial value of the weight prior standard deviation (\(\sigma\) above), this is optimized a la maximum likelihood type II.
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
- prior_W (tf.distributions.Distribution, optional) – This is the prior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the std parameter.
- post_W (tf.distributions.Distribution, optional) – This is the posterior distribution object to use on the layer weights. It must have parameters compatible with (input_dim, output_dim) shaped weights. This ignores the full parameter. See also distributions.gaus_posterior.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
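A minimal sketch of embedding a categorical input (assumptions as in the earlier examples; the integer dtype for the index column is also an assumption):

    import tensorflow as tf
    import aboleth as ab

    # Embed a column of 1000 category indices into 10 dimensions.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.EmbedVariational(output_dim=10, n_categories=1000)
    )

    X_cat = tf.placeholder(tf.int32, shape=(None, 1))  # (N, 1) category indices
    Net, KL = net(X=X_cat)  # Net has shape (5, N, 10)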
class aboleth.layers.InputLayer(name, n_samples=None)
Bases: aboleth.baselayers.MultiLayer
Create an input layer.
This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a 2D tensor of shape (N, D). If n_samples is specified, the input is tiled along a new first axis, creating a (n_samples, N, D) tensor for propagating samples through a variational deep net.
Parameters: - name (string) – The name of the input. Used as the argument for input into the net.
- n_samples (int > 0) – The number of samples.
__call__(**kwargs)
Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors)
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
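For example (a minimal sketch; the placeholder shape is arbitrary):

    import tensorflow as tf
    import aboleth as ab

    X = tf.placeholder(tf.float32, shape=(None, 20))

    in_layer = ab.InputLayer(name="X", n_samples=5)
    Net, KL = in_layer(X=X)  # Net has shape (5, N, 20): X tiled on a new first axis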
class aboleth.layers.MaxPool2D(pool_size, strides, padding='SAME')
Bases: aboleth.baselayers.Layer
Max pooling layer for 2D inputs (e.g. images).
This is just a thin wrapper around tf.nn.max_pool
Parameters: - pool_size (tuple or list of 2 ints) – width and height of the pooling window.
- strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
- padding (str) – The type of padding algorithm to use; one of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
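A minimal sketch (assuming image-like inputs with shape (..., height, width, channels), as tf.nn.max_pool expects):

    import aboleth as ab

    # 2x2 pooling windows with stride 2 halve the spatial dimensions.
    pool = ab.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding='SAME')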
class aboleth.layers.RandomArcCosine(n_features, lenscale=1.0, p=1, variational=False, lenscale_posterior=None)
Bases: aboleth.layers.RandomFourier
Random arc-cosine kernel layer.
NOTE: This should be followed by a dense layer to properly implement a kernel approximation.
Parameters: - n_features (int) – the number of unique random features; the actual output dimension of this layer will be 2 * n_features.
- lenscale (float, ndarray, Tensor) – the length scales of the arc-cosine kernel. This can be a scalar for an isotropic kernel, or a vector for an automatic relevance detection (ARD) kernel.
- p (int) – The order of the arc-cosine kernel; this must be an integer greater than, or equal to, zero. 0 will lead to sigmoid-like kernels, 1 will lead to relu-like kernels, 2 to quadratic-relu kernels, etc.
- variational (bool) – use variational features instead of random features (i.e. VAR-FIXED in [2]).
- lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale. This is only used if variational==True. This can be a scalar or vector (different initial value per input dimension). If this is left as None, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit-learn SVM with an RBF kernel).
See also
[1] Cho, Youngmin, and Lawrence K. Saul. “Analysis and extension of arc-cosine kernels for large margin classification.” arXiv preprint arXiv:1112.3712 (2011).
[2] Cutajar, K., Bonilla, E., Michiardi, P., Filippone, M. Random Feature Expansions for Deep Gaussian Processes. In ICML, 2017.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
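A minimal sketch pairing these features with a dense layer, per the note above (assumptions as in the earlier examples):

    import aboleth as ab

    # Approximate GP regression with an order-1 (relu-like) arc-cosine kernel.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.RandomArcCosine(n_features=100, lenscale=1., p=1) >>
        ab.DenseVariational(output_dim=1)
    )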
class aboleth.layers.RandomFourier(n_features, kernel)
Bases: aboleth.layers.SampleLayer3
Random Fourier feature (RFF) kernel approximation layer.
NOTE: This should be followed by a dense layer to properly implement a kernel approximation.
Parameters: - n_features (int) – the number of unique random features; the actual output dimension of this layer will be 2 * n_features.
- kernel (kernels.ShiftInvariant) – the kernel object that yields the random samples from the Fourier spectrum of a particular kernel to approximate. See the ab.kernels module.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
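A minimal sketch pairing RFFs with a dense layer, per the note above (the RBF kernel object and its lenscale argument are assumed from the ab.kernels module):

    import aboleth as ab

    # Approximate GP regression with an RBF kernel.
    net = (
        ab.InputLayer(name="X", n_samples=5) >>
        ab.RandomFourier(n_features=100, kernel=ab.kernels.RBF(lenscale=1.)) >>
        ab.DenseVariational(output_dim=1)
    )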
class aboleth.layers.Reshape(target_shape)
Bases: aboleth.baselayers.Layer
Reshape layer.
Reshape and output a tensor to a specified shape.
Parameters: target_shape (tuple of ints) – the target shape; does not include the samples or batch axes.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
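A minimal sketch of flattening convolutional features before a dense layer (the feature-map shape is arbitrary):

    import aboleth as ab

    # Flatten (..., 8, 8, 16) feature maps to (..., 1024).
    flatten = ab.Reshape(target_shape=(8 * 8 * 16,))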
class aboleth.layers.SampleLayer
Bases: aboleth.baselayers.Layer
Sample Layer base class.
This is the base class for layers that build upon stochastic (variational) nets. These expect rank >= 3 input Tensors, where the first dimension indexes the random samples of the stochastic net.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
class aboleth.layers.SampleLayer3
Bases: aboleth.layers.SampleLayer
Special case of SampleLayer restricted to rank == 3 input Tensors.
__call__(X)
Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer
Returns:
- Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.