# laplace.utils

Classes:

- `SoDSampler`
- `FeatureExtractor` – Feature extractor for a PyTorch neural network.
- `Kron` – Kronecker-factored approximate curvature representation for a corresponding neural network.
- `KronDecomposed` – Decomposed Kronecker-factored approximate curvature representation for a corresponding neural network.
- `SubnetMask` – Base class for all subnetwork masks in this library (for subnetwork Laplace).
- `RandomSubnetMask` – Subnetwork mask of parameters sampled uniformly at random.
- `LargestMagnitudeSubnetMask` – Subnetwork mask identifying the parameters with the largest magnitude.
- `LargestVarianceDiagLaplaceSubnetMask` – Subnetwork mask identifying the parameters with the largest marginal variances (estimated using a diagonal Laplace approximation).
- `LargestVarianceSWAGSubnetMask` – Subnetwork mask identifying the parameters with the largest marginal variances (estimated using diagonal SWAG).
- `ParamNameSubnetMask` – Subnetwork mask corresponding to the specified parameters of the neural network.
- `ModuleNameSubnetMask` – Subnetwork mask corresponding to the specified modules of the neural network.
- `LastLayerSubnetMask` – Subnetwork mask corresponding to the last layer of the neural network.
- `RunningNLLMetric` – NLL metric computed in a running fashion over batches of predicted probabilities.

Functions:

- `get_nll`
- `validate`
- `parameters_per_layer` – Get the number of parameters per layer.
- `invsqrt_precision` – Compute `M^{-0.5}` as a lower-triangular matrix.
- `kron` – Computes the Kronecker product between two tensors.
- `diagonal_add_scalar` – Add scalar value `value` to the diagonal of `X`.
- `symeig` – Symmetric eigendecomposition avoiding failure cases by adding and removing jitter to the diagonal.
- `block_diag` – Compose a block-diagonal matrix of individual blocks.
- `normal_samples` – Produce samples from a batch of Normal distributions parameterized by either a diagonal or full covariance.
- `_is_batchnorm`
- `_is_valid_scalar`
- `expand_prior_precision` – Expand prior precision to match the shape of the model parameters.
- `fix_prior_prec_structure` – Create a tensor of prior precision with the correct shape, depending on the choice of the prior structure type.
- `fit_diagonal_swag_var` – Fit diagonal SWAG [1], which estimates marginal variances of model parameters from the first and second moments of SGD iterates.
## FeatureExtractor

`FeatureExtractor(model: Module, last_layer_name: str | None = None, enable_backprop: bool = False, feature_reduction: FeatureReduction | str | None = None)`

Bases: `Module`

Feature extractor for a PyTorch neural network. A wrapper which can return the output of the penultimate layer in addition to the output of the last layer for each forward pass. If the name of the last layer is not known, it can determine it automatically. It assumes that the last layer is linear and that for every forward pass the last layer is the same. If the name of the last layer is known, it can be passed as a parameter at initialization; this is the safest way to use this class. Based on https://gist.github.com/fkodom/27ed045c9051a39102e8bcf4ce31df76.

Parameters:

- `model` (`Module`) – PyTorch model
- `last_layer_name` (`str`, default: `None`) – if the name of the last layer is already known, otherwise it will be determined automatically.
- `enable_backprop` (`bool`, default: `False`) – whether to enable backprop through the feature extractor to get the gradients of the inputs. Useful for e.g. Bayesian optimization.
- `feature_reduction` (`FeatureReduction | str | None`, default: `None`) – when the last-layer `features` is a tensor of dim >= 3, this tells how to reduce it into a dim-2 tensor. E.g. in LLMs for non-language-modeling problems, the penultimate output is a tensor of shape `(batch_size, seq_len, embd_dim)`, but the last layer maps `(batch_size, embd_dim)` to `(batch_size, n_classes)`. Note: make sure that this option faithfully reflects the reduction in the model definition. When inputting a string, the available options are `{'pick_first', 'pick_last', 'average'}`.

Methods:

- `forward` – Forward pass. If the last layer is not known yet, it will be determined when this function is called for the first time.
- `forward_with_features` – Forward pass which returns the output of the penultimate layer along with the output of the last layer.
- `set_last_layer` – Set the last layer of the model by its name. This sets the forward hook to get the output of the penultimate layer.
- `find_last_layer` – Automatically determines the last layer of the model with one forward pass.

Source code in laplace/utils/feature_extractor.py
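A minimal usage sketch (the model, the layer name `"2"`, and the shapes are illustrative, not part of the library): wrap a classifier and retrieve both the last-layer output and the penultimate features in one pass.

```python
import torch
import torch.nn as nn
from laplace.utils import FeatureExtractor

# A small classifier whose last layer is linear, as FeatureExtractor assumes.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

# Passing last_layer_name explicitly is the safest way to use the class;
# "2" is the name nn.Sequential assigns to the final Linear module here.
feature_extractor = FeatureExtractor(model, last_layer_name="2")

x = torch.randn(8, 10)
logits, features = feature_extractor.forward_with_features(x)
print(logits.shape)    # torch.Size([8, 3]), output of the last layer
print(features.shape)  # torch.Size([8, 32]), output of the penultimate layer
```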
### forward

`forward(x: Tensor | MutableMapping[str, Tensor | Any]) -> Tensor`

Forward pass. If the last layer is not known yet, it will be determined when this function is called for the first time.

Parameters:

- `x` (`torch.Tensor` or a dict-like object containing the input tensors) – one batch of data to use as input for the forward pass

Source code in laplace/utils/feature_extractor.py
### forward_with_features

`forward_with_features(x: Tensor | MutableMapping[str, Tensor | Any]) -> tuple[Tensor, Tensor]`

Forward pass which returns the output of the penultimate layer along with the output of the last layer. If the last layer is not known yet, it will be determined when this function is called for the first time.

Parameters:

- `x` (`torch.Tensor` or a dict-like object containing the input tensors) – one batch of data to use as input for the forward pass

Source code in laplace/utils/feature_extractor.py
### set_last_layer

`set_last_layer(last_layer_name: str) -> None`

Set the last layer of the model by its name. This sets the forward hook to get the output of the penultimate layer.

Parameters:

- `last_layer_name` (`str`) – name of the last layer of the model

Source code in laplace/utils/feature_extractor.py
### find_last_layer

`find_last_layer(x: Tensor | MutableMapping[str, Tensor | Any]) -> Tensor`

Automatically determines the last layer of the model with one forward pass. It assumes that the last layer is the same for every forward pass and that it is an instance of `torch.nn.Linear`. Might not work with every architecture, but is tested with all PyTorch torchvision classification models (besides SqueezeNet, which has no linear last layer).

Parameters:

- `x` (`torch.Tensor` or dict-like object containing the input tensors) – one batch of data to use as input for the forward pass

Source code in laplace/utils/feature_extractor.py
## Kron

Kronecker-factored approximate curvature representation for a corresponding neural network. Each element in `kfacs` is either a tuple or a single matrix. A tuple represents two Kronecker factors \(Q\) and \(H\), and a single element is just a full block Hessian approximation.

Parameters:

- `kfacs` (`list[Iterable[Tensor] | Tensor]`) – each element in the list is a tuple of two Kronecker factors Q, H or a single matrix approximating the Hessian (in the case of a bias, for example)

Methods:

- `init_from_model` – Initialize Kronecker factors based on a model's architecture.
- `__add__` – Add up Kronecker factors `self` and `other`.
- `__mul__` – Multiply all Kronecker factors by a scalar.
- `decompose` – Eigendecompose Kronecker factors and turn into `KronDecomposed`.
- `bmm` – Batched matrix multiplication with the Kronecker factors.
- `logdet` – Compute the log determinants of the Kronecker factors and sum them up.
- `diag` – Extract the diagonal of the entire Kronecker factorization.
- `to_matrix` – Make the Kronecker factorization dense by computing the Kronecker product.

Source code in laplace/utils/matrix.py
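To make the representation concrete, here is a small self-contained PyTorch sketch (plain `torch`, not the library itself) of what a single Kronecker-factored block encodes: the dense block is the Kronecker product of its two factors, which is why the factored form is much cheaper to store and why its log determinant decomposes.

```python
import torch

# Two small SPD Kronecker factors Q (n x n) and H (m x m); in K-FAC these
# arise from a layer's input activations and output gradients.
n, m = 3, 4
A = torch.randn(n, n); Q = A @ A.T + torch.eye(n)
B = torch.randn(m, m); H = B @ B.T + torch.eye(m)

dense = torch.kron(Q, H)  # the full (n*m x n*m) block the tuple represents

# Storage: n^2 + m^2 numbers instead of (n*m)^2.
print(Q.numel() + H.numel(), "vs", dense.numel())  # 25 vs 144

# logdet(Q ⊗ H) = m * logdet(Q) + n * logdet(H)
lhs = torch.logdet(dense)
rhs = m * torch.logdet(Q) + n * torch.logdet(H)
print(torch.allclose(lhs, rhs, atol=1e-5))  # True
```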
### init_from_model

Initialize Kronecker factors based on a model's architecture.

Parameters:

Returns:

- `kron` (`Kron`) –

Source code in laplace/utils/matrix.py
### __add__

Add up Kronecker factors `self` and `other`.

Parameters:

Returns:

- `kron` (`Kron`) –

Source code in laplace/utils/matrix.py
### __mul__

Multiply all Kronecker factors by a scalar. The multiplication is distributed across the number of factors using `pow(scalar, 1 / len(F))`, where `len(F)` is either `1` or `2`.

Parameters:

Returns:

- `kron` (`Kron`) –

Source code in laplace/utils/matrix.py
### decompose

`decompose(damping: bool = False) -> KronDecomposed`

Eigendecompose the Kronecker factors and turn them into a `KronDecomposed`.

Parameters:

- `damping` (`bool`, default: `False`) – use the damping approximation

Returns:

- `kron_decomposed` (`KronDecomposed`) –

Source code in laplace/utils/matrix.py
### _bmm

`_bmm(W: Tensor) -> Tensor`

Implementation of `bmm` which casts the parameters to the right shape.

Parameters:

- `W` (`Tensor`) – matrix `(batch, classes, params)`

Returns:

- `SW` (`Tensor`) – result `(batch, classes, params)`

Source code in laplace/utils/matrix.py
### bmm

Batched matrix multiplication with the Kronecker factors. If the Kron is `H`, we compute `H @ W`. This is useful for computing the predictive or a regularization based on Kronecker factors, as in continual learning.

Parameters:

- `W` (`Tensor`) – matrix `(batch, classes, params)`
- `exponent` (`float`, default: `1`) – can only be `1` for `Kron`; other exponent values of the Kronecker factors require `KronDecomposed`.

Returns:

- `SW` (`Tensor`) – result `(batch, classes, params)`

Source code in laplace/utils/matrix.py
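The reason `bmm` never has to materialize the dense block is the standard Kronecker mat-vec identity: with row-major flattening, \((Q \otimes H)\,\mathrm{vec}(X) = \mathrm{vec}(Q X H^\top)\). A hedged plain-PyTorch sketch of the trick (independent of the library's internal batching conventions):

```python
import torch

n, m = 3, 4
Q, H = torch.randn(n, n), torch.randn(m, m)
X = torch.randn(n, m)  # a parameter-shaped matrix

# Dense route: explicitly form the (n*m x n*m) Kronecker product.
dense = torch.kron(Q, H) @ X.flatten()

# Factored route: two small matrix products, no dense matrix.
factored = (Q @ X @ H.T).flatten()

print(torch.allclose(dense, factored, atol=1e-5))  # True
```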
### logdet

Compute the log determinant of the Kronecker factors and sum them up. This corresponds to the log determinant of the entire Hessian approximation.

Returns:

- `logdet` (`Tensor`) –

Source code in laplace/utils/matrix.py
### diag

Extract the diagonal of the entire Kronecker factorization.

Returns:

- `diag` (`Tensor`) –

Source code in laplace/utils/matrix.py
### to_matrix

Make the Kronecker factorization dense by computing the Kronecker product. Warning: this should only be used for testing purposes, as it will allocate large amounts of memory for big architectures.

Returns:

- `block_diag` (`Tensor`) –

Source code in laplace/utils/matrix.py
## KronDecomposed

`KronDecomposed(eigenvectors: list[tuple[Tensor]], eigenvalues: list[tuple[Tensor]], deltas: Tensor | None = None, damping: bool = False)`

Decomposed Kronecker-factored approximate curvature representation for a corresponding neural network. Each matrix in `Kron` is decomposed to obtain `KronDecomposed`. Front-loading the decomposition allows cheap repeated computation of inverses and log determinants. In contrast to `Kron`, we can add scalar or layerwise scalars, but we cannot add another `Kron` or `KronDecomposed` anymore.

Parameters:

- `eigenvectors` (`list[tuple[Tensor]]`) – eigenvectors corresponding to matrices in a corresponding `Kron`
- `eigenvalues` (`list[tuple[Tensor]]`) – eigenvalues corresponding to matrices in a corresponding `Kron`
- `deltas` (`Tensor`, default: `None`) – addend for each group of Kronecker factors representing, for example, a prior precision
- `damping` (`bool`, default: `False`) – use the damping approximation, mixing prior and `Kron` partially multiplicatively

Methods:

- `__add__` – Add a scalar per layer or a single scalar to the Kronecker factors.
- `__mul__` – Multiply by a scalar by changing the eigenvalues.
- `logdet` – Compute the log determinants of the Kronecker factors and sum them up.
- `bmm` – Batched matrix multiplication with the decomposed Kronecker factors.
- `diag` – Extract the diagonal of the entire decomposed Kronecker factorization.
- `to_matrix` – Make the Kronecker factorization dense by computing the Kronecker product.

Source code in laplace/utils/matrix.py
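Why front-load the decomposition? Once a factor is written as \(Q = V \,\mathrm{diag}(\lambda)\, V^\top\), shifted quantities such as \(\log\det(Q + \delta I)\) and \((Q + \delta I)^{-1}\) follow from the eigenvalues alone, so sweeping over prior precisions \(\delta\) requires no new factorization. A plain-PyTorch sketch of the idea (not the library's exact code path):

```python
import torch

n = 5
A = torch.randn(n, n)
Q = A @ A.T + torch.eye(n)  # an SPD factor, as stored in Kron

# Decompose once, up front.
eigvals, eigvecs = torch.linalg.eigh(Q)

for delta in (0.1, 1.0, 10.0):  # e.g. candidate prior precisions
    # logdet(Q + delta * I) from the shifted eigenvalues alone.
    logdet = torch.log(eigvals + delta).sum()
    # (Q + delta * I)^{-1} by rescaling eigenvectors, no new factorization.
    inv = eigvecs @ torch.diag(1.0 / (eigvals + delta)) @ eigvecs.T
    ref = torch.linalg.inv(Q + delta * torch.eye(n))
    print(float(logdet), torch.allclose(inv, ref, atol=1e-4))
```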
### __add__

`__add__(deltas: Tensor) -> KronDecomposed`

Add a scalar per layer or a single scalar to the Kronecker factors.

Parameters:

- `deltas` (`Tensor`) – either of the same length as `eigenvalues` or a scalar.

Returns:

- `kron` (`KronDecomposed`) –

Source code in laplace/utils/matrix.py
### __mul__

`__mul__(scalar: Tensor | float) -> KronDecomposed`

Multiply by a scalar by changing the eigenvalues. Same as for the case of `Kron`.

Parameters:

- `scalar` (`Tensor | float`) –

Returns:

- `kron` (`KronDecomposed`) –

Source code in laplace/utils/matrix.py
### logdet

Compute the log determinant of the Kronecker factors and sum them up. This corresponds to the log determinant of the entire Hessian approximation. In contrast to `Kron.logdet()`, additive `deltas` corresponding to prior precisions are added.

Returns:

- `logdet` (`Tensor`) –

Source code in laplace/utils/matrix.py
### _bmm

Implementation of `bmm`, i.e., `self ** exponent @ W`.

Parameters:

Returns:

- `SW` (`Tensor`) – result `(batch, classes, params)`

Source code in laplace/utils/matrix.py
### bmm

Batched matrix multiplication with the decomposed Kronecker factors. This is useful for computing the predictive or a regularization loss. Compared to `Kron.bmm`, a prior can be added here in the form of `deltas`, and the exponent can be other than just 1. Computes \(H^{exponent} W\).

Parameters:

Returns:

- `SW` (`Tensor`) – result `(batch, classes, params)`

Source code in laplace/utils/matrix.py
### diag

Extract the diagonal of the entire decomposed Kronecker factorization.

Parameters:

Returns:

- `diag` (`Tensor`) –

Source code in laplace/utils/matrix.py
### to_matrix

Make the Kronecker factorization dense by computing the Kronecker product. Warning: this should only be used for testing purposes, as it will allocate large amounts of memory for big architectures.

Parameters:

Returns:

- `block_diag` (`Tensor`) –

Source code in laplace/utils/matrix.py
## SubnetMask

`SubnetMask(model: Module)`

Base class for all subnetwork masks in this library (for subnetwork Laplace).

Parameters:

- `model` (`Module`) –

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

`get_subnet_mask(train_loader: DataLoader) -> Tensor`

Get the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Source code in laplace/utils/subnetmask.py
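The contract for subclasses is small: implement `get_subnet_mask` to return the binary mask, and the inherited `select` and `convert_subnet_mask_to_indices` handle the rest. A hedged sketch of a hypothetical subclass (the class name and selection rule are purely illustrative, and the sketch assumes `SubnetMask` stores the model as `self.model`):

```python
import torch
from torch import nn
from laplace.utils import SubnetMask

class EveryOtherParamSubnetMask(SubnetMask):
    """Hypothetical mask selecting every other parameter (illustration only)."""

    def get_subnet_mask(self, train_loader):
        # Binary vector over the vectorized model parameters.
        n_params = sum(p.numel() for p in self.model.parameters())
        subnet_mask = torch.zeros(n_params)
        subnet_mask[::2] = 1.0
        return subnet_mask

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
indices = EveryOtherParamSubnetMask(model).select()  # LongTensor of indices
print(indices[:5])  # tensor([0, 2, 4, 6, 8])
```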
## RandomSubnetMask

`RandomSubnetMask(model: Module, n_params_subnet: int)`

Bases: `ScoreBasedSubnetMask`

Subnetwork mask of parameters sampled uniformly at random.

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py
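A minimal usage sketch (model and sizes illustrative): since this mask's scores are random, no training data is needed for selection, and the resulting indices are what would typically be handed to the subnetwork Laplace (e.g. as `subnetwork_indices`).

```python
import torch
from torch import nn
from laplace.utils import RandomSubnetMask

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Keep a random subnetwork of 100 of the model's parameters.
subnet_mask = RandomSubnetMask(model, n_params_subnet=100)
subnet_indices = subnet_mask.select()  # LongTensor of 100 indices into
                                       # parameters_to_vector(model.parameters())
print(subnet_indices.shape)  # torch.Size([100])
```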
## LargestMagnitudeSubnetMask

`LargestMagnitudeSubnetMask(model: Module, n_params_subnet: int)`

Bases: `ScoreBasedSubnetMask`

Subnetwork mask identifying the parameters with the largest magnitude.

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py
## LargestVarianceDiagLaplaceSubnetMask

`LargestVarianceDiagLaplaceSubnetMask(model: Module, n_params_subnet: int, diag_laplace_model: DiagLaplace)`

Bases: `ScoreBasedSubnetMask`

Subnetwork mask identifying the parameters with the largest marginal variances (estimated using a diagonal Laplace approximation over all model parameters).

Parameters:

- `model` (`Module`) –
- `n_params_subnet` (`int`) – number of parameters in the subnetwork (i.e. number of top-scoring parameters to select)
- `diag_laplace_model` (`laplace.baselaplace.DiagLaplace`) – diagonal Laplace model to use for variance estimation

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py
## LargestVarianceSWAGSubnetMask

`LargestVarianceSWAGSubnetMask(model: Module, n_params_subnet: int, likelihood: Likelihood | str = CLASSIFICATION, swag_n_snapshots: int = 40, swag_snapshot_freq: int = 1, swag_lr: float = 0.01)`

Bases: `ScoreBasedSubnetMask`

Subnetwork mask identifying the parameters with the largest marginal variances (estimated using diagonal SWAG over all model parameters).

Parameters:

- `model` (`Module`) –
- `n_params_subnet` (`int`) – number of parameters in the subnetwork (i.e. number of top-scoring parameters to select)
- `likelihood` (`str`, default: `CLASSIFICATION`) – `'classification'` or `'regression'`
- `swag_n_snapshots` (`int`, default: `40`) – number of model snapshots to collect for SWAG
- `swag_snapshot_freq` (`int`, default: `1`) – SWAG snapshot collection frequency (in epochs)
- `swag_lr` (`float`, default: `0.01`) – learning rate for SWAG snapshot collection

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

Get the subnetwork mask by ranking parameters in descending order of their scores.

Source code in laplace/utils/subnetmask.py
## ParamNameSubnetMask

`ParamNameSubnetMask(model: Module, parameter_names: list[str])`

Bases: `SubnetMask`

Subnetwork mask corresponding to the specified parameters of the neural network.

Parameters:

- `model` (`Module`) –
- `parameter_names` (`list[str]`) – list of names of the parameters (as in `model.named_parameters()`) that define the subnetwork

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask identifying the specified parameters.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

Get the subnetwork mask identifying the specified parameters.

Source code in laplace/utils/subnetmask.py
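For instance, to restrict the subnetwork to the final layer's parameters of a plain `nn.Sequential` (the layer names follow `nn.Sequential`'s default numbering; this sketch assumes name-based selection needs no data loader):

```python
from torch import nn
from laplace.utils import ParamNameSubnetMask

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Treat only the last layer's weight and bias as the subnetwork.
subnet_mask = ParamNameSubnetMask(model, parameter_names=["2.weight", "2.bias"])
subnet_indices = subnet_mask.select()
print(subnet_indices.numel())  # 32 * 2 + 2 = 66 parameters
```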
## ModuleNameSubnetMask

Bases: `SubnetMask`

Subnetwork mask corresponding to the specified modules of the neural network.

Parameters:

- `model` (`Module`) –
- `module_names` (`list[str]`) – list of names of the modules (as in `model.named_modules()`) that define the subnetwork; the modules cannot have children, i.e. they need to be leaf modules

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask identifying the specified modules.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

Get the subnetwork mask identifying the specified modules.

Source code in laplace/utils/subnetmask.py
## LastLayerSubnetMask

`LastLayerSubnetMask(model: Module, last_layer_name: str | None = None)`

Bases: `ModuleNameSubnetMask`

Subnetwork mask corresponding to the last layer of the neural network.

Parameters:

- `model` (`Module`) –
- `last_layer_name` (`str | None`, default: `None`) – name of the model's last layer; if `None`, it will be determined automatically

Methods:

- `convert_subnet_mask_to_indices` – Converts a subnetwork mask into subnetwork indices.
- `select` – Select the subnetwork mask.
- `get_subnet_mask` – Get the subnetwork mask identifying the last layer.

Source code in laplace/utils/subnetmask.py

### convert_subnet_mask_to_indices

`convert_subnet_mask_to_indices(subnet_mask: Tensor) -> LongTensor`

Converts a subnetwork mask into subnetwork indices.

Parameters:

- `subnet_mask` (`Tensor`) – a binary vector of size `(n_params)` where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`)

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### select

`select(train_loader: DataLoader | None = None) -> LongTensor`

Select the subnetwork mask.

Parameters:

- `train_loader` (`DataLoader`, default: `None`) – each iterate is a training batch (X, y); `train_loader.dataset` needs to be set to access \(N\), the size of the data set

Returns:

- `subnet_mask_indices` (`LongTensor`) – a vector of indices of the vectorized model parameters (i.e. `torch.nn.utils.parameters_to_vector(model.parameters())`) that define the subnetwork

Source code in laplace/utils/subnetmask.py

### get_subnet_mask

Get the subnetwork mask identifying the last layer.

Source code in laplace/utils/subnetmask.py
## RunningNLLMetric

`RunningNLLMetric(ignore_index: int = -100)`

Bases: `Metric`

NLL metric that is computed in a running fashion over batches of predicted probabilities, ignoring targets equal to `ignore_index`.

Parameters:

- `ignore_index` (`int`, default: `-100`) – target value that is ignored when computing the NLL

Methods:

- `update` – Update the running NLL with a batch of probabilities and targets.

Source code in laplace/utils/metrics.py

### update

Parameters:

- `probs` (`Tensor`) – probability tensor of shape `(..., n_classes)`
- `targets` (`Tensor`) – integer tensor of shape `(...)`

Source code in laplace/utils/metrics.py
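Since this is a `torchmetrics`-style `Metric`, the usual update/compute cycle applies. A hedged usage sketch (batch shapes illustrative):

```python
import torch
from laplace.utils import RunningNLLMetric

metric = RunningNLLMetric()

for _ in range(3):  # e.g. iterating over validation batches
    probs = torch.softmax(torch.randn(8, 5), dim=-1)  # (batch, n_classes)
    targets = torch.randint(0, 5, (8,))               # (batch,)
    metric.update(probs, targets)

print(metric.compute())  # NLL accumulated over all batches
```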
## get_nll

## validate

`validate(laplace: BaseLaplace, val_loader: DataLoader, loss: Metric | Callable[[Tensor, Tensor], Tensor] | Callable[[Tensor, Tensor, Tensor], Tensor], pred_type: PredType | str = GLM, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, dict_key_y: str = 'labels') -> float`

Source code in laplace/utils/utils.py
## parameters_per_layer

Get the number of parameters per layer.

Parameters:

- `model` (`Module`) –

Returns:

Source code in laplace/utils/utils.py
## invsqrt_precision

`invsqrt_precision(M: Tensor) -> Tensor`

Compute `M^{-0.5}` as a lower-triangular matrix.

Parameters:

- `M` (`Tensor`) –

Returns:

- `M_invsqrt` (`Tensor`) –

Source code in laplace/utils/utils.py
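As a quick plain-PyTorch sanity check of what such a factor satisfies: if `L` is a triangular "inverse square root" of an SPD matrix `M`, then `L @ L.T == M^{-1}`, which is what makes it useful for, e.g., drawing Gaussian samples with covariance `M^{-1}`. (The sketch below builds the factor directly rather than calling the library helper.)

```python
import torch

# A small SPD precision matrix.
A = torch.randn(4, 4)
M = A @ A.T + torch.eye(4)

# Triangular inverse square root: L with L @ L.T == M^{-1};
# one standard construction is to Cholesky-factor M^{-1}.
L = torch.linalg.cholesky(torch.linalg.inv(M))

print(torch.allclose(L @ L.T, torch.linalg.inv(M), atol=1e-5))  # True

# If eps ~ N(0, I), then L @ eps ~ N(0, M^{-1}).
```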
## kron

Computes the Kronecker product between two tensors.

Parameters:

Returns:

- `kron_product` (`Tensor`) –

Source code in laplace/utils/utils.py
## diagonal_add_scalar

Add scalar value `value` to the diagonal of `X`.

Parameters:

Returns:

- `X_add_scalar` (`Tensor`) –

Source code in laplace/utils/utils.py
## symeig

Symmetric eigendecomposition avoiding failure cases by adding and removing jitter to the diagonal.

Parameters:

- `M` (`Tensor`) –

Returns:

- `L` (`Tensor`) – eigenvalues
- `W` (`Tensor`) – eigenvectors

Source code in laplace/utils/utils.py
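A hedged sketch of the jitter pattern this refers to (the retry schedule and constants are illustrative, not the library's exact values):

```python
import torch

def symeig_with_jitter(M: torch.Tensor, max_tries: int = 5):
    """Symmetric eigendecomposition, retrying with growing diagonal jitter."""
    jitter = 0.0
    eye = torch.eye(M.shape[-1], dtype=M.dtype, device=M.device)
    for attempt in range(max_tries):
        try:
            eigvals, eigvecs = torch.linalg.eigh(M + jitter * eye)
            # "Remove" the jitter again before returning.
            return eigvals - jitter, eigvecs
        except RuntimeError:  # torch.linalg.LinAlgError is a subclass
            jitter = 10.0 ** (attempt - 8)  # 1e-8, 1e-7, ...
    raise RuntimeError("eigendecomposition failed even with jitter")
```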
## block_diag

Compose a block-diagonal matrix of individual blocks.

Parameters:

Returns:

- `M` (`Tensor`) –

Source code in laplace/utils/utils.py
## normal_samples

`normal_samples(mean: Tensor, var: Tensor, n_samples: int, generator: Generator | None = None) -> Tensor`

Produce samples from a batch of Normal distributions parameterized by either a diagonal or full covariance given by `var`.

Parameters:

- `mean` (`Tensor`) – `(batch_size, output_dim)`
- `var` (`Tensor`) – (co)variance of the Normal distribution; `(batch_size, output_dim, output_dim)` or `(batch_size, output_dim)`
- `n_samples` (`int`) – number of samples to draw
- `generator` (`Generator`, default: `None`) – random number generator

Source code in laplace/utils/utils.py
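Both cases amount to reparameterized sampling: scale standard normal noise by a square root of the (co)variance. A hedged plain-PyTorch sketch of the two branches (not the library's exact implementation):

```python
import torch

def sample_normal(mean, var, n_samples, generator=None):
    """Draw (n_samples, batch_size, output_dim) samples from N(mean, var)."""
    eps = torch.randn(n_samples, *mean.shape, generator=generator,
                      dtype=mean.dtype, device=mean.device)
    if var.dim() == mean.dim():  # diagonal: var is (batch_size, output_dim)
        return mean + var.sqrt() * eps
    else:  # full covariance: var is (batch_size, output_dim, output_dim)
        scale = torch.linalg.cholesky(var)
        return mean + torch.einsum("bij,sbj->sbi", scale, eps)

mean = torch.zeros(2, 3)
diag_var = torch.ones(2, 3)
print(sample_normal(mean, diag_var, n_samples=5).shape)  # (5, 2, 3)
```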
## _is_valid_scalar

Source code in laplace/utils/utils.py
## expand_prior_precision

`expand_prior_precision(prior_prec: Tensor, model: Module) -> Tensor`

Expand prior precision to match the shape of the model parameters.

Parameters:

- `prior_prec` (`torch.Tensor`, 1-dimensional) – prior precision
- `model` (`Module`) – torch model with parameters that are regularized by `prior_prec`

Returns:

- `expanded_prior_prec` (`Tensor`) – expanded prior precision with the same shape as the model parameters

Source code in laplace/utils/utils.py
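The expansion rule itself is simple: a scalar precision is broadcast to all parameters, and a layerwise precision is repeated once per parameter of its layer. A hedged sketch of the layerwise case (the helper name is hypothetical; the library's version also handles scalar `prior_prec`):

```python
import torch
from torch import nn

def expand_layerwise_prec(prior_prec: torch.Tensor, model: nn.Module) -> torch.Tensor:
    """Repeat one precision value per parameter tensor into a flat vector."""
    params = list(model.parameters())
    assert prior_prec.dim() == 1 and len(prior_prec) == len(params)
    return torch.cat(
        [prec.expand(p.numel()) for prec, p in zip(prior_prec, params)]
    )

model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
prec = expand_layerwise_prec(torch.tensor([1.0, 1.0, 10.0, 10.0]), model)
print(prec.shape)  # torch.Size([23]): 4*3 + 3 + 3*2 + 2 parameters
```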
## fix_prior_prec_structure

`fix_prior_prec_structure(prior_prec_init: Tensor, prior_structure: PriorStructure | str, n_layers: int, n_params: int, device: device, dtype: dtype) -> Tensor`

Create a tensor of prior precision with the correct shape, depending on the choice of the prior structure type.

Parameters:

- `prior_prec_init` (`Tensor`) – the initial prior precision tensor (could be scalar)
- `prior_structure` (`PriorStructure | str`) – the choice of the prior structure type
- `n_layers` (`int`) –
- `n_params` (`int`) –
- `device` (`device`) –
- `dtype` (`dtype`) –

Returns:

- `correct_prior_precision` (`Tensor`) –

Source code in laplace/utils/utils.py
## fit_diagonal_swag_var

`fit_diagonal_swag_var(model: Module, train_loader: DataLoader, criterion: CrossEntropyLoss | MSELoss, n_snapshots_total: int = 40, snapshot_freq: int = 1, lr: float = 0.01, momentum: float = 0.9, weight_decay: float = 0.0003, min_var: float = 1e-30) -> Tensor`

Fit diagonal SWAG [1], which estimates the marginal variances of the model parameters by computing the first and second moments of SGD iterates with a large learning rate.

Implementation partly adapted from:

- https://github.com/wjmaddox/swa_gaussian/blob/master/swag/posteriors/swag.py
- https://github.com/wjmaddox/swa_gaussian/blob/master/experiments/train/run_swag.py

References

[1] Maddox, W., Garipov, T., Izmailov, P., Vetrov, D., Wilson, A. G. A Simple Baseline for Bayesian Uncertainty in Deep Learning. NeurIPS 2019.

Parameters:

- `model` (`Module`) –
- `train_loader` (`DataLoader`) – training data loader to use for snapshot collection
- `criterion` (`CrossEntropyLoss` or `MSELoss`) – loss function to use for snapshot collection
- `n_snapshots_total` (`int`, default: `40`) – total number of model snapshots to collect
- `snapshot_freq` (`int`, default: `1`) – snapshot collection frequency (in epochs)
- `lr` (`float`, default: `0.01`) – SGD learning rate for collecting snapshots
- `momentum` (`float`, default: `0.9`) – SGD momentum
- `weight_decay` (`float`, default: `0.0003`) – SGD weight decay
- `min_var` (`float`, default: `1e-30`) – minimum parameter variance to clamp to (for numerical stability)

Returns:

- `param_variances` (`Tensor`) – vector of marginal variances for each model parameter

Source code in laplace/utils/utils.py
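At its core the estimator is just a running first and second moment of the parameter vector over SGD snapshots, with the variance clamped from below. A hedged sketch of that accumulation (the fake parameter perturbation below stands in for a real epoch of large-learning-rate SGD):

```python
import torch
from torch import nn
from torch.nn.utils import parameters_to_vector

model = nn.Linear(4, 2)
n_params = sum(p.numel() for p in model.parameters())

mean = torch.zeros(n_params)     # running E[theta]
sq_mean = torch.zeros(n_params)  # running E[theta^2]

n_snapshots = 3  # illustrative; the function's default is 40
for i in range(n_snapshots):
    with torch.no_grad():  # stand-in for one epoch of SGD with a large lr
        for p in model.parameters():
            p.add_(0.01 * torch.randn_like(p))
    theta = parameters_to_vector(model.parameters()).detach()
    mean = (i * mean + theta) / (i + 1)
    sq_mean = (i * sq_mean + theta**2) / (i + 1)

# Var[theta] = E[theta^2] - E[theta]^2, clamped for numerical stability.
param_variances = torch.clamp(sq_mean - mean**2, min=1e-30)
print(param_variances.shape)  # one variance per model parameter
```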