laplace.utils
FeatureExtractor
FeatureExtractor(model: Module, last_layer_name: str | None = None, enable_backprop: bool = False, feature_reduction: FeatureReduction | str | None = None)
Bases: Module
Feature extractor for a PyTorch neural network. A wrapper which can return the output of the penultimate layer in addition to the output of the last layer for each forward pass. If the name of the last layer is not known, it can determine it automatically; it then assumes that the last layer is linear and the same for every forward pass. If the name of the last layer is known, it can be passed as a parameter at initialization; this is the safest way to use this class. Based on https://gist.github.com/fkodom/27ed045c9051a39102e8bcf4ce31df76.
Parameters:
- model (Module) – PyTorch model
- last_layer_name (str, default: None) – the name of the last layer, if already known; otherwise it will be determined automatically.
- enable_backprop (bool, default: False) – whether to enable backprop through the feature extractor to get the gradients of the inputs. Useful for e.g. Bayesian optimization.
- feature_reduction (FeatureReduction | str | None, default: None) – when the last-layer features form a tensor of dim >= 3, this tells how to reduce it into a dim-2 tensor. E.g. in LLMs for non-language-modeling problems, the penultimate output is a tensor of shape (batch_size, seq_len, embd_dim), but the last layer maps (batch_size, embd_dim) to (batch_size, n_classes). Note: make sure that this option faithfully reflects the reduction in the model definition. When inputting a string, the available options are {'pick_first', 'pick_last', 'average'}.
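A minimal usage sketch (assuming a torchvision ResNet-18, whose last layer is named "fc", and that forward_with_features returns the tuple (last-layer output, penultimate features)):

```python
import torch
from torchvision.models import resnet18
from laplace.utils import FeatureExtractor

model = resnet18()
# The last layer ("fc") could also be found automatically on the first
# forward pass; passing last_layer_name explicitly is the safest option.
feature_extractor = FeatureExtractor(model, last_layer_name="fc")

x = torch.randn(8, 3, 224, 224)  # one dummy batch
logits, features = feature_extractor.forward_with_features(x)
print(logits.shape)    # torch.Size([8, 1000]) -- output of the last (linear) layer
print(features.shape)  # torch.Size([8, 512])  -- input to that layer
```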
forward
forward(x: Tensor | MutableMapping[str, Tensor | Any]) -> Tensor
Forward pass. If the last layer is not known yet, it will be determined when this function is called for the first time.
Parameters:
- x (Tensor or a dict-like object containing the input tensors) – one batch of data to use as input for the forward pass
forward_with_features
forward_with_features(x: Tensor | MutableMapping[str, Tensor | Any]) -> tuple[Tensor, Tensor]
Forward pass which returns the output of the penultimate layer along with the output of the last layer. If the last layer is not known yet, it will be determined when this function is called for the first time.
Parameters:
- x (Tensor or a dict-like object containing the input tensors) – one batch of data to use as input for the forward pass
set_last_layer
set_last_layer(last_layer_name: str) -> None
Set the last layer of the model by its name. This sets the forward hook to get the output of the penultimate layer.
Parameters:
- last_layer_name (str) – the name of the last layer (fixed in model.named_modules()).
find_last_layer
find_last_layer(x: Tensor | MutableMapping[str, Tensor | Any]) -> Tensor
Automatically determines the last layer of the model with one forward pass. It assumes that the last layer is the same for every forward pass and that it is an instance of torch.nn.Linear. Might not work with every architecture, but is tested with all PyTorch torchvision classification models (besides SqueezeNet, which has no linear last layer).
Parameters:
- x (Tensor or a dict-like object containing the input tensors) – one batch of data to use as input for the forward pass
Kron
Kronecker factored approximate curvature representation for a corresponding neural network. Each element in kfacs is either a tuple or a single matrix. A tuple represents two Kronecker factors \(Q\) and \(H\), while a single element is just a full block Hessian approximation.
Parameters:
- kfacs (list[Iterable[Tensor] | Tensor]) – each element in the list is a tuple of two Kronecker factors Q, H, or a single matrix approximating the Hessian (in the case of a bias, for example)
init_from_model
Initialize Kronecker factors based on a model's architecture.
Parameters:
- model (nn.Module or iterable of parameters, e.g. model.parameters()) –
- device (device) –
Returns:
- kron (Kron) –
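A minimal sketch of how a Kron object comes about (in practice the factors are filled in by a curvature backend rather than by hand; the shapes below are purely illustrative):

```python
import torch
import torch.nn as nn
from laplace.utils import Kron

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

# Zero-initialized factors matching the model's architecture:
kron = Kron.init_from_model(model, device="cpu")

# Manual construction: a two-factor pair per weight, a single block per bias.
Q, H = torch.eye(20), torch.eye(10)
kron_manual = Kron(kfacs=[[Q, H], [torch.eye(20)]])
```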
__add__
Add up the Kronecker factors of self and other.
Parameters:
- other (Kron) –
Returns:
- kron (Kron) –
__mul__
Multiply all Kronecker factors by a scalar. The multiplication is distributed across the number of factors using pow(scalar, 1 / len(F)), where len(F) is either 1 or 2.
Parameters:
- scalar (float or Tensor) –
Returns:
- kron (Kron) –
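Why the exponent is distributed: scaling both factors of a two-factor block by \(\sqrt{c}\) scales their Kronecker product by exactly \(c\), since \(\mathrm{kron}(aQ, bH) = ab \, \mathrm{kron}(Q, H)\). A quick numerical check (not the library's code):

```python
import torch

c = 3.0
Q, H = torch.randn(4, 4), torch.randn(5, 5)
# Each factor scaled by c ** (1 / 2), i.e. pow(scalar, 1 / len(F)) with len(F) == 2:
lhs = torch.kron(c ** 0.5 * Q, c ** 0.5 * H)
rhs = c * torch.kron(Q, H)
assert torch.allclose(lhs, rhs)
```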
decompose
decompose(damping: bool = False) -> KronDecomposed
Eigendecompose the Kronecker factors and turn them into a KronDecomposed.
Parameters:
- damping (bool, default: False) – use damping
Returns:
- kron_decomposed (KronDecomposed) –
_bmm
Implementation of bmm which casts the parameters to the right shape.
Parameters:
- W (Tensor) – matrix (batch, classes, params)
Returns:
- SW (Tensor) – result (batch, classes, params)
bmm
bmm(W: Tensor, exponent: float = 1) -> Tensor
Batched matrix multiplication with the Kronecker factors. If the Kron is H, we compute H @ W. This is useful for computing the predictive or a regularization based on Kronecker factors, as in continual learning.
Parameters:
- W (Tensor) – matrix (batch, classes, params)
- exponent (float, default: 1) – can only be 1 for Kron; other exponent values of the Kronecker factors require KronDecomposed.
Returns:
- SW (Tensor) – result (batch, classes, params)
logdet
Compute the log determinant of the Kronecker factors and sum them up. This corresponds to the log determinant of the entire Hessian approximation.
Returns:
- logdet (Tensor) –
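For a two-factor block, the log determinant follows the standard identity \(\log\det(Q \otimes H) = m \log\det Q + n \log\det H\) for \(Q \in \mathbb{R}^{n \times n}\) and \(H \in \mathbb{R}^{m \times m}\). A quick numerical check (not the library's code):

```python
import torch

n, m = 3, 4
Q = torch.eye(n) + 0.1 * torch.randn(n, n); Q = Q @ Q.T  # symmetric positive definite
H = torch.eye(m) + 0.1 * torch.randn(m, m); H = H @ H.T
full = torch.logdet(torch.kron(Q, H))
factored = m * torch.logdet(Q) + n * torch.logdet(H)
assert torch.allclose(full, factored, atol=1e-5)
```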
diag
Extract the diagonal of the entire Kronecker factorization.
Returns:
- diag (Tensor) –
to_matrix
Make the Kronecker factorization dense by computing the Kronecker product. Warning: this should only be used for testing purposes, as it will allocate large amounts of memory for big architectures.
Returns:
- block_diag (Tensor) –
KronDecomposed
KronDecomposed(eigenvectors: list[tuple[Tensor]], eigenvalues: list[tuple[Tensor]], deltas: Tensor | None = None, damping: bool = False)
Decomposed Kronecker factored approximate curvature representation for a corresponding neural network. Each matrix in Kron is decomposed to obtain KronDecomposed. Front-loading the decomposition allows cheap repeated computation of inverses and log determinants. In contrast to Kron, we can add a scalar or layerwise scalars, but we can no longer add another Kron or KronDecomposed.
Parameters:
- eigenvectors (list[tuple[Tensor]]) – eigenvectors corresponding to matrices in a corresponding Kron
- eigenvalues (list[tuple[Tensor]]) – eigenvalues corresponding to matrices in a corresponding Kron
- deltas (Tensor, default: None) – addend for each group of Kronecker factors representing, for example, a prior precision
- damping (bool, default: False) – use a damping approximation, mixing the prior and the Kron factors partially multiplicatively
__add__
__add__(deltas: Tensor) -> KronDecomposed
Add a scalar per layer, or a single scalar, to the Kronecker factors.
Parameters:
- deltas (Tensor) – either of the same length as eigenvalues, or a scalar.
Returns:
- kron (KronDecomposed) –
__mul__
__mul__(scalar: Tensor | float) -> KronDecomposed
Multiply by a scalar by changing the eigenvalues. Same as in the case of Kron.
Parameters:
- scalar (Tensor or float) –
Returns:
- kron (KronDecomposed) –
logdet
Compute the log determinant of the Kronecker factors and sum them up. This corresponds to the log determinant of the entire Hessian approximation. In contrast to Kron.logdet(), the additive deltas corresponding to prior precisions are added.
Returns:
- logdet (Tensor) –
_bmm
_bmm(W: Tensor, exponent: float = -1) -> Tensor
Implementation of bmm, i.e., self ** exponent @ W.
Parameters:
- W (Tensor) – matrix (batch, classes, params)
- exponent (float, default: -1) – exponent on self
Returns:
- SW (Tensor) – result (batch, classes, params)
bmm
bmm(W: Tensor, exponent: float = -1) -> Tensor
Batched matrix multiplication with the decomposed Kronecker factors. This is useful for computing the predictive or a regularization loss. Compared to Kron.bmm, a prior can be added here in the form of deltas, and the exponent can be other than just 1. Computes \(H^{exponent} W\).
Parameters:
- W (Tensor) – matrix (batch, classes, params)
- exponent (float, default: -1) –
Returns:
- SW (Tensor) – result (batch, classes, params)
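A sketch of the intended workflow (here the factors are only zero-initialized; in practice they are filled in by fitting a Laplace approximation):

```python
import torch
import torch.nn as nn
from laplace.utils import Kron

model = nn.Linear(10, 2)
kron = Kron.init_from_model(model, device="cpu")

# Decompose once, then reuse for inverses under different prior precisions:
kron_decomposed = kron.decompose() + torch.tensor(0.5)  # add a prior-precision delta

n_params = sum(p.numel() for p in model.parameters())
W = torch.randn(8, 2, n_params)           # (batch, classes, params)
SW = kron_decomposed.bmm(W, exponent=-1)  # multiply by the inverse, same shape
```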
diag
diag(exponent: float = 1) -> Tensor
Extract the diagonal of the entire decomposed Kronecker factorization.
Parameters:
- exponent (float, default: 1) – exponent of the Kronecker factorization
Returns:
- diag (Tensor) –
to_matrix
to_matrix(exponent: float = 1) -> Tensor
Make the Kronecker factorization dense by computing the Kronecker product. Warning: this should only be used for testing purposes, as it will allocate large amounts of memory for big architectures.
Parameters:
- exponent (float, default: 1) – exponent of the Kronecker factorization
Returns:
- block_diag (Tensor) –
SubnetMask
Base class for all subnetwork masks in this library (for subnetwork Laplace).
Parameters:
- model (Module) –
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask.
Parameters:
- train_loader (DataLoader) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
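A sketch of the typical use of a subnetwork mask, following the pattern in the laplace library's README: select parameter indices, then hand them to a subnetwork Laplace approximation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from laplace import Laplace
from laplace.utils import LargestMagnitudeSubnetMask

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
X, y = torch.randn(32, 4), torch.randint(0, 2, (32,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=8)

subnet_mask = LargestMagnitudeSubnetMask(model, n_params_subnet=50)
subnet_indices = subnet_mask.select()  # LongTensor of parameter indices

la = Laplace(
    model, "classification",
    subset_of_weights="subnetwork",
    hessian_structure="full",
    subnetwork_indices=subnet_indices,
)
la.fit(train_loader)
```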
RandomSubnetMask
RandomSubnetMask(model: Module, n_params_subnet: int)
Bases: ScoreBasedSubnetMask
Subnetwork mask of parameters sampled uniformly at random.
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask by ranking the parameters by their scores, in descending order.
LargestMagnitudeSubnetMask
LargestMagnitudeSubnetMask(model: Module, n_params_subnet: int)
Bases: ScoreBasedSubnetMask
Subnetwork mask identifying the parameters with the largest magnitude.
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask by ranking the parameters by their scores, in descending order.
LargestVarianceDiagLaplaceSubnetMask
LargestVarianceDiagLaplaceSubnetMask(model: Module, n_params_subnet: int, diag_laplace_model: DiagLaplace)
Bases: ScoreBasedSubnetMask
Subnetwork mask identifying the parameters with the largest marginal variances (estimated using a diagonal Laplace approximation over all model parameters).
Parameters:
- model (Module) –
- n_params_subnet (int) – number of parameters in the subnetwork (i.e. number of top-scoring parameters to select)
- diag_laplace_model (laplace.baselaplace.DiagLaplace) – diagonal Laplace model to use for variance estimation
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask by ranking the parameters by their scores, in descending order.
LargestVarianceSWAGSubnetMask
LargestVarianceSWAGSubnetMask(model: Module, n_params_subnet: int, likelihood: Likelihood | str = Likelihood.CLASSIFICATION, swag_n_snapshots: int = 40, swag_snapshot_freq: int = 1, swag_lr: float = 0.01)
Bases: ScoreBasedSubnetMask
Subnetwork mask identifying the parameters with the largest marginal variances (estimated using diagonal SWAG over all model parameters).
Parameters:
- model (Module) –
- n_params_subnet (int) – number of parameters in the subnetwork (i.e. number of top-scoring parameters to select)
- likelihood (Likelihood | str, default: Likelihood.CLASSIFICATION) – 'classification' or 'regression'
- swag_n_snapshots (int, default: 40) – number of model snapshots to collect for SWAG
- swag_snapshot_freq (int, default: 1) – SWAG snapshot collection frequency (in epochs)
- swag_lr (float, default: 0.01) – learning rate for SWAG snapshot collection
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask by ranking the parameters by their scores, in descending order.
ParamNameSubnetMask
Bases: SubnetMask
Subnetwork mask corresponding to the specified parameters of the neural network.
Parameters:
- model (Module) –
- parameter_names (list[str]) – list of names of the parameters (as in model.named_parameters()) that define the subnetwork
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask identifying the specified parameters.
ModuleNameSubnetMask
Bases: SubnetMask
Subnetwork mask corresponding to the specified modules of the neural network.
Parameters:
- model (Module) –
- parameter_names – list of names of the modules (as in model.named_modules()) that define the subnetwork; the modules cannot have children, i.e. they need to be leaf modules
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask identifying the specified modules.
LastLayerSubnetMask
LastLayerSubnetMask(model: Module, last_layer_name: str | None = None)
Bases: ModuleNameSubnetMask
Subnetwork mask corresponding to the last layer of the neural network.
Parameters:
- model (Module) –
- last_layer_name (str | None, default: None) – name of the model's last layer; if None, it will be determined automatically
convert_subnet_mask_to_indices
Converts a subnetwork mask into subnetwork indices.
Parameters:
- subnet_mask (Tensor) – a binary vector of size (n_params) where 1s locate the subnetwork parameters within the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters()))
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
select
Select the subnetwork mask.
Parameters:
- train_loader (DataLoader, default: None) – each iterate is a training batch (X, y); train_loader.dataset needs to be set to access \(N\), the size of the data set
Returns:
- subnet_mask_indices (LongTensor) – a vector of indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork
get_subnet_mask
Get the subnetwork mask identifying the last layer.
RunningNLLMetric
RunningNLLMetric(ignore_index: int = -100)
Bases: Metric
NLL metric that can be updated batch by batch, accumulating the negative log-likelihood over successive update() calls.
Parameters:
- ignore_index (int, default: -100) – which class label to ignore when computing the NLL loss
update
Parameters:
- probs (Tensor) – probability tensor of shape (..., n_classes)
- targets (Tensor) – integer tensor of shape (...)
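A minimal usage sketch (compute() comes from the torchmetrics Metric base class; assuming RunningNLLMetric is importable from laplace.utils):

```python
import torch
from laplace.utils import RunningNLLMetric

metric = RunningNLLMetric()
for _ in range(3):  # e.g. iterate over validation batches
    probs = torch.softmax(torch.randn(8, 10), dim=-1)  # (batch, n_classes)
    targets = torch.randint(0, 10, (8,))
    metric.update(probs, targets)
print(metric.compute())  # NLL aggregated over all batches seen so far
```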
get_nll
validate
validate(laplace: BaseLaplace, val_loader: DataLoader, loss: Metric | Callable[[Tensor, Tensor], Tensor] | Callable[[Tensor, Tensor, Tensor], Tensor], pred_type: PredType | str = PredType.GLM, link_approx: LinkApprox | str = LinkApprox.PROBIT, n_samples: int = 100, dict_key_y: str = 'labels') -> float
parameters_per_layer
Get the number of parameters per layer.
Parameters:
- model (Module) –
Returns:
invsqrt_precision
Compute M^{-0.5} as a tridiagonal matrix.
Parameters:
- M (Tensor) –
Returns:
- M_invsqrt (Tensor) –
kron
Computes the Kronecker product between two tensors.
Parameters:
- t1 (Tensor) –
- t2 (Tensor) –
Returns:
- kron_product (Tensor) –
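A quick sanity check (assuming kron is importable from laplace.utils and agrees with torch.kron for 2D inputs):

```python
import torch
from laplace.utils import kron

t1, t2 = torch.randn(2, 3), torch.randn(4, 5)
out = kron(t1, t2)
print(out.shape)  # torch.Size([8, 15]): (2*4, 3*5)
assert torch.allclose(out, torch.kron(t1, t2))
```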
diagonal_add_scalar
Add the scalar value to the diagonal of X.
Parameters:
- X (Tensor) –
- value (Tensor or float) –
Returns:
- X_add_scalar (Tensor) –
symeig
symeig(M: Tensor) -> tuple[Tensor, Tensor]
Symmetric eigendecomposition avoiding failure cases by adding and removing jitter on the diagonal.
Parameters:
- M (Tensor) –
Returns:
- L (Tensor) – eigenvalues
- W (Tensor) – eigenvectors
block_diag
block_diag(blocks: list[Tensor]) -> Tensor
Compose a block-diagonal matrix from individual blocks.
Parameters:
- blocks (list[Tensor]) –
Returns:
- M (Tensor) –
normal_samples
normal_samples(mean: Tensor, var: Tensor, n_samples: int, generator: Generator | None = None) -> Tensor
Produce samples from a batch of Normal distributions, parameterized either by a diagonal or a full covariance given by var.
Parameters:
- mean (Tensor) – (batch_size, output_dim)
- var (Tensor) – (co)variance of the Normal distribution; (batch_size, output_dim, output_dim) or (batch_size, output_dim)
- n_samples (int) – number of samples to draw
- generator (Generator, default: None) – random number generator
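A sketch of both parameterizations (the leading sample dimension of the output is an assumption based on the usual convention):

```python
import torch
from laplace.utils import normal_samples

mean = torch.zeros(8, 3)                 # (batch_size, output_dim)
var_diag = torch.ones(8, 3)              # diagonal variances
var_full = torch.eye(3).expand(8, 3, 3)  # full covariance per batch element

samples_diag = normal_samples(mean, var_diag, n_samples=100)
samples_full = normal_samples(mean, var_full, n_samples=100)
print(samples_diag.shape)  # expected (100, 8, 3)
```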
_is_valid_scalar
expand_prior_precision
Expand the prior precision to match the shape of the model parameters.
Parameters:
- prior_prec (Tensor, 1-dimensional) – prior precision
- model (Module) – torch model with parameters that are regularized by prior_prec
Returns:
- expanded_prior_prec (Tensor) – expanded prior precision with the same shape as the model parameters
fix_prior_prec_structure
fix_prior_prec_structure(prior_prec_init: Tensor, prior_structure: PriorStructure | str, n_layers: int, n_params: int, device: device) -> Tensor
Create a tensor of prior precision with the correct shape, depending on the choice of prior structure type.
Parameters:
- prior_prec_init (Tensor) – the initial prior precision tensor (could be scalar)
- prior_structure (PriorStructure | str) – the choice of prior structure type
- n_layers (int) –
- n_params (int) –
- device (device) –
Returns:
- correct_prior_precision (Tensor) –
fit_diagonal_swag_var
fit_diagonal_swag_var(model: Module, train_loader: DataLoader, criterion: CrossEntropyLoss | MSELoss, n_snapshots_total: int = 40, snapshot_freq: int = 1, lr: float = 0.01, momentum: float = 0.9, weight_decay: float = 0.0003, min_var: float = 1e-30) -> Tensor
Fit diagonal SWAG [1], which estimates marginal variances of model parameters by computing the first and second moments of SGD iterates with a large learning rate.
Implementation partly adapted from:
- https://github.com/wjmaddox/swa_gaussian/blob/master/swag/posteriors/swag.py
- https://github.com/wjmaddox/swa_gaussian/blob/master/experiments/train/run_swag.py
References
[1] Maddox, W., Garipov, T., Izmailov, P., Vetrov, D., Wilson, A. G. A Simple Baseline for Bayesian Uncertainty in Deep Learning. NeurIPS 2019.
Parameters:
- model (Module) –
- train_loader (DataLoader) – training data loader to use for snapshot collection
- criterion (CrossEntropyLoss or MSELoss) – loss function to use for snapshot collection
- n_snapshots_total (int, default: 40) – total number of model snapshots to collect
- snapshot_freq (int, default: 1) – snapshot collection frequency (in epochs)
- lr (float, default: 0.01) – SGD learning rate for collecting snapshots
- momentum (float, default: 0.9) – SGD momentum
- weight_decay (float, default: 0.0003) – SGD weight decay
- min_var (float, default: 1e-30) – minimum parameter variance to clamp to (for numerical stability)
Returns:
- param_variances (Tensor) – vector of marginal variances for each model parameter
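A minimal usage sketch (assuming fit_diagonal_swag_var is importable from laplace.utils):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from laplace.utils import fit_diagonal_swag_var

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
X, y = torch.randn(64, 4), torch.randint(0, 2, (64,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=16)

param_variances = fit_diagonal_swag_var(
    model, train_loader, criterion=nn.CrossEntropyLoss(),
    n_snapshots_total=10, snapshot_freq=1, lr=0.01,
)
print(param_variances.shape)  # one variance per vectorized model parameter
```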