laplace.lllaplace
Classes:
- LLLaplace – Base class for all last-layer Laplace approximations in this library.
- DiagLLLaplace – Last-layer Laplace approximation with diagonal log likelihood Hessian approximation.
- KronLLLaplace – Last-layer Laplace approximation with Kronecker-factored log likelihood Hessian approximation.
- FullLLLaplace – Last-layer Laplace approximation with full, i.e., dense, log likelihood Hessian approximation.
LLLaplace
LLLaplace(model: Module, likelihood: Likelihood | str, sigma_noise: float | Tensor = 1.0, prior_precision: float | Tensor = 1.0, prior_mean: float | Tensor = 0.0, temperature: float = 1.0, enable_backprop: bool = False, feature_reduction: FeatureReduction | str | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', backend: type[CurvatureInterface] | None = None, last_layer_name: str | None = None, backend_kwargs: dict[str, Any] | None = None, asdl_fisher_kwargs: dict[str, Any] | None = None)
Bases: ParametricLaplace
Base class for all last-layer Laplace approximations in this library.
Subclasses specify the structure of the Hessian approximation.
See BaseLaplace for the full interface.
A Laplace approximation is represented by a MAP estimate, given by the
model parameters, and a posterior precision or covariance, which together specify
a Gaussian distribution \(\mathcal{N}(\theta_{MAP}, P^{-1})\).
Here, only the parameters of the last layer of the neural network
are treated probabilistically.
The goal of this class is to compute the posterior precision \(P\), which sums as
\[
P = -\sum_{n=1}^N \nabla^2_\theta \log p(\mathcal{D}_n \mid \theta) \vert_{\theta_{MAP}} - \nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}}.
\]
Every subclass implements a different approximation to the log likelihood Hessians, for example, a diagonal one. The prior is assumed to be Gaussian, so its contribution has the simple form \(-\nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}} = P_0\). In particular, we assume a scalar or diagonal prior precision so that in all cases \(P_0 = \textrm{diag}(p_0)\), and the structure of \(p_0\) can be varied.
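To make the sum concrete, here is a minimal pure-Python sketch under simplifying assumptions that are not part of the library: the last layer is linear, \(f(x) = w^\top \phi(x)\), and the likelihood is Gaussian with noise sigma_noise, so each data point contributes \(\phi\phi^\top / \sigma^2\) to the negative log likelihood Hessian. The function name posterior_precision and the list-of-lists representation are illustrative only; the actual classes work on torch tensors and obtain the Hessian from a curvature backend.

```python
def posterior_precision(features, sigma_noise, prior_prec):
    """Dense posterior precision P for a linear-Gaussian last layer (illustrative).

    features: list of feature vectors phi(x_n), each of length d.
    prior_prec: scalar or length-d list, i.e. the diagonal p_0 of P_0.
    """
    d = len(features[0])
    # Broadcast a scalar prior precision to the diagonal p_0.
    if isinstance(prior_prec, (int, float)):
        p0 = [float(prior_prec)] * d
    else:
        p0 = [float(p) for p in prior_prec]
    # Sum of per-example negative log likelihood Hessians: phi phi^T / sigma^2.
    P = [[0.0] * d for _ in range(d)]
    for phi in features:
        for i in range(d):
            for j in range(d):
                P[i][j] += phi[i] * phi[j] / sigma_noise**2
    # Add the prior precision P_0 = diag(p_0).
    for i in range(d):
        P[i][i] += p0[i]
    return P


feats = [[1.0, 0.0], [1.0, 2.0]]
print(posterior_precision(feats, sigma_noise=1.0, prior_prec=1.0))  # [[3.0, 2.0], [2.0, 5.0]]
```

A scalar prior_prec is broadcast to the diagonal \(p_0\), mirroring the scalar-or-diagonal assumption above; a diagonal approximation such as DiagLLLaplace would keep only the diagonal of this matrix.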
Parameters:
- model (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`)
- likelihood (Likelihood or {'classification', 'regression'}) – determines the log likelihood Hessian approximation
- sigma_noise (Tensor or float, default: 1) – observation noise for the regression setting; must be 1 for classification
- prior_precision (Tensor or float, default: 1) – prior precision of a Gaussian prior (= weight decay); can be scalar, per-layer, or diagonal in the most general case
- prior_mean (Tensor or float, default: 0) – prior mean of a Gaussian prior, useful for continual learning
- temperature (float, default: 1) – temperature of the likelihood; a lower temperature leads to a more concentrated posterior and vice versa.
- enable_backprop (bool, default: False) – whether to enable backprop to the input x through the Laplace predictive. Useful for, e.g., Bayesian optimization.
- feature_reduction (FeatureReduction | str | None, default: None) – when the last-layer features are a tensor of dim >= 3, this tells how to reduce them into a dim-2 tensor. E.g., in LLMs for non-language-modeling problems, the penultimate output is a tensor of shape (batch_size, seq_len, embd_dim), but the last layer maps (batch_size, embd_dim) to (batch_size, n_classes). Note: make sure that this option faithfully reflects the reduction in the model definition. When inputting a string, the available options are {'pick_first', 'pick_last', 'average'}.
- dict_key_x (str, default: 'input_ids') – the dictionary key under which the input tensor x is stored. Only has an effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
- dict_key_y (str, default: 'labels') – the dictionary key under which the target tensor y is stored. Only has an effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
- backend (subclass of `laplace.curvature.CurvatureInterface`, default: None) – backend for access to curvature/Hessian approximations
- last_layer_name (str | None, default: None) – name of the model's last layer; if None, it will be determined automatically
- backend_kwargs (dict, default: None) – arguments passed to the backend on initialization, for example to set the number of MC samples for stochastic approximations.
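The three string options of feature_reduction can be illustrated with a small sketch. It assumes the features arrive as nested Python lists of shape (batch_size, seq_len, embd_dim); the helper reduce_features is hypothetical and only mirrors what each option name describes, not the library's implementation.

```python
def reduce_features(features, mode):
    """Reduce (batch, seq_len, embd) nested lists to (batch, embd).

    mode mirrors the string options 'pick_first', 'pick_last', 'average'.
    """
    if mode == "pick_first":
        # Keep the first position of every sequence (e.g. a CLS-style token).
        return [list(seq[0]) for seq in features]
    if mode == "pick_last":
        # Keep the last position of every sequence.
        return [list(seq[-1]) for seq in features]
    if mode == "average":
        # Mean over the sequence dimension, per embedding coordinate.
        return [[sum(tok[k] for tok in seq) / len(seq) for k in range(len(seq[0]))]
                for seq in features]
    raise ValueError(f"unknown feature_reduction: {mode!r}")


feats = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]  # batch=1, seq_len=3, embd=2
print(reduce_features(feats, "average"))
```

For example, if the model's classification head reads the embedding of the last token, 'pick_last' is the option that matches the model definition.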
Methods:
- log_marginal_likelihood – Compute the Laplace approximation to the log marginal likelihood subject to specific Hessian approximations that subclasses implement.
- __call__ – Compute the posterior predictive on input data x.
- square_norm – Compute the square norm under the posterior precision with value - self.mean as \(\Delta\).
- log_prob – Compute the log probability under the (current) Laplace approximation.
- functional_samples – Sample from the function-space posterior on input data x.
- predictive_samples – Sample from the posterior predictive on input data x, i.e., the respective inverse-link function (e.g. softmax) is applied on top of the functional sample.
- functional_variance – Compute the functional variance for the 'glm' predictive: f_var[i] = Js[i] @ P.inv() @ Js[i].T.
- functional_covariance – Compute the functional covariance for the 'glm' predictive: f_cov = Js @ P.inv() @ Js.T.
- sample – Sample from the Laplace posterior approximation, i.e., \( \theta \sim \mathcal{N}(\theta_{MAP}, P^{-1})\).
- fit – Fit the local Laplace approximation at the parameters of the model.
- functional_variance_fast – Should be overridden if there exists a trick to make this fast!
Attributes:
- log_likelihood (Tensor) – Compute the log likelihood on the training data after .fit() has been called.
- scatter (Tensor) – Computes the scatter, a term of the log marginal likelihood that corresponds to L-2 regularization.
- log_det_prior_precision (Tensor) – Compute the log determinant of the prior precision.
- log_det_posterior_precision (Tensor) – Compute the log determinant of the posterior precision.
- log_det_ratio (Tensor) – Compute the log determinant ratio, a part of the log marginal likelihood.
- posterior_precision (Tensor) – Compute or return the posterior precision \(P\).
- prior_precision_diag (Tensor) – Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Source code in laplace/lllaplace.py
log_likelihood
Compute the log likelihood on the training data after .fit() has been called.
The log likelihood is computed on demand based on the loss and, for example,
the observation noise, which makes it differentiable in the latter for
iterative updates.
Returns:
- log_likelihood (Tensor)
scatter
Computes the scatter, a term of the log marginal likelihood that
corresponds to L-2 regularization:
scatter = \((\theta_{MAP} - \mu_0)^{T} P_0 (\theta_{MAP} - \mu_0)\), where \(\mu_0\) is the prior mean.
Returns:
- scatter (Tensor)
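A minimal sketch of the scatter for the scalar-or-diagonal prior assumed by this class, where a scalar is first broadcast to \(p_0\) in the spirit of prior_precision_diag. Both functions are hypothetical pure-Python stand-ins working on flat parameter lists; per-layer prior precisions, which the class also accepts, are not covered here.

```python
def prior_precision_diag(prior_precision, n_params):
    """Expand a scalar or diagonal prior precision into the diagonal p_0."""
    if isinstance(prior_precision, (int, float)):
        return [float(prior_precision)] * n_params
    if len(prior_precision) != n_params:
        raise ValueError("a diagonal prior precision needs one entry per parameter")
    return [float(p) for p in prior_precision]


def scatter(theta_map, prior_mean, prior_precision):
    """(theta_MAP - mu_0)^T P_0 (theta_MAP - mu_0) for P_0 = diag(p_0)."""
    p0 = prior_precision_diag(prior_precision, len(theta_map))
    # With a diagonal P_0 the quadratic form is a weighted sum of squares.
    return sum(p * (t - m) ** 2 for p, t, m in zip(p0, theta_map, prior_mean))


print(scatter([1.0, -2.0], [0.0, 0.0], 2.0))  # 10.0
```

With a zero prior mean and a scalar prior precision this reduces to the familiar weight-decay penalty, which is why the scatter is described as the L-2 regularization term.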
log_det_prior_precision
Compute the log determinant of the prior precision, \(\log \det P_0\).
Returns:
- log_det (Tensor)
log_det_posterior_precision
Compute the log determinant of the posterior precision, \(\log \det P\), which depends on the subclass's structure used for the Hessian approximation.
Returns:
- log_det (Tensor)
log_det_ratio
Compute the log determinant ratio \(\log \det P - \log \det P_0\), a part of the log marginal likelihood.
Returns:
- log_det_ratio (Tensor)
posterior_precision
Compute or return the posterior precision \(P\).
Returns:
- posterior_prec (Tensor)
prior_precision_diag
Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Returns:
- prior_precision_diag (Tensor)
log_marginal_likelihood
log_marginal_likelihood(prior_precision: Tensor | None = None, sigma_noise: Tensor | None = None) -> Tensor
Compute the Laplace approximation to the log marginal likelihood subject
to specific Hessian approximations that subclasses implement.
Requires that the Laplace approximation has been fit before.
The resulting torch.Tensor is differentiable in prior_precision and
sigma_noise if these have gradients enabled.
By passing prior_precision or sigma_noise, the current value is
overwritten. This is useful for iterating on the log marginal likelihood.
Parameters:
- prior_precision (Tensor, default: None) – prior precision, if it should be changed from the current prior_precision value
- sigma_noise (Tensor, default: None) – observation noise standard deviation, if it should be changed
Returns:
- log_marglik (Tensor)
Source code in laplace/baselaplace.py
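Assuming the standard Laplace evidence decomposition, the quantities documented above combine as log_likelihood - 0.5 * (log_det_ratio + scatter), with log_det_ratio = log det P - log det P_0. The sketch below evaluates this for diagonal precisions only and uses it to pick a prior precision from a small grid while holding the MAP estimate and the log likelihood Hessian fixed. All names and numbers are hypothetical; with the library one would instead pass prior_precision to log_marginal_likelihood and, since the result is differentiable, optimize it by gradient ascent.

```python
import math


def log_marginal_likelihood(log_lik, theta_map, prior_mean, prior_prec, post_prec):
    """Laplace log evidence for diagonal prior/posterior precisions (illustrative).

    log p(D) ~= log_lik - 0.5 * (log_det_ratio + scatter)
    """
    log_det_ratio = sum(math.log(p) for p in post_prec) - sum(math.log(p) for p in prior_prec)
    scatter = sum(p * (t - m) ** 2 for p, t, m in zip(prior_prec, theta_map, prior_mean))
    return log_lik - 0.5 * (log_det_ratio + scatter)


# Hypothetical one-parameter example: the diagonal log likelihood Hessian h stays
# fixed while the prior precision p0 varies, so the posterior precision is h + p0.
h = 4.0
candidates = [0.1, 1.0, 10.0]
best = max(candidates,
           key=lambda p0: log_marginal_likelihood(-3.0, [1.0], [0.0], [p0], [h + p0]))
print(best)  # 1.0
```

Too small a prior precision is penalized through the log determinant ratio, too large a one through the scatter, which is what makes the marginal likelihood usable for tuning it.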
__call__
__call__(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None, fitting: bool = False, **model_kwargs: dict[str, Any]) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x.
Parameters:
- x (Tensor or MutableMapping) – (batch_size, input_shape) if a tensor. If a MutableMapping, it must contain that tensor.
- pred_type ({'glm', 'nn'}, default: 'glm') – type of posterior predictive: linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here. When Laplace is done only on a subset of parameters (i.e. some gradients are disabled), only the 'nn' predictive is supported.
- link_approx ({'mc', 'probit', 'bridge', 'bridge_norm'}, default: 'probit') – how to approximate the classification link function for the 'glm' predictive. For pred_type='nn', only 'mc' is possible.
- joint (bool, default: False) – whether to output a joint predictive distribution in regression with pred_type='glm'. If set to True, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False, only the marginal predictive distribution is output. Only available for regression and the GLM predictive.
- n_samples (int, default: 100) – number of samples for link_approx='mc'.
- diagonal_output (bool, default: False) – whether to use a diagonalized posterior predictive on the outputs. Only works for pred_type='glm' when joint=False in regression. In the case of last-layer Laplace with a diagonal or Kron Hessian, setting this to True makes computation much(!) faster for a large number of outputs.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used).
- fitting (bool, default: False) – whether or not this predictive call is done during fitting. Only useful for reward modeling: the likelihood is set to "regression" when False and "classification" when True.
Returns:
- predictive (Tensor or tuple[Tensor]) – For likelihood='classification', a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression', a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression' and joint=True, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
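For classification with the default link_approx='probit', the GLM predictive mean and variance are pushed through a probit-style approximation of the softmax. The sketch below implements the common form softmax(κ ⊙ f_mu) with κ = 1 / sqrt(1 + π/8 · diag(f_var)), assuming only the diagonal of the functional covariance is used. It is an illustrative pure-Python version with a hypothetical function name, not the library's source.

```python
import math


def probit_predictive(f_mu, f_var_diag):
    """Approximate E[softmax(f)] for f ~ N(f_mu, diag(f_var_diag)) (sketch)."""
    # Scale each logit down according to its variance.
    kappa = [1.0 / math.sqrt(1.0 + math.pi / 8.0 * v) for v in f_var_diag]
    z = [k * m for k, m in zip(kappa, f_mu)]
    # Numerically stable softmax.
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


p_confident = probit_predictive([2.0, 0.0], [0.0, 0.0])
p_uncertain = probit_predictive([2.0, 0.0], [10.0, 10.0])
print(p_confident[0], p_uncertain[0])
```

Larger functional variance shrinks the logits toward zero, so the predictive distribution moves toward uniform; with zero variance the approximation reduces to the plain softmax of the MAP logits.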
_glm_forward_call
_glm_forward_call(x: Tensor | MutableMapping, likelihood: Likelihood | str, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x for the "glm" pred type.
Parameters:
- x (Tensor or MutableMapping) – (batch_size, input_shape) if a tensor. If a MutableMapping, it must contain that tensor.
- likelihood (Likelihood or str in {'classification', 'regression', 'reward_modeling'}) – determines the log likelihood Hessian approximation.
- link_approx ({'mc', 'probit', 'bridge', 'bridge_norm'}, default: 'probit') – how to approximate the classification link function for the 'glm' predictive. For pred_type='nn', only 'mc' is possible.
- joint (bool, default: False) – whether to output a joint predictive distribution in regression with pred_type='glm'. If set to True, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False, only the marginal predictive distribution is output. Only available for regression and the GLM predictive.
- n_samples (int, default: 100) – number of samples for link_approx='mc'.
- diagonal_output (bool, default: False) – whether to use a diagonalized posterior predictive on the outputs. Only works for pred_type='glm' and link_approx='mc'.
Returns:
- predictive (Tensor or tuple[Tensor]) – For likelihood='classification', a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression', a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression' and joint=True, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_functional_samples
_glm_functional_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior functional on input data x using the "glm" prediction type.
Parameters:
- f_mu (Tensor) – GLM predictive mean, (batch_size, output_shape)
- f_var (Tensor) – GLM predictive covariances, (batch_size, output_shape, output_shape)
- n_samples (int) – number of samples
- diagonal_output (bool, default: False) – whether to use a diagonalized GLM posterior predictive on the outputs.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used)
Returns:
- samples (Tensor) – samples of shape (n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
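Conceptually, drawing functional samples from \(\mathcal{N}(f_{mu}, f_{var})\) amounts to f = f_mu + L ε with L Lᵀ = f_var and ε standard normal, while diagonal_output keeps only the square roots of the marginal variances. The following sketch works under that assumption, with plain lists and random.Random standing in for torch tensors and a torch.Generator; the function names are hypothetical.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L^T = A for a small SPD matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def glm_functional_samples(f_mu, f_var, n_samples, diagonal_output=False, rng=None):
    """Draw f ~ N(f_mu[b], f_var[b]) for each batch element b.

    f_mu: (batch, out), f_var: (batch, out, out); returns (n_samples, batch, out).
    """
    rng = rng if rng is not None else random.Random()
    if diagonal_output:
        # Keep only the marginal variances: the scale factor is diagonal.
        factors = [[[math.sqrt(cov[i][i]) if i == j else 0.0 for j in range(len(cov))]
                    for i in range(len(cov))] for cov in f_var]
    else:
        factors = [cholesky(cov) for cov in f_var]
    samples = []
    for _ in range(n_samples):
        batch = []
        for mu, L in zip(f_mu, factors):
            eps = [rng.gauss(0.0, 1.0) for _ in mu]
            # f = mu + L eps has covariance L L^T = f_var.
            batch.append([m + sum(L[k][j] * eps[j] for j in range(k + 1))
                          for k, m in enumerate(mu)])
        samples.append(batch)
    return samples
```

Applying the likelihood's inverse-link function (for example a softmax) to each returned sample yields predictive samples in the sense of _glm_predictive_samples.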
_glm_predictive_samples
_glm_predictive_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x using the "glm" prediction
type, i.e., the inverse-link function corresponding to the likelihood is applied
on top of the functional sample.
Parameters:
- f_mu (Tensor) – GLM predictive mean, (batch_size, output_shape)
- f_var (Tensor) – GLM predictive covariances, (batch_size, output_shape, output_shape)
- n_samples (int) – number of samples
- diagonal_output (bool, default: False) – whether to use a diagonalized GLM posterior predictive on the outputs.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used)
Returns:
- samples (Tensor) – samples of shape (n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
square_norm
Compute the square norm under the posterior precision, with \(\Delta\) = value - self.mean:
\[
\Delta^\top P \Delta
\]
Returns:
- square_form
log_prob
log_prob(value: Tensor, normalized: bool = True) -> Tensor
Compute the log probability under the (current) Laplace approximation.
Parameters:
- value (Tensor)
- normalized (bool, default: True) – whether to return the log of a properly normalized Gaussian or just the terms that depend on value.
Returns:
- log_prob (Tensor)
Source code in laplace/baselaplace.py
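A minimal sketch of both square_norm and log_prob for a small dense precision \(P\), assuming the standard Gaussian density in precision form, \(\log \mathcal{N} = -\tfrac12 \Delta^\top P \Delta + \tfrac12 \log\det P - \tfrac{k}{2}\log 2\pi\). With normalized=False only the first, value-dependent term remains. The pure-Python helpers are hypothetical stand-ins for the tensor-based methods.

```python
import math


def square_norm(value, mean, P):
    """Delta^T P Delta with Delta = value - mean."""
    delta = [v - m for v, m in zip(value, mean)]
    n = len(delta)
    return sum(delta[i] * P[i][j] * delta[j] for i in range(n) for j in range(n))


def log_det_spd(P):
    """log det P via a Cholesky factorization: 2 * sum(log L_ii)."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(P[i][i] - s) if i == j else (P[i][j] - s) / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def log_prob(value, mean, P, normalized=True):
    """Log density of N(mean, P^{-1}) at value, or only its value-dependent term."""
    log_p = -0.5 * square_norm(value, mean, P)
    if normalized:
        k = len(mean)
        log_p += 0.5 * log_det_spd(P) - 0.5 * k * math.log(2.0 * math.pi)
    return log_p


print(log_prob([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 3.0]]))
```

Working with the precision directly avoids inverting \(P\): the quadratic form needs only \(P\), and the normalizer needs only its log determinant.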
functional_samples
functional_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the function-space posterior on input data x.
Can be used, for example, for Thompson sampling or to compute an arbitrary
expectation.
Parameters:
- x (Tensor or MutableMapping) – input data (batch_size, input_shape)
- pred_type ({'glm', 'nn'}, default: 'glm') – type of posterior predictive: linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
- n_samples (int, default: 100) – number of samples
- diagonal_output (bool, default: False) – whether to use a diagonalized GLM posterior predictive on the outputs. Only applies when pred_type='glm'.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used)
Returns:
- samples (Tensor) – samples of shape (n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
predictive_samples
predictive_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x, i.e., the respective
inverse-link function (e.g. softmax) is applied on top of the functional
sample.
Parameters:
- x (Tensor or MutableMapping) – input data (batch_size, input_shape)
- pred_type ({'glm', 'nn'}, default: 'glm') – type of posterior predictive: linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
- n_samples (int, default: 100) – number of samples
- diagonal_output (bool, default: False) – whether to use a diagonalized GLM posterior predictive on the outputs. Only applies when pred_type='glm'.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used)
Returns:
- samples (Tensor) – samples of shape (n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
functional_variance
functional_variance(Js: Tensor) -> Tensor
Compute the functional variance for the 'glm' predictive:
f_var[i] = Js[i] @ P.inv() @ Js[i].T, which is an output x output
predictive covariance matrix.
Mathematically, we have for a single Jacobian
\(\mathcal{J} = \nabla_\theta f(x;\theta)\vert_{\theta_{MAP}}\)
the output covariance matrix
\( \mathcal{J} P^{-1} \mathcal{J}^T \).
Parameters:
- Js (Tensor) – Jacobians of the model output w.r.t. the parameters, (batch, outputs, parameters)
Returns:
- f_var (Tensor) – output covariance, (batch, outputs, outputs)
Source code in laplace/baselaplace.py
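The formula \(\mathcal{J} P^{-1} \mathcal{J}^T\) can be checked on tiny matrices with the following sketch. It is a hypothetical pure-Python version that inverts \(P\) explicitly by Gauss-Jordan elimination, which is only sensible at toy sizes; structured approximations such as the diagonal or Kronecker-factored ones in this module would typically avoid forming a dense inverse.

```python
def mat_inv(A):
    """Inverse of a small nonsingular matrix by Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def functional_variance(Js, P):
    """f_var[i] = Js[i] @ P^{-1} @ Js[i].T for Js of shape (batch, outputs, params)."""
    Sigma = mat_inv(P)
    out = []
    for J in Js:
        JT = [list(col) for col in zip(*J)]
        out.append(matmul(matmul(J, Sigma), JT))
    return out


print(functional_variance([[[1.0, 2.0]]], [[2.0, 0.0], [0.0, 4.0]]))  # [[[1.5]]]
```

Each batch element is handled independently here, which is exactly what distinguishes functional_variance from the joint functional_covariance.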
functional_covariance
functional_covariance(Js: Tensor) -> Tensor
Compute the functional covariance for the 'glm' predictive:
f_cov = Js @ P.inv() @ Js.T, which is a (batch*output) x (batch*output)
predictive covariance matrix.
This emulates the GP posterior covariance N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). Useful for joint predictions, such as in batched Bayesian optimization.
Parameters:
- Js (Tensor) – Jacobians of the model output w.r.t. the parameters, (batch*outputs, parameters)
Returns:
- f_cov (Tensor) – output covariance, (batch*outputs, batch*outputs)
Source code in laplace/baselaplace.py
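The difference from functional_variance is that all batch*outputs rows of the Jacobian are combined, so covariances across inputs are kept. The sketch below assumes, for brevity, a diagonal posterior precision p, so that \(P^{-1} = \textrm{diag}(1/p)\); the function name is hypothetical.

```python
def functional_covariance(Js_flat, p_diag):
    """Js_flat @ diag(1/p) @ Js_flat.T for Js_flat of shape (batch*outputs, params)."""
    n = len(Js_flat)
    n_params = len(p_diag)
    return [[sum(Js_flat[a][k] * Js_flat[b][k] / p_diag[k] for k in range(n_params))
             for b in range(n)]
            for a in range(n)]


# Two inputs with one output each, two parameters, diagonal precision [1, 2].
print(functional_covariance([[1.0, 0.0], [1.0, 1.0]], [1.0, 2.0]))  # [[1.0, 1.0], [1.0, 1.5]]
```

The off-diagonal entry 1.0 is the covariance between the outputs at the two inputs, which the per-input functional_variance would not report; the diagonal entries coincide with its results.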
sample
Sample from the Laplace posterior approximation, i.e., \( \theta \sim \mathcal{N}(\theta_{MAP}, P^{-1})\).
Parameters:
- n_samples (int, default: 100) – number of samples
- generator (Generator, default: None) – random number generator to control the samples
Returns:
- samples (Tensor)
Source code in laplace/baselaplace.py
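For a diagonal posterior precision p, as in DiagLLLaplace, sampling reduces to \(\theta = \theta_{MAP} + \varepsilon / \sqrt{p}\) with \(\varepsilon \sim \mathcal{N}(0, I)\). The helper below is a hypothetical pure-Python sketch of that special case, using random.Random in place of a torch.Generator; dense or Kronecker-factored precisions need a matrix square root instead.

```python
import math
import random


def sample_diag_posterior(theta_map, post_prec_diag, n_samples=100, rng=None):
    """theta ~ N(theta_MAP, diag(p)^{-1}), drawn as theta_MAP + eps / sqrt(p)."""
    rng = rng if rng is not None else random.Random()
    # Posterior scale sqrt(1 / p) per parameter.
    scale = [1.0 / math.sqrt(p) for p in post_prec_diag]
    return [[t + s * rng.gauss(0.0, 1.0) for t, s in zip(theta_map, scale)]
            for _ in range(n_samples)]


draws = sample_diag_posterior([1.0, -2.0], [4.0, 100.0], n_samples=5, rng=random.Random(0))
print(len(draws), len(draws[0]))  # 5 2
```

Passing a seeded random.Random plays the same role as the generator argument: it makes the draws reproducible.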
fit
fit(train_loader: DataLoader, override: bool = True, progress_bar: bool = False) -> None
Fit the local Laplace approximation at the parameters of the model.
Parameters:
- train_loader (DataLoader) – each iterate is a training batch, either (X, y) tensors or a dict-like object containing keys as expressed by self.dict_key_x and self.dict_key_y. train_loader.dataset needs to be set to access \(N\), the size of the data set.
- override (bool, default: True) – whether to initialize H, loss, and n_data again; setting it to False is useful in online learning settings to accumulate a sequential posterior approximation.
- progress_bar (bool, default: False)
Source code in laplace/lllaplace.py
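The effect of override can be illustrated with a toy stand-in, assuming a bias-free linear last layer with a Gaussian likelihood so that each example adds \(\phi_j^2 / \sigma^2\) to the diagonal Hessian. ToyDiagLastLayer is hypothetical and only tracks H and n_data; the real fit also handles the loss, the data loader, and the curvature backend.

```python
class ToyDiagLastLayer:
    """Hypothetical stand-in accumulating a diagonal last-layer Hessian."""

    def __init__(self, sigma_noise=1.0, prior_precision=1.0):
        self.sigma_noise = sigma_noise
        self.prior_precision = prior_precision
        self.H = None
        self.n_data = 0

    def fit(self, batches, override=True):
        # override=True starts from scratch; False keeps accumulating.
        if override or self.H is None:
            self.H, self.n_data = None, 0
        for X, _y in batches:  # targets do not enter this Hessian
            if self.H is None:
                self.H = [0.0] * len(X[0])
            for phi in X:
                for j, v in enumerate(phi):
                    self.H[j] += v * v / self.sigma_noise**2
            self.n_data += len(X)

    @property
    def posterior_precision(self):
        return [h + self.prior_precision for h in self.H]


la = ToyDiagLastLayer()
la.fit([([[1.0, 2.0]], [0.0])])
la.fit([([[1.0, 0.0]], [0.0])], override=False)  # sequential update
print(la.H, la.n_data, la.posterior_precision)  # [2.0, 4.0] 2 [3.0, 5.0]
```

Calling fit again with the default override=True would discard the first batch's contribution, which is why online settings pass override=False.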
functional_variance_fast
functional_variance_fast(X)
Should be overridden if there exists a trick to make this fast!
Parameters:
- X
Returns:
- f_var_diag (torch.Tensor of shape (batch_size, num_outputs)) – corresponding to the diagonal of the covariance matrix of the outputs
Source code in laplace/lllaplace.py
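As an example of such a trick, assume a bias-free linear last layer \(f_k(x) = w_k^\top \phi(x)\) with a diagonal posterior over the weights. The Jacobian of \(f_k\) is \(\phi\) placed in row k of the weight matrix, so the output variances reduce to \(\sum_j \phi_j^2 \, v_{kj}\) without forming any Jacobian, and different outputs are uncorrelated. The sketch and its function name are hypothetical and not taken from the library.

```python
def diag_ll_functional_variance(phi, post_var):
    """Per-output variance of f_k = w_k . phi for a bias-free linear last layer.

    phi: (batch, d) last-layer features; post_var: (n_outputs, d) diagonal
    posterior variance of the weight matrix W. Returns (batch, n_outputs).
    """
    return [[sum(p * p * v for p, v in zip(x, row)) for row in post_var] for x in phi]


print(diag_ll_functional_variance([[1.0, 2.0]], [[0.5, 0.25], [1.0, 1.0]]))  # [[1.5, 5.0]]
```

The cost is linear in batch size, number of outputs, and feature dimension, compared with building a (batch, outputs, parameters) Jacobian first.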
DiagLLLaplace
DiagLLLaplace(model: Module, likelihood: Likelihood | str, sigma_noise: float | Tensor = 1.0, prior_precision: float | Tensor = 1.0, prior_mean: float | Tensor = 0.0, temperature: float = 1.0, enable_backprop: bool = False, feature_reduction: FeatureReduction | str | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', backend: type[CurvatureInterface] | None = None, last_layer_name: str | None = None, backend_kwargs: dict[str, Any] | None = None, asdl_fisher_kwargs: dict[str, Any] | None = None)
Bases: LLLaplace, DiagLaplace
Last-layer Laplace approximation with diagonal log likelihood Hessian approximation
and hence posterior precision.
Mathematically, we have \(P \approx \textrm{diag}(P)\).
See DiagLaplace, LLLaplace, and BaseLaplace for the full interface.
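The approximation \(P \approx \textrm{diag}(P)\) has a concrete consequence that a tiny example makes visible: for a positive-definite precision, the diagonal variances \(1/P_{ii}\) (posterior_variance, with posterior_scale as their square root) never exceed the exact marginal variances \((P^{-1})_{ii}\), so dropping the correlations in P yields smaller variances. The 2x2 helpers below are hypothetical illustrations.

```python
import math


def diag_posterior_variance(P):
    """Variance under P ~= diag(P): 1 / P_ii, one entry per parameter."""
    return [1.0 / P[i][i] for i in range(len(P))]


def exact_marginal_variance_2x2(P):
    """Diagonal of P^{-1} for a 2x2 precision matrix."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [d / det, a / det]


P = [[2.0, 1.0], [1.0, 2.0]]
var_diag = diag_posterior_variance(P)                # [0.5, 0.5]
scale_diag = [math.sqrt(v) for v in var_diag]        # analogue of posterior_scale
var_exact = exact_marginal_variance_2x2(P)           # [2/3, 2/3]
print(var_diag, scale_diag, var_exact)
```

In exchange for this underestimation, the diagonal form stores one number per parameter and makes inversion, log determinants, and sampling elementwise.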
Methods:
- fit – Fit the local Laplace approximation at the parameters of the model.
- log_marginal_likelihood – Compute the Laplace approximation to the log marginal likelihood subject to specific Hessian approximations that subclasses implement.
- __call__ – Compute the posterior predictive on input data x.
- log_prob – Compute the log probability under the (current) Laplace approximation.
- functional_samples – Sample from the function-space posterior on input data x.
- predictive_samples – Sample from the posterior predictive on input data x, i.e., the respective inverse-link function (e.g. softmax) is applied on top of the functional sample.
Attributes:
- log_likelihood (Tensor) – Compute the log likelihood on the training data after .fit() has been called.
- prior_precision_diag (Tensor) – Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
- scatter (Tensor) – Computes the scatter, a term of the log marginal likelihood that corresponds to L-2 regularization.
- log_det_prior_precision (Tensor) – Compute the log determinant of the prior precision.
- log_det_ratio (Tensor) – Compute the log determinant ratio, a part of the log marginal likelihood.
- posterior_precision (Tensor) – Diagonal posterior precision \(p\).
- posterior_scale (Tensor) – Diagonal posterior scale \(\sqrt{p^{-1}}\).
- posterior_variance (Tensor) – Diagonal posterior variance \(p^{-1}\).
Source code in laplace/lllaplace.py
log_likelihood
Compute the log likelihood on the training data after .fit() has been called.
The log likelihood is computed on demand based on the loss and, for example,
the observation noise, which makes it differentiable in the latter for
iterative updates.
Returns:
- log_likelihood (Tensor)
prior_precision_diag
Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Returns:
- prior_precision_diag (Tensor)
scatter
Computes the scatter, a term of the log marginal likelihood that
corresponds to L-2 regularization:
scatter = \((\theta_{MAP} - \mu_0)^{T} P_0 (\theta_{MAP} - \mu_0)\), where \(\mu_0\) is the prior mean.
Returns:
- scatter (Tensor)
log_det_prior_precision
Compute the log determinant of the prior precision, \(\log \det P_0\).
Returns:
- log_det (Tensor)
log_det_ratio
Compute the log determinant ratio \(\log \det P - \log \det P_0\), a part of the log marginal likelihood.
Returns:
- log_det_ratio (Tensor)
posterior_precision
Diagonal posterior precision \(p\).
Returns:
- precision (Tensor) – (parameters)
posterior_scale
Diagonal posterior scale \(\sqrt{p^{-1}}\).
Returns:
- scale (Tensor) – (parameters)
posterior_variance
Diagonal posterior variance \(p^{-1}\).
Returns:
- variance (Tensor) – (parameters)
fit
fit(train_loader: DataLoader, override: bool = True, progress_bar: bool = False) -> None
Fit the local Laplace approximation at the parameters of the model.
Parameters:
- train_loader (DataLoader) – each iterate is a training batch, either (X, y) tensors or a dict-like object containing keys as expressed by self.dict_key_x and self.dict_key_y. train_loader.dataset needs to be set to access \(N\), the size of the data set.
- override (bool, default: True) – whether to initialize H, loss, and n_data again; setting it to False is useful in online learning settings to accumulate a sequential posterior approximation.
- progress_bar (bool, default: False)
Source code in laplace/lllaplace.py
log_marginal_likelihood
log_marginal_likelihood(prior_precision: Tensor | None = None, sigma_noise: Tensor | None = None) -> Tensor
Compute the Laplace approximation to the log marginal likelihood subject
to specific Hessian approximations that subclasses implement.
Requires that the Laplace approximation has been fit before.
The resulting torch.Tensor is differentiable in prior_precision and
sigma_noise if these have gradients enabled.
By passing prior_precision or sigma_noise, the current value is
overwritten. This is useful for iterating on the log marginal likelihood.
Parameters:
- prior_precision (Tensor, default: None) – prior precision, if it should be changed from the current prior_precision value
- sigma_noise (Tensor, default: None) – observation noise standard deviation, if it should be changed
Returns:
- log_marglik (Tensor)
Source code in laplace/baselaplace.py
__call__
__call__(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None, fitting: bool = False, **model_kwargs: dict[str, Any]) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x.
Parameters:
- x (Tensor or MutableMapping) – (batch_size, input_shape) if a tensor. If a MutableMapping, it must contain that tensor.
- pred_type ({'glm', 'nn'}, default: 'glm') – type of posterior predictive: linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here. When Laplace is done only on a subset of parameters (i.e. some gradients are disabled), only the 'nn' predictive is supported.
- link_approx ({'mc', 'probit', 'bridge', 'bridge_norm'}, default: 'probit') – how to approximate the classification link function for the 'glm' predictive. For pred_type='nn', only 'mc' is possible.
- joint (bool, default: False) – whether to output a joint predictive distribution in regression with pred_type='glm'. If set to True, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False, only the marginal predictive distribution is output. Only available for regression and the GLM predictive.
- n_samples (int, default: 100) – number of samples for link_approx='mc'.
- diagonal_output (bool, default: False) – whether to use a diagonalized posterior predictive on the outputs. Only works for pred_type='glm' when joint=False in regression. In the case of last-layer Laplace with a diagonal or Kron Hessian, setting this to True makes computation much(!) faster for a large number of outputs.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used).
- fitting (bool, default: False) – whether or not this predictive call is done during fitting. Only useful for reward modeling: the likelihood is set to "regression" when False and "classification" when True.
Returns:
- predictive (Tensor or tuple[Tensor]) – For likelihood='classification', a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression', a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression' and joint=True, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_forward_call
_glm_forward_call(x: Tensor | MutableMapping, likelihood: Likelihood | str, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x for the "glm" pred type.
Parameters:
- x (Tensor or MutableMapping) – (batch_size, input_shape) if a tensor. If a MutableMapping, it must contain that tensor.
- likelihood (Likelihood or str in {'classification', 'regression', 'reward_modeling'}) – determines the log likelihood Hessian approximation.
- link_approx ({'mc', 'probit', 'bridge', 'bridge_norm'}, default: 'probit') – how to approximate the classification link function for the 'glm' predictive. For pred_type='nn', only 'mc' is possible.
- joint (bool, default: False) – whether to output a joint predictive distribution in regression with pred_type='glm'. If set to True, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False, only the marginal predictive distribution is output. Only available for regression and the GLM predictive.
- n_samples (int, default: 100) – number of samples for link_approx='mc'.
- diagonal_output (bool, default: False) – whether to use a diagonalized posterior predictive on the outputs. Only works for pred_type='glm' and link_approx='mc'.
Returns:
- predictive (Tensor or tuple[Tensor]) – For likelihood='classification', a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression', a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression' and joint=True, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_functional_samples
_glm_functional_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior functional on input data x using the "glm" prediction type.
Parameters:
- f_mu (Tensor) – GLM predictive mean, (batch_size, output_shape)
- f_var (Tensor) – GLM predictive covariances, (batch_size, output_shape, output_shape)
- n_samples (int) – number of samples
- diagonal_output (bool, default: False) – whether to use a diagonalized GLM posterior predictive on the outputs.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used)
Returns:
- samples (Tensor) – samples of shape (n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
_glm_predictive_samples
_glm_predictive_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x using the "glm" prediction
type, i.e., the inverse-link function corresponding to the likelihood is applied
on top of the functional sample.
Parameters:
- f_mu (Tensor) – GLM predictive mean, (batch_size, output_shape)
- f_var (Tensor) – GLM predictive covariances, (batch_size, output_shape, output_shape)
- n_samples (int) – number of samples
- diagonal_output (bool, default: False) – whether to use a diagonalized GLM posterior predictive on the outputs.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used)
Returns:
- samples (Tensor) – samples of shape (n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
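For classification the inverse link is the softmax, so predictive samples are softmax-transformed functional samples; for regression we assume the link is the identity and the functional samples are returned unchanged. The NumPy sketch below covers only the classification case; glm_predictive_samples_classification is a hypothetical name, not part of the library.

```python
import numpy as np


def softmax(f, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = f - f.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def glm_predictive_samples_classification(f_mu, f_var, n_samples, generator=None):
    """Hypothetical sketch: softmax applied on top of GLM functional samples."""
    rng = generator if generator is not None else np.random.default_rng()
    L = np.linalg.cholesky(f_var)                          # (B, C, C)
    eps = rng.standard_normal((n_samples,) + f_mu.shape)   # (S, B, C)
    f_samples = f_mu + np.einsum("bij,sbj->sbi", L, eps)   # functional samples
    return softmax(f_samples, axis=-1)                     # class probabilities per sample
```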
log_prob
#
log_prob(value: Tensor, normalized: bool = True) -> Tensor
Compute the log probability under the (current) Laplace approximation.
Parameters:
-
value
#Tensor
) –point in parameter space at which to evaluate the log density. -
normalized
#bool
, default:True
) –whether to return log of a properly normalized Gaussian or just the terms that depend on
value
.
Returns:
-
log_prob
(Tensor
) –
Source code in laplace/baselaplace.py
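Under the Gaussian approximation \(\mathcal{N}(\theta_{MAP}, P^{-1})\) with \(D\) parameters, the normalized log density of a value \(v\) is \(-\frac{D}{2}\log 2\pi + \frac{1}{2}\log\det P - \frac{1}{2}(v-\theta_{MAP})^T P (v-\theta_{MAP})\); the unnormalized variant keeps only the last term, the only one that depends on \(v\). A NumPy sketch with a dense precision (gaussian_log_prob is a hypothetical helper, not the library's code):

```python
import math

import numpy as np


def gaussian_log_prob(value, mean, precision, normalized=True):
    """Hypothetical sketch: log N(value | mean, precision^{-1}) for a dense precision."""
    delta = value - mean
    square_norm = delta @ precision @ delta  # (v - mu)^T P (v - mu)
    if not normalized:
        # Only the term that depends on the value.
        return -0.5 * square_norm
    d = mean.shape[0]
    _, logdet = np.linalg.slogdet(precision)  # log det P
    return -0.5 * d * math.log(2 * math.pi) + 0.5 * logdet - 0.5 * square_norm
```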
functional_samples
#
functional_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the function-space posterior on input data x
.
Can be used, for example, for Thompson sampling or to compute an arbitrary
expectation.
Parameters:
-
x
#Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
#int
, default:100
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
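As an illustration of the two uses mentioned above, the NumPy sketch below takes an array of function samples for a scalar-output model (for example, the samples returned here with a single output squeezed away) and either picks a candidate input by Thompson sampling or forms a Monte Carlo estimate of an arbitrary expectation. Both helper names are hypothetical.

```python
import numpy as np


def thompson_select(f_samples, rng):
    """Hypothetical sketch: Thompson sampling over candidate inputs.

    f_samples: (n_samples, n_candidates) draws of a scalar-output function.
    One posterior function draw is chosen and its maximizer is returned.
    """
    s = rng.integers(f_samples.shape[0])  # index of a single posterior draw
    return int(np.argmax(f_samples[s]))


def mc_expectation(f_samples, g):
    """Hypothetical sketch: Monte Carlo estimate of E[g(f(x))] for each input."""
    return g(f_samples).mean(axis=0)
```

Drawing one of the stored samples uniformly at random is equivalent to drawing a single fresh function sample, which is all Thompson sampling needs.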
predictive_samples
#
predictive_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
. I.e., the respective
inverse-link function (e.g. softmax) is applied on top of the functional
sample.
Parameters:
-
x
#Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
#int
, default:100
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
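Averaging predictive samples gives a Monte Carlo estimate of the predictive distribution; as we understand it, this is what link_approx='mc' amounts to for the GLM predictive. The NumPy sketch below (mc_predictive is a hypothetical name) also shows that averaging the softmax over samples is usually less confident than applying the softmax to the mean logits.

```python
import numpy as np


def mc_predictive(f_mu, f_var, n_samples=100, generator=None):
    """Hypothetical sketch: mean of softmax(f) over GLM functional samples."""
    rng = generator if generator is not None else np.random.default_rng()
    L = np.linalg.cholesky(f_var)                          # (B, C, C)
    eps = rng.standard_normal((n_samples,) + f_mu.shape)   # (S, B, C)
    f = f_mu + np.einsum("bij,sbj->sbi", L, eps)           # functional samples
    f = f - f.max(axis=-1, keepdims=True)                  # stable softmax
    e = np.exp(f)
    p = e / e.sum(axis=-1, keepdims=True)                  # predictive samples
    return p.mean(axis=0)                                  # (B, C)
```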
KronLLLaplace
#
KronLLLaplace(model: Module, likelihood: Likelihood | str, sigma_noise: float | Tensor = 1.0, prior_precision: float | Tensor = 1.0, prior_mean: float | Tensor = 0.0, temperature: float = 1.0, enable_backprop: bool = False, feature_reduction: FeatureReduction | str | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', backend: type[CurvatureInterface] | None = None, last_layer_name: str | None = None, damping: bool = False, backend_kwargs: dict[str, Any] | None = None, asdl_fisher_kwargs: dict[str, Any] | None = None)
Bases: LLLaplace
, KronLaplace
Last-layer Laplace approximation with Kronecker factored log likelihood Hessian approximation
and hence posterior precision.
Mathematically, we have for the last parameter group, i.e., torch.nn.Linear,
that \(P \approx Q \otimes H\).
See KronLaplace, LLLaplace, and BaseLaplace for the full interface, and see laplace.utils.matrix.Kron and laplace.utils.matrix.KronDecomposed for the structure of the Kronecker factors. Kron is used to aggregate factors by summing them up, and KronDecomposed is used to add the prior and a Hessian factor (e.g. temperature) and to compute posterior covariances, the marginal likelihood, etc.
Use of damping
is possible by initializing or setting damping=True
.
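The reason an eigendecomposed Kronecker representation is convenient: if \(Q = U_Q \Lambda_Q U_Q^T\) and \(H = U_H \Lambda_H U_H^T\), then \(Q \otimes H + \delta I\) has eigenvectors \(U_Q \otimes U_H\) and eigenvalues \(q_i h_j + \delta\), so log determinants and inverses only require decomposing the small factors. The NumPy sketch below checks this against dense computations for a scalar prior precision \(\delta\); the helper names are hypothetical, and the damping=True variant, which adds the prior to the factors differently, is not covered.

```python
import numpy as np


def kron_eigvals_with_prior(Q, H, delta):
    """Eigenvalues of (Q kron H) + delta * I, obtained from the factors' eigenvalues."""
    q = np.linalg.eigvalsh(Q)
    h = np.linalg.eigvalsh(H)
    # The eigenvalues of Q kron H are all pairwise products q_i * h_j.
    return np.outer(q, h).ravel() + delta


def kron_logdet_with_prior(Q, H, delta):
    return float(np.sum(np.log(kron_eigvals_with_prior(Q, H, delta))))


def kron_inverse_with_prior(Q, H, delta):
    """Dense inverse of (Q kron H) + delta * I via the factors' eigendecompositions."""
    q, Uq = np.linalg.eigh(Q)
    h, Uh = np.linalg.eigh(H)
    U = np.kron(Uq, Uh)  # column i * len(h) + j is the eigenvector for q_i * h_j
    return U @ np.diag(1.0 / (np.outer(q, h).ravel() + delta)) @ U.T
```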
Methods:
-
fit
–Fit the local Laplace approximation at the parameters of the model.
-
log_marginal_likelihood
–Compute the Laplace approximation to the log marginal likelihood subject to specific Hessian approximations that subclasses implement.
-
__call__
–Compute the posterior predictive on input data
x
. -
log_prob
–Compute the log probability under the (current) Laplace approximation.
-
functional_samples
–Sample from the function-space posterior on input data
x
. -
predictive_samples
–Sample from the posterior predictive on input data
x
. I.e., the respective inverse-link function (e.g. softmax) is applied on top of the functional sample.
Attributes:
-
log_likelihood
(Tensor
) –Compute log likelihood on the training data after
.fit()
has been called. -
prior_precision_diag
(Tensor
) –Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
-
scatter
(Tensor
) –Computes the scatter, a term of the log marginal likelihood that corresponds to L-2 regularization.
-
log_det_prior_precision
(Tensor
) –Compute log determinant of the prior precision
-
log_det_ratio
(Tensor
) –Compute the log determinant ratio, a part of the log marginal likelihood.
-
posterior_precision
(KronDecomposed
) –Kronecker-factored posterior precision \(P\).
Source code in laplace/lllaplace.py
log_likelihood
#
Compute log likelihood on the training data after .fit()
has been called.
The log likelihood is computed on demand based on the loss and, for example,
the observation noise, which makes it differentiable in the latter for
iterative updates.
Returns:
-
log_likelihood
(Tensor
) –
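For regression with a Gaussian likelihood, the log likelihood can be written in closed form from the squared-error loss and the observation noise \(\sigma\), which is why it stays differentiable in \(\sigma\). The NumPy sketch below assumes the stored loss is half the sum of squared errors; treat that convention, and the helper name gaussian_log_likelihood, as assumptions rather than the library's code.

```python
import math

import numpy as np


def gaussian_log_likelihood(f, y, sigma_noise):
    """Hypothetical sketch: sum of log N(y | f, sigma_noise^2) over all points and outputs."""
    n_terms = y.size                       # N * number of outputs
    half_sse = 0.5 * np.sum((y - f) ** 2)  # assumed form of the stored loss
    return -half_sse / sigma_noise**2 - n_terms * math.log(sigma_noise * math.sqrt(2 * math.pi))
```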
prior_precision_diag
#
Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Returns:
-
prior_precision_diag
(Tensor
) –
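A scalar prior precision corresponds to an isotropic prior and is simply broadcast to one entry per parameter, while a diagonal prior precision is used as-is. The NumPy sketch below shows only these two cases described here (prior_precision_diag as a free function is hypothetical; other layouts the library may accept are not covered).

```python
import numpy as np


def prior_precision_diag(prior_precision, n_params):
    """Hypothetical sketch: expand a scalar or diagonal prior precision into p_0."""
    p = np.asarray(prior_precision, dtype=float)
    if p.size == 1:
        # Scalar: isotropic prior with the same precision for every parameter.
        return np.full(n_params, float(p.reshape(-1)[0]))
    if p.shape == (n_params,):
        # Already diagonal: one precision per parameter.
        return p.copy()
    raise ValueError("prior precision must be a scalar or have one entry per parameter")
```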
scatter
#
Computes the scatter, a term of the log marginal likelihood that
corresponds to L-2 regularization:
scatter
= \((\theta_{MAP} - \mu_0)^{T} P_0 (\theta_{MAP} - \mu_0) \).
Returns:
-
scatter
(Tensor
) –
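Because \(P_0 = \textrm{diag}(p_0)\), the scatter reduces to a weighted sum of squares and never requires forming the prior precision matrix. A NumPy sketch (scatter as a free function is a hypothetical helper):

```python
import numpy as np


def scatter(theta_map, prior_mean, prior_precision_diag):
    """Hypothetical sketch: (theta_MAP - mu_0)^T diag(p_0) (theta_MAP - mu_0)."""
    delta = theta_map - prior_mean
    # Elementwise product instead of a dense diagonal matrix.
    return float(np.sum(delta * prior_precision_diag * delta))
```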
log_det_prior_precision
#
Compute log determinant of the prior precision \(\log \det P_0\)
Returns:
-
log_det
(Tensor
) –
log_det_ratio
#
Compute the log determinant ratio, a part of the log marginal likelihood.
Returns:
-
log_det_ratio
(Tensor
) –
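We read the log determinant ratio as \(\log\det P - \log\det P_0\), posterior over prior. The NumPy sketch below computes it for a dense posterior precision and a diagonal prior precision; log_det_ratio as a free function is hypothetical, and the Kronecker variant would instead obtain \(\log\det P\) from the factors' eigenvalues.

```python
import numpy as np


def log_det_ratio(posterior_precision, prior_precision_diag):
    """Hypothetical sketch: log det P - log det diag(p_0)."""
    _, logdet_post = np.linalg.slogdet(posterior_precision)  # stable log det of P
    logdet_prior = np.sum(np.log(prior_precision_diag))      # log det of a diagonal matrix
    return logdet_post - logdet_prior
```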
posterior_precision
#
posterior_precision: KronDecomposed
Kronecker-factored posterior precision \(P\).
Returns:
-
precision
(`laplace.utils.matrix.KronDecomposed`
) –
fit
#
fit(train_loader: DataLoader, override: bool = True, progress_bar: bool = False) -> None
Fit the local Laplace approximation at the parameters of the model.
Parameters:
-
train_loader
#DataLoader
) –each iterate is a training batch, either
(X, y)
tensors or a dict-like object containing keys as expressed by self.dict_key_x
and self.dict_key_y
. train_loader.dataset
needs to be set to access \(N\), the size of the data set. -
override
#bool
, default:True
) –whether to initialize H, loss, and n_data again; setting to False is useful for online learning settings to accumulate a sequential posterior approximation.
-
progress_bar
#bool
, default:False
) –whether to show a progress bar while fitting.
Source code in laplace/lllaplace.py
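Conceptually, fitting sums per-batch curvature contributions and counts the data points, and override=False keeps accumulating on top of a previous fit. The class below is a hypothetical NumPy sketch for a single linear output with a Gaussian likelihood, where the last-layer GGN of a batch is \(\Phi^T \Phi\) for features \(\Phi\) (up to the noise factor); it is not the library's fit.

```python
import numpy as np


class LastLayerGGNAccumulator:
    """Hypothetical sketch of batch-wise curvature accumulation during fitting."""

    def __init__(self):
        self.H = None
        self.n_data = 0

    def fit(self, batches, override=True):
        if override or self.H is None:
            self.H, self.n_data = None, 0  # start from scratch
        for phi, _y in batches:  # phi: (batch_size, n_features) last-layer features
            h = phi.T @ phi      # GGN of a linear output with unit output Hessian
            self.H = h if self.H is None else self.H + h
            self.n_data += phi.shape[0]
        return self
```

Summing over mini-batches gives the same curvature as a single full-data pass, so the batch size only affects memory, not the result.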
log_marginal_likelihood
#
log_marginal_likelihood(prior_precision: Tensor | None = None, sigma_noise: Tensor | None = None) -> Tensor
Compute the Laplace approximation to the log marginal likelihood subject
to specific Hessian approximations that subclasses implement.
Requires that the Laplace approximation has been fit before.
The resulting torch.Tensor is differentiable in prior_precision
and
sigma_noise
if these have gradients enabled.
By passing prior_precision
or sigma_noise
, the current value is
overwritten. This is useful for iterating on the log marginal likelihood.
Parameters:
-
prior_precision
#Tensor
, default:None
) –prior precision, if it should be changed from the current
prior_precision
value -
sigma_noise
#Tensor
, default:None
) –observation noise standard deviation, if it should be changed
Returns:
-
log_marglik
(Tensor
) –
Source code in laplace/baselaplace.py
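Combining the attributes above, we assume the Laplace log marginal likelihood takes the standard form log_likelihood - 0.5 * (log_det_ratio + scatter). For Bayesian linear regression with a Gaussian prior, the Laplace approximation is exact, which gives a convenient check against the closed-form evidence. The NumPy sketch uses hypothetical helper names and a zero prior mean.

```python
import math

import numpy as np


def laplace_log_marglik(log_lik, log_det_ratio, scatter):
    return log_lik - 0.5 * (log_det_ratio + scatter)


def linreg_laplace_log_marglik(X, y, sigma_noise, prior_prec):
    """Hypothetical sketch: Laplace evidence for linear regression (exact in this case)."""
    n, d = X.shape
    p0 = np.full(d, prior_prec)                            # diagonal prior precision
    P = X.T @ X / sigma_noise**2 + np.diag(p0)             # posterior precision
    theta = np.linalg.solve(P, X.T @ y / sigma_noise**2)   # MAP with zero prior mean
    r = y - X @ theta
    log_lik = -0.5 * r @ r / sigma_noise**2 - n * math.log(sigma_noise * math.sqrt(2 * math.pi))
    ratio = np.linalg.slogdet(P)[1] - np.sum(np.log(p0))   # log det P - log det P_0
    scat = theta @ (p0 * theta)                            # scatter with mu_0 = 0
    return laplace_log_marglik(log_lik, ratio, scat)
```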
__call__
#
__call__(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None, fitting: bool = False, **model_kwargs: dict[str, Any]) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
.
Parameters:
-
x
#Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here. When Laplace is done only on a subset of parameters (i.e. some gradients are disabled), only the
nn
predictive is supported. -
link_approx
#('mc', 'probit', 'bridge', 'bridge_norm')
, default:'probit'
) –how to approximate the classification link function for the
'glm'
predictive. For pred_type='nn'
, only 'mc' is possible. -
joint
#bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set to True
, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False
, only the marginal predictive distribution is output. Only available for regression and the GLM predictive. -
n_samples
#int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
#bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
when joint=False
in regression. In the case of last-layer Laplace with a diagonal or Kron Hessian, setting this to True
makes computation much faster for a large number of outputs. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used).
-
fitting
#bool
, default:False
) –whether or not this predictive call is done during fitting. Only useful for reward modeling: the likelihood is set to
"regression"
when False
and "classification"
when True
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression'
and joint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
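For link_approx='probit', the softmax of a Gaussian logit vector is commonly approximated by rescaling each mean logit with \(\kappa = 1/\sqrt{1 + \frac{\pi}{8}\sigma^2}\), where \(\sigma^2\) is that logit's variance, and then applying the softmax. The NumPy sketch below implements this standard approximation (probit_predictive is a hypothetical name, not the library's code); with zero variance it reduces to the ordinary softmax.

```python
import math

import numpy as np


def probit_predictive(f_mu, f_var):
    """Hypothetical sketch: softmax(kappa * f_mu), kappa = 1 / sqrt(1 + pi/8 * diag(f_var))."""
    var_diag = np.diagonal(f_var, axis1=1, axis2=2)      # (B, C) per-logit variances
    kappa = 1.0 / np.sqrt(1.0 + math.pi / 8.0 * var_diag)
    z = kappa * f_mu
    z = z - z.max(axis=-1, keepdims=True)                # stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```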
_glm_forward_call
#
_glm_forward_call(x: Tensor | MutableMapping, likelihood: Likelihood | str, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
for "glm" pred type.
Parameters:
-
x
#Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
likelihood
#Likelihood or str in {'classification', 'regression', 'reward_modeling'}
) –determines the log likelihood Hessian approximation.
-
link_approx
#('mc', 'probit', 'bridge', 'bridge_norm')
, default:'probit'
) –how to approximate the classification link function for the
'glm'
predictive. For pred_type='nn'
, only 'mc' is possible. -
joint
#bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set to True
, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False
, only the marginal predictive distribution is output. Only available for regression and the GLM predictive. -
n_samples
#int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
#bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
and link_approx='mc'
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression'
and joint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
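For a linear last layer \(f = W\phi(x)\) with \(W \in \mathbb{R}^{C \times F}\) (bias omitted, an assumption of this sketch), the Jacobian with respect to the row-major vec(W) is \(I_C \otimes \phi(x)^T\), so the linearized GLM covariance for one input is \(J \Sigma J^T\) with \(\Sigma = P^{-1}\). Stacking the Jacobians of all inputs gives the joint covariance over inputs and outputs, whose diagonal blocks are the per-input (marginal) covariances. NumPy sketch with hypothetical helper names:

```python
import numpy as np


def last_layer_jacobians(phi, n_outputs):
    """Jacobian of f = W @ phi w.r.t. row-major vec(W) for each input: (B, C, C*F)."""
    eye = np.eye(n_outputs)
    return np.stack([np.kron(eye, p[None, :]) for p in phi])


def glm_predictive(phi, W_map, Sigma, joint=False):
    Js = last_layer_jacobians(phi, W_map.shape[0])  # (B, C, C*F)
    f_mu = phi @ W_map.T                            # (B, C) MAP outputs
    if joint:
        J = Js.reshape(-1, Js.shape[-1])            # (B*C, C*F)
        # One covariance over all inputs and outputs, ordered input-major.
        return f_mu.reshape(-1), J @ Sigma @ J.T
    f_var = np.einsum("bcp,pq,bdq->bcd", Js, Sigma, Js)  # (B, C, C) per input
    return f_mu, f_var
```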
_glm_functional_samples
#
_glm_functional_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the function-space posterior using the "glm" prediction type, given the GLM predictive mean and covariances computed on input data x.
Parameters:
-
f_mu
#Tensor
) –GLM predictive mean
(batch_size, output_shape)
-
f_var
#Tensor
) –GLM predictive covariances
(batch_size, output_shape, output_shape)
-
n_samples
#int
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs.
-
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
_glm_predictive_samples
#
_glm_predictive_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive using the "glm" prediction type, given the GLM predictive mean and covariances computed on input data x. I.e., the inverse-link function corresponding to the likelihood is applied on top of the functional sample.
Parameters:
-
f_mu
#Tensor
) –GLM predictive mean
(batch_size, output_shape)
-
f_var
#Tensor
) –GLM predictive covariances
(batch_size, output_shape, output_shape)
-
n_samples
#int
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs.
-
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
log_prob
#
log_prob(value: Tensor, normalized: bool = True) -> Tensor
Compute the log probability under the (current) Laplace approximation.
Parameters:
-
value
#Tensor
) –point in parameter space at which to evaluate the log density. -
normalized
#bool
, default:True
) –whether to return log of a properly normalized Gaussian or just the terms that depend on
value
.
Returns:
-
log_prob
(Tensor
) –
Source code in laplace/baselaplace.py
functional_samples
#
functional_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the function-space posterior on input data x
.
Can be used, for example, for Thompson sampling or to compute an arbitrary
expectation.
Parameters:
-
x
#Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
#int
, default:100
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
predictive_samples
#
predictive_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
. I.e., the respective
inverse-link function (e.g. softmax) is applied on top of the functional
sample.
Parameters:
-
x
#Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
#int
, default:100
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
FullLLLaplace
#
FullLLLaplace(model: Module, likelihood: Likelihood | str, sigma_noise: float | Tensor = 1.0, prior_precision: float | Tensor = 1.0, prior_mean: float | Tensor = 0.0, temperature: float = 1.0, enable_backprop: bool = False, feature_reduction: FeatureReduction | str | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', backend: type[CurvatureInterface] | None = None, last_layer_name: str | None = None, backend_kwargs: dict[str, Any] | None = None, asdl_fisher_kwargs: dict[str, Any] | None = None)
Bases: LLLaplace
, FullLaplace
Last-layer Laplace approximation with full, i.e., dense, log likelihood Hessian approximation
and hence posterior precision. Based on the chosen backend
parameter, the full
approximation can be, for example, a generalized Gauss-Newton matrix.
Mathematically, we have \(P \in \mathbb{R}^{P \times P}\), where, with overloaded notation, the dimension \(P\) denotes the number of last-layer parameters.
See FullLaplace
, LLLaplace
, and BaseLaplace
for the full interface.
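With a dense approximation, the posterior precision is the scaled likelihood curvature plus the diagonal prior precision, and the covariance can be obtained from its Cholesky factor. The NumPy sketch assumes the curvature is scaled by \(1/(\sigma^2 T)\) for observation noise \(\sigma\) and temperature \(T\); that scaling and both helper names are assumptions, not the library's code.

```python
import numpy as np


def full_posterior_precision(H, prior_precision_diag, sigma_noise=1.0, temperature=1.0):
    # Assumed scaling of the likelihood curvature, plus the diagonal prior precision.
    return H / (sigma_noise**2 * temperature) + np.diag(prior_precision_diag)


def full_posterior_covariance(P):
    # P = L L^T  =>  P^{-1} = L^{-T} L^{-1}.
    L = np.linalg.cholesky(P)
    L_inv = np.linalg.inv(L)
    return L_inv.T @ L_inv
```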
Methods:
-
fit
–Fit the local Laplace approximation at the parameters of the model.
-
log_marginal_likelihood
–Compute the Laplace approximation to the log marginal likelihood subject to specific Hessian approximations that subclasses implement.
-
__call__
–Compute the posterior predictive on input data
x
. -
log_prob
–Compute the log probability under the (current) Laplace approximation.
-
functional_samples
–Sample from the function-space posterior on input data
x
. -
predictive_samples
–Sample from the posterior predictive on input data
x
. I.e., the respective inverse-link function (e.g. softmax) is applied on top of the functional sample. -
functional_variance_fast
–Should be overridden if there exists a trick to make this fast.
Attributes:
-
log_likelihood
(Tensor
) –Compute log likelihood on the training data after
.fit()
has been called. -
prior_precision_diag
(Tensor
) –Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
-
scatter
(Tensor
) –Computes the scatter, a term of the log marginal likelihood that corresponds to L-2 regularization.
-
log_det_prior_precision
(Tensor
) –Compute log determinant of the prior precision
-
log_det_ratio
(Tensor
) –Compute the log determinant ratio, a part of the log marginal likelihood.
-
posterior_precision
(Tensor
) –Posterior precision \(P\).
-
posterior_scale
(Tensor
) –Posterior scale (square root of the covariance), i.e., \(P^{-\frac{1}{2}}\).
-
posterior_covariance
(Tensor
) –Posterior covariance, i.e., \(P^{-1}\).
Source code in laplace/lllaplace.py
log_likelihood
#
Compute log likelihood on the training data after .fit()
has been called.
The log likelihood is computed on demand based on the loss and, for example,
the observation noise, which makes it differentiable in the latter for
iterative updates.
Returns:
-
log_likelihood
(Tensor
) –
prior_precision_diag
#
Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Returns:
-
prior_precision_diag
(Tensor
) –
scatter
#
Computes the scatter, a term of the log marginal likelihood that
corresponds to L-2 regularization:
scatter
= \((\theta_{MAP} - \mu_0)^{T} P_0 (\theta_{MAP} - \mu_0) \).
Returns:
-
scatter
(Tensor
) –
log_det_prior_precision
#
Compute log determinant of the prior precision \(\log \det P_0\)
Returns:
-
log_det
(Tensor
) –
log_det_ratio
#
Compute the log determinant ratio, a part of the log marginal likelihood.
Returns:
-
log_det_ratio
(Tensor
) –
posterior_precision
#
Posterior precision \(P\).
Returns:
-
precision
(tensor
) –(parameters, parameters)
posterior_scale
#
Posterior scale (square root of the covariance), i.e., \(P^{-\frac{1}{2}}\).
Returns:
-
scale
(tensor
) –(parameters, parameters)
posterior_covariance
#
Posterior covariance, i.e., \(P^{-1}\).
Returns:
-
covariance
(tensor
) –(parameters, parameters)
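Any matrix \(S\) with \(S S^T = P^{-1}\) is a valid "square root" for sampling parameters as \(\theta = \theta_{MAP} + S z\) with \(z \sim \mathcal{N}(0, I)\). The NumPy sketch below uses the lower-triangular Cholesky factor of the covariance; which square root the library returns is not stated here, and both helper names are hypothetical.

```python
import numpy as np


def posterior_scale(P):
    """One square root S of P^{-1}: the lower-triangular Cholesky factor, S @ S.T == inv(P)."""
    return np.linalg.cholesky(np.linalg.inv(P))


def sample_parameters(theta_map, P, n_samples, rng):
    S = posterior_scale(P)
    z = rng.standard_normal((n_samples, theta_map.shape[0]))
    return theta_map + z @ S.T  # each row ~ N(theta_map, P^{-1})
```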
fit
#
fit(train_loader: DataLoader, override: bool = True, progress_bar: bool = False) -> None
Fit the local Laplace approximation at the parameters of the model.
Parameters:
-
train_loader
#DataLoader
) –each iterate is a training batch, either
(X, y)
tensors or a dict-like object containing keys as expressed by self.dict_key_x
and self.dict_key_y
. train_loader.dataset
needs to be set to access \(N\), the size of the data set. -
override
#bool
, default:True
) –whether to initialize H, loss, and n_data again; setting to False is useful for online learning settings to accumulate a sequential posterior approximation.
-
progress_bar
#bool
, default:False
) –whether to show a progress bar while fitting.
Source code in laplace/lllaplace.py
log_marginal_likelihood
#
log_marginal_likelihood(prior_precision: Tensor | None = None, sigma_noise: Tensor | None = None) -> Tensor
Compute the Laplace approximation to the log marginal likelihood subject
to specific Hessian approximations that subclasses implement.
Requires that the Laplace approximation has been fit before.
The resulting torch.Tensor is differentiable in prior_precision
and
sigma_noise
if these have gradients enabled.
By passing prior_precision
or sigma_noise
, the current value is
overwritten. This is useful for iterating on the log marginal likelihood.
Parameters:
-
prior_precision
#Tensor
, default:None
) –prior precision, if it should be changed from the current
prior_precision
value -
sigma_noise
#Tensor
, default:None
) –observation noise standard deviation, if it should be changed
Returns:
-
log_marglik
(Tensor
) –
Source code in laplace/baselaplace.py
__call__
#
__call__(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None, fitting: bool = False, **model_kwargs: dict[str, Any]) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
.
Parameters:
-
x
#Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here. When Laplace is done only on a subset of parameters (i.e. some gradients are disabled), only the
nn
predictive is supported. -
link_approx
#('mc', 'probit', 'bridge', 'bridge_norm')
, default:'probit'
) –how to approximate the classification link function for the
'glm'
predictive. For pred_type='nn'
, only 'mc' is possible. -
joint
#bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set to True
, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False
, only the marginal predictive distribution is output. Only available for regression and the GLM predictive. -
n_samples
#int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
#bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
when joint=False
in regression. In the case of last-layer Laplace with a diagonal or Kron Hessian, setting this to True
makes computation much faster for a large number of outputs. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used).
-
fitting
#bool
, default:False
) –whether or not this predictive call is done during fitting. Only useful for reward modeling: the likelihood is set to
"regression"
when False
and "classification"
when True
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression'
and joint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_forward_call
#
_glm_forward_call(x: Tensor | MutableMapping, likelihood: Likelihood | str, joint: bool = False, link_approx: LinkApprox | str = PROBIT, n_samples: int = 100, diagonal_output: bool = False) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
for "glm" pred type.
Parameters:
-
x
#Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
likelihood
#Likelihood or str in {'classification', 'regression', 'reward_modeling'}
) –determines the log likelihood Hessian approximation.
-
link_approx
#('mc', 'probit', 'bridge', 'bridge_norm')
, default:'probit'
) –how to approximate the classification link function for the
'glm'
predictive. For pred_type='nn'
, only 'mc' is possible. -
joint
#bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set to True
, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False
, only the marginal predictive distribution is output. Only available for regression and the GLM predictive. -
n_samples
#int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
#bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
and link_approx='mc'
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). For likelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression'
and joint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_functional_samples
#
_glm_functional_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the function-space posterior using the "glm" prediction type, given the GLM predictive mean and covariances computed on input data x.
Parameters:
-
f_mu
#Tensor
) –GLM predictive mean
(batch_size, output_shape)
-
f_var
#Tensor
) –GLM predictive covariances
(batch_size, output_shape, output_shape)
-
n_samples
#int
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs.
-
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
_glm_predictive_samples
#
_glm_predictive_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive using the "glm" prediction type, given the GLM predictive mean and covariances computed on input data x. I.e., the inverse-link function corresponding to the likelihood is applied on top of the functional sample.
Parameters:
-
f_mu
#Tensor
) –GLM predictive mean
(batch_size, output_shape)
-
f_var
#Tensor
) –GLM predictive covariances
(batch_size, output_shape, output_shape)
-
n_samples
#int
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs.
-
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
log_prob
#
log_prob(value: Tensor, normalized: bool = True) -> Tensor
Compute the log probability under the (current) Laplace approximation.
Parameters:
-
value
#Tensor
) –point in parameter space at which to evaluate the log density. -
normalized
#bool
, default:True
) –whether to return log of a properly normalized Gaussian or just the terms that depend on
value
.
Returns:
-
log_prob
(Tensor
) –
Source code in laplace/baselaplace.py
functional_samples
#
functional_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the function-space posterior on input data x
.
Can be used, for example, for Thompson sampling or to compute an arbitrary
expectation.
Parameters:
-
x
#Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
#int
, default:100
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
predictive_samples
#
predictive_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
. I.e., the respective
inverse-link function (e.g. softmax) is applied on top of the functional
sample.
Parameters:
-
x
#Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
#('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
#int
, default:100
) –number of samples
-
diagonal_output
#bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
#Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
functional_variance_fast
#
functional_variance_fast(X)
Should be overridden if there exists a trick to make this fast.
Parameters:
-
X
#
Returns:
-
f_var_diag
(torch.Tensor of shape (batch_size, num_outputs)
) –Corresponding to the diagonal of the covariance matrix of the outputs
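One possible shortcut for a last-layer model: with \(f = W\phi(x)\), \(W \in \mathbb{R}^{C \times F}\) and a dense covariance \(\Sigma\) over the row-major vec(W), the variance of output \(c\) is \(\phi^T \Sigma_{cc} \phi\), where \(\Sigma_{cc}\) is the \(F \times F\) block belonging to row \(c\) of \(W\). This avoids forming the full Jacobians. The NumPy sketch (functional_variance_diag is a hypothetical name, bias omitted) is an illustration of such a trick, not the override used by the library.

```python
import numpy as np


def functional_variance_diag(phi, Sigma, n_outputs):
    """Diagonal of J Sigma J^T for f = W @ phi, W of shape (C, F), vec(W) row-major.

    phi: (B, F) last-layer features; Sigma: (C*F, C*F) posterior covariance.
    Returns (B, C): the variance of each output for each input.
    """
    C, F = n_outputs, phi.shape[1]
    Sigma4 = Sigma.reshape(C, F, C, F)
    idx = np.arange(C)
    blocks = Sigma4[idx, :, idx, :]  # (C, F, F): covariance block of row c of W
    # Var[f_c(x_b)] = phi_b^T Sigma_cc phi_b
    return np.einsum("bf,cfg,bg->bc", phi, blocks, phi)
```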