laplace.subnetlaplace
SubnetLaplace
SubnetLaplace(model: Module, likelihood: Likelihood | str, subnetwork_indices: LongTensor, sigma_noise: float | Tensor = 1.0, prior_precision: float | Tensor = 1.0, prior_mean: float | Tensor = 0.0, temperature: float = 1.0, backend: Type[CurvatureInterface] | None = None, backend_kwargs: dict | None = None, asdl_fisher_kwargs: dict | None = None)
Bases: ParametricLaplace
Class for subnetwork Laplace, which computes the Laplace approximation over just a subset of the model parameters (i.e. a subnetwork within the neural network), as proposed in [1]. Subnetwork Laplace can only be used with either a full or a diagonal Hessian approximation.
A Laplace approximation is represented by a MAP estimate, which is given by the model parameter, and a posterior precision or covariance specifying a Gaussian distribution \(\mathcal{N}(\theta_{MAP}, P^{-1})\). Here, only a subset of the model parameters (i.e. a subnetwork of the neural network) is treated probabilistically. The goal of this class is to compute the posterior precision \(P\), which sums as
\[ P = \sum_{n=1}^N \nabla^2_\theta \log p(\mathcal{D}_n \mid \theta) \big\vert_{\theta_{MAP}} + \nabla^2_\theta \log p(\theta) \big\vert_{\theta_{MAP}}. \]
The prior is assumed to be Gaussian and therefore we have a simple form for \(\nabla^2_\theta \log p(\theta) \vert_{\theta_{MAP}} = P_0 \). In particular, we assume a scalar or diagonal prior precision so that in all cases \(P_0 = \textrm{diag}(p_0)\) and the structure of \(p_0\) can be varied.
The subnetwork Laplace approximation supports either a full, i.e., dense, or a diagonal log likelihood Hessian approximation and hence posterior precision (see FullSubnetLaplace and DiagSubnetLaplace below). Based on the chosen backend parameter, the approximation can be, for example, a generalized Gauss-Newton matrix. In the full case, we have \(P \in \mathbb{R}^{P \times P}\) mathematically.
See FullLaplace
and BaseLaplace
for the full interface.
References
[1] Daxberger, E., Nalisnick, E., Allingham, JU., Antorán, J., Hernández-Lobato, JM. Bayesian Deep Learning via Subnetwork Inference. ICML 2021.
Parameters:
- model (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`)
- likelihood (('classification', 'regression'), default: 'classification') – determines the log likelihood Hessian approximation
- subnetwork_indices (LongTensor) – indices of the vectorized model parameters (i.e. torch.nn.utils.parameters_to_vector(model.parameters())) that define the subnetwork to apply the Laplace approximation over
- sigma_noise (Tensor or float, default: 1) – observation noise for the regression setting; must be 1 for classification
- prior_precision (Tensor or float, default: 1) – prior precision of a Gaussian prior (= weight decay); can be scalar, per-layer, or diagonal in the most general case
- prior_mean (Tensor or float, default: 0) – prior mean of a Gaussian prior, useful for continual learning
- temperature (float, default: 1) – temperature of the likelihood; lower temperature leads to a more concentrated posterior and vice versa
- backend (subclasses of `laplace.curvature.{GGNInterface,EFInterface}`, default: None) – backend for access to curvature/Hessian approximations
- backend_kwargs (dict, default: None) – arguments passed to the backend on initialization, for example to set the number of MC samples for stochastic approximations
Source code in laplace/subnetlaplace.py
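A minimal usage sketch (not taken from the library docs): it builds a toy regression model, selects the largest-magnitude weights as the subnetwork, and fits a FullSubnetLaplace over them. The toy model, data, and the choice of 50 subnetwork parameters are illustrative assumptions; FullSubnetLaplace is assumed to be importable from the top-level laplace package (otherwise import it from laplace.subnetlaplace).

```python
import torch
from torch import nn
from torch.nn.utils import parameters_to_vector
from torch.utils.data import DataLoader, TensorDataset

from laplace import FullSubnetLaplace  # assumed top-level export

# Toy regression model and data (illustration only).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 20), nn.Tanh(), nn.Linear(20, 1))
X, y = torch.randn(64, 2), torch.randn(64, 1)
train_loader = DataLoader(TensorDataset(X, y), batch_size=16)

# Treat the 50 largest-magnitude weights as the subnetwork; the indices refer to
# the vectorized parameters, i.e. parameters_to_vector(model.parameters()).
param_vector = parameters_to_vector(model.parameters()).detach()
subnetwork_indices = torch.topk(param_vector.abs(), k=50).indices.sort().values

la = FullSubnetLaplace(model, "regression", subnetwork_indices=subnetwork_indices)
la.fit(train_loader)
f_mu, f_var = la(X[:5])  # GLM posterior predictive mean and variance
```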
log_likelihood
Compute log likelihood on the training data after .fit()
has been called.
The log likelihood is computed on-demand based on the loss and, for example,
the observation noise which makes it differentiable in the latter for
iterative updates.
Returns:
-
log_likelihood
(Tensor
) –
log_det_prior_precision
Compute log determinant of the prior precision \(\log \det P_0\)
Returns:
-
log_det
(Tensor
) –
log_det_posterior_precision
Compute log determinant of the posterior precision \(\log \det P\) which depends on the subclasses structure used for the Hessian approximation.
Returns:
-
log_det
(Tensor
) –
log_det_ratio
Compute the log determinant ratio, a part of the log marginal likelihood.
Returns:
-
log_det_ratio
(Tensor
) –
posterior_precision
Compute or return the posterior precision \(P\).
Returns:
-
posterior_prec
(Tensor
) –
prior_precision_diag
Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Returns:
-
prior_precision_diag
(Tensor
) –
fit
Fit the local Laplace approximation at the parameters of the model.
Parameters:
- train_loader (DataLoader) – each iterate is a training batch, either (X, y) tensors or a dict-like object containing keys as expressed by self.dict_key_x and self.dict_key_y. train_loader.dataset needs to be set to access \(N\), the size of the data set.
- override (bool, default: True) – whether to initialize H, loss, and n_data again; setting to False is useful for online learning settings to accumulate a sequential posterior approximation.
- progress_bar (bool, default: False) – whether to show a progress bar; updated at every batch-Hessian computation. Useful for very large models and large amounts of data, esp. when subset_of_weights='all'.
Source code in laplace/baselaplace.py
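Continuing the construction sketch at the top of the page (reusing its imports and `la`), a hedged example of the `override` flag for sequential fitting; the second loader is an assumption for illustration:

```python
# Accumulate curvature over a second stream of data instead of re-initializing
# H, loss, and n_data (useful for online / continual learning settings).
extra_loader = DataLoader(TensorDataset(torch.randn(32, 2), torch.randn(32, 1)),
                          batch_size=16)
la.fit(extra_loader, override=False)
```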
log_marginal_likelihood
log_marginal_likelihood(prior_precision: Tensor | None = None, sigma_noise: Tensor | None = None) -> Tensor
Compute the Laplace approximation to the log marginal likelihood subject
to specific Hessian approximations that subclasses implement.
Requires that the Laplace approximation has been fit before.
The resulting torch.Tensor is differentiable in prior_precision
and
sigma_noise
if these have gradients enabled.
By passing prior_precision
or sigma_noise
, the current value is
overwritten. This is useful for iterating on the log marginal likelihood.
Parameters:
- prior_precision (Tensor, default: None) – prior precision, if it should be changed from the current prior_precision value
- sigma_noise (Tensor, default: None) – observation noise standard deviation, if it should be changed
Returns:
-
log_marglik
(Tensor
) –
Source code in laplace/baselaplace.py
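Because the returned tensor is differentiable in prior_precision and sigma_noise, both can be tuned by gradient descent on the negative log marginal likelihood. A sketch, continuing the running example (the learning rate and number of steps are arbitrary assumptions):

```python
log_prior = torch.zeros(1, requires_grad=True)   # log prior precision
log_sigma = torch.zeros(1, requires_grad=True)   # log observation noise
hyper_optimizer = torch.optim.Adam([log_prior, log_sigma], lr=1e-1)

for _ in range(100):
    hyper_optimizer.zero_grad()
    neg_marglik = -la.log_marginal_likelihood(log_prior.exp(), log_sigma.exp())
    neg_marglik.backward()
    hyper_optimizer.step()
```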
__call__
__call__(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = PredType.GLM, joint: bool = False, link_approx: LinkApprox | str = LinkApprox.PROBIT, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None, fitting: bool = False, **model_kwargs: dict[str, Any]) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
.
Parameters:
- x (Tensor or MutableMapping) – (batch_size, input_shape) if tensor. If MutableMapping, must contain the said tensor.
- pred_type (('glm', 'nn'), default: 'glm') – type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here. When Laplace is done only on a subset of parameters (i.e. some grads are disabled), only the nn predictive is supported.
- link_approx (('mc', 'probit', 'bridge', 'bridge_norm'), default: 'probit') – how to approximate the classification link function for the 'glm' predictive. For pred_type='nn', only 'mc' is possible.
- joint (bool, default: False) – whether to output a joint predictive distribution in regression with pred_type='glm'. If set to True, the predictive distribution has the same form as a GP posterior, i.e. N([f(x1), ..., f(xm)], Cov[f(x1), ..., f(xm)]). If False, only the marginal predictive distribution is output. Only available for regression and the GLM predictive.
- n_samples (int, default: 100) – number of samples for link_approx='mc'.
- diagonal_output (bool, default: False) – whether to use a diagonalized posterior predictive on the outputs. Only works for pred_type='glm' when joint=False in regression. In the case of last-layer Laplace with a diagonal or Kron Hessian, setting this to True makes computation much(!) faster for a large number of outputs.
- generator (Generator, default: None) – random number generator to control the samples (if sampling is used).
- fitting (bool, default: False) – whether or not this predictive call is done during fitting. Only useful for reward modeling: the likelihood is set to "regression" when False and "classification" when True.
Returns:
- predictive (Tensor or tuple[Tensor]) – For likelihood='classification', a torch.Tensor is returned with a distribution over classes (similar to a softmax). For likelihood='regression', a tuple of torch.Tensor is returned with the mean and the predictive variance. For likelihood='regression' and joint=True, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
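A short sketch contrasting the marginal and joint GLM predictive in regression, continuing the running example (the batch of five inputs is arbitrary):

```python
# Marginal predictive: per-input mean and variance.
f_mu, f_var = la(X[:5], pred_type="glm")

# Joint predictive: mean and a covariance Cov[f(x1), ..., f(x5)] across the batch,
# i.e. the GP-posterior-like form described above (regression + GLM only).
f_mu_joint, f_cov = la(X[:5], pred_type="glm", joint=True)
```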
_glm_forward_call
_glm_forward_call(x: Tensor | MutableMapping, likelihood: Likelihood | str, joint: bool = False, link_approx: LinkApprox | str = LinkApprox.PROBIT, n_samples: int = 100, diagonal_output: bool = False) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
for "glm" pred type.
Parameters:
-
x
(Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
likelihood
(Likelihood or str in {'classification', 'regression', 'reward_modeling'}
) –determines the log likelihood Hessian approximation.
-
link_approx
(('mc', 'probit', 'bridge', 'bridge_norm')
, default:'mc'
) –how to approximate the classification link function for the
'glm'
. Forpred_type='nn'
, only 'mc' is possible. -
joint
(bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set toTrue
, the predictive distribution has the same form as GP posterior, i.e. N([f(x1), ...,f(xm)], Cov[f(x1), ..., f(xm)]). IfFalse
, then only outputs the marginal predictive distribution. Only available for regression and GLM predictive. -
n_samples
(int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
(bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
andlink_approx='mc'
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). Forlikelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. Forlikelihood='regression'
andjoint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_predictive_samples
_glm_predictive_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
using "glm" prediction
type.
Parameters:
-
f_mu
(Tensor or MutableMapping
) –glm predictive mean
(batch_size, output_shape)
-
f_var
(Tensor or MutableMapping
) –glm predictive covariances
(batch_size, output_shape, output_shape)
-
n_samples
(int
) –number of samples
-
diagonal_output
(bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs.
-
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
square_norm
Compute the square norm under the posterior precision with value - self.mean as \(\Delta\): \(\Delta^\top P \Delta\).
Returns:
-
square_form
–
log_prob
log_prob(value: Tensor, normalized: bool = True) -> Tensor
Compute the log probability under the (current) Laplace approximation.
Parameters:
-
value
(Tensor
) – -
normalized
(bool
, default:True
) –whether to return log of a properly normalized Gaussian or just the terms that depend on
value
.
Returns:
-
log_prob
(Tensor
) –
Source code in laplace/baselaplace.py
predictive_samples
predictive_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = PredType.GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
.
Can be used, for example, for Thompson sampling.
Parameters:
-
x
(Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
(('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
(int
, default:100
) –number of samples
-
diagonal_output
(bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
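For example, a Thompson-sampling-style draw from the posterior predictive, continuing the running sketch (the candidate inputs are arbitrary assumptions):

```python
candidates = torch.randn(8, 2)  # hypothetical candidate inputs
fs = la.predictive_samples(candidates, pred_type="glm", n_samples=10)
# fs has shape (n_samples, batch_size, output_shape); pick the best candidate
# according to a single posterior draw, as in Thompson sampling.
best = fs[0].squeeze(-1).argmax()
```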
functional_variance
Compute the functional variance for the 'glm' predictive: f_var[i] = Js[i] @ P.inv() @ Js[i].T, which is an output x output predictive covariance matrix.
Mathematically, we have for a single Jacobian
\(\mathcal{J} = \nabla_\theta f(x;\theta)\vert_{\theta_{MAP}}\)
the output covariance matrix
\( \mathcal{J} P^{-1} \mathcal{J}^T \).
Parameters:
-
Js
(Tensor
) –Jacobians of model output wrt parameters
(batch, outputs, parameters)
Returns:
-
f_var
(Tensor
) –output covariance
(batch, outputs, outputs)
Source code in laplace/baselaplace.py
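A shape-level sketch of the quantity computed here, using random stand-ins for the Jacobians and the posterior covariance \(P^{-1}\) (all names and sizes are illustrative assumptions):

```python
import torch

batch, outputs, params = 4, 3, 10
Js = torch.randn(batch, outputs, params)  # Jacobians (batch, outputs, parameters)
posterior_cov = torch.eye(params)         # stands in for P^{-1}

# f_var[i] = Js[i] @ P^{-1} @ Js[i].T, batched over i via einsum.
f_var = torch.einsum("nop,pq,nkq->nok", Js, posterior_cov, Js)
print(f_var.shape)  # torch.Size([4, 3, 3]): an (outputs x outputs) covariance per input
```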
functional_covariance
Compute the functional covariance for the 'glm' predictive: f_cov = Js @ P.inv() @ Js.T, which is a (batch x outputs) x (batch x outputs) predictive covariance matrix.
This emulates the GP posterior covariance N([f(x1), ...,f(xm)], Cov[f(x1), ..., f(xm)]). Useful for joint predictions, such as in batched Bayesian optimization.
Parameters:
-
Js
(Tensor
) –Jacobians of model output wrt parameters
(batch*outputs, parameters)
Returns:
-
f_cov
(Tensor
) –output covariance
(batch*outputs, batch*outputs)
Source code in laplace/baselaplace.py
sample
sample(n_samples: int = 100, generator: Generator | None = None) -> Tensor
Sample from the Laplace posterior approximation, i.e., \( \theta \sim \mathcal{N}(\theta_{MAP}, P^{-1})\).
Parameters:
-
n_samples
(int
, default:100
) –number of samples
-
generator
(Generator
, default:None
) –random number generator to control the samples
Returns:
-
samples
(Tensor
) –
Source code in laplace/baselaplace.py
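Continuing the running sketch, drawing parameter samples from the subnetwork Laplace posterior:

```python
theta_samples = la.sample(n_samples=10)  # theta ~ N(theta_MAP, P^{-1})
print(theta_samples.shape)
```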
_check_subnetwork_indices
Check that subnetwork indices are valid indices of the vectorized model parameters
(i.e. torch.nn.utils.parameters_to_vector(model.parameters())
).
Source code in laplace/subnetlaplace.py
DiagSubnetLaplace
DiagSubnetLaplace(model: Module, likelihood: Likelihood | str, subnetwork_indices: LongTensor, sigma_noise: float | Tensor = 1.0, prior_precision: float | Tensor = 1.0, prior_mean: float | Tensor = 0.0, temperature: float = 1.0, backend: Type[CurvatureInterface] | None = None, backend_kwargs: dict | None = None, asdl_fisher_kwargs: dict | None = None)
Bases: SubnetLaplace
, DiagLaplace
Subnetwork Laplace approximation with diagonal log likelihood Hessian approximation
and hence posterior precision.
Mathematically, we have \(P \approx \textrm{diag}(P)\).
See DiagLaplace
, SubnetLaplace
, and BaseLaplace
for the full interface.
Source code in laplace/subnetlaplace.py
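A minimal sketch of the diagonal variant, reusing the model, data, and subnetwork indices from the example at the top of the page (DiagSubnetLaplace is assumed to be importable from the top-level laplace package, otherwise from laplace.subnetlaplace):

```python
from laplace import DiagSubnetLaplace  # assumed top-level export

la_diag = DiagSubnetLaplace(model, "regression", subnetwork_indices=subnetwork_indices)
la_diag.fit(train_loader)
print(la_diag.posterior_precision.shape)  # diagonal precision, shape (parameters)
```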
log_likelihood
Compute log likelihood on the training data after .fit()
has been called.
The log likelihood is computed on-demand based on the loss and, for example,
the observation noise which makes it differentiable in the latter for
iterative updates.
Returns:
-
log_likelihood
(Tensor
) –
prior_precision_diag
Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Returns:
-
prior_precision_diag
(Tensor
) –
log_det_prior_precision
Compute log determinant of the prior precision \(\log \det P_0\)
Returns:
-
log_det
(Tensor
) –
log_det_ratio
Compute the log determinant ratio, a part of the log marginal likelihood.
Returns:
-
log_det_ratio
(Tensor
) –
posterior_precision
Diagonal posterior precision \(p\).
Returns:
-
precision
(tensor
) –(parameters)
posterior_scale
Diagonal posterior scale \(\sqrt{p^{-1}}\).
Returns:
-
scale
(tensor
) –(parameters)
posterior_variance
Diagonal posterior variance \(p^{-1}\).
Returns:
-
variance
(tensor
) –(parameters)
fit
Fit the local Laplace approximation at the parameters of the model.
Parameters:
-
train_loader
(DataLoader
) –each iterate is a training batch, either
(X, y)
tensors or a dict-like object containing keys as expressed byself.dict_key_x
andself.dict_key_y
.train_loader.dataset
needs to be set to access \(N\), size of the data set. -
override
(bool
, default:True
) –whether to initialize H, loss, and n_data again; setting to False is useful for online learning settings to accumulate a sequential posterior approximation.
-
progress_bar
(bool
, default:False
) –whether to show a progress bar; updated at every batch-Hessian computation. Useful for very large model and large amount of data, esp. when
subset_of_weights='all'
.
Source code in laplace/baselaplace.py
log_marginal_likelihood
log_marginal_likelihood(prior_precision: Tensor | None = None, sigma_noise: Tensor | None = None) -> Tensor
Compute the Laplace approximation to the log marginal likelihood subject
to specific Hessian approximations that subclasses implement.
Requires that the Laplace approximation has been fit before.
The resulting torch.Tensor is differentiable in prior_precision
and
sigma_noise
if these have gradients enabled.
By passing prior_precision
or sigma_noise
, the current value is
overwritten. This is useful for iterating on the log marginal likelihood.
Parameters:
-
prior_precision
(Tensor
, default:None
) –prior precision if should be changed from current
prior_precision
value -
sigma_noise
(Tensor
, default:None
) –observation noise standard deviation if should be changed
Returns:
-
log_marglik
(Tensor
) –
Source code in laplace/baselaplace.py
__call__
__call__(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = PredType.GLM, joint: bool = False, link_approx: LinkApprox | str = LinkApprox.PROBIT, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None, fitting: bool = False, **model_kwargs: dict[str, Any]) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
.
Parameters:
-
x
(Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
pred_type
(('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here. When Laplace is done only on subset of parameters (i.e. some grad are disabled), only
nn
predictive is supported. -
link_approx
(('mc', 'probit', 'bridge', 'bridge_norm')
, default:'mc'
) –how to approximate the classification link function for the
'glm'
. Forpred_type='nn'
, only 'mc' is possible. -
joint
(bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set toTrue
, the predictive distribution has the same form as GP posterior, i.e. N([f(x1), ...,f(xm)], Cov[f(x1), ..., f(xm)]). IfFalse
, then only outputs the marginal predictive distribution. Only available for regression and GLM predictive. -
n_samples
(int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
(bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
whenjoint=False
in regression. In the case of last-layer Laplace with a diagonal or Kron Hessian, setting this toTrue
makes computation much(!) faster for large number of outputs. -
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used).
-
fitting
(bool
, default:False
) –whether or not this predictive call is done during fitting. Only useful for reward modeling: the likelihood is set to
"regression"
whenFalse
and"classification"
whenTrue
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). Forlikelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. Forlikelihood='regression'
andjoint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_forward_call
_glm_forward_call(x: Tensor | MutableMapping, likelihood: Likelihood | str, joint: bool = False, link_approx: LinkApprox | str = LinkApprox.PROBIT, n_samples: int = 100, diagonal_output: bool = False) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
for "glm" pred type.
Parameters:
-
x
(Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
likelihood
(Likelihood or str in {'classification', 'regression', 'reward_modeling'}
) –determines the log likelihood Hessian approximation.
-
link_approx
(('mc', 'probit', 'bridge', 'bridge_norm')
, default:'mc'
) –how to approximate the classification link function for the
'glm'
. Forpred_type='nn'
, only 'mc' is possible. -
joint
(bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set toTrue
, the predictive distribution has the same form as GP posterior, i.e. N([f(x1), ...,f(xm)], Cov[f(x1), ..., f(xm)]). IfFalse
, then only outputs the marginal predictive distribution. Only available for regression and GLM predictive. -
n_samples
(int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
(bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
andlink_approx='mc'
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). Forlikelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. Forlikelihood='regression'
andjoint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_predictive_samples
_glm_predictive_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
using "glm" prediction
type.
Parameters:
-
f_mu
(Tensor or MutableMapping
) –glm predictive mean
(batch_size, output_shape)
-
f_var
(Tensor or MutableMapping
) –glm predictive covariances
(batch_size, output_shape, output_shape)
-
n_samples
(int
) –number of samples
-
diagonal_output
(bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs.
-
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
log_prob
log_prob(value: Tensor, normalized: bool = True) -> Tensor
Compute the log probability under the (current) Laplace approximation.
Parameters:
-
value
(Tensor
) – -
normalized
(bool
, default:True
) –whether to return log of a properly normalized Gaussian or just the terms that depend on
value
.
Returns:
-
log_prob
(Tensor
) –
Source code in laplace/baselaplace.py
predictive_samples
predictive_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = PredType.GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
.
Can be used, for example, for Thompson sampling.
Parameters:
-
x
(Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
(('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
(int
, default:100
) –number of samples
-
diagonal_output
(bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
_check_subnetwork_indices
Check that subnetwork indices are valid indices of the vectorized model parameters
(i.e. torch.nn.utils.parameters_to_vector(model.parameters())
).
Source code in laplace/subnetlaplace.py
FullSubnetLaplace
FullSubnetLaplace(model: Module, likelihood: Likelihood | str, subnetwork_indices: LongTensor, sigma_noise: float | Tensor = 1.0, prior_precision: float | Tensor = 1.0, prior_mean: float | Tensor = 0.0, temperature: float = 1.0, backend: Type[CurvatureInterface] | None = None, backend_kwargs: dict | None = None, asdl_fisher_kwargs: dict | None = None)
Bases: SubnetLaplace
, FullLaplace
Subnetwork Laplace approximation with full, i.e., dense, log likelihood Hessian
approximation and hence posterior precision. Based on the chosen backend
parameter,
the full approximation can be, for example, a generalized Gauss-Newton matrix.
Mathematically, we have \(P \in \mathbb{R}^{P \times P}\).
See FullLaplace
, SubnetLaplace
, and BaseLaplace
for the full interface.
Source code in laplace/subnetlaplace.py
log_likelihood
Compute log likelihood on the training data after .fit()
has been called.
The log likelihood is computed on-demand based on the loss and, for example,
the observation noise which makes it differentiable in the latter for
iterative updates.
Returns:
-
log_likelihood
(Tensor
) –
prior_precision_diag
Obtain the diagonal prior precision \(p_0\) constructed from either a scalar or diagonal prior precision.
Returns:
-
prior_precision_diag
(Tensor
) –
log_det_prior_precision
Compute log determinant of the prior precision \(\log \det P_0\)
Returns:
-
log_det
(Tensor
) –
log_det_ratio
Compute the log determinant ratio, a part of the log marginal likelihood.
Returns:
-
log_det_ratio
(Tensor
) –
posterior_precision
Posterior precision \(P\).
Returns:
-
precision
(tensor
) –(parameters, parameters)
posterior_scale
Posterior scale (square root of the covariance), i.e., \(P^{-\frac{1}{2}}\).
Returns:
-
scale
(tensor
) –(parameters, parameters)
posterior_covariance
Posterior covariance, i.e., \(P^{-1}\).
Returns:
-
covariance
(tensor
) –(parameters, parameters)
log_marginal_likelihood
log_marginal_likelihood(prior_precision: Tensor | None = None, sigma_noise: Tensor | None = None) -> Tensor
Compute the Laplace approximation to the log marginal likelihood subject
to specific Hessian approximations that subclasses implement.
Requires that the Laplace approximation has been fit before.
The resulting torch.Tensor is differentiable in prior_precision
and
sigma_noise
if these have gradients enabled.
By passing prior_precision
or sigma_noise
, the current value is
overwritten. This is useful for iterating on the log marginal likelihood.
Parameters:
-
prior_precision
(Tensor
, default:None
) –prior precision if should be changed from current
prior_precision
value -
sigma_noise
(Tensor
, default:None
) –observation noise standard deviation if should be changed
Returns:
-
log_marglik
(Tensor
) –
Source code in laplace/baselaplace.py
__call__
__call__(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = PredType.GLM, joint: bool = False, link_approx: LinkApprox | str = LinkApprox.PROBIT, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None, fitting: bool = False, **model_kwargs: dict[str, Any]) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
.
Parameters:
-
x
(Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
pred_type
(('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here. When Laplace is done only on subset of parameters (i.e. some grad are disabled), only
nn
predictive is supported. -
link_approx
(('mc', 'probit', 'bridge', 'bridge_norm')
, default:'mc'
) –how to approximate the classification link function for the
'glm'
. Forpred_type='nn'
, only 'mc' is possible. -
joint
(bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set toTrue
, the predictive distribution has the same form as GP posterior, i.e. N([f(x1), ...,f(xm)], Cov[f(x1), ..., f(xm)]). IfFalse
, then only outputs the marginal predictive distribution. Only available for regression and GLM predictive. -
n_samples
(int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
(bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
whenjoint=False
in regression. In the case of last-layer Laplace with a diagonal or Kron Hessian, setting this toTrue
makes computation much(!) faster for large number of outputs. -
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used).
-
fitting
(bool
, default:False
) –whether or not this predictive call is done during fitting. Only useful for reward modeling: the likelihood is set to
"regression"
whenFalse
and"classification"
whenTrue
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). Forlikelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. Forlikelihood='regression'
andjoint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_forward_call
_glm_forward_call(x: Tensor | MutableMapping, likelihood: Likelihood | str, joint: bool = False, link_approx: LinkApprox | str = LinkApprox.PROBIT, n_samples: int = 100, diagonal_output: bool = False) -> Tensor | tuple[Tensor, Tensor]
Compute the posterior predictive on input data x
for "glm" pred type.
Parameters:
-
x
(Tensor or MutableMapping
) –(batch_size, input_shape)
if tensor. If MutableMapping, must contain the said tensor. -
likelihood
(Likelihood or str in {'classification', 'regression', 'reward_modeling'}
) –determines the log likelihood Hessian approximation.
-
link_approx
(('mc', 'probit', 'bridge', 'bridge_norm')
, default:'mc'
) –how to approximate the classification link function for the
'glm'
. Forpred_type='nn'
, only 'mc' is possible. -
joint
(bool
, default:False
) –Whether to output a joint predictive distribution in regression with
pred_type='glm'
. If set toTrue
, the predictive distribution has the same form as GP posterior, i.e. N([f(x1), ...,f(xm)], Cov[f(x1), ..., f(xm)]). IfFalse
, then only outputs the marginal predictive distribution. Only available for regression and GLM predictive. -
n_samples
(int
, default:100
) –number of samples for
link_approx='mc'
. -
diagonal_output
(bool
, default:False
) –whether to use a diagonalized posterior predictive on the outputs. Only works for
pred_type='glm'
andlink_approx='mc'
.
Returns:
-
predictive
(Tensor or tuple[Tensor]
) –For
likelihood='classification'
, a torch.Tensor is returned with a distribution over classes (similar to a Softmax). Forlikelihood='regression'
, a tuple of torch.Tensor is returned with the mean and the predictive variance. Forlikelihood='regression'
andjoint=True
, a tuple of torch.Tensor is returned with the mean and the predictive covariance.
Source code in laplace/baselaplace.py
_glm_predictive_samples
_glm_predictive_samples(f_mu: Tensor, f_var: Tensor, n_samples: int, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
using "glm" prediction
type.
Parameters:
-
f_mu
(Tensor or MutableMapping
) –glm predictive mean
(batch_size, output_shape)
-
f_var
(Tensor or MutableMapping
) –glm predictive covariances
(batch_size, output_shape, output_shape)
-
n_samples
(int
) –number of samples
-
diagonal_output
(bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs.
-
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
log_prob
log_prob(value: Tensor, normalized: bool = True) -> Tensor
Compute the log probability under the (current) Laplace approximation.
Parameters:
-
value
(Tensor
) – -
normalized
(bool
, default:True
) –whether to return log of a properly normalized Gaussian or just the terms that depend on
value
.
Returns:
-
log_prob
(Tensor
) –
Source code in laplace/baselaplace.py
predictive_samples
predictive_samples(x: Tensor | MutableMapping[str, Tensor | Any], pred_type: PredType | str = PredType.GLM, n_samples: int = 100, diagonal_output: bool = False, generator: Generator | None = None) -> Tensor
Sample from the posterior predictive on input data x
.
Can be used, for example, for Thompson sampling.
Parameters:
-
x
(Tensor or MutableMapping
) –input data
(batch_size, input_shape)
-
pred_type
(('glm', 'nn')
, default:'glm'
) –type of posterior predictive, linearized GLM predictive or neural network sampling predictive. The GLM predictive is consistent with the curvature approximations used here.
-
n_samples
(int
, default:100
) –number of samples
-
diagonal_output
(bool
, default:False
) –whether to use a diagonalized glm posterior predictive on the outputs. Only applies when
pred_type='glm'
. -
generator
(Generator
, default:None
) –random number generator to control the samples (if sampling used)
Returns:
-
samples
(Tensor
) –samples
(n_samples, batch_size, output_shape)
Source code in laplace/baselaplace.py
_check_subnetwork_indices
Check that subnetwork indices are valid indices of the vectorized model parameters
(i.e. torch.nn.utils.parameters_to_vector(model.parameters())
).