# laplace.curvature

## CurvatureInterface

`CurvatureInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`
Interface to access curvature for a model and corresponding likelihood. A concrete curvature backend must inherit from this base class and implement the necessary functions `jacobians`, `full`, `kron`, and `diag`. The interface might be extended in the future to account for other curvature structures, for example, a block-diagonal one.
Parameters:

- `model` (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`) – torch model (neural network)
- `likelihood` ('classification' or 'regression', default: 'classification')
- `last_layer` (bool, default: False) – only consider curvature of the last layer
- `subnetwork_indices` (LongTensor, default: None) – indices of the vectorized model parameters that define the subnetwork over which to apply the Laplace approximation
- `dict_key_x` (str, default: 'input_ids') – the dictionary key under which the input tensor `x` is stored. Only has an effect when the model takes a MutableMapping as input. Useful for Huggingface LLM models.
- `dict_key_y` (str, default: 'labels') – the dictionary key under which the target tensor `y` is stored. Only has an effect when the model takes a MutableMapping as input. Useful for Huggingface LLM models.
Attributes:

- `lossfunc` (MSELoss or CrossEntropyLoss)
- `factor` (float) – conversion factor between torch losses and base likelihoods. For example, \(\frac{1}{2}\) to get to \(\mathcal{N}(f, 1)\) from MSELoss.
Source code in laplace/curvature/curvature.py
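For orientation, a minimal usage sketch (not part of the generated reference): `CurvatureInterface` itself is abstract, so curvature is queried through a concrete backend such as `CurvlinopsGGN`, documented below. The toy model and data are placeholders.

```python
import torch
from torch import nn

from laplace.curvature import CurvlinopsGGN  # concrete backend, documented below

# Placeholder model and batch, for illustration only.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(8, 4)          # (batch, input_shape)
y = torch.randint(0, 2, (8,))  # class labels

backend = CurvlinopsGGN(model, likelihood="classification")
loss, H = backend.full(x, y)   # dense curvature approximation, (parameters, parameters)
```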
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\), via `torch.func`.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
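A minimal sketch of calling `jacobians` through a concrete backend; the model and input below are placeholders.

```python
import torch
from torch import nn

from laplace.curvature import CurvlinopsGGN

model = nn.Linear(3, 2)
backend = CurvlinopsGGN(model, likelihood="regression")

Js, f = backend.jacobians(torch.randn(5, 3))
# f holds the model outputs for the batch; Js holds one Jacobian per input,
# i.e. the derivative of each output w.r.t. every model parameter.
```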
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/curvature.py
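A minimal sketch of `gradients`, assuming a regression model (MSELoss); model and data are placeholders.

```python
import torch
from torch import nn

from laplace.curvature import CurvlinopsGGN

model = nn.Linear(3, 1)
backend = CurvlinopsGGN(model, likelihood="regression")

x, y = torch.randn(5, 3), torch.randn(5, 1)
Gs, loss = backend.gradients(x, y)  # per-sample gradients (batch, parameters) and the loss
```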
### full

Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix \(H\) with respect to parameters \(\theta \in \mathbb{R}^P\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – Hessian approximation `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
### kron

`kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]`

Compute a Kronecker-factored curvature approximation (such as KFAC). For each module of the neural network permitting such curvature, the approximation to that module's block of \(H\) takes the form of two Kronecker factors \(Q\) and \(H_e\), i.e., \(H \approx Q \otimes H_e\). \(Q\) is quadratic in the module's input dimension, \(p_\textrm{in} \times p_\textrm{in}\), and \(H_e\) in its output dimension, \(p_\textrm{out} \times p_\textrm{out}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`
- `N` (int) – total number of data points

Returns:

- `loss` (Tensor)
- `H` (`laplace.utils.matrix.Kron`) – Kronecker-factored Hessian approximation.
Source code in laplace/curvature/curvature.py
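A minimal sketch of `kron`; `N` is the total dataset size used to scale the mini-batch estimate, and the value below is a placeholder.

```python
import torch
from torch import nn

from laplace.curvature import CurvlinopsGGN

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
backend = CurvlinopsGGN(model, likelihood="classification")

x = torch.randn(16, 4)
y = torch.randint(0, 2, (16,))
loss, kron_H = backend.kron(x, y, N=1000)  # kron_H is a laplace.utils.matrix.Kron
```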
### diag

Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – vector representing the diagonal of H
Source code in laplace/curvature/curvature.py
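A minimal sketch of `diag`, here through the BackPACK-based GGN backend documented below; model and data are placeholders.

```python
import torch
from torch import nn

from laplace.curvature import BackPackGGN  # documented below

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
backend = BackPackGGN(model, likelihood="classification")

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss, H_diag = backend.diag(x, y)
# H_diag is a vector with one entry per model parameter.
```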
## GGNInterface

`GGNInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False, num_samples: int = 1)`

Bases: CurvatureInterface

Generalized Gauss-Newton or Fisher curvature interface. The GGN is equal to the Fisher information for the available likelihoods. In addition to the CurvatureInterface requirements, subclasses must provide methods for Jacobians.

Parameters:

- `model` (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`) – torch model (neural network)
- `likelihood` ('classification' or 'regression', default: 'classification')
- `last_layer` (bool, default: False) – only consider curvature of the last layer
- `subnetwork_indices` (Tensor, default: None) – indices of the vectorized model parameters that define the subnetwork over which to apply the Laplace approximation
- `dict_key_x` (str, default: 'input_ids') – the dictionary key under which the input tensor `x` is stored. Only has an effect when the model takes a MutableMapping as input. Useful for Huggingface LLM models.
- `dict_key_y` (str, default: 'labels') – the dictionary key under which the target tensor `y` is stored. Only has an effect when the model takes a MutableMapping as input. Useful for Huggingface LLM models.
- `stochastic` (bool, default: False) – Fisher if stochastic, else GGN
- `num_samples` (int, default: 1) – number of samples used to approximate the stochastic Fisher
Source code in laplace/curvature/curvature.py
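A minimal sketch of the `stochastic` switch through the concrete `CurvlinopsGGN` subclass (documented below), assuming `full` follows the Monte Carlo Fisher path described under `_get_mc_functional_fisher`; model and data are placeholders.

```python
import torch
from torch import nn

from laplace.curvature import CurvlinopsGGN

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

exact = CurvlinopsGGN(model, likelihood="classification", stochastic=False)
mc = CurvlinopsGGN(model, likelihood="classification", stochastic=True)

_, H_ggn = exact.full(x, y)   # exact GGN
_, H_fisher = mc.full(x, y)   # Monte Carlo approximated Fisher
```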
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\), via `torch.func`.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/curvature.py
### kron

`kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]`

Compute a Kronecker-factored curvature approximation (such as KFAC). For each module of the neural network permitting such curvature, the approximation to that module's block of \(H\) takes the form of two Kronecker factors \(Q\) and \(H_e\), i.e., \(H \approx Q \otimes H_e\). \(Q\) is quadratic in the module's input dimension, \(p_\textrm{in} \times p_\textrm{in}\), and \(H_e\) in its output dimension, \(p_\textrm{out} \times p_\textrm{out}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`
- `N` (int) – total number of data points

Returns:

- `loss` (Tensor)
- `H` (`laplace.utils.matrix.Kron`) – Kronecker-factored Hessian approximation.
Source code in laplace/curvature/curvature.py
### _get_mc_functional_fisher

Approximate the Fisher's middle matrix (the expected outer product of the functional gradient) by a Monte Carlo integral with `self.num_samples` samples.
Source code in laplace/curvature/curvature.py
### full

`full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]`

Compute the full GGN \(P \times P\) matrix as Hessian approximation \(H_\textrm{ggn}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, this is reduced to \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – GGN `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
## EFInterface

`EFInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: CurvatureInterface

Interface for the empirical Fisher as Hessian approximation. In addition to the CurvatureInterface requirements, subclasses must provide methods for gradients.

Parameters:

- `model` (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`) – torch model (neural network)
- `likelihood` ('classification' or 'regression', default: 'classification')
- `last_layer` (bool, default: False) – only consider curvature of the last layer
- `subnetwork_indices` (Tensor, default: None) – indices of the vectorized model parameters that define the subnetwork over which to apply the Laplace approximation
- `dict_key_x` (str, default: 'input_ids') – the dictionary key under which the input tensor `x` is stored. Only has an effect when the model takes a MutableMapping as input. Useful for Huggingface LLM models.
- `dict_key_y` (str, default: 'labels') – the dictionary key under which the target tensor `y` is stored. Only has an effect when the model takes a MutableMapping as input. Useful for Huggingface LLM models.

Attributes:

- `lossfunc` (MSELoss or CrossEntropyLoss)
- `factor` (float) – conversion factor between torch losses and base likelihoods. For example, \(\frac{1}{2}\) to get to \(\mathcal{N}(f, 1)\) from MSELoss.
Source code in laplace/curvature/curvature.py
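A minimal sketch with the Curvlinops-based EF backend documented below; model and data are placeholders.

```python
import torch
from torch import nn

from laplace.curvature import CurvlinopsEF  # documented below

model = nn.Linear(4, 2)
backend = CurvlinopsEF(model, likelihood="classification")

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss, H_ef = backend.full(x, y)  # empirical Fisher, (parameters, parameters)
```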
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\), via `torch.func`.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/curvature.py
### kron

`kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]`

Compute a Kronecker-factored curvature approximation (such as KFAC). For each module of the neural network permitting such curvature, the approximation to that module's block of \(H\) takes the form of two Kronecker factors \(Q\) and \(H_e\), i.e., \(H \approx Q \otimes H_e\). \(Q\) is quadratic in the module's input dimension, \(p_\textrm{in} \times p_\textrm{in}\), and \(H_e\) in its output dimension, \(p_\textrm{out} \times p_\textrm{out}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`
- `N` (int) – total number of data points

Returns:

- `loss` (Tensor)
- `H` (`laplace.utils.matrix.Kron`) – Kronecker-factored Hessian approximation.
Source code in laplace/curvature/curvature.py
### full

`full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]`

Compute the full EF \(P \times P\) matrix as Hessian approximation \(H_\textrm{ef}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, this is reduced to \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H_ef` (Tensor) – EF `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
## AsdlInterface

`AsdlInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: CurvatureInterface

Interface for the asdfghjkl backend.
Source code in laplace/curvature/asdl.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### full

Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix \(H\) with respect to parameters \(\theta \in \mathbb{R}^P\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – Hessian approximation `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_\theta f(x;\theta)\) at the current parameter \(\theta\) using asdfghjkl's gradient per output dimension.

Parameters:

- `x` (Tensor or MutableMapping, e.g. dict or UserDict) – input data `(batch, input_shape)` on a device compatible with the model, if a torch.Tensor. If a MutableMapping, it must at least contain `self.dict_key_x`. The latter is specific to reward modeling.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/asdl.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\) using asdfghjkl's backend.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `loss` (Tensor)
- `Gs` (Tensor) – gradients `(batch, parameters)`
Source code in laplace/curvature/asdl.py
### _get_batch_size

`_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None`

ASDL assumes by default (`batch_size = None`) that all leading dimensions are the batch size. Here, we specify that only the first dimension is the actual batch size, as is the case for LLMs.
Source code in laplace/curvature/asdl.py
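A minimal sketch of the MutableMapping input path (as for Huggingface models), using the `AsdlGGN` subclass documented below. `TinyDictModel` is a hypothetical stand-in whose forward accepts the dict; whether a given ASDL computation supports such a model is an assumption here.

```python
import torch
from torch import nn

from laplace.curvature import AsdlGGN  # documented below

class TinyDictModel(nn.Module):
    """Hypothetical stand-in for an LLM-style model that takes a dict input."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(100, 8)
        self.head = nn.Linear(8, 2)

    def forward(self, data):
        h = self.emb(data["input_ids"]).mean(dim=1)  # pool over the sequence
        return self.head(h)

backend = AsdlGGN(TinyDictModel(), likelihood="classification",
                  dict_key_x="input_ids", dict_key_y="labels")

data = {"input_ids": torch.randint(0, 100, (4, 10)),
        "labels": torch.randint(0, 2, (4,))}
loss, H = backend.full(data, data["labels"])  # only the first dim is the batch size
```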
## AsdlGGN

`AsdlGGN(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False)`

Bases: AsdlInterface, GGNInterface

Implementation of the GGNInterface using asdfghjkl.
Source code in laplace/curvature/asdl.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_\theta f(x;\theta)\) at the current parameter \(\theta\) using asdfghjkl's gradient per output dimension.

Parameters:

- `x` (Tensor or MutableMapping, e.g. dict or UserDict) – input data `(batch, input_shape)` on a device compatible with the model, if a torch.Tensor. If a MutableMapping, it must at least contain `self.dict_key_x`. The latter is specific to reward modeling.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/asdl.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\) using asdfghjkl's backend.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `loss` (Tensor)
- `Gs` (Tensor) – gradients `(batch, parameters)`
Source code in laplace/curvature/asdl.py
### full

`full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]`

Compute the full GGN \(P \times P\) matrix as Hessian approximation \(H_\textrm{ggn}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, this is reduced to \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – GGN `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
### _get_mc_functional_fisher

Approximate the Fisher's middle matrix (the expected outer product of the functional gradient) by a Monte Carlo integral with `self.num_samples` samples.
Source code in laplace/curvature/curvature.py
### _get_batch_size

`_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None`

ASDL assumes by default (`batch_size = None`) that all leading dimensions are the batch size. Here, we specify that only the first dimension is the actual batch size, as is the case for LLMs.
Source code in laplace/curvature/asdl.py
## AsdlEF

`AsdlEF(model: Module, likelihood: Likelihood | str, last_layer: bool = False, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: AsdlInterface, EFInterface

Implementation of the EFInterface using asdfghjkl.
Source code in laplace/curvature/asdl.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_\theta f(x;\theta)\) at the current parameter \(\theta\) using asdfghjkl's gradient per output dimension.

Parameters:

- `x` (Tensor or MutableMapping, e.g. dict or UserDict) – input data `(batch, input_shape)` on a device compatible with the model, if a torch.Tensor. If a MutableMapping, it must at least contain `self.dict_key_x`. The latter is specific to reward modeling.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/asdl.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\) using asdfghjkl's backend.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `loss` (Tensor)
- `Gs` (Tensor) – gradients `(batch, parameters)`
Source code in laplace/curvature/asdl.py
### full

`full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]`

Compute the full EF \(P \times P\) matrix as Hessian approximation \(H_\textrm{ef}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, this is reduced to \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H_ef` (Tensor) – EF `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
### _get_batch_size

`_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None`

ASDL assumes by default (`batch_size = None`) that all leading dimensions are the batch size. Here, we specify that only the first dimension is the actual batch size, as is the case for LLMs.
Source code in laplace/curvature/asdl.py
## AsdlHessian

`AsdlHessian(model: Module, likelihood: Likelihood | str, last_layer: bool = False, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: AsdlInterface
Source code in laplace/curvature/asdl.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_\theta f(x;\theta)\) at the current parameter \(\theta\) using asdfghjkl's gradient per output dimension.

Parameters:

- `x` (Tensor or MutableMapping, e.g. dict or UserDict) – input data `(batch, input_shape)` on a device compatible with the model, if a torch.Tensor. If a MutableMapping, it must at least contain `self.dict_key_x`. The latter is specific to reward modeling.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/asdl.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\) using asdfghjkl's backend.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `loss` (Tensor)
- `Gs` (Tensor) – gradients `(batch, parameters)`
Source code in laplace/curvature/asdl.py
### _get_batch_size

`_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None`

ASDL assumes by default (`batch_size = None`) that all leading dimensions are the batch size. Here, we specify that only the first dimension is the actual batch size, as is the case for LLMs.
Source code in laplace/curvature/asdl.py
## BackPackInterface

`BackPackInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: CurvatureInterface

Interface for the BackPACK backend.
Source code in laplace/curvature/backpack.py
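A minimal sketch with the BackPACK-based GGN backend (see `BackPackGGN` below); BackPACK works on standard `torch.nn` modules, which the backend extends internally. Model and data are placeholders.

```python
import torch
from torch import nn

from laplace.curvature import BackPackGGN  # documented below

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
backend = BackPackGGN(model, likelihood="classification")

x = torch.randn(16, 4)
y = torch.randint(0, 2, (16,))
loss, kron_H = backend.kron(x, y, N=1000)  # KFAC factors as laplace.utils.matrix.Kron
```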
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### full

Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix \(H\) with respect to parameters \(\theta \in \mathbb{R}^P\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – Hessian approximation `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
### kron

`kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]`

Compute a Kronecker-factored curvature approximation (such as KFAC). For each module of the neural network permitting such curvature, the approximation to that module's block of \(H\) takes the form of two Kronecker factors \(Q\) and \(H_e\), i.e., \(H \approx Q \otimes H_e\). \(Q\) is quadratic in the module's input dimension, \(p_\textrm{in} \times p_\textrm{in}\), and \(H_e\) in its output dimension, \(p_\textrm{out} \times p_\textrm{out}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`
- `N` (int) – total number of data points

Returns:

- `loss` (Tensor)
- `H` (`laplace.utils.matrix.Kron`) – Kronecker-factored Hessian approximation.
Source code in laplace/curvature/curvature.py
### diag

Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – vector representing the diagonal of H
Source code in laplace/curvature/curvature.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\) using BackPACK's BatchGrad per output dimension. Note that BackPACK doesn't play well with `torch.func`, so this method has to be overridden.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/backpack.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\) using BackPACK's BatchGrad. Note that BackPACK doesn't play well with `torch.func`, so this method has to be overridden.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/backpack.py
## BackPackGGN

`BackPackGGN(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False)`

Bases: BackPackInterface, GGNInterface

Implementation of the GGNInterface using BackPACK.
Source code in laplace/curvature/backpack.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\) using BackPACK's BatchGrad per output dimension. Note that BackPACK doesn't play well with `torch.func`, so this method has to be overridden.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/backpack.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\) using BackPACK's BatchGrad. Note that BackPACK doesn't play well with `torch.func`, so this method has to be overridden.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/backpack.py
### full

`full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]`

Compute the full GGN \(P \times P\) matrix as Hessian approximation \(H_\textrm{ggn}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, this is reduced to \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – GGN `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
### _get_mc_functional_fisher

Approximate the Fisher's middle matrix (the expected outer product of the functional gradient) by a Monte Carlo integral with `self.num_samples` samples.
Source code in laplace/curvature/curvature.py
## BackPackEF

`BackPackEF(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: BackPackInterface, EFInterface

Implementation of the EFInterface using BackPACK.
Source code in laplace/curvature/backpack.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\) using BackPACK's BatchGrad per output dimension. Note that BackPACK doesn't play well with `torch.func`, so this method has to be overridden.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/backpack.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\) using BackPACK's BatchGrad. Note that BackPACK doesn't play well with `torch.func`, so this method has to be overridden.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/backpack.py
### full

`full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]`

Compute the full EF \(P \times P\) matrix as Hessian approximation \(H_\textrm{ef}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, this is reduced to \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H_ef` (Tensor) – EF `(parameters, parameters)`
Source code in laplace/curvature/curvature.py
## CurvlinopsInterface

`CurvlinopsInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: CurvatureInterface

Interface for the Curvlinops backend: https://github.com/f-dangel/curvlinops
Source code in laplace/curvature/curvlinops.py
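In practice these backends are most often passed to the top-level `Laplace` factory rather than used directly; a minimal sketch (the model is a placeholder):

```python
from torch import nn

from laplace import Laplace
from laplace.curvature import CurvlinopsGGN

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
la = Laplace(model, likelihood="classification",
             subset_of_weights="all", hessian_structure="kron",
             backend=CurvlinopsGGN)
# la.fit(train_loader) would then accumulate KFAC curvature via curvlinops.
```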
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\), via `torch.func`.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/curvature.py
### diag

Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – vector representing the diagonal of H
Source code in laplace/curvature/curvature.py
## CurvlinopsGGN

`CurvlinopsGGN(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False)`

Bases: CurvlinopsInterface, GGNInterface

Implementation of the GGNInterface using Curvlinops.
Source code in laplace/curvature/curvlinops.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\), via `torch.func`.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/curvature.py
### _get_mc_functional_fisher

Approximate the Fisher's middle matrix (the expected outer product of the functional gradient) by a Monte Carlo integral with `self.num_samples` samples.
Source code in laplace/curvature/curvature.py
## CurvlinopsEF

`CurvlinopsEF(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: CurvlinopsInterface, EFInterface

Implementation of the EFInterface using Curvlinops.
Source code in laplace/curvature/curvlinops.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\), via `torch.func`.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/curvature.py
## CurvlinopsHessian

`CurvlinopsHessian(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')`

Bases: CurvlinopsInterface

Implementation of the full Hessian using Curvlinops.
Source code in laplace/curvature/curvlinops.py
### jacobians

`jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at the current parameter \(\theta\), via `torch.func`.

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `enable_backprop` (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x

Returns:

- `Js` (Tensor) – Jacobians `(batch, parameters, outputs)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### last_layer_jacobians

`last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]`

Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter \(\theta_\textrm{last}\).

Parameters:

- `x` (Tensor)
- `enable_backprop` (bool, default: False)

Returns:

- `Js` (Tensor) – Jacobians `(batch, outputs, last-layer-parameters)`
- `f` (Tensor) – output function `(batch, outputs)`
Source code in laplace/curvature/curvature.py
### gradients

`gradients(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor) -> tuple[Tensor, Tensor]`

Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at the current parameter \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)` on a device compatible with the model.
- `y` (Tensor)

Returns:

- `Gs` (Tensor) – gradients `(batch, parameters)`
- `loss` (Tensor)
Source code in laplace/curvature/curvature.py
### diag

Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).

Parameters:

- `x` (Tensor) – input data `(batch, input_shape)`
- `y` (Tensor) – labels `(batch, label_shape)`

Returns:

- `loss` (Tensor)
- `H` (Tensor) – vector representing the diagonal of H