Module tiresias.core.regression
import numpy as np
import diffprivlib.models as dp

from tiresias.core.mechanisms import approximate_bounds


class LinearRegression(dp.LinearRegression):

    def fit(self, X, y, sample_weight=None):
        # TODO: concat X and y for norm, specify ranges
        if not self.data_norm:
            # Spend half of the privacy budget estimating a clipping
            # bound for the rows; the other half goes to the regression.
            self.epsilon /= 2.0
            row_norms = np.linalg.norm(X, axis=1)
            _, max_norm = approximate_bounds(row_norms, self.epsilon)
            self.data_norm = max_norm
        # Rescale any row whose l2 norm exceeds data_norm so that every
        # row lies strictly inside the protected l2 ball.
        for i in range(X.shape[0]):
            if np.linalg.norm(X[i]) > self.data_norm:
                X[i] = X[i] * (self.data_norm - 1e-5) / np.linalg.norm(X[i])
        return super().fit(X, y, sample_weight=sample_weight)
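When data_norm is not supplied, the override above halves epsilon, spends one half estimating a clipping bound with approximate_bounds, and leaves the other half for the regression. A minimal sketch of that behavior on synthetic data (the array values and the commented outputs are illustrative assumptions, not results from the source):

import numpy as np
from tiresias.core.regression import LinearRegression

# Hypothetical synthetic data: 100 rows, 3 features.
X = np.random.uniform(-1.0, 1.0, size=(100, 3))
y = X @ np.array([0.5, -0.2, 1.0])

model = LinearRegression(epsilon=1.0)
model.fit(X, y)  # may emit a PrivacyLeakWarning, since range_X / range_y are unset

print(model.epsilon)    # 0.5: half of the budget went to approximate_bounds
print(model.data_norm)  # noisy estimate of the max row norm of X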
Classes
class LinearRegression (epsilon=1.0, data_norm=None, range_X=None, range_y=None, fit_intercept=True, copy_X=True, **unused_args)
Ordinary least squares Linear Regression with differential privacy.
LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Differential privacy is guaranteed with respect to the training sample.
Differential privacy is achieved by adding noise to the second moment matrix using the :class:`.Wishart` mechanism. This method is demonstrated in [She15], but our implementation takes inspiration from the use of the Wishart distribution in [IS16] to achieve a strict differential privacy guarantee.

Parameters

epsilon : float, optional, default: 1.0
- Privacy parameter :math:`\epsilon`.

data_norm : float, default: None
- The max l2 norm of any row of the concatenated dataset A = [X; y]. This defines the spread of data that will be protected by differential privacy. If not specified, the max norm is taken from the data when .fit() is first called, but will result in a :class:`.PrivacyLeakWarning`, as it reveals information about the data. To preserve differential privacy fully, data_norm should be selected independently of the data, i.e. with domain knowledge.

range_X : array_like
- Range of each feature of the training sample X. Its non-private equivalent is np.ptp(X, axis=0). If not specified, the range is taken from the data when .fit() is first called, but will result in a :class:`.PrivacyLeakWarning`, as it reveals information about the data. To preserve differential privacy fully, range_X should be selected independently of the data, i.e. with domain knowledge.

range_y : array_like
- Same as range_X, but for the training label set y.

fit_intercept : bool, optional, default: True
- Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

copy_X : bool, optional, default: True
- If True, X will be copied; else, it may be overwritten.
Attributes
coef_ : array of shape (n_features,) or (n_targets, n_features)
- Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

rank_ : int
- Rank of matrix X.

singular_ : array of shape (min(X, y),)
- Singular values of X.

intercept_ : float or array of shape (n_targets,)
- Independent term in the linear model. Set to 0.0 if fit_intercept = False.
References
.. [She15] Sheffet, Or. "Private approximations of the 2nd-moment matrix using existing techniques in linear regression." arXiv preprint arXiv:1507.00056 (2015).
.. [IS16] Imtiaz, Hafiz, and Anand D. Sarwate. "Symmetric matrix perturbation for differentially-private principal component analysis." In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2339-2343. IEEE, 2016.
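Since the mechanisms in [She15] and [IS16] assume bounded data, the privacy guarantee only holds fully when data_norm, range_X, and range_y are fixed from domain knowledge rather than read off the data. A sketch of such a construction (the feature and target ranges below are assumptions about a hypothetical domain):

import numpy as np
from tiresias.core.regression import LinearRegression

# Assume domain knowledge: 3 features each in [-1, 1], targets in [0, 1].
# Then any row of the concatenated dataset [X; y] has l2 norm at most 2.
model = LinearRegression(
    epsilon=1.0,
    data_norm=2.0,                      # chosen independently of the data
    range_X=np.array([2.0, 2.0, 2.0]),  # peak-to-peak of each feature domain
    range_y=1.0,                        # peak-to-peak of the target domain
)

With all bounds supplied up front, fit() should spend the entire epsilon budget on the regression and raise no :class:`.PrivacyLeakWarning`.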
Ancestors
- diffprivlib.models.linear_regression.LinearRegression
- sklearn.linear_model.base.LinearRegression
- sklearn.linear_model.base.LinearModel
- abc.NewBase
- sklearn.base.BaseEstimator
- sklearn.base.RegressorMixin
Methods
def fit(self, X, y, sample_weight=None)
Fit linear model.

Parameters

X : array-like or sparse matrix, shape (n_samples, n_features)
- Training data.

y : array_like, shape (n_samples, n_targets)
- Target values. Will be cast to X's dtype if necessary.

sample_weight : ignored
- Ignored by diffprivlib. Present for consistency with sklearn API.
Returns
self : returns an instance of self.
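As a usage note, the override rescales rows of X in place whenever their norm exceeds data_norm, so callers who need the original array unchanged should pass a copy. A short illustrative sketch (synthetic data and assumed bounds, not from the source):

import numpy as np
from tiresias.core.regression import LinearRegression

X = np.random.randn(50, 2)
y = X.sum(axis=1)

# Bounds here are rough assumptions for standard-normal features.
model = LinearRegression(epsilon=1.0, data_norm=2.0,
                         range_X=np.array([6.0, 6.0]), range_y=12.0)
model.fit(X.copy(), y)          # pass a copy: rows may be rescaled in place
predictions = model.predict(X)  # predict() is inherited from sklearn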