Tiresias is an open source platform for enabling privacy preserving, peer-to-peer, machine learning.
Disclaimer: Do not use this platform to handle actual user data. This is currently an experimental system for privacy and machine learning research and should not be used in production settings.
Installation
To install this library, make sure you are running a supported version of Python and run the following command:
git clone <https://github.com/DAI-Lab/Tiresias>
cd Tiresias
pip install -e .
To make sure your installation succeeded, you can execute the unit tests by running pytest
. If all the tests pass, congratulations, you are now ready to start using the Tiresias platform.
Quick Start
The Tiresias platform consists of a server which is responsible for keeping track of tasks and one or more clients that run on the data contributor's device. When a requestor wants to create a task, they submit it to the server who relays it to the clients; the clients can then choose whether to contribute their data to each task on a case-by-case basis.
Running the Server
To launch the tiresias.server
, run the following command and specify an open port:
tiresias-server –port 3000
If you navigate to <http://127.0.0.1:3000/
> in your web browser, you'll see a list of open
tasks. This list will initially be empty - we will demonstrate how you can submit tasks to
the platform in a later section.
Running the Client
Now that your platform server is up and running, you can ask your data contributors to launch the user client and point it at your server with the below command.
tiresias –server http://127.0.0.1:3000/
The user client comes with a few built-in dummy datasets that you can play with; however, it's up to the data contributors to choose what additional data collection applications they want to install (i.e. an app that tracks screen time).
The user client automatically opens the user interface in your default web browser. The user interface will show you any open tasks that you can choose to contribute to, as well as a list of the columns that are being collected in your personal data store.
Note that although the user interface is displayed in your web browser, it will not communicate with the outside world unless you explicitely consent to accepting a task.
Submitting a Task
Now that you have both the platform server and one or more user clients up and running, we can start writing tasks using the Python API. Here's an example of a basic task which estimates the median age of the user:
from tiresias.server import remote
task_id = remote.create_task('http://127.0.0.1:3000/', {
"name": "Median Age",
"type": "basic",
"epsilon": 2.0,
"delta": 1e-5,
"min_count": 10,
"featurizer": "SELECT age FROM profile.demographics",
"aggregator": "median"
})
The task_id
is an unique identifier which you can use to check on the status of the query:
print(remote.fetch_task("http://127.0.0.1:3000/", task_id))
Note that this particular task requires a minimum of 10 users to complete.
Advanced Usage
Exploring Task Types
Basic Task
This task would like to access your data by running [SQL]. Your data will be sent to the Tiresias server, where it will be combined with at least [COUNT] other users data and aggregated into a single [AGGREGATOR] value which will have noise added to make it ([EPSILON], [DELTA]) differentially private to reduce the risk that anyone can figure out your contribution to this task. Only this differentially private value will be released to the data requester.
{
"type": "basic",
"epsilon": [FLOAT],
"delta": [FLOAT],
"min_count": [INT],
"featurizer": [SQL],
"aggregator": [AGGREGATOR]
}
Integrated Task
This task would like to access your data by running [SQL]. Your data will be sent to the Tiresias server, where it will be combined with at least [COUNT] other users data and used to train a [MODEL] model to predict [OUTPUT]. This model will have noise added to it to make it ([EPSILON], [DELTA]) differentially private to reduce the risk that anyone can figure out your contribution to this task. Only this differentially private model will be released to the data requester.
{
"type": "integrated",
"epsilon": [FLOAT],
"delta": [FLOAT],
"featurizer": [SQL],
"model": [MODEL],
"inputs": [[VARNAME], [VARNAME], ...],
"output": [VARNAME]
}
Bounded Task
This task would like to access your data by running [SQL]. Before sending your data to the server, we will add noise to make it ([EPSILON], [DELTA]) differentially private to reduce the risk that anyone can figure out your contribution to this task.
{
"type": "bounded",
"epsilon": [FLOAT],
"delta": [FLOAT],
"featurizer": [SQL],
"bounds": {
"[VARNAME]": {
"type": "[BOUND_TYPE]",
"default": [FINITE_VALUE],
"values": [[FINITE_VALUE], [FINITE_VALUE]],
}
}
}
Gradient Task
This task would like to access your data by running [SQL] and the computing the gradients for a model. Note that your actual data will not be sent to the server, only the gradients, and even then, we add noise to the gradients before sending it to make it ([EPSILON], [DELTA]) differentially private to reduce the risk that anyone can figure out your contribution to this task.
{
"type": "gradient",
"epsilon": 10.0,
"delta": 1e-5,
"lr": 0.01,
"min_sample_size": 100,
"featurizer": "SELECT x1, x2, y FROM profile.example",
"model": torch.nn.Sequential(
torch.nn.Linear(2, 10),
torch.nn.ReLU(),
torch.nn.Linear(10, 1),
),
"loss": torch.nn.functional.mse_loss,
"inputs": ["x0", "x1"],
"output": ["y"],
}
Writing Data Collectors
Coming soon…
Expand source code
"""
.. include:: ../README.md
"""
Sub-modules
tiresias.benchmark
tiresias.client
tiresias.core
tiresias.server