Digital Identity, Trust and Privacy on the open Internet

Archive for the ‘Personal Data’ Category

Query Smart Contracts: Bringing the Algorithm to the Data

without comments

One paradigm shift being championed by the MIT OPAL/Enigma community is the use of shared algorithms that have been analyzed by experts and vetted to be “safe” from a privacy-preservation perspective. The term “Open Algorithm” (OPAL) implies that the vetted queries (“algorithms”) are made open by publishing them, allowing other experts to review them and other researchers to reuse them in their own contexts of study.

One possible realization of the Open Algorithms paradigm is the use of smart contracts to capture these safe algorithms in the form of executable queries residing in a legally binding digital contract.

What I’m proposing is the following: instead of a centralized data-processing architecture, the nodes of a P2P network (e.g. a blockchain) offer the opportunity for data (user data and organizational data) to be stored and processed in a privacy-preserving manner, accessible via well-known APIs and authorization tokens, with smart contracts used to let the “query meet the data”.

In this new paradigm of privacy-preserving data sharing, we “move the algorithm to the data”: queries and subqueries are computed by the data repositories (nodes on the P2P network). This means that repositories never release raw data and that they perform the algorithm/query computation locally, producing aggregate answers only. Moving the algorithm to the data gives data owners and other joint rights-holders the opportunity to exercise control over data release, and thus offers a way forward to the highest degree of privacy preservation while still allowing data to be shared effectively.
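A minimal sketch of a repository node under this paradigm (the `Repository` class and the `MIN_COHORT` threshold are my illustrative assumptions, not part of any OPAL specification): the node runs the vetted query against its local records and releases only an aggregate, refusing to answer over a cohort small enough to identify individuals.

```python
# Sketch of a data repository that answers vetted queries with aggregates only.
# Repository and MIN_COHORT are illustrative names, not from the OPAL spec.

MIN_COHORT = 3  # refuse answers over groups small enough to identify individuals

class Repository:
    def __init__(self, records):
        self._records = records  # raw data never leaves this object

    def run_query(self, predicate, field):
        """Execute a vetted query locally and return only an aggregate."""
        matched = [r[field] for r in self._records if predicate(r)]
        if len(matched) < MIN_COHORT:
            raise PermissionError("cohort too small: refusing to answer")
        return {"count": len(matched), "mean": sum(matched) / len(matched)}

repo = Repository([{"age": a, "income": i} for a, i in
                   [(25, 40), (31, 52), (47, 80), (52, 75), (39, 61), (28, 45)]])
print(repo.run_query(lambda r: r["age"] > 30, "income"))
# -> {'count': 4, 'mean': 67.0}
```

The cohort-size check stands in for whatever “safety” vetting a real deployment would apply; the point is that only the aggregate ever crosses the repository boundary.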

This paradigm requires that queries be decomposed into one or more subqueries, each of which is sent to the appropriate data repository (a node on the P2P network) and executed there. This allows each data repository to evaluate received subqueries for “safety” from a privacy and data-leakage perspective.
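As a sketch of the decomposition step (the routing table and the combine step are my assumptions for illustration), a query naming several fields can be split into one subquery per repository, with each repository returning only a local aggregate that the coordinator then combines:

```python
# Sketch: decomposing a query into subqueries, one per data repository.
# The routing table and the combine step are illustrative assumptions.

def decompose(query, routing):
    """Split a query into (repository, subquery) pairs using a routing
    table that maps each field to the repository holding it."""
    return [(routing[field], {"op": query["op"], "field": field})
            for field in query["fields"]]

def execute(query, routing, repos):
    partials = []
    for repo_name, subquery in decompose(query, routing):
        # each repository evaluates its subquery locally (safety check,
        # then aggregation); only the aggregate crosses the network
        partials.append(repos[repo_name](subquery))
    return {"partials": partials, "total": sum(partials)}

# two toy repositories, each holding one field and returning a local count
repos = {
    "hospital": lambda sq: 12,
    "telco":    lambda sq: 30,
}
routing = {"visits": "hospital", "calls": "telco"}
print(execute({"op": "count", "fields": ["visits", "calls"]}, routing, repos))
# -> {'partials': [12, 30], 'total': 42}
```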

Furthermore, safe queries and subqueries can be expressed in the form of a Query Smart Contract (QSC) that legally binds the querier (person or organization), the data repository and other related entities.

A query smart contract that has been vetted to be safe can be stored on nodes of the P2P network (e.g. blockchain). This allows Queriers not only to search for useful data (as advertised by the metadata in the repositories) but also to search for prefabricated safe QSCs available throughout the P2P network that match the intended application. Such a query smart contract will require that identity and authorization requirements be encoded within the contract.

A node on the P2P network may act as a Delegate Node in the completion of a subquery smart contract. A delegate node works on a subquery by locating the relevant data repositories, sending the appropriate subquery to each, receiving the individual answers, and collating the results for reporting to the (paying) Querier.

A Delegate Node that seeks to fulfill a query smart contract should only do so when all the conditions of the contract have been fulfilled (e.g. the QSC carries a valid signature; the identity of the Querier is established; authorization to access the APIs at the data repositories has been obtained; payment terms have been agreed; etc.). A hierarchy of delegate nodes may be involved in the completion of a given query originating from the Querier entity. The remuneration scheme for the Delegate Nodes and the data repositories involved in a query is outside the scope of the current use-case.
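The precondition checks listed above can be sketched as a simple gate the Delegate Node applies before doing any work (the contract fields and check names here mirror the examples in the text but are otherwise illustrative, not a real QSC schema):

```python
# Sketch of a Delegate Node's pre-execution checks on a QSC.
# Field names are illustrative, mirroring the conditions in the text.

def conditions_met(qsc):
    """Return True only if every contractual precondition holds."""
    return all([
        qsc.get("signature_valid"),     # QSC carries a valid signature
        qsc.get("querier_identified"),  # identity of the Querier established
        qsc.get("api_authorized"),      # authorization for repository APIs
        qsc.get("payment_agreed"),      # payment terms agreed
    ])

def fulfill(qsc, run_subqueries):
    """Refuse to execute unless all preconditions are satisfied."""
    if not conditions_met(qsc):
        raise PermissionError("QSC preconditions not satisfied")
    return run_subqueries(qsc)

qsc = {"signature_valid": True, "querier_identified": True,
       "api_authorized": True, "payment_agreed": True}
print(fulfill(qsc, lambda c: "collated results"))
# -> collated results
```

In a real deployment each check would itself be a verification step (signature validation, token introspection, etc.); the gate structure is the point here.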

Written by thomas

August 6th, 2016 at 1:21 pm

New Principles for Privacy-Preserving Queries for Distributed Data

Here are the three (3) principles for privacy-preserving computation based on the Enigma P2P distributed multi-party computation model:

(a) Bring the Query to the Data: In the current model the querier fetches copies of all the data-sets from the distributed nodes, imports them into a big-data processing infrastructure, and then runs queries. Instead, break the query into components (sub-queries) and send the pieces to the corresponding nodes on the P2P network.

(b) Keep Data Local: Never let raw data leave the node. Raw data must never leave its physical location or the control of its owner. Instead, nodes that hold the relevant data-sets execute the sub-queries and report the results.

(c) Never Decrypt Data: Homomorphic encryption remains an open field of study. However, certain types of queries can be decomposed into rudimentary operations (such as additions and multiplications) on encrypted data that yield answers equivalent to running the query on plaintext data.
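A closely related technique in the Enigma multi-party computation model is linear secret sharing, where each node holds only a random-looking share of a value and addition can be carried out share-by-share. The sketch below (my illustration, not Enigma's actual protocol) shows a sum computed without any single node ever holding a plaintext input:

```python
# Sketch: additive secret sharing over a prime field. Each party holds
# one share per input; no single share reveals anything about the value.
import random

PRIME = 2**61 - 1  # field modulus; shares are uniform in this field

def share(value, n):
    """Split a value into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

x_shares = share(120, 3)
y_shares = share(45, 3)

# Each of the 3 parties adds its own two shares locally; addition
# commutes with sharing, so the result is a sharing of x + y.
sum_shares = [(a + b) % PRIME for a, b in zip(x_shares, y_shares)]

# Only the final reconstruction reveals the aggregate answer.
print(sum(sum_shares) % PRIME)  # -> 165
```

Multiplication of shared values needs an interactive protocol and is where most of the MPC cost lies; addition, as shown, is purely local.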


Written by thomas

August 30th, 2015 at 5:51 pm

Towards a Trustworthy Digital Infrastructure for Core Identities and Personal Data Stores

So that was the title of my paper at the ID360 conference at UTexas in April. A copy of the PDF paper is here: hardjono-greenwood-coreid04C-ID360



Written by admin

May 22nd, 2013 at 5:42 pm

Transparency of usage of personal data: the need for a HIPAA-like regime

Ray Campbell hits the ball out of the park again with an excellent suggestion on his blog: we need a HIPAA-like regime for the privacy of personal data. As a mental exercise, Ray went through the HIPAA document and substituted “individually identifiable personal information” for “individually identifiable health information”. The red-lined document can also be found on his site.

At the heart of his proposal is a shift in thinking: from the person as the absolute owner of his/her personal data to the person having the right to know who has his/her personal data, how they obtained it, what they are doing with it, and to whom they have sold it (the 4 questions).

Following on from Ray’s post and from Professor Sandy Pentland’s view on the New Deal on Data, I believe there should be a new market in the digital economy where individuals can meet directly with buyers of their personal data, and where individuals can opt-in to make more data about themselves available to these buyers.  Cut out the middleman — the big data corporations that are not contributing to the efficiency of free markets.


Written by thomas

March 6th, 2013 at 9:02 pm

The 4 questions on transparency in personal data (disclosure management)

MIT Media Lab – 2013 Legal Hack-a-thon on Identity

Ray Campbell argues quite elegantly and convincingly that “data ownership” is not the correct paradigm for achieving privacy and control over personal data. The notion that “I own my data” can be impractical, especially in light of two-party transactions, where the other party may also “own” portions of the transaction data and may be legally bound to keep copies of “my data”.

Instead, the better approach is to look at “transparency” and visibility into where our data reside and who is using it. Here are the four questions that Ray poses:

  • Who has my data
  • What data they have about me
  • How did they acquire my data
  • How are they using my data

Transparency becomes an important tool for the disclosure management of personal data. These questions could form the basis for a trust framework on data transparency, one which can be used to frame Terms of Service that both I and the Relying Party must accept.
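One way such a trust framework could make the four questions operational is as a machine-readable disclosure record that a data holder must be able to produce on demand. The field names below are my illustration, not from any published framework:

```python
# Sketch: the four questions as a machine-readable disclosure record,
# the kind of artifact a data-transparency trust framework might mandate.
# Field names are illustrative, not from any published framework.
from dataclasses import dataclass

@dataclass(frozen=True)
class DisclosureRecord:
    holder: str        # who has my data
    data_items: tuple  # what data they have about me
    source: str        # how they acquired my data
    purpose: str       # how they are using my data

record = DisclosureRecord(
    holder="ExampleCorp",
    data_items=("email", "purchase history"),
    source="checkout form",
    purpose="order fulfilment and marketing",
)
print(record.holder, "-", record.purpose)
```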

Written by thomas

January 30th, 2013 at 9:16 pm