So our new books are out and available now on Amazon. The first one is on Fin Tech, and the second on data, privacy and identity.
Continuing from the previous post (Part I of the Core Identity series), the goal of a Core Identity Issuer (CoreID Issuer) is to collate sufficient data (aggregate data and non-PII data) from members of a given Circle of Trust in order to create a Core Identity and Core Identifier for a given user (see Figure).
The Issuer performs this task as a trusted member of the Circle of Trust, governed by rules of operations (i.e. legal contract) and with the consent of the user. Architectures and techniques such as MIT OPAL/Enigma can be used here in order for the CoreID Issuer to obtain privacy-preserving aggregate data from the various sources who are members of the Circle of Trust.
The goals of the CoreID Issuer within a Circle of Trust (CoT) are as follows:
- Onboard a member-user: The Issuer’s primary function is to on-board users who are known to the CoT community, and who have requested and consented to the creation of a Core Identity.
- Collate PII-free data into a Core Identity: The Issuer obtains aggregate data and other PII-free data regarding the user from members of the CoT. This becomes the core identity for the user, which is retained by the Issuer for the duration selected by the user. The Issuer must keep the core identity as secret, accessible only to the user.
- Generate Core Identifier (unlinkable): For a given user and their core identity, the Issuer generates a core identifier (e.g. random number) that must be unlinkable to the core identity. Note that a core identifier must not be used in a transaction. The core identifier value may be contained in a signed certificate or other signed data structure, with the Persona Provider as its intended audience (see Figure).
- Issue Core Assertions regarding the Core Identifier: The purpose of the Issuer generating a core identifier is to allow PII-free core-assertions regarding the user to be created. These signed core assertions must retain the privacy of the user, and must declare assertions about the core identifier.
- Interface with Persona Providers: The Issuer’s main audience is the Persona Provider, who must operate with the Issuer under a legal trust framework that calls out user privacy as a strict requirement. The Issuer must make available the necessary issuance end-points (i.e. APIs) as well as validation end-points to the Persona Provider. In some cases, from an operational deployment view, the Issuer and Persona Provider may be co-located or even tightly coupled under the same provider entity, although the functional differences and boundaries between them remain clear.
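The issuance and validation steps in the list above can be sketched roughly as follows. This is only a minimal illustration, not the Issuer's actual protocol: the function names, the assertion layout and the `persona-provider` audience string are all hypothetical, and an HMAC with a shared key stands in for the real digital signature (an actual deployment would presumably use an asymmetric signature so that the Persona Provider cannot forge assertions).

```python
import hashlib
import hmac
import json
import secrets


def new_core_identifier() -> str:
    """Return a random identifier that carries no information about the core identity."""
    return secrets.token_hex(16)


def issue_core_assertion(issuer_key: bytes, core_identifier: str,
                         claim: dict, audience: str) -> dict:
    """Create a PII-free assertion about a core identifier.

    Assumption: HMAC-SHA256 is a stand-in for the Issuer's signature.
    """
    payload = {"sub": core_identifier, "aud": audience, "claim": claim}
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(issuer_key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def validate_core_assertion(issuer_key: bytes, assertion: dict) -> bool:
    """Validation end-point logic: recompute the tag and compare in constant time."""
    body = json.dumps(assertion["payload"], sort_keys=True).encode()
    expected = hmac.new(issuer_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, assertion["tag"])
```

Because the identifier is drawn at random rather than computed from the user's data, nothing in the assertion can be traced back to the core identity without the Issuer's own (secret) mapping.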
Etymology: Middle French identité, from Late Latin identitat-, identitas, probably from Latin identidem repeatedly, contraction of idem et idem, literally, same and same (Merriam-Webster Dictionary).
Identity is about trusted data — trusted personal data. Human beings live within social constructs and communities. People who know me can vouch for me. Organizations that know me can issue assertions or attestations about me.
At the heart of all this is the notion of the core identity, something that is inherent as part of me and inalienable from me.
There are a number of key concepts and principles underlying the notion of core identities, core identifiers and personas (see Figure).
A fundamental concept is that of derived identities and derived identifiers, which provide not only privacy to the user (the person or organization that they represent) but also a degree of defense in the case of attacks against identity providers (e.g. identity theft) and a safety net in the rare case of weaknesses within the underlying cryptographic implementations.
I would argue that transaction identities and transaction identifiers are the forms of identities that should be used on the Internet and that they must be derived (e.g. cryptographically) from a core identity which itself must be kept as private or secret. Should a transaction identity be compromised or stolen, it can be placed on a public “blacklist” and a new one be derived for the user. The derivation process or algorithms must maintain the privacy of the user and the secrecy of the user’s core identity.
We define identity as the collective aspect of the set of characteristics or features by which a thing (e.g. human; device; organization) is recognizable and distinguishable from others. In the context of a human person, the individuality of a person plays an important role in that it allows a community of people to recognize the distinct characteristics of an individual person and consider the person as a persisting entity.
- Core identity: The collective aspect of the set of characteristics (as represented by personal data) by which a person is uniquely recognizable, and from which a unique core identifier may be generated based on the set of relevant personal data.
Thus, for example, the set of transaction data associated with a person can be collated and be used to create a core identity that distinguishes that person from others. Out of this set of transaction data, a core identifier may be generated and be held as a joint secret by its issuer and the person. The core identity pertaining to a person must be kept secret, and not be used in transactions.
- Core Identifier: A secret data item (e.g. a string) or secret mechanism (e.g. a crypto function) that uniquely identifies a person or entity. The core identifier must be immutable, must be kept secret, and must never be used directly in transactions.
- Persona: A persona is defined by and created based on a collection of attributes used in a given context or a given relationship. Thus, a person may have a work-persona, home-persona, social-persona, and others. Each of these personas is context-dependent and involves only the relevant subset of the core identity characteristics of that person. A person may have one or more personas.
- Transaction Identity and Identifier: When an individual seeks to perform a transaction (e.g. on the Internet, on a blockchain or other transacting mediums) he or she chooses a relevant persona and derives from that persona a transaction identity (and corresponding digital identifier) to be used in the transaction. A transaction identity may be short-lived and may even be created only for that single transaction instance. A useful analogy is that of credit card numbers, which may be used at Point of Sale (POS) locations without the user needing to provide any additional identifiers (i.e. reveal data from their core identity such as a Social Security Number) and which may be replaced at any time without impacting the user’s core identity.
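One way to picture the derivation chain described above (core secret, then persona, then transaction identifier) is a keyed one-way function applied at each step. The sketch below is an assumption on my part, not a specified algorithm: it uses HMAC-SHA256 as the derivation function, and the labels, the counter and the in-memory blacklist are purely illustrative.

```python
import hashlib
import hmac


def derive(key: bytes, label: str) -> bytes:
    """One-way derivation step: the output does not reveal the key."""
    return hmac.new(key, label.encode(), hashlib.sha256).digest()


def persona_key(core_secret: bytes, persona: str) -> bytes:
    """Derive a per-persona key (e.g. 'work', 'home') from the secret core value."""
    return derive(core_secret, "persona:" + persona)


def transaction_identifier(p_key: bytes, counter: int) -> str:
    """Derive a transaction identifier that can be short-lived or single-use."""
    return derive(p_key, f"tx:{counter}").hex()


def fresh_transaction_identifier(p_key: bytes, blacklist: set, start: int = 0):
    """Skip compromised identifiers and return the next usable one with its counter."""
    counter = start
    while True:
        tx_id = transaction_identifier(p_key, counter)
        if tx_id not in blacklist:
            return tx_id, counter
        counter += 1
```

If a transaction identifier is stolen, it is added to the blacklist and the next counter value yields a replacement, while the core secret itself is never exposed or changed. This mirrors the credit card analogy: the card number is replaced, the person is not.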
One paradigm shift being championed by the MIT OPAL/Enigma community is that of using (sharing) algorithms that have been analyzed by experts and have been vetted to be “safe” from the perspective of privacy-preservation. The term “Open Algorithm” (OPAL) here implies that the vetted queries (“algorithms”) are made open by publishing them, allowing other experts to review them and allowing other researchers to make use of them in their own context of study.
One possible realization of the Open Algorithms paradigm is the use of smart contracts to capture these safe algorithms in the form of executable queries residing in a legally binding digital contract.
What I’m proposing is the following: instead of a centralized data processing architecture, the P2P nodes (e.g. in a blockchain) offer the opportunity for data (user data and organizational data) to be stored by these nodes and processed in a privacy-preserving manner, accessible via well-known APIs and authorization tokens, with smart contracts used to let the “query meet the data”.
In this new paradigm of privacy-preserving data sharing, we “move the algorithm to the data” where queries and subqueries are computed by the data repositories (nodes on the P2P network). This means that repositories never release raw data and that they perform the algorithm/query computation locally, which produces aggregate answers only. This approach of moving the algorithm to the data provides data-owners and other joint rights-holders the opportunity to exercise control over data release, and thus offers a way forward to provide the highest degree of privacy-preservation while allowing data to still be effectively shared.
This paradigm requires that queries be decomposed into one or more subqueries, where each subquery is sent to the appropriate data repository (a node on the P2P network) and executed at that repository. This allows each data repository to evaluate received subqueries in terms of “safety” from a privacy and data leakage perspective.
Furthermore, safe queries and subqueries can be expressed in the form of a Query Smart Contract (QSC) that legally binds the querier (person or organization), the data repository and other related entities.
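As a toy illustration of decomposing a query and moving it to the data, consider an "average" query split into `sum` and `count` subqueries. Everything here is hypothetical: the `Repository` class, the whitelist of vetted subqueries and the minimum group size are my own assumptions about what a repository's "safety" evaluation might look like, not part of OPAL or Enigma.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist of subqueries that have been vetted as "safe".
SAFE_SUBQUERIES = {"count", "sum"}


@dataclass
class Repository:
    """A data repository that answers subqueries locally and never returns rows."""
    name: str
    _rows: list = field(default_factory=list, repr=False)
    min_group_size: int = 3  # assumed safety rule: refuse tiny groups

    def run(self, subquery: str, column: str):
        if subquery not in SAFE_SUBQUERIES:
            raise PermissionError(f"{subquery!r} is not a vetted subquery")
        values = [row[column] for row in self._rows if column in row]
        if len(values) < self.min_group_size:
            raise PermissionError("too few records to answer safely")
        return len(values) if subquery == "count" else sum(values)


def average(repositories: list, column: str) -> float:
    """Querier side: combine the aggregate answers from each repository."""
    total = sum(repo.run("sum", column) for repo in repositories)
    count = sum(repo.run("count", column) for repo in repositories)
    return total / count
```

The querier only ever sees one sum and one count per repository. Each repository decides for itself whether a received subquery is acceptable, which is where the data-owner's control over data release comes in.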
A query smart contract that has been vetted to be safe can be stored on nodes of the P2P network (e.g. blockchain). This allows Queriers to not only search for useful data (as advertised by the metadata in the repositories) but also search for prefabricated safe QSCs that are available throughout the P2P network and that match the intended application. Such a query smart contract will require that identity and authorization requirements be encoded within the contract.
A node on the P2P network may act as a Delegate Node in the completion of a subquery smart contract. A delegate node works on a subquery by locating the relevant data repositories, sending the appropriate subquery to each data repository, receiving the individual answers, and collating the results for reporting to the (paying) Querier.
A Delegate Node that seeks to fulfill a query smart contract should only do so when all the conditions of the contract have been fulfilled (e.g. QSC has valid signature; identity of Querier is established; authorization to access APIs at data repositories has been obtained; payment terms have been agreed, etc.). A hierarchy of delegate nodes may be involved in the completion of a given query originating from the Querier entity. The remuneration scheme for all Delegate Nodes and the data repositories involved in a query is outside the scope of the current use-case.
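The "only proceed when every condition holds" rule for a Delegate Node can be written down as a simple gate. This is just a sketch: the `QueryContractState` record and its field names are hypothetical, chosen to match the example conditions listed above, and how each flag actually gets established (signature checks, token exchange, payment negotiation) is left out entirely.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueryContractState:
    """Hypothetical record of the QSC preconditions a Delegate Node has verified."""
    signature_valid: bool
    querier_identified: bool
    api_authorization_obtained: bool
    payment_terms_agreed: bool


def unmet_conditions(state: QueryContractState) -> list:
    """Return the names of the contract conditions that are not yet fulfilled."""
    return [name for name, ok in asdict(state).items() if not ok]


def may_execute(state: QueryContractState) -> bool:
    """A Delegate Node should work on the query only if nothing is outstanding."""
    return not unmet_conditions(state)
```

Returning the list of unmet conditions, rather than a bare yes/no, lets a delegate report back to the Querier exactly what is still missing.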
I often get asked to provide a brief explanation about MIT Enigma — notably what it is, and why it is important particularly in the current age of P2P networking and blockchain technology. So here’s a brief summary.
The MIT Enigma system is part of a broader initiative at MIT Connections Science called the Open Algorithms for Equity, Accountability, Security, and Transparency (OPAL-EAST).
The MIT Enigma system employs two core cryptographic constructs simultaneously atop a Peer-to-Peer (P2P) network of nodes. These are secret sharing (à la Shamir’s Linear Secret Sharing Scheme (LSSS)) and multiparty computation (MPC). Although secret sharing and MPC have been topics of research for the past two decades, the innovation that MIT Enigma brings is the notion of employing these constructions on a P2P network of nodes (such as the blockchain) while providing “Proof-of-MPC” (like proof of work) that a node has correctly performed some computation.
In secret-sharing schemes, a given data item is “split” into a number of ciphertext pieces (called “shares”) that are then stored separately. When the data item needs to be reconstituted or reconstructed, a minimum or “threshold” number of shares needs to be obtained and merged together again in a reverse cryptographic computation. For example, in naval parlance this is akin to needing 2 out of 3 keys in order to perform some crucial task (e.g. activate the missile). Some secret sharing schemes possess the feature that some primitive arithmetic operations can be performed on shares (shares “added” to shares), yielding a result without the need to fully reconstitute the data items first. In effect, this feature allows operations to be performed on encrypted data (similar to homomorphic encryption schemes).
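To make the "2 out of 3 keys" idea and the "shares added to shares" feature concrete, here is a small textbook-style sketch of Shamir secret sharing over a prime field. It is not Enigma's implementation: the choice of prime, the function names and the share layout are my own assumptions, and a real system would add integrity checks on the shares.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def split(secret: int, threshold: int, n: int) -> list:
    """Split a secret into n shares; any `threshold` of them can reconstruct it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: list) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def add_shares(a: list, b: list) -> list:
    """Add two secrets share-by-share; both lists must use the same x-coordinates."""
    assert [x for x, _ in a] == [x for x, _ in b]
    return [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a, b)]
```

With `split(42, 2, 3)`, any two of the three shares give back 42, while a single share on its own is just a random-looking number. Adding the shares of two secrets position by position gives valid shares of their sum, so the addition happens without either secret ever being reconstituted.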
The MIT Enigma system proposes to use a P2P network of nodes to randomly store the relevant shares belonging to data items. In effect, the data owner no longer needs to keep a centralized database of data-items (e.g. health data) and instead would transform each data item into shares and disperse these on the P2P network of nodes. Only the data owner would know the locations of the shares, and can fetch these from the nodes as needed. Since each of these shares appears as garbled ciphertext to the nodes, the nodes are oblivious to their meaning or significance. A node in the P2P network would be remunerated for storage costs and the store/fetch operations.
The second cryptographic construct employed in MIT Enigma is multiparty computation (MPC). The study of MPC schemes seeks to address the problem of a group of entities needing to share some common output (e.g. result of computation) whilst maintaining as secret their individual data items. For example, a group of patients may wish to collaboratively compute their average blood pressure, but without any patient sharing the actual raw data about their own blood pressure.
The MIT Enigma system combines the use of MPC schemes with secret-sharing schemes, effectively allowing some computations to be performed using the shares that are distributed on the P2P network. The combination of these 3 computing paradigms (secret-sharing, MPC and P2P nodes) opens new possibilities in addressing the current urgent issues around data privacy and the growing liabilities on the part of organizations who store or work on large amounts of data.
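The blood pressure example can be simulated with the simplest MPC building block, additive secret sharing. The sketch below runs all "patients" inside one process purely for illustration, assumes everyone follows the protocol honestly, and uses hypothetical names and a modulus of my own choosing. It is not the Enigma protocol and has no Proof-of-MPC.

```python
import secrets

MODULUS = 2**61 - 1  # large enough that the true sum never wraps around


def additive_shares(value: int, n: int) -> list:
    """Split a value into n random-looking parts that sum to it modulo MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def secure_average(private_values: list) -> float:
    """Compute the average without any party seeing another party's raw value."""
    n = len(private_values)
    # Row i holds the shares patient i hands out; column j is what patient j receives.
    distributed = [additive_shares(v, n) for v in private_values]
    # Each patient adds up the shares received and publishes only that partial sum.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    total = sum(partial_sums) % MODULUS
    return total / n
```

For readings of 120, 135 and 110, `secure_average([120, 135, 110])` recovers the same total of 365 that a plain sum would, yet every individual share and partial sum looks like a random number to whoever holds it.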
Here are the three (3) principles for privacy-preserving computation based on the Enigma P2P distributed multi-party computation model:
(a) Bring the Query to the Data: The current model is for the querier to fetch copies of all the data-sets from the distributed nodes, then import the data-sets into the big data processing infra and then run queries. Instead, break up the query into components (sub-queries) and send the query pieces to the corresponding nodes on the P2P network.
(b) Keep Data Local: Never let raw data leave the node. Raw data must never leave its physical location or the control of its owner. Instead, nodes that carry relevant data-sets execute sub-queries and report on the result.
(c) Never Decrypt Data: Homomorphic encryption remains an open field of study. However, certain types of queries can be decomposed into rudimentary operations (such as additions and multiplications) on encrypted data that would yield equivalent answers to the case where the query was run on plaintext data.
One important news item this week from the IoT space is the support by Atmel of Intel’s EPID technology.
Enhanced Privacy ID (EPID) grew from the work of Ernie Brickell and Jiangtao Li based on previous work on Direct Anonymous Attestation (DAA). DAA is very relevant because it is built into the TPM1.2 chip (of which there are several hundred million in PC machines).
Here is a quick summary of EPID:
- EPID is a special digital signature scheme.
- One public key corresponds to multiple private keys.
- Each private key generates an EPID signature.
- EPID signature can be verified using the public key.
Interesting Security Properties:
- Anonymous/Unlinkable: Given two EPID signatures one cannot determine whether they are generated from one or two private keys.
- Unforgeable: Without a private key one cannot create a valid signature.
This is terrific news: a couple of students want to give all undergrads $100 worth of Bitcoin. Here is the news in MIT’s The Tech.
- “While the specific properties of bitcoin have some real problems, getting everyone at MIT to start playing with bitcoin … will prompt the MIT community to begin thinking seriously about how we can live in an all-digital future.” (Sandy Pentland)
- Rubin and Elitzer want to see a bitcoin “ecosystem” develop at MIT in which people are not only exchanging bitcoins but also experimenting with related technologies.
(NB. I love it when people get it. Hal Hodson definitely gets it. Many folks at the MIT-KIT conference this week got it.)
03 October 2013 by Hal Hodson
Magazine issue 2937.
Software like openPDS acts as a bodyguard for your personal data when apps – or even governments – come snooping
Editorial: “Time for us all to take charge of our personal data”
BIG BROTHER is watching you. But that doesn’t mean you can’t do something about it – by wresting back control of your data.
Everything we do online generates information about us. The tacit deal is that we swap this data for free access to services like Gmail. But many people are becoming uncomfortable about companies like Facebook and Google hoarding vast amounts of our personal information – particularly in the wake of revelations about the intrusion of the US National Security Agency (NSA) into what we do online. So computer scientists at the Massachusetts Institute of Technology have created software that lets users take control.
OpenPDS was designed in MIT’s Media Lab by Sandy Pentland and Yves-Alexandre de Montjoye. They say it disrupts what NSA whistleblower Edward Snowden called the “architecture of oppression”, by letting users see and control any third-party requests for their information – whether that’s from the NSA or Google.
If you want to install an app on your smartphone, you usually have to agree to give the program access to various functions and to data on the phone, such as your contacts. Instead of letting the apps have direct access to the data, openPDS sits in between them, controlling the flow of information. Hosted either on a smartphone or on an internet-connected hard drive in your house, it siphons off data from your phone or computer as you generate it.
It can store your current and historical location, browsing history, content and information related to sent and received emails, and any other personal data required. When external applications or services need to know things about you to provide a service, they ask openPDS the question, and it tells them the answer – if you allow it to. People hosting openPDS at home would always know when entities like the NSA request their data, because the law requires a warrant to access data stored in a private home.
Pentland says openPDS provides a technical solution to an issue the European Commission raised in 2012, when it declared that people have the right to easier access to and control of their own data. “I realised something needed to be done about data control,” he says. “With openPDS, you control your own data and share it with third parties on an opt-in basis.”
Storing this information on your smartphone or on a hard drive in your house are not the only options. ID3, an MIT spin-off, is building a cloud version of openPDS. A personal data store hosted on US cloud servers would still be secretly searchable by the NSA, but it would allow users to have more control over their data, and keep an eye on who is using it.
“OpenPDS is a building block for the emerging personal data ecosystem,” says Thomas Hardjono, the technical lead of the MIT Consortium for Kerberos and Internet Trust, a collection of the world’s largest technology companies who are working together to make data access fairer. “We want people to have equitable access to their data. Today, AT&T and Verizon have access to my GPS data, but I don’t.”
Other groups also think such personal data stores are a good idea. A project funded by the European Union, called digital.me, focuses on giving people more control over their social networks, and the non-profit Personal Data Ecosystem Consortium advocates for individuals’ right to control their own data.
OpenPDS is already being put to use. Massachusetts General Hospital wants to use the software to protect patient privacy for a program called CATCH. It involves continuously monitoring variables including glucose levels, temperature, heart rate and brain activity, as well as smartphone-based analytics that can give insight into mood, activity and social connections. “We want to begin interrogating the medical data of real people in real time in real life, in a way that does not invade privacy,” says Dennis Ausiello, head of the hospital’s department of medicine.
OpenPDS will help people keep a handle on their own data, but getting back information already in private hands is a different matter. “As soon as you give access to that raw data, there’s no way back,” says de Montjoye.
So it’s only 2 weeks until the annual conference. It’s shaping up to be a solid conference, with some stellar speakers. Really excited about it!