Data Privacy among Organizations: Channel and Private Data in Hyperledger Fabric

Overview

Permissioned blockchain platforms differ from permissionless ones in several respects, and data privacy is one of them. Data privacy is a broad topic; in this article we focus only on how to prevent organizations from accessing data they are not authorized to see.

Hyperledger Fabric offers two ways to achieve this: Channel at the network level, and Private Data at the chaincode (application) level. The two rely on different mechanisms, and the choice between them depends on the actual business needs.

We first demonstrate both mechanisms, and close with a discussion comparing them.

Consortium

Hyperledger Fabric is always used in a consortium environment. Here a consortium is a number of business entities coming together with common business goals. Partners forming a consortium is the obvious case: by bringing partners together, certain workflows can be streamlined, or a trustworthy source of information can be made available to all members. A consortium can even be formed among competitors. In such cases, the business goal must bring enough value that joining the consortium is better than staying out. For example, banks can form a consortium to reduce risk when making loans.

Hyperledger Fabric considers a consortium as a group of organizations. It is a business network of member organizations.

Everything begins with a consortium. Blockchain networks and blockchain applications are then built for all members or for a subset of them, depending on the nature of the business and its needs.

For demonstration purposes, we build a consortium of three organizations: org1, org2 and org3. For each organization we deploy one peer node (peer0), with a CouchDB instance attached for observing the world state. As usual, we also have an orderer and a command line interface (CLI) for chaincode interaction, which are not shown in the coming diagrams.

The demonstration code for this article can be found here.

Channel

Overview

Channel is an important concept in Hyperledger Fabric. It is a group of organizations sharing business goals by running the same business application and maintaining the same, consistent data store.

In a consortium, we can define as many channels as needed. An organization can join one or more channels depending on the business requirement. For example, in a consortium for insurance claims, insurance companies, clinics and regulators are the major members. There can be one channel for all members, another just for clinics, and so on.

A channel is identified by a channel ID and defined by its member organizations. After a peer joins a channel, a ledger (composed of a blockchain of transaction records and a world state database) is created and runs on the peer. When a peer joins more than one channel, these ledgers run independently.

At the application level, chaincode is instantiated (deployed) per channel. When a peer has joined more than one channel, a chaincode function accesses the ledger of the channel on which it is invoked.
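The relationship between a peer, its channels and its ledgers can be sketched with a small in-memory model. This is only an illustration of the idea, not Fabric code: the `Ledger` and `Peer` classes and their methods are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """One ledger per channel: a list of blocks plus a world state."""
    blocks: list = field(default_factory=list)
    world_state: dict = field(default_factory=dict)


@dataclass
class Peer:
    """Hypothetical model of a peer holding one ledger per joined channel."""
    name: str
    ledgers: dict = field(default_factory=dict)  # channel ID -> Ledger

    def join_channel(self, channel_id: str) -> None:
        # Joining a channel creates a fresh, independent ledger on this peer.
        self.ledgers.setdefault(channel_id, Ledger())

    def invoke_set(self, channel_id: str, key: str, value: str) -> None:
        # The invocation is routed to the ledger of the named channel only.
        ledger = self.ledgers[channel_id]
        ledger.blocks.append({"key": key, "value": value})
        ledger.world_state[key] = value

    def query(self, channel_id: str, key: str) -> str:
        return self.ledgers[channel_id].world_state[key]


peer = Peer("peer0.org1.example.com")
peer.join_channel("channel-all")
peer.join_channel("channel-12")
peer.invoke_set("channel-all", "name", "alice")
peer.invoke_set("channel-12", "name", "bob")
print(peer.query("channel-all", "name"))
print(peer.query("channel-12", "name"))
```

The same key holds different values on the two channels because each channel has its own world state on the peer.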

You can get a more complete picture about channel here.

Demo Setup

In our demonstration, we will create two channels:

  • channel-all: org1, org2 and org3
  • channel-12: org1 and org2

For the purpose of demonstration, I have prepared the following scripts. They simply wrap the peer commands that would otherwise be issued from the shell or from the CLI:

  • network-up.sh and network-down.sh: use Docker Compose to bring up the components (containers) and tear everything down.
  • channel-all-up.sh and channel-12-up.sh: create channel and join those member organizations to the channel.
  • deploy-sacc.sh and deploy-personalinfo.sh: install and instantiate the chaincode, with proper initial value or setting.

We deploy the same chaincode sacc to both channels. This sacc comes with fabric-samples. Basically it provides a key/value storage in the ledger.

Here are the steps for bringing things up.

  1. bring up all containers: ./network-up.sh
  2. bring up channel-all: ./channel-all-up.sh
  3. bring up channel-12: ./channel-12-up.sh
  4. deploy sacc on both channels: ./deploy-sacc.sh
  5. chaincode invoke/query and observation
  6. clean up: ./network-down.sh

Demo and Observation

Perform steps 1–4 above. Note that in deploy-sacc.sh an initial value is provided when instantiating the chaincode. For channel-all, the key name is set to alice, and for channel-12, the key name is set to bob.

We first inspect which channels the peer has joined.

Now we get the value for name from org1 on both channels.

We do the same from org3 on both channels. Note that environment variables are provided, because the default CLI points to org1.

We get a result back from channel-all, but an access denied message from channel-12. This is because org3 is a member of channel-all only, not of channel-12.

For observation, we zoom into peer0.org1.example.com and see how a peer handles its ledgers when it joins two channels. We see two separate ledgers, one for each channel, working independently. As mentioned, a ledger is composed of a blockchain and a world state.
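The outcome of these queries can be reproduced with a toy membership check. This is a simplified sketch: in Fabric the decision is based on MSP identities and channel policies, and a peer of org3 does not hold the channel-12 ledger at all. The names `CHANNEL_MEMBERS`, `WORLD_STATE`, `AccessDenied` and `query` are hypothetical.

```python
# Channel membership and initial values as used in this demonstration.
CHANNEL_MEMBERS = {
    "channel-all": {"org1", "org2", "org3"},
    "channel-12": {"org1", "org2"},
}
WORLD_STATE = {
    "channel-all": {"name": "alice"},
    "channel-12": {"name": "bob"},
}


class AccessDenied(Exception):
    """Raised when an organization queries a channel it has not joined."""


def query(org: str, channel: str, key: str) -> str:
    # Only members of the channel may read its world state.
    if org not in CHANNEL_MEMBERS[channel]:
        raise AccessDenied(f"{org} is not a member of {channel}")
    return WORLD_STATE[channel][key]


for org in ("org1", "org3"):
    for channel in ("channel-all", "channel-12"):
        try:
            print(org, channel, query(org, channel, "name"))
        except AccessDenied as err:
            print(org, channel, "access denied:", err)
```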

In the blockchain portion:

These are two separate blockchains, of different blockchain height and hash.
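Why two channels end up with different heights and hashes can be seen in a minimal hash-chain sketch. It assumes a simplified block layout hashed with SHA-256 over JSON; Fabric's real block header encoding differs, and the transaction strings below are made-up placeholders.

```python
import hashlib
import json


def block_hash(number: int, previous_hash: str, transactions: list) -> str:
    # Hash a simplified block: its number, its parent's hash and its payload.
    payload = json.dumps(
        {"number": number, "previous_hash": previous_hash, "transactions": transactions},
        sort_keys=True,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(batches: list) -> list:
    # Each block commits to the hash of the block before it.
    chain = []
    previous_hash = ""
    for number, transactions in enumerate(batches):
        current = block_hash(number, previous_hash, transactions)
        chain.append({"number": number, "previous_hash": previous_hash, "hash": current})
        previous_hash = current
    return chain


# Hypothetical transaction history for each channel.
chain_all = build_chain([["channel config"], ["deploy sacc"], ["set name=alice"]])
chain_12 = build_chain([["channel config"], ["deploy sacc, name=bob"]])

print("channel-all height:", len(chain_all), "hash:", chain_all[-1]["hash"][:16])
print("channel-12  height:", len(chain_12), "hash:", chain_12[-1]["hash"][:16])
```

Because each channel orders its own transactions, the chains grow at different rates and their latest block hashes are unrelated.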

In the world state portion:

Though they live in the same CouchDB node, the two channels are represented by two separate sets of databases (channel-all_* and channel-12_*).

This is what our demonstration setup looks like, showing the channels in the consortium.

Private Data

Overview

We have seen in the demonstration above that all peer nodes in the same channel keep the same set of state data. This is the nature of distributed ledger technology: every participant keeps a copy of the data, with a consensus algorithm ensuring the copies agree. In Hyperledger Fabric, Channel provides a mechanism to limit both applications and data to a designated group of organizations.

If the restriction is needed only at the data level, for some members of a channel, Private Data provides another way. With Private Data, the peer nodes of a specified group of organizations keep the actual data, while peers outside this group keep only a proof (a hash) of the data, not the data itself.
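The split between "actual data" and "proof of data" can be modelled in a few lines. This is a sketch under assumptions: it hashes both key and value with SHA-256, and the `PeerStore`, `put_private` and `verify` names, the record key and the passport value are all hypothetical.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


class PeerStore:
    """Hypothetical per-peer storage for one private data collection."""

    def __init__(self, org: str):
        self.org = org
        self.hashes = {}   # kept by every peer in the channel
        self.private = {}  # kept only by peers of collection members


def put_private(peers, members, key: str, value: str) -> None:
    for peer in peers:
        # Every peer records the hash as proof that the data exists.
        peer.hashes[sha256_hex(key)] = sha256_hex(value)
        # Only collection members receive the actual value.
        if peer.org in members:
            peer.private[key] = value


def verify(peer: PeerStore, key: str, claimed_value: str) -> bool:
    # A non-member can still check a value it is shown against the hash.
    return peer.hashes.get(sha256_hex(key)) == sha256_hex(claimed_value)


peers = [PeerStore("org1"), PeerStore("org2"), PeerStore("org3")]
put_private(peers, {"org1", "org2"}, "ID001", "A1234567")
org1, org2, org3 = peers
print(org1.private, org3.private)
print(verify(org3, "ID001", "A1234567"))
```

A peer outside the group cannot read the value, but it can confirm a value it is shown later by comparing hashes.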

Back to our insurance claim example above. We could of course create one channel for all participants and a separate one for clinics to keep their client information. Using Private Data simplifies this to a single channel.

A collection is defined with a policy specifying which organizations can keep and access the private data.

You can always refer to the official documentation about private data. Here let’s first set up a demonstration and make some observations on how things work.

Demo Setup

In our demonstration, we just set up channel-all, and all organizations join channel-all.

A simple chaincode, personalinfo, is created for demonstration purposes. The personal information to be stored is name, email and passport number. Among them, the passport number is personally identifiable information (PII), which in our case is accessible only by org1 and org2.

Three chaincode functions are defined for this demonstration:

  • createRecord: create a record with personal information provided, including passport number
  • queryRecord: return the personal record excluding PII
  • queryPiiRecord: return the PII of a personal record
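The division of work among the three functions can be illustrated with a stand-in for the chaincode stub. This is not the article's chaincode and not the Fabric API: `FakeStub`, its methods, the record layout and the sample values are hypothetical, chosen only to show which data goes to public state and which to the collection.

```python
import json

COLLECTION = "collectionPrivate"


class PrivateDataUnavailable(Exception):
    """Raised on a peer that is not part of the collection."""


class FakeStub:
    """Hypothetical stand-in for the state access of one peer."""

    def __init__(self, in_collection: bool):
        self.in_collection = in_collection
        self.public = {}
        self.private = {}

    def put_state(self, key, value):
        self.public[key] = value

    def get_state(self, key):
        return self.public[key]

    def put_private_data(self, collection, key, value):
        self.private[(collection, key)] = value

    def get_private_data(self, collection, key):
        if not self.in_collection:
            raise PrivateDataUnavailable("private data is not available on this peer")
        return self.private[(collection, key)]


def create_record(stub, record_id, name, email, passport):
    # Non-sensitive fields go to public state, the PII goes to the collection.
    stub.put_state(record_id, json.dumps({"name": name, "email": email}))
    stub.put_private_data(COLLECTION, record_id, json.dumps({"passport": passport}))


def query_record(stub, record_id):
    return json.loads(stub.get_state(record_id))


def query_pii_record(stub, record_id):
    return json.loads(stub.get_private_data(COLLECTION, record_id))


member = FakeStub(in_collection=True)
create_record(member, "ID001", "Alice", "alice@example.com", "A1234567")

# A peer outside the collection shares the public state only.
outsider = FakeStub(in_collection=False)
outsider.public = dict(member.public)

print(query_record(outsider, "ID001"))
print(query_pii_record(member, "ID001"))
```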

To specify the privacy of the data, a file collections_config.json is created. It defines a collection, collectionPrivate, that is accessible only by org1 and org2. This collection file is specified when the chaincode is instantiated.
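For orientation, a collection definition of this kind might look like the structure below, generated here with Python for convenience. The field names follow the Fabric collection definition format, but the MSP IDs (Org1MSP, Org2MSP) and the numeric values are assumptions, not taken from the article's repository.

```python
import json

# Hypothetical collection definition: only org1 and org2 keep the private data.
collection_config = [
    {
        "name": "collectionPrivate",
        # Assumed MSP IDs; the demonstration network may use different ones.
        "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
        "requiredPeerCount": 0,
        "maxPeerCount": 3,
        # 0 means the private data is never purged.
        "blockToLive": 0,
    }
]

rendered = json.dumps(collection_config, indent=2)
print(rendered)
```

The printed JSON is what a file such as collections_config.json would contain under these assumptions.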

We skip the discussion on the collection file and chaincode. Here we bring up our demonstration setup.

  1. bring up all containers: ./network-up.sh
  2. bring up channel-all: ./channel-all-up.sh
  3. deploy personalinfo: ./deploy-personalinfo.sh
  4. chaincode invoke/query and observation
  5. clean up: ./network-down.sh

Demo and Observation

Perform steps 1–3 above. Note that we have specified the collection configuration when chaincode is instantiated (see the file deploy-personalinfo.sh).

We first work on peer0.org1 node (default node) and create a new personal information record.

After that we query the record with two different chaincode functions.

We can perform similar queries on the peer0.org2 node, which is also part of the collection, and therefore the PII is visible there as well.

Finally, on the peer0.org3 node, we can see that the PII record is not available, as org3 is not in the collection.

It is interesting that the error message says the private data matching the public hash version is not available. This is exactly how Private Data is implemented: on org3 the hash of the private data is in the public storage, but the corresponding actual data is not in the private storage.

Let’s take a look at the world state on all peer nodes and see how things are working.

This view is taken from the world state of the three peer nodes (three screenshots combined for easy comparison). We focus on the last three items:

  • channel-all_mycc, which is the world state for public data. States that do not use a private data collection are stored here, so every organization has the same data.
  • channel-all_mycc$$hcollection$private, which keeps the hash of the private data. This information is also seen in all organizations in a channel, serving as a proof of “some data exist”.
  • channel-all_mycc$$pcollection$private, which keeps the actual private data. The data is only seen in org1 and org2.

This hash record is seen in all peer nodes, while the actual data is seen in designated peer nodes (org1 and org2).
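The three database names can be taken apart mechanically. The helper below is a sketch based only on the names listed above: it assumes that `_` separates channel and chaincode, that `$$h` and `$$p` mark the hash and data stores of a collection, and that an uppercase letter in the collection name appears as `$` followed by its lowercase form (collectionPrivate appearing as collection$private). `describe_state_db` is a hypothetical name.

```python
import re


def describe_state_db(db_name: str) -> dict:
    # Channel IDs contain no underscore, so the first "_" ends the channel name.
    channel, _, rest = db_name.partition("_")
    chaincode, marker, collection_part = rest.partition("$$")
    if not marker:
        return {"channel": channel, "chaincode": chaincode,
                "kind": "public", "collection": None}
    kind = {"h": "private-hash", "p": "private-data"}[collection_part[0]]
    # Undo the assumed escaping: "$p" stands for an uppercase "P".
    collection = re.sub(r"\$([a-z])", lambda m: m.group(1).upper(), collection_part[1:])
    return {"channel": channel, "chaincode": chaincode,
            "kind": kind, "collection": collection}


for name in (
    "channel-all_mycc",
    "channel-all_mycc$$hcollection$private",
    "channel-all_mycc$$pcollection$private",
):
    print(describe_state_db(name))
```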

This is what our demonstration setup looks like, showing Private Data inside a channel.

A Note about Sending Private Data in Proposal

You may notice that in the personalinfo chaincode the private data “passport” is passed as an argument in the createRecord proposal. In Hyperledger Fabric, this proposal is included in the transaction the client sends to the orderer, and the transaction ends up in a block. This block is distributed to all organizations in the channel, including org3, which is outside the collection. As a result, although the actual data is not found in the world state on org3, org3 can still see the data in this transaction in the blockchain.

A block fetched from peer0.org3 shows the data:

This is certainly not a good way to send private data when invoking a chaincode function. A better way is to use transient data. When data is passed as transient data in a chaincode invocation, it is not included in the transaction, and therefore is not visible to unauthorized organizations. More detail can be found here.
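The difference between the two ways of passing the passport number can be sketched as follows. This is a conceptual model only: `build_transaction`, the argument order of createRecord and the sample values are hypothetical. It captures one property, that arguments are recorded in the transaction while transient data reaches the chaincode without being recorded.

```python
import json


def build_transaction(function: str, args: list, transient: dict = None):
    """Return (what is recorded in the block, what the chaincode can read)."""
    # Function name and arguments become part of the recorded transaction.
    recorded = {"function": function, "args": list(args)}
    # Transient data is available during execution but is not recorded.
    seen_by_chaincode = dict(recorded, transient=dict(transient or {}))
    return recorded, seen_by_chaincode


# Passport number passed as a plain argument: it lands in the block.
recorded_args, _ = build_transaction(
    "createRecord", ["ID001", "Alice", "alice@example.com", "A1234567"]
)

# Passport number passed as transient data: it stays out of the block.
recorded_transient, seen = build_transaction(
    "createRecord", ["ID001", "Alice", "alice@example.com"],
    transient={"passport": "A1234567"},
)

print("A1234567" in json.dumps(recorded_args))
print("A1234567" in json.dumps(recorded_transient))
print(seen["transient"]["passport"])
```

With the transient approach, the chaincode would read the passport number from the transient map and write it to the collection, so it never appears among the recorded arguments.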

Discussion: Channel vs. Private Data

Channel is a concept at the infrastructure (network) level. A channel is created and then joined by the peer nodes of its member organizations. A ledger is dedicated to each channel. In addition, one or more chaincodes can be deployed on each channel, which gives great flexibility to design chaincode and applications specific to a channel. Of course, we can still deploy the same chaincode on different channels.

However, in some scenarios this may not be the preferred approach.

If privacy is needed only at the data level, not at the chaincode or application level, the channel approach wastes resources. A channel is needed for every subgroup of organizations, which means a separate ledger for each subgroup and the same set of chaincodes instantiated again for each subgroup.

This is exactly what the Monetary Authority of Singapore (MAS) found in Project Ubin Phase 2. Think of a bilateral arrangement in a consortium of 10 banks, where every pair of banks forms its own channel: more than 40 channels have to be established. (The MAS report can be found here. It also mentions that Private Data, then under development, is a viable solution.)
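The channel count for a bilateral arrangement is simply the number of distinct pairs, n(n-1)/2. A quick check, with `bilateral_channels` as a hypothetical helper name:

```python
def bilateral_channels(n_orgs: int) -> int:
    # One channel per unordered pair of organizations.
    return n_orgs * (n_orgs - 1) // 2


for n in (3, 10, 20):
    print(n, "organizations ->", bilateral_channels(n), "channels")
```

For 10 banks this gives 45 channels, and the count grows quadratically as members are added.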

Suppose we take the channel approach. If some business logic has to interact with two subgroups of organizations, a cross-channel chaincode invocation is required. For example, if a chaincode in a small group (one channel) needs to refer to data in a large group (another channel), the design becomes rather complex.

Finally, adding a new subgroup means adding a new channel: generating channel artifacts, creating and joining the channel, and instantiating the chaincode. This is again a cumbersome process.

Where applicable, these shortcomings can be addressed by Private Data. One can define multiple collections inside a channel, each serving a different business objective for a subgroup of organizations. As long as the privacy requirement is at the data storage level, we do not need to deploy a large number of channels to cover each business need.

This scales better than using Channel. When data from several subgroups is needed, it can be handled inside one chaincode, which simplifies the business logic that would otherwise span chaincodes.

Finally, changes are also made at the chaincode level: a chaincode update with the proper collection configuration can meet a new business requirement.

The introduction of Private Data addresses certain drawbacks of Channel when privacy is required only at the data storage level. When different sets of chaincodes (applications) are needed for different groups of organizations, we still have to use Channel.

Written by

Happy to share what I learn on blockchain. Visit http://www.ledgertech.biz/kcarticles.html for my work, or reach me on https://www.linkedin.com/in/ktam1/.
