Private Data: Implicit Data Collections in Hyperledger Fabric v2.0

Overview

Private data is an important feature in Hyperledger Fabric: it allows certain data to be stored only by selected organizations. This provides data privacy within a channel and reduces the number of channels required when privacy is needed at the data level. Fabric v2.0 introduces an implicit collection for each organization, which application chaincode can use without defining a collection explicitly. This article explores the implicit collection and shows how it is used by both the lifecycle chaincode and an application chaincode.

Private Data at a Glance

The Fabric documentation has a full introduction to private data. Here is a quick overview.

In a typical setup, after organizations join a channel, each of their peers maintains a local copy of the ledger, which includes a world state database and a blockchain of transactions. The world state is the same on every peer, giving a single source of truth across the blockchain platform. This implies that any data stored in the ledger is stored on all peers.

There are scenarios where some data privacy is needed, and certain data should be accessible only to a subgroup of organizations. In such a case, organizations outside this subgroup should not have a copy of that data. This is addressed by Private Data, introduced in release 1.2. You can refer to my previous article (link) for a comparison between channels and private data collections when dealing with data privacy.

Private data is implemented as data collections. A collection holds two types of information: the actual private data and its hash. A collection is defined with a policy that names the subgroup of organizations allowed to hold the actual data. While the private data is stored only on peers of this subgroup, all peers in the channel (including those outside the subgroup) keep the data hash, which serves as evidence for transaction validation.
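The split between data and hash can be sketched in a few lines of Go. This is only a toy model, not Fabric code: the peerStore type and the writePrivate function are hypothetical names invented for illustration, and only the value is hashed here with SHA-256 to keep the example short.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// peerStore is a hypothetical model of one peer's view of a collection:
// the private values themselves and the hashes of those values.
type peerStore struct {
	org     string
	private map[string][]byte // filled only on peers of member organizations
	hashes  map[string]string // filled on every peer in the channel
}

func newPeerStore(org string) *peerStore {
	return &peerStore{org: org, private: map[string][]byte{}, hashes: map[string]string{}}
}

// writePrivate distributes one private write: every peer records the hash,
// but only peers whose organization is in the member set keep the value.
func writePrivate(peers []*peerStore, members map[string]bool, key string, value []byte) {
	sum := sha256.Sum256(value)
	for _, p := range peers {
		p.hashes[key] = hex.EncodeToString(sum[:])
		if members[p.org] {
			p.private[key] = value
		}
	}
}

func main() {
	org1, org2 := newPeerStore("Org1MSP"), newPeerStore("Org2MSP")
	members := map[string]bool{"Org1MSP": true} // collection policy: Org1 only

	writePrivate([]*peerStore{org1, org2}, members, "name", []byte("alice"))

	for _, p := range []*peerStore{org1, org2} {
		_, hasValue := p.private["name"]
		_, hasHash := p.hashes["name"]
		fmt.Printf("%s: value stored=%v, hash stored=%v\n", p.org, hasValue, hasHash)
	}
}
```

Because both peers hold the same hash, a peer outside the subgroup can still check that a transaction refers to the agreed data without ever seeing it.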

Here is a quick illustration of the various data stored inside the world state database on each peer.

Collections are defined in a collection definition, which is a JSON file listing the properties of all the collections. This file is specified when the chaincode definition is approved and committed in Fabric v2.0 (or when the chaincode is instantiated in previous releases). Here is an example of a collection definition, taken from the marbles02_private example in fabric-samples.

You can refer to link for a detailed description of the properties. Here we focus only on the policy. In this example, collectionMarbles is a collection that both Org1 and Org2 keep, while collectionMarblePrivateDetails is a collection for Org1 only.
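To make the shape of such a file concrete, here is a small Go sketch that parses a collection definition and prints each policy. The two collection names and their policies follow the description above. The other property values (requiredPeerCount, maxPeerCount, blockToLive, memberOnlyRead) are illustrative assumptions and may differ from the file shipped in fabric-samples, and the collectionConfig struct is a hypothetical helper, not a Fabric type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collectionConfig is a hypothetical struct covering the common properties
// of one entry in a collection definition file.
type collectionConfig struct {
	Name              string `json:"name"`
	Policy            string `json:"policy"`
	RequiredPeerCount int    `json:"requiredPeerCount"`
	MaxPeerCount      int    `json:"maxPeerCount"`
	BlockToLive       uint64 `json:"blockToLive"`
	MemberOnlyRead    bool   `json:"memberOnlyRead"`
}

// sampleDefinition is an illustrative definition; the numeric values are
// assumptions, only the names and policies come from the text.
const sampleDefinition = `[
  {
    "name": "collectionMarbles",
    "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 1000000,
    "memberOnlyRead": true
  },
  {
    "name": "collectionMarblePrivateDetails",
    "policy": "OR('Org1MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 3,
    "memberOnlyRead": true
  }
]`

// parseCollections decodes a collection definition into typed entries.
func parseCollections(raw string) ([]collectionConfig, error) {
	var cfgs []collectionConfig
	if err := json.Unmarshal([]byte(raw), &cfgs); err != nil {
		return nil, err
	}
	return cfgs, nil
}

func main() {
	cfgs, err := parseCollections(sampleDefinition)
	if err != nil {
		panic(err)
	}
	for _, c := range cfgs {
		fmt.Printf("%s is kept by: %s\n", c.Name, c.Policy)
	}
}
```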

Chaincode interacts with a collection through two APIs: PutPrivateData and GetPrivateData. They work just like PutState and GetState, except that we also specify the collection to which the private data is written or from which it is read.
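The parallel between the two pairs of calls can be seen in the following sketch. The ledgerAPI interface copies the four method signatures from the Go chaincode shim, while memStub is a hypothetical in-memory stand-in so the example runs without a Fabric network. Real chaincode would receive a shim.ChaincodeStubInterface instead.

```go
package main

import "fmt"

// ledgerAPI mirrors the signatures of the four shim methods discussed above.
type ledgerAPI interface {
	PutState(key string, value []byte) error
	GetState(key string) ([]byte, error)
	PutPrivateData(collection, key string, value []byte) error
	GetPrivateData(collection, key string) ([]byte, error)
}

// memStub is a hypothetical in-memory implementation used only for this demo.
type memStub struct {
	public  map[string][]byte
	private map[string]map[string][]byte // collection name -> key -> value
}

func newMemStub() *memStub {
	return &memStub{public: map[string][]byte{}, private: map[string]map[string][]byte{}}
}

func (s *memStub) PutState(key string, value []byte) error {
	s.public[key] = value
	return nil
}

func (s *memStub) GetState(key string) ([]byte, error) {
	return s.public[key], nil
}

func (s *memStub) PutPrivateData(collection, key string, value []byte) error {
	if s.private[collection] == nil {
		s.private[collection] = map[string][]byte{}
	}
	s.private[collection][key] = value
	return nil
}

func (s *memStub) GetPrivateData(collection, key string) ([]byte, error) {
	return s.private[collection][key], nil
}

func main() {
	var stub ledgerAPI = newMemStub()

	// The same key can exist in the public state and in a collection.
	_ = stub.PutState("marble1", []byte("public value"))
	_ = stub.PutPrivateData("collectionMarbles", "marble1", []byte("private value"))

	pub, _ := stub.GetState("marble1")
	priv, _ := stub.GetPrivateData("collectionMarbles", "marble1")
	other, _ := stub.GetPrivateData("collectionMarblePrivateDetails", "marble1")

	fmt.Printf("state: %q\n", pub)
	fmt.Printf("collectionMarbles: %q\n", priv)
	fmt.Printf("collectionMarblePrivateDetails found: %v\n", other != nil)
}
```

The only difference at the call site is the extra collection argument, which selects where the key lives.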

Prior to v2.0, we had to define a collection before we could use it. Because it is quite common to have a data collection for each organization, v2.0 introduces the implicit collection. As the name suggests, it is a collection predefined on each peer that acts as a private data collection for a single organization. We do not need to define it explicitly in the collection definition before using it.

The collection name is _implicit_org_<MSP>, where <MSP> is the organization's MSP ID (for example, _implicit_org_Org1MSP). We pass this name to PutPrivateData or GetPrivateData when we want to use the implicit collection. In the state database, the collection appears prefixed with the channel name and the chaincode name.
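A tiny helper makes the naming rule explicit. The function names implicitCollection and ownerOf are hypothetical, chosen for this sketch only.

```go
package main

import (
	"fmt"
	"strings"
)

// implicitPrefix is the fixed prefix of every implicit collection name.
const implicitPrefix = "_implicit_org_"

// implicitCollection returns the implicit collection name for an MSP ID.
func implicitCollection(mspID string) string {
	return implicitPrefix + mspID
}

// ownerOf returns the MSP ID embedded in an implicit collection name,
// or false when the name does not follow the implicit naming rule.
func ownerOf(collection string) (string, bool) {
	if !strings.HasPrefix(collection, implicitPrefix) {
		return "", false
	}
	return strings.TrimPrefix(collection, implicitPrefix), true
}

func main() {
	name := implicitCollection("Org1MSP")
	fmt.Println(name)

	if msp, ok := ownerOf(name); ok {
		fmt.Println("owned by", msp)
	}
	if _, ok := ownerOf("collectionMarbles"); !ok {
		fmt.Println("collectionMarbles is an explicitly defined collection")
	}
}
```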

Interestingly, this implicit collection is also used by the lifecycle chaincode, a system chaincode for deploying application chaincode in a Fabric network. During the deployment process, certain information is written to and kept in the implicit collection.

In this article, we mainly focus on the implicit collection, and observe how both the lifecycle chaincode and application chaincode are using it. We are not going through the lifecycle chaincode in detail. Those who are interested can refer to my previous articles (link, link).

Demonstration Setup

Our demonstration is built on the First Network of fabric-samples. This brings up a Fabric network with a Raft-based orderer cluster and two peer organizations, each of which has two peers (four peers in total). A channel, mychannel, is created and all peers join it. To observe the state, we use CouchDB as the state database for each peer and use Fauxton to inspect the databases inside CouchDB.

We are using a modified SACC chaincode. SACC is a sample chaincode that stores a key/value pair in the ledger. We add two functions: setPrivateOrg1 and getPrivateOrg1. The names are self-explanatory, and we use them to work on the implicit collection for Org1 for demonstration purposes.

Here is the modified chaincode.

As you can see, lines 51–54, 80–90 and 109–122 are added for the two new functions. And you can see the PutPrivateData being used in line 85, and GetPrivateData in line 114. Both refer to the implicit collection for Org1.
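For readers without the listing at hand, the two added functions might look roughly like the sketch below. It is not the article's actual listing and does not correspond to the line numbers cited above. The privateStub interface copies the signatures of the shim's PutPrivateData and GetPrivateData, fakeStub is a hypothetical in-memory stand-in so the example runs on its own, and the argument checks and error messages are assumptions modelled on the style of the SACC sample.

```go
package main

import (
	"errors"
	"fmt"
)

// privateStub lists the two shim methods the new functions rely on.
type privateStub interface {
	PutPrivateData(collection, key string, value []byte) error
	GetPrivateData(collection, key string) ([]byte, error)
}

// org1Collection is the implicit collection for Org1.
const org1Collection = "_implicit_org_Org1MSP"

// setPrivateOrg1 stores a key/value pair in Org1's implicit collection.
func setPrivateOrg1(stub privateStub, args []string) (string, error) {
	if len(args) != 2 {
		return "", errors.New("incorrect arguments: expecting a key and a value")
	}
	if err := stub.PutPrivateData(org1Collection, args[0], []byte(args[1])); err != nil {
		return "", fmt.Errorf("failed to set asset: %s", args[0])
	}
	return args[1], nil
}

// getPrivateOrg1 reads a value back from Org1's implicit collection.
func getPrivateOrg1(stub privateStub, args []string) (string, error) {
	if len(args) != 1 {
		return "", errors.New("incorrect arguments: expecting a key")
	}
	value, err := stub.GetPrivateData(org1Collection, args[0])
	if err != nil {
		return "", fmt.Errorf("failed to get asset: %s with error: %v", args[0], err)
	}
	if value == nil {
		return "", fmt.Errorf("asset not found: %s", args[0])
	}
	return string(value), nil
}

// fakeStub is a hypothetical in-memory stub keyed by "collection/key".
type fakeStub map[string][]byte

func (f fakeStub) PutPrivateData(collection, key string, value []byte) error {
	f[collection+"/"+key] = value
	return nil
}

func (f fakeStub) GetPrivateData(collection, key string) ([]byte, error) {
	return f[collection+"/"+key], nil
}

func main() {
	stub := fakeStub{}
	if _, err := setPrivateOrg1(stub, []string{"name", "alice"}); err != nil {
		panic(err)
	}
	value, err := getPrivateOrg1(stub, []string{"name"})
	fmt.Println(value, err)

	_, err = getPrivateOrg1(stub, []string{"missing"})
	fmt.Println(err)
}
```

In a real chaincode these functions would take a shim.ChaincodeStubInterface and be dispatched from Invoke, in the same way as the original set and get.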

Demonstration

The demonstration is grouped into three parts. First, we bring up the environment, which is the First Network, and prepare the application chaincode sacc_private. Then we deploy the chaincode to this Fabric network using the lifecycle chaincode, focusing mainly on how the implicit collection is used during deployment. Finally, we interact with our application chaincode and see how it refers to the implicit collection.

Bring Up Environment

Step 1: Bring up all components and set up mychannel

We see this when the script has finished executing.

Step 2: Prepare browsers to show CouchDB for each peer

We need four browser tabs, one for each peer:

  • peer0.org1.example.com: http://<host_ip>:5984/_utils/
  • peer1.org1.example.com: http://<host_ip>:6984/_utils/
  • peer0.org2.example.com: http://<host_ip>:7984/_utils/
  • peer1.org2.example.com: http://<host_ip>:8984/_utils/

For convenience they are arranged in sequential tabs. From the port in the URL we know which peer we are inspecting.

Step 3: Create a new chaincode directory and place the new chaincode

For simplicity, just copy the sacc directory to a new one, sacc_private.

Then replace sacc.go with the code provided (or just insert the added parts).

Step 4: Load the module for the first time

Deploy Application Chaincode in First Network

We go through the complete lifecycle process to deploy our application chaincode.

Step 5: Lifecycle stage 1: Packaging chaincode

Note that the chaincode directory is mapped into the CLI container, so we can package this newly created chaincode from there.

Step 6: Lifecycle stage 2: Install chaincode package to peer0.org1 and peer0.org2

Note the package ID, as we will use it when approving the chaincode definition.

Step 7: Observe State in Peers

There is no change in the state of any peer yet. With this as a baseline, we can compare what is added in the coming steps.

Step 8: Approve Chaincode Definition for Org1

We first have Org1 approve the chaincode definition.

Step 9: Observe State in Peers

First we observe in peer0.org1.

The same is seen in peer1.org1.

Then we observe in peer0.org2.

And the same is also seen in peer1.org2.

In summary, new databases are created on all peers, but there is a difference between the organizations. When we approve the chaincode for Org1, we see

This is a private data collection setup. The collection is the implicit collection specific to Org1 (as the name tells us), used in channel mychannel by the lifecycle chaincode. The database with $$p holds the actual data, while the one with $$h holds the hash of the actual data. For the implicit collection of Org1, the actual data is kept only on the peers of Org1, while the data hash is kept on all peers in the channel (in both Org1 and Org2).

The content of this data collection is out of scope for this article. You are encouraged to take a look at what is inside and relate it to the lifecycle chaincode process.

Step 10: Approve chaincode definition for Org2

Step 11: Observe State in Peers

First we observe peer0.org1 (the same is found in peer1.org1).

Then we observe peer0.org2 (the same is found in peer1.org2).

We see that new databases are created on all peers in a similar manner. When we approve the chaincode for Org2, we see

The result is that another implicit collection, specific to Org2 this time, is being used by lifecycle chaincode.

Step 12: Commit Chaincode

With approvals from both organizations in place, we now commit the chaincode definition.

If you take a look at the state databases, there is no update to the implicit collections. The part updated by the chaincode commit is mychannel__lifecycle, which is again out of scope for this article.

After the commit is done, the application chaincode is ready to use.

Use Application Chaincode

We use chaincode invoke and query to interact with the deployed chaincode. We first use the original set and get functions as a reference against which to compare our new functions, which work on the data collection.

Step 13: Invoke set and query from peers in both orgs with get

We first invoke set and make a query from both peer0.org1 and peer0.org2.

The data is available in both peer0.org1 and peer0.org2.

Step 14: Observe State in Peers

Because we are using PutState to write data into the ledger, the data is kept in mychannel_mycc, and this happens on all peers that have joined the channel.

Step 15: Invoke setPrivateOrg1 and query from peers in both orgs using getPrivateOrg1

The data is available only in peer0.org1, not in peer0.org2. The query on peer0.org2 returns an error message because the peer is unable to get the asset: the data was written to the implicit collection for Org1.

It is clearer when we observe the state database.

Step 16: Observe State in Peers

Because we are using PutPrivateData and specifying the implicit collection _implicit_org_Org1MSP to write data into the ledger, the data is kept in mychannel_mycc$$p_implicit_org_$org1$m$s$p, and a corresponding data hash database is also created. This applies to both peers of Org1 (peer0.org1 and peer1.org1).

If we observe the peers in Org2 (peer0.org2 and peer1.org2), we see that the data portion is empty but the data hash is present. This is expected, as the data collection is the implicit one for Org1.

Here we see that while an implicit collection belongs to a specific organization, its database is designated by a channel name and a chaincode name. As we see in the state, there are two sets of databases: one for mychannel and lifecycle, and one for mychannel and mycc. They are independent databases, each serving a different channel-chaincode combination.
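The database names seen in Fauxton follow a recognizable pattern: channel name, an underscore, the chaincode name, $$p or $$h, and the collection name, with each uppercase letter written as a dollar sign followed by its lowercase form. The Go sketch below reproduces that pattern as inferred from the names observed in this demonstration. It is a simplification, the function names are hypothetical, and the peer's real naming logic handles further cases (such as very long names) that are ignored here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// escapeUpper rewrites every uppercase letter as '$' plus its lowercase form,
// matching names such as $org1$m$s$p seen in the state database.
func escapeUpper(s string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.IsUpper(r) {
			b.WriteByte('$')
			b.WriteRune(unicode.ToLower(r))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// collectionDBName builds the database name for a collection.
// kind is "p" for the private data database and "h" for the hash database.
// Escaping the chaincode name as well is an assumption of this sketch.
func collectionDBName(channel, chaincode, kind, collection string) string {
	return channel + "_" + escapeUpper(chaincode) + "$$" + kind + escapeUpper(collection)
}

func main() {
	const collection = "_implicit_org_Org1MSP"

	// Databases created when the application chaincode writes private data.
	fmt.Println(collectionDBName("mychannel", "mycc", "p", collection))
	fmt.Println(collectionDBName("mychannel", "mycc", "h", collection))

	// Databases used by the lifecycle chaincode, whose name is _lifecycle.
	fmt.Println(collectionDBName("mychannel", "_lifecycle", "p", collection))
	fmt.Println(collectionDBName("mychannel", "_lifecycle", "h", collection))
}
```

The same collection name therefore maps to different databases for mycc and for the lifecycle chaincode, which is why the two never mix.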

This ends our overall demonstration.

Summary

In this article we have demonstrated and observed what the implicit collection looks like and how it is used.

The implicit collection gives us a predefined data collection specific to one organization. A separate collection is created per channel-chaincode combination. Besides being used in our application chaincode, this implicit collection is also used by lifecycle chaincode when an application chaincode is deployed.

Written by

Happy to share what I learn on blockchain. Visit http://www.ledgertech.biz/kcarticles.html for my work, or reach me on https://www.linkedin.com/in/ktam1/.