Private Data and Transient Data in Hyperledger Fabric

Mar 11, 2020

Overview

This article is inspired by a discussion on the nature of private data in Hyperledger Fabric. The feature we focus on here is Private Data, and a second concept, Transient Data, comes up whenever Private Data is used. Technically these are two different concepts: Private Data is about keeping data inside a subgroup of organizations defined in a collection definition, while Transient Data is an input method often used together with Private Data. Interestingly, the two have no direct relationship, though in real life, as shown below, we should use Transient Data whenever we need Private Data to provide a certain level of security.

I have built four demonstrations, one for each combination. By observing the ledger, both the transactions and the worldstate database, we will learn more about the mechanism behind private data, and why we should use Transient Data when we are using Private Data.

Concept Review

A review of some key concepts will help us navigate the demonstration.

Ledger: In Hyperledger Fabric, each peer maintains a copy of the ledger after joining a channel. There are two parts in the ledger. The first part is a blockchain data structure holding the blocks (of transactions). The second part is a worldstate database, keeping the latest state after a block is committed. When a new block is received from the ordering service, the peer validates it and then commits it to the ledger. This includes appending the block to the blockchain and updating the worldstate according to the RWSet of each valid transaction.

Most parts of the ledger are identical among the peers within a channel, thanks to the overall consensus mechanism. The exception is Private Data, which only the specified organizations store in their worldstate.

Private Data: Within a channel there are scenarios in which only a subgroup of organizations should keep certain data while those outside the subgroup should not. This is usually due to data privacy requirements between organizations. Hyperledger Fabric introduces Private Data to address this need. Through a data collection definition, we define the subgroups as collections in which private data is kept. As proof of data existence, or for audit purposes, all peers (within and outside the subgroup) keep a record of the private data hash.

Private data is used through the chaincode API. In our demonstration we use PutPrivateData and GetPrivateData. For comparison, we use PutState and GetState when we write and read data in the public state.
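To make the difference between the two pairs of calls concrete, here is a minimal sketch. It is not the demo chaincode: stateStub only mirrors the four method signatures as I understand them from the Go chaincode shim, and memStub, setPublic and setPrivate are hypothetical names that keep everything in memory so the example runs without a Fabric network. The collection name is the implicit collection for Org1 used later in this article.

```go
package main

import "fmt"

// org1Collection is the implicit collection name used in this article.
const org1Collection = "_implicit_org_Org1MSP"

// stateStub lists only the four shim calls discussed here.
type stateStub interface {
	PutState(key string, value []byte) error
	GetState(key string) ([]byte, error)
	PutPrivateData(collection string, key string, value []byte) error
	GetPrivateData(collection string, key string) ([]byte, error)
}

// memStub is a hypothetical in-memory stand-in for a peer, for illustration only.
type memStub struct {
	public  map[string][]byte
	private map[string]map[string][]byte // collection -> key -> value
}

func newMemStub() *memStub {
	return &memStub{
		public:  map[string][]byte{},
		private: map[string]map[string][]byte{},
	}
}

func (m *memStub) PutState(key string, value []byte) error {
	m.public[key] = value
	return nil
}

func (m *memStub) GetState(key string) ([]byte, error) {
	return m.public[key], nil
}

func (m *memStub) PutPrivateData(collection string, key string, value []byte) error {
	if m.private[collection] == nil {
		m.private[collection] = map[string][]byte{}
	}
	m.private[collection][key] = value
	return nil
}

func (m *memStub) GetPrivateData(collection string, key string) ([]byte, error) {
	return m.private[collection][key], nil
}

// setPublic writes to the public state, which every peer in the channel stores.
func setPublic(stub stateStub, key, value string) error {
	return stub.PutState(key, []byte(value))
}

// setPrivate writes to a collection, which only member organizations store.
func setPrivate(stub stateStub, collection, key, value string) error {
	return stub.PutPrivateData(collection, key, []byte(value))
}

func main() {
	stub := newMemStub()
	if err := setPublic(stub, "name", "alice"); err != nil {
		panic(err)
	}
	if err := setPrivate(stub, org1Collection, "name", "bob"); err != nil {
		panic(err)
	}
	pub, _ := stub.GetState("name")
	priv, _ := stub.GetPrivateData(org1Collection, "name")
	fmt.Printf("public=%s private=%s\n", pub, priv)
}
```

The same key can hold different values in the public state and in a collection, because the two are separate namespaces. The only difference in the chaincode is which call is used and whether a collection name is passed.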

In Fabric v2.0, an implicit collection is prepared for each organization. More detail can be found in my previous article. In this demonstration we are using this implicit collection, which is why you will not see a collection definition file.

Here is a quick illustration of the two parts of a ledger, and where the private data is located. During the demonstration we will take a look at the content inside the transaction and inside the worldstate database.

What is inside a ledger and where and how private data is stored

Transient Data: Many chaincode functions require external data input when they are invoked. In most cases we supply a list of arguments when invoking a function, and the chaincode is coded to process those arguments. The chaincode arguments, including the function name and the arguments associated with that function, are kept as part of a valid transaction inside a block, and therefore stay in the ledger forever. If for some reason we do not wish to keep the argument list permanently in the blockchain, we can use Transient Data. Transient Data is a data input method such that the input data reaches the chaincode but does not stay in the transaction record. The chaincode reads it with a dedicated API, GetTransient, and the input must follow the format the chaincode expects. We will see this in our demonstration.

The chaincode designer can therefore design chaincode according to the business need, deciding which data should be input as normal arguments, which are recorded in the transaction, and which should be input as Transient Data, which is not kept.

Relation between Private Data and Transient Data: Private Data and Transient Data are not directly related. We can use Private Data without using Transient Data as the input method, and we can use Transient Data without storing the data as Private Data. As a result I have come up with the four scenarios in the demonstration, and we can observe the result in each of them.
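On the chaincode side, reading transient input could look like the sketch below. It is an illustration under assumptions rather than the demo chaincode itself: transientStub only mirrors the GetTransient signature of the Go shim as I understand it, fakeStub and readKeyValue are hypothetical names, and the keyvalue entry and JSON layout are the ones used later in Scenario 3.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// transientStub mirrors the one shim call this sketch needs.
type transientStub interface {
	GetTransient() (map[string][]byte, error)
}

// keyValue is the JSON layout expected inside the transient map.
type keyValue struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// readKeyValue pulls the "keyvalue" entry out of the transient map and
// decodes it. The chaincode receives raw bytes here; the Base64 encoding
// used on the CLI has already been undone before the chaincode runs.
func readKeyValue(stub transientStub) (keyValue, error) {
	transient, err := stub.GetTransient()
	if err != nil {
		return keyValue{}, fmt.Errorf("reading transient map: %w", err)
	}
	raw, ok := transient["keyvalue"]
	if !ok {
		return keyValue{}, errors.New(`transient entry "keyvalue" not found`)
	}
	var kv keyValue
	if err := json.Unmarshal(raw, &kv); err != nil {
		return keyValue{}, fmt.Errorf("decoding transient JSON: %w", err)
	}
	if kv.Key == "" {
		return keyValue{}, errors.New(`"key" must not be empty`)
	}
	return kv, nil
}

// fakeStub is a hypothetical stand-in that serves a fixed transient map.
type fakeStub map[string][]byte

func (f fakeStub) GetTransient() (map[string][]byte, error) { return f, nil }

func main() {
	stub := fakeStub{"keyvalue": []byte(`{"key":"name","value":"charlie"}`)}
	kv, err := readKeyValue(stub)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s=%s\n", kv.Key, kv.Value)
}
```

What the chaincode does with the decoded key and value afterwards, PutState or PutPrivateData, is a separate decision. That is why the input method and the storage location can be combined freely.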

Here is an illustration about Private Data and Transient Data.

The selection of input method and the use of private data depend on the business requirement, as chaincode functions reflect real business transactions. One can choose the argument list, transient data, or both together, and write data to the public state, a private data collection, or both together. All we need is to use the right chaincode APIs.

Demonstration Setup

Scenarios

As mentioned, I am creating a 2x2 combination of using Private Data and using Transient Data. Here is a quick summary.

Scenario 1: Not using Private Data and Not using Transient Data

Data is written into the public state part of the ledger, and all peers have this same piece of data in their ledger. In the chaincode, the PutState and GetState APIs are used for this data. When we invoke the chaincode, we specify the input data as arguments.

This is widely used for data of which all organizations need a copy.

Scenario 2: Using Private Data without using Transient Data input

In this scenario data is written into the private data part of the ledger, and only the peers of the organizations defined in the collection definition keep this data. In the chaincode, the PutPrivateData and GetPrivateData APIs are used with the collection specified. When we invoke the chaincode, we specify the input data as arguments.

This can be used when data privacy is needed in storage but the input data is not sensitive, because the input will be kept inside the ledger (permanently in the blockchain).

Scenario 3: Using Private Data with Transient Data input

Similar to Scenario 2, data is written into the private data part of the ledger, and only the peers of the organizations defined in the collection definition keep this data. In the chaincode, the PutPrivateData and GetPrivateData APIs are used with the collection specified. When we invoke the chaincode, we specify the input as transient data, not as arguments, and we use GetTransient in the chaincode to process the input data.

This can be used when data privacy is needed in storage and the input data is sensitive enough that it should not be kept anywhere outside the private data collection.

Scenario 4: Use Transient Data as input method but no Private Data

This is an imaginary scenario, just showing that it is possible to supply input as Transient Data without storing it as Private Data. We use PutState and GetState to store data in the public state, while the data is input as Transient Data. As before, we use GetTransient in the chaincode to process the input data.

Setup

In these demonstrations I am using Fabric v2.0, with First Network as the Fabric network. We bring up the network with the CouchDB option so that we can inspect the worldstate database. In particular we will inspect peer0.org1.example.com (CouchDB port 5984) and peer0.org2.example.com (CouchDB port 7984) to observe the behaviour of the peers of the two organizations.

For the Private Data part, I am using the built-in implicit collection for Org1 (_implicit_org_Org1MSP). Only peers of Org1 keep the data, while peers of both Org1 and Org2 keep a hash of the data.

I have modified the SACC chaincode in fabric-samples. SACC comes with two functions, set and get. In order to show Private Data and Transient Data, we add the following functions:

setPrivate: uses the same argument list as set, and the data is stored in the implicit collection for Org1
setPrivateTransient: uses transient data as input, and the data is stored in the implicit collection for Org1
setTransient: uses transient data as input, and the data is stored in the public state
getPrivate: retrieves the value stored in the implicit collection for Org1

Here is the chaincode

After each chaincode invocation, we will inspect the newly committed block. How the block is fetched and decoded is omitted in this article.

Demonstration

Setup

Bring up First Network without deploying the default chaincode. CouchDB is also needed.

cd fabric-samples/first-network
./byfn.sh up -n -s couchdb

Once all the containers are up and running (5 orderers, 4 peers, 4 CouchDB instances, and one CLI), create a new chaincode directory.

cd fabric-samples/chaincode
cp -r sacc sacc_privatetransientdemo
cd sacc_privatetransientdemo

Then replace sacc.go with the chaincode above.

The first time, we need to vendor the required modules.

GO111MODULE=on go mod vendor

Finally we deploy this chaincode using the chaincode lifecycle commands. I omit the details here. You can refer to my previous articles (link, link) about the whole process.

Scenario 1

Scenario 1 can be seen as the most common one: data is input through the argument list and stored in the public state, so that all peers have a copy of the data. We invoke the functions set and get.

docker exec cli peer chaincode invoke -o orderer.example.com:7050 --tls --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem --peerAddresses peer0.org1.example.com:7051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls/ca.crt --peerAddresses peer0.org2.example.com:9051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org2.example.com/peers/peer0.org2.example.com/tls/ca.crt -C mychannel -n mycc -c '{"Args":["set","name","alice"]}'
docker exec cli peer chaincode query -C mychannel -n mycc -c '{"Args":["get","name"]}'

We first inspect the worldstate. This data is stored in public state (mychannel_mycc) on both peer0.org1.example.com and peer0.org2.example.com.

When inspecting the transaction recorded in the blockchain, we see in the Write Set that the key-value pair is name and alice (base64 encoded).

In Scenario 1, the RWSet contains data written to public state in the ledger.

And we see the argument list when we invoke the chaincode. The three base64-encoded arguments are set, name, alice, respectively.

In Scenario 1, input data can be seen in proposal.
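Because the arguments are only Base64 encoded, anyone who can read the block can recover them. Here is a small sketch that decodes them with the standard library. The three input strings are simply the standard Base64 encodings of set, name and alice, and decodeArgs is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeArgs turns the Base64-encoded arguments found in a decoded block
// back into readable strings.
func decodeArgs(encoded []string) ([]string, error) {
	decoded := make([]string, 0, len(encoded))
	for i, e := range encoded {
		raw, err := base64.StdEncoding.DecodeString(e)
		if err != nil {
			return nil, fmt.Errorf("argument %d: %w", i, err)
		}
		decoded = append(decoded, string(raw))
	}
	return decoded, nil
}

func main() {
	args, err := decodeArgs([]string{"c2V0", "bmFtZQ==", "YWxpY2U="})
	if err != nil {
		panic(err)
	}
	fmt.Println(args) // prints [set name alice]
}
```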

So we see the public state is updated with the RWSet, and input arguments are recorded inside the transaction record. Everything works properly.

Scenario 2

In Scenario 2 data is input through the argument list and stored in the private data collection of Org1. We invoke the functions setPrivate and getPrivate.

docker exec cli peer chaincode invoke -o orderer.example.com:7050 --tls --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem --peerAddresses peer0.org1.example.com:7051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls/ca.crt --peerAddresses peer0.org2.example.com:9051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org2.example.com/peers/peer0.org2.example.com/tls/ca.crt -C mychannel -n mycc -c '{"Args":["setPrivate","name","bob"]}'
docker exec cli peer chaincode query -C mychannel -n mycc -c '{"Args":["getPrivate","name"]}'

We first inspect the worldstate. In peer0.org1.example.com we see the data is stored as private data, and two databases are created: one for the actual data, and one for the hash. In peer0.org2.example.com we only see the hash database.

The content of the hash is the same on both peers (organizations). In addition, the actual data in peer0.org1.example.com shows the value we input when invoking the chaincode.

When inspecting the transaction recorded in the blockchain, we see no key-value pair in the Write Set. Instead we see that the write is applied to the implicit collection for Org1 and that the transaction carries only hashes. The actual data is already held by the peers of Org1 and is protected by the hash.

In Scenario 2, there is no RWSet recorded in transaction.
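The hash is what lets a peer outside the collection take part in an audit: if the value is later disclosed, it can be checked against the recorded digest. Below is a minimal sketch of that check. It assumes the recorded hash is the plain SHA-256 digest of the value bytes, which is my understanding of how Fabric hashes private data. The encoding in which the digest appears in CouchDB or in a decoded block may differ from the hex used here, and hashHex and matchesHash are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashHex returns the SHA-256 digest of data as a lowercase hex string.
func hashHex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// matchesHash reports whether a disclosed value hashes to the recorded digest.
func matchesHash(value []byte, recordedHex string) bool {
	return hashHex(value) == recordedHex
}

func main() {
	// Stands in for the digest every peer keeps for the private value.
	recorded := hashHex([]byte("bob"))

	fmt.Println(matchesHash([]byte("bob"), recorded))   // prints true
	fmt.Println(matchesHash([]byte("alice"), recorded)) // prints false
}
```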

And we see the argument list when we invoke the chaincode. The three base64-encoded arguments are setPrivate, name, bob, respectively.

In Scenario 2, the input data is visible and kept permanently in the transaction.

This may be a problem if we care about data privacy. On one hand, the data is stored as Private Data so that only designated organizations and their peers can hold it. On the other hand, the same piece of data is visible in the input arguments and kept permanently in the blockchain by all peers. If this is not desired, we need to keep the input data out of the blockchain. That is the reason for using Transient Data as input.

Scenario 3

Scenario 3 is the recommended one if we wish to make sure the data input is not stored inside the blockchain. Now data is input as transient data and stored in the private data collection of Org1. We invoke the functions setPrivateTransient and getPrivate.

In our chaincode we code the function such that the Transient Data is in a specific JSON format, {"key":"some key", "value":"some value"} (lines 134–137 in our chaincode). We also require the transient data to come under a key called keyvalue (line 149 in our chaincode). To use Transient Data in the CLI, we first encode it in Base64.
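The same encoding can also be produced programmatically, for example in a script that prepares the value for the --transient flag. The sketch below is only an illustration: transientFlag and decodeTransientFlag are hypothetical helper names, and the field name keyvalue and the JSON layout are the ones described above.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// keyValue is the JSON layout the chaincode expects in the transient entry.
type keyValue struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// transientFlag builds the JSON string for --transient: a map from the
// field name to the Base64 encoding of the JSON payload.
func transientFlag(field string, kv keyValue) (string, error) {
	payload, err := json.Marshal(kv)
	if err != nil {
		return "", err
	}
	encoded := base64.StdEncoding.EncodeToString(payload)
	flag, err := json.Marshal(map[string]string{field: encoded})
	if err != nil {
		return "", err
	}
	return string(flag), nil
}

// decodeTransientFlag reverses transientFlag, which is handy for checking
// what is about to be sent.
func decodeTransientFlag(flag, field string) (keyValue, error) {
	var outer map[string]string
	if err := json.Unmarshal([]byte(flag), &outer); err != nil {
		return keyValue{}, err
	}
	encoded, ok := outer[field]
	if !ok {
		return keyValue{}, fmt.Errorf("field %q not found", field)
	}
	payload, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return keyValue{}, err
	}
	var kv keyValue
	if err := json.Unmarshal(payload, &kv); err != nil {
		return keyValue{}, err
	}
	return kv, nil
}

func main() {
	flag, err := transientFlag("keyvalue", keyValue{Key: "name", Value: "charlie"})
	if err != nil {
		panic(err)
	}
	fmt.Println(flag)
}
```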

export KEYVALUE=$(echo -n "{\"key\":\"name\",\"value\":\"charlie\"}" | base64 | tr -d \\n)
docker exec cli peer chaincode invoke -o orderer.example.com:7050 --tls --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem --peerAddresses peer0.org1.example.com:7051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls/ca.crt --peerAddresses peer0.org2.example.com:9051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org2.example.com/peers/peer0.org2.example.com/tls/ca.crt -C mychannel -n mycc -c '{"Args":["setPrivateTransient"]}' --transient "{\"keyvalue\":\"$KEYVALUE\"}"
docker exec cli peer chaincode query -C mychannel -n mycc -c '{"Args":["getPrivate","name"]}'

Again we first inspect the worldstate. It is similar to what we saw in Scenario 2: the actual data is only in peer0.org1.example.com, while the hash is in both peers. Note that the revision of the value is now 2, whereas it was 1 after Scenario 2, because this chaincode invocation updates the same key.

Similar to Scenario 2, in the transaction recorded in the blockchain, we see no Write Set.

In Scenario 3, there is no RWSet recorded in transaction.

And we see no data in the argument list when we invoke the chaincode. The only argument is the function name, setPrivateTransient. The data {"key":"name", "value":"charlie"} cannot be found in the blockchain.

In Scenario 3, we do not see the data as input arguments.

We see that Private Data combined with Transient Data provides a certain level of data privacy. (It is arguable whether the way a Fabric network handles the hash is secure enough, but that is out of the scope of this article.)

Scenario 4

Finally we come to an imaginary scenario. In Scenario 4, data is input as transient data and stored in the public state of the ledger. We invoke the functions setTransient and get.

export KEYVALUE=$(echo -n "{\"key\":\"name\",\"value\":\"david\"}" | base64 | tr -d \\n)
docker exec cli peer chaincode invoke -o orderer.example.com:7050 --tls --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem --peerAddresses peer0.org1.example.com:7051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls/ca.crt --peerAddresses peer0.org2.example.com:9051 --tlsRootCertFiles /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org2.example.com/peers/peer0.org2.example.com/tls/ca.crt -C mychannel -n mycc -c '{"Args":["setTransient"]}' --transient "{\"keyvalue\":\"$KEYVALUE\"}"
docker exec cli peer chaincode query -C mychannel -n mycc -c '{"Args":["get","name"]}'

The public state is updated, and both peers keep the same data. Note that the revision is now 2, as revision 1 was written in Scenario 1.

In the Write Set we see that the key-value pair is name and david (base64 encoded), which we supplied through transient data.

In Scenario 4, the RWSet contains data written to public state in the ledger.

And we do not see the input data in arguments. The argument we see is the function name setTransient.

In Scenario 4, we do not see the data as input arguments.

This ends our demonstration.

Summary

Although Transient Data is commonly used together with Private Data, this article tries to make it clear that they are different concepts. Private Data determines how and where a piece of data is stored in the worldstate database, while Transient Data is an input method for cases where we do not wish to keep sensitive input data in the blockchain as a permanent record. Through the demonstration, we inspected both the transaction and the worldstate database, which gives us a clearer picture of what happens after we invoke chaincode functions. Our demo chaincode also shows how to use Private Data and Transient Data in chaincode.

Written by KC Tam