Improvement of Symbol Testnet Blockchain Stability through NEMTUS Community Stress Tests
Ken-ichi Tezuka* and Yuta Takahashi
Department of Tissue and Organ Development, Gifu University Graduate School of Medicine, *; NEMTUS.
Background:
Blockchain is a distributed ledger that provides transparency and a single shared version of the data. The integrity of the data is assured by the cryptographic linking of blocks. ShizuiNet is a private blockchain designed as a traceability solution for biological resources. Barcodes printed on tubes and containers are recorded in the blockchain. ShizuiNet is currently being tested on the Symbol testnet (Symbol) and on a private chain, Mijin-Catapult v.2 (Mijin), provided by Tech Bureau Holdings. We run both systems side by side to evaluate the pros and cons of the two types of blockchain for our purpose. In this report, we used ShizuiNet devices to evaluate the performance drop caused by heavy load on the Symbol blockchain.
Methods:
We used devices designed for ShizuiNet to send a constant, sequential flow of transactions and to monitor the performance of the blockchains. Each device consisted of a Raspberry Pi 4 board connected to a barcode reader. Forty-eight barcoded tubes were scanned, and 2,500 transfer transactions were generated for each tube, for a total of 120,000 (120k) ShizuiNet transactions. These transactions were temporarily stored in a single sqlite3 database file.
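The queueing step described above can be sketched as follows. This is only an illustration under stated assumptions, not the ShizuiNet device code: the table name `tx_queue`, its columns, the helper `build_queue`, and the synthetic barcodes are all hypothetical. Only the counts (48 tubes, 2,500 transactions per tube) come from the text.

```python
import sqlite3

TUBES = 48           # barcoded tubes scanned (from the text)
TX_PER_TUBE = 2500   # transactions generated per tube (from the text)

# Hypothetical schema: the timestamp and status columns are left empty here
# and would be filled in when a transaction is sent and answered.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tx_queue (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    barcode    TEXT NOT NULL,
    seq        INTEGER NOT NULL,
    sent_at    TEXT,
    replied_at TEXT,
    status     TEXT
)
"""

def build_queue(conn, barcodes, tx_per_tube=TX_PER_TUBE):
    """Create the queue table and insert one row per planned transaction."""
    conn.execute(SCHEMA)
    rows = [(barcode, seq) for barcode in barcodes for seq in range(tx_per_tube)]
    with conn:  # commit all inserts as a single transaction
        conn.executemany(
            "INSERT INTO tx_queue (barcode, seq) VALUES (?, ?)", rows
        )
    return len(rows)

conn = sqlite3.connect(":memory:")  # a file path would be used on a device
barcodes = [f"TUBE{n:04d}" for n in range(TUBES)]  # synthetic placeholders
total = build_queue(conn, barcodes)
print(total)
```

Keeping the whole queue in one database file means the sending loop can resume after an interruption by selecting the rows whose `sent_at` is still empty.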
An example of a message recorded in a ShizuiNet transaction:
2021-01-28 13:34:18.18:NA0001605506:prod00002161
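A message of this shape can be split into its parts as sketched below. The field names `barcode` and `tag` are assumptions made for illustration, since the text does not name the fields, and the sketch assumes the date is written with ASCII hyphens.

```python
from datetime import datetime
from typing import NamedTuple

class ShizuiMessage(NamedTuple):
    timestamp: datetime
    barcode: str   # assumed meaning of the second field
    tag: str       # assumed meaning of the third field

def parse_message(raw: str) -> ShizuiMessage:
    """Split a colon-separated message into timestamp and two trailing fields."""
    # The timestamp itself contains colons, so split from the right.
    stamp, barcode, tag = raw.rsplit(":", 2)
    timestamp = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return ShizuiMessage(timestamp, barcode, tag)

msg = parse_message("2021-01-28 13:34:18.18:NA0001605506:prod00002161")
print(msg.timestamp.isoformat(), msg.barcode, msg.tag)
```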
These transactions were stored in database files on the ShizuiNet devices and sent one by one to nodes running Symbol (Symbol 0.10.0.5, with the harvesting strategy ‘oldest’) or Mijin (fushicho-version5) during the first stress test (stress test #1, held by NEMTUS on January 28, 2021, 19:00–22:00 JST). A Symbol 0.10.0.6 server provided by NGL (ap01-.ap-northeast-1.testnet.symboldev.network) was used for the second test (stress test #2, held on February 22, 2021, 20:00–21:45 JST). The Symbol server ‘ibone73’ was equipped with 32-core dual Xeon processors and 64 GB of memory. The Mijin server ‘ibone61’ was equipped with a 4-core Core i5 processor and 32 GB of memory. The transaction sending rate was 1 transaction/second (tps) for Symbol and 0.8 tps for Mijin.
Each transaction carried either one testshizui:consumption or one testshizui:production mosaic, with a maximum transaction fee of 0.1 XYM. When a transaction was generated and sent to a node, a timestamp was recorded in the sqlite3 database file. When the server returned a ‘SUCCESS’ message or an ‘ERROR’ code, the response was also recorded in the same database file with a timestamp, so that the response time of the blockchain could be measured. Under stress testing, nodes returned no response for some transactions; we treated these as orphan transactions.
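The bookkeeping described above reduces to three outcomes per transaction. The sketch below shows one way to derive the outcome and the response time from the recorded timestamps. The function name `classify` and the exact reply strings are assumptions; the text states only that a ‘SUCCESS’ message or an error code was recorded, and that transactions without any response were treated as orphans.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

def classify(sent_at: datetime,
             replied_at: Optional[datetime],
             reply: Optional[str]) -> Tuple[str, Optional[float]]:
    """Return (outcome, response time in seconds) for one transaction."""
    if replied_at is None or reply is None:
        # No response was ever recorded: treated as an orphan transaction.
        return "orphan", None
    latency = (replied_at - sent_at).total_seconds()
    outcome = "success" if reply == "SUCCESS" else "error"
    return outcome, latency

t0 = datetime(2021, 1, 28, 19, 0, 0)
print(classify(t0, t0 + timedelta(seconds=42), "SUCCESS"))
print(classify(t0, t0 + timedelta(hours=2), "Failure_Core_Past_Deadline"))
print(classify(t0, None, None))
```

In practice a cutoff time would be needed to decide that no response is coming; the text does not state which cutoff was used.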
Daoka-cannons and mini-Daoka-cannons were large and small sets of aggregate transactions designed for the NEMTUS stress tests. Each aggregate transaction contained a complicated series of 90 nested transactions; the two sets produced approximately 6,000 and 1,000 transactions/block, respectively.
Results:
For the Mijin private blockchain, more than 98% of transactions were incorporated into the blockchain, demonstrating the high fidelity of the catapult-symbol blockchain (Table 1). The same level of fidelity was observed with Symbol testnet servers under non-stress conditions (data not shown). When transactions were sent under stress in test #1, the Symbol server (0.10.0.6) returned a number of errors, and as a result only 78% of the transactions were incorporated. A considerable number of orphan transactions (2.8%) was also observed on the Symbol network.
In test #1, most of the error messages fell into two groups. One third of the errors were returned shortly (< 300 seconds) after the transaction was announced; in these cases, the error message was “Failure_Chain_Unconfirmed_Cache_Too_Full” (cache_full). Two thirds of the errors took more than 300 seconds (in most cases 2 hours after the transaction was announced) and reported “Failure_Core_Past_Deadline” (deadline_timeout). Most of the cache_full errors were observed under severe stress, whereas the deadline_timeout errors continued to be reported even after the stress test period (Figure 1).
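The grouping described above, by error code and by whether the error arrived within 300 seconds, can be sketched as follows. The sample records are synthetic and the helper `group_errors` is hypothetical; only the two error codes and the 300-second boundary come from the text.

```python
from collections import Counter

FAST_ERROR_LIMIT_S = 300  # boundary between the two groups (from the text)

def group_errors(errors):
    """Count errors by (code, 'fast' or 'slow').

    `errors` is an iterable of (error_code, response_time_in_seconds).
    """
    counts = Counter()
    for code, latency in errors:
        speed = "fast" if latency < FAST_ERROR_LIMIT_S else "slow"
        counts[(code, speed)] += 1
    return counts

# Synthetic example records, not measured data.
sample = [
    ("Failure_Chain_Unconfirmed_Cache_Too_Full", 12.0),
    ("Failure_Core_Past_Deadline", 7200.0),
    ("Failure_Core_Past_Deadline", 7150.0),
]
groups = group_errors(sample)
for key, n in sorted(groups.items()):
    print(key, n)
```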
Daoka-cannons took effect one hour after the stress test, causing a significant delay in the synchronization of a number of nodes, and eventually many low-performance nodes stopped working. This effect appears as an increase in deadline_timeout errors in Figure 1. Interestingly, the ibone73 node, equipped with a relatively large amount of memory, did not drop immediately; however, two days after the stress test its api-gateway stopped functioning. The api-gateway was restarted on the following day (node restart in Figure 1). The number of errors remained relatively high after the NEMTUS test and the Daoka-cannons until the node dropped; after the node was restarted, the error rate returned to its normal level. The same tendency was observed in the increased incorporation time of each transaction (data not shown).
In the second stress test, all of the reported error messages were “Failure_Core_Past_Deadline”. The success rate improved remarkably (0.10.0.7 in Table 1). Intensive bug fixes and fine-tuning appear to have been made by the core developers in this version. Even the Daoka-cannons and the newly designed mini-Daoka-cannons showed no obvious effects this time (Figure 2).
Discussion:
Our Symbol 0.10.0.6 node survived the first NEMTUS stress test and the Daoka-cannons, but a sustained high error rate was observed and the server eventually went down after 2 days. This suggests that, even for a node with a high-spec hardware configuration, excess network load generated by a flood of transactions and/or by certain aggregate transactions consisting of complicated inner transactions (Daoka-cannons) can have a lasting effect on the function and stability of the node. We therefore recommend that Symbol network nodes be watched for at least 2 days after such disruptive events to assess the stability of the network. The delayed and sustained effects of overload on a node can be detected as a slightly increased error rate and/or a slightly longer time required for each transaction request to be processed and incorporated into the blockchain. During and after test #2, no such phenomena were observed; the causes of the node malfunction therefore appear to have been removed in the updated version 0.10.0.7.
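One way to watch for the slightly increased error rate mentioned above is a rolling window over the outcomes of the probe transactions. The sketch below is an assumption-laden illustration, not the method used in this study: the class `RollingErrorRate`, the window length, and the sample events are all hypothetical.

```python
from collections import deque

class RollingErrorRate:
    """Error rate over the most recent `window_s` seconds of probe results."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = deque()   # (timestamp in seconds, is_error)
        self.errors = 0

    def add(self, t: float, is_error: bool) -> None:
        """Record one result and drop results older than the window."""
        self.events.append((t, is_error))
        self.errors += int(is_error)
        while self.events and self.events[0][0] <= t - self.window_s:
            _, old_is_error = self.events.popleft()
            self.errors -= int(old_is_error)

    def rate(self) -> float:
        return self.errors / len(self.events) if self.events else 0.0

monitor = RollingErrorRate(window_s=10)
for t, is_error in [(0, True), (1, False), (2, False), (3, False)]:
    monitor.add(t, is_error)
rate_during = monitor.rate()
monitor.add(12, False)   # results at t = 0, 1, 2 fall out of the window
rate_after = monitor.rate()
print(rate_during, rate_after)
```

With a probe rate of 1 tps, a window of one hour (3,600 seconds) would hold about 3,600 results, which should be enough to make a small rise in the error rate visible; the appropriate window length is a judgment call.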
Conclusion:
Monitoring a blockchain by sending a continuous flow of transactions is useful for detecting troublesome node behavior and for monitoring the health of the network. The Symbol testnet (version 0.10.0.7) appears to be sufficiently stable for the mainnet launch.