Using Geth fast sync mode vs full sync

For months, I have been running Geth in full mode, where it downloads and processes the entire Ethereum blockchain. I originally tried this on a mechanical drive (HDD), but it was so slow that it became obvious it would never finish downloading the chain in my lifetime. Some simple checking on the web showed that an SSD is the only way to sync the chain data in a reasonable time. Even with an SSD, it took well over two weeks to sync the whole chain. Then, whenever I did not run Geth for a number of weeks, it would typically take multiple days to catch up to the head of the chain. Ethereum mines a new block every 15-20 seconds or so, which makes it hard to catch up if each block takes 5-6 seconds to sync.
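To put rough numbers on that catch-up problem, here is a minimal Python sketch. It assumes a constant block interval and a constant sync time per block, which a real network does not give you. The function name `catch_up_seconds` and the three-weeks-offline example are my own, purely for illustration; nothing here comes from Geth itself.

```python
import math

def catch_up_seconds(backlog_blocks, sync_secs_per_block, block_interval_secs):
    """Estimate the seconds needed to reach the chain head.

    Assumes (hypothetically) constant rates: the node syncs one block every
    sync_secs_per_block seconds while the network adds one new block every
    block_interval_secs seconds.
    """
    sync_rate = 1.0 / sync_secs_per_block      # blocks synced per second
    growth_rate = 1.0 / block_interval_secs    # new blocks mined per second
    net_rate = sync_rate - growth_rate         # how fast the backlog shrinks
    if net_rate <= 0:
        return math.inf                        # the node never catches up
    return backlog_blocks / net_rate

# Assumed example: three weeks offline, a 15 second block interval,
# and 5.5 seconds to sync each block.
backlog = 21 * 24 * 3600 / 15
days = catch_up_seconds(backlog, 5.5, 15) / 86400
print(f"backlog of {backlog:.0f} blocks, about {days:.1f} days to catch up")
```

The closer the sync time per block gets to the block interval, the longer the catch-up takes. Once the sync time reaches or exceeds the block interval, the node never gets to the head at all.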

Some time back I purchased a 500GB SSD to host the Litecoin, Bitcoin and Ethereum blockchain data. Lately I have noticed that the Ethereum data has been growing VERY rapidly. When the free space went below 100GB, I became concerned, and when it dipped below 50GB in the last week I decided to look at alternatives.

Geth comes with a “fast” mode. Apparently you used to have to force it to use that mode, as it was not very stable in its early days. Now it is the default mode if you do not already have a chaindata folder. So today I renamed the full chaindata folder (I did not delete it, in case I wanted to go back to full mode) and restarted Geth. As the articles said, it started in fast mode. As I understand it, that means it downloads the blocks without re-executing every transaction, and then fetches a recent copy of the state data. The Ethereum blockchain is around 4.9M blocks long as of today. I started the download at around 11:25AM. 20 minutes later it had already reached the 1 millionth block. Wow, this was going to be easy!

Well, as you might expect, things slowed down a bit. By 2PM, we were close to the 4 millionth block. It also started to hit glitches where it rejected the downloaded block data, or could not find peers with the data. At around 4:30PM I stopped Geth and restarted it, which seemed to get it moving fairly fast again.

However, it regularly finds chain data that it rejects, and then drops back a fair number of blocks, sometimes 20K or more. When it restarts with a new peer or with data it likes, it takes a few minutes to get back to where it supposedly was before the glitch. These are typical messages when this happens:

WARN [01-27|20:04:05] Stalling state sync, dropping peer peer=a6684f6d59396aa6
WARN [01-27|20:04:05] Stalling state sync, dropping peer peer=83383bee5293a537
WARN [01-27|20:04:17] Stalling state sync, dropping peer peer=f168da4907a2b0f2
WARN [01-27|20:04:17] Node data write error err="state node 649212…53a470 failed with all peers (0 tries, 0 peers)"

It is now around 7:55PM and it still has around 175,000 blocks to go. It should finish in another hour, so about 9.5 hours to download a chain that previously took multiple weeks. That's a pretty good improvement, although not as good as the articles online that suggested it would sync in under 2 hours!

As for disk storage, I will add the final numbers when it's done, but here is the summary so far:

old chaindata folder                  408.7GB

new fast chaindata folder           50GB+

Once it finishes and all wallets report the correct amount of ether, I will do a test transfer from my main wallet to the ledger wallet, and then delete the old chain files.

One bad interaction noted: when I tried to run WinDirStat (a folder size tool) on the folders where Geth was storing its data, both programs went into a bad state. I had to press Ctrl-C 10 times to force Geth to stop, and kill WinDirStat through Task Manager. I will avoid doing this in the future!

Next morning: It was still running when I went to bed. I came down this morning to see Ethereum Wallet still reporting 200 blocks to go and not changing. The Geth screen is showing an endless series of “Imported new state entries” messages. I am assuming this is normal and will let it go for a while.

Each line reported looks like this:

INFO [01-28|07:38:28] Imported new state entries count=1851 elapsed=13.983ms processed=1967020 pending=23290 retry=2 duplicate=151 unexpected=1752

The “processed” value increases monotonically. The “pending” value varies up and down. My guess is that pending is how many state entries are in the queue, and processed is how many it has pulled together into the state database. I am not sure whether the value of processed matches the block number it is up to. We'll see when and if it finishes. In the meantime, the wallet still shows the same number of blocks to go and is not changing, so this state update process evidently does not advance the final blocks. Also, I should note that the wallet values have not changed from zero. When I used the full chaindata and it reached blocks where I had transactions, the wallets were updated to show their values as of that block. Fast mode does not seem to do that.
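To watch those counters without reading the console by eye, the log lines can be split into their key=value fields. This is only a sketch based on the sample line shown above, not an official parser; the function name `parse_geth_fields` is hypothetical, and the log format may differ between Geth versions.

```python
import re

def parse_geth_fields(line):
    """Extract key=value pairs from a Geth log line into a dict.

    Values made up only of decimal digits become ints; everything else
    (durations like '13.983ms', peer ids) stays a string.
    """
    fields = {}
    for key, value in re.findall(r"(\w+)=(\S+)", line):
        fields[key] = int(value) if value.isdecimal() else value
    return fields

# The sample line from this post, split across string literals for width.
sample = ("INFO [01-28|07:38:28] Imported new state entries "
          "count=1851 elapsed=13.983ms processed=1967020 pending=23290 "
          "retry=2 duplicate=151 unexpected=1752")
fields = parse_geth_fields(sample)
print(fields["processed"], fields["pending"])
```

Feeding successive lines through this and comparing `processed` and `pending` over time would show whether the queue is actually draining.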

So at this point we are up to 18 hours and counting, even in fast mode. Also, the new chaindata is now approaching 50GB and I am out of disk space, so I have to delete some of the old chaindata. Let's see if that helps the process any… (I freed up 345GB. The process seems to be running the same, but at least it's proceeding.)

An hour later: The “processed” value just shot through 5 million, which is larger than the number of blocks in the chain, so it's a count of state entries, not blocks. It's still growing rapidly with no signs of slowing. We'll see how long this takes! I have to leave for the airport in 6 hours. I wonder if it will be done by then?

10:30AM update: Almost 20 million now processed. Every once in a while it hits a glitch with the message:

WARN [01-28|10:27:17] Stalling state sync, dropping peer peer=a99c272ecbd35d7b

but then after some minutes it picks up and continues state processing. The chaindata folder is now up to 55GB and continues to grow as state gets processed.

3 days later: I was gone for 3 days for work. While I was away, it appears to have finished the state updates, and it is now operating normally, fetching new blocks as they are published on the blockchain. After all this, the new chaindata folder is just shy of 75GB. I will update this blog as I see how fast it grows over time.

The actual value of ether held in each wallet is also now correct.

Summary: The fast mode of Geth is a major improvement in disk usage for the Ethereum blockchain. The new chaindata folder is just under 75GB versus 408.7GB for the old one, a reduction of more than a factor of 5! However, it takes longer than expected to finish the process, with state updates taking much of that time.

Addendum: If you shut Geth down and restart it some hours or days later, it needs to catch up with the blockchain. It appears that fast mode does not help you here. It goes back to downloading blocks at about 1 per second, which is about the rate I was getting when downloading in full mode. So the advantages of fast mode are only seen when you initially start using Geth. I am betting that the storage will also grow as it did in full mode. This would imply that the way to reduce storage, once it grows too large, is to delete the old chain and start over in fast mode, forcing Geth to download the compact version of the chain. But I will watch and see how this grows going forward.

 

 
