Validate some Geth behaviour & assumptions please

I wanted to validate some Geth behaviour & assumptions, please.
Host: Windows 10, multiple SSDs, 32 GB RAM, 3 GHz CPU & 50 Mbit/s bandwidth
Geth/v1.10.8-stable-26675454/windows-amd64/go1.16.4

1) I note that when syncing (full) on Ropsten, the first 1–2 million blocks come in within an hour, then a progressive slowdown occurs, to the point that the final million blocks, from 10 to 11 million (highestBlock: 11053360 at the time of writing), can take days.

Is this because my Geth node is building the blockchain (rather than simply downloading a flat file or DB structure), such that each new block is progressively more work to add to the chain?

As I type, block 5 million is processing; resource usage is 60 MB/s SSD I/O, 6.5 GB+ RAM, 50% CPU and 20 Mbit/s bandwidth, which implies this is significantly more than just a download.
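As a rough way to quantify the slowdown, here is a minimal Go sketch that polls the standard eth_syncing JSON-RPC method once a minute and prints currentBlock vs highestBlock. It assumes the node was started with --http and is listening on the default localhost:8545; it is illustrative, not my actual setup:

```go
// Poll Geth's eth_syncing RPC and log sync progress over time.
// Assumes geth was started with --http (default port 8545).
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

type syncStatus struct {
	CurrentBlock string `json:"currentBlock"` // hex-encoded block number
	HighestBlock string `json:"highestBlock"`
}

func main() {
	for {
		req := []byte(`{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}`)
		resp, err := http.Post("http://localhost:8545", "application/json", bytes.NewReader(req))
		if err != nil {
			fmt.Println("rpc error:", err)
			return
		}
		var out struct {
			Result json.RawMessage `json:"result"`
		}
		json.NewDecoder(resp.Body).Decode(&out)
		resp.Body.Close()

		// eth_syncing returns false when not syncing, or an object while syncing.
		var st syncStatus
		if json.Unmarshal(out.Result, &st) == nil && st.CurrentBlock != "" {
			cur, _ := strconv.ParseUint(strings.TrimPrefix(st.CurrentBlock, "0x"), 16, 64)
			high, _ := strconv.ParseUint(strings.TrimPrefix(st.HighestBlock, "0x"), 16, 64)
			fmt.Printf("block %d of %d\n", cur, high)
		} else {
			fmt.Println("not syncing (or fully synced):", string(out.Result))
		}
		time.Sleep(time.Minute)
	}
}
```

Logging this every minute makes the blocks-per-minute decay easy to see.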

This building process seems inefficient. If a snapshot were taken, say, every million blocks, my node could simply request the 11-million-block database snapshot, then sync the final blocks.

Forcing everyone to build a 'frozen' blockchain (i.e. millions of blocks that are immutable) seems unnecessarily resource-intensive & significantly slows the process of bringing a new node online vs. a simple p2p-style download of a snapshot.
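One catch with a snapshot download is verification: a node that builds the chain itself checks every block, whereas a downloaded DB would need to be checked against some trusted value. A minimal sketch of such a checkpoint check, using go-ethereum's ethclient against the local node (the height and hash below are placeholders, not real Ropsten values):

```go
// Checkpoint-style verification for a hypothetical downloaded snapshot:
// fetch the block at a pinned height from the local node and compare its
// hash to a value obtained out-of-band. Placeholder values throughout.
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("http://localhost:8545")
	if err != nil {
		log.Fatal(err)
	}

	// Pinned checkpoint: height and expected hash (placeholder, not a real hash).
	height := big.NewInt(10_000_000)
	want := common.HexToHash("0x0000000000000000000000000000000000000000000000000000000000000000")

	block, err := client.BlockByNumber(context.Background(), height)
	if err != nil {
		log.Fatal(err)
	}
	if block.Hash() == want {
		fmt.Println("checkpoint matches")
	} else {
		fmt.Printf("checkpoint MISMATCH: got %s\n", block.Hash())
	}
}
```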

To be clear, no disrespect intended to the developers & team.
Am I missing something in the design here?

2) I have noticed that the Geth database corrupts very easily & does not appear to have much in the way of automatic repair/rollback. In my experience, the solution to any corruption is to re-download & rebuild the entire blockchain. I have downloaded the Ropsten blockchain about 6 times in the last 2 months, due to various Windows crashes & forced patch reboots. On a few occasions, Geth shut down cleanly, then reported the DB as corrupt on restart.

I have downloaded Ropsten so many times that I am now periodically shutting Geth down every 2 million blocks and taking my own directory snapshots, so I have a manual rollback point and don't need to download the entire blockchain on a regular basis.
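This routine is easy to script. A minimal sketch that copies the chaindata directory to a timestamped backup; it assumes Geth has been shut down cleanly first, and the path is an example to be adjusted to your own datadir:

```go
// Copy the (stopped) Geth chaindata directory to a timestamped backup.
// Geth MUST be fully shut down first; the paths below are examples only.
package main

import (
	"io"
	"log"
	"os"
	"path/filepath"
	"time"
)

func main() {
	src := `C:\geth\ropsten\geth\chaindata` // adjust to your own datadir
	dst := src + "-backup-" + time.Now().Format("20060102-150405")

	err := filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, _ := filepath.Rel(src, path)
		target := filepath.Join(dst, rel)
		if info.IsDir() {
			return os.MkdirAll(target, info.Mode())
		}
		in, err := os.Open(path)
		if err != nil {
			return err
		}
		defer in.Close()
		out, err := os.Create(target)
		if err != nil {
			return err
		}
		defer out.Close()
		_, err = io.Copy(out, in)
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot written to", dst)
}
```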

I don't understand why blocks are being held in a database in some kind of writable state. I would have thought that if I were accessing a read-only file and the host OS went down, the chances of corruption should be low to zero.

I don't understand why, every million blocks or so, the chain isn't moved to a read-only database, meaning that in a worst-case improper shutdown, you have an automatic rollback point at the last million and only need to sync recent blocks.
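The intuition here is essentially an append-only file: writes only ever extend the end, so a crash can at worst lose the unflushed tail rather than corrupt earlier history. A toy Go sketch of the pattern (not Geth's actual storage code):

```go
// Toy illustration of crash-tolerant append-only storage: each record is
// appended and fsynced, so a crash can only truncate the tail, never
// corrupt already-written history. A sketch of the general pattern only.
package main

import (
	"log"
	"os"
)

func main() {
	f, err := os.OpenFile("blocks.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	record := []byte("block 11000000: <encoded block data>\n")
	if _, err := f.Write(record); err != nil {
		log.Fatal(err)
	}
	// fsync before acknowledging: a crash after this point cannot lose
	// or corrupt this record, only whatever comes after it.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}
```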

I believe this may(?) be the concept behind 'ancient' blocks in Geth. I did once attempt deleting the 'chaindata' LDB files and letting the blockchain rebuild from the ancient data, though the process took so long (3+ days) that I could likely have re-downloaded the entire blockchain in the same time.

It would seem that, as of today, all 11 million blocks sit in a writable DB structure which, at least in my experience, only ever crashes hard.

I understand that in a professionally hosted environment with a team of sysadmins you may not have these issues, though I see Geth as aimed more at the hobby end of the market, where unexpected things can and do happen.

Again, am I missing something in the database design here?

Thanks for reading & all feedback warmly received.
