ZFS over iSCSI Storage in Proxmox (blog.haschek.at)
41 points by transpute on Jan 30, 2024 | 20 comments



Turn on Jumbograms. I ran iSCSI for ZFS served out via NFS on distinct cards so there was less contention between the disk fetch and the NFS serving, and it worked "ok", but that was FreeBSD (partly) and a 9000 MTU definitely made a difference. Possibly right-sizing the MTU to be bigger than the blocksize is its own tuning question, but 9k jumbo definitely improved things.

Why send 4 packets when one will do? Same volume of data, less switch burden to latch it through.
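A quick back-of-the-envelope sketch of the header math (the 128 KiB burst size and plain IPv4/TCP headers are my assumptions, illustrative rather than measured):

    import math

    # Illustrative only: frames and header bytes needed to move one burst
    # at standard vs. jumbo MTU.
    payload = 128 * 1024            # bytes in one read burst (assumed)
    headers = 14 + 20 + 20          # Ethernet + IPv4 + TCP, no options

    for mtu in (1500, 9000):
        per_frame = mtu - 40        # IPv4 + TCP headers count against the MTU
        frames = math.ceil(payload / per_frame)
        print(f"MTU {mtu}: {frames} frames, {frames * headers} header bytes")

Roughly 90 frames at 1500 versus 15 at 9000 for the same data.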


After years in enterprise storage with endless performance testing: there's almost no point. Modern CPUs and NICs barely benefit. In 2005 when we had dual-core CPUs that were constantly buried and NICs with 0 offload, it made a ton of sense - heck we had dedicated iSCSI HBAs (QLA4010 represent!).

That being said, if you've got three servers and two VLANs with no worries about the jumbos ever escaping: I guess? But if you see even a 5% performance increase, I'll be shocked. On the flip side you're one misconfiguration away from endless troubleshooting if those jumbos escape.

Also *jumbo frames.


That's great feedback. If it doesn't help, don't do it.


It'd be great to have an MTU of say 64 KiB or greater.

Although I guess you'd also need a longer-than-32-bit CRC to detect all possible 3-bit errors past an 11 kB frame size. A 40-bit CRC would be sufficient, at least up to a 188 kB frame size or so.


See perhaps "Best CRC Polynomials":

* https://users.ece.cmu.edu/~koopman/crc/


Thanks. Pretty nice!


If we were redoing Ethernet, I wouldn't mind removing the CRC completely. If you want end-to-end reliability, you should do it in the layer above Ethernet. If you want per-link packet validation, we've already been layering advanced FEC algorithms at the physical layer for high-speed Ethernet. The advantage of the latter is that it's both optional and replaceable without requiring even more dynamic bits or redundant functionality in the layer-2 packet. Then for MTU, make it a 32-bit field instead of a 16-bit field in case anyone wants to make hardware that supports more than 64k in the future.


Some people say that for TCP, smaller packets give better acknowledgement pacing. iSCSI is mostly over local, single-switch links for me, but for general-purpose streamed TCP data it may well be that "smaller is better" for rate estimates and window management.


Nothing prevents you from using a smaller MTU... well "TU" for TCP frames.

Fewer packets to process would speed things up. Fewer headers to process.

Also if your TCP flow bandwidth is counted in tens or hundreds of gigabits per second, there's still going to be plenty of ACKs.
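Rough numbers for that, assuming delayed ACKs (one ACK per two full segments) and jumbo-sized segments; the figures are illustrative:

    # Approximate ACK rate at a given line rate; all values are assumptions.
    line_rate = 100e9               # bits per second
    frame_bits = 9000 * 8           # one jumbo-sized segment on the wire

    segments_per_sec = line_rate / frame_bits
    acks_per_sec = segments_per_sec / 2        # delayed ACKs: one per two segments
    print(f"{segments_per_sec:,.0f} segments/s, ~{acks_per_sec:,.0f} ACKs/s")

That still works out to around 700k ACKs per second at 100 Gbit/s.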


MSS (maximum segment size) is the term at the TCP layer. Each end of a connection can (and usually does) declare its MSS in a TCP option in the first packet it sends.

Advertised MSS, interface MTU, and route MTU can all constrain packet sizing.

Using large-MTU routes for internal destinations can work well.
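If you want to see what actually got negotiated, you can read it off a connected socket; a minimal Python sketch (Linux behaviour assumed, and the destination is just a placeholder):

    import socket

    # Read the kernel's current maximum segment size for this connection.
    with socket.create_connection(("example.com", 80)) as s:
        mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
        print("effective MSS:", mss)

The value reflects whatever the advertised MSS, interface MTU, and route MTU worked out to.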


Right, vaguely aware of that, but been a while. Thanks for correcting!


What amount of packet loss were you experiencing in your setup?


Low enough that we had viable mounts before, but the retransmit counts were big. I don't have the host any more; I moved to an iXsystems TrueNAS. Probably I should have looked harder on the provider side.


One can then use a local disk as a ZIL to improve IOPS (rough command sketch at the end of this comment).

When "Hybrid Storage Pools" storage pools were first introduced in 2008, when flash was still really expensive, this was a clever way of balancing speed and bulk storage with budget constraints:

* https://ahl.dtrace.org/2008/11/10/hybrid-storage-pools-in-th...

Nowadays flash is cheap/er, so all-flash storage is much more popular, with many storage products able to do tiered storage where (c)older data bits are shuffled from fast-expensive flash to slow-cheaper spinning rust.
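For the ZIL part, attaching a fast local device to an existing pool is a one-liner; here's a hedged sketch (the pool name "tank" and the NVMe path are made up, adjust to your setup):

    import subprocess

    # Add a fast local device as a separate log (ZIL/SLOG) vdev to a pool.
    # Pool name and device path are placeholders.
    subprocess.run(["zpool", "add", "tank", "log", "/dev/nvme0n1"], check=True)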


I hope the author really does mean "home lab" and not "home production". Having run my own personal disk array for over two decades, this is like the opposite of what I've come to want. The simpler and more straightforward things can be, the better. Otherwise when things fail (and they will fail, despite that redundancy (or even perhaps because of it)), you'll end up with circular dependencies that make diagnosing and fixing things quite painful.


I have a similar setup but for networking I find some really cheap InfiniBand cards (20-40) on eBay and configure iSER in tatgetcli


> iSER in tatgetcli

Google only lists your comment as a result. What is that?



targetcli might be the correct word, and iSER seems to be RDMA for iSCSI.


This is great; setting the same thing up with TrueNAS wasn't as easy as I had hoped.



