O_DIRECT seems like overkill, and the lack of write buffering could be a real detriment in some circumstances. Syncing at the end of each operation (from the user's perspective) should give the best mix of throughput and safety, but it makes an accurate progress bar hard to do. Before the whole batch operation is finished, it may be useful to periodically use madvise or posix_fadvise to encourage the OS to flush the right data from the page cache, but I don't know if Linux really makes good use of those hints at the moment.
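For illustration, here is a rough sketch of the "buffered writes plus periodic sync" idea in Python. The function name, the `sync_every` threshold and the `on_progress` callback are all made up for this example. It also swaps the hint around slightly: rather than relying on posix_fadvise to trigger the flush, it calls fdatasync and then uses POSIX_FADV_DONTNEED only to suggest that the already-written pages can be dropped. Whether the kernel acts on that hint is not guaranteed.

```python
import os


def copy_with_periodic_sync(src_path, dst_path, chunk_size=1 << 20,
                            sync_every=8 << 20, on_progress=None):
    """Copy src to dst with buffered writes, syncing every `sync_every` bytes.

    Progress is only reported after a sync, so the numbers reflect data
    that has actually been pushed to the device, not data sitting in the
    page cache. All names and defaults here are illustrative assumptions.
    """
    # fdatasync is missing on some platforms; fall back to fsync there.
    data_sync = getattr(os, "fdatasync", os.fsync)
    total = os.path.getsize(src_path)
    copied = 0
    unsynced = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        out_fd = dst.fileno()
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
            unsynced += len(chunk)
            if unsynced >= sync_every:
                dst.flush()          # drain Python's own buffer first
                data_sync(out_fd)    # then ask the kernel to write it out
                if hasattr(os, "posix_fadvise"):
                    # Hint only: the pages written so far are no longer needed.
                    os.posix_fadvise(out_fd, 0, copied,
                                     os.POSIX_FADV_DONTNEED)
                unsynced = 0
                if on_progress:
                    on_progress(copied, total)
        # Final sync at the end of the operation, as seen by the user.
        dst.flush()
        os.fsync(out_fd)
    if on_progress:
        on_progress(copied, total)
    return copied
```

A GUI would run this off the main thread and feed `on_progress` into the progress bar. A smaller `sync_every` gives a smoother bar at the cost of throughput.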
On really new kernels (io_uring arrived in Linux 5.1, and linked requests in 5.3), it might work well to use io_uring to issue linked chains of read -> write -> fdatasync operations for everything the user wants to copy, and base the GUI's progress bar on the completion of those linked IO units. That will probably ensure the kernel has enough work enqueued to issue optimally large and aligned IOs to the underlying devices. (Also, any file management GUI really needs to be doing async IO to begin with, or at least doing its IO on a separate thread. So adopting io_uring shouldn't be as big an issue as it would be for many other kinds of applications.)
Not always. If you're reading from an SSD and writing to a slow USB 2.0 flash drive, you could end up enqueuing in one second a volume of writes that will take the USB drive tens of seconds to sync(), leading to a very unresponsive progress bar. You almost have to do a TCP-like ramp-up of block sizes until you discover where the bottleneck is.
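A minimal sketch of that ramp-up, in Python. It starts with small chunks, times each write plus sync, and grows or shrinks the next chunk to keep each sync near a target latency. The names `next_chunk_size` and `adaptive_copy`, the 0.25 s target and the size limits are assumptions picked for the example, and the policy is a plain double/halve rule rather than TCP's actual congestion control.

```python
import os
import time


def next_chunk_size(current, elapsed, target=0.25,
                    min_size=64 * 1024, max_size=64 * 1024 * 1024):
    """Pick the next chunk size from how long the last write+sync took.

    Double while syncs finish in under half the target time, halve when
    they overshoot the target, otherwise hold steady.
    """
    if elapsed < target / 2:
        return min(current * 2, max_size)
    if elapsed > target:
        return max(current // 2, min_size)
    return current


def adaptive_copy(src_path, dst_path, on_progress=None, target=0.25):
    """Copy a file, syncing after every chunk and adapting the chunk size."""
    total = os.path.getsize(src_path)
    copied = 0
    chunk_size = 64 * 1024  # start small, like a slow-start window
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            start = time.monotonic()
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # the slow device shows up here
            elapsed = time.monotonic() - start
            copied += len(chunk)
            if on_progress:
                on_progress(copied, total)
            chunk_size = next_chunk_size(chunk_size, elapsed, target)
    return copied
```

With a fast destination the chunk size climbs quickly toward the cap. With a slow USB stick it settles at whatever size the device can sync in roughly the target time, so the progress bar keeps moving at a steady cadence.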