
Interesting. Would this be suitable as a replacement for NFS? In my experience literally everyone in the silicon design industry uses NFS on their compute grid and it sucks in numerous ways:

* poor locking support (this sounds like it works better)

* it's slow

* no manual fence support; a bad but common way of distributing workloads is e.g. to compile a test on one machine (on an NFS mount), and then use SLURM or SGE to run the test on other machines. You use NFS to let the other machines access the data... and this works... except that you either have to disable write caches or have horrible hacks to make the output of the first machine visible to the others. What you really want is a manual fence: "make all changes to this directory visible on the server"

* The bloody .nfs000000 files. I think this might be fixed by NFSv4 but it seems like nobody actually uses that. (Not helped by the fact that CentOS 7 is considered "modern" to EDA people.)
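For what it's worth, the closest thing POSIX gives you to that "manual fence" today is fsync() on the file plus fsync() on its directory. A hedged sketch (the path and helper name are invented for illustration; on NFS this forces dirty pages to the server, though it does not invalidate other clients' caches):

```python
import os

def flush_to_server(path, data):
    """Write data and push it out explicitly: fsync the file,
    then fsync the containing directory so the new entry is
    durable. (Under NFS close-to-open consistency, close()
    also flushes dirty pages; fsync is the explicit form.)"""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)          # flush file data and metadata
    finally:
        os.close(fd)
    dfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dfd)         # flush the directory entry itself
    finally:
        os.close(dfd)

flush_to_server("/tmp/fence_demo.txt", b"built artifact\n")
```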



> poor locking support (this sounds like it works better)

File locking on Unix is in general a clusterf*ck. (There was a thread a few days ago at https://news.ycombinator.com/item?id=46542247 )

> no manual fence support; a bad but common way of distributing workloads is e.g. to compile a test on one machine (on an NFS mount), and then use SLURM or SGE to run the test on other machines. You use NFS to let the other machines access the data... and this works... except that you either have to disable write caches or have horrible hacks to make the output of the first machine visible to the others. What you really want is a manual fence: "make all changes to this directory visible on the server"

In general, file systems make for poor IPC implementations. But if you need to do it with NFS, the key is to understand the close-to-open consistency model NFS uses, see section 10.3.1 in https://www.rfc-editor.org/rfc/rfc7530#section-10.3 . Of course, you'll also want some mechanism for the writer to notify the reader that it's finished, be it with file locks, or some other entirely different protocol to send signals over the network.
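A minimal sketch of leaning on close-to-open consistency, assuming the writer signals the reader only after close() has returned (the signaling mechanism itself is out of scope here; run locally this is trivially correct, on NFS the same ordering is what the model guarantees):

```python
import os, tempfile

path = os.path.join(tempfile.gettempdir(), "cto_demo.out")

# Writer side: finish ALL writes, then close. Under NFS
# close-to-open consistency, close() flushes dirty data to
# the server before returning.
with open(path, "w") as f:
    f.write("result: ok\n")
# ...only now signal the reader (scheduler, lock, message, ...)

# Reader side (possibly another client): open a *fresh* fd
# after the signal. open() revalidates cached data against
# the server, so the writer's data is visible.
with open(path) as f:
    print(f.read(), end="")   # prints "result: ok"
```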


> In general, file systems make for poor IPC implementations.

I agree, but they do have advantages: simplicity, not needing to explicitly declare which files are needed, lazy data transfer, etc.

> you'll also want some mechanism for the writer to notify the reader that it's finished, be it with file locks, or some other entirely different protocol to send signals over the network.

The writer is always finished before the reader starts in these scenarios. The issue is reads on one machine aren't guaranteed to be ordered after writes on a different machine due to write caching.

It's exactly the same problem as trying to do multithreaded code. Thread A writes a value, thread B reads it. But even if they happen sequentially in real time thread B can still read an old value unless you have an explicit fence.
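The analogy can be made concrete: in shared-memory terms it's a release/acquire pairing. A sketch using `threading.Event` as the "fence" (illustrative only; names invented):

```python
import threading

shared = {"value": 0}
ready = threading.Event()    # plays the role of the release/acquire fence

def writer():
    shared["value"] = 42     # thread A writes...
    ready.set()              # ...then "releases": everything before this is published

def reader(out):
    ready.wait()             # "acquire": ordered after the set()
    out.append(shared["value"])  # guaranteed to see 42, not a stale 0

out = []
t_read = threading.Thread(target=reader, args=(out,))
t_write = threading.Thread(target=writer)
t_read.start(); t_write.start()
t_write.join(); t_read.join()
assert out == [42]
```

Without the event (or NFS's close/open pair, or a hardware fence), nothing orders the read after the write, even if they happen sequentially in wall-clock time.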


> The writer is always finished before the reader starts in these scenarios. The issue is reads on one machine aren't guaranteed to be ordered after writes on a different machine due to write caching.

In such a case it should be sufficient to rely on NFS close-to-open consistency as explained in the RFC I linked to in the previous message. Closing a file forces a flush of any dirty data to the server, and opening a file forces a revalidation of any cached content.

If that doesn't work, your NFS is broken. ;-)

And if you need 'proper' cache coherency, something like Lustre is an option.


It wasn't my job so I didn't look into this fully, but the main issue we had was clients claiming that files didn't exist when they did. I just reread the NFS man page and I guess this is the issue:

> To detect when directory entries have been added or removed on the server, the Linux NFS client watches a directory's mtime. If the client detects a change in a directory's mtime, the client drops all cached LOOKUP results for that directory. Since the directory's mtime is a cached attribute, it may take some time before a client notices it has changed. See the descriptions of the acdirmin, acdirmax, and noac mount options for more information about how long a directory's mtime is cached.

> Caching directory entries improves the performance of applications that do not share files with applications on other clients. Using cached information about directories can interfere with applications that run concurrently on multiple clients and need to detect the creation or removal of files quickly, however. The lookupcache mount option allows some tuning of directory entry caching behavior.
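If that's the failure mode, the knobs the man page mentions can be tightened at mount time. Values here are illustrative, not a recommendation, and trade correctness for extra LOOKUP/GETATTR traffic:

```shell
# Illustrative only -- tune for your site. lookupcache=positive stops
# the client from caching negative ("file doesn't exist") results;
# acdirmin/acdirmax=0 make directory mtime changes noticed promptly.
mount -t nfs -o lookupcache=positive,acdirmin=0,acdirmax=0 server:/export /mnt

# Heavier hammer: disable attribute caching entirely (much slower):
# mount -t nfs -o noac server:/export /mnt
```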

People did talk about using Lustre or GPFS but apparently they are really complex to set up and maybe need fancier networking than ethernet, I don't remember.


I did set up GPFS, tadam... almost exactly 20 years ago. I wouldn't say it absolutely required fancy networking (InfiniBand) or was extraordinarily complex to set up; certainly on par with NFS when you hit its quirks (which was the reason we went off experimenting with GPFS and whatnot).


FUSE is full of gotchas. I wouldn't replace NFS with JuiceFS for arbitrary workloads. Implementing the full FUSE feature set is not easy -- you can't use SQLite on JuiceFS, for example.

The meta store is a bottleneck too. For a shared mount, you've got a bunch of clients sharing a metadata store that lives in the cloud somewhere. They do a lot of aggressive metadata caching. It's still surprisingly slow at times.


> FUSE is full of gotchas

I want to go ahead and nominate this for the understatement of the year. I expect that 2026 is going to be filled with people finding this out the hard way as they pivot towards FUSE for agents.


Mind helping us all out ahead of time by expanding on what kind of gotchas FUSE is full of?


It depends on what level of FUSE you're working with.

If you're running a FUSE adapter provided by a third party (Mountpoint, GCS FUSE), odds are that you aren't going to get great performance, because every operation has to cross a network to reach data that's far away. To improve performance, these adapters need fiddly settings (like kernel-side writeback caching) to avoid paying a round trip for operations like write.

If you're trying to write a FUSE adapter, it's up to you to implement as much of the POSIX spec as the programs you want to run require. The per-program requirements are often surprising. Want to run "git clone"? Then you need to support unlinking a file from the file system while keeping its data around. Want to run "vim"? You need renames and hard links. All of this needs to happen in-memory to get the performance applications expect from a file system, which often isn't how these things are built.
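The unlink-while-open case is a good example of how much state-tracking this implies. A toy in-memory model of the required semantics (not a real FUSE API; all names invented):

```python
# Toy model of the POSIX semantics a FUSE fs must provide:
# unlink() removes the *name*, but the data must survive until
# the last open handle is closed (what NFS fakes with .nfsXXXX files).
class Inode:
    def __init__(self, data=b""):
        self.data = data
        self.nlink = 0      # directory entries pointing here
        self.opens = 0      # open file handles

class ToyFS:
    def __init__(self):
        self.names = {}     # path -> Inode

    def create(self, path, data=b""):
        ino = Inode(data)
        ino.nlink = 1
        self.names[path] = ino

    def open(self, path):
        ino = self.names[path]
        ino.opens += 1
        return ino          # acts as the "file handle"

    def unlink(self, path):
        ino = self.names.pop(path)
        ino.nlink -= 1
        # data deliberately NOT freed while ino.opens > 0

    def close(self, ino):
        ino.opens -= 1
        if ino.nlink == 0 and ino.opens == 0:
            ino.data = None  # only now can the data be reclaimed

fs = ToyFS()
fs.create("/test.o", b"object code")
h = fs.open("/test.o")
fs.unlink("/test.o")                 # name is gone...
assert "/test.o" not in fs.names
assert h.data == b"object code"      # ...but the handle still reads
fs.close(h)
```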

Regarding agents in particular, I'm hopeful that someone (quite possibly us) builds a FUSE-as-a-service primitive that's simple enough to use that the vast majority of developers don't have to worry about these things.


> you need to support the ability to unlink a file from the file system and keep its data around. Want to run "vim", you need the ability to do renames and hard links

Those seem like pretty basic POSIX filesystem features to be fair. Awkward, sure... there's also awkwardness like symlinks, file locking, sticky bits and so on. But these are just things you have to implement. Are there gotchas that are inherent to FUSE itself rather than FUSE implementations?


These are basic POSIX features, but I think the high-level point that Kurt is trying to make is that building a FUSE file system signs you up for a nearly unlimited amount of compatibility work (if you want to support most applications) whereas their approach (just do a loopback ext4 fs into a large file) avoids a lot of those problems.
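Assuming that reading of their approach is right, the loopback recipe is roughly this (paths illustrative; needs root):

```shell
# Rough sketch of the loopback approach:
truncate -s 10G /var/lib/sandbox/disk.img   # sparse backing file
mkfs.ext4 -q /var/lib/sandbox/disk.img      # a real ext4 fs inside it
mount -o loop /var/lib/sandbox/disk.img /mnt/sandbox
# Every POSIX quirk (unlink-while-open, rename, hard links, locks)
# now comes from the in-kernel ext4 driver for free.
```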

My expectations are that in 2026 we will see more and more developers attempt to build custom FUSE file systems and then run into the long tail of compatibility pain.


> just do a loopback ext4 fs into a large file

How does that work with multiple clients though?


tl;dr it doesn't. I'm not sure what they're planning in this capacity (I haven't checked out sprites myself), but I would guess that it's going to be a function of "snapshots" as a mechanism to give multiple clients ephemeral write access to the same disk.


> * The bloody .nfs000000 files. I think this might be fixed by NFSv4 but it seems like nobody actually uses that. (Not helped by the fact that CentOS 7 is considered "modern" to EDA people.)

Unfortunately, NFSv4 also has the silly rename semantics...


AFAIU the NFSv4 protocol in principle allows implementing unlinking an open file without silly rename, but the Linux client still does the silly rename dance.
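For the curious, the triggering pattern is easy to reproduce. This runs fine on a local filesystem; on an NFSv3 mount the unlink shows up as a `.nfsXXXX` file next to the original until the last close:

```python
import os, tempfile

# Unlink a file that is still open, then keep using the descriptor.
# Local filesystems handle this natively; the Linux NFS client
# instead renames the file to .nfsXXXX on the server (the "silly
# rename") and deletes it on the last close.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                           # name gone; on NFS, .nfsXXXX appears
assert not os.path.exists(path)
os.lseek(fd, 0, os.SEEK_SET)
assert os.read(fd, 32) == b"still here"   # data survives until close
os.close(fd)                              # on NFS, .nfsXXXX vanishes now
```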


> NFSv4 but it seems like nobody actually uses that

Hurry up and you might be able to adopt it before its 30th birthday!


How about CephFS?



