Roughly 0% of a typical desktop’s disk space is used by hardlinked files. You can safely double count them. That’s exactly what every disk space analyzer does!
If you really want to avoid double counting, just divide the size of every file by st_nlink. Of course, you'd then have to update the cached sizes of every directory that links to that inode, so you'd also need to cache the mapping from inodes to paths. Another solution is to cache two sizes per directory: one for files with a single link and another for files with two or more links. The UI could hide the latter when it's 0 GB. But this discussion is academic; nobody really cares about hardlinks.
> Roughly 0% of a typical desktop’s disk space is used by hardlinked files. You can safely double count them. That’s exactly what every disk space analyzer does!
While that may be a reasonable strategy, it's not what every disk space analyzer does.
I've got a backup setup that uses hardlinks to provide a wide variety of restore points without using a lot of space. du doesn't double count:
$ du -hs daily.0
436G daily.0
$ du -hs daily.1
436G daily.1
$ du -hs daily.0 daily.1
436G daily.0
12M daily.1
Not sure what you consider a "typical desktop", but on Windows, WinSxS has gigabytes worth of hardlinks. If you don't care about them that's another matter I guess.
Also note that the user will be confused when they delete the whole directory and observe that 0 bytes get freed. (A similar problem exists even if you double count.)
The point is, the problem itself is ill-defined. There's no solution other than scrapping or redefining the problem itself, and it's hard to define the problem precisely for a non-technical user.
Then you get into semantic arguments: if a directory contains two 1 GB files, does the user care that 99% of their blocks are shared, or is that just an under-the-hood implementation detail, and the user wants to know that there are 2 GB worth of files in there?
Really, there should be two file sizes: "how much space this will take if I copy it to another filesystem" and "how much space will be freed if I delete this".
Well, three: "how many bytes do I get if I open it and read all the bytes out". Or maybe four: how many bytes do I get if I open it and read all the bytes which aren't holes (i.e. how many bytes do I need to put into an archive that supports sparse files) :)
I would expect at least one of those cases to be identical to "how much space this will take if I copy it to another filesystem": if you're asking the generic question where the target is hypothetical, all you can report is the raw byte count of the files, since anything more precise requires knowing specific details about the target.
> "how much space this will take if I copy it to other filesystem"
This is ambiguous between "how many bytes are all these files in total" and "how many bytes does it take to store a single copy of all these files on such-and-such file system (mostly the current one)". The latter can be different because of transparent compression, which is common on e.g. BTRFS.