
Roughly 0% of a typical desktop’s disk space is used by hardlinked files. You can safely double count them. That’s exactly what every disk space analyzer does!

If you really wanna avoid double counting, just divide the size of every file by st_nlink. Of course, you'd then have to update the cached sizes of every directory that has a link to that inode, so you'd need to cache the mapping from inodes to paths too. Another solution is to cache 2 sizes per directory: one for all files with 1 link, and another for files that have 2 or more. The UI could hide the latter when it's 0 GB. But this discussion is academic; nobody really cares about hardlinks.
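The divide-by-st_nlink idea can be sketched in a few lines of Python. This is illustrative only (non-recursive, no caching): each directory entry is charged 1/st_nlink of the file's size, so a file with N hard links contributes its full size exactly once across all N directories.

```python
import os

def apportioned_size(path):
    """Sum sizes of regular files in `path`, charging this directory
    only 1/st_nlink of each file's size. A file hard-linked N times
    thus contributes its full size exactly once across all N parents.
    Illustrative sketch; a real analyzer would recurse and cache."""
    total = 0.0
    for entry in os.scandir(path):
        if entry.is_file(follow_symlinks=False):
            st = entry.stat(follow_symlinks=False)
            total += st.st_size / st.st_nlink
    return total
```

If a 1000-byte file is hard-linked twice into the same directory, each entry contributes 500 bytes and the directory totals 1000, not 2000.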



> Roughly 0% of a typical desktop’s disk space is used by hardlinked files. You can safely double count them. That’s exactly what every disk space analyzer does!

While that may be a reasonable strategy, it's not what every disk space analyzer does.

I've got a backup setup that uses hardlinks to provide a wide variety of restore points without using a lot of space. du doesn't double count:

   $ du -hs daily.0
   436G    daily.0

   $ du -hs daily.1
   436G    daily.1

   $ du -hs daily.0 daily.1
   436G    daily.0
   12M    daily.1
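What du is doing here can be sketched by keeping a seen-set of (st_dev, st_ino) pairs and counting each inode's blocks only once, so the second tree is charged only for blocks the first tree didn't already claim. A minimal version (ignoring directory sizes themselves):

```python
import os

def du_bytes(paths):
    """Sum disk usage du(1)-style: each inode is counted once, keyed
    by (st_dev, st_ino), so hard-linked copies aren't double counted.
    Uses st_blocks * 512, the unit POSIX defines for st_blocks."""
    seen = set()
    total = 0
    for root in paths:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue          # hard link to an inode we counted
                seen.add(key)
                total += st.st_blocks * 512
    return total
```

Because the seen-set spans all arguments, `du_bytes([daily0, daily1])` charges the second snapshot only for the files that aren't hard links into the first, matching the 436G + 12M output above.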


Not sure what you consider a "typical desktop", but on Windows, WinSxS has gigabytes worth of hardlinks. If you don't care about them that's another matter I guess.


Also note that the user will be confused when they delete the whole directory and observe 0 bytes get freed. (I guess a similar problem is also there even if you double count.)

The point is, the problem itself is ill-defined. There's no solution other than scrapping or redefining the problem itself, and it's hard to define the problem precisely for a non-technical user.


> Also note that the user will be confused when they delete the whole directory and observe 0 bytes get freed.

Note that's already the case when the user removes files that are opened by some process.
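That POSIX behavior is easy to demonstrate: unlinking removes the name, but the inode and its blocks survive until the last open file descriptor is closed, so "deleting" the file frees 0 bytes for now.

```python
import os, tempfile

# Open a file, unlink it, and show the data is still readable:
# the name is gone, but the inode lives until the fd is closed.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                    # name removed from the directory...
assert not os.path.exists(path)
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)            # ...but the contents remain
os.close(fd)                       # space is reclaimed only after this
```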


On Linux yeah. On Windows no.


On any non-POSIX system.


Modern filesystems that use CoW will share data between files even without any hard links.


Then you get into semantic arguments: if a directory contains two 1 GB files, does the user care that 99% of their blocks are shared, or is that just an under-the-hood implementation detail, and the user wants to know that there are 2 GB worth of files in there?


Really, there should be two file sizes: "how much space this will take if I copy it to another filesystem" and "how much space will be freed if I delete this".


Well, three: "how many bytes do I get if I open it and read all the bytes out". Or maybe four: how many bytes do I get if I open it and read all the bytes which aren't holes (i.e. how many bytes do I need to put into an archive that supports sparse files) :)
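The "read all the bytes" vs "space freed on delete" distinction shows up directly in stat: st_size is the apparent size (what you'd read out, holes included), while st_blocks * 512 is what's actually allocated. A sparse file makes the gap obvious. This sketch assumes a filesystem with sparse-file support (e.g. ext4 or tmpfs):

```python
import os, tempfile

# Seek 1 MiB ahead and write one byte: st_size counts the hole,
# but the filesystem allocates almost nothing for it.
fd, path = tempfile.mkstemp()
os.lseek(fd, 1 << 20, os.SEEK_SET)   # leave a 1 MiB hole
os.write(fd, b"x")
st = os.fstat(fd)
apparent = st.st_size                # bytes you'd get reading it all
allocated = st.st_blocks * 512       # roughly: bytes freed on delete
os.close(fd)
os.unlink(path)
```

Here `apparent` is 1 MiB + 1 byte, while `allocated` is typically just a block or two.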


I would expect at least one of those cases to be identical to "how much space this will take if I copy it to another filesystem": if the target is hypothetical, all you can answer is the raw byte count of the files, since anything more precise requires knowing specific details about the target.


> "how much space this will take if I copy it to another filesystem"

This is ambiguous between "how many bytes are all these files in total" and "how many bytes does it take to store a single copy of all these files on such-and-such file system (mostly the current one)". The latter can be different because of transparent compression, which is common on e.g. BTRFS.


Don't forget "How much of my free storage/broadband data will be used up when I attach this to an email?"


Windows explorer does that.

They are labelled "Size" and "Size on disk", respectively.



