Then you get into semantic arguments: if a directory contains two 1GB files, does the user care that 99% of their blocks are shared, or is that just an under-the-hood implementation detail, and the user wants to know that there are 2GB worth of files in there?
Really, there should be two file sizes: "how much space this will take if I copy it to another filesystem" and "how much space will be freed if I delete this".
Well, three: "how many bytes do I get if I open it and read all the bytes out". Or maybe four: "how many bytes do I get if I open it and read all the bytes which aren't holes" (i.e. how many bytes do I need to put into an archive that supports sparse files) :)
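For what it's worth, here is a rough sketch of how some of those numbers can be read on a POSIX system. The function name `size_report` and its dictionary keys are made up for illustration. It assumes a platform where `os.SEEK_DATA` and `os.SEEK_HOLE` are available (e.g. Linux); on a filesystem without hole support the whole file is typically reported as data. Note that `st_blocks` counts blocks allocated to the file and, as far as I know, does not account for blocks shared with other files, so it does not answer "how much will be freed if I delete this".

```python
import errno
import os


def size_report(path):
    """Return a few different notions of 'size' for one file (illustrative only)."""
    st = os.stat(path)
    apparent = st.st_size            # bytes you get by reading the file start to end
    allocated = st.st_blocks * 512   # st_blocks is counted in 512-byte units

    # Sum the lengths of the data regions, skipping holes.
    non_hole = 0
    with open(path, "rb") as f:
        fd = f.fileno()
        offset = 0
        while offset < apparent:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:   # no more data past this offset
                    break
                raise
            # There is always an implicit hole at end of file.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            non_hole += end - start
            offset = end

    return {"apparent": apparent, "allocated": allocated, "non_hole": non_hole}
```

Calling `size_report("some_file")` on a sparse file should show `non_hole` smaller than `apparent`, provided the filesystem actually stores the holes as holes.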
I would expect at least one of those cases to be identical to "how much space this will take if I copy it to another filesystem". If you're asking a generic question, the target is hypothetical, so you have to say how many bytes the raw files would take; anything else requires knowing specific details about the target.
> "how much space this will take if I copy it to another filesystem"
This is ambiguous between "how many bytes are all these files in total" and "how many bytes does it take to store a single copy of all these files on such-and-such file system (usually the current one)". The latter can be different because of transparent compression, which is common on e.g. Btrfs.