I'm pretty sure you can change various file formats without rewriting the entire...

layer8 · 2026-01-10T17:15:57 1768065357

You can’t insert data into the middle of a file (or remove portions from the middle of a file) without either rewriting it completely, or at least rewriting everything after the insertion point; the latter requires holding everything after the insertion point in memory (or writing it out to another file first, then reading it in and writing it out again).

PDF is designed to not require holding the complete file in memory. (PDF viewers can display PDFs larger than available memory, as long as the currently displayed page and associated metadata fits in memory. Similar for editing.)

maweki · 2026-01-10T20:23:11 1768076591

While tedious, you can do the rewrite block-wise from the insertion point and only store a an additional block's worth of the rest (or twice as much as you inserted)

ABCDE, to insert 1 after C: store D, overwrite D with 1, store E, overwrite E with D, write E.

mattzito · 2026-01-10T17:12:29 1768065149

No, if you are going to change the structure of a structured document that has been saved to disk, your options are:

1) Rewrite the file to disk 2) Append the new data/metadata to the end of the existing file

I suppose you could pre-pad documents with empty blocks and then go modify those in situ by binary editing the file, but that sounds like a nightmare.

cubefox · 2026-01-10T17:21:17 1768065677

Aren't there file systems that support data structures which allow editing just part of the data, like linked lists?

PhilipRoman · 2026-01-10T19:12:30 1768072350

Yeah there are, Linux supports parameters FALLOC_FL_INSERT_RANGE and FALLOC_FL_COLLAPSE_RANGE for fallocate(2). Like most fancy filesystem features, they are not used by the vast majority of software because it has to run on any filesystem so you'd always need to maintain two implementations (and extensive test cases).

cubefox · 2026-01-10T19:28:23 1768073303

Interesting that after decades of file system history, this is still considered a "fancy feature", considering that editing files is a pretty basic operation for a file system. Though I assume there are reasons why this hasn't become standard long ago.

layer8 · 2026-01-10T20:33:19 1768077199

File systems aren’t databases; they manage flat files, not structured data. You also can’t just insert/remove random amounts of bytes in RAM. The considerations here are actually quite similar, like fragmentation. If you make a hundred small edits to a file, you might end up with the file taking up ten times as much space due to fragmentation, and then you’d need the file system to do some sort of defragmentation pass to rewrite the file more contiguously again.

In addition, it’s generally nontrivial for a program to map changes to an in-memory object structure back to surgical edits of a flat file. It’s much easier to always just serialize the whole thing, or if the file format allows it, appending the serialized changes to the file.

ogurechny · 2026-01-11T04:20:18 1768105218

File systems aren't databases, but journaling file systems use journals just like databases. It can theoretically define any granularity for something that might happen to a file to become an irreversible transaction. I suppose that file systems have to remain “general purpose enough” to be useful (otherwise they become part of the specific program or library), and that's why complex features which might become a pitfall for the regular users who expect “just regular files” rarely become the main focus.

cubefox · 2026-01-11T06:21:06 1768112466

But appending changes is a terrible solution, even if it is "much easier" to implement. Not only because it causes data leakage, as in this case, but also because it can strongly inflate the file size. E.g. if you change the header image of a PDF a few times.

PhilipRoman · 2026-01-10T20:49:17 1768078157

Indeed, also userspace-level atomicity is important, so you probably want to save a backup in case power goes out at an unfortunate moment. And since you already need to have a backup, might as well go for a full rewrite + rename combo.

altfredd · 2026-01-11T00:19:24 1768090764

They are fully supported almost everywhere. XFS, ext4, tmpfs, f2fs and a bunch of misc filesystems all support them.

Ext4 support dates as early as Linux 3.15, released in 2014. It is ancient at this point!

formerly_proven · 2026-01-10T19:54:58 1768074898

What this does on typical extent-based file systems is split the extent of the file at the given location (which means these operations can only be done with cluster granularity) and then insert a third extent. i.e. calling INSERT_RANGE once will give you a file with at least three extents (fragments). This, plus the mkfs-options-dependent alignment requirements, makes it really quite uninteresting for broad use in a similar fashion as O_DIRECT is uninteresting.

cubefox · 2026-01-11T06:25:12 1768112712

Well, better an uninteresting solution than a solution which is actively terrible: appending changes to a PDF, which will inflate its size and cause data leakage.

layer8 · 2026-01-10T17:44:33 1768067073

Look at the C file API which most software is based on, it simply doesn’t allow it. Writing at a given file position just overwrites existing content. There is no way to insert or remove bytes in the middle.

Apart from that, file systems manage storage in larger fixed-size blocks (commonly 4 KB). One block typically links to the next block (if any) of the same file, but that’s about the extent of it.

anthk · 2026-01-11T00:11:03 1768090263

DD should.

formerly_proven · 2026-01-10T18:21:37 1768069297

No. Well yes. On mainframes.

This is why “table of contents at the end” is such an exceedingly common design choice.

bahmboo · 2026-01-10T21:47:30 1768081650

This was 1996. A typical computer had tens of megabytes of memory with throughput a fraction of what we have today. Appending an element instead of reading, parsing, inserting and validating the entire document is a better solution in so many ways. That people doing redactions don't understand the technology is a separate problem. The context matters.