
> Ideally this diagram would be automatically generated using tooling...

The biggest problem I've seen with architecture diagrams is they fall out of sync with the code base. In my opinion, automatic generation of these diagrams is necessary. Otherwise, teams have no way to know whether the picture in front of them accurately represents the latest state of the system.



Diagrams are similar to textual documentation: You generally can’t auto-generate useful ones from code, unless the code has extra markup that specifies what to generate. Diagrams often present a specific perspective that emphasizes certain features while omitting others. You might have several diagrams for the same entity, each illustrating a different aspect or scenario.

The upshot is that diagrams have to be maintained in conjunction with the code and application architecture, just like textual documentation. There’s just no alternative to simply putting the work in, and making it a regular part of change management.


The correct solution is a synthesis: generate the boxes and arrows from code, then let a human hide, move, and style those objects. When the code changes, the boxes and arrows will change, and perhaps the style will want to change too, but at least the diagram will remain correct. This is precisely the distinction between semantic markup and CSS, by the way: the HTML would be auto-generated, the CSS hand-crafted.
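To make the split concrete, here's a rough sketch of "generated structure, hand-crafted style" in Python: the edges come from scanning imports with `ast`, while the style overlay is a separate hand-maintained dict that survives regeneration (module names and style attributes are illustrative):

```python
import ast
from pathlib import Path

def import_edges(root: str) -> set:
    """Scan .py files under root and return (module, imported_module) edges."""
    edges = set()
    for path in Path(root).rglob("*.py"):
        mod = path.stem
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    edges.add((mod, alias.name.split(".")[0]))
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.add((mod, node.module.split(".")[0]))
    return edges

def to_dot(edges, styles=None):
    """Render edges as Graphviz DOT; `styles` is the hand-maintained overlay."""
    styles = styles or {}
    lines = ["digraph deps {"]
    for node, attrs in styles.items():
        lines.append(f'  "{node}" [{attrs}];')
    for src, dst in sorted(edges):
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

Rerunning `to_dot(import_edges("."), styles)` after a refactor keeps the graph honest while the overlay keeps it readable.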


I would actually be really interested in a bi-directional workflow, such that I would also be able to perform system redesign by changing the diagram's connections - the tooling will then automatically update the interface and the tests such that I wouldn't be able to commit my changes until the implementation matches the diagram.
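A modest step toward that bi-directional workflow: a pre-commit check that parses the diagram's declared connections and compares them against edges extracted from code, blocking the commit on any mismatch (the `A -> B` plain-text diagram syntax here is invented for illustration):

```python
def parse_diagram(text: str) -> set:
    """Parse 'A -> B' lines from a hypothetical plain-text diagram source."""
    edges = set()
    for line in text.splitlines():
        if "->" in line:
            src, dst = (part.strip() for part in line.split("->", 1))
            edges.add((src, dst))
    return edges

def check_sync(diagram_edges: set, code_edges: set) -> tuple:
    """Return (undocumented, stale): edges found in code but missing from the
    diagram, and edges the diagram claims but the code no longer has."""
    return code_edges - diagram_edges, diagram_edges - code_edges
```

A commit hook would fail if either returned set is non-empty, forcing diagram and implementation to move together.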


I realize this isn’t what you’re arguing for, but as a cautionary anecdote on this topic:

Paul alludes to it in his post about Healthcare.gov from the other day, but this was apparently one of the major parts of the failure there - endless lines of code generated from UML diagrams, making reading the code hard and things like “can we add a trace statement here” difficult.

Optimising for diagrams-as-source-of-truth has drawbacks for debugging and maintainability of the running code.

https://www.pauladamsmith.com/blog/2023/10/the-10-year-anniv...


IMO the source code to aspects of the system that are best shown as graphs should be the graphs.


This requires a preset architecture standard that explains specifically what boxes and arrows are, and how boxes/arrows interact. Lots of software smears a logical box out over several folders in the code, sometimes even with entirely different names. I don't just mean people write non-cohesive code, I mean frameworks tend to prefer organisation by layer ("the views go in the view folder!") instead of organisation by module ("These things work and change together with a defined boundary").

You can have what you're asking for if you agree to a predefined architecture and everyone agrees to write code that way.


Check out https://schematix.com/video/?play=schematix-editing

Schematix generates diagrams (models) from code which can be entered via the web interface, or from a remote command line or scripts.

Diagrams are rendered on-the-fly from queries called "topological expressions" run against the model. The model must be updated as IT workers change the environment, but since diagrams are generated from code, they always reflect the most up to date information from the model.


This is how I use the tool the author created, Structurizr. I auto-generate the initial diagrams, then manually fix them up to make them more readable. We then check the JSON workspace export into our git repo. The first two levels of the C4 model don't change regularly, so this isn't a frequent process to repeat.
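A rough sketch of how you could flag drift between the checked-in workspace export and what's freshly discovered from code (the JSON shape here is a simplified stand-in, not the full Structurizr export format):

```python
import json

def workspace_containers(workspace_json: str) -> set:
    """Pull container names out of a simplified Structurizr-style workspace export."""
    ws = json.loads(workspace_json)
    names = set()
    for system in ws.get("model", {}).get("softwareSystems", []):
        for container in system.get("containers", []):
            names.add(container["name"])
    return names

def drift(workspace_json: str, discovered: set) -> tuple:
    """(containers in code but not diagrammed, diagrammed but gone from code)."""
    documented = workspace_containers(workspace_json)
    return discovered - documented, documented - discovered
```

Running this in CI against the committed workspace would surface the moment the "infrequent" manual refresh actually becomes due.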


Agreed. I experimented with autogenerating C4 diagrams from source a while back, but quickly abandoned the project when I realized that the output was inevitably the flowchart equivalent of

  bool mystery_func(int i) { // define a new function
    int x = (int)(i / 2);    // x is i divided by 2 and truncated
    int y = x * 2;           // y is twice x
    return y == i;           // return true if y == i
  }                          // end of function


Indeed, the documentation (diagrams/text with particular layout and formatting) should abstract a lot of what the code does. Otherwise, what's the point, since the code is there for you to read?

In my experience, auto-generated documentation in general does a poor job.

That said, such automation is improving by virtue of better AI that understands and cross-translates between human languages and computer languages.


> You generally can’t auto-generate useful [architecture diagrams] from code,

The question is: why is this the case?

IMNSHO, the reason is that we don't have a way to express architecture in or as code. Instead, we have to compile the actual architecture of the system into one that is expressible using the call/return architectural style that our programming languages support.

That's a lossy process.

When we can program with actual architectural connectors, auto-generating useful architectural diagrams becomes trivial.

https://objective.st


I don’t think that’s the problem. There are so many different ways to look at a system. An architectural diagram shows you how components connect. A procedural diagram shows you what steps a program takes. An entity diagram shows you the major high level entities you’ve selected into your system. I just don’t see how a single language can express all the detail. Maybe a new one can be invented but I don’t see how S expressions solve this (the link you posted doesn't have any indication that diagram generation is part of the language). Also, diagrams can be useful even if they have bitrotted, because discrepancies are a good teaching tool - “why does this diagram say x but the code seems to do y” is a tracing opportunity, an opportunity to update the diagram, and highlights which engineers are paying attention.

Maybe if LLMs get sufficiently advanced they can generate this stuff more automatically with some minor prompting with the code as context, but I doubt it. Not until AI can actually start understanding sentiment from code.


> An architectural diagram shows you how components connect.

That's the one we're talking about here.

> A procedural diagram shows you what steps a program takes

Which is useful if the program is procedural.

> An entity diagram shows you the major high level entities you’ve selected into your system.

That's the top-level of your architectural diagram.

> I just don’t see how a single language can express all the detail.

When the program's architectural style is call/return, we can do this just fine. Using stepwise refinement we move up and down the abstraction ladder of our system and thus add/remove detail. No problem.

The problem is that most systems these days are not primarily call/return, but have to be programmed in call/return languages.

> I don’t see how S expressions solve this

I don't either, and have no idea what S expressions have to do with this.

> .... diagram generation is part of the language

It's not part of the language. But it's part of the frameworks, and when the abstractions of your language are architectural in nature, there is a very close correspondence between the code and the diagrams.

> ... discrepancies are a good teaching tool...

Yes, since the diagrams are generated, it's easy to keep older version around and generate new ones on-demand. Then you can check the differences visually, in code, or as the difference between code and diagram. Take your pick.
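A minimal sketch of that visual diff, assuming both diagram versions are available as edge sets: render them as one Graphviz graph, with additions and removals color-coded (attribute choices are just illustrative):

```python
def diff_dot(old: set, new: set) -> str:
    """Render old vs. new edge sets as one DOT graph:
    additions in green, removals in dashed red, unchanged edges plain."""
    lines = ["digraph diff {"]
    for src, dst in sorted(old | new):
        if (src, dst) not in old:
            attr = " [color=green]"                 # edge added since last version
        elif (src, dst) not in new:
            attr = " [color=red, style=dashed]"     # edge removed
        else:
            attr = ""
        lines.append(f'  "{src}" -> "{dst}"{attr};')
    lines.append("}")
    return "\n".join(lines)
```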

> Not until AI can actually start understanding sentiment from code.

Again, the point is that architecture is not actually "sentiment". It is structural. The fact that it looks like sentiment in practice is purely a side effect of our programming languages being incapable of encoding architecture in the general case.


> The question is: why is this the case?

Diagrams capture and represent abstract and/or high level concepts. Their goal is to provide a kind of mental map of how specific aspects of a project are organized in order to help developers form their mental models. This means placing the focus on key constructs and downplaying the importance or relevance of other components. There isn't an objective way to provide a one-to-one mapping between what bits of a software project are of critical importance to form mental models, and what bits are irrelevant.

> IMNSHO, the reason is that we don't have a way to express architecture in or as code.

Not true. We have more than plenty of ways to represent software architectures. That is trivial and a solved problem. So is mapping representations of software architectures to (some) source code, at least in the form of skeleton code. The problem lies in mapping software projects to a software architecture, even when the software project is clean and a textbook example of a very specific software architecture.


You are confusing “implement” and “express”

If the architecture were directly expressed in the code, mapping back would be trivial. It is not because we effectively compile the actual architecture in order to express it in code. Mapping back then becomes decompiling, with the added complications of architectural mismatch and manual compilation.


I use diagrams for two distinct use cases :

- planning/outlining a solution

- documenting/insight into the system

I disagree that the second can't be autogenerated - I've used class diagrams, database schema diagrams etc. to visualize projects, a lot of the time over the documentation - precisely because I can trust the generated diagrams to reflect current state.
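For instance, a tiny class-diagram generator is only a few lines of reflection - this one emits Mermaid `classDiagram` text from live Python classes (a sketch; it only surfaces public methods and direct base classes):

```python
import inspect

def mermaid_class_diagram(*classes) -> str:
    """Emit a Mermaid classDiagram for the given classes:
    inheritance arrows plus public method names."""
    lines = ["classDiagram"]
    for cls in classes:
        for base in cls.__bases__:
            if base is not object:
                lines.append(f"    {base.__name__} <|-- {cls.__name__}")
        lines.append(f"    class {cls.__name__} {{")
        for name, member in vars(cls).items():
            if inspect.isfunction(member) and not name.startswith("_"):
                lines.append(f"        +{name}()")
        lines.append("    }")
    return "\n".join(lines)
```

Because it reads the classes themselves, rerunning it after a refactor can't drift from the code the way a hand-drawn diagram can.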

Documentation is nice for context but I'd take good visualisation tools over most documentation I've seen on projects I've worked on.

Tooling to connect/validate documentation against code is non-existent, in my world at least. Maybe LLMs can change that down the line - have PR review against docs run as a part of CI/CD pipeline.
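Even without LLMs, a crude version of that gate is possible today: fail CI whenever code paths change but docs paths don't. The path prefixes below are assumptions; the changed-file list would come from something like `git diff --name-only origin/main...HEAD`:

```python
def docs_gate(changed: list, code_prefix: str = "src/", docs_prefix: str = "docs/") -> bool:
    """Pass (True) unless the change set touches code without touching docs."""
    touches_code = any(p.startswith(code_prefix) for p in changed)
    touches_docs = any(p.startswith(docs_prefix) for p in changed)
    return (not touches_code) or touches_docs
```

It can't tell whether the docs update is *correct* - that's where an LLM reviewer might eventually help - but it at least forces the question on every PR.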


You can generate it from code.

Diagrams are a flat map of someone's subjective and structured, interpretation of the code architecture.

Words displayed on a computer screen must be developed into supporting subjective knowledge structures, free from the usual objective mindset of engineering.

You have to go up to first order concepts to get it. The vast majority of (popular) programming culture generates more of the same. Everybody tries to paper over this truth with metadata and it metaphysically does not work.

Computers are capable of much more, they are politically limited down to a small subset of what's possible.


The code-model gap is why we don't have this.

You don't organize code the way you mentally model it in many projects, and nearly all languages lack a way to solve this. Annotating code is prone to the same issue as keeping a diagram up to date, and the same issue as keeping comments or documentation up to date.


> nearly all languages lack a way to solve this

The only thing I’ve seen that goes in this direction is Knuth’s literate programming. I’ve tried it. In its current form it’s still clumsy, lacks tool support and IMO doesn’t fully solve the problem of how to deal with documenting a changing piece of software yet. Knuth got his requirements correct on the first try; the rest of us aren’t so lucky.


In my experience diagrams that are pedestrian enough to be automatically generated from the codebase don’t add much value


> The biggest problem I've seen with architecture diagrams is they fall out of sync with the code base. In my opinion, automatic generation of these diagrams is necessary.

Architecture diagrams document how the software is expected to be organized. They represent the goal, not the current state. The code needs to comply with the diagram, and not the other way around.

The only scenario where it makes sense to generate diagrams from code is when we have people trying to onboard to a project that's not documented, and even then these diagrams are only generated once, polished to remove noise, and from that point onward serve as the reference.


So how would you gain insight into whether the current code differs from the planned design documents? By always applying a lot of manual human labor?


> So how would you gain insight into whether the current code differs from the planned design documents?

Developers are expected to know what they are doing and how their software project is organized.

> By always applying a lot of manual human labor?

That "manual labor" has a name: software development.

Software only changes if developers submit changes. Changes are reviewed as part of code reviews.


In all projects that I’ve worked on the code was much too complex for a single developer to have even a surface level understanding of all of it, yet one is regularly required to change unfamiliar pieces.


> In all projects that I’ve worked on the code was much too complex for a single developer to have even a surface level understanding of all of it, yet one is regularly required to change unfamiliar pieces.

That sounds like a self-inflicted problem, caused by a team failing to develop and maintain their system following basic software engineering principles. I'm not sure how diagrams are relevant.


This is not a trivial problem to solve. Some would say that one of the entire points of software engineering is to assure that the code meets the design spec. A more rigorous approach would be to encode your design as a bunch of linting rules that you could run against your codebase (IaC and all).
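For example, a couple of such lint rules might encode a layering policy - "no layer imports a layer above it" - and run against each module's source in CI (the layer names and policy here are hypothetical):

```python
import ast

# Hypothetical layering policy: a module may only import layers at or below its own.
LAYERS = {"views": 2, "services": 1, "models": 0}

def layer_violations(module: str, source: str) -> list:
    """Lint one module's source and report imports that reach *up* the layer stack."""
    my_layer = LAYERS.get(module)
    if my_layer is None:
        return []
    problems = []
    for node in ast.walk(ast.parse(source)):
        targets = []
        if isinstance(node, ast.Import):
            targets = [a.name.split(".")[0] for a in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            targets = [node.module.split(".")[0]]
        for t in targets:
            if t in LAYERS and LAYERS[t] > my_layer:
                problems.append(
                    f"{module} (layer {my_layer}) imports {t} (layer {LAYERS[t]})"
                )
    return problems
```

Tools in this spirit exist (import-linter in the Python world, ArchUnit for Java), which suggests "design as executable rules" is at least partially practical.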

I'm pretty sure that auto generating a diagram from some code and then trying to work out if it's semantically equivalent to something that was hand drawn is not the answer though. For one thing, the code doesn't contain or implement every single important aspect of the design.


Yeah it's hard.

For http API design I like to start with an openapi spec then generate as much of the server and client library implementation from this as possible.

The spec gives a language/implementation agnostic way to describe what you're intending to build that's nicely diff-able over time, and you can generate a lot of the boilerplate that's easy to screw up in a way that's both compile time (static types) and runtime (parsing/validation of inputs & outputs) safe.

I can imagine a world where a similar approach could work for higher level architecture. It's pretty common to have a shared helm chart that (largely) defines each individual logical service in k8s environments.

Taken to the extreme you could provision your data stores, and network policies etc using this approach such that an individual services chart defines exactly what it depends on. Throw in some metadata fields for descriptions and you're well on the way to having something that could generate some useful diagrams / documentation.

Of course the issue with such helm charts is that if you make them flexible enough to suit everybody eventually you'll just reimplement the underlying APIs they are calling - perhaps some approach using direct introspection of k8s resources and cloud resources with a standardized set of metadata to group and describe relationships might be more feasible.
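That introspection-plus-metadata idea could be sketched roughly like this: read a dependency annotation off each k8s manifest (here as parsed dicts) and turn it into diagram edges. The `acme.dev/depends-on` annotation key is made up for illustration:

```python
def edges_from_manifests(manifests: list) -> set:
    """Read a hypothetical 'acme.dev/depends-on' annotation from each manifest
    and emit (service, dependency) edges for a diagram."""
    edges = set()
    for m in manifests:
        meta = m.get("metadata", {})
        name = meta.get("name")
        deps = meta.get("annotations", {}).get("acme.dev/depends-on", "")
        for dep in filter(None, (d.strip() for d in deps.split(","))):
            edges.add((name, dep))
    return edges
```

Pointing this at the live cluster (via the k8s API rather than chart sources) would sidestep the "flexible helm chart reimplements the API" trap, since the metadata rides along with whatever provisioned the resource.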

For the moment I'll probably stick to excalidraw


Sure, but there’s a whole dimension missing here.

Architecture is more than simply "what", it’s also "why". It binds the context to the requirements and the desired components, their relationships and interactions. Some other comment described architecture-to-code as a lossy process, and that’s exactly right.

A diagram is not "the architecture"; it’s simply a view on it, or a "projection of the model", as the C4 folk like to express it.

I just find the idea that we should automate diagram production because diagrams are hard to keep up to date a little quaint, because you hardly ever need to update just a diagram when changing the architecture. So your actual problem is that your design documentation is hard to keep up to date, and that’s a process problem. Generating diagrams from code won’t save you there.


On the contrary, I think auto docs are robbing the team of the ability to think in terms of the higher level of abstraction.

High level diagrams should be disposable and rapid to generate. They are as important for what they omit as for what they show.


Some problems are people or process problems and can’t always be waved away with tools. Sometimes the answer is to enforce growth of the professional discipline to update documentation alongside code changes.

A way I addressed this was to add a checklist item automatically to PRs, “did you review and update the docs?” And put the docs in the same repo so that a code change will have documentation updates in the same PR. It’s mostly worked but still relies on discipline.

It’s kind of interesting how hard this is for some. The code change is 5 mins. Testing is 20. Documentation is another 10. I’ve seen lots of people not want to do the testing and really not want to do the documentation.


I've seen reluctance to test tied to “they’ll make me change everything in code review, so why waste the effort yet”.


Yeah. And it’s possible the process is defective if code review can result in that much change. Oftentimes people skip most of the design process. I’ve been guilty of this.

Reminds me of the army mantra, “slow is smooth. Smooth is fast.” Skipping or rushing design has never actually saved time in my experience. And I see veterans repeating this mistake over and over.


The idea of C4 is to document the higher-level elements of the system - applications, components and the like. The first three Cs stand for "context", "container" and "component", while the fourth level, "code" is deemed optional.

IcePanel [0], a great tool for building C4 documentation, renames "containers" to "applications" for clarity, and instead of code diagrams simply links to the corresponding repos.

[0] https://icepanel.io/


You can't really make complexity go away; it just gets moved about. Auto-creating diagrams will either mean specifying a new code artifact that will need to be kept up to date, and/or create dependencies that will themselves fall out of sync with the code base. Or they'll be really simplistic and useless.

I think the best way to document a system is to write doco that just specifies the intent of the system. What was this thing meant to do? That context is really useful for contrasting with the use of the system in a prod environment.


Why is it that compilers don't do this? They have a parse tree for how the symbols connect.

Would it not be appropriate to extend the compiler for visualising relationships between software components with zoom-in and zoom-out facilities. Zoom-in takes you to Assembly and zoom-out to the CTO.


I think comments like these are too parochial in scope. Note the first actual example here, which is the system context. In this example, it describes the relationships between a banking customer, an Internet banking system, a backend banking mainframe, and an e-mail server.

Yes, your software may explicitly model all of these system components and potentially you can generate a system model from the code, but that would be an entirely wrong approach. As a sibling comment says, this system context view describes the real world, not your software. The code is supposed to conform to the model, not the other way around. If the implementation has drifted out of sync with the model, there are a few reasons this may happen:

- The legal or regulatory landscape actually changed. In this case, yes, the code may be more up-to-date and you need to change the model.

- External components your organization doesn't directly control changed. In this case, also, it may be the model that is wrong.

- The model is right and your code needs to change. Maybe you are not correctly handling an external third-party API. Maybe you're not correctly meeting your customer's needs. In the worst case, maybe you're breaking a law.

I would also think that, in reality, at something as expansive as a bank, there is no such thing as the codebase. You don't have a single product. You have the backend data store and transaction processing system. You have kiosk software for your ATMs. You have workstation software for your tellers. You have a public-facing website for your customers. You have a mobile app. You may have an entirely separate set of insurance products, investment products, and so on. You have internal management and accounting systems for generating reports. Most likely all of these need to be separate systems, at least because one temporally predates the others. In part because a bank is formed by mergers, acquisitions, and divestments, some products may have originally been part of a totally separate organization and some may be destined to become their own totally separate organizations. Strategically as a company, you can't afford to give up that level of financial agility by creating hard software-level couplings between your entire product suite.

So sure, at the level of any single component, you may be able to autogenerate a high-level architecture diagram. But at the level of the entire system, you can't. This is probably most clear and obvious with something like the DODAF: https://dodcio.defense.gov/Library/DoD-Architecture-Framewor...

These are much-maligned and for good reason. They're often incomprehensible. But to the extent you're trying to model something like the operation of a war campaign, you're now involving:

- C2 systems for multiple branches of the military

- ISR systems for those same branches

- Communications systems

- Operational capabilities of all of the various intelligence agencies and foreign allies you interoperate with

- Weapons systems

- Tablet terminal, man-pack, and in-vehicle devices for your forward tactical elements

All of these are software systems, but they're developed on different cadences, by separate contractors on separate contracts, with separate fiscal appropriations bills and lines of accounting. Nonetheless, there is still a need at the strategic level to model the entire system. In order for this system to have any hope of working, it needs to be based on specifications with the expectation being that implementations will conform to the spec, not the other way around. It's more like developing the Internet than developing a web app. You can't autogenerate a diagram of the Internet, at least not one with any authority, by pointing it at the code for a server, a browser, an endpoint networking stack, and the networking stacks for various appliances like core routers, and figuring out some way to link those together, especially given you'd have to cut across arbitrarily many programming languages and code styles.


I think what Adam Jacob is doing with System Initiative addresses this, right from the get-go.



