Hacker News | ltratt's comments

> If this project would be able to detect the interpreter hotspots itself and completely automate the procedure, it would be great.

I don't think that's realistic; or, at least, not if you want good performance. You need to use quite a bit of knowledge about your context to know when best to add optimisation hints. That said, it's not impossible to imagine an LLM working this out, if not today, then perhaps in the not-too-distant future! But that's above my pay grade.


Thanks for sharing this technology. I hope it gets upstreamed into LLVM.

You're quite right that since we're working with LLVM IR, adapting to other languages is probably not _that_ difficult, though these things always end up taking more time than I expect! Since the majority of real-world problems in this area depend on C interpreters, we put our limited resources to that problem. You're also right that "interpreters" is a pretty vague category, and there are other parts of C (and other) programs that could be yk-ified, though I suspect it would be a fairly specialised subset of programs.

Our fork of LLVM does add a pass, amongst other changes, but we also have to do things like change stackmaps in a way that breaks compatibility. Whether stackmaps in their current incarnation are worth retaining compatibility for is above my pay grade! So some of our changes are probably upstreamable, but some might be considered too niche for wider integration.

I'm assuming you're referring to the Python finaliser example? If so, there's no syntax sugar hiding function calls to finalisers: you can verify that by running the code on PyPy, where the point at which the finaliser is called is different. Indeed, for this short-running program, the most likely outcome is that PyPy won't call the finaliser before the program completes!


We don't exactly want Alloy to have to be conservative, but Rust's semantics allow pointers to be converted to usizes (in safe mode) and back again (in unsafe mode), and this is something code really does. So if we wanted to provide an Rc-like API -- and we found reasonable code really does need it -- there wasn't much choice.
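To make the pointer-to-`usize` round trip concrete, here is a minimal sketch using only the standard library. It does not use Alloy or `Gc<T>` at all; the function names `hide`, `reveal` and `round_trip` are made up for illustration. The point is simply that the first half of the trip needs no `unsafe`, so a collector cannot easily tell whether an arbitrary `usize` is secretly keeping an allocation reachable.

```rust
/// Hide a pointer inside a plain integer. This half is entirely safe Rust.
fn hide(p: *const u64) -> usize {
    p as usize
}

/// Turn the integer back into a pointer and read through it.
/// Only this half needs `unsafe`.
///
/// # Safety
/// `addr` must have been produced by `hide` from a `u64` that is still live.
unsafe fn reveal(addr: usize) -> u64 {
    unsafe { *(addr as *const u64) }
}

/// Round-trip a heap pointer through a `usize`.
fn round_trip(value: u64) -> u64 {
    let boxed = Box::new(value);
    // After this line the address is "just a number" as far as the type
    // system is concerned, and could be stored anywhere a usize can be.
    let addr: usize = hide(&*boxed as *const u64);
    // `boxed` is still alive here, so reading through the address is valid.
    unsafe { reveal(addr) }
}
```

Calling `round_trip(42)` hands back the value that was boxed, having passed the only "reference" to it through an integer on the way.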

I don't think Rust's design in this regard is ideal, but then again what language is perfect? I designed languages for a long while and made far more, and much more egregious, mistakes! FWIW, I have written up my general thoughts on static integer types, because it's a surprisingly twisty subject for new languages https://tratt.net/laurie/blog/2021/static_integer_types.html


> We don't exactly want Alloy to have to be conservative, but Rust's semantics allow pointers to be converted to usizes (in safe mode) and back again (in unsafe mode), and this is something code really does. So if we wanted to provide an Rc-like API -- and we found reasonable code really does need it -- there wasn't much choice.

You can define a set of objects for which this transformation is illegal --- use something like pin projection to enforce it.


The only way to forbid it would be to forbid creating pointers from `Gc<T>`. That would, for example, preclude a slew of tricks that high performance language VMs need. That's an acceptable trade-off for some, of course, but not all.


Not necessarily. It would just require that deriving these pointers be done using an explicit lease that would temporarily defer GC or lock an object in place during one. You'd still be able to escape from the tyranny of conservative scanning everything.


If you've used Chrome or Safari to read this post, you've used a program that uses (at least in parts) conservative GC. [I don't know if Firefox uses conservative GC; it wouldn't surprise me if it does.] This partly reflects shortcomings in our current compilers and in current programming language design: even Rust has some decisions (e.g. pointers can be put in `usize`s) that make it hard to do what would seem at first glance to be the right thing.


Also most mobile games written in C# use a conservative GC (Boehm).


Not just mobile games - all games made with Unity.


As Koffiepoeder suggests, since the vast majority of content on my site is static, I only have to compress a file once when I build the site, no matter how many people later download it. [The small amount of dynamic content on my site isn't compressed, for the reason you suggest.]


That’s a good point, didn’t know it was cached on top.


As an example, I like to point people at https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html which for many years now has contained this line:

> The precise Rust aliasing rules are somewhat in flux, but the main points are not contentious

I've sometimes found myself in situations where the only way I've been able to deal with this is to check the compiler's output and trawl forums for hints by Rust's developers about what they think/hope the semantics are/will be.

Historically speaking, this situation isn't uncommon: working out exactly what a language's semantics should be is hard, particularly when it has many novel aspects. Most major languages go through this sort of sequence (some sooner or later than others, and some end up addressing it more thoroughly than others). Eventually I expect Rust to develop something similar to the modern C spec, but we're not there yet.


Excellent - thank you for the example and the clarification. This is exactly what I was looking for.


Because Morello is an experimental platform, only a small number were manufactured. They are/were allocated mostly to people involved in early-stage CHERI R&D and, AFAIK, none were made available to the general public. [That said, I don't know whether there are still some unallocated machines!] One can fully emulate Morello with qemu. While the emulator is, unsurprisingly, rather slow, I generally use qemu for quick Morello experiments, even though I have access to physical Morello boards.


You're quite right, I over-simplified -- mea culpa! That should have said "often unify these phases". FWIW, I've written recursive descent parsers with and without separate lexers, though my sense is that the majority opinion is that "recursive descent" implies "no separate lexer".


For what it's worth, in my little corner of the world, all of the recursive descent parsers I've seen and worked with have separate lexers. I can't recall seeing a single recursive descent parser in industry that didn't separate lexing.

However, I do often see a little fudging the two together for funny corners of the language. Often that just means handling ">>" as right-shift in some contexts and nested generics in others.
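As a rough illustration of that kind of fudging, here is a small sketch of a recursive descent parser with a separate lexer. The toy grammar (`type := IDENT ("<" type ">")?`) and all the names in it (`Tok`, `Type`, `Parser`, `expect_gt`) are assumptions made up for this example, not taken from any real compiler. The lexer greedily produces a single `>>` token, and the parser splits it back into two `>`s when it is closing nested generics.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Lt,
    Gt,
    Shr, // ">>", lexed greedily as a single token
}

#[derive(Debug, PartialEq)]
struct Type {
    name: String,
    arg: Option<Box<Type>>,
}

/// Separate lexer: turns characters into tokens, knowing nothing of the grammar.
fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_alphabetic() {
            let start = i;
            while i < chars.len() && chars[i].is_alphanumeric() {
                i += 1;
            }
            toks.push(Tok::Ident(chars[start..i].iter().collect()));
        } else if c == '<' {
            toks.push(Tok::Lt);
            i += 1;
        } else if c == '>' && chars.get(i + 1) == Some(&'>') {
            toks.push(Tok::Shr);
            i += 2;
        } else if c == '>' {
            toks.push(Tok::Gt);
            i += 1;
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(toks)
}

struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<Tok> {
        self.toks.get(self.pos).cloned()
    }

    /// type := IDENT ( "<" type ">" )?
    fn parse_type(&mut self) -> Result<Type, String> {
        let name = match self.peek() {
            Some(Tok::Ident(n)) => n,
            other => return Err(format!("expected identifier, found {other:?}")),
        };
        self.pos += 1;
        let mut arg = None;
        if self.peek() == Some(Tok::Lt) {
            self.pos += 1;
            arg = Some(Box::new(self.parse_type()?));
            self.expect_gt()?;
        }
        Ok(Type { name, arg })
    }

    /// The fudge: in type context, ">>" counts as two closing ">"s.
    fn expect_gt(&mut self) -> Result<(), String> {
        match self.peek() {
            Some(Tok::Gt) => {
                self.pos += 1;
                Ok(())
            }
            Some(Tok::Shr) => {
                // Consume one ">" and leave the other behind for the caller.
                self.toks[self.pos] = Tok::Gt;
                Ok(())
            }
            other => Err(format!("expected '>', found {other:?}")),
        }
    }
}

fn parse(src: &str) -> Result<Type, String> {
    let mut p = Parser { toks: lex(src)?, pos: 0 };
    let ty = p.parse_type()?;
    if p.pos != p.toks.len() {
        return Err("trailing tokens".to_string());
    }
    Ok(ty)
}
```

With this sketch, `parse("Vec<Vec<T>>")` and `parse("Vec<Vec<T> >")` build the same tree, even though the lexer hands the parser different token streams for the two inputs. An expression parser sharing the same lexer could keep treating `Tok::Shr` as right-shift.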


That's not my impression of the majority opinion, fwiw. (I wrote my first recursive-descent parser in the 80s and I learned from pretty standard sources like one of Wirth's textbooks.)


As another data point in addition to the sibling comments, all IntelliJ language parsers use recursive descent with a separate lexer.

