ogogmad's comments | Hacker News

Some people appreciate that an LR/LALR parser generator can prove non-ambiguity and linear-time parsability of a grammar. A couple of examples are the creator of the Oil shell and one of the people responsible for Rust.

It does make me wonder though about why grammars have to be so complicated that such high-powered tools are needed. Isn't the gist of LR/LALR that the states of an automaton that can parse CFGs can be serialised to strings, and the set of those strings forms a regular language? Once you have that, many desirable "infinitary" properties of a parsing automaton can be automatically checked in finite time. LR and LALR fall out of this, in some way.


Production compilers must have robust error recovery and great error messages, and those are pretty straightforward in recursive descent, even if ad hoc.

I think even the theory of Regular Languages is somewhat overdone: You can get the essence of what NFAs are without really needing NFAs. You can get O(n) string matching without formally implementing NFAs, or using any other formal model like regex-derivatives. In fact, thinking in terms of NFAs makes it harder to see how to implement negation (or "complement" if you prefer to call it that) efficiently. It's still only linear time!

NFA/DFA/derivative models are mostly unnecessary because, ultimately, REG is just DSPACE(O(1)). That's it. Thinking in any other way is confusing the map with the territory. Furthermore, REG is extremely robust, because we also have REG = DSPACE(o(log log n)) = NSPACE(o(log log n)) = 1-DSPACE(o(log n)). For help with the notation, see here: https://en.wikipedia.org/wiki/DSPACE
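To make the "constant space" point concrete, here's a toy one-pass scanner of my own (not from any library) that counts occurrences of the pattern "ab" using a single state variable, regardless of input length, with no NFA construction in sight:

```python
def count_ab(stream):
    """Count occurrences of 'ab' in one pass using O(1) memory:
    one state variable, however long the input is."""
    state = 0  # 0 = start, 1 = just saw 'a'
    count = 0
    for ch in stream:
        if state == 0:
            state = 1 if ch == 'a' else 0
        else:  # state == 1: an 'a' is pending
            if ch == 'b':
                count += 1
                state = 0
            else:
                state = 1 if ch == 'a' else 0
    return count
```

Because the scanner is deterministic, complementing the decision ("the pattern does NOT occur") is just flipping a condition at the end, which is the point about negation being cheap in this view.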


Quick other one: To parse infix expressions, every time you see "x·y | (z | w)", find the operator of least binding power: In my example, I've given "|" less binding power than "·". Anyway, this visually breaks the expression into two halves: "x·y" and "(z | w)". Recursively parse those two subexpressions. Essentially, that's it.

The symbols "·" and "|" don't mean anything - I've chosen them to be visually intuitive: The "|" is supposed to look like a physical divider. Also, bracketed expressions "(...)" or "{...}" should be parsed first.

Wikipedia mentions that a variant of this got used in FORTRAN I. You could also speed up my naive O(n^2) approach by using Cartesian trees, which you can build using something suspiciously resembling precedence climbing.
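A rough sketch of the procedure described above (my own toy code; the two operators and their binding powers are just the ones from the example, and well-formed input is assumed):

```python
# Find the operator of least binding power at paren depth 0,
# split the expression there, recurse on both halves. Naive O(n^2).
PREC = {'|': 1, '·': 2}  # '|' binds less tightly than '·'

def parse(s):
    s = s.strip()
    best = None  # (precedence, index) of the rightmost weakest operator
    depth = 0
    for i, ch in enumerate(s):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif depth == 0 and ch in PREC:
            if best is None or PREC[ch] <= best[0]:
                best = (PREC[ch], i)  # '<=' keeps the rightmost: left-assoc
    if best is None:
        if s.startswith('(') and s.endswith(')'):
            return parse(s[1:-1])  # bracketed subexpression: parse inside first
        return s  # an atom
    _, i = best
    return (s[i], parse(s[:i]), parse(s[i + 1:]))
```

On the running example, `parse("x·y|(z|w)")` splits at the outer `|` and yields the tree `('|', ('·', 'x', 'y'), ('|', 'z', 'w'))`.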


An even easier approach is to give all infix operators the same precedence and force the programmer to group subexpressions.

You can always write Lisp, but most people can read code better when it doesn't have that many (((()))))))

I'm sure there's a middle ground which still gives you some of the metaprogramming power of Lisp. OTOH this: https://www.gingerbill.org/article/2026/02/21/does-syntax-ma...

It's not a 2,000 year old dispute. Zionism began in around 1900. It was spearheaded until recently by "secular" Jews, who were borderline atheist. The Jewish religious texts themselves make wishing for a "return to Zion and Jerusalem" sound like wishing for a utopia or world peace. It pretty much reads like a metaphor, not like a political programme. Finally, most highly devout Jews were strongly opposed to Zionism, at least until after WW2.

That comment accurately described what American evangelicals believe.

American evangelicals don't care about 1900, or about the differences between secular and religious Jews and their disputes. They don't care at all. They actually agree with a lot of what the losing side of WWII said and thought. And they do in fact believe in the end-times prophecy and in their duty to speed it up.

If you are unaware of that, maybe you should not be so arrogant when commenting on politics. Radical American religious leaders are literally talking to the troops now, as the minister of war is their disciple.


So America can put other countries' leaders on trial - like the Nazis in Nuremberg, or Saddam Hussein - but not their own, for war crimes.

I had the impression that PEG and Earley/GLR all fully solved the parsing problem, but in different ways. But then recently, I found this guy's blog: https://www.oilshell.org/blog/tags.html?tag=parsing#parsing

Now I don't know what to think. The author's got a ton more experience than me. It seems there's a big enough market out there for people wanting non-ambiguity proofs and linear running-time proofs.

Then again, the more I think about parsing, the more I think it's a completely made-up problem. I'm pretty sure there's a middle ground between Lisp (or even worse, Forth) and Python. Fancy parsing has robbed us of the possibilities of metaprogramming and instead opened up a market for Stephen Wolfram, whose product features a homoiconic language.

I've been gorging on Formal Language Theory literature recently. I am now fully convinced that Regular Languages are a very good idea: They are precisely the string-search problems that can be solved in constant space. If you had to find occurrences of certain patterns in a massive piece of text, you would naturally try to keep the memory usage of your search program independent of the text's size. But the theory of Regular Languages only actually gets off the ground when you wonder if Regular Languages are closed under concatenation or not. It turns out that they are closed under concatenation, but this requires representing them as Non-Deterministic Finite-State Automata - which leads to the Kleene Star operation and then Regular Expressions. This is a non-obvious piece of theory which solves a well-formulated problem. So now I suspect that if history were to repeat again, Regular Expressions would still have been invented and used for the same reasons.

By contrast, I find Context-Free Grammars much more dubious, and LR almost offensive? The problem with LR is I can't find a description of what it is that isn't just a gory description of how it works. And furthermore, it doesn't appear to relate to anything other than parsing. There's no description anywhere of how any of its ideas could be used in any other area.


The issue with Regex for parsing is it can't handle balanced parentheses. https://en.wikipedia.org/wiki/Regular_expression. More generally, they can't handle nested structure. Context free grammars are the most natural extension that can. It adds a substitution operator to Regex that makes it powerful enough to recognize nested structure. So, Regex would be reinvented if history was rerun, but so would Context Free Grammars. Part of the complexity in parsing is attaching semantic meaning to the parse. Regex mostly avoids this by not caring how a string matches, just if it matches or not.
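To make the contrast concrete, here is a minimal recursive-descent recognizer (a toy sketch of my own) for the grammar S -> '(' S ')' S | epsilon, which is exactly the kind of nesting no regular expression can capture:

```python
def parse_S(s, i=0):
    """Recursive descent for S -> '(' S ')' S | epsilon.
    Returns the index reached after consuming one S starting at i."""
    if i < len(s) and s[i] == '(':
        i = parse_S(s, i + 1)        # the inner S
        if i >= len(s) or s[i] != ')':
            raise SyntaxError("expected ')'")
        return parse_S(s, i + 1)     # the trailing S
    return i                         # epsilon: consume nothing

def balanced(s):
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False
```

The recursion in `parse_S` is the "substitution operator" at work: the call stack plays the role of the unbounded counter that a finite automaton lacks.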

Now, I do agree that LR grammars are messy. Nowadays, they have mostly fallen from favor. Instead, people use simpler parsers that work for the restricted grammars actual programming languages have.

IIRC there is some research into formalizing the type of unambiguous grammar that always uses () or [] as nesting elements, but can use Regex for lexing.


I understand what a CFG is and why Dyck's language (matching parens) is not a regular language. My point was that CFG/CFL is less motivated by a reasonable and uniquely characterising constraint - such as making memory usage independent of the size of an input string - than regex is.

Then again, you are right that CFGs are very natural. And they do admit a few easy O(n^3) parsing algorithms, like Earley and CYK.
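For what it's worth, CYK fits in a few lines. This is a toy recognizer; the Chomsky-normal-form encoding of balanced parentheses below is one illustrative choice, not canonical:

```python
# CNF grammar for balanced parentheses (one possible encoding):
#   S -> S S | L R | L T,   T -> S R,   L -> '(',   R -> ')'
UNARY = {'(': {'L'}, ')': {'R'}}
BINARY = {('S', 'S'): {'S'}, ('L', 'R'): {'S'},
          ('L', 'T'): {'S'}, ('S', 'R'): {'T'}}

def cyk(s, start='S'):
    n = len(s)
    if n == 0:
        return True  # treat the empty string as balanced (CNF can't derive it)
    # table[i][l-1] = set of nonterminals deriving the span s[i:i+l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][0] = UNARY.get(ch, set())
    for length in range(2, n + 1):           # span length
        for i in range(n - length + 1):      # span start
            for split in range(1, length):   # split point: O(n^3) overall
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for a in left:
                    for b in right:
                        table[i][length - 1] |= BINARY.get((a, b), set())
    return start in table[0][n - 1]
```

The three nested loops over length, start, and split point are where the O(n^3) comes from.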

I think your last sentence relates to Visibly Pushdown Grammars. See also Operator Precedence Grammars.


I'm not sure, but isn't 2 standard deviations a bit low? Especially so for something that can be done in a lab. It seems that 2 SD is the minimum threshold for getting published. Can we be sure that these are properly reviewed?

Could it be that you've confused this with the number of standard deviations one needs to falsify something? For instance, if two things are different, we may want them to be as many SD apart as we can. Here, on the other hand, the data agree _within_ 2 SD.

That was the limit of just one experimental approach that was peer reviewed and published in a major journal. As you can see there are many experiments validating the limit and none invalidating it.

The reality is that the Landauer limit is vanishingly small. I would encourage you to review the experiment methodology and see if you can come up with better, fundable methods.


I think I arrived at the same suspicion independently. It was when I was trying to understand thermodynamic entropy as an instance of Shannon entropy - where the latter is defined abstractly as a property of probability distributions - which left me wondering where the thermodynamic probabilities came from. Were they supposed to be subjective probabilities, or derived from ensembles? Then I recalled that entropy was originally defined non-probabilistically as dS = (1/T)δQ. Then I started reading about Boltzmann distributions as a bridge between Shannon's entropy and entropy in the earlier sense (Clausius entropy). I concluded that instead of thinking about bits and bytes, it was much easier to think about gases and machines doing work, like a 19th century engineer building, er, engines.

I am pretty ignorant of this field.


The effect has since been experimentally demonstrated.

It might be possible to use a randomised algorithm to estimate the number of matches in only linear time.

if a boycott is a virtue signal then your comment is a vice signal
