> Turns out we weren't opposed to bad metrics! We were just opposed to being measured! Given the chance to pick our own, we jumped straight to the same nonsense.
This seems like a distinction without a difference, unless there actually are any good metrics (which also requires them to be objectively and reliably quantifiable). I think most developers don't really want to measure themselves, it's just that pro-AI people think measurement is necessary to put forward a convincing argument that they've improved anything.
The only time metrics have been useful to me in the past is when they are kept private to each team, which is to say that I do think they are useful for measuring yourself, but not for others to measure you. Taken over time, they can eventually give you a really good idea of what you can deliver. Sandbag a bit (i.e., undershoot that number), communicate that to ye olde stakeholders, and everybody's happy that you can actually do what you say you'll do without being stressed out (obviously this doesn't work in startups).
> search a document for a pattern and it takes a second. search one a hundred times larger and it doesn't take a hundred seconds - it can take almost three hours.
Most of this is about find-all operations that are quadratic even though a single search is linear. But it's also still possible to get quadratic behaviour out of a single search without catastrophic backtracking, more easily than you might expect. In late January to early February, Tim Peters was discussing an example of this on the Python forums (see e.g. https://discuss.python.org/t/add-re-prefixmatch-deprecate-re...) and also related the experience of trying to diagnose the issue with AI (see https://discuss.python.org/t/claude-code-how-much-hype-how-m... and onward). Peters' example was:
\d+\s+
on a string containing only digits, a prefix match takes O(n) time as it considers every possible end position for the digit run, and immediately sees no following whitespace. But the search is quadratic because it has to repeat that O(n) work at every starting position; the regex engine can't track the fact that it has already examined the string and found no whitespace, so it re-tries each digit match length.
(This is arguably "backtracking" since it tries the longest match first, but clearly not in a catastrophic way; if you use `\d+?` instead then of course it only searches forward but is still O(n). It actually is slower in my testing in the Python implementation; I don't exactly know why. As noted in the discussion, the possessive quantifier `\d++` is considerably faster, and of course doesn't backtrack, but still causes O(n^2) searching. The repeated attempts to match `\s+` aren't the problem; the problem is repeatedly looking for digits in places where digits were already found and rejected.)
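A minimal Python sketch of the failure mode described above (the sizes and timings here are illustrative and machine-dependent):

```python
import re
import time

# On an all-digit string, \d+\s+ can never match, but re.search still
# retries every digit-run length at every starting position, so the time
# grows roughly quadratically with the input length.
pattern = re.compile(r"\d+\s+")

for n in (1_000, 2_000, 4_000):
    s = "9" * n
    start = time.perf_counter()
    result = pattern.search(s)
    elapsed = time.perf_counter() - start
    print(f"n={n}: match={result}, {elapsed:.4f}s")  # match=None every time
```

Doubling `n` should roughly quadruple the elapsed time on a backtracking engine.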
The way to fix this proposed in the discussion is to use a negative lookbehind assertion before the digits: `(?<!\d)\d+\s+`. This way, the regex engine can bail out early when it's in the middle of a digit string; if the previous character was a digit, then either `\d+\s+` doesn't match here, or it would have matched there.
A simpler idea is to just search for `\d\s+`, or even `\d\s` — since these will be present if and only if `\d+\s+` is. This way, though, you still need to do extra work with the partial match to identify the start and end of the full match. My first idea was to use a positive lookbehind for the digits, since the lookbehind match doesn't need to backtrack. In fact, lookbehinds require a fixed-length pattern, so this is really just a more complicated way to do the `\d\s+` simplification.
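A small Python sketch checking that the two fixes behave as claimed (the sample text is my own illustration):

```python
import re

# The original pattern and the two fixes discussed above. On ordinary text,
# the lookbehind version finds exactly the same spans; the simplified
# pattern matches iff the original does, at the cost of a narrower span.
original   = re.compile(r"\d+\s+")
lookbehind = re.compile(r"(?<!\d)\d+\s+")  # bail out mid-digit-run
simplified = re.compile(r"\d\s+")          # last digit plus the whitespace

text = "abc 123  def 45 6789\tend"

spans_orig = [m.span() for m in original.finditer(text)]
spans_lb = [m.span() for m in lookbehind.finditer(text)]
print(spans_orig == spans_lb)  # True: identical matches

# Presence/absence agrees as well, including the all-digit worst case.
print(bool(simplified.search(text)) == bool(original.search(text)))  # True
print(simplified.search("9" * 10_000))  # None, and found quickly
```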
----
> Hyperscan (and its fork Vectorscan) is a true linear-time all-matches regex engine. it achieves this by using "earliest match" semantics - reporting a match the moment the DFA enters a match state, instead of continuing to find the longest one.
Is this not just equivalent to forcing "reluctant" quantifiers (`\d+?`) everywhere?
With all-matches semantics it returns significantly more matches than leftmost-greedy.
E.g., `/abc*/` on `abccccc` will return matches at `ab|c|c|c|c|c|`.
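A brute-force Python sketch of the difference (Python's `re` only offers leftmost-greedy semantics, so the all-matches set is enumerated by hand):

```python
import re

pattern = re.compile(r"abc*")
s = "abccccc"

# All-matches semantics: every (start, end) pair whose substring matches.
all_matches = [(i, j)
               for i in range(len(s))
               for j in range(i + 1, len(s) + 1)
               if pattern.fullmatch(s, i, j)]
print(all_matches)  # [(0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7)]

# Leftmost-greedy semantics: one match covering the whole string.
print([m.span() for m in pattern.finditer(s)])  # [(0, 7)]
```

Six matches (one per end offset) versus one, matching the `ab|c|c|c|c|c|` picture above.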
I think it's very common and ok that people reason about other engines in terms of backtracking but it works very differently. And fixed length lookbehinds are more of a Java/Python thing, other engines support all lookbehinds.
The main idea of linear regex and intuitive semantics is that a pattern should be declarative: the engine does whatever is fastest without you having to worry about it. Instead of describing, character by character, how to perform the search and where it can blow up, think of the pattern as just a specification. Then you can truly express whatever is shortest/most convenient to explain.
Something I'm still trying to figure out, and perhaps failing to understand, is: what are the killer features of backtracking regex that you would really miss if you were to use linear regex? It would help me a lot to know; I'm trying to convince others to make the switch.
Good to know. Although a lookbehind for `\d+` doesn't really gain anything over a lookbehind for `\d` anyway; they match in the same circumstances, just with different results.
If there's supposed to be a literal asterisk in there somewhere, you can escape it with a backslash. Right now two paragraphs are italic because of mismatched asterisks.
Thanks. There are no asterisks in the regexes; I had simply missed the closing asterisk on some intentional emphasis. (And then I also had to fix some escaping inserted by the system to try to correct for the actual problem.)
That's about when I joined, and all I really remember thinking was that it was cool that I could now share my repo publicly without having to try and run a server from a residential IP.
3150x2210 is sort of a normal resolution for retina displays. It's close to a native panel resolution on iOS, but they do this dumb fractional scaling thing because it was too hard to backport support for high-DPI displays to Mac OS X. Anyway, that resolution is so unreadable on current macOS that they hide it behind a "show all resolutions" toggle. The default is 50% of that (1/4 as many pixels), and they only let you go up to about 60-66% of native resolution unless you click the override.
So, the screenshot is probably a semi-upscaled image of a ~ 1920x1200 desktop.
By this logic, all malicious JavaScript (obvious example is cryptominers I guess, assuming no JS sandbox escape) is C&C, yeah? As it "instructs site visitors" to do something harmful locally?
To be clear, if I have JavaScript blocked for archive.today (which is my default with NoScript; and really there is no site functionality that needs JS on the user's end), then I don't participate in the DDoS, right?
> Unless you have the signatures of foobinade and foobinadd memorized, you have no way to tell that f is a curried function and g is an actual result.
Yes, but the exact FP idea here is that this distinction is meaningless; that curried functions are "actual results". Or rather, you never have a result that isn't a function; `0` and `lambda: 0` (in Python syntax) are the same thing.
It does, of course, turn out that for many people this isn't a natural way of thinking about things.
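In Python terms (with a made-up `add` purely for illustration), that view reads as:

```python
# Illustration of the "everything is a value" view: a curried function's
# partial application is an ordinary result, just like a number.
def add(x):
    def add_x(y):  # applying add once yields another function
        return x + y
    return add_x

increment = add(1)       # an "actual result" that happens to be callable
print(increment(41))     # 42
print(add(1)(41))        # 42, supplying both arguments in sequence

# Likewise, 0 and a thunk returning 0 differ only in that the thunk must
# be applied before it can be inspected.
thunk = lambda: 0
print(0 == thunk())      # True
```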
> Yes, but the exact FP idea here is that this distinction is meaningless; that curried functions are "actual results".
Everyone knows that. At least everyone who would click a post titled "A case against currying." The article's author clearly knows that too.
That's not the point. The point is that this distinction is very meaningful in practice, as many functions are only meant to be used in one way. It's extremely rare that you need to (printf "%d %d" foo). The extra freedom provided by currying is useful, but it should be opt-in.
Just because two things are fundamentally equivalent, it doesn't mean it's useless to distinguish them. Mathematics is the art of giving the same name to different things; and engineering is the art of giving different names to the same thing depending on the context.
Not when a language embraces currying fully and then you find that it’s used all the fucking time.
It’s really as simple as that: a language makes the currying syntax easy, and programmers use it all the time; a language disallows currying or makes the currying syntax unwieldy, and programmers avoid it.
But arguably your intent would be much more clear with something like `map (printf "%d %d" m _) ns` or a lambda.
I don't think parent is saying that partial application is bad, far from it. But to a reader it is valuable information whether it's partial or full application.
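A Python sketch of the contrast (`format_pair`, `m`, and `ns` are made-up names for illustration): implicit partial application hides the missing argument, while a lambda or placeholder makes it explicit at the call site.

```python
from functools import partial

def format_pair(a, b):
    return f"{a} {b}"

m, ns = 7, [1, 2, 3]

# Implicit: the reader must know format_pair's arity to see that one
# argument is still missing here.
implicit = list(map(partial(format_pair, m), ns))

# Explicit: the lambda signals "one argument still to come" at the call site.
explicit = list(map(lambda n: format_pair(m, n), ns))

print(implicit)              # ['7 1', '7 2', '7 3']
print(implicit == explicit)  # True
```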
Not really. When reading `iter (printf "%d %d" m) ns`, I am likely to read it in three steps:
- `iter`: this is a side-effect on a collection
- `(printf`: ok, this is just printing, I don't care about what is printed, let's skip to the `)`
- `ns`: ok, this is the collection being printed
Notice that having a lambda or a partial application `_` will only add noise here.
> But to a reader it is valuable information whether it's partial or full application.
This can be valuable information in some contexts, but in a functional language, functions are values. Thus a "partial application" (in terms of closure construction) might be better read as a full application, because the main type of concern in the current context is a function type.
Fine, it's a regular type. It's still not the type I think it is. If it's an Int -> Int when I think it's an Int, that's still a problem, no matter how much Int -> Int is an "actual result".
And the compiler immediately tells you that you are wrong: your type annotation does not unify with the compiler’s inferred type.
And if you think this is verbose, well, many traditional imperative languages like C have no type deduction, and you need to provide a type for every variable anyway.
I spent the last three years on the receiving end of mass quantities of code written by people who knew what they were writing but didn't do an adequate job of communicating it to readers who didn't already know everything.
What you say is true. And it works, if you're the author and are having trouble keeping it all straight. It doesn't work if the author didn't do it and you are the reader, though.
And that's the more common case, for two reasons. First, code is read more often than it's written. Second, when you're the author, you probably already have it in your head how many parameters foobinade takes when you call it, but when you're the reader, you have to go consult the definition to find out.
But if I was willing to do it, I could go through and annotate the variables like that, and have the compiler tell me everything I got wrong. It would be tedious, but I could do it.
Doesn’t that just imply that your tooling is inadequate? In LINQPad (and, I assume, VS, though I haven't done it in a while), when you hover over a “var” declaration a tooltip tells you the actual type the compiler inferred.
If 0 and a function that always returns 0 are the same thing, does that make `lambda: lambda: 0` also the same? I suppose it must do, otherwise `0` and `lambda: 0` were not truly the same.
In a non-strict language without side-effects, having a function with no arguments does not make sense. Haskell doesn't even let you do that.
You can write a function that takes a single throw-away argument (e.g. `0` vs `\() -> 0`) and, while the two have some slight differences at runtime, they're so close in practice that you almost never write functions taking a `()` argument in Haskell. (Which is very different from OCaml!)
Yes, and that becomes more intuitive when you "un-curry" the nested lambdas into a single lambda with twice the number of arguments. The point is that the value of a constant does not depend whatsoever on the state of the (rest of the) world, however much of that state piles on.
Sure—but that’s a property of the inferred types moreso than the mere application syntax. It can be hard to revisit or understand the type of JS or unannotated Python expressions, too—but unlike those cases, the unknown-to-the-reader type of the Haskell code will always be known on the compiler/LSP side.