Hacker News: danielparks's comments

Similarly, I used to stress about loading the dishwasher when I was a teen. I would spend so much time loading it that I gave myself a neck ache from leaning over, and I could have saved time by just washing the dishes by hand.

I still try to be somewhat efficient about loading the dishwasher, but… if I notice myself stressing I just say “screw it”, run it, and wash the rest by hand.

The other thing I’ve realized is that sometimes things don’t get clean even if you load them properly. For example, tall glasses that had smoothies in them. It’s a little gross if you don’t notice until you’re about to use one, but… you can just look them over and wash them by hand when you unload the dishwasher.

I guess this is all to say that sometimes the best optimization is to not think about it too much.


This is a matter of operator precedence and tokenization. Tokens are single characters in this language, and there is an invisible operator between them.

If the operator were explicit (let’s call it ~), the example would look like this:

    $ echo 'cat' | trre 'c:d~a:o~t:g'
    dog
With unnecessary parentheses:

    $ echo 'cat' | trre '(c:d)~(a:o)~(t:g)'
    dog


That's true. Thank you for elaborating.

There is a hidden concatenation operator, as in usual regular expressions. In the code I denote it with a dot '.' (as in Thompson's original implementation).
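Making the hidden operator explicit is a classic preprocessing step before a Thompson-style construction. A minimal Python sketch (hypothetical helper, assuming single-character tokens and only the operators discussed in this thread):

```python
# Sketch: insert an explicit '.' concatenation operator between adjacent
# operands, Thompson-style. Assumes single-character tokens and only the
# binary operators ':' and '|', the postfix operators, and parentheses.
def insert_concat(expr: str) -> str:
    out = []
    for i, tok in enumerate(expr):
        if i > 0:
            prev = expr[i - 1]
            # concatenation occurs between an operand/')'/postfix-op on the
            # left and an operand/'(' on the right
            if prev not in ':|(' and tok not in ':|)*+?':
                out.append('.')
        out.append(tok)
    return ''.join(out)

print(insert_concat('c:da:ot:g'))    # c:d.a:o.t:g
print(insert_concat('(c:d)(a:o)'))   # (c:d).(a:o)
```

This reflects the current trre precedence, where ':' binds tighter than concatenation.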


Cool, I’m interested to see where you go with this.

I found the operator precedence unnatural, and it looks like a lot of other folks in this thread did too. I would naturally assume `cat:dog` would be equivalent to `(cat):(dog)` rather than `ca(t:d)og`.


This is a very interesting idea for a few reasons.

> I would naturally assume `cat:dog` would be equivalent to `(cat):(dog)` rather than `ca(t:d)og`

It was confusing to me too until I remembered that we all kind of use regexes sort of wrong. They're "really" supposed to be considered as generators and not matchers. So IIRC, cat|dog as a "regular expression" (not a regex) is supposed to formally expand to

{catog,cadog}

For matching, this set of strings can then be substring matched against some larger text.
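The generator view can be sketched in a few lines of Python. This uses a hypothetical mini-AST (not any real regex library) with only literals, concatenation, and alternation:

```python
from itertools import product

# Hypothetical mini-AST: a node is either a literal string, ('seq', ...) for
# concatenation, or ('alt', ...) for alternation. expand() returns the full
# set of strings the expression generates.
def expand(node):
    if isinstance(node, str):
        return {node}
    op, *args = node
    if op == 'alt':
        return set().union(*(expand(a) for a in args))
    if op == 'seq':
        return {''.join(p) for p in product(*(expand(a) for a in args))}
    raise ValueError(op)

# 'ca(t|d)og' -- the parse described above:
print(expand(('seq', 'ca', ('alt', 't', 'd'), 'og')))  # {'catog', 'cadog'}
```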

The problem is that almost no regex matching engine actually does this, and so now they'll do all kinds of strange things either to meet our expectations, or for efficiency or something.

If you go and try a bunch of different regex tools you'll get variations that match either (cat)|(dog) or (cat)|(dog)|(ca[td]og) or something else.

So from a more formal conceptualization I think cat:dog should produce ca(t:d)og not (cat):(dog). But our experience with "regex" tools has subverted that formalization and now everybody just puts parens around expressions they want to alternate.

My real, minor issue with this proposal, as interesting and well thought out as it is, is that it feels like it's just trying to get back to regular expressions as generators (which they actually are), coming from the other side of a few decades of abusing them as regexes to meet user expectations. In other words, the problem is the tooling, not the syntax.

source: I've worked adjacent to this space in the past, and if you've never thought of regexes as string-set generators you can toy with the idea here:

https://onlinestringtools.com/generate-string-from-regex

but again, how these generator tools work is also very specific. The ones I used to work with had a variety of ways to specify constraints on closures and such to restrict the generators.


There’s no reason to say that “ca(t:d)og” is a “more correct” parsing than “(cat):(dog)”. You did hit the nail on the head insofar as you realized that we as programmers have built strong habits and make assumptions on the basis of those habits. But you didn’t take it to its logical conclusion and didn’t realize that having a text-based syntax to represent regexes is also such a habit/assumption.

In pure theoretical computer science, regular expressions exist as an abstract concept independent from syntax or parsers. They are an “algebra”, which means they are composed of elements connected with operators, but they are not inherently tied to a syntax. In the most fundamental formulation of regular expressions (the one in the Chomsky hierarchy), the only operators are alternation (which modern syntaxes express as “|”), the Kleene star (“*”) and — notably — concatenation, which modern syntaxes simply omit, in a way comparable to how modern mathematics notation omits the multiplication operator when you write “2x”.

In the same way that maths needs rules to define whether “2x²” means “(2x)²” or “2(x²)”, regex syntax needs such rules too. This is called operator precedence. I’m sure you’ve heard that before, but you just might not have realized that the regular expression “ab” has an operator in it because it is typically not written.

Now I’m not going to argue that the operator precedence in maths notation is haphazard or without reason — but it is arbitrary. It was arbitrarily chosen to be the most useful to mathematicians using the notation. And it turns out that giving exponentiation higher precedence than (invisible) multiplication (meaning: “2x²” means “2(x²)” rather than “(2x)²”) is more useful.

So coming back to the original example, whether “cat:dog” means “ca(t:d)og” or “(cat):(dog)” is simply a matter of defining the precedence of the “:” operator relative to the concatenation operator. You can argue (and I would agree with you) that one is more useful than the other, and therefore preferable (in the same way that “(cat)|(dog)” is more useful than “ca(t|d)og”), but neither of them is more fundamentally correct or primal or, as you put it, “supposed to formally expand to”.


I agree with the point that precedence is arbitrary. The current version looks like this:

1. Escaped characters

2. []

3. ()

4. * + ? {m,n}

5. :

6. . (implicit concatenation)

7. |

I have some reasons to put it that way: I want ':' to be somewhat 'atomic'. If you think about '*' or '+', they could be lower in the table as well. Anyway, I will try putting ':' lower in the next version and see how it goes.
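The binary rows of a table like this can be exercised with a tiny shunting-yard pass. A Python sketch (hypothetical helper names; only the three binary operators, single-character operands, and bigger numbers meaning tighter binding, so the table above is inverted):

```python
# ':' binds tighter than implicit concatenation '.', which binds tighter
# than '|' -- mirroring rows 5-7 of the table above.
PREC = {':': 3, '.': 2, '|': 1}

def to_postfix(expr: str) -> list:
    out, ops = [], []
    for tok in expr:
        if tok in PREC:
            # pop operators of equal or higher precedence (left-associative)
            while ops and PREC[ops[-1]] >= PREC[tok]:
                out.append(ops.pop())
            ops.append(tok)
        else:
            out.append(tok)
    return out + ops[::-1]

print(''.join(to_postfix('c:d.a:o')))  # cd:ao:. i.e. (c:d).(a:o)
```

Flipping the two numbers for ':' and '.' in PREC is all it takes to get the other reading, which shows how arbitrary the choice is.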


Thank you for the feedback. Yes, the precedence is a question for me. Maybe I will change this.

If I shift it below concatenation there could be another problem. E.g. since ':' is non-associative, chains should be illegal. And I am not sure how to treat this:

cat:dog:mouse

In the current version I inject an epsilon (the empty string). It looks natural. E.g. to remove every second letter I can run '..:', which is technically '.(.:eps)':

echo 'abcde' | ./trre '..:'

result: 'ace'

Actually, ':' association could have a meaning as composition of regular relations, but I found it too complicated for now.
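For readers without trre at hand, the every-second-letter example behaves like this plain regex substitution (an approximation in standard Python, not trre itself):

```python
import re

# '..:' keeps the first character of each pair and deletes the second;
# a backreference substitution does the same thing:
print(re.sub(r'(.).', r'\1', 'abcde'))  # ace
```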


I do not understand the rules by which you inject the epsilon, and I think this is a source of confusion for many people. I had thought that an epsilon could be injected anywhere a REGEX can appear (effectively allowing epsilon as a REGEX), but of course that just leads to an infinite number of parses. Manually injecting epsilon is a highly hacky thing to do; better to account for it when you design the grammar.

I would not worry about "cat:dog:mouse" because intuitively it is clearly correct and it means replacing cat with mouse. With parentheses it could be written as "((cat:dog):mouse)".


Epsilon injection happens whenever the right or left side of ':' has no operand. E.g.

(:a)

(a:)

a:|b

a|b:

etc

I will try changing the precedence and see how it works. Btw, what do you think about explicit operators '>' and '<', where '<' works as a usual regex matcher and '>' as a generator? For example, to change 'cat' to 'dog' there could be something like '<cat>dog', where the '<cat' part is a parser and '>dog' is a generator. Thanks.


I think your epsilon injection rule is trying to achieve this kind of production:

    TRRE <- TRRE ':' REGEX | ':' TRRE | TRRE ':' | REGEX | ...
I think this would work better, but ':a:' is still ambiguous: it has two parse trees.


Yeah. Similarly, for the range transformations, instead of `[a:A-z:Z]`, I would suggest `[a-z:A-Z]`; and instead of `[a:b-y:zz:a]`, something like `[a-y:b-z;z:a]`, perhaps.


I would suggest simply [a-z]:[A-Z], inspired by tr.

Then there is no syntactic special case. This is just EXPR:EXPR; the special case is that both EXPR are character class syntax, and so the tr-like range mapping applies.
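For comparison, the tr-like pairwise range mapping being proposed here is what Python's str.maketrans does (a sketch of the semantics, not trre syntax):

```python
import string

# [a-z]:[A-Z] as a pairwise range mapping, tr-style: str.maketrans pairs the
# two ranges position by position rather than taking a cross product.
table = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
print('cat'.translate(table))  # CAT
```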


[a-z] is equivalent to 'a|b|...|z' in the normal regex language.

So if we do [a-z]:[A-Z] it should be expanded to:

(a|b|...|z):(A|B|...|Z)

which is perfectly legal in trre but has a different meaning: mapping any a-z to ALL of A-Z (generating every letter A-Z on each occurrence of a lowercase letter).


[a-z] is a semantically equivalent regex to a|b|...|z, but the two are not equivalent syntactic forms.

Distinct syntactic forms can be given distinct semantics, as long as there is rhyme and reason.

Moreover, the right side of the colon is not the normal regex language, it only borrows its syntax. So there, we may be justified in denying that the usual equivalence holds between character class syntax and a disjunction of the symbols denoted by the class.


The right side is a normal regex language syntactically. Semantically it is a generator instead of a parser (consumer).

But I got your point. Maybe there is a way to do it consistently. Straight tr-like syntax won't work, though; e.g. I really want something like this to be valid:

[a-b]:(x|y) (pairs a:x, b:x, a:y, b:y)

and I prefer not to handle these in some ad-hoc way.
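The pairing wanted for `[a-b]:(x|y)` is a plain cross product; a minimal sketch:

```python
from itertools import product

# [a-b]:(x|y) relates every symbol on the left to every string on the right,
# i.e. the relation {a:x, a:y, b:x, b:y}:
pairs = sorted(product('ab', 'xy'))
print(pairs)  # [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]
```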


I also got your point. The right side is a regular expression because it denotes a regular set.


I just sent a feature request[1] to Signal with the following text:

    I understand that Signal does not consider this
    https://gist.github.com/hackermondev/45a3cdfa52246f1d1201c1e8cdef6117 to be
    a valid security bug, but it would be helpful to at least be able to
    mitigate it.

    Please add an option in settings to disable automatically downloading
    attachments.

    That should be enough to change the attack from 0-click (just opening the
    conversation) to 1-click (click the attachment). Most people won’t care
    about this, but for some every little bit of privacy is important.
[1]: https://support.signal.org/hc/en-us/requests/new


Hold on, someone else in this thread noted this does exist

" You can disable the auto-download. Settings > Data and storage > Media auto-download, you can choose what to auto download for mobile data/wifi/roaming."

So, that part is there, but my question is: it's still an issue when they manually download the image, right? Unless someone never accepts images from people they aren't expecting, whose number or unique created ID has never been seen before.


Oh, nice. I looked under Settings > Privacy and didn’t see anything. For me it was under Settings > Data Usage.

Yes, this is still an issue if you manually download an attachment, but that's a lot better than it happening automatically when you open a conversation.


From the page:

> I decided it was finally time to build a file server to centralize my files and guard them against bit-rot. Although I would have preferred to use OpenBSD due to its straightforward configuration and sane defaults, I was surprised to find that none of the typical NAS filesystems were supported.

OpenBSD does not support ZFS.


Theo is, perhaps rightfully so, against importing what is effectively a paravirtualized Solaris kernel into the OpenBSD source code in order to run a file system.


Too bad, because the partitioning scheme is why I don't use OpenBSD. With ZFS you could just create a dataset, throw noexec, nosuid, etc. on it and give it a quota; with FFS you can bet that you will run out of space, sooner or later, in one of the partitions (workstation, NOT server).


You can use your own partitioning though? One for /, one for swap. Done.


Yes you can, but you can't set stuff like nosuid etc. on /.


I doubt it. Even for ports you can still symlink /usr/ports to $HOME/ports, for Scummvm with --enable-all-engines or Eduke32 (Build/GPLv2 license clash, can't be shared as a binary).

/usr/local is not small at all by default.


[flagged]


I think what they're trying to say is that you can just symlink stuff to $HOME if some filesystem runs out of space (not an endorsement of that view, just an explanation).


They also have a YouTube channel that adds images and video to the same audio track. A lot of it is just stock footage, but I like the images of ruins and the renders of what cities might have looked like in the past. It’s especially helpful when they’re discussing artwork.

https://youtube.com/c/FallofCivilizationsPodcast

(This particular episode is not out with images yet.)


Just so you know, the last two (or so) videos have had expensive custom images, video, reconstructions, illustrations, self-shot on-location imagery, etc. He mentioned it in an intro a while back. That's the reason for the long delay between recent videos.

The stock stuff earlier really adds value though. It's not some generic background pan-and-zoom. If it's in the background it's probably being actively talked about and used as a visual aid. Having a map of an area, or seeing the ruins as they are today, or quotes sourced out of books, etc. really elevates the experience.

I honestly don't listen to the non-video ones; the visuals are such an essential element.


Awesome tip. Thank you.


Given the number of services that turned out to use plaintext or trivial password hashing (e.g. MD5), I would bet there are a bunch of services out there that do not effectively limit OTP attempts.

It’s been a long time since I did any work on a real authentication system — since before TOTP was common, anyway. I appreciated the post and found it interesting.
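A back-of-the-envelope calculation shows why attempt limiting matters so much here (hypothetical numbers; assumes a uniformly random 6-digit code):

```python
# Rough odds of guessing a 6-digit OTP, assuming the service allows
# `attempts` guesses while the code is valid:
def guess_probability(attempts: int, digits: int = 6) -> float:
    space = 10 ** digits
    return 1 - (1 - 1 / space) ** attempts

print(f"{guess_probability(10):.4%}")       # with a 10-try limit: about 0.001%
print(f"{guess_probability(100_000):.1%}")  # unthrottled, 100k tries: about 9.5%
```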


He posted a follow up about error handling in Zig that I thought was interesting: https://ayende.com/blog/194466-A/looking-into-odin-and-zig-m...

I’ve seen the “clean up errors yourself” argument before, but, like the author, I don’t think it holds water. Often the correct response to errors is to panic() or pass it up the stack so the caller can deal with it or—more likely—panic() itself.


I think it's nice to have two mechanisms for errors like in ML: sum types for errors that the calling function can realistically recover from, and exceptions for the rest.


You can do that in Zig with @panic, and you can even convert a sum-type error into an irrecoverable failure: `catch @panic("oops");`


That's nice. Are the sum types checked by the compiler to ensure that you didn't miss any case?


Yes, and you can even handle some errors, and then capture an error whose type is the subset of unhandled errors, and return that, limiting the error set of the current function to not include the handled ones. Here are a couple examples: https://github.com/ziglang/zig/blob/7d0de54ad44832589379a4bc...

You get compile errors if you try to handle an impossible error, or don't include an `else` prong (in which case the compiler tells you the full set of unhandled errors).


That's a great feature. I'm constantly impressed by what Zig can do.


If I understand you correctly, then yes.

    fn canError(foo: bool) !void {
        if (foo) {
            return error.Foo;
        } else {
            return error.Bar;
        }
    }

    test "canError" {
        canError(true) catch |e| switch (e) {
            error.Foo => @panic("foo!"),
        };
    }
Running `zig test` on that file would give:

    ./tmp/errors.zig:10:30: error: error.Bar not handled in switch
        canError(true) catch |e| switch (e) {
                                 ^


You did understand it correctly. That's a really nice feature.


This is Rust's approach, too: realistically recoverable errors are handled with Result<T, E>, and others are handled through panics. (Panics are generally implemented as exceptions, i.e. they unwind the stack, but the compiler can also be configured to make them simply abort.)


C has always had asserts for that reason.


Is it idiomatic to assert things and recover from them the way people sometimes do with exceptions?


I agree with the author that errdefer seems like the winning feature of zig. Proper error handling is the most important part of any language or system.

Unfortunately, most common languages do not prioritize this. As a contractor I have dug through a lot of awful code in different languages, and the most common denominator is improper error handling.


Off Topic:

>He posted a follow up about error handling......

When I was reading the original blog post I wondered whether he had any follow-up, so I decided to click on the archive and the homepage. This article somehow doesn't show up in either list. As a matter of fact, I couldn't even find a way to get to this blog post without your direct link. Then it turned out it is a "FUTURE POST".

How does that work, and why? Is this supposed to be some sort of preview before it is officially published?


Author here - I typically just throw things into the post queue and they drop every day. In this case, I replied to a question and it turned into a full blown post.


>I typically just throw things into the post queue and they drop every day.

That is a great idea. Thank you! I might start doing that as well.


Yeah, it’s weird. I tried stripping the key= parameter from the link when I commented, but it’s required.

The author posted the link in the comments of the original post.


I don't panic nearly as much as you, it seems!

Handling errors causes us developers so much confusion, because we are so used to exceptions and how poorly they were conceived.

Errors should be called failures. It's just when a function fails to do what it says it does. Nothing more, nothing special about it!

If you call fopen() and it doesn't find the file, that's a failure! No reason to panic! Just go create the file!


Not that I doubt you or anything, but… would you video this and post it somewhere?

Seriously, I would love to have a quicker way to cook onions.


There’s more information, including a copy of the image, in the forum discussion. The forum seems a bit pokey for me at the moment, though.

https://forums.flyingmeat.com/t/memory-consumption-when-open...

Gus Mueller (the author of the linked tweet) is the author of the Acorn image editor.


Thanks for the link. The issue seems to be the IOAccelerator framework:

> I’ve narrowed it down quite a bit (and submitted it to Apple as FB9112835).

> What’s going on is that the IOAccelerator framework has some sort of massive leak in it, where it’s using up 35GB of ram, 25 of which is going to swap (which is why you’re seeing kernel_task flake out).

> On intel, the same image only uses 1480K from IOAccelerator.


Wow. This could explain some of the excessive swap usage on M1.


Seems like a poor conclusion to draw with this little information.


People have reported their SSDs filling up (in terms of total writes) much faster on Apple Silicon machines. If IOAccelerator can leak like this then it would definitely explain it. 25GB swap for one image is absurd. Multiply that by a few months of usage. It may not be a smoking gun but it is a fingerprint in the pool of blood.


If this were the cause people would have noticed having 0% free memory way before their SSDs started dying.


That’s been debunked.


It hasn't been debunked if there's no source for the claim...


Apple said that the kernel interface used by smartctl is emitting invalid data, which invalidates all conclusions drawn from it, such as “there is/isn’t a problem with SSD wear”.


Can you provide a link please?

I compared smartctl output with activity monitor on disk writes and it's exactly the same number. You can do the same.


https://appleinsider.com/articles/21/02/23/questions-raised-...

> "While we're looking into the reports, know that the SMART data being reported to the third-party utility is incorrect, as it pertains to wear on our SSDs" said an AppleInsider source within Apple corporate not authorized to speak on behalf of the company. The source refused to elaborate any further on the matter when pressed for specifics.

We'll likely never hear anything else about this again from Apple officially or unofficially, so I don't expect anyone who believes there's an SSD wear issue to stop believing that there is. Either the combination of "smartmontools is emulating SMART access, but doesn't actually have it" and "a source at Apple said that smartmontools is incorrect" is enough to make this a non-issue, or it's not — and since most people who think that there is a wear issue don't realize the part about smartmontools faking that it has access to SMART data in this scenario (hint: nope!), I don't expect to find common ground.

So as far as I'm concerned, this is all irrelevant until someone's SSD wears out, and no one's reported that, so everyone is all tempest-in-a-teapot over some numbers that an open source tool is handcrafting from a macOS kernel API based on assumptions about Apple's proprietary hardware that are probably wrong. Wake me up when someone's SSD wears out.


That Apple comment is bullshit. We've confirmed that the TBW numbers from smartctl match the actual quantity of data written. You can also see the excess I/O in Apple's own Activity Monitor. The lifetime usage numbers smartctl reports are in line with what a high-end SSD would report, and there is no way for smartctl to "make up" the data. It's real data coming from the NVMe controller. There's no possible way to fake anything like that. That person is not authorized to speak for the company and is probably making stuff up.

Apple are aware of it, the bug is fixed in 11.4, and once the corresponding XNU source code drops I'll be happy to diff it and show you exactly what they changed in the swapper to fix it and debunk your "debunking".


Excellent! Glad to hear it.


Perhaps, but if you read the Twitter thread, others are suggesting the same thing, and people seem excited/happy that a potential fix may come from Gus discovering this.

So, maybe premature to get too hopeful but certainly not too soon to look in that direction?


People are suggesting the same thing because it sounds nice to be able to correlate them, but the evidence just does not exist yet. At the moment it is not even clear that they are correlated at all, really.


And yet it's relatively performant, which says a bit about the quality of the components in Apple hardware (the SSD in this case)!


Note that IOAccelerator memory usage doesn’t necessarily mean a bug in IOAccelerator. If my memory is correct, when an app allocates buffers for hardware-accelerated graphics, that memory is attributed to IOAccelerator. So it’s still likely to be a bug in some system framework that’s allocating all these buffers (especially since we only see the issue on one platform) — but an application bug is still a possibility.

