It generates a nastily complex regular expression that is hopelessly wrong. Visible at https://www.youtube.com/watch?v=9Pw-Roo_duE&t=404.
For the local part, it requires [\w-\.]+, which excludes many valid characters like everyone’s favourite, +.
For the domain part, it tries to allow IPv4 addresses as well as normal domain labels (not IPv6 addresses, though), but it tangles them together in a way that a human never would, allowing things like [12.34.56.com], [987.654.321.000, example.com] and example.123], while disallowing things like example.studio (the last label may only be 2–4 letters) and IDN TLDs (which start with xn-- and must allow hyphens and digits, not just [a-zA-Z]).
The author makes no comment on how hideously bad it is, which makes me suspect he didn’t notice, which… yeah, shows the problems of the whole thing.
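To make those failure modes concrete, here is a quick test against a regex of roughly the shape described above. To be clear, this pattern is my own reconstruction from the description, not the actual expression shown in the video:

import re

# NOT the regex from the video -- a guess at a pattern of roughly the shape described,
# purely to illustrate the failure modes listed above.
EMAIL_RE = re.compile(
    r"^[\w\-\.]+@"                                              # local part: no '+' allowed
    r"(\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|([\w\-]+\.)+)"    # half IPv4 literal, half domain labels
    r"([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"                         # 2-4 letter TLD or digits, stray ']'
)

for addr in [
    "user+tag@example.com",    # valid address, rejected ('+' not in the local-part class)
    "user@example.studio",     # valid address, rejected (last label longer than 4 letters)
    "user@[12.34.56.com]",     # nonsense, accepted
    "user@[987.654.321.000",   # nonsense, accepted
    "user@example.123]",       # nonsense, accepted
]:
    print(addr, "->", bool(EMAIL_RE.match(addr)))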
When I made the video I didn't really notice the code part of the regex, since I'm really new to regex, but in the conclusion of my video I did mention that most of the code is not efficient.
This is exactly the problem. The regex issue isn’t that it’s not efficient, it’s that it’s wrong. Using this tool to generate code in a problem area you are not qualified to double-check and validate yourself is dangerous.
> Using this tool to generate code in a problem area you are not qualified to double-check and validate yourself is dangerous.
I would like this message to be amplified as much as possible. Never write code you do not understand. I am excited about Copilot, but also wary of the programming culture these tools will bring in. Businesses, especially body-shopping companies, will want to deliver as much as possible using tools in this category and will end up shipping code with disastrous edge cases.
Zing! But well, it depends on the algorithm. Some aren't that complicated to understand, like linear regression. Others, like DNNs, are basically impossible. But with ML you're at least always testing the code you don't understand in the process of training the parameters. That's better than the minimum effort when using Copilot code. And many will just make that minimum effort and release untested code they don't understand.
Well, I think this overestimates people outside the HN echo chamber again. Most senior ML people we see in big corps have no clue what they are doing: they just fiddle with knobs until it works. They would not be able to explain anything: copy code/model, change parameters and train until convergence, test for overfitting. When AutoML was gaining traction I hoped they would be fired (as I do not think they are doing useful work), but nope: they have trouble hiring more of them.
No. You could use copilot to generate code you do understand and double check it before committing. It’s similar to just copying and pasting from stack overflow.
I think there are a scary amount of programmers (this is their job and they get hired: often they are already seniors) who cannot explain what they copied or even wrote themselves. I have asked guys with 5 years or more job experience why they wrote something they wrote and I get 'because it works'. Sometimes I have trouble seeing why it works and usually it means there are indeed those disastrous edge cases. Copilot will make this worse and has the potential to make it far worse.
2. generating code in a programming idiom with which you are unfamiliar, of which regex as a DSL is also a pretty classic example. I don't think most programmers are good at regex; I know I'm definitely in the 'now you have two problems' camp when it comes to regex.
So to summarize it generated code written in a way the programmer could not understand what it even claimed to be doing, using a technology that many programmers are not especially good at; and it generated code that did not handle the problem domain correctly, and the problem domain is one that most programmers don't actually know that well either.
the more I think of this thing the more disastrous it seems.
aye - the biggest risk here is that rather than pulling in some standard lib for validating email addresses an engineer may use the copilot suggestion and validate it against a few simple test cases.
Curiously an attacker could probe services for use of the invalid suggestions that copilot generates....
This seems a bit overblown. If someone’s using GitHub Copilot to write code in a place it could be dangerous without any kind of quality control, then the odd wrong regex flag is the least of their problems.
ML applications face a frightening problem of diminishing returns on investment. The first prototype often happens in days or weeks, the next iteration months, after that years.
It's more analogous to clearing a foundation for a house by progressively picking up the boulders, then the rocks, then the grains of sand one at a time.
I don't think it's possible for copilot to improve on this problem. It doesn't actually understand the code, it's just statistical models all the way down. There's no way for copilot to judge how good code is, only how frequently it's seen similar code. And frequency is not the same thing as quality.
Copilot acts like a search engine, you search, you find, then you judge. It was never the case with search engines that you could just copy some code you found without verifying it. Also, it has the same copyright problems as if you used Google to find the code.
Nice theory. Won’t work out in practice, because this produces code that will run, and it’s AI, so it must be good, right?
When you found code on the internet, it was presented in a context that let you make better judgement (e.g. on Stack Overflow this regular expression would have had a score of roughly −∞ and multiple highly-voted comments saying “do not use this, it’s catastrophically bad”), and where you have to put in more effort to plug it in and shuffle things around a bit as well. With Copilot, you get given ready-to-go code without any sanity checking at all.
See even how, a few seconds later in the video, the author does test it out—but not thoroughly enough.
I think Copilot should report the matching source URL to allow the user to visit the page and see the context and license. This would also defuse some of the copyright questions, because it would be like searching StackOverflow or GitHub for inspiration.
The problem of content attribution (exact and fuzzy match) has been studied before under the task of plagiarism detection for student essays. Funny thing is that a plagiarism detection Copilot would also disclose past cases of copyright violation and cause attribution disputes because code sitting unchecked in various repos would suddenly become visible.
If you can't trace the source then it's transformative use. If it matches training data then it needs to report the source like a search engine and place all responsibility on the user.
And fuzzy code matching could easily be implemented by using a model similar to CLIP (contrastive) to embed code snippets.
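A sketch of what the retrieval side of that might look like, assuming you already have some encoder that maps code snippets to vectors; the encoder itself (contrastively trained or otherwise) is the hard part and isn't shown, and all names here are made up for illustration:

import numpy as np

def nearest_snippets(query_vec, corpus_vecs, corpus_ids, k=5):
    """Return the k corpus snippets whose embeddings are closest to the query by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q                         # cosine similarity against every snippet at once
    top = np.argsort(-sims)[:k]
    return [(corpus_ids[i], float(sims[i])) for i in top]

# toy usage with random stand-in embeddings
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 128))
ids = [f"snippet_{i}" for i in range(1000)]
print(nearest_snippets(rng.normal(size=128), corpus, ids, k=3))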
The jury is still out on the licensing issue. And given it can, at least sometimes, output a verbatim copy-paste, including comments, of well-known GPL code[0], well, let's just say the issue is not clear-cut.
Copilot's API and suggestions could easily be (and maybe were?) implemented as an SBQA-style model: using a search engine to find promising examples/context, followed by a transformer model to synthesize the final output.
Attribution would clearly be required in such a search derived model.
Regardless of the licence, does the produced code even qualify for copyright protection, or does it fall under fair use?
what licence, if any, is there for unique code generated by co-pilot etc etc etc.
It's a great big ball of who knows. However, I expect that, noting you're only getting snippets, you would be highly unlikely to get code that doesn't fall under the fair use provisions. That said, IANAL.
So if we classify AI as search.. and then claim fair use.. we can launder dirty viral code.. how could this go wrong?
But really, if thispersondoesnotexist is just a really good per-pixel search against a corpus of human faces where each “page” of result pixels is organized in a grid presented as a new image with its own metadata..
I mean I guess Google really was an AI company all along.
No, it doesn't. If my understanding of it is correct, it's an autoencoder, then a few more bits of AI. The MNIST dataset is a collection of handwritten digits used in many early machine learning classes. Usually they are used to train a classifier, which returns the correct digit given an image. They can also be used to train an autoencoder, which will take an image in, compress it down to far fewer channels, and put out an image that quite closely matches the original.
Once you have an autoencoder, it is easier to input data and train a neural network to do something with the compressed output. There is no way the autoencoder knows which samples were used to generate the resulting output; it's just optimized for compression.
Thus, Copilot isn't search. You could take the entire corpus it was trained on and log all the compressed outputs. You could then take a given output, before the autoencoder expands it back out, and tell which few source code fragments were closest, but there are no guarantees.
TLDR; A far closer analogy: Copilot acts like a Comedian who has stolen a lot of jokes, and can't even remember where they came from.
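For readers who haven't met the term, the compress-then-reconstruct idea described a couple of paragraphs up can be shown with a toy linear autoencoder in plain NumPy. This is purely a classroom illustration of the concept, on random stand-in data; it is loudly an assumption and bears no resemblance to how Copilot is actually built:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))   # stand-in for flattened images (MNIST digits would be 784-d)

d, k = X.shape[1], 8             # compress 64 input channels down to 8
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))

lr = 1e-3
for step in range(2000):
    Z = X @ W_enc                # encode: squeeze each sample down to k channels
    X_hat = Z @ W_dec            # decode: try to reconstruct the original input
    err = X_hat - X
    # gradient descent on the (scaled) mean squared reconstruction error
    g_dec = (Z.T @ err) / len(X)
    g_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print("reconstruction MSE:", float((err ** 2).mean()))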
Search engines give you a link where you can (usually) see the code in context, who wrote it, when, license, etc. And often more, like who is using it where, how often it's updated, contact info, test suites, and so on.
Is this true? From what I remember reading, the code was uniquely created (but I could be wrong). If that's the case, then does it tell you what license the generated code is under?
"Email address validation cannot be solved adequately with a regex" is something we need to start teaching somewhere. The RFC spec for emails is just way to permissive to make validating email addresses a winning move.
I swear I wind up having a battle over email validation at every company I go to. There is inevitably a business person that says "Well what about this site, they do it" and then I have to dig into whatever that site is actually doing and likely find a valid email address that breaks their validation to prove it.
And probably some junior dev (or a senior who swears they did email validation flawlessly somewhere else), and it's the same story: I have to break their regex a bunch with valid emails they don't permit.
And of course then it's an uphill battle convincing them that what I'm using are in fact valid email addresses. Or you get the "Well no one ever actually does weird things in their email addresses so it's fine" or "gmail doesn't let me register that address so you're wrong"
I like the HTML spec’s version, used for <input type=email> validation: https://html.spec.whatwg.org/multipage/input.html#valid-e-ma.... It allows all realistic inputs except for IP addresses (of dubious realism) and email address internationalisation (internationalised domain names are supported, but the local part is still currently stuck with ASCII, which is in keeping with its still-quite-limited support, though https://github.com/whatwg/html/issues/4562 progresses in fits and starts).
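For reference, here's what using that pattern looks like from Python. The regex below is reproduced from memory of the WHATWG spec, so double-check it against the link above before relying on it:

import re

# The "willful violation" email pattern from the WHATWG HTML spec (quoted from memory).
HTML5_EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)

print(bool(HTML5_EMAIL_RE.match("someone+tag@example.studio")))  # True
print(bool(HTML5_EMAIL_RE.match("user@[127.0.0.1]")))            # False: IP literals rejected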
That’s actually roughly len(arr) times, not three, since two of the calls are inside the loop.
But the far bigger red flag there is that it doesn’t just use arr.reverse(), which does the same thing and is typically 8–10× as fast in some simple testing (assuming a list), or arr[::-1], which makes a shallow copy rather than modifying the object in place.
This matches what I’ve been seeing in code examples: Copilot likes to implement things from scratch rather than using libraries or even standard library functionality.
It’s possible that the word “array” tripped it up here and that it would have done something saner had it been told “list”, but I doubt it. (Python’s built-in array module is very seldom used; if you talk of arrays, you’re probably dealing with something like numpy’s arrays instead. But it’s far more likely that the built-in list type was what was desired here.)
There’s also one other significant point of bad and dangerous style in the code generated: the reverse function mutates its argument and returns it. Outside of fluent APIs (an uncommon pattern in Python, and not in use here), this is generally considered a bad idea in most languages, Python certainly included. It should either mutate its argument and return None, or not mutate its argument and return a new list.
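To make the discussion concrete, here's a sketch of the kind of function being criticised, reconstructed from the comments in this thread rather than copied from the video, alongside the idiomatic alternatives:

# Roughly the shape being criticised (a reconstruction, not the literal Copilot output):
def reverse(arr):
    for i in range(int(len(arr) / 2)):                        # float divide + cast instead of //
        arr[i], arr[len(arr) - 1 - i] = arr[len(arr) - 1 - i], arr[i]
    return arr                                                # mutates its argument AND returns it

# Idiomatic alternatives:
arr = [1, 2, 3, 4]
arr.reverse()                # in place, returns None
reversed_copy = arr[::-1]    # shallow copy, original left untouched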
> That’s actually roughly len(arr) times, not three, since two of the calls are inside the loop.
You're right! I noticed it a few minutes ago and changed the wording accordingly. Thanks for pointing out how shockingly bad that algorithm actually is! :D
> (...) the reverse function mutates its argument and returns it.
Yeah, that mutate + return is confusing. It's also worth noting that, as a result of the mutation, the function doesn't work on immutable types like strings and tuples.
It's not shockingly bad to call len(arr) each time through the loop. It would be if we were talking about something like strlen() in C, but in Python it's just one more constant-time† operation each time through the loop, and not a very expensive one at that. Caching it in a local variable would still be better.
______
† Nothing is really constant-time in CPython, but it's pretty close.
Well, to be fair, the `len` operation on lists in Python is a constant time operation. What makes the example particularly bad is using a cast on the result of a floating-point division, rather than just using Python 3 integer division (i.e. the `//` operator). Copilot was clearly just spitting out Python 2 code here.
I think the difference is that redundant re-calculation is a code smell in any language, at all times, whereas the float/int division issue requires knowledge of Python 2/3 syntax quirks.
It's context-specific whether recalculation is a code smell. I've definitely gotten feedback from senior developers to just do len(x) repeatedly instead of "littering" (their words) the code with `xCount=len(x)` assignments (because they knew len was constant-time).
Sure, it makes sense if you find something like `len_x` or `x_count` significantly less readable than `len(x)` and if you don't have to worry about resources. Can't say I've been there though.
I did a quick test of this `reverse` function (which probably shouldn't exist in the first place) and, unsurprisingly, it became ~30% faster when `len(arr)` was only called once.
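The exact number will vary by machine and list size, but that kind of comparison is easy to reproduce; the helper names below are made up for the illustration:

import timeit

def rev_calls_len(arr):
    for i in range(len(arr) // 2):
        arr[i], arr[len(arr) - 1 - i] = arr[len(arr) - 1 - i], arr[i]

def rev_caches_len(arr):
    n = len(arr)
    for i in range(n // 2):
        arr[i], arr[n - 1 - i] = arr[n - 1 - i], arr[i]

data = list(range(10_000))
print(timeit.timeit(lambda: rev_calls_len(data), number=1_000))
print(timeit.timeit(lambda: rev_caches_len(data), number=1_000))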
When I'm that resource constrained, I don't use Python. The changes necessary to make regular Python code performant defeat the goal of making it readable.
There's a reason NumPy's innards aren't Python code.
Both sides of this "readability vs. performance" debate can be argued ad absurdum but that's not my intention. All I know is that I try to conserve resources no matter what level of the stack I'm working on and those generated snippets certainly don't!
Indeed, but they're not intended to. Nothing about Copilot's specs even claims it's generating optimally-performant code. And the Python example indicates why that may not even be a desirable output (most performant and most readable are orthogonal metrics).
> It generates a nastily complex regular expression that is hopelessly wrong.
> [...]
> The author makes no comment on how hideously bad it is[...]
I mean, it's coming up with a solution about as good as what the average programmer who's going to validate E-Mail with regexes would come up with, so as a crowd-sourced machine learning solution it's not too bad if you think about it.
In other words, having a co-pilot doesn't mean you're guaranteed to get Chuck Yeager.
Here’s the difference imo: an average programmer with some experience would seek out a well-tested and widely used library to help them do something like this, not use sausage meat spat out of a cannon to validate email addresses.
I’d hope so, but then again, this is what their ML model spat out, which suggests people have written stuff like this, though hopefully not the wonkiness around the bit after the last dot.
But on the brighter side, the material the user provided to Copilot in this case was pretty much “I want to implement email validation from scratch” rather than “I want to validate an email address”, which is where hopefully people would look more to existing libraries. And they’ll commonly already have such libraries or functions in their code base, e.g. under Django you’d use… uh oh, searching found https://stackoverflow.com/q/3217682/ first, which looks frighteningly familiar here in half of the errors it contains; but anyway, you should use https://docs.djangoproject.com/en/3.2/ref/validators/#emailv.... I suspect the Copilot approach as used will be unintentionally biased much more towards boilerplate and implementing things from scratch, rather than using libraries.
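For what it's worth, the library route under Django is roughly a one-liner. Hedged: this uses django.core.validators.validate_email, which has been around for a long time, but check the docs link above for the current API, and note it needs a Django install:

from django.core.validators import validate_email
from django.core.exceptions import ValidationError

def looks_like_email(address: str) -> bool:
    try:
        validate_email(address)      # raises ValidationError on failure
    except ValidationError:
        return False
    return True

print(looks_like_email("someone+tag@example.studio"))  # True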
>about as good as the average programmer who's going to validate E-Mail with regexes would
IMHO, the average programmer is not even aware of regexes, which is a problem, but here it would lead the programmer to a simpler-to-read solution.
You need to be at a very particular point (good enough to be proficient with regexes, but bad enough to use them for everything and bad enough to not test your regex), and I strongly doubt that's the average.
P.S. The video question was to validate emails, not 'validate using regex'.
This has been discussed at length several times before, and the answer to why you use a regex like that to validate emails is that you aren't trying to validate against a standard, but against a subset of email formats. You want a "normal", simple email. No "+" addresses etc. Because it doesn't matter if you annoy the 0.01% of your users that would be negatively affected by that; it's better to have their simple/canonical emails.
On the right side of the @ I agree completely, e.g. you must allow longer TLDs. The regex is shit. But the same "simplification" thing would apply for IPv4: I'd probably want to have 4 groups of [0-9] even if a valid IPv4 address could be written in a lot more creative ways than that. The normal/simple/canonical way to write the address is a smaller scope than the set of allowed ways.
The regex to parse any valid email and the regex to parse the info I want (perhaps from the user subset I want!) can be very different.
Edit: don't shoot the messenger - there is just zero chance you want a db that is more likely to contain user errors and has fewer valuable emails in it.
"Enter your email" doesn't mean "Enter a string that can be considered valid according to the RFC"!
No one cares whether "foo/baz=frob@example.com" is a valid email address or not. In some cases you want RFC-compliant addresses, in which case it's a perfectly good idea to parse strictly to the spec. But in most cases you want a user identifier you know you can also contact with 100% certainty using some email SaaS. Or one you can cross-reference to some other source. That's strictly a different purpose than parsing RFC-compliant addresses. And allowing "foo@bar"@baz.com is just not a good idea.
The point being missed here is that no algorithm can tell you whether a string is a valid email address, because that's not a property of the string in the first place, it's a property of the world.
Having a fairly low entropy Gmail address, I get an intermittent drizzle of messages of the form of 'thank you for signing up to Acme!' whose content is such as to make it clear that an actual Acme customer typo'd their email address, and Acme thought you could validate it by checking the form of the string.
The only way to validate an email address is to send email to that address, asking the person behind it, are you the one who just signed up for Acme. And once you are doing that, there is no point checking the string for anything other than containing an @.
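In code, that philosophy amounts to roughly this much up-front checking (a sketch, slightly stricter than the bare "contains an @" test described above; the confirmation-email plumbing is application-specific and not shown):

def plausible_email(address: str) -> bool:
    # The only structural check worth doing before sending a confirmation mail:
    # is there a non-empty local part, an '@', and a non-empty domain part?
    local, sep, domain = address.partition("@")
    return bool(local) and bool(sep) and bool(domain)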
> The point being missed here is that no algorithm can tell you whether a string is a valid email address, because that's not a property of the string in the first place, it's a property of the world
"Valid" can mean at least 3 different things:
a) Conformant to a spec
b) Can actually receive email
c) Looks like a nice, simple "canonical" standard email address.
If you validate to the RFC (a) you still might fail b) and c). (The value of c is debated at length in a separate subthread, but let's just say that there are more or less shady reasons why this is often a business goal.)
Since you'll probably validate b) anyway, the up-front validation of a) or c) is a convenience, because validating b) isn't instant. So you validate to prevent errors and frustration. The question is merely: do I as a business want to have an address with quotes, spaces and backslashes in it in my database, just because it's possible according to the specification?
> And once you are doing that, there is no point checking the string for anything other than containing an @.
I think there is a legitimate case for a service to simply think "I'd rather lose the business of 1 customer out of a million than worry about backslashes in email addresses". It's not user friendly, and it's not "correct", but it's one of those "good enough" scenarios.
And what does this solve? A user mistakenly entering an extra plus sign gets a validation error, while a user mistakenly entering an extra alphanumeric character (far more common) does not.
Maybe they could ban addresses with `q` as well. Most people's names don't contain Q, so it might be a user error.
It depends on the use case, but for some business use cases you e.g. prefer access to the end user's default/canonical inbox and not a specific one the user can use to filter etc. It's also much less likely to cause downstream problems like distribution issues, rejection by spam filters and so on. Annoying a tiny fraction of users or losing their business just isn't a big enough issue to matter.
The only use case you're describing is being able to send spam or sell the user's information to third-parties while preventing the user from identifying that, or from filtering e-mails coming from your service.
Being able to notify users, limit erroneously input emails, or cross reference to existing databases (which invariably have the same kind of validation) are legitimate use cases.
Even if I have no interest in selling emails, it's still a net benefit if leaked data (e.g. after a breach) isn't full of bob.smith+mycompany@... rather than bob.smith@
BS excuse. That's what email confirmation, confirmation link, smtp inbox validation, etc, are for.
> cross reference to existing databases
Not being able to be cross referenced is a feature for the user, not a bug.
> Even if I have no interest in selling emails, it's still a net benefit if leaked data (e.g. after a breach) isn't full of bob.smith+mycompany@... rather than bob.smith@
It's a benefit for the company, not for the user. I'd prefer to know which company leaked my email.
Yes. Absolutely 100% agree. What I'm arguing is: if you are ready to annoy a tiny fraction of your users, you will get away with a simpler validation, that is better FOR YOU AS A COMPANY, because it has some benefits ranging from shady to just half-shady. This is why companies do this. Not just a small share of them, and not only because developers didn't understand the RFC.
I'm not arguing this is in any way good for end users. I'm saying it can be a good idea despite being horrible towards some users.
You keep arguing from the users' perspective when I'm saying "This is being an asshat to users, but it's worth it." The argument "That's bad for users!" isn't a counterargument to that.
Sure. But my whole point is that all the other reasons are not legitimate; the whole point of blocking + is to cover the company's ass. There are no legitimate potential issues with notification, or even with preventing errors. It's just BS rationalisation.
One problem is the lack of context: unlike Stackoverflow, you don't get additional info from other users who tried the same code.
The only feedback Copilot receives is whether you keep it or not; you can't tell it a few days later that it wasn't a good fit after all (whereas you can comment on a Stackoverflow answer).
In its current form, it amplifies bias whereas code needs accuracy.
A fun issue I keep hitting with Github Copilot in Python is that it's a coin flip whether it will give me a Python 3-style print statement or a Python 2.7 style print statement.
Given that no one should be _actively_ writing 2.7 code, and that GH is trying to monetize this, I would think they could retrain the main model on a corpus that excludes 2.7 code, and then allow people who need 2.7 to "pay for supporting 2.7", just like I'm guessing folks stuck on old platforms always do.
You raise a fascinating point about older platforms of other languages, too; Java has a super backward compat story, but woe be unto the coder who tries to name a variable "enum" nowadays
This seems to be another side of the problem that it gives code that doesn't actually compile for the target language. They should have a check for that, that should be the baseline.
For me, it sounds like it's closer to dumb copy paste than a smart code generator. AlphaZero wouldn't play a chess move that was against the rules.
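As a baseline of the kind described just above, even parsing a suggestion before offering it would catch the "doesn't compile for the target language" class of failures. A sketch for Python suggestions (real checking would have to be per-language, and the helper name is made up):

import ast

def parses_as_python3(snippet: str) -> bool:
    """Cheap sanity check: does the suggestion at least parse as Python 3?"""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

print(parses_as_python3("print('hello')"))   # True
print(parses_as_python3("print 'hello'"))    # False -- Python 2 style print statement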
> Developers with experience can definitely handle this out but what if a newbie directly starts with the help of AI, he will spend more time on stack overflow than writing actual code. (oh wait that's how most of the developers are lmao)
Spontaneous Ask HN: How much truth is there in this "devs be copy-pasting from SO all day" trope?
Personally, I have used SO quite heavily in its early years, circa 2010-2014, including posting my own questions and sometimes posting answers to others' questions. But now, I don't use it actively anymore. Sure, when I search for a concrete question and SO happens to be in the search results, it's sometimes a valuable resource. But it's not the go-to for programming questions that it once was for me.
I'm honestly not sure if that's indicative of my own growth as a developer, or caused by outside factors. I have a vague feeling that developer documentation in general got better in the last decade, at least for the technologies that I'm using... but then again it could be also a sign of personal growth that I'm more comfortable with the upstream documentation. Finding answers for webdev questions on MDN is another sort of game than finding answers for webdev questions on SO, after all. What do you all think?
Based on my own experience: I remember back in the day when I first started programming, in PHP, I copied a lot of code from other commenters on PHP.net (their online manual allows others to post code underneath the main docs). So I expect some new programmers to do the same, but with StackOverflow.
Of course, I stopped doing that once I got hold of the real knowledge. In fact, when I now look back at the code that I copied from others, it's just like watching a horror movie, and I'd rather rewrite the whole thing on my own terms.
I guess the fairer statement to put in is "some programmers copy that code to 'get started'".
Until a court decides what it does and doesn't do I'm staying clear
Here's it outputting Quake code, including handy comments it came up with for each line and even an entire line of commented out code. Maybe it decided it was a good choice to comment it out but still include it I guess
Being word for word from the original is just a weird coincidence too
I truly wanted it to be as good as it was sold to us too, but it isn't
Ah I see. I think my perception of your point was altered by you having been downvoted by someone making the text grey (can't read text = bad comment). Sorry about that.
I'll try to avoid letting downvotes bias me in future
Well, my understanding is that your point is that it's not useful and only regurgitates code it already saw. That's super false; it's a useful tool that generates code in a very context-sensitive manner. Not always perfect, and in alpha, but still very useful.
The full copying only results from people actively probing the model to output copies of the code, and they made it work for very famous code that's been copied around github a bunch of times. That doesn't make it a non-useful piece of software.
> that generates code in a very context sensitive manner
That's why I compared it to a markov chain aye
In any case I've come to the conclusion my hater attitude is fueled by disappointment, so maybe it was the worst take from this whole thing. I'll avoid future copilot threads
It's a bit harsh to make sweeping statements along the lines of 'it's just a fancy markov bot' based on a few well-publicised glitches in a technical preview.
I assume you have built something surpassing the scope and ambition of co-pilot before, not just some armchair tech lead throwing shade.
Fair point on the sweeping statement, I'll rein it in
Thinking about it the main factor is an emotional one. I'm disappointed. Butthurt if you will. It sounded great but it tripped over so far away from the finish line that I've turned against it. I'll excuse myself from any further copilot threads
Of course I can never meet the requirements of your post wanting something more impressive than Copilot. Copilot itself falls far short of that. Nothing I give you will be enough, as there'll be flaws you will attack to make your point. Bit of a time sink that, so let's just assume you're right :)
No, I've never successfully built anything as ambitious as, and definitely nothing surpassing, copilot. Have I tried? Absolutely. Have I failed? So far yes.
Of course by that logic though I still win this conversation if you've not succeeded in making anything more ambitious than my failed projects, is that correct? :P
For the sake of not wanting to come off as blagging (also I want the holes poked in this one tbf) my most ambitious project I've not figured out how to make work yet is a new (afaik) type of business model: cohan.me/profit-share
Not being perfect would be acceptable, but at this point it looks like something with the potential of setting back software quality and security [1], with the added bonus of also breaking copyright/licensing.
[1] I understand that everyone should review code before committing, but even if my team and I do it, there's no way all the proprietary and open-source software I use has teams doing the same. That's why I worry for my security.
There are two types of products: the ones that people complain about and the ones that nobody uses or cares about; based on a famous Bjarne Stroustrup quote.
The fact that people post about it, point out flaws, leave GH out of spite, etc. only goes to show how impactful a system like this can be. It's a big deal, which is why people talk about it.
So far the most WTF thing I've gotten out of it is:
import base64
test = base64.b64decode("""SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t""".encode())
print(test)
# b"I'm killing your brain like a poisonous mushroom"
And the most odd thing:
# The base URL for all API requests
base_url = 'https://api.gdax.com/'
# The base URL for all non-API requests (e.g. static content)
base_url_static = 'https://static.gdax.com/'
Which are URLs that haven't been a thing for 2 years, I think, and I can't find any code on GitHub that still uses them.
Controlling the generation for code quality will be extremely hard.
The only thing I see is that they could filter their dataset so that some bad proxy of code quality is taken into account, something like the number of stars (which is clearly a terrible metric, tell me if you think of something else).
The idea would maybe be to start by training on all of the subset of GitHub it is ethical and legal to train on, and then filter down to higher code quality towards the end.
Controlling for the time at which the code is emitted would be easier. Something like, retrieving similar contexts, and guiding the model to be more similar to the recent code if there is similar recent code that exists. I'm not sure exactly of how this would be done, but I can see it working.
Being credited for your work is a form of compensation. Nearly all open source licenses require proper attribution. Github is just violating the open source licenses of nearly all people that posted their code to Github itself, which is a huge breach of trust.
I accept Google's and Facebook's terms, and I still don't expect to be tracked across the internet to pay for the services they give me. Basically: my expectation is that companies aren't behaving as badly as their terms allow them to.
Huge leap of logic to assume what license another user publishes their code under. A charitable reading here of chovybizzass's comment leads me to the understanding that they are publishing their code under a license that requires compensation if used, this code can be published to GitHub as well. Would it be stupid? Probably. But even more stupid would be to produce MIT-licensed code and then complain about not being compensated.
So does querying Google or Stackoverflow. Looking for “inspiration” should guide you, but you should always be skeptical and not let your brain fall out
I'm all for moving forward, but Copilot just seems to be a bad idea, amplified.
What I want is careful, thoughtful, knowledgeable people, who have learned and honed their skills over years in various areas and can come up with creative, maintainable solutions to complex problems. This is not something you autocomplete. If it were, we could autocomplete 80% of all jobs tomorrow.
I don't want code monkeys on steroids. But maybe I'm too far off Silicon Valley.
Does typescript really protect one from javascript though?
I imagine it is fine. If it was just myself developing I would be fine, but the insufferable junior devs that learned "=== or die" rear their ugly heads.
Depends how much you care about the equality and how strict you need it to be. By default == will do type coercion, so you can get weird results where things like [] == 0 are true.
If someone doesn't understand this then here's why.
You need to learn the Six Falsey Things In JavaScript. Everything else when cast to boolean will be true. These six things are What and Why and When And How And Where and Who. Wait, no, that's a totally different list of six... anyways
false
undefined
null
NaN
0
"" (empty string)
That's it. Since '0' is none of those six, the negate operator casts it to boolean as true and then negates it to false. When doing a comparison between '0' and false, the standard says https://262.ecma-international.org/5.1/#sec-11.9.3
> If Type(y) is Boolean, return the result of the comparison x == ToNumber(y).
ToNumber(false) is of course 0. So you are running '0' == 0 which is visibly true...