Hacker News | boyter's comments

Perhaps adding some guide on how to hook this up to... well anything would be good :)

There is a lack of guidance for https://github.com/mark3labs/mcp-go/ which this is using as well, so while everything is there, it's hard to know how to make it do anything.


Ignoring the open-source vs free software discussions that are bound to come about from this: well said. Large companies exploiting developers and abuse towards maintainers is probably my biggest bugbear when it comes to this.

In fact I have a similar post https://boyter.org/posts/the-three-f-s-of-open-source/ which I redirect people towards if they become aggressive towards me when I am trying to help them. Thankfully I have only had to use it a handful of times.


Crawling, incidentally, I think is the biggest issue with making a new search engine these days. Websites flat out refuse to support any crawler other than Google's, and Cloudflare and other protection services and CDNs flat out deny access to anyone but the incumbents. It is not a level playing field.

I wrote the above some time ago. I think it's even more true today. It's practically impossible to crawl the way the bigger players do, and with the increased focus on legislation in this area it's going to lock out smaller teams even faster.

The old web is really dead. There needs to be a move towards more independent websites. Thankfully we are starting to see more of this, like the linked searchmysite discussed earlier today https://news.ycombinator.com/item?id=43467541


That's a good point. In search we have Google as a monopoly, and since a big percentage of sites only want to be crawled by Google, it reinforces the monopoly. A lot of people complain about bots not following robots.txt, but if you follow it to the letter it's impossible to make anything useful. Also, AFAIK robots.txt doesn't have any legal standing.


I detest writing CSS and HTML. I just find it boring, fiddly and annoying. I have started doing "vibe" coding with LLMs. Giving a decent prompt produces results that are... pretty good.

Almost 100% in lighthouse for both mobile and desktop, responsive, reusable components, dark/light mode and a design that was better than I could do in the 2-3 hours I spent doing it (while sipping wine).

I know it's not a solution for everyone, and it probably won't work for the prettier designs out there, but you can go a long way with these tools these days.

I know there is a reluctance to use LLMs for code tasks, and I am one of their biggest critics, but for me this solves a real pain point. I don't want to write CSS/HTML anymore, and these tools do a good enough job of it that I don't have to.


LLMs are great for building frontends for backend projects and backends for frontend projects


For CRUD I agree. A lot of what I am doing is a bit more complex than that.

I actually would be happy to just "vibe" code my way through most of the problems I deal with if LLMs were able to do it.

That said, they make a great intern or junior developer you can hand tasks off to. You have to review either way, but the LLM does it faster.


I agree. CASS (the library this book was promoting) is actually really great paired with LLMs. If I revisit this project, it'll be along the lines of using it with LLMs.


While old, the nice thing about word2vec models is how easily they can be imported and used by any language to create a vector search.

I have always wondered (but never done anything about it) whether you could train on code to achieve a similar result. I suspect there could be some value there, even if it's just to help identify similar snippets of code.


You can do the same with https://sbert.net/ and it is not any more work to implement, except it gives much better results! Vector databases did not become hot in the word2vec era; they became hot once you got embeddings that were sensitive to words in context.

It's arguable, for instance, whether there is any value in making an embedding for a word which can have multiple meanings. Take a word like "bat": at least an exact match is specific to the literal word bat. If you vectorize it, you have to blend in mammals and blend in sports equipment. In any given situation you care about one of them and not the other, so anything you gain from matching other mammals means you also get spurious matches having to do with sports equipment.

BERT sees the context, so it will match a particular use of the word "bat" with either mammals or sports equipment; it brings in relevant synonyms but not irrelevant ones. That made BERT one of those once-in-a-decade breakthroughs in information retrieval (it took a whole decade of conference proceedings to get to BM25!), whereas word2vec is a dead end that people wrote too many blog posts about.


There is a code2vec as well.


TIL. I had no idea code2vec existed at all. Thanks for pointing it out.


There is basically everything2vec, but if it doesn't exist you can train your own embeddings.


I doubt it's the GC kicking in, but you could run it with the following environment variable set just to ensure that it's not doing anything.

    GOGC=-1
EDIT: Trying it out quickly shows a small improvement, but so small it's likely noise, as I was doing other things on the machine.

    Summary
      ./mainnogc 1000 ran
        1.01 ± 0.06 times faster than ./maingc 1000


More lines of code usually indicates more bugs, so it's not an entirely useless metric. Using it to gauge employee productivity is, since as the linked article suggests, some of the better programmers are able to remove code, which often reduces the potential for bugs.

I personally find some use in it, if nothing else to identify which files probably need to be refactored because they are too large or complex. Throw in some cyclomatic complexity and you are getting some useful information out of the code counts.

Counting lines of code is one metric though. Another I have seen thrown around is ULOC https://cmcenroe.me/2018/12/14/uloc.html where you count the number of unique lines of code:

    sort -u *.h *.c | wc -l
Again, another metric that potentially speaks to the health of a project. I personally wrote about this some time ago when someone asked why I wrote a code counting project https://boyter.org/posts/why-count-lines-of-code/ and I liked the idea of ULOC so much I integrated it into the project as well.
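For anyone who wants the same count without relying on the shell, the ULOC idea is just a set of lines. A rough Go equivalent of the `sort -u | wc -l` pipeline above:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// Counts unique lines across the files passed on the command line,
// the same idea as `sort -u *.h *.c | wc -l`.
func main() {
	seen := map[string]struct{}{}
	for _, name := range os.Args[1:] {
		f, err := os.Open(name)
		if err != nil {
			continue // skip unreadable files, as sort would error on them
		}
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			seen[sc.Text()] = struct{}{}
		}
		f.Close()
	}
	fmt.Println(len(seen))
}
```

A map keyed on the raw line text does the dedupe; the count is just the map's size.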

Clearly I am not alone in this though, hence there being so many code counters around (forgive me if I missed yours in the following list) and discussion around it.

    - [SLOCCount](https://www.dwheeler.com/sloccount/) the original sloc counter
    - [cloc](https://github.com/AlDanial/cloc), inspired by SLOCCount; implemented in Perl for portability
    - [gocloc](https://github.com/hhatto/gocloc) a sloc counter in Go inspired by tokei
    - [loc](https://github.com/cgag/loc) rust implementation similar to tokei but often faster
    - [loccount](https://gitlab.com/esr/loccount) Go implementation written and maintained by ESR
    - [polyglot](https://github.com/vmchale/polyglot) ATS sloc counter
    - [tokei](https://github.com/XAMPPRocky/tokei) fast, accurate and written in rust
    - [sloc](https://github.com/flosse/sloc) coffeescript code counter
    - [scc](https://github.com/boyter/scc) my own counter
Incidentally, the article's author confused me for a while by using standard unix tools to count code, as I had them confused with David Wheeler, who wrote the very first code counter I know of, sloccount.


> More lines of code usually indicates more bugs so its not an entirely useless metric.

This is conventional wisdom, but it still feels dubious.

Often I see additional lines of code added to remove a bug, without adding a new bug. So comparing before and after, more lines means fewer bugs.

On the flip side, more lines often indicates more capability, which means more bugs and more complexity, but also makes for a very unfair comparison.

I am not completely sure what I am getting at here, but it seems like there are a lot of confounding factors.


In my experience it is very common for the additional lines of code added to fix a bug to turn out to introduce a new bug of their own.

The ratio of bugs to lines of code is obviously not constant, but fixing known bugs changes it less than it appears to.


I've seen it stated as: "It is a bit of received computer science wisdom that any program has at least one superfluous line and at least one bug. Apply this rule recursively and you can see that any program can be reduced to a single line with a bug in it." And for proof: http://en.wikipedia.org/wiki/IEFBR14#Implementation


While you are right that you add lines to remove bugs, I suspect it's still true that given two projects, written in the same language by the same person, on average the one with more lines is more likely to contain more bugs.


Yes, I was going to add another paragraph about more skilled developers using fewer lines and creating fewer bugs. But that is not a fair comparison unless the less skilled implementation considers the same corner cases.


I managed to get #69 and then #42 with some Go code I repurposed since I was messing around with HashCash a while ago.

Including the code below if anyone is curious. I managed to get a 9-zero hash in under an hour on an M2 Mac Mini using it. It's only about ~3.3 MH/s, which is not very impressive, but it was very easy to write.

https://gist.github.com/boyter/8600199cc6f4073dc9da380f3224f...


Here's mine: https://gist.github.com/grishka/ed84e4bbfacfbcf4ddc5e63cfc96...

I suppose the next step for me would be to make use of the GPU, which is a much better fit for this job and I'm sure it would increase my hash rate by orders of magnitude, but I researched it for a bit and running code on the GPU on a Mac is cumbersome to say the least.


I have always wanted to learn about GPU programming. Now might be the time.


I couldn't resist, so here's the GPU hasher: https://gist.github.com/grishka/c1be1c035f39564debfd4b195bdf...

It gets around 232.5 MH/s on the M1 Max. There's also an optimization that exploits the fact that SHA-256 is incremental: the state for the common prefix can be pre-computed once, then copied and updated with only the variable part. This yields around a 30% performance boost.


I'm able to get 250 MH/s with this on an M1 Pro.


I made some improvements and it's now at 340 MH/s.


Wow... bookmarked. F-15 Strike Eagle II is one of those games I played the hell out of as a child. So much so I wore out two joysticks that my parents bought as a result.

I was a child, so it took me a long time to learn to actually do what the game wanted, and then to learn that after taking out the primary targets you got more medals by bombing or strafing other targets. I don't think I ever got the hang of landing back on the aircraft carrier though.


This, and F-19 Stealth Fighter. I lost so much time to them :)


F-19 was my first flight sim (well, there were some on the Commodore 64 I never got the hang of which were the actual first).

I had a pirated copy. Back then, nobody owned legit games in my country. It was hilarious that the copy protection showed you a top-down view of an aircraft and made you look up the name in the manual.

Yeah... for me, an aircraft-obsessed teenager who knew all the shapes, this was the same as no copy protection at all :) If anything, it trained me in recognizing the shapes of all aircraft of that era!

PS: I was sad when I learned the F-19 was never a real thing. That said, some time later MicroProse came up with the F-117, which we know was real.


That was my experience with it, too. Microprose could do no wrong.


I owned multiple Microprose games and the manuals that came with the games were simply fantastic.

There would be a section on the gameplay itself and then usually some historical reference about the setting of the game. E.g. if it was a WW1 flight combat simulator, there would be a section on the history of aerial dogfighting and then specs on each plane.

I know that nowadays it's common knowledge that most people don't read the manuals, and it's easier to get people into a game with a combined first level/introduction/tutorial. That being said, I feel like we lost something by dropping the manuals with all that rich historical detail.


Yes, the Battle of Britain game (not by Microprose but LucasArts) actually came with a whole history book. I read it for English class; my teacher deemed it decent enough quality for that (English is not my first language, so picking high-quality literature wasn't really a priority).

It really added so much to the game. Many other games came with cool things in the box too, like Wing Commander with the blueprints of each fighter.

I guess part of the reason was that the games themselves weren't all that immersive. There were no cutscenes or long narrative. The tech simply wasn't there yet. The included box items made it more interesting and immersive.


I miss the simulation games of that era. They've pretty much faded away. At one point I dove back in a bit, but everything was so janky to get running that I quickly lost interest.


Knights of the Sky was that WW1 flight combat game. Also a wonderful chunk of my childhood went into that. Although I found SEII to be more fun.


I've been absolutely thrilled by the Microprose rebirth of the past couple years.


It's a result of trigrams themselves. For example, searchcode (please ignore the plug, this is just the example I had to hand) goes from 1 thing you would need to index into 8.

    "searchcode"   -> [sea, ear, arc, rch, chc, hco, cod, ode]
As a result the index rapidly becomes larger than you would expect.
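A sliding window makes the growth obvious: a term of length n expands into n-2 trigrams, each of which needs its own posting list. A quick sketch:

```go
package main

import "fmt"

// trigrams returns every overlapping 3-character window of s.
// A term of length n produces n-2 entries, which is why trigram
// indexes grow so much faster than a plain term index.
func trigrams(s string) []string {
	var out []string
	for i := 0; i+3 <= len(s); i++ {
		out = append(out, s[i:i+3])
	}
	return out
}

func main() {
	fmt.Println(trigrams("searchcode"))
	// [sea ear arc rch chc hco cod ode]
}
```

(For non-ASCII text you would window over runes rather than bytes, but the size blow-up is the same.)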

