Hacker Newsnew | past | comments | ask | show | jobs | submit | prettyblocks's commentslogin

I've used chrome devtools mcp successfully to do all kinds of advanced in browser tasks, agents like claude code can write js and inject it into the context in a live browser and do all kinds of neat tricks. I've used this extensively in gemini-cli.

ChatGPT is very happy to help me with offensive tasks. Codex is as well.


Are you somehow prompting around protections or something, or yours is just pretty chill? I've tried a few times with various cybersecurity/secops stuff and it's always basically given me some watered down "I can't talk to you about that, but what I can talk to you about is" and then the is, isn't anything really.


It's pretty chill. I think part of it might be that my context is overloaded with security work, so it doesn't protest this stuff. I also have memories turned on which I don't really keep an eye on at all, and I think having a bunch of stuff in there related to cyber stuff also helps to keep it agreeable with what I'm asking for. Maybe you can hardcode this manually and see if that helps or try to gradually escalate the context by starting a technical conversation and then later on introducing the offensive task you're working on.


I suspected that too, basically your own internal context is strong enough to have it not be concerned you're acting maliciously. That's interesting, I've found mine is very tuned into my work also and folks get much worse results from the same prompts. Thanks for the followup. Interesting times.


I have the same question. I used to be able to get around it by saying things like, "I'm a cybersecurity professional testing my company's applicaitons" or even lying with "I'm a cybersecurity student trying to learn," but that stopped working at least 6 months ago, maybe a year.


Are the puzzles generated algorithmically or manually?


It's a mix of things. For example, there's an algorithm that ensures all valid deductions are allowed (I'm not smart enough to ensure all of them manually!). But a good amount of manual work goes into each daily puzzle.


I don't think tricky niche knowledge is the sweet spot for genai and it likely won't be for some time. Instead, it's a great replacement for rote tasks where a less than perfect performance is good enough. Transcription, ocr, boilerplate code generation, etc.


The thing is, I see people use it for tricky niche knowledge all the time; using it as an alternative to doing a Google search.

So I want to have a general idea of how good it is at this.

I found something that was niche, but not super niche; I could easily find a good, human written answer in the top couple of results of a Google search.

But until now, all LLM answers I've gotten for it have been complete hallucinated gibberish.

Anyhow, this is a single data point, I need to expand my set of benchmark questions a bit now, but this is the first time that I've actually seen progress on this particular personal benchmark.


That’s riding hype machine and throwing baby with bath water.

Get an API and try to use it for classification of text or classification of images. Having an excel file with somewhat random looking 10k entries you want to classify or filter down to 10 important for you, use LLM.

Get it to make audio transcription. You can now just talk and it will make note for you on level that was not possible earlier without training on someone voice it can do anyone’s voice.

Fixing up text is of course also big.

Data classification is easy for LLM. Data transformation is a bit harder but still great. Creating new data is hard so like answering questions where it has to generate stuff from thin air it will hallucinate like a mad man.

The ones that LLMs are good in are used in background by people creating actual useful software on top of LLMs but those problems are not seen by general public who sees chat box.


But people using the wrong tool for a task is nothing new. Using excel as a database (still happening today), etc.

Maybe the scale is different with genAI and there are some painful learnings ahead of us.


And Google themselves obviously believe that too as they happily insert AI summaries at the top of most serps now.


Or maybe Google knows most people search inane, obvious things?


Or more likely Google couldn't give a rat's arse whether those AI summaries are good or not (except to the degree that people don't flee it), and what it cares is that they keep users with Google itself, instead of clicking of to other sources.

After all it's the same search engine team that didn't care about its search results - it's main draw - activey going shit for over a decade.


Google AI Overview a lot of times write wrong about obvious things so... lol

They probably use old Flash Lite model, something super small, and just summarize the search...


Those summaries would be far more expensive to generate than the searches themselves so they're probably caching the top 100k most common or something, maybe even pre-caching it.


I also use niche questions a lot but mostly to check how much the models tend to hallucinate. E.g. I start asking about rank badges in Star Trek which they usually get right and then I ask about specific (non existing) rank badges shaped like strawberries or something like that. Or I ask about smaller German cities and what's famous about them.

I know without the ability to search it's very unlikely the model actually has accurate "memories" about these things, I just hope one day they will acutally know that their "memory" is bad or non-existing and they will tell me so instead of hallucinating something.


I'm waiting for properly adjusted specific LLMs. A LLM trained on so much trustworth generic data that it is able to understand/comprehend me and different lanugages but always talks to a fact database in the background.

I don't need an LLM to have a trillion parameters if i just need it to be a great user interface.

Someone is probably working on this somewere or will but lets see.


Second this.

Basically making sense of unstructured data is super cool. I can get 20 people to write an answer the way they feel like it and model can convert it to structured data - something I would have to spend time on, or I would have to make form with mandatory fields that annoy audience.

I am already building useful tools with the help of models. Asking tricky or trivia questions is fun and games. There are much more interesting ways to use AI.


Well, I used Grok to find information I forgot about like product names, films, books and various articles on different subjects. Google search didn't help but putting the LLM at work did the trick.

So I think LLMs can be good for finding niche info.


Yeah, but tests like that deliberately prod the boundaries of its capability rather than how well it does what it’s good at.


Even if it's just for their internal security initiatives it would make sense given how massive they are. Threat hunting via cert monitoring is very effective.


But it isn’t. Guy posted the fact they sent bot for scraping.

That’s not the intended use for CT logs.


The advice here seems to assume a single .md file with instructions for the whole project, but the AGENTS.md methodology as supported by agents like github copilot is to break out more specific AGENTS.md files in the subdirectories in your code base. I wonder how and if the tips shared change assuming a flow with a bunch of focused AGENTS.md files throughout the code.


Hi, post author here :)

I didn’t dive into that because in a lot of cases it’s not necessary and I wanted to keep the post short, but for large monorepos it’s a good idea



I think this was true 10 years ago too if you consider the global talent pool.


Wouldn't this be trivial to verify unless distribution is highly targeted? Just open up some electronics?


How often do you open up your device to check for bombs?

I don't think the point is that it's not difficult to screen for, the point is that most people will not think to or do not have the means, the time, the knowledge, or the willpower to take apart all their devices, verify what is and isn't an explosive device, and then reassemble it intact. According to this report, there are a non-zero number of devices around the world, possibly in shipping containers, that contain explosives.

At least one of the suppositions I saw this last year, was that Ukraine was likely to be slapped on the hands for using consumer shipping for their military drone deployments. Because presumably, the majority of countries will not take lightly the fact that any given consumer shipping could now contain military equipment that could potentially be deployed against them, and that it is in the interest of every single country to react with prejudice to the mixing of consumer and military shipments.

An agent of the Israeli state has now admitted that since at least 2006 (so the better part of a quarter of a century), they have been planting bombs in consumer-grade electronics and subsequently using them to selectively blow people up in civilian places. How this is not a) worldwide news, and b) taken as an admission of overt terrorist activity, is utterly baffling. Can you imagine the reaction if an ex-NSA, ex-CIA, ex-MI7, or ex-MSS operative admitted that they had been planting explosives in consumer grade electronics?! There would be an international uproar.


> I don't think the point is that it's not difficult to screen for, the point is that most people will not think to or do not have the means, the time, the knowledge, or the willpower to take apart all their devices, verify what is and isn't an explosive device, and then reassemble it intact. According to this report, there are a non-zero number of devices around the world, possibly in shipping containers, that contain explosives.

I think it's fairly unlikely that these devices are being shipped to normal hardware customers as doing so would likely risk exposing the operation. These sort of operations appear to exploit the fact that terrorist organizations themselves are forced to covertly procure hardware without going through typical supply chain channels.

> At least one of the suppositions I saw this last year, was that Ukraine was likely to be slapped on the hands for using consumer shipping for their military drone deployments. Because presumably, the majority of countries will not take lightly the fact that any given consumer shipping could now contain military equipment that could potentially be deployed against them, and that it is in the interest of every single country to react with prejudice to the mixing of consumer and military shipments.

There is a rather wide range of technologies/services that have both military and civilian use cases, drones being the obvious example of dual use hardware and shipping/logistics being an obvious example of a dual use service. Plenty of civilian shipping companies provide services to military customers around the world. I think it's pretty hard to argue that a highly targeted attack using drones transported by enemy civilian logistics is unethical simply because civilian logistics was used as part of the operation.

> An agent of the Israeli state has now admitted that since at least 2006 (so the better part of a quarter of a century), they have been planting bombs in consumer-grade electronics and subsequently using them to selectively blow people up in civilian places. How this is not a) worldwide news, and b) taken as an admission of overt terrorist activity, is utterly baffling. Can you imagine the reaction if an ex-NSA, ex-CIA, ex-MI7, or ex-MSS operative admitted that they had been planting explosives in consumer grade electronics?! There would be an international uproar.

That's a rather disingenuous way to frame an operation which was arguably the most precise coordinated assassination operation against a terrorist organization in history. Virtually all individuals killed/injured by the operation were members of the terrorist organizations being targeted with only a tiny amount of civilian casualties(virtually all civilian casualties were family members of the terrorists that happened to pick up the devices instead of the intended targets AFAIU). These devices appear to have been exclusively sold to the terrorists and never distributed to normal customers. There doesn't appear to be any evidence that any of these devices ended up being sold to normal non-terrorist customers.


Some random person is gonna get killed because their eBay sourced electronics were originally purchased for Iran, were intercepted by Mossad and tampered with, sent on their way, got seized in Jordan for sketchy customs papers, bought at evidince auction 18mo later by a reseller, who imported them (or the importer is gonna lose their life's work after being accused of explosives smuggling by the feds).

There's basically no accountability for these intelligence organizations preventing them from playing fast and loose.

On the other hand, there's various degrees of explosives spot checking all over international boundaries and the like. If random explosives are moving around surely someone would run across them, so it my just be some scumbag spook trying to get people scared.


> Some random person is gonna get killed because their eBay sourced electronics were originally purchased for Iran, were intercepted by Mossad and tampered with, sent on their way, got seized in Jordan for sketchy customs papers, bought at evidince auction 18mo later by a reseller, who imported them (or the importer is gonna lose their life's work after being accused of explosives smuggling by the feds).

I suspect in practice the spicy pagers would tend to be tracked quite closely by the intelligence agencies.

> There's basically no accountability for these intelligence organizations preventing them from playing fast and loose.

Intelligence agencies don't have a lot of accountability in general, but I'd hardly say operation grim beeper was playing fast and loose with how precisely the terrorists were targeted ultimately.

> On the other hand, there's various degrees of explosives spot checking all over international boundaries and the like. If random explosives are moving around surely someone would run across them, so it my just be some scumbag spook trying to get people scared.

Hard to say how easy to detect they would be, but it doesn't seem all that likely that random consumers would run into these sort of devices. Intelligence agencies would certainly not want these devices getting distributed to the general public.


>I suspect in practice the spicy pagers would tend to be tracked quite closely by the intelligence agencies.

Because these people totally wouldn't cut and run and leave someone else holding the bag if things went wrong. /s

Kinda like how the CIA spent the 90s quasi-protecting Al-Qaeda in an effort to penetrate the organization only to cut and run and be all "hey, FBI, y'all might want to look into these guys we really think they're up to something serious" in summer of 2001.

Say an operation was called off. I give it 50% odds between them finding a way to buy the devices to keep them from getting out into public vs 50% chance they just abandon them.

>, but I'd hardly say operation grim beeper was playing fast and loose with how precisely the terrorists were targeted ultimately.

I generally agree but I absolutely foresee some random company in the region having 1/3 of their laptops go boom because they bought tampered shit that "got out". Best case someone opens one up for service, goes WTF, snaps a picture, internet amplifies, it gets back to the OEM and the "questionable" lots are ID'd.

> but it doesn't seem all that likely that random consumers

The angry pagers were being bought under the guise of legitimate companies. I find it very hard to believe that some gravel pit or factory who needed 20 and bought 200 on a "we pay you for 250 basis" didn't have their 20 go poof while sitting on the charging shelf in the office or whatever.

This whole thing is just too "meta targeted" for my taste in the same way that "signature strikes" were. It's not like these organizations lack the capacity to kill people the old fashioned way, heck it might even be cheaper.


> Say an operation was called off. I give it 50% odds between them finding a way to buy the devices to keep them from getting out into public vs 50% chance they just abandon them.

In all likelihood even if they were abandoned, probably nothing would happen since modern explosives tend to be designed to be rather stable unless intentionally detonated(an intelligence organization would want to design them this way to avoid detection due to accidental detonation of course).

> The angry pagers were being bought under the guise of legitimate companies. I find it very hard to believe that some gravel pit or factory who needed 20 and bought 200 on a "we pay you for 250 basis" didn't have their 20 go poof while sitting on the charging shelf in the office or whatever.

From what was reported it looks like Mossad essentially licensed the brand rights for the pagers through shell companies, manufactured the pagers themselves and then distributed them exclusively to the terrorists after infiltrating the terrorists hardware procurement process in some way. One would also likely assume only pagers actually connected to the network of the terrorists would get triggered.

> This whole thing is just too "meta targeted" for my taste in the same way that "signature strikes" were. It's not like these organizations lack the capacity to kill people the old fashioned way, heck it might even be cheaper.

I think there's a bit of a difference here, it's not like these pagers were random consumer devices actually being sold on the open market, they were in reality a highly exclusive device sold only to the terrorists through some sort of supply chain infiltration attack with what appears to be only the marketing material along with fake customer testimonials being distributed publicly to trick the buyers into thinking these were normal consumer devices. For such a precise attack against a terrorist organization it seems unlikely tradition methods would have been more effective for the cost of the operation. Most traditional methods would also likely incur far higher civilian casualties.


The sealed battery itself is the bomb. The tell was that the batteries had lower capacity than they should have given the size.


Which do you suggest?



You should use RTEB instead. See here for why: https://huggingface.co/blog/rteb

Here is that leaderboard https://huggingface.co/spaces/mteb/leaderboard?benchmark_nam...

Voyage-3-large seems like SOTA right now


yep


The Qwen3 600M and 4B embedding models are near state of the art and aren't too computationally intensive.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: