I asked the just-released ChatGPT o4-mini-high to identify the locations of four photographs of varying difficulty. It didn’t get any of them right, though its guesses weren’t bad. The reasoning was also interesting to watch, as the model cropped sections of the photos to examine them more closely. I put the photos, response, and reasoning trace here:

https://www.gally.net/temp/20250418chatgptgeoguesser/index.h...

Later: I tried the same prompt and photos with Gemini 2.5 Pro. It also got them all wrong, though its guesses were similarly reasonable. I had thought that Google’s map and street-view data might lead to better results, but not this time.


Still later: I read that o3 is supposedly particularly good at this kind of geoguessing, so I tried the same prompt and photos with o3. This time it got one of the four correct: “The view of the canal with cherry blossoms and the green railway viaduct is the Ōoka River in Yokohama, looking north from the little road bridge between Hinodechō and Koganechō stations. The tracks on the left belong to the Keikyū Main Line, and the high‑rises in the distance are the Minato‑Mirai and Kita‑Naka district towers.” Its other three answers were still wrong.

Another useful word in this context is “sycophancy,” meaning excessive flattery or insincere agreement. Amanda Askell of Anthropic has used it to describe a trait they try to suppress in Claude:

https://youtube.com/watch?v=ugvHCXCOmm4&t=10286


The second example she uses is really important. You (used to) see this a lot on Stack Overflow, where an inexperienced programmer asks how to do some convoluted thing. Sure, you can explain how to do the thing while maintaining their artificial constraints. But it is much more useful to say "you probably want to approach the problem like this instead". It is surely a difficult and context-dependent problem, though.

XY problem

Interesting that Americans appear to hold their AI models to a higher standard than their politicians.

Different Americans.

Lots of folks in tech have different opinions than you may expect. Many will either keep quiet or play along to keep the peace/team cohesion, but you really never know if they actually agree deep down.

Their career, livelihoods, ability to support their families, etc. are ultimately on the line, so they'll pay lip service if they have to. Consider it part of the job at that point; personal beliefs are often left at the door.


Four-decade resident of Japan here. While small towns might have their appeal, for me the real Japan is the Yamanote Line between Shinjuku and Ikebukuro on a Friday evening in summer, the cars packed with sweaty drunk and sober people of all kinds, talking and laughing and reading and looking at their phones and dozing off as they sway against each other when the train stops at Takadanobaba.

I traveled there in 2008, and had that precise experience! I always wondered whether I'd just been there on a particularly excellent night, or if it was a normal happening. I'm so glad to hear it's the latter!

I visited for the first time last year and I loved people watching at night in this area. It felt so alive.

Takadanobaba! Home of the Big Box, and that little underground jazz kissa/bar.

> Gemini 2.5 Pro in Deep Research mode is twice as good as OpenAI’s Deep Research

That matches my impression. For the past month or two, I have been running informal side-by-side tests of the Deep Research products from OpenAI, Perplexity, and Google. OpenAI was clearly winning: its reports were more complete and incisive, with no hallucinated sources that I noticed.

That changed a few days ago, when Google switched their Deep Research over to Gemini 2.5 Pro Experimental. While OpenAI’s and Perplexity’s reports are still pretty good, Google’s usually seem deeper, more complete, and more incisive.

My prompting technique, by the way, is to first explain the problem I’m interested in to a regular model and ask it to write a full prompt for a reasoning LLM that can search the web. I check the suggested prompt, make a change or two, and then feed it to the Deep Research models.
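
If it helps, here is a minimal sketch of that two-step flow in TypeScript with the OpenAI Node SDK. The model name and the wording of the request are just placeholders, and the final hand-off to Deep Research still happens by hand in the web UI:

    import OpenAI from "openai";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // Step 1: have a regular model draft the research prompt.
    const draft = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{
        role: "user",
        content:
          "I want a deep-research report connecting surrealism, Freudian dream " +
          "theory, and AI image prompt engineering. Write a complete, " +
          "well-structured prompt I can give to a reasoning LLM that can search the web.",
      }],
    });

    // Step 2: review and edit the draft by hand, then paste it into the
    // Deep Research UI. (These products are driven interactively, not via this API.)
    console.log(draft.choices[0].message.content);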

One thing I’ve been playing with is asking for reports that discuss and connect three disparate topics. Below are the reports that the three Deep Research models gave me just now on surrealism, Freudian dream theory, and AI image prompt engineering. Deciding which is best is left as an exercise to the reader.

OpenAI:

https://chatgpt.com/share/67fa21eb-18a4-8011-9a97-9f8b051ad3...

Google:

https://docs.google.com/document/d/10mF_qThVcoJ5ouPMW-xKg7Cy...

Perplexity:

https://www.perplexity.ai/search/subject-analytical-report-i...


This also matches my experience that OpenAI fell behind with their Deep Research product. And Deep Research is basically the top-tier benchmark for what professionals are willing to pay for. So why should I shell out $200 for an OpenAI subscription when Google gives me a better top-tier product at a tenth of the price OpenAI or Anthropic are asking? Although I assume Google is just more willing to burn cash so as not to let OpenAI take more market share, which would cost them far more to win back later (cf. iPhone market share; also a classic Microsoft strategy).

It may actually be affordable for Google to charge $20 vs OAI's $200. Google already has an extensive datacenter operation and infrastructure that they're amortizing across many products and services. AI requires significant additions to it, of course, but their economy of scale may make a low monthly sub price viable.

The $20/month ChatGPT subscription includes Deep Research, so the comparison should be $20 vs $20, not $20 vs $200.

Gemini Deep Research is available for free, so it's $0 vs $20?

Great stuff. My prompts are falling behind after seeing what you are doing here.

I find it annoying at this point that OpenAI doesn't easily output a PDF the way Perplexity does. The best stuff I have found has also been in the Perplexity references.

Google outputting a whole doc is really great. I am just about to dig into Gemini 2.5 Pro in Deep Research for the first time.


> My prompts are falling behind....

If you haven’t already, you might want to try metaprompting, that is, having a model write the prompt for you. These days, I usually dictate my metaprompts through an STT app, which saves me a lot of time. A metaprompt I gave to Claude earlier today is at [1]. It’s sloppy and has some transcription errors, but, as you can see, Claude wrote a complete, well-organized prompt that produced really good results from Gemini Deep Research [2]. (I notice now, though, that the report is truncated at the end.)

[1] https://claude.ai/share/94982d9d-b580-496f-b725-786f72b15956

[2] https://docs.google.com/document/d/1np5xdXuely7cxFMlkQm0lQ4j...


Metaprompting is the way to go. Also, avoid project attachments, as they rely on inferior retrieval techniques like RAG.

> "produce a comprehensive analytical report exploring the conceptual and methodological intersections between Surrealist art techniques, Freudian dream analysis, and the practice of prompt engineering for AI image generation models (such as DALL-E, Midjourney, Stable Diffusion)."

Haha, what a perfect project for AI.


Thanks for sharing your prompting technique. I will try to use that technique in the future as well.

That's a nice collection of photographs.

Here are some written accounts of Japan during that period:

https://www.gally.net/jatsi/index.html


> Their generality of knowledge, and ability to weave completely disparate subjects together sensibly, is beyond any other artifact in existence

Very well said. That’s perhaps the area where I have found LLMs most useful lately. For several years, I have been trying to find a solution to a complex and unique problem involving the laws of two countries, financial issues, and my particular individual situation. No amount of Googling could find an answer, and I was unable to find a professional consultant whose expertise spans the various domains. I explained the problem in detail to OpenAI’s Deep Research, and six minutes later it produced a 20-page report—with references that all checked out—clearly explaining my possible options, the arguments for and against each, and why one of those options was probably best. It probably saved me thousands of dollars.


Very cool. Inspired by the post, I gave the following prompt to Gemini 2.5 Pro with the temperature set to 0.35:

“Create a music synthesizer that I can run in my browser. Give it original functionality, with buttons and other controls that create pleasing but unexpected sounds, melodies, rhythmic patterns, etc. as well as various sliders and dials that can be used to modify the sounds and patterns.”

Here is the result:

https://www.gally.net/temp/202540326geminisynthesizer/index....

Here’s an excerpt from the reasoning trace:

> Brainstorm "Original" and "Unexpected" Functionality:

> Standard Synths: Oscillators (sine, square, saw, triangle), filters (lowpass, highpass), envelopes (ADSR), LFOs. These are common. How to make them unexpected?

> Generative Aspects: Instead of just playing notes the user clicks, what if the synth generates patterns?

> Melody Generation: Use algorithms like random walks, cellular automata, or simple probability rules.

> Rhythm Generation: Euclidean rhythms, probabilistic triggers, evolving patterns. ...

> Interface Concept: How can the controls themselves be part of the "unexpected" experience? Buttons that trigger complex events, sliders with non-linear responses, dials that control abstract concepts.
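
One of the techniques the trace names, Euclidean rhythms, is easy to implement. Here is a small TypeScript sketch (my own illustration, not code from the generated synthesizer) that spreads a given number of onsets as evenly as possible across a cycle of steps:

    // Euclidean rhythm: distribute `pulses` onsets as evenly as possible over `steps` slots.
    function euclideanRhythm(pulses: number, steps: number): boolean[] {
      const pattern: boolean[] = [];
      let bucket = 0;
      for (let i = 0; i < steps; i++) {
        bucket += pulses;
        if (bucket >= steps) {
          bucket -= steps;
          pattern.push(true);  // an onset lands here
        } else {
          pattern.push(false); // rest
        }
      }
      return pattern;
    }

    // E(3,8) prints "..x..x.x", a rotation of the Cuban tresillo.
    console.log(euclideanRhythm(3, 8).map(p => (p ? "x" : ".")).join(""));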


After sleeping on the above and watching some videos about Gemini 2.5 (especially Sam Witteveen’s at [1]), I decided to ask Gemini for an enhanced version of the synthesizer. Here it is:

https://www.gally.net/temp/202540327geminisynthesizer-v2/ind...

This was the prompt I gave to it (through a spoken interface, thus the length and repetition):

“Attached is a website I had you create for me yesterday based on the prompt that appears in another attached file. In that latter file I've also included your thinking process in response to my prompt as well as your explanation to me of how this synthesizer is supposed to work. I am basically happy with the synthesizer you created for me. It works very well, and the output is fascinating to listen to. But I would like the music produced by it to be more melodical and contrapuntal, that is, with more distinct notes that can be perceived forming melodies while still having the random and unexpected and creative generation of those melodies. I would also like to have a broader frequency range of tones that are being produced. For example it would be nice to have something like a bass line. Continue to make the music unexpected and creative and generative. That was one aspect of the music that was very positive for the first result: the fact that I could keep listening to the produced music for a long period of time and not get bored by it. So try to make the tone soundscape richer, more complex and with more sense of melody and counterpoint. Also add any more controls you can think of to make the, to give the user even more ways in which to affect the output, such as more fine tuning on the degree of tonality vs. atonality, conventional harmonic structures vs. unconventional harmonic structures, clear rhythmic patterns vs. unconventional rhythmic patterns, etc.”

The first result had a lot of digital clipping in the output on my M1 Mac mini. After some back and forth with Gemini about possible causes and solutions, it added a limiter and some more controls. The problem persists on the Mac mini. On my M4 iPad with Safari, the sound is clean. I kind of like it.
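
For anyone curious what such a limiter might look like, a common Web Audio approach is to route the final mix through a DynamicsCompressorNode configured with a hard knee and a high ratio. This TypeScript sketch is my own illustration, not the code Gemini generated:

    const ctx = new AudioContext();
    const limiter = ctx.createDynamicsCompressor();
    limiter.threshold.setValueAtTime(-6, ctx.currentTime); // start limiting at -6 dBFS
    limiter.knee.setValueAtTime(0, ctx.currentTime);       // hard knee
    limiter.ratio.setValueAtTime(20, ctx.currentTime);     // maximum ratio: near-limiting
    limiter.attack.setValueAtTime(0.003, ctx.currentTime); // clamp peaks quickly
    limiter.release.setValueAtTime(0.25, ctx.currentTime);
    limiter.connect(ctx.destination);
    // Connect every oscillator/gain chain to `limiter` instead of ctx.destination.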

[1] https://www.youtube.com/watch?v=B3wLYDl2SmQ


I don't know what it was but this made my dog go nuts.


Is there something like "Claude Code" for Gemini? Or do you have to manually copy/paste the code into files?


Check out Aider [0] or Anon Kode [1] (clone of Claude Code). New models are why I try to build all my tools and infra to be model-independent. On that note, I also prefer to be provider-independent, using OpenRouter [2] or T3 Chat [3] and the like.

[0] https://aider.chat/

[1] https://github.com/dnakov/anon-kode

[2] https://openrouter.ai/

[3] https://t3.chat/


OpenRouter is great for trying new models but I wouldn't use it long term since they add their cut on top of the provider's pricing.


You can also use the "fabric" CLI tool with its new "code_helper" functionality:

https://github.com/danielmiessler/fabric?tab=readme-ov-file#...

This is more rudimentary and works on the CLI, but I've had good results with it using both Gemini Pro and local models.


There is an open-source tool named Aider that can use Gemini: https://aider.chat/


You can use Gemini from VS Code. (Well, at least Copilot can call it.)


WOW! Very cool and original!!


I just tested the "gpt-4o-mini-tts" model on several texts in Japanese, a particularly challenging language for TTS because many character combinations are read differently depending on the context. The produced speech was quite good, with natural intonation and pronunciation. There were, however, occasional glitches, such as the word 現在 genzai “now, present” being read with a pause between the syllables (gen ... zai) and the conjunction 而も being read as nadamo instead of the correct shikamo. There were also several places where the model skipped a word or two.

However, unlike some other TTS models offering Japanese support that have been discussed here recently [1], I think this new offering from OpenAI is good enough for language learners. I certainly could have put it to good use when I was studying Japanese many years ago. But it’s not quite ready for public-facing applications such as commercial audiobooks.

That said, I really like the ability to instruct the model on how to read the text. In that regard, my tests in both English and Japanese went well.
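
For reference, instructed speech can be generated with a few lines of the OpenAI Node SDK in TypeScript. This is a minimal sketch; the voice, input text, and instruction wording are just placeholders:

    import OpenAI from "openai";
    import { writeFile } from "node:fs/promises";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    const speech = await client.audio.speech.create({
      model: "gpt-4o-mini-tts",
      voice: "alloy",
      input: "現在、桜が満開です。而も、天気も良い。",
      // Free-form delivery instructions, the feature discussed above.
      instructions: "Read slowly and clearly, as for a language learner.",
    });

    await writeFile("speech.mp3", Buffer.from(await speech.arrayBuffer()));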

[1] https://news.ycombinator.com/item?id=42968893


It's a good article, but I kept waiting for a suggestion that never arrived: that perhaps individualized tutoring can be provided effectively by interactive AI.

I don't know how well ChatGPT et al. would work if applied at scale as part of a mastery-based curriculum. I especially wonder if students would be as motivated to learn by chatbots as they are by human interaction. But considering the low cost and ready availability of LLM tutors, it's at least a possibility worth considering.


I wasn’t able to replicate that bug on my Mac, but when I tried to do so I ran into an instance of another bug that has been annoying me for many years: windows that open far away from where I clicked. Here is where the color picker appeared on my screen after I clicked the custom color button in the change-wallpaper window:

https://gally.net/temp/20250304macoswindowopeningposition.jp...


That’s not a bug. The color picker always opens at the bottom left no matter which app opens it. Has always been like that.


Sounds a bit like “you’re holding it wrong” tbh.

I understand your point: once you have absorbed the Apple logic, it is reinforced continually and makes sense. But opening a dialog near the user's interaction point is a very reasonable expectation.

For me, I’ve never noticed any logic to it; I just know I need to hunt for it.


Colors is not just a dialog! It's a standalone application that opens in its own window, which you can launch just like any other application on macOS, and it also remembers where it was last opened.

