Hacker Newsnew | past | comments | ask | show | jobs | submit | juxtaposicion's commentslogin

Chrisemoody.com

Nice work. I’ve also tinkered on unit pricing! I worked on Popgot.com, which is similar but for the US and tracks non-perishable staples


How accurate are the grocery prices for Kroger or Albertson's owned stores?


I'm not sure I understand. Your model shows that different group buckets (eg 20-24yo vs 25-29yo) peak at different years (in your figure, 2022 vs 2024) despite being driven by the same dynamics. Is that expected? I (naively?) expected the same groups to rise, fall and have peaks at the same times.


One of the dynamics is that people get older so they move into different buckets.

We can make the model way simpler to make it clearer. Say in 2020 we hired 1000 20-24yo, 1000 25-29yo etc and then we didn't hire anyone since then. That was five years ago, so now we have 0 20-24yo, 1000 25-29yo, 1000 30-34yo etc and 1000 retirees who don't show up in the graph.

Each individual year we hired the exact same number of people in each age bracket, and yet we still end up with fewer young people total whenever hiring goes down, because all the people that got hired during the big hiring spike are now older.


Got it, thanks! Yeah, so it makes sense that any age-bucketing like this would have a similar effect


Yeah, agree most daily purchases are humdrum and shouldn’t command all of my attention.

Incidentally, my last project is about buying by unit price. Shameless plug, but for vitmain D the best price per serving here (https://popgot.com/vitamin-d3)


Those "refine your results" buttons is clever UX. I like the Choose your own adventure feel to it. Nicely done.


That’s pretty interesting. I’ve using Airtable’s “field agents” for a similar use case, but would love to use this instead. Does it automatically cache values? (Don’t want to pay for repeat prompts just because one input cell updated)


Yes it does, you can toggle it on and off. Send me an email at kasper at getcellm dot com or sign up to the waitlist on getcellm dot com and I will personally onboard you!


I’m building Popgot (https://popgot.com): compare unit prices (per oz/sheet/lb) across Costco, Walmart, Target, and Amazon. We normalize fuzzy sizes (“family,” “mega,” multipacks) so you see the actually cheapest option for staples.

New: a deep research mode that, on demand, crawls thousands of product pages and uses visual LLMs to read label photos (ingredients, counts, square footage) when the text is messy. First run takes ~60–90s, then it’s cached.

A good torture test: 20×25×1 MERV 13 home air filters—listings mix single/4/6/12-packs and vague claims (“3-month,” “allergen defense”), which wreck per-unit comparisons. I’d love feedback on misses (coupons/Subscribe & Save/region), categories to add, and to collaborate with a grocery-list app, budgeting tool, or anyone in the frugal/deals space. chris@popgot.com


I see paper towels but no toilet paper? I think toilet paper is the most confusing one

edit: also this doesn't seem correct:

Everything above will save you $57.65 on 33 fl oz

https://popgot.com/shampoo?attributes=scalp_concern%3Adandru...


I had to look at that carefully, but I think that "save you $57.65 on 33 fl oz" is both technically and meaningfully correct. It compares our best choice to the most popular choice -- we use the product with the most ratings as a proxy for that. Nizoral 2-in-1 has a crazy 100k reviews, but it is in fact almost 20x more expensive per fluid ounce! And it is the most popular product@

If you hover the text it explains the logic (you can see that in this screenshot https://imgur.com/a/hO7fiWR). But to replay the logic here:

Equate 2 in 1 Dandruff Shampoo 28.2oz is 21¢/fl oz (for 33 fl oz it costs $6.99) is the Popgot choice.

But the most popular (e.g. most reviewed) product is "Nizoral 2-in-1 Anti-Dandruff Shampoo" and that costs a whopping $1.96/fl oz (33 fl oz it costs $64.63)

So yes, the most popular anti-dandruff shampoo (which I used to use, until I saw this shampoo list https://popgot.com/shampoo?attributes=scalp_concern%3Adandru...) is literally 20x more expensive, so you can do a lot better by picking alternatives at the top of that list.

Not sure why you didn't see toilet paper, but it is right here: https://popgot.com/toilet-paper


I always kind of felt like it would be great to compare all kinds of toilet paper (soft, not soft, 1- 2- 3-ply etc) together based on weight. I feel like I use a lot fewer sheets of thicker toilet paper. So if I could see which was cheapest per gram or something I’d be pretty interested.


Interesting idea. Running the LLMs would be expensive. How are you monetising this product? Additionally, since the product is targeted towards frugal customers, do you think you will be able to generate a decent revenue from it?


The LLMs are in fact quite expensive! We run dozen of LLM calls across thousands of products. That's thousands to tens of thousands of calls per search query. The idea is we've got to find the best & the cheapest, and I have spared no expense in doing so. (Plus we have GCP credits.)

Eventually products will overlap between search queries, so we can serve fast and low latency results that have been pre-processed by LLMs. That will be near zero cost. And of course LLM prices will continue to drop quickly.

We monetize via affiliate fees -- you buy something off that list, and we get 1-4% back at no cost to you.


with AR and glasses camera platforms, this will actually be a big deal. pricing and real consumer interest and inventory levels are competitive.

If people are wearing AR glasses into big box stores that are comparing prices in real time, I could see there being a real time auction for CPG pricing the way there are for website ads now.


haha this is fun. shared with my partner.. he spends hours doing it manually


thanks! let me know if y'all have any feedback :)


I’m working on Popgot (https://popgot.com), a tool that tracks unit prices (cost per ounce, sheet, pound) across Costco, Walmart, Target, and Amazon. It normalizes confusing listings (“family size”, “mega pack”, etc.) to surface the actual cheapest option for daily essentials.

On top of that, it uses a lightweight AI model to read product descriptions and filter based on things like ingredients (e.g., flagging peanut butter with BPA by checking every photograph of the plastic or avoiding palm oil by reading the nutrition facts) or brand lists (e.g., only showing WSAVA-compliant dog foods). Still reviewing results manually to catch bad extractions.

Started this to replace a spreadsheet I was keeping for bulk purchases. Slowly adding more automation like alerting on price drops or restocking when under a threshold.


I don't think I have the time to go to different stores to buy different things based on what is cheap. I have one fixed one.

However, what I would like is a product where I upload my shopping receipt for a few weeks/months from the one store I go to. The application figures out what I typically buy and then compares the 4-5 big stores and tells me which one I should go to for least price.


Yeah, I agree. It is a pain to search product by product instead of sticking to one store. Also popgot.com can only do what's online & shipped to you -- so really just the non-perishables / daily essentials that are not fresh groceries. But even when limited to consumables I save ~$100/mo by basically buying by unit price.

Uploading a receipt to see how much you can save... that's a good idea. I think I can find your email via your personal site. Can I email you when we have a prototype ready?


A one time email is fine.

However, I am in Canada. So can only test it once you expand there. Thanks.

I don't know how things are in the US, but it does seem like the grocery store oligopoly is squeezing consumers a lot, so tools like this are valuable for injecting competition into the system.


Shameless plug for my own project (https://grocerytracker.ca/) since you're in Canada. Eventually I'd love for it to do what you're suggesting, but for now the closest thing you can do is create a basket for each store with the same items and then check each week to see which is the cheapest.


This is a great idea. And OCR should be good enough nowadays to parse the receipts. Probably would work best as a mobile app, though.


Have you looked at receipts? They’re narrow, only do one line per item, and every store prints something different for the same product. It usually includes a store specific sku, the price and some truncated text on that single line. Good luck figuring out exactly what someone purchased from a random receipt.


Awesome site. You've probably come across it, but just in case you haven't. In the UK we have trolley.co.uk (plus app) which is handy. The barcode scanner I use a lot when I want to check if the branded product is a good price in the shop i'm standing in or if i'm getting ripped off. They have all products (I assume because online grocery shopping is bigger here?). Personally, I'm looking to start online shopping (new dad so time poor), it'd be great if I could build a shopping list and a site tell me which online grocer to order from for the best value, with basket price breakdown for each.


This is so good I disabled my ad blocker.

Thank you. Seriously.

Note: I searched "Protein bars", and it treated all protein bars equally. The 1st-20th cheapest had <15g of protein per bar. I had to scroll down to the 50th-60th to find protein bars with 20g of protein, which surprised me for being cheaper than Kirkland Signature's protein bars.


My pleasure! Happy you could use it as much as I do. Anyway we can chat in person? I'd love to make more stuff for you. chris@<our site>.com


I like this idea a lot -- feels like there's a lot of room to grow here. Do you have any sort of historical price tracking/alerting?

And/or also curious if there is a way to enter in a list of items I want and for it to calculate which store - in aggregate - is the cheapest.

For instance, people often tell me Costco is much cheaper than alternatives, and for me to compare I have to compile my shopping cart in multiple stores to compare.


> For instance, people often tell me Costco is much cheaper than alternatives, and for me to compare I have to compile my shopping cart in multiple stores to compare.

A few years ago, I was very diligently tracking _all_ my family's grocery purchases. I kept every receipt, entered it into a spreadsheet, added categories (eg, dairy, meat), and calculated a normalized cost per unit (eg, $/gallon for milk, $/dozen eggs).

I learned a lot from that, and I think I saved our family a decent amount of money, but man it was a lot of work.


Glad you guys mentioned Costco -- I happen to have written a blog post on exactly that: https://popgot.com/blog/retailer-comparison Surprisingly, Costco does not win most of the time, and especially if you are not brand loyal. Costco has famously low-margins, but it turns out that when you sort by price-per-unit they're ok, but not great.

@mynameisash I'm curious what you learned... maybe I can help more people learn that using Popgot data.


One thing to call out is that costco.com and in-person have different offerings (& prices) -- but you probably know that already.

I just dusted off my spreadsheet, and it's not as complete as I'd like it to be. I didn't normalize everything but did have many of the staples like milk and eggs normalized; some products had multiple units (eg, "bananas - each" vs "bananas - pound"); and a lot of my comparisons were done based on the store (eg, I was often comparing "Potatoes - 20#" at Costco but "Potatoes - 5#" at Target over time).

Anyway, Costco didn't always win, but in my experience, they frequently did -- $5 peanut butter @ Costco vs $7.74 @ Target based on whatever size and brand I got, which is interesting because Costco doesn't have "generic" PB, whereas Target has much cheaper Market Pantry, and I tried to opt for that.


My family’s favorite experience has been that Costco usually doesn’t have the cheapest option but it has a good value option.

Our main example is something like pasta. Our local grocery stores all carry their own brand of dirt cheap pasta but it’s not as good as the more expensive pasta at Costco. Comparable pasta at the local grocer would be more expensive.

For items that are carried at both stores, Costco is usually no cheaper than the regular retail price and rarely much more expensive.


The quality difference I find between Costco and Walmart is significant, even if the price is not that different.


I'm so glad you like it!

We have historical price tracking in the database, but haven't exposed it as a product yet. What do you have in mind / what would you use it for?


There is a project linked to the Open Food Facts nonprofit of collecting prices of any products (food or other) with bar codes https://prices.openfoodfacts.org/about. They have a system for automatic price detection from labels and working on one from receipts.


I like that you have the ability to exclude on some dimension (eg, I don't use Amazon.com). Do you have or are you considering adding more retailers beyond the four you mentioned? For example, I buy a lot of unroasted coffee from sweetmarias.com, and excluding Amazon from Popgot results eliminates all but one listing (from Walmart).


Ah, hell yeah! My buddy on this project has been itching to add sweetmarias.com ... he just needed this as an excuse.

So yeah, we'll add it. If you shoot me an email (or post it here?) to chris @ <our site>.com I'll send you a link when it's done. Should take a day or two.


Cool project!

I run tech for a reverse logistics business buying overstock from Costco/Target/Walmart and we’re building a similar system for recognizing and pricing incoming inventory. I sent an email a few days ago to see if you might be open to chatting.

It would be great to compare notes or explore ways to collaborate. Totally understand if things are busy!


God tier filtering. Do you mind sharing how you integrated AI into the filter system? Your "flagging peanut butter" example also makes me wonder if the LLM is tagging the product with a large number of attributes on each run so it's not prohibitively expensive.


Cool! I hope it's coming to Japan (I live) near future.


It’s interesting to see how differentiable logic/binary circuits can be made cheap at inference time.

But what about the theoretical expressiveness of logic circuits vs baselines like MLPs? (And then of course compared to CNNs and other kernels.) Are logic circuits roughly equivalent in terms of memory and compute being used? For my use case, I don’t care about making inference cheaper (eg the benefit logical circuits brings). But I do care about the recursion in space and time (the benefit from CAs). Would your experiments work if you still had a CA, but used dumb MLPs?


Well, with all 16 logic gates available, they can express all Boolean circuits (you could get that even with NAND or NOR gates, of course, if you are working with arbitrary as opposed to fixed connectivity). And so you could have a 32 bit output vector which could be taken as a float (and you could create any circuit that computes any bitwise representation of a real).

As for efficiency, it would depend on the problem. If you're trying to learn XOR, a differentiable logic gate network can learn it with a single unit with 16 parameters (actually, 4, but the implementation here uses 16). If you're trying to learn a linear regression, a dumb MLP would very likely be more efficient.


A MLP must be compilable to some arrangement of logic gates, so you could always try a tack like initializing everything as randomly-wired/connected MLPs, and perhaps doing some pretraining, before compiling to the logic gate version and training the logic gates directly. Or take the MLP random initialization, and imitate its distributions as your logic gate distribution for initialization.


This looks great; very useful for (example) ranking outputs by confidence so you can do human reviews of the not-confident ones.

Any chance we can get Pydantic support?


Fyi logprobs !== confidence.

If you run "bananas,fishbowl,phonebook," and get {"sponge": 0.76}

It doesn't mean that "placemat" was the 76% correct answer. Just that the word "sponge" was the next most likely word for the model to generate.


Actually, OpenAI provides Pydantic support for structured output (see client.beta.chat.completions.parse in https://platform.openai.com/docs/guides/structured-outputs).

The library is compatible with that but does not use Pydantic further than that.


Right the hope was to go further. E.g. if the input is:

```

class Classification(BaseModel):

    color: Literal['red', 'blue', 'green']
```

then the output type would be:

```

class ClassificationWithLogProbs(BaseModel):

    color: Dict[Literal['red', 'blue', 'green'], float]
```

Don't take this too literally; I'm not convinced that this is the right way to do it. But it would provide structure and scores without dealing with a mess of complex JSON.


but this ultimately just converts to json schema, or the openai function calling definition format.

One question I always had was what about the descriptions you can attach to the class and attributes? ( = Field(description=...) in pydantic) is the model made aware of those descriptions?


Like other comments, I was also initially surprised. But I think the gains are both real and easy to understand where the improvements are coming from.

Under the hood Reflection 70B seems to be a Llama-3.1 finetune that encourages the model to add <think>, <reflection> and <output> tokens and corresponding phases. This is an evolution of Chain-of-Thought's "think step by step" -- but instead of being a prompting technique, this fine-tune bakes examples of these phases more directly into the model. So the model starts with an initial draft and 'reflects' on it before issuing a final output.

The extra effort spent on tokens, which effectively let the model 'think more' appears to let it defeat prompts which other strong models (4o, 3.5 Sonnet) appear to fumble. So for example, when asked "which is greater 9.11 or 9.9" the Reflection 70b model initially gets the wrong answer, then <reflects> on it, then spits the right output.

Personally, the comparison to Claude and 4o doesn't quite seem apples-to-apples. If you were to have 4o/Claude take multiple rounds to review and reflect on their initial drafts, would we see similar gains? I suspect they would improve massively as well.


https://huggingface.co/mattshumer/Reflection-70B says system prompt used is:

   You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
Also, only "smarter" models can use this flow, according to https://x.com/mattshumer_/status/1831775436420083753


> Personally, the comparison to Claude and 4o doesn't quite seem apples-to-apples. If you were to have 4o/Claude take multiple rounds to review and reflect on their initial drafts, would we see similar gains? I suspect they would improve massively as well.

They may already implement this technique, we can't know.


Claude 3.5 does have some "thinking" ability - I've seen it pause and even say it was thinking before. Presumably this is just some output it decides not to show you.


THIS!!!!!!! People act like Claude and 4o are base models with no funny business behind the scenes, we don't know just how much additional prompt steps are going on for each queue, all we know is what the API or Chat interface dump out, what is happening behind that is anyones guess.. The thinking step and refinement steps likely do exist on all the major commercial models. It's such a big gain for a minimal expenditure of backend tokens, WTF wouldn't they be doing it to improve the outputs?


Well they can't do a /lot/ of hidden stuff because they have APIs, so you can see the raw output and compare it to the web interface.

But they can do a little.


As if they couldn’t postprocess the api output before they send it to the client…


No, I mean they sell API access and you can query it.


That's only in the web version, it's just that they prompt it to do some CoT in the antThinking XML tag, and hide the output from inside that tag in the UI.


The API does it too for some of their models in some situations.


Interesting, is there any documentation on this or a way to view the thinking?


I suspect GPT4o already has training for CoT. I've noticed it often responds by saying something like "let's break it down step by step". Or maybe it's the system prompt.


I am not sure, but you seem to be implying that the Reflection model is running through multiple rounds? If so, that is not what is happening here. The token generation is still linear next token prediction. It does not require multiple rounds to generate the chain of thought response. It does that in one query pass.

I have been testing the model for the last few hours and it does seem to be an improvement on LLAMA 3.1 upon which it is based. I have not tried to compare it to Claude or GPT4o because I don't expect a 70b model to outperform models of that class no matter how good it is. I would happy to be wrong though...


I had a similar idea[0], interesting to see that it actually works. The faster LLM workloads can be accelerated, the more ‘thinking’ the LLM can do before it emits a final answer.

[0]: https://news.ycombinator.com/item?id=41377042


Further than that, it feels like we could use constrained generation of outputs [0] to force the model to do X amount of output inside of a <thinking> BEFORE writing an <answer> tag. It might not always produce good results, but I'm curious what sort of effect it might have to convince models that they really should stop and think first.

[0]: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...


Can we replicate this in other models without finetuning them ?



Apple infamously adds "DO NOT HALLUCINATE" to its prompts.


Huh ? Source please (this is fascinating)



what's our estimate of the cost to finetune this?


I don't know the cost, but they supposedly did all their work in 3 weeks based on something they said in this video: https://www.youtube.com/watch?v=5_m-kN64Exc


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: