I really enjoy the Obsidian daily notes feature for this [1]. It's a dedicated button that creates a new note with a title of your choosing. I typically do YYYY-MM-DD d, so 2024-12-1 mon.
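If you want the same naming scheme when scripting notes outside Obsidian, a rough Python sketch (assuming the default C locale for the weekday abbreviation):

```python
from datetime import date

def daily_note_title(d: date) -> str:
    # %a gives the abbreviated weekday name ("Mon"); lowercase it to match "2024-12-02 mon"
    return d.strftime("%Y-%m-%d %a").lower()

print(daily_note_title(date(2024, 12, 2)))  # -> 2024-12-02 mon
```

Inside Obsidian itself, the daily-note date-format setting takes Moment-style tokens, where I believe `YYYY-MM-DD ddd` is the closest equivalent.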
I'm not sure about the time tracking though. Is this more for people working on contract for billing? I see the value in having the data but collecting the data seems difficult.
The Tasks plugin for Obsidian allows tracking time to completion, iirc. If you're billing hourly for clients or trying to use it as a stand-in for a stopwatch, it could be useful. I personally don't use it though.
Every day this video ages more and more poorly [1].
categories of startups that will be affected by these launches:
- vectorDB startups -> don't need embeddings anymore
- file processing startups -> don't need to process files anymore
- fine tuning startups -> can fine-tune directly from the platform now, with GPT-4 fine-tuning coming
- cost reduction startups -> they literally lowered prices and increased rate limits
- structuring startups -> JSON mode and GPT-4 Turbo with better output matching
- vertical ai agent startups -> GPT marketplace
- anthropic/claude -> now GPT-4 Turbo has a 128k context window!
That being said, Sam Altman is an incredible founder for being able to have this close a watch on the market. Pretty much any "ai tooling" startup that was created in the past year was affected by this announcement.
For those asking: vectorDB, chunking, retrieval, and RAG are all implemented in a new stateful API for you! No need to do it yourself anymore. [2]
Exciting times to be a developer!
If you want to be a start-up using AI, you have to be in another industry with access to data and a market that OpenAI/MS/Google can't or won't touch. Otherwise you end up eaten like above.
> a market that OpenAI/MS/Google can't or won't touch.
But also one that their terms of service (which are designed to exclude the markets they can't or won't touch) don't make it impractical for you to serve with their tools.
We just launched our AI-based API-testing tool (https://ai.stepci.com), despite having competitors like GitHub Copilot.
Why? Because they lack specificity. We're domain experts; we know how to prompt it correctly to get the best results for a given domain. The moat is having the model do one task extremely well rather than doing 100 things "alright".
If the primary value-proposition for your startup is just customized prompting with OpenAI endpoints, then unfortunately it's highly likely it could be easily replicated using the newly announced concept of GPTs.
Of course! Today our assumption is that LLMs are commodities and our job is to get the most out of them for the type of problem we're solving (API Testing for us!)
It certainly will be a fun experience. But our current belief is that LLMs are a commodity and the real value is in (application-specific) products built on top of them.
Even if you aren't eaten, the use case will just be copied and run on the same OpenAI models by competitors; having good prompts is not a good enough moat. They win either way.
depends on how much developers are willing to embrace the risk of building everything on OpenAI and getting locked onto their platform.
What's stopping OpenAI from cranking up the inference pricing once they choke out the competition? That combined with the expanded context length makes it seem like they are trying to lead developers towards just throwing everything into context without much thought, which could be painful down the road
> depends on how much developers are willing to […] getting locked onto their platform.
I mean.. the lock in risks have been known with every new technology since forever now, and not just the risk but the actual costs are very real. People still buy HP printers with InkDRM and companies willingly write petabytes of data into AWS that they can’t even afford to egress at current prices.
To be clear, I despise this business practice more than most, but those of us who care are screaming into the void. People are surprisingly eager to walk into a leaking boat, as long as thousands of others are as well.
Combination of 1) short-term business thinking (save $1 today = $1 more of EPS) and 2) fear of competitors building AI products and taking share, hence the rush to use the first usable platform (e.g. OpenAI).
Psychology and FOMO play an interesting role in walking directly into a snake pit.
100%. I was even gonna add to my comment that these psychological biases seem to particularly affect business people, but omitted it to stay on point. I don't think like that, but I also can't say what works better on average, so I'll try to stay humble.
Also, with AI there’s not really a “roll your own” option as with Cloud – the barrier to entry is gigantic, which obviously the VCs love, because as we all know they don’t like having to compete on price & quality on an open market.
I suspect it is in OpenAI's interest to have their API as a loss leader for the foreseeable future, and keep margins slim once they've cornered the market. The playbook here isn't to lock in developers and jack up the API price, it's the marketplace play: attract developers, identify the highest-margin highest-volume vertical segments built atop the platform, then gobble them up with new software.
They can then either act as a distributor and take a marketplace fee or go full Amazon and start competing in their own marketplace.
Reminds me of that sales entrapment approach from cloud providers. “Here is your free $400, go do your thing” and next thing you know you have built so much on there already that it is not worth the time and effort to try and relocate it, regardless of the $2k bill increase. Haha, good times.
I mean, sure it's lock-in, but it's lock-in via technical superiority / providing features. Either someone else replicates a model of this level of capability, or anyone who needs it doesn't really have a choice. I don't mind as much when it's because of technical innovation and feature set (as opposed to the usual gamut of non-productive anti-competitive actions). If I want to use that much context, it's not OpenAI's fault that other folks aren't matching it; they didn't even invent transformers, and it's not like their competitors are short on cash.
Well, if said startups were visionaries, they could've better understood the business they were entering. On the other hand, there are plenty of VC-inflated balloons, making lots of noise, that everyone would be happy to see go. If you mean these startups - well, farewell.
There's plenty more to innovate, really. Saying OpenAI killed startups is like saying that PHP/Wordpress/NameIt killed small shops doing static HTML, or that IBM killed the typewriter companies. Well, as I said, they could've known better. Competition is not always to blame.
I’ve been keeping my eye on a YC startup for the last few months that I interviewed with this summer. They’ve been set back so many times. It looks like they’re just “ball chasing”. They started as a chatbot app before chatgpt launched. Then they were a RAG file processing app, then enterprise-hosted chat. I lost track of where they are now but they were certainly affected by this announcement.
You know you’re doing the wrong thing if you dread the OpenAI keynotes. Pick a niche, stop riding on OpenAI’s coat tails.
> they don't provide embeddings, but storage and query engines for embeddings, so still very relevant
But if you use OpenAI's Assistants API, you don't need any of that chain: extract data, calculate embeddings, store data indexed by embeddings, detect the need to retrieve data by embeddings, and stuff it into the LLM context along with your prompt. In addition to letting you store your own prompts and manage associated threads, the API also lets you upload data for it to extract, store, and use for RAG at the level of either a defined Assistant or a particular conversation (Thread).
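Concretely, the announced flow is just a couple of REST calls. A sketch of the request payloads (endpoint paths, the `retrieval` tool name, and the beta header are from the announced v1 beta and may change; the IDs are placeholders):

```python
# Payloads for the Assistants API beta (sketch only, no HTTP client shown).
# Flow: POST /v1/assistants -> POST /v1/threads -> POST /v1/threads/{id}/messages
#       -> POST /v1/threads/{id}/runs, all with the "OpenAI-Beta: assistants=v1" header.
create_assistant = {
    "model": "gpt-4-1106-preview",
    "instructions": "Answer questions using the uploaded documents.",
    "tools": [{"type": "retrieval"}],  # OpenAI handles chunking/embeddings/RAG for you
    "file_ids": ["file-abc123"],       # placeholder; from an upload with purpose="assistants"
}
create_message = {"role": "user", "content": "Summarize chapter 2 of the handbook."}
create_run = {"assistant_id": "asst_..."}  # placeholder assistant ID
```

The point being: the embedding/chunking/search machinery never appears in your code at all.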
As in, use an existing search and call it via 'function calling' as part of the assistant's routine, rather than uploading documents to the Assistants API?
I mean embeddingsDB startups don't provide embeddings. They provide databases that let you store and query computed embeddings (e.g. computed by ChatGPT), so they are complementary services.
Yeah, I still see a chatbot being able to look for related information in a database as useful. But I see it as just one of many tools a good chat experience will require. For me, the 128k context means there are other applications to explore and larger tasks to accomplish with fewer API requests: better chat history, and context not getting lost.
Embeddings are still important (context windows can't contain all data + memorization and continuous retraining is not yet viable), and vertical AI agent startups can still lead on UX.
Separate embedding DBs are less important if you are working with OpenAI, since their Assistants API exists to (among other things) let you bring in additional data and let them worry about parsing it, storing it, and doing RAG with it. It's like "serverless", but for vector DBs and RAG implementations instead of servers.
Just because something is great doesn't mean that others can't compete. Even a second-best product can easily be successful: a company may have already invested too much, may not be aware of OpenAI (or AI progress in general), may have some magic integration, etc.
If it were only up to me, no one would buy Azure or AWS, just GCP.
If you are using OpenAI, the new Assistants API looks like it will handle internally what you used to handle externally with a vector DB for RAG (and for some things, GPT-4 Turbo’s 128k context window will make it unnecessary entirely). There are other uses for vector DBs than RAG for LLMs, and there are reasons people might use non-OpenAI LLMs with RAG, so there is still a role for vector DBs, but it shrank a lot with this.
It’s more reliable than chat-with-PDF tools that rely on vector search. With a vector DB, all you are doing is a fuzzy search, then sending the relevant portion of text to an LLM as part of the prompt. It misses info.
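That loop (embed the query, fuzzy-match chunks, stuff the winner into the prompt) is short enough to sketch. Toy 3-dimensional vectors stand in for real embedding-model output:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: (chunk_text, precomputed_embedding). A real system would get
# these vectors from an embedding model and store them in a vector DB.
chunks = [
    ("Refunds are processed within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.1, 0.8, 0.2]),
]
query = "how long do refunds take?"
query_embedding = [0.85, 0.15, 0.05]  # pretend output of embedding the query

# Fuzzy search: pick the most similar chunk, then stuff it into the prompt.
best_text, _ = max(chunks, key=lambda c: cosine(c[1], query_embedding))
prompt = f"Context:\n{best_text}\n\nQuestion: {query}"
```

Anything outside the retrieved chunk is invisible to the model, which is exactly the "it misses info" failure mode.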
OpenAI charges for all those input tokens. An app that requires squeezing 350 pages of content into every request is going to cost more. Vector DBs are still relevant for cost and speed.
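Back-of-the-envelope on those input tokens, assuming roughly 500 tokens per page and the announced GPT-4 Turbo input price of $0.01 per 1K tokens (both figures approximate):

```python
pages = 350
tokens_per_page = 500       # rough assumption for dense text
price_per_1k_input = 0.01   # announced GPT-4 Turbo input price, USD

tokens = pages * tokens_per_page          # 175,000 tokens
cost_per_request = tokens / 1000 * price_per_1k_input
print(f"~${cost_per_request:.2f} of input tokens per request")
```

And at that size the document would not even fit the 128K window, so some form of retrieval is forced regardless of cost.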
Startups built around actual AI tools, like if one formed around automatic1111 or oogabooga, would be unaffected, but because so much VC money went to the wrong places in this space, a whole lot of people are about to be burned hard.
None of those categories really fall under the second order category mentioned in the video. Using their analogy they all sound more like a mapping provider versus something like Uber.
You might. Depends on what you're trying to do. For RAG it seems like they can 'take care of it', but embeddings also offer powerful semantic search and retrieval, ignoring LLMs.
Retrieval: augments the assistant with knowledge from outside our models, such as proprietary domain data, product information or documents provided by your users. This means you don’t need to compute and store embeddings for your documents, or implement chunking and search algorithms. The Assistants API optimizes what retrieval technique to use based on our experience building knowledge retrieval in ChatGPT.
The model then decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:
- it either passes the file content in the prompt for short documents, or
- performs a vector search for longer documents
Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.
Really cool to see the Assistants API's nuanced document retrieval methods. Do you index over the text besides chunking it up and generating embeddings? I'm curious about the indexing and the depth of analysis for longer docs, like assessing an author's tone chapter by chapter—vector search might have its limits there. Plus, the process to shape user queries into retrievable embeddings seems complex. Eager to hear more about these strategies, at least what you can spill!
Embedding is a poor man's context-length increase. It essentially increases your context length, but with loss.
There is still a cost argument to make: an embedding-based approach will be cheaper and faster, but give worse results than full text.
That being said, I don't see how those embedding startups compete with OpenAI; no one will be able to offer better embeddings than OpenAI itself. It is hardly a convincing business.
The elephant in the room is that the open source models aren't able to match up to OpenAI models, and the gap is qualitative, not quantitative.
For embeddings specifically, there are multiple open source models that outperform OpenAI’s best model (text-embedding-ada-002) that you can see on the MTEB Leaderboard [1]
> embedding-based approach will be cheaper and faster, but worse result than full text
I’m not sure results would be worse, I think it depends on the extent to which the models are able to ignore irrelevant context, which is a problem [2]. Using retrieval can come closer to providing only relevant context.
The point isn't about the leaderboard. With increasing context length, the question is whether we need embeddings at all. With longer context, embeddings are no longer a necessity, which lowers their value.
For more trivial use cases, sure, but not for harder stuff like working with US law and precedent.
The US Code is on the order of tens of millions of tokens and I shudder to think how many billions of tokens make up all the judicial opinions that set or interpreted precedent.
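To put that scale against the new window (the US Code token count is a rough order-of-magnitude assumption):

```python
us_code_tokens = 40_000_000   # rough order-of-magnitude assumption
context_window = 128_000      # GPT-4 Turbo

fraction_fits = context_window / us_code_tokens
calls_to_read_once = -(-us_code_tokens // context_window)  # ceiling division
print(f"{fraction_fits:.2%} fits in one call; {calls_to_read_once} calls to see it all once")
```

So even the expanded window covers well under 1% of the statutes alone, before touching case law; retrieval is unavoidable here.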
OP is incorrect. Embeddings are still needed since (1) context windows can't contain all data and (2) data memorization and continuous retraining is not yet viable.
But the common use case of using a vector DB to pull in augmentation appears to now be handled by the Assistants API. I haven't dug into the details yet, but it appears you can upload files and the contents will be used (likely with some sort of vector search happening behind the scenes).
There is not much info about retrieval/RAG in their docs at the moment. Did you find any example of how the retrieval is supposed to work, and how to give it access to a DB?
Checking HN and Product Hunt a few times a week gives you most of that awareness, and I don't need to remind you about the person behind the HN 'sama' handle.
More startups should focus on foundation models; that's where the meat is. Ideally there won't be a need for any startup, as the platform should be able to self-build whatever the customer wants.
There will be a lot of startups who rely on marketing aggressively to boomer-led companies that don't know what email is, hoping their assistant never types OpenAI into Google for them.
It's certainly true that most people are deficient in potassium. The recommended dose for males is over 3 grams per day![1]
To make matters worse, the FDA limits the amount of potassium that can be present in supplements to 100mg[2]. So good luck taking 30 supplements to meet your daily requirements!
One option I'd like to advertise is the salt alternatives at grocery stores, which are filled with potassium, some with at least 800mg per tsp. This can be another way to supplement potassium and magnesium in the diet. [2]
I participated in a "crank science" study, where a bunch of us took salt alternative daily to see if the added potassium explained the success of the so-called "potato diet".
Salt alternatives taste like sipping a freshly blended nine volt battery. My big contribution to the project was discovering that the stuff is mostly tolerable when dissolved in cranberry juice.
> To make matters worse, the FDA limits the amount of potassium that can be present in supplements to 100mg[2]. So good luck taking 30 supplements to meet your daily requirements!
You don't need nearly 30 pills, even with a poor diet. Almost all foods, even junk foods, have some amount of potassium. Per 100g (3.5oz), a random sampling from my typical snacks, lunches and breakfasts:
chicken breast 223mg
cooked white rice 35mg
cooked pasta 24mg
1 large egg 69mg
whole milk 132mg
apple 107mg
bagel 165mg
almonds 705mg
banana 358mg
Miss Vickie's Sea Salt & Vinegar Flavored Potato Chips 1260mg
Potato chips are high in potassium and have a close to ideal 2:1 ratio of potassium to sodium. Superfood!
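Summing a plausible day's eating from the per-100g figures above (serving sizes are my own rough assumptions) shows why hitting 3+ grams takes deliberate choices:

```python
# mg of potassium per assumed serving, scaled from the per-100g figures above
day = {
    "chicken breast, 200g": 2.0 * 223,
    "white rice, 200g": 2.0 * 35,
    "whole milk, 250g": 2.5 * 132,
    "banana, one (~118g)": 1.18 * 358,
    "almonds, 30g": 0.3 * 705,
    "potato chips, 50g": 0.5 * 1260,
}
total_mg = sum(day.values())
print(f"{total_mg:.0f} mg for the day")  # roughly 2100 mg, well short of 3+ g
```

Even with almonds and chips pulling their weight, a fairly normal day lands around two-thirds of the target.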
No one ever explains how to get 3000 mg of potassium a day, and potassium is usually a very small part of multivitamin supplements. It makes me slightly skeptical of that number.
I used to supplement with potassium salt powder. A teaspoon was like 5000-10000mg if I’m remembering right. I was very uncomfortable having it in the house, I think a couple tablespoons would likely stop the heart of an adult. I think it’d be very unpleasant to consume that much, but I didn’t test the theory.
I left a 1/8 tsp measure permanently in the bottle so there can be no mistake about what dose to use.
Potassium is in a lot of foods. I know this very well because I spent many years on dialysis due to kidney failure and had to do a lot of diet tracking so I didn't consume too much potassium or phosphorus, since the kidneys are important in removing excess amounts. Most meats, vegetables, fruits, nuts, and legumes all have varying amounts of potassium.
Also the USDA maintains a database of complete nutrition of many common foods.
It depends on what you eat. A person eating a lot of tomatoes, legumes, potatoes, and squash would hit it pretty easily, which is common for a lot of traditional diets (South Asia, some parts of Latin America, etc.).
I wouldn't be. A lot of people have worked hard on these numbers over the years.
I'm guessing the limitation on supplements assumes a good diet (no shortage of advice from the USDA on 'good' diets and potassium sources), which typically provides the recommended intake. However, there can of course be days when that's not possible.
For fun, I asked ChatGPT to come up with a daily diet to hit the DRV number. It had a banana, potatoes, chicken, salmon, black beans, spinach, broccoli, an avocado, and a few other things. So it is "doable".
Does it satisfy all other dietary recommendations though? I saw some people saying that it's actually impossible to satisfy the potassium value simultaneously with all the other dietary recommendations, using natural food.
Potassium chloride is what they use to stop mammal hearts in euthanasia/capital punishment.
I think the FDA might have some reasons to prefer potassium be spread out across meals instead of taken all at once in a lump sum (think also of children eating vitamins as candy; I know someone who almost died of iron poisoning as a child, before they made the pills bitter).
That’s intravenous KCl. Significantly different absorption than taking it orally. Wikipedia shows roughly a 100x difference between the oral and intravenous LD50.
If you tried to ingest a lethal dose of KCl, I would put huge odds on you first retching out your guts. It's ~190 grams orally to hit the LD50.
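That ~190 g figure is consistent with the commonly cited oral LD50 for KCl of about 2.5 g/kg (a rat figure, crudely scaled here to an assumed 75 kg adult; an extrapolation, not a safety threshold):

```python
oral_ld50_g_per_kg = 2.5  # commonly cited rat oral LD50 for KCl
body_mass_kg = 75         # assumed adult body mass

lethal_dose_g = oral_ld50_g_per_kg * body_mass_kg
print(f"~{lethal_dose_g:.0f} g of KCl")  # roughly 190 g
```

Compare that to the gram-or-so doses people use for seasoning and the margin becomes clearer.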
You can also buy potassium chloride water softener pellets. On the Lowe's site I currently see a 40 pound sack for $40. I used to grind them up in a coffee grinder for making my own custom lower-sodium salt blend for cooking.
How do you guys do the static analysis on the queries? I notice you support dbt, BigQuery, etc., but all of our company's pipelines are in Airflow. That makes static analysis difficult, because we're dealing with arbitrary Python code that programmatically generates queries :).
Any plans to support Airflow in the future? Would love to have something like this for our company's 500k+ Airflow jobs.
It depends a bit on your stack. Out of the box it does a lot with the metadata produced by the tools you're using. With something like dbt we can do things like extract your test assertions, while for Postgres we might use database constraints.
More generally, we can embed the transformation logic of each stage of your data pipelines into the edge between nodes (like two columns). Like you said, in the case of SQL there are lots of ways to statically analyze that pipeline, but it becomes much more complicated with something like pure Python.
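For the SQL side, even a deliberately naive sketch shows why static analysis is tractable there (a real implementation would use a proper SQL parser; this regex version misses CTEs, quoting, subqueries, and much more):

```python
import re

def table_lineage(sql: str):
    """Naive table-level lineage: map the target table to its source tables."""
    target = re.search(r"(?:insert\s+into|create\s+table)\s+([\w.]+)", sql, re.I)
    sources = re.findall(r"(?:from|join)\s+([\w.]+)", sql, re.I)
    return (target.group(1) if target else None, sorted(set(sources)))

sql = """
INSERT INTO analytics.daily_revenue
SELECT o.day, SUM(o.amount)
FROM raw.orders o JOIN raw.refunds r ON o.id = r.order_id
GROUP BY o.day
"""
print(table_lineage(sql))  # ('analytics.daily_revenue', ['raw.orders', 'raw.refunds'])
```

For pipelines where arbitrary Python renders the SQL, you would have to capture the rendered queries at runtime (e.g. from the query engine's logs) before anything like this applies.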
As an intermediate solution you can manually curate data contracts or assertions about application behavior into Grai but these inevitably fall out of sync with the code.
Airflow has a really great API for exposing task level lineage but we've held off integrating it because we weren't sure how to convert that into robust column or field level lineage as well. How are y'all handling testing / observability at the moment?
- we have a dedicated dev environment for analysts to experience a dev/test loop. None of the pipelines can be run locally unfortunately.
- we have CI jobs and unit tests that are run on all pipelines
Observability:
- we have data quality checks for each dataset, organized by tier. This also integrates with our alerting system to send pages when data quality dips.
- Airflow and our query engines hive/spark/presto each integrate with our in-house lineage service. We have a lineage graph that shows which pipelines produce/consume which assets but it doesn't work at the column level because our internal version of Hive doesn't support that.
- we have a service that essentially surfaces observability metrics for pipelines in a nice UI
- our Airflow is integrated with PagerDuty to page owning teams when pipelines fail.
We'd like to do more, but nobody has really put in the work to make a good static analysis system for Airflow/Python. Couple that with the lack of OOTB support for column-level lineage, and it's easy to get into a mess. For large migrations (Airflow/infra/Python/dependency changes) we still end up doing ad-hoc analysis to make sure things go right, and we often miss important things.
Happy to talk more about this if you're interested.
I always thought bee communication through "dancing" was visual. On reading more, it seems the bees build up an electric charge that interacts with the antennae of other bees.
Excerpt [1]:
> Honeybees accumulate an electric charge during flying. Bees emit constant and modulated electric fields during the waggle dance. Both low- and high-frequency components emitted by dancing bees induce passive antennal movements in stationary bees. The electrically charged flagella of mechanoreceptor cells are moved by electric fields and more strongly so if sound and electric fields interact. Recordings from axons of the Johnston's organ indicate its sensitivity to electric fields. Therefore, it has been suggested that electric fields emanating from the surface charge of bees stimulate mechanoreceptors and may play a role in social communication during the waggle dance.
What happens if the US cancels his passport? It's my understanding Cody Wilson was kicked out of Taiwan, which lacked formal extradition agreement, by cancelling his passport which nullified his legal presence in the country.
Even more incredible is that his own advisor refused to write him letters of recommendation upon graduation [1]
After graduation, Zhang had trouble finding an academic position. In a 2013 interview with Nautilus magazine, Zhang said he did not get a job after graduation. "During that period it was difficult to find a job in academics. That was a job market problem. Also, my advisor [Tzuong-Tsieng Moh] did not write me letters of recommendation." ... Moh claimed that Zhang never came back to him requesting recommendation letters. In a detailed profile published in The New Yorker magazine in February 2015, Alec Wilkinson wrote Zhang "parted unhappily" with Moh, and that Zhang "left Purdue without Moh's support, and, having published no papers, was unable to find an academic job".
In 2018, responding to reports of his treatment of Zhang, Moh posted an update on his website. Moh wrote that Zhang "failed miserably" in proving Jacobian conjecture, "never published any paper on algebraic geometry" after leaving Purdue, and "wasted 7 years of his [Zhang's] own life and my [Moh's] time".
> For some 10 years, I had recommended 100 mainland Chinese students to the department and all accepted by the department. I am always indebt to the trust of my judgements by the department. Only very few of them misbehaved, bit the hands which fed them, none of them intended to murder their parents/friends, almost all of them performed well and became well-liked.
(Interestingly, every summary of the case in the media and on Wikipedia stops listing the evidence against him at his secretly taped confession to a girlfriend, a confession that included some things absolutely not confirmed. The most convincing evidence to my eyes is the victim’s DNA in the blood found under the carpet and elsewhere, where it had survived cleaning efforts. This is not mentioned anywhere except in the court recordings: https://news.wttw.com/sites/default/files/article/file-attac... . Kinda sad what is convincing these days.)
It's tragic how the relationship dynamics between Moh and Zhang almost resulted in the total write-off of Zhang and a loss of genius/talent, with nothing left but bitterness and animosity.
I'm glad Zhang was able to find success despite his initial setbacks and from what it seems like in his recent interviews also let go of his bitterness/resentment (holding something like that in your heart can only ever hold you back). And though the power dynamics here were clearly unequal, I don't think it's fair to blame Moh entirely for what happened at Purdue.
I think it's important to remember Moh is also human with all the complexity that comes along with that. In reading his published statement, even though there is no direct apology to Zhang, I sense that he does genuinely regret how things turned out.
Perhaps one day, Zhang and Moh will be able to meet again and resolve/rekindle their relationship.
In the earlier version I saw (I guess it consists of the non-bold parts), he didn't mention as much negative stuff about Zhang. I regard his claim that Zhang "want to be famous all the time" with suspicion.
All of these guys are probably a hundred times smarter than me or most of the other code monkeys working for the FANGMAN, but they're all squabbling over little 5-figure scraps of grant money.
Zhang evidently doesn't care about money at all. The same is true for many professional mathematicians. Caring about money makes it difficult (not impossible) to do anything deep.
Well, from Wikipedia, his dad worked at a car dealership, but they lived quite well, with a stay-at-home mom, sailing lessons, and gobs of money spent on equipment. That sounds fairly well-to-do.
"As a child he was homeschooled by his mother, took sailing lessons,[7] and had an intense interest in electronics and engineering.[3][8] He took community college courses at Golden West College and Long Beach City College[5] beginning at the age of 14 or 15, and started attending courses at California State University, Long Beach[1] in 2010.[6] He wrote and served as online editor for the university's student-run newspaper, Daily 49er.[9]
During his childhood and teenage years, Luckey experimented with a variety of complex electronics projects including railguns, Tesla coils, and lasers, with some of these projects resulting in serious injuries.[1] He built a PC gaming "rig" worth tens of thousands of U.S. dollars[8] with an elaborate six-monitor setup.[10] His desire to immerse himself in computer-generated worlds led to an obsession with virtual reality (VR)."
What, middle class normalcy doesn't count, you're specifically looking for destitute children who went on to make millions? Seeking a Horatio Alger story? It's not exactly surprising that grinding poverty doesn't yield a lot of unicorn founders. Not sure why you'd think that was a reasonable expectation or litmus test.
[1] https://help.obsidian.md/Plugins/Daily+notes