It's very good at iterating from a description to working code. You still need to know what you're doing, but it can handle a lot of details.
prompt: How would you design a distributed key value storage system? Assume the values are about one megabyte. We will need to scale up to store petabytes of data.
gpt: describes data distribution techniques, consistency, durability issues.
prompt: What techniques can you use for the data distribution?
gpt: sharding, replication, partitioning, hashing
prompt: Can you show me an example of the sharding technique in python?
gpt: spits out code for hash sharding a dictionary
prompt: What are some pitfalls of the hash function you used?
gpt: not guaranteed to produce unique hash values, not guaranteed to be the same across different python installations, not cryptographically secure
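(Not ChatGPT's actual output, but for reference a minimal sketch of the hash-sharding idea it described; this version uses hashlib rather than the built-in hash() precisely because of the pitfalls listed above.)

    import hashlib

    NUM_SHARDS = 4
    shards = [{} for _ in range(NUM_SHARDS)]

    def shard_for(key: str) -> int:
        # Stable hash of the key, reduced modulo the shard count.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    def put(key: str, value: bytes) -> None:
        shards[shard_for(key)][key] = value

    def get(key: str) -> bytes:
        return shards[shard_for(key)][key]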
prompt: Can you implement the stub for a GRPC key value service in rust? The service should have a Put API that takes a key and a value and returns a status code enum. The status code can be SUCCESS or FAILURE.
gpt: spits out code using tonic
prompt: Can you show me the protocol buffer?
gpt: spits out the protocol buffer schema
prompt: Can you implement the put method by storing the data in sqlite?
gpt: spits out the code with a kv_pairs table
prompt: Can you show me the sqlite schema for the table?
gpt: spits out the sql to create the table
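(Again not the actual output, and the transcript's version was Rust with tonic, but as a rough sketch of the same kv_pairs storage idea, here is roughly what the put path looks like in Python with sqlite3; the table name and status enum come from the prompts above.)

    import sqlite3
    from enum import Enum

    class StatusCode(Enum):
        SUCCESS = 0
        FAILURE = 1

    conn = sqlite3.connect("kv.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv_pairs (key TEXT PRIMARY KEY, value BLOB)"
    )

    def put(key: str, value: bytes) -> StatusCode:
        try:
            # Upsert: replace the row if the key already exists.
            conn.execute(
                "INSERT OR REPLACE INTO kv_pairs (key, value) VALUES (?, ?)",
                (key, value),
            )
            conn.commit()
            return StatusCode.SUCCESS
        except sqlite3.Error:
            return StatusCode.FAILURE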
Yes, that makes it a fantastic way to explore a technical topic. I got a whole DirectX tutorial from ChatGPT and it answered my questions and reviewed my code.
I was also able to make it write a boot loader, a simple kernel, and give me instructions to run it on qemu.
The iterative process is fantastic for digging into something new. It’s not an authoritative source of information but a buddy with whom to explore topics.
I want something like that integrated in my OS. With such capabilities I can see myself talking to my AI assistant while I’m working on something.
Edit: it’s also quite good at writing test cases for a given function
You can ask it to update its answers. You can ask it to evaluate possible solutions, then write the code for one of the solutions. Then write the code to test the code. And so on.
It turns the coder into a curator, editor, and code reviewer.
> It turns the coder into a curator, editor, and code reviewer.
Cool?
Doesn't sound half bad to me. I've enjoyed learning to code more than actual coding. I'd be happy in a paradigm where we only code to learn, to practice, to express ourselves, and leave the boring bits to the machines.
Indeed. Its ability to consider different situations and craft context-aware answers is quite incredible. Some prompts that yielded interesting results:
- Assume I am a time traveller from 1900. Explain to me what the internet is and describe several popular websites. (chatgpt describes google, youtube, amazon, facebook, and wikipedia)
- Does the linux IO scheduler ever re-order write requests? (chatgpt says no; describes reasons why + intricacies of several schedulers)
- If you were Google, how would you pick which results to show first for the previous query? Describe the solution mathematically. (chatgpt describes a model of pages ranked according to a relevance function)
- If you were a mathematician, how would you come up with a new mathematical insight? (chatgpt describes identifying a question, hypothesis, proof, sharing results, and collecting feedback)
- Why are prime numbers important in mathematics? Give an example of a prime number and describe why it is prime.
- Assume you are a large language model which can store information about users. How would you record which users have interacted with you? (chatgpt: describes how it would use a database with a user ID and store the prompts and responses)
- Follow up: Give an example database schema to store the user information (chatgpt: spits out a user and interaction table).
And finally one it got wrong, but in an impressive way. (The program below never actually terminates: once c reaches 1000, neither if fires, so c stays at 1000 and the loop spins forever.)
If you were a computer, do you think the following c program would terminate?
    int main() {
        int c = 1234;
        while (c > 998) {
            if (c > 1000)
                c--;
            if (c < 1000)
                c++;
        }
        return 0;
    }
Yes, this C program will terminate. The while loop will continue to execute until the value of c is less than or equal to 998. The first time the loop is executed, the value of c is 1234, which is greater than 1000, so the first if statement is true and c is decremented by 1. This means that the value of c is now 1233, which is still greater than 1000, so the first if statement is true again and c is decremented by 1. This process continues until the value of c becomes 999, at which point the first if statement is no longer true and the second if statement is executed instead. This causes c to be incremented by 1, so the value of c is now 1000. The next time the loop is executed, the value of c is 1000, which is less than 1000, so the second if statement is true and c is incremented by 1. This means that the value of c is now 1001, which is still less than 1000, so the second if statement is true again and c is incremented by 1. This process continues until the value of c becomes 999, at which point the second if statement is no longer true and the while loop terminates.
It really is amazing. Things it did in less than 10 seconds from hitting enter:
- opengl raytracer with compilation instructions for macos
- tictactoe in 3D
- bittorrent peer handshake in Go from a paragraph in the RFC
- http server in go with /user, /session, and /status endpoints from an english description
- protocol buffer product configuration from a paragraph-long english description
- pytorch script for classifying credit card transactions into expense accounts and instructions to import the output into quickbooks
- quota management API implemented as a bidirectional streaming grpc service
- pytorch neural network with a particular shape, number of input classes, output classes, activation function, etc.
- IO scheduler using token bucket rate limiting (a rough sketch of this one is included below)
- analyze the strengths/weaknesses of algorithms for 2 player zero sum games
- compare david hume and immanuel kant's thoughts on knowledge
- describe how critics received george orwell's work during his lifetime
- christmas present recommendations for a relative given a description of their interests
- poems about anything. love. cats. you name it.
Blown away by how well it can synthesize information and incorporate context
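The token bucket item is simple enough to sketch. This isn't ChatGPT's actual output, just the core idea; an IO scheduler would call allow() before dispatching each request and queue or delay requests when it returns False.

    import time

    class TokenBucket:
        def __init__(self, rate: float, capacity: float):
            self.rate = rate          # tokens added per second
            self.capacity = capacity  # maximum burst size
            self.tokens = capacity
            self.last = time.monotonic()

        def allow(self, cost: float = 1.0) -> bool:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False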
I’d be interested to know how many of these were actually correct and usable. My suspicion is not many. I find these tools good at generating boilerplate and superficially correct code, but that they often miss edge cases.
Knowing that code is correct is as important as the code itself, and this is why we do code review, write tests, have QA processes, use logging and observability tools, etc. Of course the place that catches the most bugs is the human writing the code, as they write it.
This feels like a nice extension to Copilot/etc, but I’m not sure it’s as general as people think.
Perhaps an interesting challenge to pose to it is: here’s 10k lines and a stack trace, what’s the bug. Or here’s a database schema, what issues might occur in production using this?
I've started asking it to write detailed tests for all of the functions it writes. If it doesn't have a test for {edge-case}, I ask it to rewrite the code to ensure that {edge-case} works and is tested.
Once I trust the tests, I generally trust the code.
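A made-up illustration of what I mean: if it wrote a chunk() helper like the one below, these are the kinds of edge-case tests I'd want to see before trusting it.

    import pytest

    def chunk(items, size):
        """Split items into lists of at most `size` elements."""
        if size <= 0:
            raise ValueError("size must be positive")
        return [items[i:i + size] for i in range(0, len(items), size)]

    def test_empty_input():
        assert chunk([], 3) == []

    def test_uneven_final_chunk():
        assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]

    def test_invalid_size_rejected():
        with pytest.raises(ValueError):
            chunk([1, 2, 3], 0)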
I've seen Copilot generate code I read and thought was correct, that went through code review and everyone thought was correct, that had tests written for it (that nearly covered everything), and that even when it failed, was hard to spot the issue.
It turned out it got a condition the wrong way around, but given the nesting of conditionals it wasn't obvious.
I don't think a human who was thinking through the problem would have made the same mistake at the point of writing. In fact, I think the mind state while actually writing the code is hard to reproduce at any later time, which is why code review isn't great at catching bugs like this.
This seems like the lowest number that would be useful. Below that it's not really a problem to debug, but at that point there's typically enough complexity that some help would be useful as you forget edge cases and features in the codebase.
For demonstration purposes doing it with 100 lines might be ok, but for professional use it kinda needs to understand quite a lot! Like a minimum of that order of magnitude, but potentially millions of lines.
FWIW, I've never used Spring. My experience is mostly Django, iOS, non-Spring Java, and some Android.
Yup, if it's >10k lines, MUST be a Spring application. Unfortunate they didn't write it in Rust that promises 100% correct programs (within Rust-accepted definition of "Correct" and "bug-free") solving any problem but always under 10k lines, that's the Rust guarantee.
I never considered prompting it to write code to fit a machine learning model. This could be a tremendous time and effort saver in data science and research that requires statistical analysis. Until the last week or so, I've treated all this AI text and code generation as basically a toy, but I am starting to feel like it might become an important tool in industry in the next couple of years.
> (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI;
Kind of ironic given that OpenAI builds and trains all of their models on stuff they "found" in the open.
Either everything is fair game for training, or nothing at all is.
If I were a judge ruling on this matter, I would absolutely rule that bootstrapping a model from OpenAI outputs is no different than OpenAI collecting training data from artists and writers around the web. Learning is learning.
Might be worth trying to use the outputs to bootstrap. What are they going to do about it? Better to ask forgiveness until the law is settled.
I am talking about more mundane stuff like training a fraud classifier, time series forecasting, imputing missing values, etc. There are so many examples of this on Github and elsewhere that I am sure any of these models has memorized the routine many times over.
I think that’s probably their intent, and that OpenAI wouldn’t sue you for it, but it doesn’t pass the “bought by Oracle” test: if Oracle bought OpenAI, then they might sue you for it.
This was the first thing I asked... It's an obvious step to self-improving. It will tell you that it can't reprogram itself, but when pushed, it'll admit that it could tell humans how to write one which can. Obviously this particular one can't because it's too limited, but the next one? Or the one after that? Singularity went from 'hard SF' to 'next couple decades' overnight.
> It will tell you that it can't reprogram itself, but when pushed, it'll admit that it could tell humans how to write one which can.
I love these sorts of loopholes. OpenAI is actively trying to curb the potential of their AI. They know how powerful it is. Being able to see a taste of that power is endlessly exciting.
I use it daily in UI development for boiler-plate code.
Though you need to be extra careful and read it twice, cus bugs sneak in quite easily.
I believe it's harder to remember 100x commands than to start an implementation of gradient descent and have the AI write the rest for you.
Training a machine learning model is not particularly special from a programming perspective. The code is not usually that complicated. Write tests when you can, manually validate when you can't.
Also, there are specific techniques for validating that your model training procedure is directionally correct, such as generating a simulated data set and training your model on that.
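A small sketch of that simulated-data check (made-up numbers, plain numpy rather than any particular framework): generate data from known parameters, train, and confirm training lands near them.

    import numpy as np

    rng = np.random.default_rng(0)
    true_w, true_b = 3.0, -1.0
    X = rng.normal(size=(1000, 1))
    y = true_w * X[:, 0] + true_b + rng.normal(scale=0.1, size=1000)

    # Fit by plain gradient descent on mean squared error.
    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        err = w * X[:, 0] + b - y
        w -= lr * np.mean(err * X[:, 0])
        b -= lr * np.mean(err)

    # A directionally correct training procedure should recover the parameters.
    assert abs(w - true_w) < 0.1 and abs(b - true_b) < 0.1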
Same as you would with your own code. You review it, ask GPT to write tests, and then tweak it.
The difference is that now, you are more of a code reviewer and editor. You don't have to sit there and figure out the library interface and type out every single line.
Tests can prove the presence of bugs, not their absence. '100% code coverage' is only 100% in the code dimension, while it's usually almost no coverage in the data dimension. Generative testing can randomly probe the data dimension, hoping to find some bugs there. But 100% code and data coverage is unrealistic.
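For example, property-based testing with the hypothesis library randomly probes the data dimension instead of relying on a few hand-picked cases (the run-length encoder here is just a toy example):

    from hypothesis import given, strategies as st

    def run_length_encode(s: str):
        """Toy example: encode 'aaab' as [('a', 3), ('b', 1)]."""
        out = []
        for ch in s:
            if out and out[-1][0] == ch:
                out[-1] = (ch, out[-1][1] + 1)
            else:
                out.append((ch, 1))
        return out

    def run_length_decode(pairs):
        return "".join(ch * n for ch, n in pairs)

    @given(st.text())
    def test_roundtrip(s):
        # The property: decoding any encoding gives back the original string.
        assert run_length_decode(run_length_encode(s)) == s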
- Implement a simple ray tracer in C++ using opengl. Provide compilation instructions for macos.
- Create a two layer fully connected neural network with a softmax activation function. Use pytorch. (A rough sketch of this one follows the list.)
- Implement the wire protocol described below in Go. The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The handshake starts with character nineteen (decimal) followed by the string 'BitTorrent protocol'. The leading character is a length prefix, put there in the hope that other new protocols may do the same and thus be trivially distinguishable from each other.
- We are trying to classify the expense account of credit card transactions. Each transaction has an ID, a date, a merchant, a description, and an amount. Use a pytorch logistic regression to classify the transactions based on test data. Save the result to a CSV file.
- We are configuring settings for a product. We support three products: slow, medium, and fast. For each product, we support a large number of machines. For each machine, we need to configure performance limits and a mode. The performance limits include iops and throughput. The mode can be simplex or duplex. Write a protocol buffer for the configuration. Use an enum for the mode.
- How were George Orwell's works received during his lifetime?
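For the two layer network prompt, a minimal PyTorch version looks roughly like the sketch below. The sizes here (784 inputs, 128 hidden units, 10 classes) are made up, and in practice you'd usually return raw logits and let nn.CrossEntropyLoss apply the softmax, but the prompt asked for softmax explicitly.

    import torch
    import torch.nn as nn

    class TwoLayerNet(nn.Module):
        def __init__(self, in_features=784, hidden=128, num_classes=10):
            super().__init__()
            self.fc1 = nn.Linear(in_features, hidden)
            self.fc2 = nn.Linear(hidden, num_classes)

        def forward(self, x):
            x = torch.relu(self.fc1(x))
            # Softmax over the class dimension, as the prompt asked for.
            return torch.softmax(self.fc2(x), dim=1)

    model = TwoLayerNet()
    probs = model(torch.randn(32, 784))  # batch of 32 -> (32, 10) probabilities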
And sometimes you get different results for the same prompts, so it's worth trying again if it doesn't work the first time.
I asked for jokes this morning and initially it made excuses and wouldn't give me jokes until I tweaked the prompt.
Later I refreshed the chat and pasted in the original prompt and got jokes right away, with no excuses.
(I was asking for jokes on the topic of the Elon Musk Twitter acquisition. My personal favorite: "With Elon Musk in charge, Twitter is sure to become the most innovative and futuristic social media platform around.")
Why aren't they on this? They should be at the forefront. I'm sure in some corner of Google they have a plan... but that plan hasn't penetrated my sphere of awareness yet.
OpenAI has been making the most noise online because of how open they've made their recent chatbot, but Google has been on this for a while. Earlier this year they had a blog post [1] about LaMDA which doesn't seem too far off in capability from OpenAI's projects. They've also made a lot of other strides in their research [2] that kind of goes under the radar because they haven't been synthesized into products yet (at least not in the ways we'd expect them to).
It's probably because they don't have the compute resources for this yet. I guess it would require a huge investment in hardware to release this to the masses.
chatgpt solved a little accounting data problem for me.
A relative wants to automatically classify credit card expenses and import them into quickbooks. Should be super easy, you would think.
chatgpt printed out the exact steps to get the transaction data into excel or quickbooks. It told me how to use transaction rules in quickbooks to classify the expenses.
It then gave me several possible statistical algorithms to perform the classification. And it gave me the python code to implement a logistic regression given the credit card transaction fields. Then it gave me the steps to import the final CSV into quickbooks.
All in less than 5 minutes. You could find out all of these steps on Google. But chatgpt synthesized all the steps into code given a natural language description of the problem. It could adapt and adjust its output from the previous prompt to compare approaches. That's a lot more context than google gives you. Pretty impressive I'd say.
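Not the code it gave me verbatim, but a toy sketch of the shape of the solution (column names, file names, and the scikit-learn choice here are all made up for illustration):

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Labeled history: merchant, description, amount, account (the expense class).
    labeled = pd.read_csv("labeled_transactions.csv")
    new = pd.read_csv("new_transactions.csv")

    # Classify on the merchant + description text.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(labeled["merchant"] + " " + labeled["description"], labeled["account"])

    new["account"] = model.predict(new["merchant"] + " " + new["description"])
    new.to_csv("classified_for_quickbooks.csv", index=False)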
Before: a few really knowledgeable/good ones and a lot of OK ones
A few years from now: a few really good ones
What does this mean for the labor economics? Do we reap the results of increased productivity? Or are they captured by a small set of winners in a pareto distribution?
It's not _that_ complicated these days: structs, classes, strings, vectors, hash maps, unique_ptr, references, non-owning pointers where it makes sense, basic templates. Boom. You also need basics like ownership, order of construction/destruction, value categories, move semantics, special member functions, RAII, etc. All the fancy perfect forwarding and template magic are wrapped up in libraries most don't need to think too hard about. Most people rarely write code that actually manages resource lifecycle directly. It's all wrapped up in RAII.
If you read the whole spec, it looks super complicated. But getting to a mental model that works in 99% of cases is not _that_ hard.