
I always find evals of this flavor off-putting, given that 3.5 and 4 likely share preference models (or at least feedback data).


Actually, the only numbers every LLM developer should know are their accelerator specs. For example:

A100 specs:

- 312e12 BF16 FLOPS

- 1555e9 B/s HBM bandwidth (~1555 GB/s)

H100:

- 1000e12/2000e12 BF16/INT8 FLOPS

(apply a ~0.7 FLOPS efficiency multiplier, because H100s power-throttle very quickly)

- 3000 GB/s HBM bandwidth

---

For a 13B model on an A100, this nets:

13e9 * 2 bytes per param = 26 GB HBM required (at bf16)

26e9/1555e9 = 17ms / token small-batch latency (~60 tokens / second)

What about large batches?

compute latency for a batch of size B is 13e9 params * 2 FLOPs per param * B / 312e12 FLOPS

We want B such that we're just about no longer HBM bound: 26e9/312e12 * B = 17ms

<=> B = 17e-3 / (26e9/312e12)

giving a batch size of 204.

At that batch size (and all larger batch sizes), the A100 delivers a throughput of B / 17ms = 12000 tokens / second.

---

KV caching, multi-GPU and multi-node comms, and matmul efficiencies are left as an exercise to the reader :)
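
For anyone who wants to poke at the numbers, here's a rough Python sketch of the arithmetic above, using only the specs and the 2-bytes / 2-FLOPs-per-param assumptions from this comment (it prints ~201 instead of 204 because it doesn't round 16.7 ms up to 17 ms):

    # Napkin math: 13B-param bf16 model on a single A100-40GB, rough peak specs.
    PARAMS = 13e9
    BYTES_PER_PARAM = 2          # bf16
    FLOPS_PER_PARAM = 2          # one multiply-add per param per token
    A100_FLOPS = 312e12          # dense BF16
    A100_HBM_BPS = 1555e9        # bytes/s (~1555 GB/s)

    weights = PARAMS * BYTES_PER_PARAM                 # 26 GB of HBM just for weights
    mem_latency = weights / A100_HBM_BPS               # ~17 ms/token while bandwidth-bound
    flop_time = PARAMS * FLOPS_PER_PARAM / A100_FLOPS  # compute time per token per batch element
    crit_batch = mem_latency / flop_time               # batch where compute catches up with HBM
    throughput = crit_batch / mem_latency              # tokens/s at and above that batch size

    print(f"{weights / 1e9:.0f} GB, {mem_latency * 1e3:.1f} ms/token, "
          f"critical batch ~{crit_batch:.0f}, {throughput:.0f} tokens/s")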


I’d suggest revisiting voice recognition; it works quite well for me in the same use case.

I also like to take walks, sometimes listening to podcasts. The stock iOS voice recognition (the microphone button on the keyboard, not Siri) is quite good; I usually just talk into the phone without looking at the output. After the walk, I format and clean up the notes to fix any errors.


My experience of voice to text is that it's painfully nondeterministic; amazing when it works, which is... just short of enough for me to trust it.


That's fair - I live in a bustling city and get self-conscious about people being able to hear me text and such. I know it's really effective these days, though.


I would have been interested to hear more about the verification techniques and tools they used for this project.


Check out https://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-... ("How Amazon Web Services Uses Formal Methods") and https://d1.awsstatic.com/Security/pdfs/One_Click_Formal_Meth... ("One-Click Formal Methods") for more info.


The folks involved gave a neat talk about the verification techniques and tools they used as part of AWS Pi Week recently: https://www.twitch.tv/videos/951537246?t=1h10m10s


For some large n, integers in the algorithm may be so large that operations on them cease to be constant time.


That is obvious, but how does it change the big O? (Why would any manual implementation have better big-O behavior than the existing arithmetic implementation in CPython?)


I think they meant that a naive implementation of an algorithm may not actually be bounded by the big O the programmer originally wanted, because the code calls other functions whose cost is "hidden" from the programmer.
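
A toy way to see that "hidden" cost in CPython (just an observation, not a precise complexity claim): the time for a single big-int multiplication grows with the operand size, so counting a multiplication as O(1) stops being valid for very large n.

    import time

    # Time 100 squarings of an n-bit integer; the per-multiplication cost
    # clearly grows with n, so big-int arithmetic is not a constant-time step.
    for bits in (10_000, 100_000, 1_000_000):
        x = (1 << bits) - 1
        start = time.perf_counter()
        for _ in range(100):
            _ = x * x
        print(f"{bits:>9} bits: {time.perf_counter() - start:.3f} s")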


This can be done by 'unparsing' the Python AST. We do this in the DaCe project: https://github.com/spcl/dace/blob/900cac070c4b083653c672e225...
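
As a minimal stdlib illustration of the same idea (this uses ast.unparse from Python 3.9+, not DaCe's own unparser):

    import ast

    tree = ast.parse("x = a + b * 2")
    # Toy transformation: rename every name 'a' to 'y', then turn the AST back into source.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == "a":
            node.id = "y"
    print(ast.unparse(tree))  # prints: x = y + b * 2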


Loading books with a program like calibre [1] lets you convert EPUB to MOBI (the Kindle format) seamlessly before transferring. In my experience this works perfectly; a command-line sketch follows below the link.

[1] https://calibre-ebook.com/
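
If you'd rather script the conversion than use the GUI, calibre also ships a command-line converter, ebook-convert; a minimal Python sketch (the filenames are placeholders, and calibre must already be installed):

    import subprocess

    # ebook-convert is the CLI converter bundled with calibre.
    subprocess.run(["ebook-convert", "mybook.epub", "mybook.mobi"], check=True)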


It should be noted that the graph-embedding tasks described are only a small subset of the tasks that GNNs solve. Many (if not most) graph learning techniques focus on more "local" tasks like node classification or edge prediction.


His prerequisite first-year course [0] actually matches your description better, and even has students designing their own CPUs on FPGAs. I feel like this course goes more in the direction of his own research interests, and already assumes the basic "breadth" you speak of.

[0] https://safari.ethz.ch/digitaltechnik/spring2019/doku.php?id...


Many of the CompArch topics are covered in his first-year course: https://safari.ethz.ch/digitaltechnik/spring2019/doku.php?id...


