Hacker News | totalperspectiv's comments

The author works for Modular. He shared the write-up on the Mojo Discord; I think Mojo users were the intended audience.


Removing the line-wrapping convention from FASTA/FASTQ also dramatically improves parsing perf, since you don't have to do as much lookahead to find record ends.
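A minimal sketch of why this helps (illustrative Python, not any particular parser): with unwrapped records, every FASTA record is exactly a header/sequence line pair, so the parser never has to scan ahead for the next `>` header.

```python
# Hypothetical comparison: parsing unwrapped vs line-wrapped FASTA.

def parse_unwrapped(lines):
    """Each record is exactly two lines: '>' header, then sequence.
    No lookahead needed -- just consume lines in fixed pairs."""
    it = iter(lines)
    return [(h[1:].rstrip(), s.rstrip()) for h, s in zip(it, it)]

def parse_wrapped(lines):
    """A sequence may span many lines, so the parser must keep
    scanning until it sees the next '>' header (or EOF)."""
    records, header, chunks = [], None, []
    for line in lines:
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].rstrip(), []
        else:
            chunks.append(line.rstrip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

wrapped = [">seq1\n", "ACGT\n", "ACGT\n", ">seq2\n", "TTTT\n"]
unwrapped = [">seq1\n", "ACGTACGT\n", ">seq2\n", "TTTT\n"]
assert parse_wrapped(wrapped) == parse_unwrapped(unwrapped)
```

The unwrapped path is branch-light and trivially vectorizable (find newlines, pair them up), which is where the perf win comes from.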


Unfortunately, when you write a program that doesn't wrap output FASTAs, you have a bunch of people telling you off because SOME programs (cough bioperl cough) have hard limits on line length :)


Is BioPerl still standard, or did people move to BioPython?

When I was shown BioPerl I was tempted to write a better, C++ version, but was overwhelmed by other university stuff and let it go.


You can use content-defined chunking to wrap at a predictable place so that compression still works.
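A hedged sketch of the idea (the window, mask, and length bounds here are illustrative choices, not from any standard): break lines where a hash of the trailing few bases hits a boundary condition, so identical subsequences wrap at identical offsets and compressors still see long repeated runs.

```python
# Illustrative content-defined line wrapping. Wrap points depend only
# on local sequence content, so the same subsequence always wraps the
# same way. Parameter values are made up for the example.

def cdc_wrap(seq, min_len=32, max_len=120, window=8, mask=0x3F):
    lines, start = [], 0
    for i in range(len(seq)):
        length = i - start + 1
        # hash of the trailing `window` bases decides the break point
        h = hash(seq[max(0, i - window + 1):i + 1])
        if length >= max_len or (length >= min_len and (h & mask) == 0):
            lines.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        lines.append(seq[start:])
    return lines
```

Note that Python string hashing is randomized per process, so a real implementation would use a fixed rolling hash; the key properties (deterministic within a run, content-local break points, bounded line length) still hold in the sketch.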


Thanks for reminding me to benchmark this!


I've only tested this when writing my own parser where I could skip the record-end checks, so idk if this improves perf on an existing parser. Excited to see what you find!


> a testament to the massive gap in perceived vs actual programming ability of the average bioinformatician.

This is not really a fair statement. Literally all of software bears the weight of some early poor choice that then keeps moving forward on sheer momentum. FASTA and FASTQ formats are exceptionally dumb, though.


I have used Mojo quite a bit. It’s fantastic and lives up to every claim it makes. When the compiler becomes open source I fully expect it to really start taking off for data science.

Modular also has its paid platform for serving models called Max. I’ve not used that but heard good things.


I don’t follow your logic. Mojo can target multiple gpu vendors. What is the Modular specific lock in?


Not OP but I think this could be an instance of leaky abstraction at work. Most of the time you hand-write an accelerator kernel hoping to optimize for runtime performance. If the abstraction/compiler does not fully insulate you from micro-architectural details affecting performance in non-trivial ways (e.g. memory bank conflict as mentioned in the article) then you end up still having per-vendor implementations, or compile-time if-else blocks all over the place. This is less than ideal, but still arguably better than working with separate vendor APIs, or worse, completely separate toolchains.


Yes, it looks like they have some sort of metaprogramming setup (nicer than C++) for doing this: https://www.modular.com/mojo


I can confirm, it’s quite nice.


jw: why do you use mojo here over triton or the new pythonic cute/cutlass?


Because I was originally writing some very CPU intensive SIMD stuff, which Mojo is also fantastic for. Once I got that working and running nicely I decided to try getting the same algo running on GPU since, at the time, they had just open sourced the GPU parts of the stdlib. It was really easy to get going with.

I have not used Triton/Cute/Cutlass though, so I can't compare against anything other than Cuda really.


The blog post is about using an NVIDIA-specific tensor core API that they have built to get good performance.

Modular has been pushing the notion that they are building technology that allows writing HW-vendor neutral solutions so that users can break free of NVIDIA's hold on high performance kernels.

From their own writing:

> We want a unified, programmable system (one small binary!) that can scale across architectures from multiple vendors—while providing industry-leading performance on the most widely used GPUs (and CPUs).


They allow you to write a kernel for Nvidia, or AMD, that can take full advantage of the hardware of either one, then throw a compile-time if-statement in there to switch which kernel to use based on the hardware available.

So, you can support either vendor with as-good-as-vendor-library performance. That's not lock-in, to me at least.

It’s not as good as the compiler being able to just magically produce optimized kernels for arbitrary hardware though, fully agree there. But it’s a big step forward from Cuda/HIP.
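As a rough analogy for the pattern described above (a Python sketch; in Mojo the selection happens at compile time via its parameter system, so only one kernel lands in the binary; vendor names and kernel bodies here are hypothetical):

```python
# One public entry point, per-vendor kernels behind it. In a compiled
# language the branch would be resolved at compile time; plain dict
# dispatch stands in for that here.

def nvidia_saxpy(a, x, y):
    # stand-in for a kernel tuned for NVIDIA tensor-core hardware
    return [a * xi + yi for xi, yi in zip(x, y)]

def amd_saxpy(a, x, y):
    # stand-in for an AMD kernel with different tiling choices
    return [a * xi + yi for xi, yi in zip(x, y)]

KERNELS = {"nvidia": nvidia_saxpy, "amd": amd_saxpy}

def saxpy(vendor, a, x, y):
    """Single call site; the vendor-specific choice is hidden here."""
    return KERNELS[vendor](a, x, y)

assert saxpy("nvidia", 2.0, [1.0, 2.0], [3.0, 4.0]) == [5.0, 8.0]
```

The point is that callers see one API while each backend can still exploit vendor-specific features internally.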


I can't speak to Gleam, but for Elixir I just used Burrito to create a single executable: https://github.com/burrito-elixir/burrito I think it works for just Erlang too.


I haven't used it, but from the docs, I don't see why this wouldn't work for any language that compiles to BEAM files. You might need to adjust the build setup a bit.

Personally, I think I'd prefer something that worked without unpacking, but I don't actually need something like this, so my preferences aren't super important :D


I really wish Crystal had taken off a bit. I thought it had a chance in bfx with some good benchmarking and PR by lh3 in biofast.


I would rather write Groovy than YAML any day of the week.

Why did you rule out Nextflow or Snakemake? I believe they both work with k8s clusters.

Argo doesn’t look great from my standpoint as a workflow author.


Both workflow languages are better suited to building a singular reproducible workflow that can be published with an academic paper. For us, I'm looking for a workflow language that can treat the pipeline as a testable, deployable piece of software. I find that with Nextflow, scientists fall into bad patterns of mixing pipeline logic (e.g. if this sample type, then process it this way) with the bioinformatics model (e.g. use these bowtie2 parameters) throughout the pipeline, which makes it more difficult to maintain as our platform evolves. The K8s integration is lacking for both of them; they work much better on academic-style clusters.

YAML does leave a lot to be desired, but it also forces a degree of simplicity in architecting the pipeline, because to do otherwise is too cumbersome. I really liked WDL as a language when I used it; it seemed to strike a nice balance of readability and simplicity. I believe Dyno created a Python SDK for the Argo YAML syntax, and I need to look into that more.


NF Tower / Seqera would be the selling points. They offer a nice UX for managing pipelines and abstract over AWS.

Technically snakemake can do it all. But in practice NF seems to scale up a bit better.

That said, if you don’t need the UI for scientists, I’d stick to snakemake.


Cool seeing a workflow language pop up on HN!

Nextflow and Snakemake are the two most-used options in bioinformatics these days, with WDL trailing those two.

I really wish Nextflow was based on Scala and not Groovy, but so it goes.

There is a draft up for DSL3 that adds static types to channels, which I'm very excited about: https://github.com/nf-core/fetchngs/pull/309

