Hacker News
Clay: A new language for generic programming (based on LLVM) (tachyon.in)
69 points by zaph0d on July 26, 2010 | 42 comments



Hi folks, I'm the author of Clay. There aren't any docs ready yet, but I'll try to answer whatever questions I can. There's also a long thread over at Reddit with many interesting questions.

http://www.reddit.com/r/programming/comments/ctmxx/the_clay_...


I couldn't help imagining how OO "objects" might be implemented in Clay.

Your dispatch code now:

    record Square {
        side : Double;
    }

    record Circle {
        radius : Double;
    }

    variant Shape = Square | Circle;

    procedure show;
    overload show(x:Square) { println("Square(", x.side, ")"); }
    overload show(x:Circle) { println("Circle(", x.radius, ")"); }

My "Objectivizing":

    interface Shape {
        abstract procedure show(self);
    }

    record Square : Shape {
        side : Double;
        show(self) { println("Square(", self.side, ")"); }
    }

    record Circle : Shape {
        radius : Double;
        show(self) { println("Circle(", self.radius, ")"); }
    }
    variant Shape = Square | Circle;

Here, the "abstract" declaration in the parent would automatically make each child's "show" an overload, and 'self' is a variable that would automatically be specialized to the containing record. I think the resulting code isn't excessively verbose.
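For readers more at home in Python, here is a rough analogy of the two styles being compared. It is a sketch, not Clay semantics: the `...Obj` class names are made up so both styles fit in one file, and `functools.singledispatch` picks an overload from the runtime class of the first argument only, which happens to be enough for a one-argument `show`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from functools import singledispatch


# Style 1: plain records plus overloads defined outside them.
@dataclass
class Square:
    side: float


@dataclass
class Circle:
    radius: float


@singledispatch
def show(shape) -> str:
    # Fallback when no overload matches the runtime type of `shape`.
    raise TypeError(f"no show() overload for {type(shape).__name__}")


@show.register
def _(x: Square) -> str:
    return f"Square({x.side})"


@show.register
def _(x: Circle) -> str:
    return f"Circle({x.radius})"


# Style 2: an abstract method in the parent that every child must implement.
class ShapeObj(ABC):
    @abstractmethod
    def show(self) -> str: ...


@dataclass
class SquareObj(ShapeObj):
    side: float

    def show(self) -> str:
        return f"Square({self.side})"


@dataclass
class CircleObj(ShapeObj):
    radius: float

    def show(self) -> str:
        return f"Circle({self.radius})"
```

The trade-off mirrors the proposal: style 1 keeps records as plain data and lets new operations be added anywhere, while style 2 ties each operation to the record and lets the base class enforce that every child provides it.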


At first sight, this looks like a major improvement over C++, and even more powerful than D with fewer ad-hoc features.

Interesting language features, as far as I can tell:

* Type inference

* First-class functions/closures (interesting compared to C++)

* Overloading is resolved by compile-time evaluation: arbitrary values can be used as type parameters, and overloads can be selected by arbitrary conditions, similar to predicate dispatch. Runtime dispatch is also possible.

* Most of the functionality of the language is implemented in libraries, with a small kernel of built-ins.

Library functionality:

* the usual standard library: arrays, vectors, maps, algorithms, sequence abstractions, etc.

* tuples, unions, variants

* lazy sequences (streams)

* reference-counted shared pointers (like boost::shared_ptr)

* SIMD intrinsics and vectors using SIMD operations

* bindings to the C standard library, the Unix API and the Win32 API

* green threads, channels

Other:

* C binding generator based on Clang
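Predicate dispatch, mentioned in the feature list above, can be sketched in a few lines of Python. This is a runtime-only illustration under simplifying assumptions: `PredicateDispatch` and `describe` are hypothetical names, overloads are tried in registration order, and nothing is resolved at compile time, whereas Clay is described as evaluating these conditions during compilation and may use a different ordering rule.

```python
class PredicateDispatch:
    """Hypothetical helper: each overload is guarded by a predicate over the
    arguments, and the first overload (in registration order) whose
    predicate returns True is called."""

    def __init__(self, name):
        self.name = name
        self.cases = []  # list of (predicate, implementation) pairs

    def overload(self, predicate):
        def register(fn):
            self.cases.append((predicate, fn))
            return fn
        return register

    def __call__(self, *args):
        for predicate, fn in self.cases:
            if predicate(*args):
                return fn(*args)
        raise TypeError(f"no overload of {self.name} matches {args!r}")


describe = PredicateDispatch("describe")


# The more specific condition is registered first, so it wins.
@describe.overload(lambda x: isinstance(x, int) and x < 0)
def _(x):
    return "negative int"


@describe.overload(lambda x: isinstance(x, int))
def _(x):
    return "non-negative int"


@describe.overload(lambda x: isinstance(x, str))
def _(x):
    return "string"
```

Unlike type-based overloading, the guard can inspect values (here, the sign of `x`), which is what "dispatch using arbitrary conditions" means.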


I like the multiple dispatch feature (http://bitbucket.org/kssreeram/clay/src/b1df8340f2b4/test/va...).

It reminds me of Clojure multimethods (http://clojure.org/multimethods). Can dispatch conditions in Clay be "arbitrary"? In the example the dispatch is on the value of 'x'; can it be as complex as in Clojure multimethods?
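For comparison, Clojure's multimethod model can be sketched in Python: a dispatch function computes a value from the arguments, and the implementation registered for that value runs. This illustrates Clojure's design, not Clay's; `multimethod`, `area` and the dict-based shapes are hypothetical, and the sketch leaves out Clojure's `:default` method and `isa?`-based hierarchy matching.

```python
def multimethod(dispatch_fn):
    """Rough analogue of Clojure's defmulti: dispatch_fn computes a dispatch
    value from the arguments, and the implementation registered for that
    value is called."""
    methods = {}

    def call(*args):
        key = dispatch_fn(*args)
        if key not in methods:
            raise LookupError(f"no method for dispatch value {key!r}")
        return methods[key](*args)

    def method(key):
        # Rough analogue of defmethod: register one implementation per value.
        def register(fn):
            methods[key] = fn
            return fn
        return register

    call.method = method
    return call


# Any function of the arguments can serve as the dispatch function.
area = multimethod(lambda shape: shape["kind"])


@area.method("square")
def _(shape):
    return shape["side"] ** 2


@area.method("rect")
def _(shape):
    return shape["width"] * shape["height"]
```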


It looks pretty nice. I'm curious about how the code generation works and whether Clay has the potential to produce the kind of code bloat you sometimes see with C++ templates. Say I have a function

  min(a, b) {
      if(a < b)
          return a;
      return b;
  }

From what I understand, this function can accept two Int32s, or a Uint8 and a Float32, or basically any other combination of numeric types. Is a new function generated for each type combination? I suspect this is unavoidable to some degree, but I'm curious whether you've been able to do anything to ease the pain.
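The question can be made concrete with a toy model. Assuming Clay, like C++ templates, emits one specialization per distinct tuple of argument types, the hypothetical `specializing` decorator below just counts how many specializations a set of calls would require; Python itself generates no specialized code.

```python
def specializing(fn):
    """Toy model of template-style specialization: records one 'instance'
    of fn per distinct tuple of argument types seen at call sites."""
    instances = {}

    def wrapper(*args):
        key = tuple(type(a) for a in args)
        if key not in instances:
            # A compiler would emit new machine code here; this model only
            # notes that a new specialization would be needed.
            instances[key] = fn
        return instances[key](*args)

    wrapper.instances = instances
    return wrapper


@specializing
def tracked_min(a, b):
    if a < b:
        return a
    return b


tracked_min(1, 2)    # (int, int): first specialization
tracked_min(1, 2.5)  # (int, float): second
tracked_min(2.5, 1)  # (float, int): third
tracked_min(3, 4)    # (int, int) again: reuses the first
```

In this model, with k numeric types and two independently varying arguments, the number of specializations grows as k squared.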


You can control type specialization to a certain extent with overloading. For instance, if a procedure takes two arguments whose types can vary independently, you can use overloading to ensure that both arguments are converted to a common type.

    procedure foo;
    
    [T1,T2]
    overload foo(a:T1, b:T2) {
        // by default, convert both arguments
        // to the type of first argument.
        foo(T1(a), T1(b));
    }
    
    // the following overload specializes for the case
    // when both arguments are of the same type.
    [T]
    overload foo(a:T, b:T) {
        // ... implement the logic here ...
    }
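In the same toy spirit, here is a Python sketch of the conversion trick. Only the same-type body is "specialized" (recorded in `specialized_for`), while mixed-type calls go through a thin wrapper that converts both arguments to the first argument's type and re-dispatches. The names are hypothetical, and a minimum stands in for the elided logic.

```python
specialized_for = set()  # types for which the "real" body was instantiated


def foo_same_type(a, b):
    # Stand-in for the [T] overload foo(a:T, b:T): the body with the real
    # logic, which we want instantiated only once per type.
    specialized_for.add(type(a))
    return a if a < b else b


def foo(a, b):
    if type(a) is type(b):
        return foo_same_type(a, b)
    # Stand-in for the [T1,T2] overload: convert both arguments to the
    # type of the first argument, then dispatch again.
    T1 = type(a)
    return foo(T1(a), T1(b))
```

The design choice has a cost worth noting: converting to the first argument's type can lose information. Here `foo(3, 2.9)` returns `2`, because `int(2.9)` truncates.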


As someone who likes high-level languages (mostly Lisp and Ruby, and learning Haskell for the second time), I am surprised how much I like the language: list comprehensions, type inference for concise code, etc. There is about zero chance of my using this new language, however, since my language selection is about 90% driven by what customers want.


Hello kssreeram, a question: your company web page and the Quillpad product look interesting. Do you use Clay for Quillpad development and deployment?


We are currently using Clay for a few projects within Tachyon, for performance-sensitive code. It's simply being used as a better C, and since Clay can generate lightweight, C-compatible DLLs, it fits in very well.

Quillpad itself predates Clay and hence doesn't use it.


It looks like the syntax could use some polishing. Three obvious improvements: use line breaks instead of semicolons (with a backslash at the end of a line for multi-line statements); do not require parentheses around the conditions of if, while, for, etc.; and (slightly more controversially) use significant indentation instead of curly braces. I think each of these changes would make the language indisputably more concise and readable (except the last one, which has been debated, but I think it’s definitely an improvement).

An example from algorithms/introsort of how the syntax would look:

  overload introSort(first, last)
      if first != last
          introSortLoop(first, last, log2(last-first)*2)
          finalInsertionSort(first, last)

instead of:

  overload introSort(first, last) {
      if (first != last) {
          introSortLoop(first, last, log2(last-first)*2);
          finalInsertionSort(first, last);
      }
  }


This is more than a little vacuous.


Not really. Python is famous for its readability, and this interest in reducing "noise" is one of the reasons why. It's definitely something that a language could benefit from at least riffing off, if not stealing outright.

The Python philosophy is that everything is expressed clearly and concisely. Parens around conditionals are just restating information that is already apparent, as are semicolons between lines. Fluent readers actually learn to ignore these syntactic features unless they're debugging (since these useless tokens are a breeding ground for bugs).

Ask yourself: How often do you mentally match the opening and closing parens on a conditional? How often do you rely on semicolons to tell when you're looking at a new statement as opposed to just looking at the lines of code?


Yeah, listen. I'm a Python programmer as a matter of profession and paycheck.

It's a matter of taste.

Do I prefer significant whitespace à la Python/Haskell?

Yes.

Does it matter? No.

Does it matter when you're discussing a programming language you've just encountered for the first time, one that is rather new and has many novel things to contribute to the world?

Fuck no.

Like I said, it was a vacuous thing to say. There are far more important questions to ask, like:

"Are the generics a space-time trade-off similar to C++ templates?"

"Can I use the type system to encapsulate and restrict behavior in powerful ways, allowing me to create performant but safe code?"

"Can I make a Beowulf cluster out of this?"

Any of those questions have more substance than, "hurr whitespace is bettar why didn't you do tghaaaasdfsdgsg"

Christ-sakes.


Your entire response focuses on significant whitespace, which I didn't mention at all in mine.

And no, the time-space tradeoff of the generics is not necessarily a better question than how readable the language is. That is a matter of taste. I will spend more time reading the code than I will worrying about the performance characteristics of generics or creating a Beowulf cluster, so caring about the common case is not exactly vacuous, even if it isn't the #1 most important thing.


>Your entire response focuses on significant whitespace, which I didn't mention at all in mine.

Doesn't matter.

>Beowulf cluster

It was a fatuous comment designed to compare with the original fatuous comment.

>the time-space tradeoff of the generics is not necessarily a better question than how readable the language is. That is a matter of taste

No it's not. Taste is preference, whether or not a language is impossible to deploy with generics in an embedded environment has absolutely nothing to do with whining about syntax.

    importance:
        Semantics > syntax

I don't think I've seen sophistry and an obsession with the trivial on hackerne.ws like this in quite some time.

You're complaining about the color of the bikeshed when real discussion and work is to be done concerning the semantics and structure of the language?


I think you're unnecessarily trivializing things that you personally care less about. Who cares about embedded environments? People who work in them. Is that group dwarfed by the group of "programmers who have to read and debug code"? Yes it is. So it seems really petty to personally insult me for caring (not much, but a little) about something that makes code less bug-prone and easier to read for everybody while you hold up something that affects a vanishingly small number of people as what we should be talking about.

Your complaints seem to reflect an idea that only features and implementations matter, while user interface is fairly trivial. If you believe that, I think you might be interested in this: http://www.alistapart.com/articles/indefenseofeyecandy/


You linked to a web design website in defense of trivial eye candy in a discussion about computer science.

I'll just leave it at that.


Web design is the art of creating user interfaces for websites – the website visitors are the users. Language design is the art of creating user interfaces for programs – programmers are the users. The fields are related by the common theme of designing things for users. And I don’t see how you can call eye candy trivial without defending that assertion right after reading an article that argues it is not trivial.


Please don’t resort to blatant ad-hominem misquoting. My remarks were not as stupid as “hurr” and “tghaaaasdfsdgsg”, they were statements that I put much thought behind. And rather than saying my thoughts were vacuous (mindless, meaningless), since you admit you prefer significant whitespace too, the worst you could call them is irrelevant.

As for why I decided to bring the syntax up when semantics are more important, it seems to me that if a language designer hasn’t thought about the syntax of a language and has just copied verbatim the syntax of Java/C, then they show a lack of attention to detail, and there is a greater probability that they will continue to show this lack of attention to detail when designing the rest of the language. Yes, I could read the whole semantics section of the language spec and evaluate it directly, but that takes more time than some people have when faced with yet another unproven language to consider learning or watching. I’m sure more people than me use syntax as a warning sign, so the syntax of a language is important if the designer doesn’t want to scare these people off.


I find it interesting that goto is in the examples. I'm not deeply familiar with program theory and computability, but I thought goto made the program impossible both to reason about and to verify its correctness. Why would one include it in a new programming language? Are there really valid uses of it? I don't think the factorial example here is a good use - he basically used it as a while(true) infinite loop. However, I also couldn't tell you with accuracy why it might be bad here other than the ingrained mentality of goto=evil.


There are two trains of thought on goto. On one side, Donald Knuth and Linus Torvalds consider goto useful in specialized situations. Knuth uses goto in certain operations where the construct is the optimal choice. Linus allows it when it is a good optimization strategy, but you must use it sensibly.

On the other hand, Edsger Dijkstra thought that having goto in the language complicated the analysis of loops and of the program's flow. Dijkstra did a great deal of research on structured programming, which nearly all programming languages support (loops, if-then statements, etc.). Citations: http://en.wikipedia.org/wiki/Goto#cite_note-4

http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pd...


I implemented goto because I've seen it be useful in other languages. That said, there's no other use of goto in Clay.

http://stackoverflow.com/questions/24451/goto-usage


Goto isn't inherently evil. It allows bad (and newbie) programmers to make spaghetti code. It is taught as 'evil' to prevent them from using it when it would produce ugly code.

Later, once you understand things, there are a few very rare times that it is legit to use it. Those few times can be handled without using it, though, and many programmers continue to avoid it out of habit or philosophy.

As for including it in his new language, it looks like it's basically C with some syntactic sugar. He didn't so much 'include' it as fail to remove it: he got it for free from C.


goto can be useful for "alarm exits", i.e. jumping to shared cleanup code on an error path. The Linux kernel uses them a lot. See the bottom of this thread for some pseudocode: http://kerneltrap.org/node/553/2131
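Python has no goto, so the kernel pattern can't be transcribed directly. As a swapped-in technique, the sketch below uses `contextlib.ExitStack` to get the same control-flow shape: resources are acquired in order, and on failure only the ones already acquired are released, in reverse order, which is the job a ladder of cleanup labels at the bottom of a C function does. `acquire_all` and its `(name, should_fail)` steps are hypothetical stand-ins for real allocations.

```python
from contextlib import ExitStack


def acquire_all(steps, log):
    """Hypothetical setup routine. Each step is (name, should_fail); a
    failing step stands in for an allocation that returns an error."""
    with ExitStack() as stack:
        for name, should_fail in steps:
            if should_fail:
                # Raising here unwinds only what has been acquired so far,
                # like a goto into the middle of a cleanup ladder.
                raise RuntimeError(f"failed to acquire {name}")
            log.append(f"acquire {name}")
            # Register the matching release; ExitStack runs these in LIFO order.
            stack.callback(log.append, f"release {name}")
        # Success: hand the registered releases to the caller instead of
        # running them now.
        return stack.pop_all()
```

On success the caller owns the returned stack and calls `close()` when done; on failure the exception propagates after the partial cleanup has run.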


Sure, but in the factorial example, Clay uses goto to express iteration: http://bitbucket.org/kssreeram/clay/src/b1df8340f2b4/test/fa...


The Linux kernel uses goto all over the place; if you know what you're doing, it doesn't matter.


Looks like a very interesting language to me. Superficially, it looks a lot like RPython, with many more options to restrict and specify types on top of it. I'm very eager to see more documentation, specifically on types and memory management.

What is the syntax for pointers, for example?


Hi. Pointers to type T have the type Pointer[T]. The '&' operator gets the address of an lvalue, and the '^' operator dereferences. '^' is a better choice than C's '*' for dereferencing because I can conveniently use the same operator for dereferenced field access too, whereas C had to invent another operator, "->", for that.

    record Point[T] {
        x : T;
        y : T;
    }
    
    updateViaPointer(ptr) {
        ptr^.x += 1;
        ptr^.y += 2;
    }
    
    test() {
        var p = Point(10, 20);  // type will be inferred as Point[Int]
        updateViaPointer(&p);
    }
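Python has no pointer syntax, but its standard `ctypes` module can mimic the example above. In this sketch, which assumes a C `int` for Clay's `Int`, `ptr.contents` plays the role of Clay's `ptr^` (C's `*ptr`): it dereferences, and ordinary `.x` field access follows, so no separate `->` operator is needed.

```python
import ctypes


class Point(ctypes.Structure):
    # A C-layout struct with two int fields, standing in for Point[Int].
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


def update_via_pointer(ptr):
    # ptr.contents dereferences the pointer; the resulting object shares
    # memory with the original struct, so field updates are visible in it.
    ptr.contents.x += 1
    ptr.contents.y += 2


p = Point(10, 20)
update_via_pointer(ctypes.pointer(p))  # ctypes.pointer(p) is roughly &p
```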


Sample code I'd like to see:

* How you'd implement a B-tree object

* How you'd implement the-equivalent-of-a-class (a list of related functions and structures akin to something you'd see in the GTK documentation).


That looks like a hodge-podge of C and Pascal to me.

Sorry about the tone of this comment but I fail to see anything that would make me go 'yes, let's try this'.

Generic programming is something you can do in any language, and most of the (successful) ones out there are created with that goal in mind.

Can someone more in the know enlighten me as to why 'clay' is special in this respect?


> Generic programming is something you can do in any language, ...

That's not true.

Generic Programming is about writing re-usable code that is also very efficient. On the whole, it requires the following:

- Static types

- Overloading

- Type-parametric functions (templates)

- Type specialization.

Not all languages have these features.

Generic programming first took off with C++, when it introduced templates. But a few languages before and after C++ have supported generic programming: Ada, Haskell, D, etc.

edit: formatting.


Generic programming is a way of writing code, not a thing that your language supports or you can't do it. I can write perfectly re-usable C code that is also very efficient by relying on the pre-processor to customise the code to the exact types and conditionals required for the situation at hand.

It's a bit like saying you can't write functional code unless you use a functional language.
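Here is a minimal sketch of the "code that writes code" approach, with Python standing in for the pre-processor or a generator script: one text template is instantiated once per C type. The template, `generate_min_functions` and the `min_<suffix>` naming scheme are illustrative assumptions. Each instantiation is a separate copy of the function text, which is the duplication concern raised in the reply.

```python
from string import Template

# Hypothetical template for a type-specialized C min function.
MIN_TEMPLATE = Template(
    "static inline $ctype min_$suffix($ctype a, $ctype b) {\n"
    "    return a < b ? a : b;\n"
    "}\n"
)


def generate_min_functions(types):
    """Emit one C function per (C type, name suffix) pair."""
    return "".join(
        MIN_TEMPLATE.substitute(ctype=ctype, suffix=suffix)
        for ctype, suffix in types
    )


c_source = generate_min_functions([("int", "int"), ("double", "double")])
```

Nothing checks the generated C until a C compiler sees it, so a bad template argument surfaces as a compile error in generated code rather than at the point of use.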


Firstly, using the preprocessor will mean there is code duplication, even if it is not necessary. Secondly, once you are using the preprocessor to do generic stuff, you have probably thrown type-safety away.

That's not to say you can't write C in a generic way, but "generic programming" means something specific to computer scientists and has certain prerequisites.

And you can't write functional code without a functional language, or without building a functional language on top of your language (which may not even be possible: you can do functional-ish stuff in C because of function pointers, but without them you'd be stuffed).

You can write functional code in a "non functional language", if by functional language you mean "Haskell, ML or Scala". But you need certain features like function pointers to do it, and in that sense you do need a functional language. Same for generic programming - you need certain features.


Function pointers are far from sufficient, because you can't define new functions at runtime: you can't nest functions or define anonymous ones. That means functions are not first-class citizens of your language, and some common functional programming techniques are impossible to use in a clear way.

For example, you can't do this:

    def make_adder(num_1):
        def adder(num_2):
            return num_1 + num_2
        return adder


Yes. Generally you then start cheating by passing around a function pointer and a void* that is the first argument of the function, i.e. doing closure conversion yourself. You can hide this with some macros and get close to having a functional language, but it's ugly and tedious (Cfront anyone?) Also, what I'm describing here is more similar to object orientation than functional programming. It's worth remembering the following koan though:

The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.

-- Anton van Straaten
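The hand-made closure conversion described in the reply before the koan (a function pointer plus a `void*` context passed as the first argument) can be sketched in Python by making the environment explicit. `adder_code`, `make_adder` and `call_closure` are hypothetical names; a dict stands in for the `void*` context, where a C version would need a struct and a cast.

```python
from typing import Any, Callable, Tuple

# A converted closure: a plain function that takes its captured environment
# explicitly, paired with that environment.
Closure = Tuple[Callable[[Any, Any], Any], Any]


def adder_code(env: dict, num_2: int) -> int:
    # Captured variables are read from env instead of an enclosing scope.
    return env["num_1"] + num_2


def make_adder(num_1: int) -> Closure:
    return (adder_code, {"num_1": num_1})


def call_closure(closure: Closure, arg: Any) -> Any:
    code, env = closure
    return code(env, arg)


add5 = make_adder(5)
```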


I downvoted you because I think you're flat-out wrong, both on the functional account and on the generic account. I'm curious, though: how would you do generic programming in C, a language in which you can't even define a generic list type without giving up static typing?


You do realise that the first version of the C++ compiler was a pre-processor for the C compiler? And that in the end all this stuff outputs assembly language?

As for functional programming, someone just released a Lisp for PHP. That doesn't mean it's the 'right' way to do stuff, but it can be done, and programming 'generically' was done for years before someone took the time to produce a language for it. I've written piles of code that writes programs (usually C) to avoid having to write the same kind of code with slight variations.

In the end all this stuff is Turing complete, so if you can do something in one language, by simple reasoning you can figure out that you can do that trick in any other by implementing a (subset of) the former.


Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy. -- Alan Perlis, Epigrams on Programming


Why does Generic Programming require static types?


The type-parametric functions are inherently a feature of a static type system, so I guess "by definition" would be the answer. At least that's what the Wikipedia entry would lead me to believe.

http://en.wikipedia.org/wiki/Generic_programming

See also: http://en.wikipedia.org/wiki/Type_polymorphism#Parametric_po...


From the OP's definition, it needs static types so it can be compiled efficiently. You can do it dynamically (e.g. Ruby) but you can't do it as fast in general.

Additionally, generic programming is a computer science concept, and computer scientists tend to prefer static typing for safety reasons. Static types are probably not strictly necessary if you ignore efficiency.


Dynamic typing (à la Scheme or Python) gives you generic programming for free, but it's less efficient.
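A short Python illustration of that trade-off: the function below is generic with no type parameters or specialization, because `<` is looked up at run time on whatever values arrive. The price is that every call pays for that dynamic lookup, and a type mismatch only shows up as an exception when the call executes.

```python
def generic_min(a, b):
    # Works for any pair of values that supports `<`; nothing is checked
    # until the comparison actually runs.
    return a if a < b else b
```

The same function accepts ints, floats, strings or tuples, while `generic_min(1, "a")` raises `TypeError` only at the moment it is called.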



