Clay: A new language for generic programming (based on LLVM)

kssreeram · on July 26, 2010

Hi folks, I'm the author of Clay. There aren't any docs ready yet, but I'll try to answer whatever questions I can. Also, there's a long thread over at reddit with many interesting questions.

http://www.reddit.com/r/programming/comments/ctmxx/the_clay_...

joe_the_user · on July 26, 2010

I couldn't help imagining how OO "objects" might be implemented in Clay

Your dispatch code now:

    record Square {
        side : Double;
    }

    record Circle {
        radius : Double;
    }

    variant Shape = Square | Circle;

    procedure show;
    overload show(x:Square) { println("Square(", x.side, ")"); }
    overload show(x:Circle) { println("Circle(", x.radius, ")"); }

My "Objectivizing":

    interface Shape {
        abstract procedure show(self);
    }

    record Square : Shape  {
        side : Double;
        show(self) { println("Square(", self.side, ")"); }
    }

    record Circle: Shape {
        radius : Double;
        show(self) { println("Circle(", self.radius, ")"); }
    }
    variant Shape = Square | Circle;

Here, the "abstract" declaration in the parent would make each child's "show" automatically become an overload. 'self' is a variable that would be automatically specialized to the containing record. I think the resulting code isn't excessively verbose.
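For readers who don't know Clay, here is a rough Python analogy of the `overload show(...)` dispatch above, using the standard library's `functools.singledispatch`. It is only a sketch: it dispatches at runtime on the first argument's type, and it says nothing about how Clay itself resolves overloads. The dataclasses and the string return values are assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Square:
    side: float


@dataclass
class Circle:
    radius: float


@singledispatch
def show(shape) -> str:
    # Fallback, playing the role of the bare "procedure show;" declaration.
    raise TypeError(f"no show overload for {type(shape).__name__}")


@show.register
def _(shape: Square) -> str:
    # Chosen when the argument is a Square.
    return f"Square({shape.side})"


@show.register
def _(shape: Circle) -> str:
    # Chosen when the argument is a Circle.
    return f"Circle({shape.radius})"


# A list of mixed shapes stands in for the "variant Shape" type.
shapes = [Square(2.0), Circle(1.5)]
shown = [show(s) for s in shapes]
print(shown)
```

Each shape type gets its own implementation without the types having to share a base class, which is the property both versions above rely on.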

jaen · on July 26, 2010

At first sight, this looks like a major improvement over C++ and even more powerful than D with less ad-hoc features.

Interesting language features, as far as I can tell:

* Type inference

* First-class functions/closures (interesting compared to C++)

* Overloading is based on compile-time evaluation, with the possibility to have arbitrary values as type parameters, and is similar to predicate dispatch (you can dispatch using arbitrary conditions), with also the possibility to do runtime dispatch.

* Most of the functionality of the language is implemented in libraries, with a small kernel of built-ins.

Library functionality:

* the standard standard library: arrays, vectors, maps, algorithms, sequence abstractions etc.

* tuples, unions, variants

* lazy sequences (streams)

* reference-counting shared pointers (boost::shared_ptr)

* SIMD intrinsics and vectors using SIMD operations

* bindings to the C standard library of Unix and Win32 API

* Green threads, channels

Other:

* C binding generator based on Clang

_zhqs · on July 26, 2010

I like the multiple dispatch feature (http://bitbucket.org/kssreeram/clay/src/b1df8340f2b4/test/va...).

It reminds me of Clojure multimethods (http://clojure.org/multimethods). Can dispatch conditions in Clay be "arbitrary"? (In the example it is on the value of 'x', but can it be as complex as in Clojure multimethods?)

Chickencha · on July 26, 2010

It looks pretty nice. I'm curious about how the code generation works and whether Clay has the potential to produce the kind of code bloat you sometimes see with C++ templates. Say I have a function

  min(a, b) {
      if(a < b)
          return a;
      return b;
  }

From what I understand, this function can accept two Int32s, or a Uint8 and a Float32, or basically any other combination of numeric types. Is a new function for each type combination generated? I don't think there's any way around this to some degree, but I'm curious if you've been able to do anything to ease the pain.

kssreeram · on July 26, 2010

You can control type specialization to a certain extent with overloading. For instance, if a procedure takes two arguments, and if both these types can vary independently, then you can use overloading to ensure that both arguments are converted to a common type.

    procedure foo;
    
    [T1,T2]
    overload foo(a:T1, b:T2) {
        // by default, convert both arguments
        // to the type of first argument.
        foo(T1(a), T1(b));
    }
    
    // the following overload specializes for the case
    // when both arguments are of the same type.
    [T]
    overload foo(a:T, b:T) {
        // ...
        // implement the logic here
        // ...
    }
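The control flow of that pattern can be sketched in Python. Python does not generate specialized code per type, so this only illustrates the "coerce to a common type, then delegate to one same-type implementation" idea, not the effect on code size. `_foo_same_type` and its body are hypothetical stand-ins for the elided logic.

```python
def _foo_same_type(a, b):
    # Hypothetical stand-in for "implement the logic here".
    return a + b


def foo(a, b):
    """Mixed-type entry point: coerce b to a's type, then delegate."""
    if type(a) is not type(b):
        # Mirrors foo(T1(a), T1(b)): convert to the type of the first argument.
        return foo(a, type(a)(b))
    # Both arguments now share one type, so a single implementation suffices.
    return _foo_same_type(a, b)


print(foo(1, 2.5))   # second argument is converted to int before adding
print(foo(1.0, 2))   # second argument is converted to float before adding
```

Note that converting to the first argument's type can lose information (here `2.5` becomes `2`), which is a property of the pattern as written, not of this sketch.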

mark_l_watson · on July 26, 2010

As someone who likes high level languages (mostly Lisp and Ruby, and learning Haskell for the second time), I am surprised how much I like the language: list comprehensions, type inferencing for concise code, etc. There is about zero chance of my using this new language however since my language selection is about 90% driven by what customers want.

mark_l_watson · on July 26, 2010

Hello kssreeram, question: your company web page and quillpad product look interesting; do you use Clay for quillpad development and deployment?

kssreeram · on July 26, 2010

We are currently using Clay for a few projects within Tachyon for performance sensitive code. It's simply being used as a better C, and since Clay can generate light-weight C compatible DLLs, it fits in very well.

Quillpad itself predates Clay and hence doesn't use it.

roryokane · on July 26, 2010

It looks like the syntax could use some polishing. Three obvious improvements: use line breaks for semicolons (and backslash at the end of line for multiline statements), do not require parentheses around conditionals in if, while, for, etc. statements, and (slightly more controversially) use significant indentation instead of curly braces. I think each of these changes would make the language undebatably more concise and readable than before (except the last one, which has been debated, but I think it’s definitely an improvement).

An example from algorithms/introsort of how the syntax would look:

  overload introSort(first, last)
      if first != last
          introSortLoop(first, last, log2(last-first)*2)
          finalInsertionSort(first, last)

instead of

  overload introSort(first, last) {
      if (first != last) {
          introSortLoop(first, last, log2(last-first)*2);
          finalInsertionSort(first, last);
      }
  }

alnayyir · on July 26, 2010

This is more than a little vacuous.

chc · on July 26, 2010

Not really. Python is famous for its readability, and this interest in reducing "noise" is one of the reasons why. It's definitely something that a language could benefit from at least riffing off, if not stealing outright.

The Python philosophy is that everything is expressed clearly and concisely. Parens around conditionals are just restating information that is already apparent, as are semicolons between lines. Fluent readers actually learn to ignore these syntactic features unless they're debugging (since these useless tokens are a breeding ground for bugs).

Ask yourself: How often do you mentally match the opening and closing parens on a conditional? How often do you rely on semicolons to tell when you're looking at a new statement as opposed to just looking at the lines of code?

alnayyir · on July 26, 2010

Yeah, listen. I'm a python programmer as a matter of profession and paycheck.

It's a matter of taste.

Do I prefer significant whitespace ala python/haskell?

Yes.

Does it matter? No.

Does it matter when you're discussing a programming language you've just now encountered for the first time and is rather new and has many novel things to contribute to the world?

Fuck no.

Like I said, it was a vacuous thing to say. There are far more important questions to ask like,

"Are the generics a space-time trade-off similar to C++ templates?"

"Can I use the type system to encapsulate and restrict behavior in powerful ways, allow me to create performant but safe code?"

"Can I make a beowulf cluster out of this?"

Any of those questions have more substance than, "hurr whitespace is bettar why didn't you do tghaaaasdfsdgsg"

Christ-sakes.

chc · on July 26, 2010

Your entire response focuses on significant whitespace, which I didn't mention at all in mine.

And no, the time-space tradeoff of the generics is not necessarily a better question than how readable the language is. That is a matter of taste. I will spend more time reading the code than I will worrying about the performance characteristics of generics or creating a Beowulf cluster, so caring about the common case is not exactly vacuous, even if it isn't the #1 most important thing.

alnayyir · on July 26, 2010

>Your entire response focuses on significant whitespace, which I didn't mention at all in mine.

Doesn't matter.

>Beowulf cluster

It was a fatuous comment designed to compare with the original fatuous comment.

>the time-space tradeoff of the generics is not necessarily a better question than how readable the language is. That is a matter of taste

No it's not. Taste is preference, whether or not a language is impossible to deploy with generics in an embedded environment has absolutely nothing to do with whining about syntax.

    importance:
        Semantics > syntax

I don't think I've seen sophistry and an obsession with the trivial on hackerne.ws like this in quite some time.

You're complaining about the color of the bikeshed when real discussion and work is to be done concerning the semantics and structure of the language?

chc · on July 26, 2010

I think you're unnecessarily trivializing things that you personally care less about. Who cares about embedded environments? People who work in them. Is that group dwarfed by the group of "programmers who have to read and debug code"? Yes it is. So it seems really petty to personally insult me for caring (not much, but a little) about something that makes code less bug-prone and easier to read for everybody while you hold up something that affects a vanishingly small number of people as what we should be talking about.

Your complaints seem to reflect an idea that only features and implementations matter, while user interface is fairly trivial. If you believe that, I think you might be interested in this: http://www.alistapart.com/articles/indefenseofeyecandy/

alnayyir · on July 27, 2010

You linked to a web design website in defense of trivial eye candy in a discussion about computer science.

I'll just leave it at that.

roryokane · on Aug 8, 2010

Web design is the art of creating user interfaces for websites – the website visitors are the users. Language design is the art of creating user interfaces for programs – programmers are the users. The fields are related by the common theme of designing things for users. And I don’t see how you can call eye candy trivial without defending that assertion right after reading an article that argues it is not trivial.

roryokane · on Aug 8, 2010

Please don’t resort to blatant ad-hominem misquoting. My remarks were not as stupid as “hurr” and “tghaaaasdfsdgsg”, they were statements that I put much thought behind. And rather than saying my thoughts were vacuous (mindless, meaningless), since you admit you prefer significant whitespace too, the worst you could call them is irrelevant.

As for why I decided to bring the syntax up when semantics are more important, it seems to me that if a language designer hasn’t thought about the syntax of a language and has just copied verbatim the syntax of Java/C, then they show a lack of attention to detail, and there is a greater probability that they will continue to show this lack of attention to detail when designing the rest of the language. Yes, I could read the whole semantics section of the language spec and evaluate it directly, but that takes more time than some people have when faced with yet another unproven language to consider learning or watching. I’m sure more people than me use syntax as a warning sign, so the syntax of a language is important if the designer doesn’t want to scare these people off.

planckscnst · on July 26, 2010

I find it interesting that goto is in the examples. I'm not deeply familiar with program theory and computability, but I thought goto made the program impossible both to reason about and to verify its correctness. Why would one include it in a new programming language? Are there really valid uses of it? I don't think the factorial example here is a good use - he basically used it as a while(true) infinite loop. However, I also couldn't tell you with accuracy why it might be bad here other than the ingrained mentality of goto=evil.

zitterbewegung · on July 26, 2010

There are two schools of thought on goto. On one side, Donald Knuth and Linus Torvalds consider goto useful in specialized operations. Knuth uses goto in certain operations where it would be optimal to have the construct. Linus allows them when it would be a good optimization strategy, but you must use them sensibly.

On the other hand, Edsger Dijkstra thought that having gotos in the language complicated the analysis of loops and of the program's control flow. Dijkstra did a great deal of research on structured programming, which nearly all programming languages support (loop constructs, if-then statements, etc.). Citations: http://en.wikipedia.org/wiki/Goto#cite_note-4

http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pd...

kssreeram · on July 26, 2010

I implemented goto because I've seen it being useful in other languages. That said, there's no other occurrence of goto in clay.

http://stackoverflow.com/questions/24451/goto-usage

wccrawford · on July 26, 2010

Goto isn't inherently evil. It allows bad (and newbie) programmers to make spaghetti code. It is taught as 'evil' to prevent them from using it when it would produce ugly code.

Later, once you understand things, there are a few very rare times that it is legit to use it. Those few times can be handled without using it, though, and many programmers continue to avoid it out of habit or philosophy.

As for including it in his new language, it looks like it's basically C with some syntactical sugar. He didn't so much 'include' it as fail to remove it. He got it for free from C.

fhub · on July 26, 2010

goto can be useful for "alarm exits". The linux kernel uses them a lot. See the bottom of this thread for some pseudo code http://kerneltrap.org/node/553/2131

adamc · on July 26, 2010

Sure, but in the factorial example, Clay uses goto to express iteration: http://bitbucket.org/kssreeram/clay/src/b1df8340f2b4/test/fa...
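The linked file is truncated here, so the following is not the Clay source. It is only a Python sketch of what "iteration expressed as a jump back to a label" looks like once it is written as the `while(true)` loop mentioned earlier in the thread; the variable names are assumptions.

```python
def factorial(n: int) -> int:
    result = 1
    while True:      # plays the role of the label that a goto would jump back to
        if n <= 1:
            break    # the exit condition that would skip the backward jump
        result *= n
        n -= 1
    return result


print(factorial(5))
```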

alnayyir · on July 26, 2010

Linux kernel uses goto all over the place, if you know what you're doing, it doesn't matter.

Raphael_Amiard · on July 26, 2010

Looks like a very interesting language to me. I think it superficially looks a lot like rpython, with way more options to restrict and specify types on top of it. I'm very eager to see more documentation, specifically on types and memory management.

What is the syntax for pointers for example ?

kssreeram · on July 26, 2010

Hi. Pointers to type T have the type Pointer[T]. The '&' operator gets the address of an lvalue, and the '^' operator dereferences. '^' is a better choice than C's '*' for dereferencing because I can conveniently use the same operator for dereferenced field access too, whereas C had to invent another operator, "->", for that.

    record Point[T] {
        x : T;
        y : T;
    }
    
    updateViaPointer(ptr) {
        ptr^.x += 1;
        ptr^.y += 2;
    }
    
    test() {
        var p = Point(10, 20);  // type will be inferred as Point[Int]
        updateViaPointer(&p);
    }

joe_the_user · on July 26, 2010

Sample code I'd like to see:

* How you'd implement a B-tree object

* How you'd implement the-equivalent-of-a-class (a list of related functions and structures akin to something you'd see in the GTK documentation).

jacquesm · on July 26, 2010

That looks like a hodge-podge of C and Pascal to me.

Sorry about the tone of this comment but I fail to see anything that would make me go 'yes, let's try this'.

Generic programming is something you can do in any language, and most of the (successful) ones out there are created with that goal in mind.

Can someone more in the know enlighten me as to why 'clay' is special in this respect?

kssreeram · on July 26, 2010

> Generic programming is something you can do in any language, ...

That's not true.

Generic Programming is about writing re-usable code that is also very efficient. On the whole, it requires the following:

- Static types

- Overloading

- Type-parametric functions (templates)

- Type specialization.

Not all languages have these features.

Generic programming first took off with C++, when it introduced templates. But a few languages before and after C++ have supported generic programming: Ada, Haskell, D, etc.

edit: formatting.

jacquesm · on July 26, 2010

Generic programming is a way of writing code, not a thing that your language supports or you can't do it. I can write perfectly re-usable C code that is also very efficient by relying on the pre-processor to customise the code to the exact types and conditionals required for the situation at hand.

It's a bit like saying you can't write functional code unless you use a functional language.

Robin_Message · on July 26, 2010

Firstly, using the preprocessor will mean there is code duplication, even if it is not necessary. Secondly, once you are using the preprocessor to do generic stuff, you have probably thrown type-safety away.

That's not to say you can't write C in a generic way, but "generic programming" means something specific to computer scientists and has certain prerequisites.

And you can't write functional code without a functional language, without building a functional language on top of your language (which may not even be possible, e.g. you can do functional-ish stuff in C because of function pointers. Without them, you'd be stuffed.)

You can write functional code in a "non functional language", if by functional language you mean "Haskell, ML or Scala". But you need certain features like function pointers to do it, and in that sense you do need a functional language. Same for generic programming - you need certain features.

Raphael_Amiard · on July 26, 2010

Function pointers are very far from sufficient because you can't define new functions at runtime, since you can't nest functions or define anonymous ones. That means your functions are not first-class citizens of your language, and some common functional programming techniques are impossible to use in a clear way.

You can't do that for example :

    def make_adder(num_1):
        def adder(num_2):
            return num_1 + num_2
        return adder
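To make the point concrete, here is the same closure with a short usage example (the second operand is spelled `num_2`; the names `add_five` and `add_ten` are just for illustration). Each call to `make_adder` creates a new function at runtime that remembers its own `num_1`, which is exactly what a bare C function pointer cannot carry.

```python
def make_adder(num_1):
    def adder(num_2):
        # num_1 is captured from the enclosing call to make_adder.
        return num_1 + num_2
    return adder


# Two distinct functions, created at runtime, each with its own captured value.
add_five = make_adder(5)
add_ten = make_adder(10)
print(add_five(1), add_ten(1))
```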

Robin_Message · on July 26, 2010

Yes. Generally you then start cheating by passing around a function pointer and a void* that is the first argument of the function, i.e. doing closure conversion yourself. You can hide this with some macros and get close to having a functional language, but it's ugly and tedious (Cfront anyone?) Also, what I'm describing here is more similar to object orientation than functional programming. It's worth remembering the following koan though:

The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.

-- Anton van Straaten
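The manual closure conversion described before the koan (a function pointer plus a `void*` passed as the first argument) can be sketched in Python, with a dict standing in for the `void*` environment. This is an illustration of the technique only; `adder_code` and `call_closure` are hypothetical names.

```python
def adder_code(env, num_2):
    # Top-level function with no nesting: all captured state arrives via env,
    # the way a C function would receive it through a void* first argument.
    return env["num_1"] + num_2


def make_adder(num_1):
    # The "closure" is just a (code, environment) pair built by hand.
    return (adder_code, {"num_1": num_1})


def call_closure(closure, *args):
    # Calling a converted closure means passing its environment explicitly.
    code, env = closure
    return code(env, *args)


add_five = make_adder(5)
print(call_closure(add_five, 3))
```

The pair of code and data is also roughly what an object with one method is, which is the connection the koan plays on.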

Raphael_Amiard · on July 26, 2010

I downvoted you because I think you're flat out wrong, both on the functional account and on the generic account. Although I'm curious: how would you do generic programming in C, a language in which you can't even define a generic list type without giving up static typing?

jacquesm · on July 26, 2010

You do realise that the first version of the C++ compiler was a pre-processor for the C compiler ? And that in the end all this stuff outputs assembly language ?

As for functional programming someone just released a lisp for PHP. That doesn't mean it's the 'right' way to do stuff, but it can be done, and programming 'generically' was done for years before someone took the time to produce a language for it. I've written piles of code that would write programs (usually C) to avoid having to write the same kind of code with slight variations.

In the end all this stuff is Turing complete, so if you can do something in one language, by simple reasoning you can figure out that you can do that trick in any other by implementing a (subset of) the former.

rosh · on July 26, 2010

Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy. -- Alan Perlis, Epigrams on Programming

endtime · on July 26, 2010

Why does Generic Programming require static types?

pohl · on July 26, 2010

The type-parametric functions are inherently a feature of a static-typing system... so I guess "by definition" would be the answer. At least that's what the Wikipedia entry would lead me to believe.

http://en.wikipedia.org/wiki/Generic_programming

See also: http://en.wikipedia.org/wiki/Type_polymorphism#Parametric_po...

Robin_Message · on July 26, 2010

From the OP's definition, it needs static types so it can be compiled efficiently. You can do it dynamically (e.g. Ruby) but you can't do it as fast in general.

Additionally, generic programming is a computer science concept, and those types tend to prefer static-typing for safety reasons. They're probably not strictly necessary if you ignore efficiency.

klipt · on July 26, 2010

Dynamic typing (à la Scheme or Python) gives you generic programming for free, but it's less efficient.
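As a small illustration of the "for free" part, here is the `min` function from earlier in the thread written in Python (named `minimum` here to avoid shadowing the built-in). One definition works for any pair of values that support `<`, because the comparison is looked up at runtime on every call; that runtime lookup is also where the efficiency cost comes from.

```python
def minimum(a, b):
    # No type parameters: any two values that can be compared with < will do.
    if a < b:
        return a
    return b


print(minimum(3, 7))
print(minimum(2.5, 1))
print(minimum("pear", "apple"))
```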