Announcing Rust 1.24.1

mkj · on March 2, 2018

In case anyone else was wondering how longjmp() over the Rust code isn't a problem regardless:

"There are only Copy types on the rust stack frame being jumped over."

https://github.com/rust-lang/rust/issues/48251

stmw · on March 2, 2018

neat!

Freak_NL · on March 2, 2018

> Cargo couldn’t fetch the index from crates.io if you were using an older Windows without having applied security fixes.

Why are developers (since this is an issue that shows up with cargo) running Windows 7 without security patches installed? Especially since the issue only shows up on Windows 7 installs that haven't received security patches since June 2016.

> libgit2 created a fix, using the WinHTTP API to request TLS 1.2. On master, we’ve updated to fix this, but for 1.24.1 stable, we’re issuing a warning, suggesting that they upgrade their Windows version.

That's really a neat and responsible way of handling this.

mtgx · on March 2, 2018

Although developers should know and do better, I think a big part of why they aren't doing it is because Microsoft has made it so hard to update Windows 7 since a year or two ago, all in the name of forcing people to switch to Windows 10.

You now have to visit and manually download the updates from Microsoft's Update Catalog website. Oh, and that website only works on Internet Explorer.

Arnavion · on March 2, 2018

No idea what you mean. Windows 7 receives updates from Microsoft through Windows Update just fine.

ndh2 · on March 2, 2018

Windows Update generally still works on 7.

Maybe you ran into this issue? https://www.myce.com/news/windows-7-dont-receive-security-up...

oblio · on March 2, 2018

Are you sure about that? I know plenty of people with Windows 7 that just get updates through the normal channels.

Arnavion · on March 2, 2018

>Why are developers (since this is an issue that shows up with cargo) running Windows 7 without security patches installed? Especially since the issue only shows up on Windows 7 installs that haven't received security patches since June 2016.

The issue shows up on patched machines. The patch does not enable TLS 1.2 by default. You have to add reg keys for it.

https://github.com/rust-lang/cargo/issues/5065#issuecomment-...

amluto · on March 2, 2018

I've never understood why Microsoft adds support for new crypto protocols but turns them off by default.

Arnavion · on March 4, 2018

(To be clear, Windows 7 without the patch also supports TLS 1.2. The patch adds a registry key to configure the default protocols for applications that don't explicitly override the default.)

colejohnson66 · on March 2, 2018

To avoid breaking badly designed programs?

mstade · on March 2, 2018

> Why are developers (since this is an issue that shows up with cargo) running Windows 7 without security patches installed? Especially since the issue only shows up on Windows 7 installs that haven't received security patches since June 2016.

Enterprise. It’s only recently I’ve stopped seeing XP boxes around, but Windows 7 is everywhere.

oblio · on March 2, 2018

Even in Enterprise environments, having boxes which haven't gotten security patches from 2 years ago is just careless.

mstade · on March 2, 2018

Oh please don’t for a second believe I excuse this behavior in any way. I 100% agree with you, yet at the same time that is the reality you have to adjust to at times. Enterprise IT often moves at glacial speeds, and that’s on a good day...

ogoffart · on March 2, 2018

Speculating here: maybe developers are testing/debugging on some VM with an older Windows, without that machine being their development machine.

HumanDrivenDev · on March 2, 2018

Rust gives me a headache. I want to like it, but it just seems so... overengineered

https://doc.rust-lang.org/std/str/struct.SplitWhitespace.htm...

Why would you create a special data type to represent a string split by Whitespace? Lunacy

devit · on March 2, 2018

Because unlike most other programming languages, Rust is well-engineered and seeks to both provide the most general abstractions possible and to make the code generated by them as efficient as possible.

In particular, for this task, this requires us to:

1. Return references to subranges of the original string, rather than copying them, so that no copy happens if you only need to examine the component instead of storing it

2. Not use reference counting to do so, but rather statically checked references with lifetimes, to avoid unnecessary instructions to update the reference count and lack of a static finalization point

3. Provide a way to get components one by one, so that if you only need e.g. the first two, time is not wasted to split the whole string

4. Provide that through a generic Iterator trait, so that it may be passed to generic methods (like one that collects the result into a vector)

5. Dispatch that generic trait statically rather than using an indirect call as that would destroy performance

6. Make the state manipulated by such an interface into a first-class object, and allow it to be put in a data structure (like an array) while still doing static dispatch, so that you can, for instance, split multiple strings into components and interleave them without ever making an indirect call.

The combination of these essential requirements results in the creation of the SplitWhitespace<'a> data type, which represents the state of a parser splitting a string into 'a-lifetime references to its whitespace-separated components one by one, implementing the Iterator trait, and usable in a data structure.
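A minimal sketch of how the six points look from the caller's side. The helper names `first_two` and `interleave` are made up for illustration; only `split_whitespace`, `SplitWhitespace` and the standard iterator methods come from std.

```rust
use std::str::SplitWhitespace;

/// Returns the first two whitespace-separated words. Each item is a `&str`
/// borrowed from `s` (no copy), and the rest of the string is never scanned.
fn first_two(s: &str) -> Vec<&str> {
    s.split_whitespace().take(2).collect()
}

/// Stores several splitters in a Vec (they are ordinary values) and pulls one
/// word from each in turn. `next()` is called on the concrete type, so the
/// calls are statically dispatched.
fn interleave<'a>(inputs: &[&'a str]) -> Vec<&'a str> {
    let mut splitters: Vec<SplitWhitespace<'a>> =
        inputs.iter().map(|&s| s.split_whitespace()).collect();
    let mut out = Vec::new();
    loop {
        let mut progressed = false;
        for it in splitters.iter_mut() {
            if let Some(word) = it.next() {
                out.push(word);
                progressed = true;
            }
        }
        if !progressed {
            break; // every splitter is exhausted
        }
    }
    out
}

fn main() {
    println!("{:?}", first_two("alpha beta gamma delta"));
    println!("{:?}", interleave(&["a b c", "1 2"]));
}
```

The only place the type has to be named here is the `Vec<SplitWhitespace<'a>>` annotation, and even that could be left to inference.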

stuaxo · on March 2, 2018

Echoing the other commenter, it would be great if the docs had a "why?" link to this.

steveklabnik · on March 2, 2018

It's extremely hard to know when someone needs a "why?" link. This is the kind of information that's conditional, which is the hardest part of API docs.

ndh2 · on March 2, 2018

Good read. Too bad none of that made it into the documentation.

steveklabnik · on March 2, 2018

So, a lot of it is in the documentation, but if you don't know Rust, then you might not infer it. This kind of thing is tough, as you need to balance audiences. The API docs assume you already know Rust.

For example:

Point 1 is known because the Item type is &str.

Point 2 is known because, well, that's the norm, but beyond that, SplitWhitespace is parameterized over a lifetime

Point 3 is known because it's an iterator; next() is its primary interface, which returns things one by one.

Point 4 is the "impl<'a> Iterator" bit

Point 5 is known because the return type of split_whitespace is this iterator, not Box<Iterator>

Point 6 is related.

If we repeated all of this stuff in every single bit of docs, it might make it more useful for some audiences, but also kinda destroy the docs for intermediate/advanced Rust users.

I'd love to make docs generally more accessible, but I'm not aware of any great solutions to this particular problem.

frankmcsherry · on March 2, 2018

You could probably just have a link that directs the reader to HN and posts a pre-filled "I have barely read about Rust, and it all seems over-complicated and ill-designed; convince me" comment.

varjag · on March 2, 2018

The better half of those points are directly and indirectly caused by insistence on manual memory management. Yes, it was a major design point of Rust, but it doesn't make it better engineered than "most other languages". It's just the corner it painted itself into.

maxaf · on March 2, 2018

Firstly, manual memory management is what happens when a C programmer must manually place malloc/free calls within her program. Rust doesn't require any of that; the compiler determines the right times to allocate and free memory, while requiring the programmer to follow certain design rules in exchange for the convenience.

Second, you're making it sound as if the position Rust (and those who program in it) is in is somehow undesirable. This state of things is the consequence of an explicit design goal, which was to accomplish automatic memory management without runtime cost.

varjag · on March 2, 2018

Interesting; I certainly wouldn't consider C++ constructor/destructor like semantics and manual tracking of object ownership in the code as automatic memory management. But regardless, the point stands.

And yes it's a consequence of a design goal, I said as much. The outcome however isn't beautiful enough to feel smug about the rest of programming languages.

mathw · on March 2, 2018

Rust doesn't have C++ constructor/destructor semantics. It doesn't require manual tracking of object ownership. You just have to say when you're passing ownership and when you aren't, and the compiler takes care of the rest.

It's automatic memory management without runtime accounting or a garbage collector, which means Rust doesn't need a runtime at all.
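A small sketch of what "saying when you're passing ownership" looks like in practice. The function names `consume` and `inspect` are hypothetical, chosen only for this illustration.

```rust
/// Takes ownership: the String is freed when this function returns.
fn consume(s: String) -> usize {
    s.len()
}

/// Borrows: the caller keeps ownership and can keep using the value.
fn inspect(s: &str) -> usize {
    s.len()
}

fn main() {
    let greeting = String::from("hello");
    let n = inspect(&greeting); // borrow: `greeting` is still usable afterwards
    let m = consume(greeting); // move: ownership passes to `consume`
    // println!("{}", greeting); // would not compile: value used after move
    println!("{} {}", n, m);
}
```

The signature is the whole annotation: `String` means the callee takes the value, `&str` means it only looks at it. No free call appears anywhere.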

varjag · on March 2, 2018

> It doesn't require manual tracking of object ownership. You just have to say when you're passing ownership and when you aren't […]

Call it whatever you want, but there are languages where you don't have to "pass ownership" manually for every frigging thing.

iknowstuff · on March 2, 2018

And you won't see them being used for systems programming because of their overhead. So without Rust, we'll be stuck with stupid, avoidable security vulnerabilities at the base of all our software, forever. On top of that, Rust's ownership system also prevents data races and none of those other languages are capable of that.

creatornator · on March 2, 2018

I'd say Java, a GC'ed language, is used for systems programming quite often.

kosinus · on March 2, 2018

Do you consider C++ RAII as manual memory management?

olavk · on March 2, 2018

Rust doesn't have manual memory management. It does have deterministic memory management, as opposed to garbage collected languages. Which means you can use it in domains where garbage collected languages cannot be used - e.g. for an OS ...or for implementing a garbage collector.

varjag · on March 2, 2018

You absolutely can implement an OS or a garbage collector in a GC-enabled language.

__david__ · on March 2, 2018

Well, sure, if you want completely unpredictable timings on everything. GCs have real problems in certain domains.

varjag · on March 2, 2018

You can have an entirely responsive desktop system. A hard RTOS would certainly be harder to achieve, but I'm not sure Rust has much to show there so far either.

microtonal · on March 2, 2018

Besides the fact that Rust does not do manual memory management (as a peer commenter pointed out), Rust's borrowing rules also prevent typical ownership bugs. For instance:

- You split a string, returning an iterator or slice where every element is a slice of the original string.

- You change the original string.

- Now the splitting may be invalid.

Such bugs are not possible in Rust, since: (1) the SplitWhitespace struct borrows the str immutably; (2) the borrow checker does not permit simultaneously borrowing data as mutable and immutable.
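A minimal sketch of that scenario. The helper `words` is a made-up name; the commented-out line is the mutation that the compiler rejects.

```rust
/// Collects borrowed words; the returned slices keep `text` immutably borrowed.
fn words(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    let mut text = String::from("one two three");
    {
        let parts = words(&text);
        // text.push_str(" four"); // rejected at compile time: `text` is
        //                         // still immutably borrowed by `parts`
        println!("{:?}", parts);
    } // the immutable borrow ends here
    text.push_str(" four"); // accepted: no slices into `text` remain
    println!("{}", text);
}
```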

Another problem that is prevented by Rust is 'memory leaks' where someone splits a large string and uses only a smaller substring. If the substring is a slice of the original string, a GC cannot deallocate the larger string. Rust prevents this, because a string slice reference (&str) cannot outlive the underlying String. [1]

tl;dr: the better half of the points are directly and indirectly caused by Rust's ownership model, which prevents a lot of ownership bugs. That does not mean that you do not have to think about ownership problems in other languages.

[1] Such memory leaks were one of the reasons why Oracle Java switched to a much slower, copying implementation of substring:

http://java-performance.info/changes-to-string-java-1-7-0_06...

netheril96 · on March 2, 2018

Only point 2 is related to memory management, while all others could be applied to GC-enabled languages as optimizations just fine. That is 1/6th, not "better half".

rkangel · on March 2, 2018

If you have no need for the use case of a high speed, low overhead non-GC language (or believe there is no need for such a thing), then Rust is going to be of no interest to you, and you are going to be better served solving your problems with another language - Java, C#, Python etc.

A large number of engineers do desire such a tool, and Rust provides that in a way that provides as little cost as possible over one of those more dynamic languages.

varjag · on March 2, 2018

You are answering a point I didn't make. I addressed the stated 'engineering supremacy'.

rkangel · on March 2, 2018

I am asserting that (a) Rust has made a good choice to be a 'manual memory management' language, and that (b) it has executed that well (or at least better than any other option).

Your phrase about 'painting itself into a corner' implies you disagree with point (a), and that was the case I was trying to make.

ekidd · on March 2, 2018

> Why would you create a special data type to represent a string split by Whitespace?

Big chunks of Rust are built around iterators, and `split_whitespace` returns an iterator of type `SplitWhitespace`. Because this is a concrete type, the Rust compiler and LLVM will then work together to completely inline it, and they will generate code that looks like a hand-rolled loop.

There are some downsides to this system—it usually takes me about 10 minutes to write custom iterators for a new data structure—but iterators are very nice to program with and they go fast.

There's a new 'impl Trait' feature scheduled for later this year which will eliminate the need to export a custom struct like this. And it will eliminate the 10 minutes I spend writing iterators. Of course, it adds a new language feature. Nothing's free.
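A rough sketch of the difference, assuming a toolchain where `impl Trait` in return position is available (it was nightly-only at the time of this thread and has been stable since Rust 1.26). `Countdown`, `countdown_named` and `countdown_opaque` are invented names for illustration.

```rust
/// Hand-written iterator: a named struct plus an `Iterator` impl.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.remaining + 1)
        }
    }
}

/// The concrete type appears in the public signature.
fn countdown_named(from: u32) -> Countdown {
    Countdown { remaining: from }
}

/// Same sequence with `impl Trait`: the adapter type stays unnamed in the
/// signature but is still known to the compiler, so there is no boxing.
fn countdown_opaque(from: u32) -> impl Iterator<Item = u32> {
    (1..=from).rev()
}

fn main() {
    println!("{:?}", countdown_named(3).collect::<Vec<_>>());
    println!("{:?}", countdown_opaque(3).collect::<Vec<_>>());
}
```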

> Rust gives me a headache. I want to like it, but it just seems so... overengineered

I admit, Rust does sometimes have a "heavy industry" feeling to it. But this has some nice benefits, too:

1. I can write cross-platform CLI tools that Just Work on Linux, MacOS and Windows, because Rust has good abstractions for paths, files, threads, etc.

2. I can write multi-threaded code that does things like, "Read an arbitrary stream of bytes in a background thread, compress it, break it into 5MB chunks, and upload each of those chunks to S3 in parallel, using no more than N worker threads and applying backpressure, and do all this in the background while I work on something else." And thanks to Rust's threading rules, all this will work on the first try, with no nightmarish threading bugs.

3. In general, if my Rust code actually compiles, there's about an 85% chance that it will work flawlessly on the first try.

Personally, these are benefits that I'm willing to pay for. And Rust does require some familiarity both with how processors work, and with functional programming. And of course, everybody has different tradeoffs. But for certain kinds of work, Rust really hits the sweet spot.

romwell · on March 2, 2018

>and upload each of those chunks to S3 in parallel, using no more than N worker threads

Do you use a crate like Rayon for this? I'm just starting with Rust, and my current understanding is that without using a library, one can only spawn threads and distribute load by hand (as opposed to automatic scheduling a-la OpenMP). Is this correct?

ekidd · on March 2, 2018

We're hoping to open source our streaming S3 uploader. Behind the scenes, it uses the `crossbeam` crate (for scoped threads) and BurntSushi's `chan` crate (for bounded, blocking mpmc queues, which is how we get backpressure from a worker pool). One of the cool things about Rust is that it's possible for third-party crates to implement threading primitives.
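For readers who want the general shape of the backpressure idea, here is a rough sketch that uses only the standard library instead of `crossbeam` and `chan`. It is not the uploader described above: `process_chunks` is a hypothetical name, the "upload" is a stand-in that just counts bytes, and it assumes at least one worker.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Processes `chunks` on `workers` threads and returns the total bytes handled.
/// The bounded channel makes the producer block when the workers fall behind.
fn process_chunks(chunks: Vec<Vec<u8>>, workers: usize, queue_depth: usize) -> usize {
    let (tx, rx) = sync_channel::<Vec<u8>>(queue_depth);
    // std's receiver is single-consumer, so the workers share it behind a mutex.
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut handled = 0;
                loop {
                    // The lock is held only while receiving, not while working.
                    let chunk = match rx.lock().unwrap().recv() {
                        Ok(chunk) => chunk,
                        Err(_) => break, // sender dropped and queue drained
                    };
                    handled += chunk.len(); // stand-in for the real upload
                }
                handled
            })
        })
        .collect();

    for chunk in chunks {
        tx.send(chunk).unwrap(); // blocks while the queue is full
    }
    drop(tx); // closing the channel lets the workers exit

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let chunks = vec![vec![0u8; 5], vec![0u8; 7], vec![0u8; 3]];
    println!("{}", process_chunks(chunks, 2, 1));
}
```

A real multi-consumer queue avoids the mutex around the receiver, which is one reason to reach for a crate here.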

`rayon` is awesome, and it uses `crossbeam-deque` internally. But `rayon` is really best-suited to computational parallelism on many small pieces of data, and less suited to I/O parallelism. So it may be worth using `crossbeam` directly in that case.

All the threads in our S3 uploader run on a single machine, because multi-part uploads are mostly limited by how much memory we want to use for buffering S3 object parts, not by CPU.

(Of course, I'll probably overhaul a lot of this code later this year once `#[async]` and `futures` stabilize, so I can also stop paying for the memory used by thread stacks.)

But the cool part is that we can already do streaming input, compression, chunking and parallel uploads without ever creating a temp file. (And we can upload multiple data streams using the same worker pool.) This took about three days to build.

romwell · on March 2, 2018

Sounds great, thanks! Computational parallelism is what I'm after, so I'll look more into Rayon.

lucozade · on March 2, 2018

I wrote in another comment that I don't think the SplitWhitespace type is an example of overengineering. And I don't.

However, thinking about this more, I think I understand what you mean and I, sort of, agree.

What I believe is going on is that Rust exposes a lot of its engineering. This can be either good or bad depending on where you're coming from. If you're working at a low level then this is usually a good thing. If you're working at a high level then this can be quite annoying.

What I think you'll find is that this is largely a point in time thing. Rust is still relatively young and there is substantial work in flight under the banner of ergonomics. Even though this is more engineering, I'm pretty confident that the net effect will be to make the language feel less engineered. It probably won't completely get there this year but I don't think it'll be long.

jayflux · on March 2, 2018

That’s not a datatype... That’s an iterator returned by calling .split_whitespace() on a string.

Basically the same as calling .split(' ') on a string in JavaScript

You can find a list of Rust's primitive types here: https://doc.rust-lang.org/std/#primitives

lucozade · on March 2, 2018

> That’s not a datatype

It's a named struct. That's pretty datatype-y.

I don't agree with the GP that this is an example of overengineering. If anything it's an example of current Rust being slightly underengineered. Presumably a lot of these temporary types can be killed off once `impl Trait` goes mainstream?

twic · on March 2, 2018

It's a datatype, but it's an implementation detail - str::split_whitespace has to return something that implements the Iterator trait, so it returns this. You as a programmer should never need to name it explicitly.

When "impl Trait" lands, it will be possible to hide details like this.

the_mitsuhiko · on March 2, 2018

I would never want the named types to go away. They heavily improve the error messages you get.

leshow · on March 2, 2018

I asked this question on IRC and they said that it likely wouldn't go away. There may be a couple of reasons: backwards compatibility for one; the other may be that when the type is existentially quantified, the compiler has to search for a matching implementation, and that search can take time. We want stuff using std to compile fast.

Sharlin · on March 2, 2018

As implied by other commenters, the problem is that Rust cannot currently abstract out a concrete return type without indirection of some sort. More formally, it’s not possible to existentially quantify over a trait bound; in other words, to say that whatever the actual type that is returned, all the caller needs to know is that it implements a certain trait or traits.

Note that it’s exactly the same in other major statically-typed languages like Java or C++: there are no existential types without indirection (which in the case of Java and similar managed languages is implicit and mandatory).

Rust is in the process of adding existential trait bounds in the form of the `impl Trait` feature. It’s already available in nightly.

the_mitsuhiko · on March 2, 2018

> Note that it’s exactly the same in other major statically-typed languages like Java or C++

It's not just statically typed languages. Python also returns loads of special types from iterator functions, they are just usually not documented as being types:

    >>> import itertools
    >>> type(itertools.chain([1, 2], [3, 4]))
    <class 'itertools.chain'>

Sharlin · on March 2, 2018

Yes, I just meant that in a dynamically typed language you don’t have the problem of types leaking to signatures because there are no type annotations in the first place. And when there are, everything has indirection anyway.

the_mitsuhiko · on March 2, 2018

> Why would you create a special data type to represent a string split by Whitespace? Lunacy

Because it's an iterator. A bespoke iterator type is also created behind the scenes in many other languages. How else would you do it?

dom96 · on March 2, 2018

> How else would you do it?

There is a `splitWhitespace` iterator[1] in the Nim programming language as well, and it doesn't require a separate type. In Nim, the iterator is inlined.

1 - https://nim-lang.org/docs/strutils.html#splitWhitespace.i,st...

dralley · on March 2, 2018

It's inlined in rust, too

the_mitsuhiko · on March 2, 2018

Iterators in nim are also weird because many of them are not first class types and cannot be assigned to variables. I do not consider that to be a particularly good concept to be honest.

dom96 · on March 2, 2018

There are two types of iterators in Nim. The inlined variant is more efficient, and you can fairly easily wrap it in the other type: the "closure" iterator. Why do you not consider it a good concept?

the_mitsuhiko · on March 2, 2018

Because we're comparing this with Rust right now, which can achieve highly efficient iterators without sacrificing ergonomics.

Someone · on March 2, 2018

I would call that underengineered. This could be more generic, both in what it operates on (why only strings and not any sequence of value items?) and what it separates on (why only Unicode whitespace? It could be used as return value from a call that iterates over CSV fields in a line or that finds regex matches in a string).

burntsushi · on March 2, 2018

SplitWhitespace is a convenience. Its underlying internal type is actually:

    Filter<Split<'a, IsWhitespace>, IsNotEmpty>

The split[1] method is generic, and for example, one can indeed use a regex for it. (I wouldn't use it for CSV though, since it would almost certainly be wrong.)

[1] - https://doc.rust-lang.org/std/primitive.str.html#method.spli...
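To see the split-then-filter shape from the outside, here is a small sketch built from the public `split` and `filter` adapters. `IsWhitespace` and `IsNotEmpty` are internal to std, so this uses `char::is_whitespace` and a closure instead; `split_manually` is a made-up name.

```rust
/// Spells out the split-then-filter pipeline with public adapters.
fn split_manually(s: &str) -> Vec<&str> {
    s.split(char::is_whitespace)
        .filter(|piece| !piece.is_empty())
        .collect()
}

fn main() {
    let input = "  a\tb \n c  ";
    let manual = split_manually(input);
    let builtin: Vec<&str> = input.split_whitespace().collect();
    println!("{:?} {:?}", manual, builtin);
}
```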

swsieber · on March 2, 2018

... why not create a new type. As noted by a sibling comment, there are some good reasons to do it.

And there's actually very little chance that you, as the developer, are ever going to explicitly specify that type - there's pretty good type inference. As a rust user, I don't care that it's another type because I don't need to care about it in order to write code.

So lots of benefits, very low cost/impact.

oblio · on March 2, 2018

It depends on what software world you come from. In many places "stringly typed" code is frowned upon: http://wiki.c2.com/?StringlyTyped

Yes, it's convenient, but using data types can help with many things, from implicit documentation, to optimizations to enforcing a code contract/interface.

GolDDranks · on March 2, 2018

This is an interesting demonstration of how fiddly and UB-happy FFI boundaries can be... Fortunately the UB → abort change will land in the future, and it will be for the better.

_binder · on March 2, 2018

One command to update the language...wow...ok I long for the day I can do that in C++.

ilurkedhere · on March 2, 2018

Very good