Teleport – Modern SSH server for clusters and teams

tssva · on Sept 29, 2016

There have been several comments regarding authentication but what I see as the glaring issue is the lack of authorization. Without some sort of RBAC implemented within Teleport and/or integration with external authorization solutions the use cases for Teleport are going to be limited to very simple scenarios. The suggestion mentioned within a couple of github issues of using os logins as roles is a non-starter in a large number of environments due to security and audit considerations.

I understand that trusted clusters were originally intended as a means to allow Teleport to work through restrictive firewalls but the same type of environments which are going to necessitate the use of trusted clusters are also likely to require more restrictions regarding which users are allowed to access the cluster from a trusted cluster. The current solution of specifying individual allowed users is too limited and doesn't scale.

Gravity, which Teleport is ripped from, is a system to support SaaS providers in deploying and managing onsite implementations within enterprise networks. Maybe these issues are addressed at a different level within Gravity but if not I would be hesitant to allow a SaaS provider to deploy within my network using Gravity.

old-gregg · on Sept 29, 2016

tssva, great points. Gravity is basically Kubernetes-on-Teleport, and our distribution of k8s has additional security measures built-in, where Teleport (in a library form) is acting more or less on a lowest "root" level which the infrastructure owner can optionally turn on/off for pieces of their infrastructure, we'll be gradually publishing more documentation on Gravity soon, but you're right: RBAC lives one level higher.

qwertyuiop924 · on Sept 28, 2016

It's interesting, but I don't think I'd take it over openssh just yet. I'm not at the scale where most of the features become relevant, and frankly, I don't trust it as much as I trust openssh, a piece of software that's been in active development for almost 17 years, has an excellent security track record, and a development team well known for its care for security and its response to security issues.

And in a piece of vital security infrastructure, trust is everything.

draugadrotten · on Sept 28, 2016

> "Teleport has completed a security audit from a nationally recognized technology security company. So we are comfortable with the use of Teleport from a security perspective."

It would be interesting to read more details about this audit. This blurb is pretty much "Trust us" with more letters.

microcolonel · on Sept 28, 2016

Having worked on software which received audits from a "natonally recognized security company", this statement on their part gives me no additional confidence. I performed what I would consider a "light" audit on the same codebase and found database injection attacks, auth forging attacks, denial of service; and the infrastructure it was deployed on used versions of libraries which opened the service up to plausible remote code execution.

Ditto, it just basically says "trust us"; no offense to the authors.

twakefield · on Sept 28, 2016

Unfortunately, our current agreement with the security company we hired prevents us from sharing the details unless under NDA. We may revisit those terms so that we can publish it now that Teleport is garnering more interest.

Edit: tried to add more clarification to address jtchang's question.

jtchang · on Sept 28, 2016

Why is this? Doesn't it make sense to publish which company did the audit? I mean for financial companies like KPMG they definitely do it.

dsr_ · on Sept 28, 2016

My company pays for an annual external audit, and I do most of the negotiations.

About half the companies don't want the audit results going anywhere else. About half are OK with distribution under NDA. In between, there are companies which will negotiate the right to distribute, usually with an explicit indemnity clause, and on the far end are a minority of companies that tell you that the results are work for hire and entirely your prerogative.

The major concern is that the customer will misrepresent the results in some way, or even represent them accurately but then change the underlying service to open up a vulnerability. Either way, nobody really wants to be in the news...

sethhochberg · on Sept 28, 2016

I'd guess it is something like "pay us $XX for an audit for your information; pay us $XXXX to tell other people we audited you"

developer2 · on Sept 28, 2016

It's exactly this. A private audit is generally cheap. The truth with these is that the audit really doesn't mean anything - you will often find such audits performing such a superficial sweep of the software that even glaring problems may not be found.

Publicly releasing an audit puts the auditor's reputation on the line. As such, the audit will always be performed very seriously, digging into the code and scrutinizing every possible angle - at a level usually not even remotely done for a private audit.

All said: a private audit means squat all.

qwertyuiop924 · on Sept 28, 2016

Indeed. That's why I'm not entirely trusting of it yet.

old-gregg · on Sept 28, 2016

Maintainer of Teleport here.

qwertyuiop924: I'm with you. When it comes to security it pays off to be skeptical and pragmatic. We're on the hook of keeping it robust, simple and secure for years to come. So eventually, with the help from the OSS community, we expect it to earn your trust, hopefully sooner than in 17 years though :)

qwertyuiop924 · on Sept 28, 2016

If the maintainer is understanding of your skepticism, it's already a good sign :-).

kordless · on Sept 29, 2016

> And in a piece of vital security infrastructure, trust is everything.

Trust is based on an individual's experience over time. It can also be based on statistical observations over time. Statistically, nearly every single piece of software in use today has had a security hole in it at some point in time and many still have them. Audit or not, openssh is included in this fact. Today's infrastructure software is far too complex to ever be secure past a certain statistical threshold, at least without reinventing the way the Internet works.

Your comments are curious given Teleport just came out. One comes to trust something by spending time with it first. You can't apply anti-trust to something just because it's not the same as something you do trust.

GrinningFool · on Sept 29, 2016

>Your comments are curious given Teleport just came out. One comes to trust something by spending time with it first. You can't apply anti-trust to something just because it's not the same as something you do trust.

I don't follow this line of reasoning. I was with you for the opening sentence - trust is a thing earned over time - and then you lost me.

Statistically it is a near-certainty that OpenSSH does have undiscovered security holes. The trust that the maintainers have built over years makes me comfortable with that, because I know they will be rapidly addressed in a reasoned out manner.

I don't see gp as applying anti-trust just because it is different from OpenSSH - I see it as the maintainers of Teleport not yet having earned the trust that the maintainers of OpenSSH have.

And statistically as a new product with a new code base, Teleport is likely to have more security issues than an established product actively monitored for security flaws in the same space.

So the issue is not a matter of "new and different is bad", so much as "new product and (relatively) unknown community behind it are unproven". Those two factors combined make it inherently less trustworthy in a space where trustworthiness is critical.

kordless · on Sept 29, 2016

The parent post is presenting conflicted information about trust. I find that both interesting and disgusting. I'm disgusted (fear + boredom) with it because I fear for the human race moving forward with existing centralized infrastructure. The boredom is a side effect of it involving trust and me not trusting it. I remain interested in talking about trust, however, so I'm conflicted myself. :)

This is the basic argument they are making:

> I don't trust it as much as I trust openssh...And in a piece of vital security infrastructure, trust is everything.

This is an implicit statement that all infrastructure tends toward full trustworthiness by the increasing use of legacy software. It also implies that trusted infrastructure affects everyone (which is true). Together, this is a logical fallacy - infrastructure will never be 100% trustworthy and when it breaks it will not affect everyone the same. You yourself made that observation regarding holes in OpenSSH. It is anti-trust, or "inverse trust" (if one is a stickler for terms) when one attempts to publically imply their individual/internal trust of a thing is based on their societal/external trust of another thing which is being used in a mutually exclusive way. That itself is an anti-competitive practice, whether intentional or not.

I think calling this flawed logic out is important when the subject is trust and security. There's a reason for the limit of causality in our universe. The lesson we can take from that limit would be to not apply inverse internal trust to something to prevent it's external use when using it is required to trust it.

qwertyuiop924 · on Sept 29, 2016

Allow me to clarify my opinion:

I fail to see what's wrong with saying that a new piece of software hasn't yet earned the trust that an older piece of software has. However, I never said that it could not earn that trust. That's an assumption on your part.

And I don't distrust Teleport because OpenSSH exists, I distrust Teleport because it hasn't proven its reputation yet. If OpenSSH didn't exist, I'd be saying the same things. The difference is, because OpenSSH does exist, I can make a comparison to piece of infrastructure I do trust.

And I do agree that we cannot move forward with existing infrastructure, I just don't trust this particular piece of new infrastructure yet.

qwertyuiop924 · on Sept 29, 2016

It just came out, so I don't trust it yet. I'm waiting to see what security flaws arise, how the developers respond, etc.

And yes, you do start out with anti-trust if you're replacing something I trust, because you then have to be at least that trustworthy to replace it.

NickBusey · on Sept 28, 2016

This project was posted here 6 months ago, but has since evolved quite a bit. Version 1.1.0 was recently released, so I personally wanted to hear HN's thoughts on this more stable version of the project. (I am not affiliated.)

sciurus · on Sept 28, 2016

Link to previous discussion: https://news.ycombinator.com/item?id=11355976

jdc0589 · on Sept 28, 2016

ditto. I evaluated it back then and though it could be fantastic at some point, it was just missing to many small things. Admittedly these were things we could have solved ourselves, but it seemed like they would be added to the core product at some point, so I opted to hold off adoption for a while.

e12e · on Sept 29, 2016

Agreed, this looks like a pretty good evolution on plain (open)sshd now.

Still does seem to sit at a somewhat uncomfortable place between being an augmented ssh that's easier to manage, and a completely separate auth/authz solution (in some ways a little like installing saltstack or puppet etc, which essentially grants a daemon arbitrary/root privileges and ability to do configuration changes on the "outside" of the unix user/group-framework).

Things like:

"Also, people can join your session via terminal assuming they have Teleport installed and running. They just have to run:

> tsh --proxy=teleport.example.com join 7645d523-60cb-436d-b732-99c5df14b7c4

!!! tip "NOTE": For this to work, both of you must have proper user mappings allowing you access db under the same OS user." (my emphasis)

I don't really see this as a big problem -- but I'd prefer a tool that basically magically took care of generating short-lived user-certificates for ssh (and it does indeed look like teleport does this now, in a rather nice and well-documented[1] manner) -- but I'd like to keep the user-database in some sane place, like ldap, especially as long as teleport use is still subject to pam/unix user/group in addition to internal user/authorizations.

I also don't particularly like having to use password login (even with two-factor) - but I suppose one could either a) wrap the web bit in ssl client auth (but now you need two CAs, one for ssh certs and one for x509) - or better (for my use-case) create a daemon that works as a ssh daemon and can be configured for normal ssh access (set of known keys, limit access to internal network/whitelisted IPs) -- that can be used for initial auth. Or maybe just replace the current web auth with a specialized ssh daemon that uses standard ssh+two-factor login (something like[2]).

All that said, I really like the look of this project in its current state, and I'll strongly consider using it over setting up a "bare" (open)ssh CA system.

[1] https://github.com/gravitational/teleport/blob/master/docs/a...

[2] Note, I'm not a great fan of the google auth pam module, but have successfully tested using OAUTH -- and it works well both with pre-generated passwords or google auth or similar TOTP-apps. An added benefit is that it can also be enabled for sudo/su elevation, rather than just relying on password for that.

See eg: https://jonarcher.info/2015/07/hardening-ssh-with-otp-for-2-...

and/or: http://spod.cx/blog/two-factor-ssh-auth-with-pam_oath-google...

Incidentally my fight with OATH/pam/various python tools to generate qr-codes allowed me to overcome the seemingly insurmountable challenge of pairing the standard Google Authenticator App with the Star Wars: Old Republic TOTP secret in order to enable 2-factor login. It appears that makes me a member of a rather select club of swtor-players....

moondev · on Sept 28, 2016

Why is ssh even needed on distributed clusters? Shouldn't provisioning the cluster be automated and the nodes be immutable by design? I can only imagine what a nightmare a huge fleet of special sniwflake machines woukd be to manage. Cattle not pets

mdeeks · on Sept 28, 2016

I firmly believe No-SSH is a goal you should always strive for but never actually achieve. There are always cases where you need to do really detailed troubleshooting that requires things like tcpdump, a debugger, or even running lsof (or other expensive command that you can't afford to run regularly and log).

Some random user with some edge case will put one of your nodes for your service into a bad state. Throwing away the machine will just lose the state and push the user somewhere else.

Also centralized log and metrics stores get real big and expensive really fast. Sometimes you simply can't afford to ship everything to it. So you'll find yourself putting detailed debug logs on the "free" ephemeral/local disks while info logs go to your central store.

cakeface · on Sept 28, 2016

Direct SSH access can be an invaluable tool for debugging production issues. It is great to be able to SSH in and check the tcp dump, view the running processes in detail, attach gdb and get a memory dump. These types of tasks will never go away. We have great leaps in orchestrating remote application management but when you get down to issues at the bottom of your stack you'll always need to directly access the machine. I like to tell my developers that "the stack goes all the way down". A bug in Linux networking or even a CPU error is still a bug.

mitchty · on Sept 28, 2016

Yep, sometimes you have to debug things live with things like perf/gdb echo c > /proc/sysrq_trigger.

If not, well congratulations, but if I got told I can't get the above to debug things I'd start question why this "pet" infrastructure lacks basic debugging ability.

moondev · on Sept 28, 2016

Smart health checks and logging should take care of that and remove the instance automatically. You can also spin up a canary machine to "live" debug. I'm referring to distributed clusters not a single machine taking on all the traffic.

icebraining · on Sept 28, 2016

Smart health checks and logging should take care of that and remove the instance automatically.

What if the bug is corrupting data or sending incorrect results to the client? Even if you detect and kill the instance, you still have a problem to fix. And even if you can cleanly kill the instance and redirect the request, you can't avoid the latency hit from having to re-process it.

You can also spin up a canary machine to "live" debug.

You can reproduce the code and data, but how can you reproduce the exact state? You can't log everything that happens in a machine.

vasco · on Sept 28, 2016

This is a bit naive. If you've ran a production system with any decent traffic and never needed to SSH into machines, congrats. I haven't and I don't know anyone who has. You might need to go in for anything from auditing to troubleshooting, even if it's rare.

PS: How is your automated provisioning system reaching your cluster if not by SSH?

SysArchitect · on Sept 28, 2016

OpenStack + cloud-init + SaltStack

chetanahuja · on Sept 29, 2016

"SaltStack"

Saltstack is either using SSH to communicate or opening it's own port. I'd much rather trust an open ssh port for securely provisioning/management than allow any other piece of software to keep a port open (upto and including TLS based protocols).

SysArchitect · on Sept 30, 2016

SaltStack has an agent that communicates with a master on a different server. The agents on the clients don't need an open port (other than egress).

This allows me to have one central server that is well secured and protected that allows ingress from the remote hosts, and then all the clients reach out to the master to get their tasks.

moondev · on Sept 28, 2016

logging and metrics are sent elsewhere to be consumed and queried.

I build machine images with packer that get provisioned during the deployment pipeline. That single machine is then put into a cluster with x number of copies. If one dies I don't care, the cluster provisions another automatically.

Sanddancer · on Sept 29, 2016

How do you debug things when you discover that machines are randomly dropping logs for x minutes?

colemickens · on Sept 28, 2016

>PS: How is your automated provisioning system reaching your cluster if not by SSH?

Not sure about moondev, but Terraform + Cloud-init + Container Orchestrator means that I basically never need to SSH into my nodes, except in extreme/rare circumstances.

Thaxll · on Sept 28, 2016

Except when you need to troubleshoot complex issues.

vacri · on Sept 29, 2016

> except in extreme/rare circumstances.

So you do need ssh after all?

colemickens · on Sept 29, 2016

I said "I basically never need to". Not that I never need to. But yeah, I basically never, ever need to. Short of needing to take a coredump or docker shits the bed, I don't really ever need to log into my nodes.

I guess that's an offensive thing to point out judging by my score...

johnbellone · on Sept 29, 2016

So you need SSH.

colemickens · on Sept 29, 2016

Are you kidding me?

>I said "I basically never need to". Not that I never need to. But yeah, I basically never, ever need to

It is interesting to imagine why this innocuous comment has ruffled so many feathers.

colemickens · on Sept 29, 2016

Boy, I sure hope I'm helping someone really sad feel better about themselves. Downvote away, it's not going to change a damn thing.

moondev · on Sept 28, 2016

I do love me some terraform. Mainly use spinnaker these days

amalcon · on Sept 28, 2016

Let's say you have a thousand machines sitting around, doing whatever, and suddenly you notice one of them acting strangely. Maybe it's just a little slower than the others, maybe it's crashing, maybe your monitoring system can't pick it up, maybe it's even producing incorrect outputs. How do you tell if it's:

A) A hardware issue (thus requiring hardware maintenance)

B) A software issue triggered by the particular workload of this machine (thus requiring a software change)

C) A network issue (thus requiring network maintenance, possibly phone calls to ISPs, etc)

D) Something else (thus requiring who knows what)

The most normal way in the real world is, have a sysadmin and a dev sit down together, SSH into the machine, and poke things until they arrive at the root of the problem (possibly with additional process involved if it's a really big system).

Then, through normal (non SSH) processes, perform the required maintenance and update your monitoring system so that it identifies problems of that class.

moondev · on Sept 28, 2016

E) Your health checks notice the machine is acting up and it is removed from the cluster and another provisioned. In a distributed environment all of your nodes should be designed to be immutable and stateless. That's the advantage of running them at scale horizontally

amalcon · on Sept 28, 2016

So your plan is to buy a new machine as a result of a software issue, a network issue, or another problem that has nothing to do with the hardware like a corrupted image?

When you have 100 machines, you can probably afford to do this. It would be silly, because when you only have 100 machines, most of your problems will be software or network issues. Still, you can do it and it probably won't cost you a kidney. When you have ten thousand, you can't.

And this is even assuming that your monitoring system notices the problem. All too frequently, you'll notice the problem in some other way. Monitoring systems are designed to detect the things you knew might happen, and problems similar to those. There will be blind spots, it's just unavoidable.

serge2k · on Sept 28, 2016

Then you notice you're churning through machines pretty often and start to wonder if maybe you should actually fix that bug instead of just ignoring it.

We aren't talking about using SSH to go in and fix a machine so things stay up. It's about figuring out why something is happening.

dsr_ · on Sept 28, 2016

When all of your nodes are immutable and stateless, you have a really tough time doing logging, new account signup, and storing any information at all on behalf of your customers.

So not all of the nodes can be immutable and stateless.

microcolonel · on Sept 28, 2016

In fact, there is no such thing as an immutable or stateless computer; it violates the theory of computing. Just a bunch of buzzword nonsense people say so that they can justify spending their entire life asking for too much money to try making the machine behave as though it is stateless.

With real, physical machines; you could have a cooling issue (which you should fix), a manufacturing defect (which you should RMA), a networking issue (which you should fix). All of these can easily be unique to an instance, and all of them need analytical attention; especially when you're paying for all of those machines "horizontally" and "at scale".

BraveNewCurency · on Sept 29, 2016

> In fact, there is no such thing as an immutable or stateless computer

Nobody calls the computer immutable/stateless. Immutable/stateless is an architecture pattern, implying that nothing important is stored or shared between calls.

It doesn't say that you can't cache info or write log files, etc.

> Just a bunch of buzzword nonsense

Well, what's the alternative that you propose? Should my HTTP requests leave lots of state-full files laying around? Is it easier to manage deployments when every box has different files laying around depending on how old it is?

Also, remember these techniques are for people "at scale". If you only have 1 or 2 boxes, these techniques may not be useful to you, so feel free to ignore them. I've got a 1000 boxes, and have seen first hand why it's a bad idea for each box to be able to be a unique snowflake.

Even if you have physical machines, you shouldn't put your production image on it to debug it -- you should put a debug image on it that allows SSH. If it's a hardware problem, it will be easy to find. If it's a software problem -- well, you have 1000 other boxes running that software, so you'll see it again. Just add better logging.

amalcon · on Sept 28, 2016

I mean, you can have an immutable stateless computer, but it's just called a "boolean logic circuit". You give it an input, it gives you an output, that's all.

In practice, you'll of course actually be using a computer with RAM. Most likely, it will even have a Von Neumann architecture. That leaves you open to everything from race conditions to hardware failures to cosmic rays putting a machine into a bad state.

drdrey · on Sept 29, 2016

Stateless depends on the context, it doesn't violate any theory of computing. Here are some elements of stateless systems:

  * all the state needed for serving requests is transient, not something that needs to be persisted
  * any particular instance can disappear without affecting the observable behavior of the system
  * the system is infinitely scalable: it doesn't matter if the cluster is comprised of 1, 100 or 1000 instances

niij · on Sept 28, 2016

THROW MORE BUZZWORDS AT IT

sciurus · on Sept 28, 2016

Even Netflix, the poster child for this style of managing systems, allows engineers to SSH in to machines. In fact, their system and Facebook's system for performing SSH auth were discussed on HN recently.

https://news.ycombinator.com/item?id=11746425 https://news.ycombinator.com/item?id=12482212

imsofuture · on Sept 29, 2016

Provisioning the cluster should of course be automated. Does that mean you'll never need to ever access a node for any reason? Of course not.

colemickens · on Sept 28, 2016

So Teleport always records the SSH session? Doesn't that get expensive at times? Sometimes I stream logs over SSH or use `watch` on a fast interval. It's enough that tmux and ssh take a non-trivial amount of CPU, even just on the receiving end. I just wonder if that proxy recording becomes a bottleneck.

old-gregg · on Sept 28, 2016

Teleport is actually a Golang library, and that's how it's used internally at Gravitational, where session recording can be turned on or off [1] based on a use case.

The pre-built teleport daemon as you see in `tool` directory does not have a config switch for turning recording off yet, we should add it.

[1] https://github.com/gravitational/teleport/blob/master/lib/se...

sandstrom · on Sept 28, 2016

Something (vaguely) similar can be achieved using Vault and an ssh-helper[1].

It enables one time passwords and password issuing can itself be tied to 2-factor, etc.

[1] https://github.com/hashicorp/vault-ssh-helper

INTPenis · on Sept 29, 2016

Do you know anything about recording ssh sessions too perhaps? That was one of the more interesting points of this software in the OP.

I would like to see a proxying authentication layer for both ssh and rdp to manage accountability and access centrally.

nixgeek · on Sept 28, 2016

No support for "plugins" as far as authentication, or to elaborate a bit, how do you go about running the 'auth' component in multiple VPC and have some degree of sync? Perhaps a use case for an underlying LDAP directory, or ..?

alexk · on Sept 28, 2016

We support OIDC connectors, so you can plug in LDAP using https://github.com/coreos/dex as one of the providers, or simply roll a new OIDC provider customized to your needs.

Sanddancer · on Sept 29, 2016

Why OIDC? Why not use something much more standardized like PAM?

alexk · on Sept 29, 2016

Mostly because we wanted to support Google auth out of the box and OIDC is a good way to get this + give options for pluggable auth to everyone else.

vacri · on Sept 29, 2016

Pet hate: projects that use existing common(-ish) words. 'ssh' is instantly recognisable. 'Teleport'... meh. Make your own word, be unique.

ChoHag · on Sept 29, 2016

Why do these fucking quasi-products never actually say what the fuck they DO? "<this> replaces sshd, possibly the most critical piece of software your server runs, with a magic black box".

There's nothing in the rather meagre list of suggestions at the top of the readme that can't be done already and 0 detail on how they intend to achieve it and why their dubious methods require wholesale replacement of your, did I say it's critical?, ssh daemon.

SSH is definitely not a tool I'll be replacing any time soon with the latest fly-by-night product from CADT BikeShedders Inc.

INTPenis · on Sept 29, 2016

There was one very interesting feature, recording of sessions.

There is an increasing demand in large organisations to manage server sessions better.

That is to have accountability and manage access centrally.

So far I've only reviewed one solution, Cyberark, they promised to manage all our server sessions centrally with 2FA, recording ssh and rdp, and more. But in the end we weren't motivated enough to pick them.

I haven't seen any open source offering to do this.

Of course this product would be outside of your normal SSH server, as an additional proxying layer and not a replacement for ssh. Just saying that there was one very interesting point in that repo, I wouldn't replace OpenSSH ligthly either.

HeadlessChild · on Sept 28, 2016

This just feels like a lazy solution.

NickBusey · on Sept 28, 2016

I'd be interested in hearing about non-lazy alternatives other than manual key shuffling.

jessaustin · on Sept 28, 2016

Actually certificates can be used with openssh as well:

https://news.ycombinator.com/item?id=12518902

I'm not sure this isn't too "lazy" for some tastes, however.

alfalfasprout · on Sept 28, 2016

Kerberos, RADIUS, you name it.