Missed my favourite part of learning the mainframe: where the enter key is.
Return and Enter are two different keys, but, on most modern systems, they perform a similar function. On z/OS, Return moves down a line (similar to Tab, but ignoring all entries on the current line) and Enter actually sends the data off.
Once you get used to it, it's really no different to the Linux or Windows command lines. It's certainly dated, but that's what you get from running a system designed to be fully backwards compatible (with 24-bit, 31-bit and 64-bit addressing modes) that can continue to run software that's over 40 years old.
[For reference, the mainframe originally had 24-bit addressing. When IBM wanted to add 32-bit addressing, they found that people had been using the remaining byte to store other data, such as flags. So, to avoid breaking customer applications, the 32nd (high-order) bit is used to identify whether the address is 24-bit or 31-bit]
((And yes, for the record, I am an IBMer, working in a z/OS product that's over 40 years old))
The ISPF editor is actually more than decent. About 20 years ago my father was writing COBOL (on both AS/400, now iSeries, and PC) using SPFPC, an MS-DOS editor inspired by ISPF. The line command/primary command mechanism is really no more quirky than vi's movement and insert modes, and it makes a lot of sense for languages like COBOL.
SEU, the built-in editor on the iSeries/IBM i, is however rather shitty. It does actually do some linting, but there's no syntax highlighting, and the 5250 interface never really got improved as much as 3270, so it's 24x80 or 27x132. Granted, you're supposed to use an Eclipse-based IDE since SEU is no longer supported, but for just looking at stuff really quickly, a fast editor is preferred.
I think SEU is far better than you seem to be giving it credit for. In fact, I used it for decades and my only real complaints about it started when IBM stopped actively maintaining it. Over the years I also tried using various other "modern" editors, only to eventually cast them aside as being too unstable, buggy, or just slow and clunky (like Eclipse). In fact, I can only recall one particular instance where using such an editor (Eclipse in this case) actually proved unquestionably advantageous over using SEU, but even then I quickly got annoyed at some of the stupid (IMO) implementation details of it.
When Java was first available on the AS/400 (ported by a man at IBM UK - I have forgotten his name), I tried to use SEU as my editor (me being a PC programmer). OMG worst experience ever.
The reason why things are the way they are on mainframes, and how you should think intuitively, becomes a lot clearer once you realise that they are a long-evolved version of the very first mechanical punched-card processors:
In fact, I'm almost willing to bet that the foreignness of mainframes to the average developer is due to mini/microcomputers having become dominant and taken a very different evolutionary path; Linux and DOS/Windows have far more in common with each other than mainframe OSes, despite their huge differences, because they evolved from mini/microcomputers and UNIX.
> Linux and DOS/Windows have far more in common with each other than mainframe OSes, despite their huge differences, because they evolved from mini/microcomputers and UNIX.
Surely Linux and the Windows NT line (don't confuse the Windows 9x line, which evolved indirectly from DOS, with the Windows NT line) have much more in common with each other than mainframe OSes, but Windows did not evolve from UNIX:
The Windows 9x line evolved indirectly from DOS (cf. [1]). DOS was inspired a lot by CP/M.
The Windows NT line is a spiritual successor to VMS (cf. [2]). The kernels of both operating systems were designed by Dave Cutler, who did not like UNIX (cf. [3]). Indeed, Windows NT has I/O Completion Ports (IOCP, [4]) as a - in my opinion - better I/O system than the UNIX process input/output model.
Not sure what you believe to be 'indirect' about that relationship. That Windows codebase started out running on DOS and ultimately wound up so closely coupled that they shipped together in the same box as the same product.
> Windows did not evolve from UNIX:
Windows, through its DOS lineage, does indeed have some roots in the Unix tradition. (This should be unsurprising, since Microsoft was one of the more successful Unix licensees in the early 80's.)
To put Cutler's dislike of Unix into perspective, DOS (and Windows) had close to a decade of development prior to his involvement. (DOS shipped in '81, Cutler got involved in ~'88, his first product shipped in ~'93, and the old DOS-based Windows wasn't fully deprecated until 2001.)
MS/PC/DR-DOS was also inspired by Unix. That's where the handle-based API, the hierarchical directory structure, and indeed ioctl, all of which appeared in MS/PC-DOS version 2, came from.
On top of that, the Linux ecosystem has evolved in the cloud space to be more like VMS, with things like metering, more security restrictions, clustering, and so on. The "unnecessary" stuff that was integrated into VMS from the ground up got bolted onto Linux later.
I've managed to make do with VSCode, sshfs, reading the iconv manpage and using the USS submit command for job cards. The current workflow has SDSF open in c3270 to the left and VSCode loaded with Cobol extensions to the right. Saving, compiling and running my code is only a few keystrokes and it's not too hard. There are even full IDEs and integration packages, as other users have mentioned.
It's usable, but a little different. I mean, the guys at Rocket Software have ports of some of the basic Unix packages for z/OS, including the bash shell; just adding that to USS improves quality of life tenfold. Many features are doable through USS, but ISPF will always be the main interface.
However, the people who develop for z/OS are often people who are familiar with Interactive System Productivity Facility (ISPF). It's a bit like whether people who are proficient in Vim or Emacs really benefit that much from an IDE.
But again, the people who program mainframe learn it through that environment. It would be interesting to hear from people who have used these sorts of set ups.
I've been working on Mainframes for the past 8 years full time, and several internships before that. I often go several days without needing to use ISPF. I use
https://compuware.com/workbench-mainframe-modernization/
for dataset editing, job submission, etc. I find the Unix shell you access through ISPF to be very difficult to use. Luckily SSH is available on z/OS, so if I need to do Unix stuff I can just SSH in with the client of my choice.
The biggest difficulty I had when getting started was understanding JCL. I had come from a Linux background, so I was used to executing commands with arbitrary file names, looping, etc. It took me a while to understand how JCL worked. Once I understood that you have to define datasets with things such as space allocation and disposition before using them, and that JCL is a series of sequential steps with the output from a prior step available to a later step, a switch flipped and I went from struggling with JCL to fairly comfortable in a couple of days. I'm not sure how much of this was the fact that I was an intern at the time, so didn't have a lot of programming experience, and how much of it is the platform. I find that once you get comfortable with the basics of JCL, the structure it forces you into actually makes it easier, even though it is limiting in some ways.
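To make that concrete, here's a minimal two-step JCL sketch of the pattern described above (the job card parameters are site-specific, and PROG1/PROG2 are hypothetical program names): STEP1 defines a temporary dataset up front, with explicit space and disposition, and passes it on; STEP2 then reads it and deletes it.

//MYJOB    JOB (ACCT),'TWO STEP DEMO',CLASS=A,MSGCLASS=X
//* STEP1 runs a (made-up) program that writes OUTDD. The dataset
//* is defined before use: space via SPACE=, record format via
//* DCB=, and disposition via DISP=(NEW,PASS).
//STEP1    EXEC PGM=PROG1
//SYSPRINT DD SYSOUT=*
//OUTDD    DD DSN=&&TEMP,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(TRK,(5,5)),DCB=(RECFM=FB,LRECL=80)
//* STEP2 runs after STEP1, picks up the passed dataset as its
//* input, and deletes it when the step ends.
//STEP2    EXEC PGM=PROG2
//SYSPRINT DD SYSOUT=*
//INDD     DD DSN=&&TEMP,DISP=(OLD,DELETE)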
The issue I still have with the Mainframe is documentation. Unlike most programming questions a google search does not turn up Stackoverflow results. I find I spend a lot of time going through random forums or reading the IBM doc. I don't find IBM doc easy to understand, but the more I have to deal with it the easier it gets.
> I find the Unix shell you access through ISPF to be very difficult to use.
Hehe, I once ran tail(1) and forgot to enter the file name, so tail read from stdin, and Ctrl+D did not work, so my shell was stuck waiting for input until the system was IPL'ed. Fun times... ;-)
This is just between you and me, OK, but a couple of years back I managed to do something similar on an IBM i, in its QShell (AIX) environment. And it felt bad, too, because such an unrecoverable glitch is almost unheard of in iSeries land!
> The issue I still have with the Mainframe is documentation. Unlike most programming questions a google search does not turn up Stackoverflow results. I find I spend a lot of time going through random forums or reading the IBM doc. I don't find IBM doc easy to understand, but the more I have to deal with it the easier it gets.
You're not alone my friend, those IBM docs certainly are not easy to read/understand.
> You're not alone my friend, those IBM docs certainly are not easy to read/understand.
Just as for Unix, Linux, Windows, and so on, there's an IBM "culture" (terminology, conventions, etc.) that you have to get used to. (It often helps to know some of its history, too.) But since I grew up professionally with one foot in the IBM world I usually don't have much trouble at all reading their docs.
I wonder why they haven't developed a more human-friendly successor to JCL. I get that IBM's all about backward compatibility, but would it kill them to add a more accessible alternative?
I once spent three days figuring out how to deal with an invocation that did not fit in an 80-character line. sigh
There was OCL on the IBM S/34 and S/36 back in the day, and there's CL on the IBM i today, which dates back to the earliest S/38 days (circa 1978). It's been some decades now since I've dealt with JCL, but as I recall both of those other languages make JCL look like child's play in comparison. Also, don't forget about Rexx!
Ugh, I tried to learn Rexx and I really did not like it. Maybe it is more "at home" on a mainframe, but playing with Regina Rexx on my desktop was no fun.
(Rexx has since developed an object-oriented dialect that might be more convenient to use.)
> that doesn't mean IBM shouldn't invest in usability.
For whom? They shouldn't necessarily spend a bunch of effort adapting their product towards the Unix world, particularly to the extent it compromises their product for their primary customer base.
Part of the problem with mainframes is that it has become really difficult for companies to find employees familiar with the environment. And learning the environment makes learning Unix look like a cakewalk.
This is a nice reminder about how one feels when one meets a piece of technology for which one's intuition can't offer anything useful, many "common sense" assumptions end up incorrect, and you don't even know the right words to feed to a web search engine.
I'll try to memorize this feeling, and remember it every time I try to explain something technical to people outside that field. Maybe it will help me explain better.
My introduction to z/OS mirrors the poster's. A week of trying to google random keywords, copy JCL, and waffle about the system trying to get anything done. Eventually I copied out some JCL from a frame of a paused YouTube video, and it took another four hours to track down a system library I needed. But the entire system is fascinating and it kept me going. The once unappealing 600-page manual on IBM Enterprise Cobol became the de facto documentation, reading Redbooks and listening to the Terminal Talk podcast became valuable sources of information, and the acronym soup started to be a little less fuzzy. Stack Overflow became nearly useless.
Then I found Master the Mainframe and got to play with a properly maintained LPAR with some (admittedly handhold-y) guides. Joy! Non-crusty versions of z/OS. Did you know you can generate JSON with Cobol? I sure as hell didn't! I've managed to bolt this onto a webservice that interfaces with DB2. (z/OS Connect is a better way to do this, though.)
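For anyone curious what that looks like, here's a rough Enterprise COBOL sketch (you need a reasonably recent compiler with the JSON GENERATE statement; the record layout and names are invented for illustration, and if I remember right the generated text comes out in Unicode, so in practice it gets converted or passed along rather than DISPLAYed as-is):

       IDENTIFICATION DIVISION.
       PROGRAM-ID. JSONDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical record to be rendered as JSON
       01  CUSTOMER-REC.
           05  CUST-ID    PIC 9(6)  VALUE 123456.
           05  CUST-NAME  PIC X(20) VALUE 'EXAMPLE CUSTOMER'.
       01  JSON-TEXT      PIC X(200).
       01  JSON-LEN       PIC 9(4) COMP.
       PROCEDURE DIVISION.
      * Convert the group item into JSON text; COUNT IN returns
      * the number of bytes generated.
           JSON GENERATE JSON-TEXT FROM CUSTOMER-REC
               COUNT IN JSON-LEN
               ON EXCEPTION
                   DISPLAY 'JSON GENERATE FAILED'
               NOT ON EXCEPTION
                   DISPLAY 'GENERATED ' JSON-LEN ' BYTES OF JSON'
           END-JSON
           GOBACK.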
It's only been a month or two, but the amount of time I spend going against my intuition is beautiful. It's really made me reconsider the way I use/design computing facilities in other avenues. I'm not a professional or employed programmer, but this is the most fun I've had since playing with distributed computing, and in Cobol no less. I even set up a 3270-styled blog because of it.
I remember when someone referred to Unix as the User Hostile Operating System.
They obviously never used MVS (z/OS's ancestor), OS/400 (now I/OS or something like that) or Burroughs' MCP (when an OS lends its name to a movie super villain, you have to respect it).
BTW, there's an interesting backstory on how MCP ended up being Tron's villain: Bonnie MacBird, who co-authored the story with Steven Lisberger, is Alan Kay's wife and Alan Kay was working at Unisys (or Burroughs) at the time she wrote it, as well as being an advisor to Lisberger and his partner and producer Donald Kushner.
> IBM uses special and completely unintuitive names for basic concepts…. because OF COURSE THEY DO.
As I recall, it's called "Bluespeak", and that sort of thing is pretty common actually. I was educated in networking at a Cisco netacad, so I use Cisco terminology that is apparently not universal.
Programming languages do this too for some reason: Sum type, tagged union, discriminated union, variant...
> IBM uses special and completely unintuitive names for basic concepts
Once upon a time I worked as a (young) field engineer looking after mostly Intel based kit, peripherals and Novell/Windows/Unix OS support. Our company was subcontracted to look after a bunch of Perle controllers for another maintenance company that didn't have engineering staff locally (these were on two and four hour onsite must-fix contracts). Perle manufactured a range of clones of IBM's 5294/5394/5494 Twinax remote access controllers that you plugged stuff like 5250 series terminals and printers into.
Anyway, I had to go on a training course to learn about the gear, usual faults, etc. But I also had to learn the IBM lingo, such as asking the remote ops folks to "VARY ON" and "VARY OFF" (i.e. enable/disable) controllers when working on them. There were other, now long forgotten, incantations you needed to utter over the phone to IBM ops folks when on site, but the VARY ON/OFF one stuck with me.
As an aside, I also ended up looking after and field repairing a bunch of System/36's[0], in particular replacing hard disks which looked like:
Amazingly a sole engineer could carry out this task in about 30-40 minutes with no need for extra hands. These were well thought out and designed workhorses.
The point that M. Bellotti came close to, but missed, is that "of course they do" is because every field has jargon. The error here is in thinking that one set of jargon is intuitive and normal whilst another set of jargon is "special and completely unintuitive".
The simple truth is that there are many people to whom jargon such as "WIMP", "ISOs", "flat UI", "IIFE", "DOM", "pull request", and "UX" is equally as opaque and foreign as "DASD", "APAR", "SRC", "PMR", and "PTF" (http://jdebp.info./FGA/fix-terminology.html). All are in fact niche terminology.
IIRC, as the story goes the folks who designed IBM's SNA (Systems Network Architecture) went out of their way in order to come up with a whole new set of technical jargon for that. I don't remember why, though.
BTW, I've been around long enough to have had conversations like the following:
Them: "I want to run Lotus 1-2-3." (This was the near-universal spreadsheet standard before Microsoft Excel came along.)
Me: "OK, then first we're going to have to get you a PC."
Them: "What's a PC?"
Then I would have to explain that this was a "Personal Computer". And not just any personal computer, either, but rather an "IBM" PC, in order to distinguish it from an Apple or Commodore or TRS-80 or TI-99/4A or Atari or whatever. And they might respond that they didn't even know that IBM made personal computers. (Which they don't any longer, of course.)
I recently started learning more about git in order to use GitHub for work. I can confirm that "pull request" makes no sense whatsoever before you have a working mental model of git (even now, I don't think I could adequately explain it to someone else).
I would argue with programming languages that it's less bluespeak-esque and rather that they have a more or less one-to-one relation with the mathematical principles they are based on.
Often programming features end up with a math name (matching the element they are based on) and a developer friendly name which in some ways makes life easier (by separating high/low level discussions) and more confusing (everything now has different names which get used based on author/speaker preference).
I like that the kernel is called "nucleus". It's like parallel evolution, many concepts are the same, but they developed independently, so the naming is different.
I did not mind the names, at the time I had plenty of old mainframe hands around, who were actually happy I showed such an interest in their work, so they gladly took their time to answer any and all questions I had. Fun times... :-)
The names "kernel" and "nucleus" are not in fact different, let alone different concepts. "Control Program" is by far the more different name in this regard.
>> IBM uses special and completely unintuitive names for basic concepts…. because OF COURSE THEY DO.
> As I recall, it's called "Bluespeak", and that sort of thing is pretty common actually.
How much of that is because they invented terms for these concepts before our current terms were coined or became ubiquitous? It might be easy to forget, but IBM was once at the cutting edge of computing. They coined terms, others coined rival terms, and it wasn't at all clear whose terms would be ubiquitous in 2018. IBM changing to adopt another computing culture's terms would be akin to metrification: expensive, short term pain to abandon a good-enough system to achieve distant long-term benefits.
This article brings back some happy memories! Well done Marianne.
z/OS, TSO, JCL and the rest are different to what most people are used to but this is where modern IT started, where virtualisation, high availability and serious backward compatibility were invented.
Peel away the layers of technology in a large company and you will often find a mainframe managing the core data of the business.
Another shameless plug: I think the IBM 3270 (the ones with beam-spring keyboards) was the greatest terminal ever and I missed its screen font so much I had to recreate it.
It's now my most popular GitHub project and is included in the Debian repos:
Just a shameless plug: there is very little information in the form of Stack Overflow-like things and getting started with mainframes is, as this article shows, pretty hard. I made this SE proposal that's in commitment stage (where people commit to support the community). If you know about mainframes and are willing to be part of it, you should commit too.
The hard part for mainframes is that unlike Linux and the BSDs, you can't acquire a bunch of 4 year old Core i5 systems for $80 each (or sometimes free) and install a bunch of different OS varieties on them to test stuff in your home office or the bedroom of a bored teenager (example: centos hosting KVM VMs, debian + xen, freebsd on bare metal, etc).
With zSeries it's a bit harder, but for the iSeries (I still count AS/400 as alien enough to be "mainframes"), you can get machines for reasonable prices. The prices drop precipitously when that hardware is dropped from the last OS release.
For Z machines, you can use Hercules as an acceptable approximation. A couple of questions I posted as examples are about Hercules.
There's no legal way for a hobbyist to run anything better than MVS 3.8j on Hercules and even if you're not morally opposed to piracy, z/OS is virtually impossible to find and configure.
There is a Hercules-like product made by IBM that can legally run z/OS, the Z Development and Test Environment. It's expensive, but legal.
Also, a surprising number of concepts of MVS 3.8j are present in z/OS, so what you learn there is not wasted. As the article points out, a lot still spins around 80-column punched cards. If you go for the less bare distributions of MVS, you'll see lots of software that tries to replicate the functionality of newer releases.
As an example, I pinged Cincom Systems to see if they had an old version of Mantis that could run on MVS 3.8 that I could obtain for free (Mantis was the language I used on mainframes). To my surprise, that version is still a commercial product and is fully supported. Things do change very slowly in mainframe land.
Don't forget that probably the only reason you can do a lot of that is because AT&T screwed up so badly legally back in the day, and lost many of their rights to Unix in the process. And if SCO had been successful in reasserting their rights to it not that long ago, you still might not be able to do it.
IBM, on the other hand, has always competently (and sometimes quite viciously) protected their products legally. They also have a massive patent portfolio, or at least they used to.
IIRC, Compaq (remember them?) managed to come up with a clean-room PC BIOS clone, which started that whole ball rolling. Then, after IBM tried to re-establish control with the MCA (which folks pretty much immediately started trying to work around in various nefarious ways), Compaq also came up with EISA, which kind of blew that out of the water. In fact, IBM's original decision to work with mostly off-the-shelf parts in designing their PC (instead of home-grown stuff) pretty much came back to bite them here, but they were still in the PC/server business for quite a long time.
As for their other hardware, these days they're pretty much the only game in town for mainframe and midrange systems. And they jealously guard those markets, as lethargic as these may currently be.
In the 'Allocate New Data Set' screenshot, it says - for the expiration date field - that the formats are 'YY/MM/DD, YYYY/MM/DD, YY.DDD, YYYY.DDD in Julian form, DDDD for retention periods in days or blank'. (I added some commas, which I assume are supposed to be there, but are just line breaks in the screenshot.)
Three things are being erroneously conflated here.
* The Julian calendar, which actually has nothing to do with this field on the panel.
* The Julian day number, which is actually a day count since a point in the 48th century BCE, and again nothing actually to do with the panel.
* The day number of the year, which is sometimes colloquially, but erroneously, referred to as the "Julian day".
The panel is making that very error. It used to be a fairly common error, and you can see its echoes in the likes of "j" being a format specifier in various systems that expands to the day of the year. It is not so common, now. But it still happens occasionally.
It does. In a similar vein, Cobol has the ACCEPT ... FROM DATE (YYMMDD) and ACCEPT ... FROM DAY (YYDDD) constructs for retrieving the date. There's an optional argument for supporting four-digit years, however. Julian dates pop up in some wonky places across the system.
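Roughly what that looks like (the four-digit-year forms shown; the names here are just for illustration):

       IDENTIFICATION DIVISION.
       PROGRAM-ID. DATEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATE  PIC 9(8).
       01  WS-DAY   PIC 9(7).
       PROCEDURE DIVISION.
      * Gregorian date; plain FROM DATE returns a two-digit year
           ACCEPT WS-DATE FROM DATE YYYYMMDD
      * Day-of-year ("Julian") form; plain FROM DAY returns YYDDD
           ACCEPT WS-DAY FROM DAY YYYYDDD
           DISPLAY 'DATE: ' WS-DATE ' DAY: ' WS-DAY
           GOBACK.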
Today is July 31, 2018 in the United States, which adopted the Gregorian calendar per the act of British parliament entitled "An Act for Regulating the Commencement of the Year; and for Correcting the Calendar now in Use" (24 Geo. 2 c. 23) by skipping September 3-13, 1752 (obviously this was prior to the American revolution).
In the Julian calendar, there are leap days every four years. In the Gregorian calendar, there are leap years every four years except every 100 years except every 400 years. That is, 2000 and 2004 are leap years, but 1900 is not a leap year.
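(That rule as a tiny COBOL sketch, since we're in mainframe territory anyway; the year is just an example value:)

       IDENTIFICATION DIVISION.
       PROGRAM-ID. LEAPYR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-YEAR  PIC 9(4) VALUE 1900.
       PROCEDURE DIVISION.
      * Gregorian rule: every 4 years, except centuries,
      * unless divisible by 400 (so 2000 is, 1900 is not).
           IF FUNCTION MOD(WS-YEAR, 400) = 0
              OR (FUNCTION MOD(WS-YEAR, 4) = 0
                  AND FUNCTION MOD(WS-YEAR, 100) NOT = 0)
               DISPLAY WS-YEAR ' IS A GREGORIAN LEAP YEAR'
           ELSE
               DISPLAY WS-YEAR ' IS NOT A GREGORIAN LEAP YEAR'
           END-IF
           GOBACK.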
The centuries 100, 200, 300, 500, 600, 700, 900, 1000, 1100, 1300, 1400, 1500, 1700, 1800, and 1900 were leap years in the Julian calendar but not the Gregorian calendar. The Gregorian reform of 1582 dropped 10 days to realign the calendar, and 1700, 1800, and 1900 have each added another day of divergence since, so today one must subtract 13 days from the Gregorian date to get the Julian date.
Today is July 18, 2018 in the Julian calendar.
Soviet Russia adopted the Gregorian calendar in 1918 by skipping February 1-13; it was among the last countries to do so. This is why the famous February revolution took place in March and why the October revolution took place in November. (Though I noticed Reuters announced it had been 100 years last year on the wrong day.)
Mainframes are weird, but I really doubt they don't use the Gregorian calendar whatever the text on the screen says.
The mainframe infrastructure is both the core feature and showstopper bug of the platform. On the plus side, it has empowered countless core business applications to scale years, even decades, beyond their original development lifecycle. On the negative side, it has created a stagnant cult of mainframe “priests” who oversee a platform that most businesses would rather get rid of, if they could only figure out how.
My startup was acquired by BMC software, one of the largest third party providers of mainframe software, and I also worked for EMC for a time early in my career, so I spent a lot of time hanging around the mainframe space. Two quick anecdotes:
- working on a giant data center move for a big bank, there was a break room filled on a Saturday with operations people from the “open systems” (non-mainframe) and mainframe teams. I was struck by the contrast: the Unix/Windows operations folks were tattooed, pierced, jeans and t-shirts, late 20s. The mainframe folks were short hair, polos and jeans, mid-50s. The two groups self-segregated into clusters on opposite ends of the room, not speaking to each other.
- at a BMC leadership offsite, I found myself at the dinner table with a couple of mainframe engineering and product directors. After a few drinks, we got on the subject of why more tech companies weren’t using mainframes. “They’re so reliable! Why wouldn’t you just go out and rent one from IBM, and then you don’t have to worry about uptime or stability?” I explained that the fashion had become to build the reliability into the application and assume that the hardware was not reliable. This confused them. “That sounds like a huge pain in the ass, why would you bother dealing with that?” So I explained that the cost per compute cycle was so much cheaper that it made it worth it, plus open source, etc., etc.
One of them kept pressing: “but Facebook could just take all the engineers they’re devoting to building reliable infrastructure and shift those people to writing customer facing code!”
The other director stopped him and said, “don’t you see? They’ve already done it. There’s no reason for them to go back now. People have figured out that they don’t need mainframes to get mainframe reliability”. The other director just kept shaking his head and we moved to other subjects.
I think that, within the context of that story, both directors are missing the deeper issue: A company's technology decisions aren't just about the final state, they're also about getting the project bootstrapped, and all the other technology decisions between that point and the final state. So it's not just that Facebook had already done it their way, it's that, along the way, they never had any realistic alternatives that led toward a mainframe solution.
Having never worked directly with a mainframe, I can't really speak to the reliability question. What I do know, though, is that I've spent my entire life living around PC hardware, working on PC hardware, etc, because that's the hardware that's designed to fit into my day-to-day life. So, if I have a new idea, and want to try it out, I'm going to reach for the hardware I already have, because it's basically free. Just like Mark Zuckerberg did.
And at that point, I'm already on a path that leads inexorably toward building the reliability into the software. There's simply never going to be any point at which it makes sense for me to scrap everything I have so far and do a complete rewrite for a different platform.
>I think that, within the context of that story, both directors are missing the deeper issue: A company's technology decisions aren't just about the final state, they're also about getting the project bootstrapped, and all the other technology decisions between that point and the final state. So it's not just that Facebook had already done it their way, it's that, along the way, they never had any realistic alternatives that led toward a mainframe solution.
Absolutely agreed, "path dependency" is a big issue that comes up in a lot of different contexts both in the natural world and human endeavors. Just because from a high level or after the fact someone can identify a much more optimal final result doesn't mean that result was actually realistic to get to, or would be worth moving to from whatever local minimum a project ended up in. I think path dependency plays a major part in what makes "disruption" possible, and in how businesses can sometimes be eaten from the bottom. A lot of companies end up in a state where they have a bunch of end point products but they haven't considered the paths necessary for users to reach that end point. Without a ramp their pipe can slowly empty.
> There's simply never going to be any point at which it makes sense for me to scrap everything I have so far and do a complete rewrite for a different platform.
Yet people still often claim that it makes sense to go to all the time and trouble and expense of moving off of reliable legacy platforms onto other platforms which aren't naturally anywhere near as reliable. Imagine that!
One of my older coworkers told me that IBM used to have a travelling mainframe demo in the back of a truck. As part of this demo, they would start a big computation, and then reach into the mainframe and start pulling out RAM and entire CPUs straight from the sockets, while the thing kept right on computing.
ZFS developers like to do this with hard drives :) But how did IBM do that with RAM? Do mainframes have… redundant RAM? Like literally sticks mirroring others?
But that was only introduced in 2010, so either the timeframe I was guessing at is wrong, or I'm misremembering what component it was. Could have been disk drives maybe.
I have seen one PC-based server that had redundant RAM. I never tried pulling out RAM modules, but I guess there is not much point in having a RAID1-like arrangement for RAM if the machine cannot handle random RAM modules dying without crashing the OS. And it wasn't even a super high-end system.
Memory mirroring is a pretty common feature on basically every x86 server I've used (going back to the Core 2-based Xeons, no experience with older platforms), though I've never tried just yanking DIMMs out and wouldn't recommend trying it.
FWIW, you could do the same thing with a Spark cluster. It might not handle it quite as gracefully, since you'd be taking down entire compute nodes, and probably some of the data storage along with them, but the computation would also be able to continue.
Two of them, I believe - the master for the cluster as a whole, and, for any individual job, its driver.
I wasn't really meaning to say that a Spark cluster is as robust as a mainframe (though maybe someone's figured out some tricks), more that I could totally see some sales folks conducting the same demo using something like Spark.
If you're using Spark's built-in scheduler then the cluster manager is a SPOF. Hadoop docs say you can get an active/standby ResourceManager, not that I've tried it. Spark can also use k8s and Nomad to schedule executors, and those have HA modes as well. I assume Mesos does HA.
You're still boned if the driver dies. I am pretty sure that the driver keeps some important state in RAM so if the node hosting it goes down you have to restart from the beginning, even if the cluster manager restarts the driver.
And there are stories of IBM hardware surviving fires and floods and coming back to life after not much more than a minor cleanup and a reboot. I once personally dealt with an IBM system that got hit by lightning. It crashed and took its own sweet time coming back up but it did come back up, with no special intervention, having only lost a few RS-232 comm cards to the lightning.
Meanwhile, these days you have folks running "state of the art" systems which may just roll over and die for no discernible reason. These may not easily come back up, either, if at all. That's why they have to have so many of them!
Were they pulling out compute resources they knew in advance were not being used or part of the active partition for that particular job? Or did they let potential customers do the pulling themselves?
Reminds me of an IBM technical bulletin I found, somehow, about 10 years ago. It advised that if you were moving from this z9 hardware configuration to that one, you would have 30 minutes of downtime. Implying that other changes required no downtime, which I inferred wasn't stated because it wasn't insanely awesome to mainframe guys; adding CPUs and RAM to a running system was just how things worked.
I was in college at the time and looked at job postings for mainframe programmers because I wanted to work on that. Never found one that required less than 5-10 years experience, not then and not when I've checked every couple years after that. Too bad; I quite liked the idea of working in an ecosystem that starts from "let's make this work every single time" instead of "let's make this work well enough to keep customer complaints to a dull roar".
The sales job Amazon et al did in convincing the world resilience was the job of app developers, not the system vendor, was quite impressive. And while it's true that commodity folks could stand to learn from mainframe folks - serverless looks quite a lot like a forty years on re-invention of CICS if you squint a bit - your story reminds me how profoundly true the opposite is, as well.
I've been part of a team that ran Linux on s390x, and it was a great exercise in demonstrating that however reliable your mainframe, it's still a SPOF. Even if you don't have my bad luck of four catastrophic hardware failures in as many years, your one reliable box still depends on the power, network fabric, storage fabric, etc at a single site. If you want to avoid losing your business when a sparky screws up, you need... several mainframes. At which point you're spending an absolutely astronomical amount to either sysplex, or you're building resilience into your applications, just like a bitty box.
And yes, so many mainframe folks are so very out of touch with the broader world. In around 2016 I had the supposed regional tech expert on zSeries systems lecture me (rather snidely) on how:
1. s390 processors were 20-40 times more performant than Intel processors (demonstrably not true for any workload I cared about).
2. You could not do virtualisation on x86_64. It was impossible. With a straight face, in the year 2016, this guy told me, and clearly believed, that it was not possible to run heterogenous virtualised workloads on an Intel processor. Apparently tens of billions of Jeff Bezos' net worth literally did not exist for him.
There was more, but it was like talking to someone who had been frozen in a block of ice since 1996.
Given what is currently known today (there may be more bombshells yet to come) about x86 system- and chip-level vulnerabilities, both unintentional and intentional, I'm not sure that any sane CxO would allow their data to come anywhere near the platform if they were starting all over today again from scratch. Not that I would necessarily trust any other vendor either (especially not Microsoft), but still ...
Always remember (and don't ever forget) that the only reason most of us have even heard of Intel and Microsoft is because of their relationship with IBM back in the day. They weren't chosen, nor did they ascend to great heights, based on the quality of their products. Rather, IBM chose them mostly for the sake of its own convenience, and then they leveraged that relationship to the hilt.
Indeed, but what is Intel up to these days - at least eight different variations of attack here? And let's not forget about the whole Intel ME thing, either.
BTW, just because a speculative execution attack or whatever is theoretically possible on an IBM system doesn't necessarily mean that it can be carried out in any practical sense, given the overall design of those systems and their general level of built-in security. But IBM can't just sit there and ignore the possibility, either.
Another problem they have these days relates to open source software and such, which they've been porting to their platforms much more lately. If a security patch for that software comes out then they still have to apply it, even if there may be no practical way to exploit it on their systems. And it can be quite unnerving to see long lists of such patches show up on a regular basis for systems which are otherwise generally considered to be rock solid.
If you listen to the folks over at Terminal Talk (http://terminaltalk.net), they routinely question the wisdom of all those server folks having to build out and maintain infrastructure that just comes built-in to the mainframe. And from what I gather a lot of those server folks (or at least their bosses) may be starting to question that now, too.
As for Facebook and Google and Netflix and so on, you have to remember that for all their claims of reliability and such, their stuff really only needs to be "good enough". But good enough generally just doesn't cut it when dealing with things like financials and such.
BTW, I have a colleague who works for a massive corporation - one that is still recovering from a ransomware attack which has so far cost them at least $300 million. They have thousands of servers, and she says that it has now become IT's full-time job just to keep those updated and patched. (There's apparently little or no time these days for silly little things like development and testing and code loads and such.) I didn't ask her if they are still planning to replace their few remaining mainframes (I'm guessing they're in no big hurry now), but she did tell me that upper management currently thinks that "The Cloud" may yet be their road to salvation.
I find this sort of complexity absolutely fascinating: I imagine it is similar intellectually to learning of a new order or family in biology: you understand the mechanisms and underpinning goals, but the mechanisms and results are wholly foreign to you.
Had a similar experience with the Bloomberg terminal, which is a similar evolutionary offshoot. (Think "what if the GUI had never happened, but the 3270 form-style CLI had gained a mouse and graphical representations?")
I mean, DOS had plenty of pseudo-graphical interfaces in the 90s: Norton Commander, Borland IDEs and such, down to the use of the mouse―so it's rather weird seeing people treating that paradigm as completely alien.
I am explaining it quite poorly. It's different from Commander etc because they were (to my mind) essentially WIMP interfaces. You could use them with a keyboard, but it wasn't really the paradigm.
BBG's terminal is fundamentally still keyboard-driven: you can use the mouse, but in the same way you can use a mouse with emacs: you're going to be driven back to the keyboard sooner or later, so you might as well stay there.
There are some introductions on YouTube, but they're all horrible. I'll see if I can find a good one.
I remember doing virtualisation/containerisation for Linux on various platforms (including S/390 and z/VM but also VMware and Xen) at IBM. No matter what the platform was, the PMs would refer to the VM/container as an 'LPAR' (logical partition). Fun times.
Me too! I set up a couple of Linux LPARs on an AS/400 - iSeries running OS/400. The distro was Red Hat for PowerPC. One LPAR was for a Linux/Apache/PHP stack that used the DB2/400 database on the iSeries, and the other was to host an Oracle 8i instance. DB2/400 was a great RDBMS, but there was an application requirement for an Oracle database, so the director of IT at the time said that if a DB is required then it must go on the iSeries, since it hosts the primary DB the company used. I didn't care for his rationale, I just thought it was fun to be working with LPARs on the iSeries.
Hah, I ran Fedora PPC on a Mac Mini around the same time. It was super cool because you could read all the firmware info - serial numbers, memory slots, etc. - as files in /proc rather than using 'dmidecode' like on an Intel machine. Felt way more Unixy.
Unlike so many "modern" systems, these systems are designed to let you put all of your eggs in one basket if you want to, and never even break a sweat! Their built-in level of reliability (they're almost bullet-proof) generally allows this, too. You may very well have legitimate reasons for not doing it, though, including regulatory requirements and such.
The AS/400 descendants are not as bullet proof as the zSeries. They are POWER servers similar to high-end Intel boxes, mostly the same hardware as their AIX machines - RAID, ECC memory, multiple PSUs but, IIRC, won't easily recover from failed CPUs or anything like that.
I'm not going to disagree with this, at least not concerning the typical POWER i box. But for the higher-end boxes at least (the ones that can handle 16 LPARS or more - or at least they could back in the day), I expect that there's considerable redundancy built-in. I doubt it's to the level of a higher-end zSeries, though.
It's been a long time now since I've managed hardware so I don't really know what the current situation is. But I happen to have ready access to a hardware document from around 2008 (the last time that I worked with such large system), and it lists a whole slew of RAS features that were copied from the mainframe. I expect that if I tracked down the corresponding document for a current high-end system that it would list even more.
That said, it was generally my experience that the whole platform family (low end to high end, new systems and old) was just rock-solid reliable - not at all like "The server crashed again - time to reboot it!" situation I usually found on the Wintel side of things. Plus stuff like malware was practically unheard of. And I've worked on systems that at any given moment might have thousands of users on them, and maybe tens of thousands to hundreds of thousands of jobs. From an operational perspective this might not have necessarily been the best idea, for a variety of reasons, but at least the system could handle it easily. And much like mainframes, unexpected outages were basically unheard of - if they ever happened they were extraordinary, jaw-dropping events. I'm not claiming the platform was/is perfect, though.
Whilst the ways that one does command-line stuff and full-screen editing with block-oriented (rather than character-oriented) terminals are an important difference to learn, there are, of course, things here that are not different.
* There's still "object" code, and a "link (edit)" step.
* The function key mappings such as F3 for Exit are Common User Access standards, and were to be found in some Microsoft and IBM software for PC operating systems in the 1980s; most prominently perhaps in the various "E" editors and clones thereof available on PC-DOS and OS/2.
* The underlines for the menu item hotkeys are also Common User Access things, as is F10 for bringing focus to the menu.
M. Bellotti would gain by learning about the "TE" line command. Xe would also gain by not abusing the word "legacy", especially since the so-called "legacy paradigm" of IBM's "panels" on block-oriented terminals is pretty much the same paradigm as forms on a WWW browser. (-:
FWIW, this book ("Fake Your Way Through Minis and Mainframes" by Bob DuCharme) provides a good short user-level introduction to z/OS (and VM, and VMS, and...):
Hilarious! (disclosure: I am a mainframe systems op/dev)
The best part is where they didn't know about the INSERT key. I can understand z and ISPF being a hassle if you're used to modern computing but the mainframe logic is actually logical, and probably predates whatever modern stuff you're on to... like "files". :)
Whenever I've laid hands on unfamiliar systems and their terminals, which has been a quite frequent occurrence over the years, one of the first things I've usually tried to do is get my hands on the terminal's handbook (or at least the condensed cheat-sheet version of it) to learn what all of its keys and key combinations did. (In this case the same would go for ISPF.) Assuming that the author is using some kind of 3270 emulation software, then she should have access to this stuff somewhere. In other words, she probably just needs to RTFM!
As someone who has been in IT for decades now and who over the years has moved between IBM, Wang, Sperry, DEC, Unix, DOS, Windows, Teradata, etc. platforms with aplomb, this article just has "Noob!" written all over it. :)
IBM is a special company that seems to not want anyone to use its products. They sure go out of their way to make things hard to access. But they keep them around forever. Happy to sell them to the right customer. So many overlapping products that don't make sense in aggregate. The company seems to have been marketing-driven for the last 10+ years, selling all manner of snake oil.
They're not interested in anyone but people willing to write million dollar plus checks using their products. Those same giant customers have large legacy investments and plenty of money to hire teams of people to deal with things that are hard to use. Those kinds of customers care much more about things working for 30+ years without downtime than they care about UX.
There's a whole world of solid, reliable, but unbelievably boring "institutional computing" out there that hackers usually don't touch because it's... well... boring. Java is the closest most hackers ever get, and that has more in common with Python or Ruby than COBOL or z/OS. Java is kind of a modernized mini/micro computer business language.
Sure z/OS is boring if your job is to write Cobol programs but if you are interested in sophisticated operating systems packed with very smart algorithms then it's a different matter.
You can read about amazing hardware and IBM's ability to fuse hardware, microcode and software capabilities that keep pushing the ability to solve business problems. Yes, most of this is needed to extract value out of incredibly expensive technology.
> There's a whole world of solid, reliable, but unbelievably boring "institutional computing" out there that hackers usually don't touch because it's... well... boring.
If the salaries were sufficiently (i.e. very) high and the overall culture were not openly hostile towards the values of the hacker culture, I can easily imagine that hackers would be willing to touch it.