Medical Equipment Crashes During Heart Procedure Because of Antivirus Scan

Avernar · on May 7, 2016

"Merge says the antivirus froze access to crucial data acquired during the heart catheterization. Unable to access real-time data, the app crashed spectacularly.

The company claims that they included proper instructions in their documentation, advising companies to whitelist Merge Hemo's folders in order to prevent crashes from happening, so it seems that the whole incident was nothing more than an oversight on the medical unit's side."

Here's how I read that: The programmers of this piece of software assumed that some I/O operation would never fail and when it does the program shits itself. So instead of hardening their software to withstand loss of telemetry gracefully, which would cost time and money for the company, they just give instructions to disable scans on their folder.

Odds are good that somewhere this scan will happen (and it did). Either IT doesn't read the release notes or goofs the configuration or an antivirus update clears the white list. Might not even be the antivirus that interferes with the telemetry briefly.

But instead of having resilient software it's "the anitvirus software's fault" or "it's IT's fault" when something goes wrong because of their bad management/engineering decision.

sathackr · on May 7, 2016

Exactly this. As I was reading the article I hoped to find this exact point in the HN comments.

The fault lies in the bad software. It could have been the indexing service, online defrag, automatic updates, or any of the other various background processes windows runs.

If it is critical software, it should be designed in a way to not fail when something non-critical malfunctions, and even the critical pieces should be built with redundancy.

scruple · on May 8, 2016

I work for a medical devices company and I just want to say: We, specifically a few of us on the engineering staff, bring this sort of shit up constantly. I go hoarse having the same conversations over and over and over again about robustness in the face of failure, resiliency, redundancy, etc... The truth is that we're beholden to a board and an executive management team that, quite simply, doesn't give a fuck about our problems.

I'm not trying to excuse the company in the article or the company that I work for. And I do not work for the company in the article. I just wanted to point out that I do see how this can happen very easily and repeatedly.

emcrazyone · on May 8, 2016

I'm just curious. I work in the automotive sector and develop hardware and software using components that are advertised as functionally safe. I use harden RTOS from vendors who claim their RTOSes are in medical devices as well as military systems.

One such system is Disti (http://www.disti.com/)

In the automotive field, our software is MISRA compliant, static analysis is done (Klockworks - http://www.klocwork.com/) and we follow a very strict set of guidelines outlined in ISO25119 and ISO26262 for the construction and agricultural markets. Think self driving tractors and combines. For example: A tractor traveling down a field with a combine following it a few rows over separating and chopping things into a catcher all done with one person driving.

This shit can't happen where I work. Every component on our circuit boards has a MTTFd of 40 years. Hardware watchdogs can kill the system if software goes awry.

Software is written to readiness level called SRL-1, SRL-2, etc... Unit tests, peer reviews, etc... Functional safety in medical devices is covered under 510(K) (http://www.fda.gov/MedicalDevices/ProductsandMedicalProcedur...)

I find it amazingly short sited that antivirus software is even allowed on a medical device to begin with. I can't even imagine how this system even passed the easiest audit for software readiness.

How is it you "go hoarse having the same conversations?" Do you not have to meet FDA compliance criteria? Are you in the US?

I didn't read the whole article so I'm assuming this happened in the US. For me, we sell autonomous vehicles in the European markets where functional safety seems to be a bit more aggressive there right now for vehicles. Not sure about medical devices.

scruple · on May 8, 2016

> I find it amazingly short sited that antivirus software is even allowed on a medical device to begin with.

Well... Then you should consider yourself blessed to have never had to deal with the bureaucracy of a hospital IT department and administrative staff.

Who owns the medical device? Who paid for it? If it's a glorified Windows machine and it's attaching itself to a hospitals WiFi network... Who has to use this machine? Physicians, surgeons, anesthesiologists, radiologists, other specialists, nurses, staff? All of them need to be trained on it's usage, no doubt. They don't get that training in schooling. Who provides it? This and a million other things stack up. So, well, I mean it can start to make sense how these things end up with random AV software installed on them, right?

> How is it you "go hoarse having the same conversations?" Do you not have to meet FDA compliance criteria? Are you in the US?

Yes, we are. Yes, we do "have to meet FDA compliance." I can't define "have to meet" and I work here. Of course, I'm just an engineer. We have legal, executive, and other staff for those matters. I'm sorry, I'm not trying to be an asshole... I'm just trying to be honest about where I find myself in this situation.

chris_wot · on May 8, 2016

Sounds like an awesome job with good engineers but neglectful and irresponsible management.

If you are only making these warnings verbally, you might want to consider emailing your immediate manager with a list of concerns. Make it as neutral as possible and ask for guidance on how they want to address the issues. But if it's on the mail server, it will be good for discovery if the worst happens, and frankly given lives are at stake you probably need to show, in writing, that you were attempting to have the issues addressed.

Who knows? That might actually get traction. Might even save someone's life!

JackdawX · on May 8, 2016

> Yes, we do "have to meet FDA compliance." I can't define "have to meet" and I work here. Of course, I'm just an engineer.

You are not an engineer. This is a protected term in the US and other countries. If you were a professional engineer, you would be bound by a legal and moral framework preventing you from doing work on unsafe medical equipment.

There is a good argument that there should be a software equivalent of protected engineer status for this kind of work. This kind of story should be a wake up call. I personally had no idea that critical medical equipment would be running on MS windows...

lfowles · on May 8, 2016

Engineer alone is not a protected term in the US. "Professional Engineer" is.

As of 2012 you can take the PE Exam for Software Engineering [1].

[1]: http://ncees.org/about-ncees/news/ncees-introduces-pe-exam-f...

JackdawX · on May 9, 2016

Ahh, guess my info is out of date, thanks.

vardump · on May 8, 2016

> How is it you "go hoarse having the same conversations?" Do you not have to meet FDA compliance criteria? Are you in the US?

You have to deal with FDA pretty much regardless where you're based, if you want any kind of market for your medical device. A lot of countries define compliance as whatever is good enough for FDA.

FDA rules for software... what FDA wants is a paper trail.

sathackr · on May 8, 2016

I'm with you...I can see exactly how this can happen.

Unfortunately the only thing that can solve the apathetic board and executive management problem(who only see dollar signs) is the actuality, or realistic possibility, of significant financial loss, or loss of their personal freedom(prison) due to the negligence of the system. And a $10 Mil fine for a fault in something that you make $100 Mil off of is not significant. That's $90 Mil profit in their eyes. And they probably get to write it off.

Even more unfortunate, is that, in the situation that this happens, the "engineers responsible" will be fired, and the executives will resign with a nice golden parachute, and go on to do the same thing somewhere else.

But then you have the company that does do it right, spend the time, and the money to make a truly redundant, fault-tolerant system. But, they come in at a price point 20% higher than their competitor, who doesn't. Which company survives and which doesn't?

Sad, but, unfortunately the way it is. I don't know a practical solution either.

scruple · on May 8, 2016

I've thought about this a lot. I've had private conversations with the CEO which lead me to believe that their apathy is a, if not the, primary driver in this situation, at least within the company. Ultimately, they are the single individual who can force these changes in the departments. As things stand today, as far as I can tell, the CEO and the rest of the executive team got theirs and that's that. Anything extra is just that, extra.

We've been close to undergoing "major" scrutiny (as it was sold to me, it was A Big Deal) from the FDA before. I, personally, just a lowly and underpaid engineer, have saved executive staff from having to sign their names on that noose. I had a manager once who seemed to want to push it that far, to stand idle-by while the walls fell down around us. I, unknowingly at the time, prevented it from happening because I was trying to help our customers. I don't regret that decision, actual patients shouldn't have to suffer because of a management teams ineptitude. I do think about it often, though. I understand this is nebulous, and I'm sorry for that. This is a reality, though.

I guess that's the thing that really gets me, the FDA. We sell FDA approved devices. Where the fuck is the FDA? We send them paperwork and they are happy. I can only form the opinion that they, the FDA, are ill prepared to handle this situation; The actual situation, the "the medical devices industry is a fucking train wreck waiting to happen" situation, and especially so they are ill prepared to handle it at scale. Audits are cursory and almost as a rule non-technical. I suppose it'll take a Toyota-level incident to bring change about.

sathackr · on May 8, 2016

Along the same lines as your 'where the fuck is the FDA' comment -- I've worked in Financial and Healthcare systems on and off for about the last 10 years.

I have seen SSAE16 audited companies that haven't patched anything in years. FDIC examined institutions with ATM machines still running OS/2 Warp(actually probably more secure than the ones running XP, with no updates installed. Ever.)

I once found the management interface of a SAN with a public IP address directly on the device, no firewall rules of any sort, and the device still had the default username/password. It hadn't been patched or rebooted in over 2 years.

More shocking is that a review of the logs didn't show any successful unauthorized logins. Of course, they could have cleaned up after themselves, but further investigation was outside the scope of my engagement(They didn't want to know. They were happy to present that, despite the oversight, there was no indication that PHI had been accessed by unauthorized people. Their conclusion, not mine.)

chris_wot · on May 8, 2016

I can't help responding again. If you have tangible evidence of neglect or regulatory non-compliance, or even risks that are known about but not being dealt with by management - have you considered compiling this material and and reporting it to the FDA?

But as I've said before - I really hope you have written down your concerns to someone in management. If it gets to the point where negligence takes out the company, there's going to be an attempt to make someone a scapegoat. Depending on your role in the company you don't want to be held personally liable for the incompetence and ruthlessness of management...

noonespecial · on May 8, 2016

>Where the fuck is the FDA? We send them paperwork and they are happy.

When regulation becomes more about permission than proficiency, you'll get corruption instead of competence.

limelight · on May 8, 2016

> Unfortunately the only thing that can solve the apathetic board and executive management problem(who only see dollar signs) is the actuality, or realistic possibility, of significant financial loss, or loss of their personal freedom(prison) due to the negligence of the system.

Or developers refuse to build software without safety built in.

If they can't hire anyone to build their unsafe systems, they'll have to start building safe software.

Let the market work for you.

sathackr · on May 8, 2016

That sounds nice...but then you will be replaced by a developer that will toe the company line. You're making 'unreasonable' demands and holding up progress. 'We can fix that with version 2.0'

If every developer on the planet suddenly had a pang of consciousness, then something like this would work.

Fortunately I have never found myself in such a position, but I have seen it many many times.

limelight · on May 8, 2016

That's why we should probably require engineering certifications for working on safety-critical software. Working on such software should require demonstrating a certain level of knowledge and upholding a code of ethics.

I generally oppose certification for engineers, but solving collective action dilemmas like this and saving lives in the process is exactly where it would help.

sathackr · on May 8, 2016

How do you ensure someone upholds a code of ethics? Licensing is not the answer. I'm sure there are many PEs that find themselves in similar situations.

I know examples of people in licensed fields who have sworn to uphold a code of ethics, but have been caught up in very similar situations.

I can't find it now but I just saw a video recently of a rail bridge with a crumbling foundation that had just been signed off on by a PE and declared safe by the railroad.

WalterBright · on May 8, 2016

> get to write it off

A fine being tax deductible does not mean zero cost to the company, it means the profit is reduced before taxes are computed, i.e. the actual cost is reduced by the marginal tax rate. A tax credit means zero cost.

limelight · on May 8, 2016

It's not a decision which should be made at the level of executives though.

Presumably developers are the one's estimating how long things take. (If they're not, you have even bigger problems and I'm sorry.) The time to make it safe should automatically be included in those estimates.

Moreover, making it safe shouldn't be a separate part of the process. It should just be part of how you write software. It's either safe or it doesn't exist at all. (Compare this to how organizations like Google deal with concurrency: it's built in from the start.)

A reputable engineer wouldn't design and build a bridge which might collapse. A developer shouldn't build software which puts lives at risk, regardless of management pressure.

If they refuse to relent, there are plenty of jobs where safety isn't critical.

GlideCleanser · on May 8, 2016

> Presumably developers are the one's estimating how long things take.

This is not meant as a slight: I think you're grossly unfamiliar with software development outside of engineering-driven companies.

It's pretty much a guarantee that product managers are deciding these estimates. They might confirm with the developers, but the conversation probably went something like this:

"Does 3 weeks sound about right for this?"

"No, we'll need 6"

"Why?"

"Safety checks"

"Ok, we don't have 6 weeks. I can give you 4, but we're just gonna have to make do."

Is it scary that conversation happened about a piece of medical software? Absolutely. Would I bet $1k that it happens frequently? Absolutely.

> A reputable engineer wouldn't design and build a bridge which might collapse

Rarely does a single engineer design a bridge nowadays, so corporate liability and reputation (good luck landing more contracts if your bridge collapses) is a huge factor in much of that beyond simple ethics.

I would be shocked if anything happened to Merge as a result of this, whereas a company who designed a faulty bridge would be sued into oblivion.

Further, professional engineering in the US is a whole different game that involves licensing and regulations specifically to avoid that situation. Software "engineering" has no such equivalent currently.

Pinning the blame on the peons is a sure-fire way to make sure this situation never changes.

limelight · on May 8, 2016

Oh, I'm well aware of the difficulty of negotiating with product managers over timelines.

The difference is that they never should get the decision to cut safety checks. Cutting safety checks should be as ludicrous/impossible as writing half the code of each function to cut time.

The conversation should go like this:

PM: "Does 3 weeks sound about right for this?"

Dev: "No, we'll need 6"

PM: "Why?"

Dev: "That's how long it takes to build those 6 features."

PM: "Ok, we don't have 6 weeks. I can give you 4, but we're just gonna have to make do."

Dev: "Okay, which features would you like to cut?"

> Further, professional engineering in the US is a whole different game that involves licensing and regulations specifically to avoid that situation.

I'm aware. While I don't think the majority of software developers should be certified, we should require licensing for working on safety-critical applications.

GlideCleanser · on May 8, 2016

> The conversation should go like this:

I think you're missing the end to that conversation:

>PM: "Ok, we don't have 6 weeks. I can give you 4, but we're just gonna have to make do."

> Dev: "Okay, which features would you like to cut?"

PM: We can't cut any of them. We need features A,B,C in the product and we need it in 4 weeks.

Here we insert a rant from the PM about one of the following:

1) Leadership

2) Hard work

3) Threats about job security

4) Recalling that one time you delivered something ahead of schedule so why is this different

5) I see you getting up to get coffee at least twice a day so stop goofing off and get it done

I think you're vastly overestimating how much power/control said Dev has over the whole process at these sorts of companies.

Sure, they can quit, but if they felt empowered to quit they probably wouldn't be there in the first place: I don't think anyone's busting down the door to work at MedicalBusinessTM.

> we should require licensing for working on safety-critical applications.

Fully agreed, though with some misgivings.

Incorporating safety-critical software into the "professional engineering" spectrum would almost certainly require some things that are seen as near-heresy to the software community, like requiring a 4-year degree from an ABET-accredited program.

Still, I agree.

limelight · on May 8, 2016

I've managed to push back on PMs many times by redirecting them to trade between time and features (so they still feel like they're in control). That being said, you're right that I would never work somewhere that treats developers so porly.

> Incorporating safety-critical software into the "professional engineering" spectrum would almost certainly require some things that are seen as near-heresy to the software community, like requiring a 4-year degree from an ABET-accredited program.

The vast majority of software isn't safety-critical, so there would still be plenty of opportunities for developers who don't fit into rigid modes.

I 100% oppose having accreditations for all developers.

alephu5 · on May 8, 2016

I agree with your sentiment 100%, but if you insist on doing things safely while your colleagues do not, you might get a reputation for being slow and be earmarked for replacement. Perhaps it's worth losing a job over, but your replacement will cut corners so the net effect is patients unsafe + you have no job. It feels reminiscent of the prisoner's dilemma.

limelight · on May 8, 2016

Accreditations and professional standards are literally textbook solutions for solving prisoner's dilemmas.

mikekij · on May 10, 2016

This. I'm in the same situation.

brians · on May 8, 2016

There are lots of faults. The software failed. The process that directed a helpdesk tech to install AV was a failure of some manager. The decision to engineer systems and networks in a way such that AV seemed like a good idea was a failure of an architect.

jevinskie · on May 8, 2016

In my opinion, the software failed because the entire system (software, hardware, and humanware) failed to implement, holistically, a safety critical system. You simply cannot ignore the system as a whole. I'd wager we are in violent agreement. :)

dTal · on May 9, 2016

>The decision to engineer systems and networks in a way such that AV seemed like a good idea was a failure of an architect.

As ever, a relevant xkcd: https://xkcd.com/463/

LeifCarrotson · on May 8, 2016

I build software that does the exact same thing. We're running automotive tests, and our management/customers are unwilling to invest in solutions that will work in spite of the fact that Windows is not a real-time OS.

We use a National Instruments DAQ card, and need the PC to respond within 50 ms to issue new commands for hours or days. Remarkably, it usually (over hundreds of machines and decades of operation) does. When it doesn't, it's blamed on antivirus or firewall or technicians using the PC for other things while the software runs.

National Instruments provides real-time IO systems, but they cost a lot more than the basic systems. You can write driver-layer code that will run in real-time on Windows, but that takes longer.

Our customers and management, with varying levels of comprehension of the problem, elect to not spend that money. I hate to say it, but if we didn't make this compromise, there are competitors who would.

vardump · on May 8, 2016

> We use a National Instruments DAQ card, and need the PC to respond within 50 ms to issue new commands for hours or days. Remarkably, it usually (over hundreds of machines and decades of operation) does. When it doesn't, it's blamed on antivirus or firewall or technicians using the PC for other things while the software runs.

It works as long as full code path and data it requires is not paged out. Or some other thread doesn't consume I/O resources, etc.

In other words, it's not guaranteed at all.

Only way to get Windows to react reliably within 50 ms is in a kernel driver, as response to an IRQ. There's considerable jitter even in IRQ, but usually worst case service times are 200-500 microseconds. Depends a lot on other devices and on your IRQ priority. It's worse for passive level drivers (IRQL == 0).

50 ms guaranteed response time requires the code and data is in non-paged pool.

sillysaurus3 · on May 7, 2016

Software shouldn't necessarily try to account for errors in that manner. Usually, the most graceful thing to do is to exit cleanly.

For example, if there is a massive amount of data, it has to be stored on disk. It's too large to keep in memory. And if the point of the program is to transform that data in real time, then it has to have access to the disk.

The antivirus basically unplugged the disk. What can it do to recover? There's nothing to be done.

It should be able to survive that situation, of course. When the disk is plugged back in, it should be able to restart without any problems. But I think that's a different kind of resiliency than what you're referring to.

In this case, the only way to recover would be to copy the frozen data to a new area of the hard drive, assuming it retained read access. But such complexities result in brittle implementations, prone to acquiring bugs. What if the disk space runs out? So you check beforehand whether there's enough space. But what if some other program starts consuming disk space in the middle of your copy operation? And so on. It's an endless spiral of design complexity.

The situation in the article seems closer to hardware failure than a design oversight.

Avernar · on May 7, 2016

> When the disk is plugged back in, it should be able to restart without any problems. But I think that's a different kind of resiliency than what you're referring to.

Yes and no. I was referring to restarting internally when the error condition went away but restarting the app and waiting for telemetry to return can be a valid solution.

Think of your torrent software. If you crank your firewall to block it while it's running it will not crash. If your disk fills up it won't crash. When the network comes back or more drive space if freed it will restart it's internal mechanisms. You wouldn't want it to restart in these conditions. If it runs out of memory however choosing to exit might be the best recovery mechanism.

I think a life critical medical application can at least strive for internal restart and do an external restart if all else failed. The article stated they had to reboot the machine to get it back. Now that's way worse.

> The situation in the article seems closer to hardware failure than a design oversight.

Hardware failure is almost always a permanent condition. This was a "my I/O stopped briefly and would have came back if my code could handle it".

sillysaurus3 · on May 7, 2016

During a surgery, the program doesn't have the luxury of showing a screen that says "No telemetry available." Such a program would be considered equally unreliable. Worse, it would lead to confusion: "Why is the telemetry unavailable? What does 'Error Code 2931' mean?"

A spectacular crash immediately led to pinpointing the problem: The antivirus.

If the program's sole purpose is to transform a massive amount of data in real time, it must have disk access by definition. It can't not have disk access. What would you suggest it do?

Avernar · on May 7, 2016

Yes it does! Showing "no telemetry available" is exactly what it should do. Crashing = unreliable. Reporting an error condition = reliable.

Immediately? Took them 5 minutes to reboot the computer. The scan of the folder would take seconds let alone minutes. Pinpointing the problem is secondary. Not killing the patient is primary.

> If the program's sole purpose is to transform a massive amount of data in real time, it must have disk access by definition. It can't not have disk access.

And that is the mind set the programmers of the software had. You have to take care of error conditions. The processing can't have no disk access but no disk access can occur temporarily or permanently. What can you do? Pause the processing part of your program. Or make the processing part treat "no data" as valid input and display something else.

Imagine taking that viewpoint with an ECG machine: This machine displays a heart rate waveform. So it must have a heart rate input. If there is no heart rate we'll just crash requiring a 5 minute reboot.

Hell no! Draw a straight line and set off a buzzer!

Bluestrike2 · on May 8, 2016

I agree with you, but the flat line might not be the best example because that has a very specific meaning (asystole) that doctors will take certain actions based on without necessarily trying to verify it manually when time is already critical. You should never be able to confuse an error message for anything else.

Avernar · on May 8, 2016

> You should never be able to confuse an error message for anything else.

Exactly. Which is why "Can't read file sensors.dat" is way better than just crashing. Crashing is one of the worst error messages you can get because you don't know what happened.

nickpsecurity · on May 8, 2016

" Avernar 18 hours ago

Yes it does! Showing "no telemetry available" is exactly what it should do. Crashing = unreliable. Reporting an error condition = reliable.

Immediately? Took them 5 minutes to reboot the computer. The scan of the folder would take seconds let alone minutes. Pinpointing the problem is secondary. Not killing the patient is primary."

Well-put. Far back as Burroughs B5000, the best way to handle erroneous software or I/O was to freeze it, notify the administrator/user of the problem, and give them sensible options for how to proceed. They might restart the I/O, restart the app, modify erroneous data to proceed (rare here), and so on. Crash and reboot is a Windows 95/NT strategy where incompetence dominated. Today's Windows OS and tooling can do much better with little effort by developers.

Bluestrike2 · on May 8, 2016

But the spectacular crash didn't pinpoint the problem. That came afterwards, when the manufacturer was able to look into the crash.

In a situation like this, confusion is all but inevitable. As a developer, the goal should be to minimize that confusion to the greatest extent possible. A blank screen and crash introduces another step to the process as people wonder "what's going on?" instead of "shit, it threw an error." It's probably not a big deal, but with medical devices during surgery, that extra step could be hugely problematic.

chris_wot · on May 8, 2016

I wonder what surgeons think about software engineers? Except for open heart surgery, they don't normally do their fixes by stopping and starting the thing they are repairing...

wpietri · on May 7, 2016

I totally disagree.

Sure, usually the most graceful thing to do is exit and hope a human fixes it. But that's usual because the usual condition is that sudden failure is NBD and a human is right there to screw with it.

That's becoming less common, though. When software was mostly something running on a PC doing some boring office task, reliability didn't matter. But as software is running our airplanes, our cars, our medical devices, and even, as with implanted pacemakers and insulin pumps, our bodies, then reliability gos from NBD to BFD.

We see the way forward with things Chaos Monkey [1] and crash-only software [2] and the sort of design for failure you see in things like Agent supervisor hierarchies [3], where the way to reliability is through designing for failure recovery from the beginning and testing thoroughly to make sure it really happens.

[1] https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey

[2] https://en.wikipedia.org/wiki/Crash-only_software

[3] http://doc.akka.io/docs/akka/snapshot/scala/fault-tolerance....

sillysaurus3 · on May 7, 2016

It wasn't a rhetorical question. What could this program possibly do to recover?

If the CPU fails, no one would say the program was unreliable.

In this case, the disk failed, because the antivirus unplugged it. Was the program unreliable?

Avernar · on May 7, 2016

The disk did not fail. An I/O operation failed. The first is a permanent condition, the second can be permanent or transient. Big difference.

In Linux a signal can cause an I/O to fail. In Windows it's antivirus and other background tasks can cause I/O to fail.

What can it do to recover? Retry the I/O operation! It should keep trying until the operator tells it to stop.

sillysaurus3 · on May 7, 2016

In that scenario, your surgeon would see the program suddenly freeze.

The program likely looks like this: data acquisition -> transformation -> display transformation on monitor.

If the transformation step fails, the monitor will end up displaying (a) nothing, (b) random data, or (c) the most recent image. None of these help the surgeon continue surgery. It's the same as a crash.

If your environment fails, there's nothing you can do to recover. Planes aren't designed to survive the loss of a wing. Why is this case any different?

Avernar · on May 7, 2016

> In that scenario, your surgeon would see the program suddenly freeze.

Only if the programmer or his management were incompetent. The display routine should be running on a separate thread than the processing code. No whole program freeze should occur.

As for displaying random data, why would the programmer want to do this? Either display nothing or the last readings WITH a message that it's not real time.

It's not the same as a crash! A crash requires 5 minutes minimum guaranteed. Restarting instantly after telemetry returns can happen under a second in the best case which can be the difference between a live and dead patient.

> If your environment fails, there's nothing you can do to recover. Planes aren't designed to survive the loss of a wing. Why is this case any different?

There are different kinds of failure. Permanent and transient. Following the permanent procedure for a transient case can be fatal.

Take your airplane example. Loss of a wing is permanent. That would be like the CPU failing or an external cable being cut.

But your engines shutting down can be permanent or transient. Just like disk I/O failing. You'd use the transient procedure in this case. Keep trying to restart the engines. If they restart, great! You've just saved the plane.

Same with the disk I/O. The programmer should keep trying to restart the I/O. If it comes back, great! You've just saved the patient.

wpietri · on May 8, 2016

Definitely. Each component should do its best to keep on keeping on. The display program should keep displaying something, even it's just the most recent data with a big "connection lost" warning. The device should ring-buffer the data and upon reconnection the screen should show as much as possible. The OS should have a strong opinion that the surgery app is very important, and that should the app fail, it should be restarted instantly.

Moreover, this is the kind of thing that should come up in robustness testing. Things should get bumped and wiggled. They should get unplugged and turned off. If the software is really going to run on random Windows boxes, then it should be tested on random Windows boxes. (At which point somebody will hopefully say, "Wow, this sucks, let's make it an appliance.")

No matter what happens, it shouldn't result in a "mysterious crash right in the middle of a heart procedure when the screen went black and doctors had to reboot their computer".

sillysaurus3 · on May 9, 2016

I had to step away from this conversation because of how aggressive you were being. Now that no one is watching, we might try to have a productive conversation.

Please consider dropping the adversarial attitude. This place isn't like other sites. The way people converse is equally important to what they say. It's better to transcend than to dominate.

For example, we do not slip in underhanded comments like this:

> In that scenario, your surgeon would see the program suddenly freeze.

Only if the programmer or his management were incompetent

This is just short of a personal attack, which is against the rules. I know you probably didn't mean it that way, but look at how you're framing the debate. I felt as if I'd been teleported onto Fox News and forced to defend myself from an aggressive interviewer's mischaracterizations.

Now, you can take the stance that "It's not against the rules, so I can say whatever I want." That's true, you can. But we're worse off for it. We optimize for good conversation here.

The point I'm trying to get across is that if you really throw yourself into this community, wholeheartedly and without a feeling of having to prove something wherever you go, then this place has a lot to offer. You'll meet a lot of interesting people, you'll hear a lot of interesting stories, and perhaps you'll have an opportunity to contribute to something quite unexpected. But none of that will happen if you try to skewer your opponents wherever you go -- or if you see people here as opponents. We're people.

It doesn't matter what the conversation is. It doesn't matter whether it's about life-or-death, or that this one happened to be about a surgery. The goal is to put yourself in the other person's shoes and to ask yourself, "If I were them, why would I say that?"

Regarding our conversation, if you want to continue it, I'd be happy to. But unless you're trying to learn as much from me as I'm trying to learn from you, it's not going to go anywhere productive. And what would be the point? No one's looking anymore -- it's fallen off the front page, so it's just you and me here. But why should our conversation be so different just because nobody is watching?

There are things to be said, but I have no time to defend myself. You can characterize what I was saying however you want. Or, alternatively, you could ask me what I meant.

I won't pull one of those "I've been in the field for a pretty long time, so I bet you'll learn something..." routines. Those are tired refrains, usually coming from people who have long forgotten what it's like to be young and hungry. But I'm still pretty young, and money's low enough that I'm pretty hungry. Being unable to afford meat is unfortunate, but it's worth not having a job for a little while to throw myself into my research. See why there's no time to defend against aggression?

I think I wrote this because in many ways, you remind me of how I used to be. And if I could go back in time, I'd ask myself what I was doing and why. This type of discourse is an intellectual dead-end. No one is going to learn a thing from watching people try to tear each other apart. Maybe you didn't realize that's what you were doing. It's very easy to slip into that mindset without realizing it.

As for displaying random data, why would the programmer want to do this?

GPUs are bastards. They ignore what programmers want, almost by definition. And as someone who has spent way-too-many years wringing as much performance as possible from them, I assure you that this is a realistic characterization of a possible outcome.

Perhaps that piques your curiosity. If so, then that sounds like the start of a good conversation, no?

wpietri · on May 9, 2016

You shouldn't complain about aggressiveness when you start with it. You opened with an extreme position and denied all possibility of nuance. When multiple people suggested other possibilities, you made sweeping denials based on your imagination of how the program worked.

If you want nuanced dialog, start with nuance and make room for other people's opinions.

sillysaurus3 · on May 9, 2016

If that's true, why not quote me? Point out some of these extreme positions. Point out the aggression. You say I started with it. Are you so sure?

Things I did not say:

- It's reasonable for a program fault to reboot a computer.

- It's reasonable not to check error conditions.

- It's reasonable for a program to halt and catch fire.

I wish I were making up that last one, but I'm not. Here are some unreasonable quotes:

Think of your torrent software. If you crank your firewall to block it while it's running it will not crash. If your disk fills up it won't crash.

No one was saying it was okay for the program to crash.

'Halt and catch fire' is not generally considered a proportionate response

No one was advocating this.

I've no idea where you're getting this 'unplugged the disk' thing from, AV software does not work this way.

The AV software denied all disk I/O. If you have nowhere to put data, and no access to data, then you don't have a disk. You have a paperweight.

As for displaying random data, why would the programmer want to do this?

Obviously, the programmer did not "want" to do this. This is what happens when you try to do GPU programming and the I/O is suddenly cut. I've seen it, which is why I said it.

But your engines shutting down can be permanent or transient. Just like disk I/O failing.

The disk I/O didn't fail. It was completely cut off by the AV program. There was no chance of it resuming until the scan completed, which could take far longer than the 5 minutes required to reboot the computer.

Speaking of which, here's where that stupid "the program rebooted the computer" myth stemmed from:

According to one such report filed by Merge Healthcare in February, Merge Hemo suffered a mysterious crash right in the middle of a heart procedure when the screen went black and doctors had to reboot their computer.

To me, it sounds like they restarted the computer to get the AV program to stop. The program did not "crash so hard it rebooted the computer."

Here's how the program works:

Merge Hemo consists of two main modules. The main component is the actual medical device, connected to the catheters, through which data acquisition takes place. This component is connected to a local PC or tablets via a serial port.

The second component is a software package that runs on the doctor's computer or tablet and takes recorded data and logs it or displays it on the screen via simple-to-read charts.

So we see that the company does not have control over their environment. They have no say over what the doctor's computers are like. They have to live with the fact that the doctors' computers are running Windows, and that they run AV scans. It's not up to them.

This is important, because if the company had independently decided it was reasonable to deploy their software with an AV package, then the fault would lay with the company. But they didn't. Now, what can the company do?

Your point was that the software should behave gracefully in this environment. I agree; that was my point too.

The various people in this thread took what I said and morphed it into something so far from reality that I'm frankly a little worried that people are believing it. If I try to get a job, people might read this and conclude that I'm somehow advocating for 300-second crashes. Seriously?

My sole, singular point was this: Small programs are reliable programs. You can't have bugs in what you don't write.

That means a lot of things. But it does not mean "do not handle error conditions." I didn't even say that this program should exit. I said that the spectacular crash led to pinpointing the AV scan as the source of the issue.

I was called incompetent (indirectly), that my position was "extreme," and that I "denied all possibility of nuance." Ok. Sure.

I've re-read the entire article and this entire thread to double check myself and make sure that my assumptions are correct here, so if you see a mistake, please call it out with a quote.

I agree that I'm now being a little shall-we-say heated, and it's annoying that I'm now doing that because of how much I was provoked here. Actually, this is more amusing than annoying. If the whole world is claiming you came across poorly, then you came across poorly, regardless of what you think. I'm wondering where it all went wrong. So please, tell me: What aggression do you feel I started with? I'm genuinely hoping to learn here.

Isn't this all a little tedious? Why are we even doing this? Aren't there more interesting thoughts to think than litigating what someone did or didn't say? I don't know why this happened, and I don't know specifically what you want. But I'm open to suggestions.

sillysaurus3 · on May 9, 2016

I apologize. I thought I was better than ranting, but apparently not. That wasn't cool.

Thank you for the advice. I appreciate it. Sorry for the sour grapes.

Avernar · on May 12, 2016

Seems that a good rant was what you needed. :D

As I wrote in my other two posts tonight (more like morning, sigh) is that I tend to tune out emotional and aggressive writing styles. That's probably why my writing style tends to look aggressive. It's just the type of debates I tend to end up in (sigh, again).

So I apologize again if that got you upset at me.

wpietri · on May 10, 2016

Whoa. I was not expecting that. Welcome! May all of Hacker News learn from your grace.

Avernar · on May 12, 2016

> The AV software denied all disk I/O. If you have nowhere to put data, and no access to data, then you don't have a disk. You have a paperweight.

AV software does not deny all disk IO. It just denies write access to a file very briefly and then goes on to the next file as the article stated it was a scheduled scan.

So you do have a disk but a few files temporarily can't be written to (still can be read). The program will get an error code from the write function and can just try to write again.

> The disk I/O didn't fail. It was completely cut off by the AV program. There was no chance of it resuming until the scan completed, which could take far longer than the 5 minutes required to reboot the computer.

This is where you are incorrect. And AV scan does not lock the entire disk for the duration of the scan. It locks and releases each file as it scans them. Fire up process monitor from sysinternals and look for yourself.

Tried it with my AV scanner with a manual scan. Looks like mine doesn't even do a lock on most of the files when doing a manual scan. So at most the file just couldn't be deleted.

> My sole, singular point was this: Small programs are reliable programs. You can't have bugs in what you don't write.

I pretty much agree with you on that one. Smaller programs are more reliable than larger programs.

> What aggression do you feel I started with? I'm genuinely hoping to learn here.

I found it humorous that you saw aggression in my words and that wpietri saw aggression in your words. Me, I just learned to tune out that sort of thing in other peoples posts.

> I said that the spectacular crash led to pinpointing the AV scan as the source of the issue.

And that's what I was arguing about in my original post. You know what Root Cause Analysis is? If not read up about it here: https://en.wikipedia.org/wiki/Root_cause_analysis

My argument was that while the crash identified the AV Scan as a causal factor, it wasn't the root cause. From the wikipedia article: "Though removing a causal factor can benefit an outcome, it does not prevent its recurrence within certainty."

The root cause was that the programmer didn't handle the error code that his file was locked. There are many more causal factors that can trigger the exact same outcome: indexing service, backup program, shadow copy, etc.

Unrelated to our debate, the medical company only blamed the AV software and the IT Staff. Not one mention that their program had a bug.

The fact that their release notes warned against AV software means that they knew their program was deficient. That's what really pisses me off.

Avernar · on May 12, 2016

> Please consider dropping the adversarial attitude.

A debate is by definition adversarial. I do tend to be more passionate when debating certain topics. If I've come across to you as aggressive I apologize. It's just my style of writing and you can freely ignore any aggression you see in it.

> > Only if the programmer or his management were incompetent

> This is just short of a personal attack, which is against the rules.

That comment wasn't about you so it can't be a personal attack against you. It was about a fictitious programmer and his fictitious management used in our examples.

Attacking what you've written is not a personal attack against you. I will rip your words apart, try to prove they are wrong, show where you've either made a faulty assumption or an error in logic. That's what a debate is.

I will never, ever under any circumstances attack you. If you can see that distinction I will gladly continue to debate with you.

This whole thing basically ballooned from this statement of yours:

> The antivirus basically unplugged the disk. What can it do to recover? There's nothing to be done.

Those are the words I'm challenging. You have two points there. The first is that the antivirus unplugged the disk. While I know you're not being literal you're not being accurate either. It locked one or more files.

The second was that there was nothing the program could have done. To this I gave an example of a program that does handle this exact situation and more.

From your other comment replying to someone else but still quoting me:

> > Think of your torrent software. If you crank your firewall to block it while it's running it will not crash. If your disk fills up it won't crash.

> No one was saying it was okay for the program to crash.

This is the example program I'm talking about. I wasn't implying that you think it was okay for the program to crash. I was giving you an example of a program that can handle disk and network error conditions without the need to restart itself (automatically or manually) nor crash itself or the system.

> GPUs are bastards. They ignore what programmers want, almost by definition. And as someone who has spent way-too-many years wringing as much performance as possible from them, I assure you that this is a realistic characterization of a possible outcome.

I believe you in in that situation. GPUs have been designed with speed in mind and that makes for a very complex interface to them. But that reinforces a few arguments made by others in regards to this story, that being should medical equipment be using hardware not specifically designed for the purpose.

But going back to my comment of:

> > As for displaying random data, why would the programmer want to do this?

Based on your GPU comment above we may using different definitions of random data. I took your "random data" as "looks right on the screen but the numbers are wrong". If the programmer knows that his data source is temporarily unavailable, showing stale or corrupted data is the last thing he should do.

Retric · on May 8, 2016

Some aircraft have landed without all their wings.

Fail fast is fine for a dating app, it's not acceptable for a antilock breaking system. As to disk IO they should have kept a redundant disk for backup just in case. Remember a program can generally spend 0.05 seconds waiting and no big deal. A program that takes 300 seconds to reboot is far worse.

PhasmaFelis · on May 8, 2016

Normal software does not crash the entire OS because the antivirus was looking at a file it wanted. We expect better of text editors, for God's sake. There is absolutely no excuse for a life-critical system to fail to meet that rock-bottom standard.

stevetrewick · on May 8, 2016

Software should absolutely try to account for these kinds of errors and routinely does. When was the last time you saw a word processor or spreadsheet bork a machine so hard it needed to be restarted just because an AV scan kicked in? ReadFile and ReadFileEx both hand you a specific error if some part of the file you are trying to read is locked by another process because it's hardly rare. 'Halt and catch fire' is not generally considered a proportionate response. I've no idea where you're getting this 'unplugged the disk' thing from, AV software does not work this way.

ryandrake · on May 8, 2016

You can't get enough up votes. Crashing because an I/O operation fails? That's sounds like simply a bug in the software. The developer didn't handle an error properly, and QA didn't test the software on an environment with elevated I/O activity. I've done enough code reviews over the years and seen enough ignoring errors from read(), ignoring malloc() returning null, not handling exceptions, etc. Good developers give a shit but many just don't care at all and think crashing or exiting when you're out of disk space is just fine.

mappu · on May 7, 2016

Better to crash (and restart quickly into a known state) than to enter a rare, untested code path.

brians · on May 7, 2016

For software critical to human life, test the rare code paths.

CamperBob2 · on May 7, 2016

The easiest way to be sure a code path will run properly is to avoid writing it in the first place. This kind of application should be designed to run in a highly linear, predictable fashion on robust, fault-tolerant hardware.

Why is nobody questioning the propriety of using an off-the-shelf Windows PC in safety-of-life applications?

Avernar · on May 8, 2016

> The easiest way to be sure a code path will run properly is to avoid writing it in the first place.

Agreed. So the app shouldn't contain anything extra not related to it's primary function.

However, handling error conditions reported by the operating trumps the extraneous code rule. But there are many ways to handle an error, including ignoring it if that's the proper thing to do.

Crashing is never the proper thing to do. If the program had simply exited at the very minimum, a restart would have taken a lot less time than a complete reboot of the machine. The software crashed that badly that it required a reboot of the machine.

> Why is nobody questioning the propriety of using an off-the-shelf Windows PC in safety-of-life applications?

They are, in the other threads. But using a better OS for the task wouldn't prevent the coding error the programmer did.

Let's say they chose Linux. A signal goes off or something else happens and their read call fails. Since they expect all their I/O to succeed they crash just like the Windows box.

If they bothered to handle the error and check for EINTR they'd know it was interrupted and not a hardware failure.

My point is, changing operating systems doesn't protect you from poorly coded applications.

CamperBob2 · on May 8, 2016

Agreed, I'm not blaming the OS, just saying it's the wrong tool for this sort of job.

limelight · on May 8, 2016

> This kind of application should be designed to run in a highly linear, predictable fashion on robust, fault-tolerant hardware.

No, that's a recipe for failure. Hardware will fail. Period.

Hoping that nothing will fail and therefore not taking steps to mitigate it is akin to designing cars not to crash. [0]

Failures need to be explicitly designed for and tested. It's truly depressing that companies where failure is fine (ex. Netflix's Chaos Monkey) understand this, while companies where failure is deadly don't.

[0] https://news.ycombinator.com/item?id=11652940

Avernar · on May 8, 2016

Restarting quickly into a known state is not crashing. That is handling the error. How much of the program you restart is the question. Restart just a thread is better than restarting the whole program.

But their app crashed. And hard. It required a machine reboot to restart. While it returned the machine to a known state it wasn't quick.

And in medical software, all code paths need to be tested.

stevetrewick · on May 8, 2016

File I/O error handlers should not be a rare untested code path.

Joof · on May 8, 2016

Agreed. This is what happens when software fails catastrophically. People die and we lose confidence in software that could save lives if it worked properly.

dTal · on May 9, 2016

You'd think it would take longer to write the instructions than it would to just throw a try-catch block in there.

11thEarlOfMar · on May 7, 2016

/rant/

I can't tell you how many times we've chased down field problems that ultimately were the result of antivirus scans. It's been so bad, that one of the first questions we now ask when we get a tool-down report is "is there antivirus running and what is the configuration?"

Bringing Windows into the architecture of any type of capital equipment control system is a bane. A scourge. I mean to say, it really is a misappropriation of software. Imagine, "Yeah, Frank only knows VB, so that's what we used for the aircraft's cockpit GUI."

/xrant/

raverbashing · on May 7, 2016

This

This machines costs hundreds of thousands of dollars.

There should be no excuse for using Windows. None.

I would not be surprised if the "antivirus" thing was some PHB requirement

Amezarak · on May 7, 2016

Is there a reason to believe that choosing Windows was a bad decision?

The bad decision was installing antivirus software. Otherwise, most any modern OS would be fine. This machine probably shouldn't be connected to a network (if it was), the USB ports should be disabled; data can come off on burned CDs, autorun should be disabled, etc. That's how you deal with IA concerns on a standalone mission-critical system, not by installing antivirus.

ChuckMcM · on May 7, 2016

I think you have it backwards (no disrespect intended). When you evaluate the choice of Windows you have to acknowledge that it brings with it the vulnerability of viruses and so the necessity of anti-virus software. Either you own that decision, and as part of your support your tool provides the necessary antivirus and you also insure through testing configuration management that its configured appropriately, or you choose a different option up to an including writing your own system to manage the "time critical bits".

Having their software run in a Windows ecosystem that they do not have strict configuration management control over was a bad decision and on the basis of this failure report. That it did not result in patient injury or death was fortunate but is certainly not guaranteed.

Amezarak · on May 7, 2016

> When you evaluate the choice of Windows you have to acknowledge that it brings with it the vulnerability of viruses and so the necessity of anti-virus software.

No matter what OS you choose it is vulnerable to viruses. You and I will agree that the odds are your Windows system is much more at risk by at least an order of magnitude. But the IA people who demanded that this system run antivirus are just as likely to demand that Linux run antivirus, simply because the vulnerability theoretically exists and making that demand fulfills their CYA requirements. I've worked on standalone Linux systems that IA demanded have antivirus.

> Either you own that decision, and as part of your support your tool provides the necessary antivirus and you also insure through testing configuration management that its configured appropriately, or you choose a different option up to an including writing your own system to manage the "time critical bits".

According to the article, the software runs on the user's hardware. While they certainly could have made a decision to provide their own controlled hardware, it's entirely possible that hospital was not open to that option for cost reasons, for IT management reasons, whatever.

ChuckMcM · on May 7, 2016

   > No matter what OS you choose it is vulnerable to
   > viruses.

It is a bit more nuanced than that. While it is true that absolute security is generally deemed impossible, if you use a widely deployed operating system in your device there are both a number of actors trying to compromise it for different reasons, and a number of examples that can be acquired for testing different exploits.

By writing just enough "OS" to achieve your goals in an embedded system, and then designing a clean access API through which you cannot affect the underlying code (no "here download this new firmware" call) you can avoid that particular threat vector.

DyslexicAtheist · on May 7, 2016

> You and I will agree that the odds are your Windows system is much more at risk by at least an order of magnitude

it is a common misconception that Windows is still worse a platform than Linux when it comes to security. Not trolling ... I'm using Linux since '96 and built my life and career on it. Opinion of some people in infosec circles (@thegrugq @csoghoian ...) is that Windows no longer lags behind:

https://grugq.github.io/presentations/COMSEC%20beyond%20encr...

dantiberian · on May 7, 2016

There's two separate things that are conflated here:

* Is Windows security equal to or greater than Linux and OS X (Probably yes)

* Are the overwhelming majority of viruses written to target Windows systems (yes)

I really enjoyed this article on security economics which goes into this in more depth: http://tidbits.com/article/15939

wyager · on May 7, 2016

>No matter what OS you choose it is vulnerable to viruses

This may be theoretically true, but it is not practically the case. There is a reason Linux and OS X users almost never use antivirus software.

Using Windows on a medical device is inexcusable. It's a heart monitor, not a game system.

clay_to_n · on May 8, 2016

Many big-name vitals monitors seem to run Windows under the hood. They have a whole PC in there, connected to their hardware sensors. If you pay extra for network connectivity or a similar premium feature, I suspect it's the same software with some flag turned on.

Coincoin · on May 7, 2016

Medical devices are used by humans. Humans, for some reason prefer and know Windows better. Why shouldn't the devices use Windows?

Today, doctors from a hospital the other side of the country can diagnose your cancer realtime while you are still in the MRI machine, and they do it with Windows, because that's what humans know and use.

wyager · on May 8, 2016

> Humans, for some reason prefer and know Windows better

Probably because they've been exposed to it in a many settings, for better or for worse. Windows being popular does not imply that Windows is appropriate for any given task.

> Why shouldn't the devices use Windows?

Is this a serious question?

Windows is probably the least stable production OS in the world today.

Windows is extremely bloated compared to an appropriate embedded OS.

Windows is (practically speaking) the only OS where antivirus software is a fact of life. Viruses should not be a concern for medical equipment.

Windows is not even close to real-time.

I don't want to see "Your heart monitor is restarting for updates in 3... 2... 1..."

> and they do it with Windows

This may be technically true, but this in no way implies that Windows is uniquely suited for or appropriate for the task. They could also do it using a PlayStation, but this is probably not an appropriate platform.

MertsA · on May 8, 2016

The benefit you're describing is just end users already being familiar with the interface because it's running Windows. That's not the case here, the entire interface on this device is of the custom application itself. It doesn't matter that the user already knows that the start button can be used to launch new programs, they're never going to see any hint that the device is built around Windows unless it's broken. There are numerous disadvantages to using Windows for a heart monitor, everyone else has described this to death but the only real benefit to using Windows is that development doesn't require a software developer competent with embedded work. Now they can hire any Tom, Dick, and Harry from the local community college because it's just a Windows app.

limelight · on May 8, 2016

Users shouldn't be exposed to the OS here at all.

nickpsecurity · on May 8, 2016

"No matter what OS you choose it is vulnerable to viruses. You and I will agree that the odds are your Windows system is much more at risk by at least an order of magnitude. But the IA people who demanded that this system run antivirus are just as likely to demand that Linux run antivirus"

That's not true for what those systems should be really running which is separation kernel platforms. These isolated tasks in partitions using high-assurance kernels designed to not fail in every way you can think of and with almost no code to hit. The apps even donate their own resources for kernel calls. Interface is in untrusted VM that sends checked commands to real software running on isolated partition optionally on Ada or Java runtime for memory-safety. Anti-virus is not available and not necessary given the untrusted part is strongly contained & the trusted part is memory safe.

http://www.ghs.com/products/safety_critical/integrity-do-178...

https://os.inf.tu-dresden.de/papers_ps/nizza.pdf

Just a matter of using right tools for the job. Any medical device using Windows or any other complex OS isn't doing it right. Even OpenBSD would've been a better choice given it rarely is hacked, crashes, or needs updates. Antivirus software wouldn't run as it's not available for these. It would be a network appliance or something that didn't affect running system.

Diederich · on May 7, 2016

Is the day to day impact of antivirus on Linux the same as on Windows?

Cheapshot1 · on May 7, 2016

even better analogy is. Is the day to day impact of your own customized OS the same as windows. If the medical equipment ran independently we wouldn't have anything really to worry about.

raverbashing · on May 7, 2016

You're right. My bet is that they use windows because they want to save or get something from the machine (either to a USB drive or network)

And I would have followed the same steps you mentioned. The machine could work with Windows.

Windows CE might have been a better pick, so you can have things like RO filesystem, etc

"There is no reason to use Windows" is, as you mentioned, not a bad decision in itself, but since they do it the lazy way, it is awful. Not sure if there's an out-of-the-box way of firewalling everything in the Windows versions available at the time that machine was built

x0x0 · on May 7, 2016

I worked on similar software that ran on Windows, though the decision was made 18 years ago. It was driven by:

* GUI dev tools were an order of magnitude better than under linux

* nidaq (very common daq and dio cards) didn't have or had flaky linux drivers. usb may be a better choice today but was not then available. Plus it's convenient/less expensive to push functionality to the computer side and minimize embedded development.

* in the beginning (though we fixed this mistake), we shipped software, our machine, and a ni-daq card; users provided a computer. This was a nightmare. We ended up shipping whole computers custom built by Dell to our specs with ni-daq cards installed (and later glued in to prevent them slipping out during shipping) plus our custom system images. At the time, I don't think Dell would do this for linux; you could find other vendors that would, but they weren't easy to work with or much more expensive.

* Nobody is eager to switch OSes because every software update to the machine has to be FDA certified (ie kiss at least $100k goodbye), so the company is very risk averse. Beyond even the cost to rewrite what has evolved into a relatively complicated gui app.

* As mentioned elsewhere, once you leave windows you have to train people to use linux guis that historically had window managers / gui toolkits that didn't work like windows.

All of the above is complicated by the fact most of these medical device companies pay their devs shit and have trouble hiring good devs. (Engineering shortage, obviously!)

tremon · on May 7, 2016

every software update to the machine has to be FDA certified

Interesting. Does that imply that every Windows update Microsoft releases has been FDA-certified already?

icegreentea · on May 7, 2016

I don't know how it works out for stuff used during surgery, but for diagnostic devices (where if a device fails by becoming inoperable, the stakes are a lot lower), you can lump basically any software that you don't have total control and design history on (so this includes random proprietary software and most FOSS stuff) as "software of unknown providence" (SOUP).

Now based on the class of medical device you are developing, you can either get to use SOUP without justifying at all, or you have to explain/justify why you can use SOUP at all.

For example, even if your OS was Linux, because you don't have formal documentation on verifying that Linux infact does all the things its suppose to do (hell, does Linux even have a true spec?), you would then have to justify why you're using Linux. If a kernel update comes out, you then have to justify why you're using it.

What you WOULD have to do is verify/validate that your software that interacts with SOUP (so in this case running on an OS) still works correctly. You would not have to verify/validate the OS update per se.

HeyLaughingBoy · on May 7, 2016

His characterization is not quite accurate. FDA does not certify it; rather, the company building the medical device certifies to the FDA that it has been properly validated for use and the FDA accepts their validation.

Trust in the company's ability to properly validate the device is tempered by regular FDA (and other regulatory bodies) audits of their development process and process data, test reports and a general desire to stay in business. FDA can and will take your product off the market if it appears to be unsafe and the company does not respond appropriately.

x0x0 · on May 8, 2016

Yeah, you have to create a testing/validation procedure and run the procedure with your hardware and software. So for human diagnostic devices, we had to purchase samples and basically show that we diagnosed to our prescribed accuracy levels. So there's a bunch of wetlab time/disposables involved.

Our machines were actually purchasable in two ways: one certified for human diagnostics (FDA Class III, I think), and one not, for eg veterinary or scientific use. The former got 1-2x/year software updates, while the latter got monthly releases. We had to enforce usage requirements both with contracts, download restrictions, and in-machine checking. ie you couldn't use the non-diagnostic firmware or software on a diagnostic-certified device. You also had to use certified reagents -- enforced in hardware -- in diagnostic devices.

It's a lot of work, but given some of the software I've seen, and the fact that if these machines get the diagnosis wrong you either won't be treated for deadly diseases or will be treated in very toxic ways for diseases you don't have, the requirements are reasonable.

enraged_camel · on May 7, 2016

A lot of those machines have Windows embedded in them, because a lot of doctors and nurses are really, really bad with technology and cannot use anything more complicated than Windows's familiar user interface. Training involves "OK, now use the mouse and double-click this icon on the desktop to start the program."

I wish I was kidding.

undergrowth54 · on May 7, 2016

> I wish I was kidding

I know that "tired person in a hurry" is a trope, but considering how masochistic the health professions are about sleep, it really applies here.

stephengillie · on May 7, 2016

Windows is a familiar interface for computers. Is there a good reason for not using a familiar user interface?

tremon · on May 7, 2016

Huh? Of course there is. "Familiar interface" isn't even a functional requirement, it is only of secondary importance for any system. Actual functionality is much more important.

As an aside, if "familiar interface" is your only requirement, I'd suggest to install a door handle or light switch.

seanp2k2 · on May 7, 2016

Your comment may sound sarcastic on the first read, but I agree with the point; don't use Windows when what you needed could be done with a micro controller or a physical circuit. Not sure if the application here fits that at all, but I've definitely seen things like you-do-it check-outs, ATMs, billboards, etc running Windows. I would think that these systems would be much better off with something like a hardened micro controller running a remote display system vs a multi-gigabyte operating system.

With Windows 10, this problem will likely get a whole lot worse. What do you do when MS pushes a forced update to your deployed devices and it bricks some of them? http://m.theregister.co.uk/2016/05/06/microsoft_update_asus_...

What about when you can't disable Cortana (without breaking the start menu) and you're in a HIPAA / PCI environment? http://winaero.com/blog/how-to-uninstall-and-remove-cortana-...

As we've seen with things like SCADA, switching from Windows is not a silver bullet, but in terms of minimizing the complexity and attack surface, Windows seems to be starting at the opposite end of the spectrum.

tremon · on May 7, 2016

Oh, I didn't even mean my comment to be about Windows vs dedicated OS, at least not directly. When administered correctly and properly fenced off, using a modern Windows system is not a cardinal sin. Although I will admit that Windows 10, with its non-optional feature cadence, brings additional uncertainty.

However, I do question "it looks like Windows" as a valid rationale for what appears to be a single-purpose machine. I don't think it's likely that staff are using the operating room equipment as a desktop machine, so presumably they only care about the in-app user interface. And an application interface can be made to "look like Windows" regardless of what OS it's running on.

limelight · on May 8, 2016

Exactly.

If users are never meant to interact with the OS itself (airport terminals, ATMs, displays, etc.) then it makes absolutely no sense to use Windows.

developer2 · on May 7, 2016

>> if "familiar interface" is your only requirement, I'd suggest to install a door handle or light switch

This isn't far from the truth. You'd be better of providing a console with clearly labelled buttons and switches. No operating system exposed to the user. It's not very practical to implement, let alone make changes to, but it would be more dummy proof than any desktop application.

This is also why you see a lot of kiosk-like, full-screen applications, with no ability to actually work with the desktop interface itself. A single kiosk application that cannot be alt-tabbed out of, providing screens that are easy to navigate.

et2o · on May 8, 2016

Surprised I had to scroll this far to find someone blaming a physicians.

jcrawfordor · on May 7, 2016

I think a lot of people are misreading this article. It appears that the device is standalone and runs some embedded OS, but as a feature logs data to the doctor's existing computer. So it's not so much that they built the tool on windows as it is that they built the tool on the operating system they expected doctor's to be using.

I still think this was a poor decision, but it's a different kind of poor decision

chapium · on May 7, 2016

To be honest, there is an excuse, although not a very good one. The machine is costly because it is in high demand. R&D and pushing this through regulatory control are costly. This company has great incentives to cut costs to remain competitive.

heartbreak · on May 8, 2016

Perhaps the regulatory control should cost more.

ArkyBeagle · on May 8, 2016

Windows ( at least 7 & XP) can be stripped down to something approximating a properly running system. This isn't easy.

Having an anything-critical Windows machine on the open Internet might not be such a wonderful idea.

copperx · on May 7, 2016

I would expect these kinds of systems to be running a soft realtime OS. Or at the very least a run of the mill OS with no extraneous software running in the background.

Dwolb · on May 7, 2016

This. How are these devices not running on some sort of hardened OS seen in airplanes and automotive? Medical applications are mission critical (or some variant) and should have same (or better!) certification procedures set up for correctness and security.

artiscode · on May 7, 2016

Second this. It is terrifying to know that mission critical, medical grade software runs on a consumer operating system. Military/aerospace systems have numerous requirements and clearly defined practices and ways of developing these systems, often going through various layers of documentation and using specifically designed programming languages(like the Z programming language) to write specifications, which are then re-written into code, but it seems like medical industry has been neglected.

ktRolster · on May 7, 2016

Then this story about viruses at a nuclear power plant won't make you feel any better: http://www.reuters.com/article/us-nuclearpower-cyber-germany...

Some great quotes:

"Mikko Hypponen, chief research officer for Finland-based F-Secure, said that infections of critical infrastructure were surprisingly common"

"Hypponen said he had recently spoken to a European aircraft maker that said it cleans the cockpits of its planes every week of malware designed for Android phones. The malware spread to the planes only because factory employees were charging their phones with the USB port in the cockpit."

zyxley · on May 7, 2016

> The malware spread to the planes only because factory employees were charging their phones with the USB port in the cockpit.

The moral of the story: If you include any kind of port in something, people WILL plug things into it sooner or later.

ams6110 · on May 7, 2016

Sounds suspect to me. Aircraft computers are not running Android.

copperx · on May 7, 2016

I'm pretty sure the entertainment system runs on some version of Android.

clay_to_n · on May 8, 2016

Is there Android malware that, when connected to a Windows PC, spreads Windows malware? Sounds reasonable in this situation.

koluft · on May 7, 2016

They run USB, it doesn't matter what OS is on the other side.

ams6110 · on May 8, 2016

Sure it does. If you plug a USB drive full of Windows viruses into a Linux box, nothing will happen.

radicalbyte · on May 7, 2016

I've been working in Medical for a couple of years... it's because Medical is extremely fault tolerant. They put up with a lot of rubbish that wouldn't be accepted in other industries.

Aeronautical and Automotive are both engineering driven, Medical isn't, it's a big grey area.

wyldfire · on May 8, 2016

You might be surprised to realize that many applications of medical devices are not used for affirmative life support and therefore should not be held to the same standards as aviation.

This application (cath lab activity logging) is not a life-support activity. Product failures of any kind (either due to design error or product defect) represent a diminished capability of diagnosis and treatment. This does not represent a risk of harm to a patient.

That said, some medical device manufacturers treat this aspect of design very seriously and go to great pains to use defeatured and heavily restricted OS and settings.

beachstartup · on May 7, 2016

correlation doesn't imply causation.... except when someone selected windows, and then made 50 other moronic decisions also.

half the work done in this industry is just dealing with stupid decisions made by stupid people. i just accept this now.

tluyben2 · on May 7, 2016

But when Windows is chosen because it is easier to find coders who will 'remain within the budget' (cheapcheap) and cannot even make sure the virus scanner doesn't run during procedures or at all it goes a bit too far.

Amezarak · on May 7, 2016

According to the article, it was the hospital IT that misconfigured the antivirus, not the application developers.

Is there some reason to believe this wouldn't occur on a Linux system? There are plenty of dumb IA requirements that antivirus be installed on Linux, too.

zanny · on May 7, 2016

Hardware involved in heart surgery running a Linux system should have literally the bare minimum of software to run it. That means kernel, init, networks support (dhcpcd, systemd-networkd, if absolutely necessary samba, sshd, etc) and custom software running on top of that as nobody. Not some ~20GB of Microsoft crap driving the system to crash every couple days.

Amezarak · on May 7, 2016

> Not some ~20GB of Microsoft crap driving the system to crash every couple days.

I agree the system should be running as few services and as little other third-party software as possible, but let's be fair. Since at least Windows 7 / 2008R2, particularly for an offline system, the OS is not going to crash unless there is a hardware problem. It's not clear the OS crashed even in the article - "the screen went black" (the application went black?) and they "had to reboot" doesn't give us enough information.

A modern Windows system, like a modern Linux or a modern FreeBSD, is stable and will stay up for as long as you need it to, unless as I said before, there is a hardware problem. (Or in the case of consumer Windows, you do an update.)

EDIT: According to the actual report, the OS was not rebooted, the application was. There was no Windows crash.

> On (b)(6) 2016, a customer reported to merge healthcare that, in the middle of a heart catheterization procedure, the hemo monitor pc lost communication with the hemo client and the hemo monitor went black. Information obtained from the customer indicated that there was a delay of about 5 minutes while the patient was sedated so that the application could be rebooted. It was found that anti-malware software was performing hourly scans. With merge hemo not presenting physiological data during treatment, there is a potential for a delay in care that results in harm to the patient. However, it was reported that the procedure was completed successfully once the application was rebooted.

pdkl95 · on May 7, 2016

"Cheaper" or "remain within the budget" doesn't excuse using inadequate parts that don't meet the design requirements.

Unfortunately, this total disregard for safety isn't just software anymore. When we stat skipping lessons that we've know for a looooooonng time (such as why a split bobbin is an important feature in a transformer[1]), we have evidence of a serious need for strongly enforced regulation.

[1] https://news.ycombinator.com/item?id=11474730

tluyben2 · on May 7, 2016

I know it doesn't, but unfortunately it is often how it works. More than I want to remember I have seen things like 'but lives depend on this!' or '100s of millions can get lost if this doesn't work!' and yet when the RFPs come back and something like InfoSys is chosen because big name and cheaper than experts in the field.

Edit;

> we have evidence of a serious need for strongly enforced regulation.

Better education? But I guess strongly enforced regulation will force companies to not go for the cheapest solutions they can get away with which in turn will require people with actual knowledge in the field which will require better education, somehow.

pdkl95 · on May 7, 2016

Better education is always a great idea. Unfortunately, regulation becomes a necessary fix for immediate problems.

Note that regulation is the nicer option; the other way to force people to get the necessary education is liability, which could get really ugly in the case of medical devices.

tluyben2 · on May 7, 2016

There have been 'predictions' in the past of software creators being made responsible for the software they write in a liability way. Which will get very messy indeed. And grind the software world to a halt. Regulation is the nicer option and definitely the more realistic option.

makomk · on May 7, 2016

Sometimes it's all about tradeoffs. For example, a split bobbin is actually undesirable in most transformers because it reduces coupling between the windings, reducing the efficiency of the power supply. That's why if you look inside a better-quality power supply you'll often find that they have a split bobbin for the input common-mode choke and then a single bobbin with layered windings for the main transformer.

dredmorbius · on May 7, 2016

Only half?

I mean this quite seriously: you're hugely underestimating. Possibly unaware.

rsync · on May 8, 2016

"Bringing Windows into the architecture of any type of capital equipment control system is a bane. A scourge."

We need to go much further than this, since many people will "solve" this problem by using a different platform than Windows.

In reality, bringing networking of any type into capital equipment control systems or critical infrastructure is the bane ... the scourge.

Whatever convenience or perceived function that networking (including very local networking, like USB) is dramatically outweighed by the additional attack surface.

Go back to sneakernet and check your facebook at home, Mr. Nuclear Plant Worker.

CaptSpify · on May 8, 2016

The issue with this is: Many of these systems need to send data to other system on the network. You'll have to send the data over some kind of PACS network, or you'll just have to use a USB.

I agree that's a "better" setup than networking, but I still don't think having staff plug random usb devices into your medical equipment is a great idea either.

rwmj · on May 7, 2016

And if you want PCI-DSS [credit card handling] certification, then you'd better be running AV software, even when it's completely inappropriate.

CaptSpify · on May 7, 2016

https://xkcd.com/463/

The whole structure is wrong. I used to work in medical equipment repair. Windows Embedded is running so many devices it's not funny. But it's not just Windows that's the problem.

I put a linux-system on a PACS network to diagnose equipment. It was a headless, and we asked the IT group to block it off from the Internet.

Hospital IT: "Does it have antivirus?"

Me: "..."

SixSigma · on May 7, 2016

List of FDA medical equipment recalls for 2016

http://www.fda.gov/MedicalDevices/Safety/ListofRecalls/ucm48...

At least three of them are Class 1 - May cause death

And all of those are software related, none run Windows

http://www.fda.gov/MedicalDevices/Safety/ListofRecalls/ucm48...

kosmic_k · on May 7, 2016

That is astoundingly horrifying, especially the Class 1's which were distributed for over five years.

SixSigma · on May 7, 2016

I read every recall, food and medical, from 2000-2015 for a university research project. tbh I'm surprised anyone is still alive !

icegreentea · on May 7, 2016

The bar for recall is actually relatively low. Basically, when you find a fault (somehow, maybe in regular QC in manufacturing, or something bad actually happens in the field, or some engineer is fucking around), the question is "can this affect patient safety/outcome in the field". If the answer is at all not a certain "no", then you're probably thinking recall at that point, unless you can adequately root cause and contain it.

Since the nature of "oh oops" is that they tend to affect systems in ways that are not anticipated, there's often insufficient evidence to rule out danger to patients, and therefore there's a recall.

For example, if you sold 10,000 diagnostics machines, and then discovered that because of stack up of tolerances in electrical components, something like 1 in 100,000 machines will have a fault that affects the customer safety. However, because your original analysis (during design phase) did not show this problem, you never bothered recording the actual performance characteristics of 40% of the components involved in the stack up.

Now you're in a pretty awkward situation, that could result in a recall. And it could very well by that all 100,000 machines sold are just fine.

ptaipale · on May 7, 2016

Thank you; I hadn't spotted that XKCD quote about teacher and condom, but it will be very appropriate the next time someone asks me about the antivirus in the server software we do (on Linux).

dchichkov · on May 8, 2016

Let me surprise you, with the code quality that sometimes is running in what is actually 'life-critical' software.

Back in the nineties, I wrote a nice piece of some 300kb of C code, for DOS/x86. It was a complete software package, controlling medical equipment that was testing speed of blood coagulation. These tests are crucial in the patient post-operation recovery.

This piece of C code had some hardware control code, some statistics, a bit of math, some visualisation, GUI, etc. Normally, you'd imagine a team of 2-3 people, carefully written test cases, dedicated QA person, and a year of time to write something like it. And independend lab, that would certify the thing. Well... in that case, yes, there was independent certification... but...

It was just one developer, and I was 13, when I wrote it ;) During after-school time, in around 4-6 months. And I must say, I still sometimes have chills, when I think of the code quality, and, um, unorthodox solutions of 13-year-old myself. Yes, I've had some years of experience at the time, both writing software and designing hardware, and advice from my parents, who both could write software. But, at the time, I've had zero formal training, aside from reading K&R and PC XT manuals ;). So, you might imagine the code quality ;) Even, no need to imagine, I actually still have it somewhere in the archives :)

Keyframe · on May 8, 2016

Quite a story! I wrote a program for a dentist office (not my dentist) when I was 13 and got paid! It was a program to track their patients and stuff though, nothing critical. What was peculiar about this was that the dentist asked me if I can write the software in Turbo Pascal. That's the only language he, kind of, understood and he wanted to maintain it later when I write it. I didn't know Pascal, I only knew C at the time, but I accepted the challenge and wrote my first and only program in TP. It was kind of elaborate, especially for a 13 year-old, but also fun (BGI!).

state · on May 8, 2016

I understand why you probably don't want to put it up, but boy would it be fun to look at the code you're describing.

dchichkov · on May 8, 2016

I probably will put it up. It's a nice inspirational story for teens out here. Doubt there'd be any repercussions, no one cares about some random code on GitHub. And the equipment is hopefully taken out of service years ago, it was more than 20 years back. I wish I knew how long it had been used, but there've been only about 10-20 units sold, I think.

I vaguely remember adding extra features for a year or so (like adding support for HP laserjet printer). But one of the founders of the company (on the business side) had some health problems, and I guess that had played role in very small number of units sold. The only feedback that I've had, is pretty much that my father took me to a lab once, that had a unit deployed, for a support call. And I've seen some real printouts with patient names, from the unit. The lab assistant seemed to be happy with the device. I remember them showing me some blood plasma and teaching me to count cells, during lab tour ;)

pdkl95 · on May 7, 2016

Is it going to take more deaths to convince people to learn from the Therac-25[1]? If you aren't designing for safety first, you have no business working on medical devices or anything else that might be a dangerous when it misbehaves.

[1] http://sunnyday.mit.edu/papers/therac.pdf

chestervonwinch · on May 7, 2016

I am not the parent poster, but may I ask why is this comment being down-voted? I'm not speaking for the parent, but he or she seems to be implying that medical equipment with anti-virus software with automatic updates (used as such) may potentially compromise a patient's safety, and may be indicative of further bad design practices, which could result in, at worst, death. Is this somehow off-topic, or not worthy of discussion?

pdkl95 · on May 7, 2016

That's exactly right. The article mentions that the doctors were fortunate enough to have five minutes during which they could reboot the device. If they were in the middle of some other procedure that had tighter time constraints, a reboot could have easily killed the patient.

Just like the Therac-25, this isn't about a single problem (the antivirus or the race condition in the Therac-25's software). Designing for safety has to happen at all levels of design. Using Windows (or Linux, or any other complex OS) in a medical device shows that the designer wasn't even considering the safety of major parts of their design.

Designing medical devices with an OS that can be infected with malware (and thus need an antivirus) is the same kind of idiocy that puts a car's steering and brakes on the same CAN bus as the music player and emergency radio. It's a sign that the designer needs either more education or a different job before someone is injured or killed.

jschwartzi · on May 7, 2016

Because it's really disingenuous to say that the medical device industry hasn't learned anything from Therac 25. The concept of two-fault failures is an industry standard that was learned from Therac.

The fact is that in the Medical software industry the best practice is to manage the entire software configuration of the medical device. Failing to do so, and especially failing to adhere to the guidelines of the manufacturer, is negligent at best. We all know that the behavior that led to the hazard is the wrong thing to do and that somebody screwed up.

The only other real insight that can be gained from this incident is that it's very important to have configuration management procedures that are easy to follow, and it's important to verify that they were correctly followed. I can't tell whether they were in this case, but I suspect given the use of Off-the-shelf software that there was some manual sequence of steps required to adhere to the approved configuration. Given that, I would have expected an error of this magnitude, because it's well known that humans make mistakes whenever they are made to follow a sequence of steps. The configuration should have been verified at installation time, at least.

If you're interested in the kinds of things the industry has to consider in the US, take a look at the FDA guidance for the 510k submittal process.

iask · on May 7, 2016

There are a couple of things here from my POV, first - I would replace the head of their IT and any senior IT staff - who seem to look for the quickest-then-cheap solutions. Dumb ducks who don't spend the extra time understanding the importance of the infrastructure and the software they install. And also replace the service vendor, if they have one.

I've seen this happen time and again, where companies have some 3rd party service vendors who would install AV software on anything they can get their hand on, even a microwave or coffee machine - just to tell the client "my bill is expensive, but you can feel secure, we installed AV". I despise these folks with a passion.

The problem is not Windows. It's a lack of knowledge and understanding. Simple.

For god's sake - it's 2016 - dump the Anti Virus software. I am gonna make t-shirts this summer with this ;)

technion · on May 8, 2016

    I would replace the head of their IT and any senior IT staff

It's a very good bet the senior IT team were following orders from somewhere else in the chain here.

GunboatDiplomat · on May 7, 2016

Why on earth is medical equipment running standard Windows? This is the ideal location for some basic RTOS or even just an embedded Linux. Seems like a huge cost and risk for no gain.

tluyben2 · on May 7, 2016

It's cheap to find Windows programmers and even cheaper to find ones that are not hindered by knowledge about software quality and safety. That's not their fault; no-one ever told them something like that exists.

nonbel · on May 7, 2016

>"hindered by knowledge"

This is a great phrase. "Joe was hindered by knowledge of what a p-value means and so didn't claim he discovered a key to understanding the disease."

riyadparvez · on May 7, 2016

That was my first thought too. And why there is an anti-virus running? This equipment should not be connected to the internet nor some staff should plug-in a flash drive on the first place.

ars · on May 8, 2016

That's not true. They have to keep a record of the data for the patient file. So it does have to communicate remotely in some fashion.

GunboatDiplomat · on May 8, 2016

Could communicate with a gateway bridging a private medical devices network and a public network. That seems a reasonable way to provide control and access.

devonkim · on May 7, 2016

I believe that it's strongly related to the dashboard software being targeted towards "familiarity" by the doctors / customers and having old GUI-centric software from back in the mid-90s ported to run on Windows over the decades prettied up back when there was nothing really viable for end-user accessible embedded systems besides even more grotesquely expensive custom software with even less capabilities and far worse SDKs than anything under Windows back in, say, 1994. This isn't that different to me than banking software with a COBOL backend with Java middleware and a PHP frontend that's extremely common for retail banking sector.

Given the sheer amount of overhead in enterprise BS medical hardware has (read: sales people likely get the biggest chunk of the absorbed costs) it wouldn't be a surprise to me that the engineering teams amount to a skeleton crew while 70%+ of the personnel involved are non-engineers that override the professional decisions of the engineers.

technion · on May 9, 2016

I feel there are two very contradictory views on HN. The first of these, is that anything safety critical needs to land on an RTOS. The second of these, is to avoid C.

Both of these have reasons behind them and appear to make sense. Leaders in the RTOS space appear to be QNX and vxworks. Suitable languages people raise are Rust and Go.

Based on some Google time, neither of these platforms support either of these OS's. Multiple "Introduction to vxworks" documents are all exclusively in C.

In terms of accessibility, safe languages are far easier for someone to get their hands on, test to death, than some of the OS's recommended.

kosmic_k · on May 7, 2016

I never realized just how lucky I was to have been born when I was. I can't imagine building embed devices which aren't running a very simply super loop or an ARM RTOS.

steven2012 · on May 7, 2016

Antivirus scans are one of those things added on IT checklists to cover their ass whenever something wrong happens.

But it rarely is useful. It only causes problems. We've seen so many issues related to virus scans throughout the years it's crazy.

What's better is to lock down the servers with only minimal access. I haven't used virus scan on my main desktop for over 10 years because I don't click on weird emails and I don't go to sketchy websites ever. Sure there's the risk of malware from ads I suppose, but I'm not that worried.

jconley · on May 7, 2016

Most of the time IT is just implementing policy from the CIO, which is basing it on the requirements of the company's insurers. Insurance companies require some very annoying things like Anti-virus. It's like having a lock on your office. You do it so the insurance company will pay you if someone comes in and steals your stuff.

AnthonyMouse · on May 7, 2016

It's more like the requirements cronies put into defense contracts to make sure the contractors make a lot of money.

The reason "security requirements" documents require antivirus is that companies like Symantec make sure they're in the right position to be the ones asked when someone is writing up a security requirements document, so that their answer can be "make sure you install antivirus (and here's the contact info for our volume licensing center)."

jcrawfordor · on May 7, 2016

Yeah, you don't click on weird emails and don't go to sketchy websites. Try managing IT security for an enterprise of 10,000 employees. A/V will save your ass hundreds of times every single day.

Computer professionals rarely understand the use case for A/V precisely because they are not the use case. In most all applications, A/V serves first as a safeguard against stupid user behavior, and only second as a safeguard against more advanced penetration (and in the latter case, one with only rare success). I'd bet that the #1 way enterprises are getting breached is still malicious email attachments, that's certainly true in my experience.

qb45 · on May 8, 2016

> I haven't used virus scan on my main desktop for over 10 years because I don't click on weird emails and I don't go to sketchy websites ever.

Haha, I used to be like that as a teen in Windows 9x era until one day I ran tcpdump on the router ;)

YeGoblynQueenne · on May 7, 2016

>> The antivirus was configured to scan for viruses every hour, and the scan started right in the middle of the procedure.

>> The company claims that they included proper instructions in their documentation, advising companies to whitelist Merge Hemo's folders in order to prevent crashes from happening, so it seems that the whole incident was nothing more than an oversight on the medical unit's side.

So "RTFM"? Not very helpful.

combatentropy · on May 8, 2016

> they included proper instructions in their documentation, advising companies to whitelist Merge Hemo's folders in order to prevent crashes from happening, so it seems that the whole incident was nothing more than an oversight on the medical unit's side.

And the hospital included full instructions to the software company on how to properly perform a heart transplant, so they were baffled why the programmer just let his teammate die of heart failure.

Come on, this kind of stuff should be a zero-configuration hardware-based black box, with its own buttons, screen, etc. --- not something that needs to be (or even can be) connected to something outside the vendor's total control.

ezoe · on May 7, 2016

This situation is even funnier(and sadly very seriously flawed) in Japan.

Medical equipment require an authorization to use. Any change to the medical equipment requires another authorization or it's prohibited.

By "any change" , it includes Windows Update(it changes the system obviously).

The result: they use anti-malware software to protect(or rather, believed to protect) unpatched Windows.

At least one anti-malware software company(Trend Micro), marketing that their software can protect the medical equipment in such situation.

exhilaration · on May 7, 2016

But... what about AV/malware definition updates? Doesn't that fall under "any change"?

symtos · on May 7, 2016

and what about security updates to the snakeoil they sell, eg. https://bugs.chromium.org/p/project-zero/issues/detail?id=69...

Blackthorn · on May 7, 2016

> Any change to the medical equipment requires another authorization or it's prohibited.

Honestly, this isn't a bad decision. If the device was tested and certified with specific software, a software upgrade is not guaranteed to not cause a problem.

symtos · on May 7, 2016

using software with known problems in order to avoid potential problems from an upgrade does not seem like a non-bad decision

Blackthorn · on May 7, 2016

Is the medical device working right now? Yes. Could, upon upgrading, the device stop working, possibly in a subtle way that might kill somebody? Yes.

The approval process for medical devices is rightfully difficult. Software upgrades, even if they seem trivial, should not be a backdoor process of bypassing testing and approval.

symtos · on May 7, 2016

...and could software deployed to the device by some random who just exploited some well-known security flaw that never got patched, kill people?

Blackthorn · on May 8, 2016

This whole discussion has been about stuff air-gapped. That's not a guarantee when you have USB ports, but it does help.

callesgg · on May 7, 2016

Putting antivirus on equipment at all indicates a much bigger problem.

That the equipment is somehow configured to be susceptible to viruses.

datenwolf · on May 7, 2016

Well at least in EN62304 the installation of AV on medical devices is recommended. The whole thing reads like as if it was written by people who picked up a few buzzwords and read a few articles in a computer magazine.

stevetrewick · on May 8, 2016

From the linked report :

Based upon the available information, the cause for the reported event was due to the customer not following instructions concerning the installation of anti-virus software; therefore, there is no indication that the reported event was related to product malfunction or defect

I beg to differ. I'd consider a momentary loss of file I/O due to lock contention causing a machine to require a reboot a shocking defect in - say - a word processor (which, notably, do not have this problem). That this risk is apparently known and the vendor's sole mitigation is to document a 'Don't do that then' is absolutely 100% an indication of a product defect, even in the absence of an actual occurrence.

kinai · on May 7, 2016

this reminds me of IT crowds bomb disposal robot: https://www.youtube.com/watch?v=z88b96ECZCE

just perfect

toomanythings2 · on May 8, 2016

I don't think they say this device is controlled by Windows but it must be. Why professional software and instruments even consider using Windows is beyond me.

malbs · on May 8, 2016

We've had issues with the latest versions of kaspersky. A burst of network activity is almost guaranteed to crash a machine.

It took us a while to isolate Kaspersky 10, and it's not even any particular component inside of Kaspersky, but only when all features are enabled. We tried different permutations of features to try and isolate the cause of our crashes, but as soon as you have any one feature disabled in, the crashes stop, Very frustrating because ultimately our clients laid the blame at my feet (new software feature, new release, blah blah blah), and not exactly much you can do in the way of hardening against this particular crash, the app generates a burst of network data, and boom, blue screen/instant reboot.

coldcode · on May 7, 2016

I worked at a financial company that ran its production Oracle database servers on Windows in the same network as the staff (no firewall) and ran virus checkers on them. Performance was terrible of course.

angersock · on May 7, 2016

Okay, seriously, I need to say something, because I doubt most of the people commenting in this thread have ever dealt with either health IT, healthcare software, or any of the related nonsense.

There are kinda four flavors of machine setup I ran into while in that field: big server banks for on-site hosting (think huge enterprise VM farms, for data warehousing and record storage and virtual desktop hosting), care provider systems (think like tablets, doctor office computers, nurse workstations, room workstations), cart computers (used for things like running the sonogram or cardiogram equipment, or for other studies), and actual integrated devices (for, say, data collection).

The care provider systems are usually comically locked-down, tablets and phones having the meanest management software they can (no apps, limited connectivity, remote wiping, and so forth). Workstations tend to be centrally managed, have images pushed regularly (ha!), and often use AD and smartcards to handle authentication. One place I've seen took this a step further, and basically just booted users directly into a VM hosted on the server farms mentioned earlier. You can't use USB devices, you have highly-regulated clipboard access, and so forth--this is done to prevent HIPAA breaches. Which is kinda silly given other workarounds, but whatever makes people feel safe and the CIO happy. These workstations run some enterprise version of Windows, probably 7 Pro. Those silly-long extended service agreements you see on Microsoft? Hospitals are some of the people keeping that alive, and they will pay obnoxious amounts of money for the privilege.

The cart computers are typically like the workstations in terms of functionality, but they may have software specific to the device they're talking to. They might not be as locked down (e.g., only acting as thin clients to a remote VM), but they are still running Windows.

The device computers may run some kind of RTOS. In some cases, they'll be running a customized Windows CE installation--which is totally reasonable. There are a lot of good guarantees that that can give a development shop, least of all that they can call up Microsoft instead of StackOverflow and say "Hey, this function does x, it's documented as y, and we're paying you a lot of money, so what the fuck?". Windows Embedded (which is I think the successor, am not sure).

In all of these cases, Windows itself works pretty damned well.

It runs the software everybody needs, it has the enterprise deployment stuff figured out through decades of improvement, and really there is no reason to be scoffing at its choice.

Now, if folks have goofed up and thrown a stupid AV policy on the machine, that's a different question entirely. Health IT is full to the brim of people basically just punching a clock and being unable to get anything done in a reasonable amount of time. Sometimes, they do awesome things, but mainly they are just custodians standing between doctors and really really stupid policy decisions that seemed good at the time.

EDIT: Removed unrelated example at top.

saganus · on May 7, 2016

Wtf?

"The antivirus was configured to scan for viruses every hour, and the scan started right in the middle of the procedure."

Who configures an antiviurs for an hourly scan on a doctor's computer?

pritambaral · on May 9, 2016

It wasn't even a doctor's computer, it was apparently an operating-room equipment computer.

Kristine1975 · on May 7, 2016

Why is there a virus scanner on a PC inside the operating room?

Don't tell me that PC is connected to the internet...

rs999gti · on May 7, 2016

I was going to ask this as well. Why does this PC need to be connected to the internet? If it doesn't need to phone home while operating as a heart monitor then there is no need to have antivirus or have this PC connected to the internet.

Also, plenty of devices not connected to the internet run Windows: ATM's, Billboard, Monitors, etc.

Dumb IT is to blame for this mistake.

jcrawfordor · on May 7, 2016

> Also, plenty of devices not connected to the internet run Windows: ATM's, Billboard, Monitors, etc.

I hate to break it to you, but, in practice... these things are all typically connected to the internet.

mirimir · on May 8, 2016

The need to get updates, I bet.

billforsternz · on May 8, 2016

"A critical medical equipment crashed during a heart procedure due to a timely scan triggered by the antivirus software installed on the PC to which the said device was sending data for logging and monitoring."

That should be untimely. The opposite of timely.

firebones · on May 8, 2016

For what it is worth, Merge is now part of IBM Watson.

http://www.merge.com/News/Article.aspx?ItemID=660

Welcome to the Health Cloud Powered by Watson.

fla · on May 7, 2016

How can a medical device be certified for running on 'user hardware' (=uncontrolled environment).

Something is probably missing from the article. IMO, the device in question wasn't critical at all, and a failure could be expected.

fencepost · on May 7, 2016

I see a bunch of folks talking about whether PCs are connected to the Internet and "why was it running antivirus in the first place?" It's called Defense in Depth.

It Does Not Matter if the device is connected to/able to reach the Internet.

First, it probably can reach the Internet in some way simply by being networked. I don't think I've ever seen a medical office (can't speak about hospitals) where medical diagnostic equipment was on a fully-separate network able only to talk to other network equipment and specified data destinations (PACS servers).

Second, I'm not concerned about unpatched, unprotected machines being infected from the Internet. Odds are they're running a restricted version of Windows, with a custom shell and a lot of stuff stripped out. I'm concerned that they're going to be infected by another machine on the network that's gotten infected. With all the past SQL Server security issues a decade or more ago, how many people think those SQL Server boxes could be directly reached from outside the local network?

The conjunction of those two is that even if you firewall all that stuff off, the PACS servers are still on both networks, and are probably running much more interesting and vulnerable stuff than the device controllers.

Sure you can fully wall everything off - it's really easy, just do your X-rays onto film, burn your MRIs and ultrasounds onto CDs, and print your EKGs for later scanning. Oh, and listen to people complain about how out-of-date your systems and procedures are.

There are other factors that come in as well - sure, every device manufacturer could provide fully bespoke diagnostic displays developed from the ground up in artisanal software shops providing full employment for assembly programmers working on embedded systems, along with cohorts of graphic designers creating glorious steampunk-styled interfaces. That's a beautiful dream, keep having it.

For the rest of the world, creating a UI on that custom embedded system running on something from RIM/Blackberry (yeah, they own QNX) is just going to get them crap from people because of A) how clunky it probably looks and B) How could they even consider allowing direct user interaction with the RTOS that was chosen to ensure that the dangerous bits in contact with patients/radiation/irradiated patients were safe?

There's a beautiful world out there somewhere where everything is safe and secure and seamless and updated. The rest of us live in worlds where Joe in Marketing's PC gets infected with something that allows an attacker to start scanning the network for unpatched vulnerabilities on any system, which leads to an out-of-date install of IIS on a legacy server that hasn't been updated because there's no longer a contract with the vendor (or no vendor) but it's around because there's a statutory requirement to keep the data on that system for 7-10 years.

There's a lot of ugliness out there. Antivirus is a way to try to ensure that when (not if) some of it hits you the repercussions are minimized.

tremon · on May 7, 2016

it probably can reach the Internet in some way simply by being networked

That is simply untrue, you can (and in many cases should) have unroutable subnets. But even if true, that only slightly changes the question: why is operating room equipment networked in the first place? That you've never encountered a proper setup doesn't excuse not having it.

fencepost · on May 8, 2016

I phrased that badly - it's not that they can reach the Internet, it's that with the exception of true high-security fully-airgapped locations, if the machine is networked then it's almost guaranteed that the Internet (or something on it) can effectively reach out and touch that machine even if it's only via other systems.

I don't work in a hospital environment, haven't for more than a decade and wasn't interacting with clinical systems even then, but my understanding is that a very significant amount of medical equipment was networked even then, and was at least in theory capable of streaming HL7-formatted data to other internal systems for reasons of patient care, billing, or both. How much of that happens in the real world instead of being theoretical is something I can't say, but I'm sure in the 15+ years since I was working with HL7 that hospitals and equipment haven't gotten less networked.

fiatjaf · on May 7, 2016

[flagged]

dang · on May 8, 2016

There's no place for this kind of flamewar ignition on HN. Please don't post anything like this again.

We detached this subthread from https://news.ycombinator.com/item?id=11650792 and marked it off-topic.

dlp211 · on May 7, 2016

Get out of here with that nonsense. You may not be a fan of paid software, but the Windows Kernel is just as good as any FOSS kernel today in regards to stability. The Server and embedded SKUs also come with a ton of the extraneous stuff removed that one would ever be worried about. The issue here is that someone decided that a machine that should only ever be connected to an air gapped network needed anti-virus software.

Disclaimer: I work for Microsoft.

wyager · on May 7, 2016

>the Windows Kernel is just as good as any FOSS kernel today in regards to stability.

I disagree strongly, but this is beside the point. Medical hardware should not be using any operating system that's not hard real-time and thoroughly vetted. Ideally they would use no OS at all. Even Linux, which is drastically more appropriate for embedded systems, is a questionable choice for medical equipment.

iris-digital · on May 7, 2016

The kernel is not the problem with Windows on medical devices.

dang · on May 8, 2016

> Get out of here with that nonsense.

Please don't respond to inflammatory trollishness by making the thread still worse. It's hard to resist provocation, but important. The rest of your comment is fine and would have been more persuasive without the first bit.

zxcvcxz · on May 7, 2016

> the Windows Kernel is just as good as any FOSS kernel today in regards to stability.

No it's really not. I've used Windows and Linux a lot and in 10 years of using Linux I've only twice had a kernel panic while using non-experimental software.

On Windows I've had countless BSODs.

tjohns · on May 7, 2016

To be fair (and I'm really not a Windows fan), many of the BSODs on modern versions Windows can be attributed to shoddy third-party drivers.

And as a point of comparison, I've had tons of kernel panics on MacOS over the years, for various reasons. Sometimes defective hardware, sometimes odd software interactions, and sometimes for reasons I can't explain.

Windows has a lot of problems. The kernel itself is not really one of them.

(That said, I still wouldn't use Windows on an embedded device, much less a life-critical one. Even using Linux would give me pause in this scenario.)