
Works great. You can already find questions on Stack Overflow from people getting their database deleted

https://stackoverflow.com/questions/63067062/elastic-search-...

Edit: The person raising that question is working for Atlassian (Jira), looks like Atlassian got their database deleted lol




This edit is speculation.

> I'm running an elastic search for a personal project on google-cloud and I use as a search index for my application.

He very clearly says it’s a personal project. Trying to learn new topics outside of your direct responsibilities, while employed, is very common in the software industry. Not everyone that works at a company is involved in databases at that company.


And getting a lesson in security for free it seems, it sucks but security is important.


Free? It would have required more effort, but they could have encrypted all the data, and then sent the key to a well-known white-hat security researcher, or someone who could be trusted to administrate important cases (they'd of course be free to ignore it). The encryption could be done on the compromised server with a forEach, so it'd be a single request.

I think some people in this thread want to be a bit too "absolutist" about it. Everyone's servers were exposed to heartbleed, spectre, meltdown, etc so the absolutists would apparently want the whole internet deleted.

Edit: It would be helpful if the down-voter could explain (I might learn something).


> and then sent the key to a well-known white-hat security researcher

Would you like it if someone involved you in adjudicating potentially illegal activity (under the CFAA & others) without your consent?

This is clearly not a white hat hacker looking to teach people lessons about security. If it were, they could have furnished a list to the major cloud providers of broken instances and given them time to notify and remediate.


>Everyone's servers were exposed to heartbleed

No, just my webserver/HAProxy. The difference is: don't expose services that are not meant to face the Inet directly.

Production-type webservers, SSH, VPN, HAProxy, etc. are.

Databases, development webservers, NFS, and Samba are not!

Sure, even the best-hardened service can have vulnerabilities, but that's how life is. Better to have a door with a lock than one without, even if someone is capable of opening your door with a lock pick.


> don't expose services that are not meant to face the Inet directly

I did not (in the slightest) suggest that people should do this. I was commenting on the "free-ness" of the lesson (read the comment I was replying to). It could have been more "free" with a little more effort. Straight-up deletion wasn't the only option.


>Straight-up deletion wasn't the only option.

No but a good one.

>It could have been more "free" with a little more effort.

Even white hats don't work for free (for companies). Don't build cars if you don't know how a brake works; don't build IT services if you don't have the slightest idea how to secure them.


Fair, but I don't think you've added much to the thread here.


Same same ;)


I wonder how many of the deleted databases are just people learning with databases of dummy data?


Nice, now they additionally learn to never expose a DB directly to the net... bonus points ;)


Except of course they probably already knew that; they just accepted the risk to their toy database as a trade-off for the convenience of being able to directly access it over the internet.


Have fun setting it up again then...


That was edited in afterwards. https://stackoverflow.com/posts/63067062/revisions Whilst it very well may be a personal project, it certainly wasn't "very clear".


Maybe he edited it to clarify after strangers on the internet got him in trouble at work having implied that his employer had suffered a data breach?


Yeah, imagine if someone started stalking you on the Internet and making up random assumptions about your employer based on the questions you asked on a question-and-answer site. Oh and wait, they are doing this slanderous gossip under a pseudonym themselves, so you can't even call them out personally!

Stuff like this is what drives away underrepresented groups from engaging on the internet. Maybe everyone who upvoted and participated in the uninformed speculation from 'user5994461 should reconsider.


I expect if the author learns about this exchange he would never ask a question that has even a remote possibility of inviting speculation about levels of his knowledge or practices at a company where he happens to work at the time (in other words, pretty much any question at all) using his name.

That kind of public ridicule and possible resulting flak from management is why more and more developers participate in knowledge exchange by asking questions from under a throwaway pseudonym (much like user5994461), and only answer or edit other questions from accounts that connect to their identity in any way.


Sounds like a good public service. I’d much rather have my data deleted until it’s secured than have it stolen by someone else.


depends on the data. it could be public records


Databases can be public and secure. If a database can be deleted, it is not secure.


It sure was public...

This is not what people mean by open data


>If a database can be deleted, it is not secure

True, but a deleted database is secure again ;)


Good.


Oops no welfare for you!

I understand that some people won't learn without encouragement but it's not a good thing for all.


This attack uses public write access, which is how they can delete stuff. I think we can agree that this is not good, and I also think we can agree that a database shouldn't be exposed as-is without an application layer or API on top

Ultimately, companies like MongoDB and Elasticsearch are culpable for selling database technology that is insecure by default, presumably because that's the easiest way to boost their metrics for the VC overlords.


Write being the important keyword

They could have altered the data and no one would have been the wiser


online databases that can be written and deleted by anyone on the internet are no good at all. The data can't be trusted. Of course no welfare for you! All I do is to replace all the names with my name and I can take all the welfare in the whole country! Or for example, doing a search for names and replacing all female names with male names ... how can you trust a database like that?

Making decisions based on a writable database (to the world, and not just from data sources like census, etc) is utterly useless.


Consider Facebook/Twitter as anyone-writable databases. Your comments translate perfectly.


Facebook, Twitter, or even Mediawiki, don't permit any random IP address full database access. (Or had better not.)

Rather, for the first two, large numbers of agents may request access limited to a specific account, with limited capabilities granted.

Even Mediawiki, with an extraordinarily open access model (painfully so in most cases) has checks on extensive abuse, and gradations of permissions.

Suggesting that any of these are comparable to full DBA access, as the Meow attack (with considerable merits) targets, suggests an exceedingly poor grasp of distinctions or a misreading of GP's comment.

You can do better.


Vandalism is not a good public service.

> I’d much rather have my data deleted until it’s secured than have it stolen by someone else

There are multiple logical fallacies in this sentence. First is the use of the word 'until' which is ambiguous here; it suggests that your data can be 'undeleted' after the DB has been secured or you would rather not have any data stored anywhere that is not secured. Either option to me seems like an incorrect read of your comment but I'm not sure. And "than have it stolen by someone else" seems to imply that you know that this data was never copied and cannot be stolen still. I think that seems incorrect, unless there is something I missed that assures everyone that the data could not have been stolen during these hacks.

Lastly, your personally preferred outcome for your personal data is not a measure for all of society, but you grant it that "public service" label as if your preference matters above everyone else's. You don't know what other people think about their data. You don't know what the data even is. What if some of it was just a hobby project for someone, with no financial implications of unsecured data or of data loss, but with emotional attachment to their data? Do they not matter to you?

A blind deletion of unknown data belonging to unknown people is not a public service.


I assume the comment was partially in jest. But this would actually work well if it was consistent and fast. If databases get wiped before you have time to put anything important in them then no one gets hurt.


Yeah, it's bad for the industry right now, but this is just a transition period! Once we get through the pain of losing a few databases, the new steady state where nobody's data is stored in world-writable databases will be better for everyone, and that will be worth the cost.

Consider if this happened five years ago, it would have had a smaller cost than happening today. And it was probably going to happen at some point, so better that it happened five years ago than today. By the same argument, better that it happened now than at any point in the future.

I'm not sure how serious I am about this argument but...at least a little bit? I guess the alternative argument is that any day now software vendors would have all moved to secure-by-default platforms where deploying a world-writable Redis in production would have been so difficult that it rarely happened.


If you have Docker then make sure you have a firewall on top of it, otherwise it will expose pretty much whatever any Docker user wants!


What do you mean by that?


Docker uses its own iptables rules, which have priority over the system ones. Therefore, even if you have an iptables-based firewall blocking all ports, a Docker service will still be reachable unless it is configured not to be in Docker itself.


I do not understand what you mean by "priority over the system ones"

A docker container can have internal ports exposed explicitly, or use host networking. In any case these are ports exposed by the docker-proxy executable - an executable like any other on the system.

Then come the iptables rules of the system (which open or not data flow to the ports exposed by docker-proxy).

Or is it different?


Ah, now I get what you mean - that entries such as

    ACCEPT     tcp  --  anywhere             172.19.0.10          tcp dpt:8843
are created by docker, independently from the configuration of iptables themselves.


Taking precedence was not the ideal wording: it uses the same iptables, but it inserts its own rules ahead of the existing ones. Therefore it 'ignores' system rules, which might come as a surprise.
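If you want a quick way to spot which containers are affected, here is a minimal Python sketch. It assumes the Ports column printed by `docker ps` uses entries such as `0.0.0.0:9200->9200/tcp` or `:::9200->9200/tcp`; the function name is made up for illustration, and port ranges are not handled.

```python
import ipaddress


def publicly_bound_ports(ports_column: str) -> list:
    """Return host ports that a `docker ps`-style Ports string publishes on all interfaces.

    Assumption: entries look like "0.0.0.0:9200->9200/tcp" or ":::9200->9200/tcp".
    Entries without "->" (e.g. "9200/tcp") are exposed but not published on the host.
    """
    exposed = set()
    for entry in ports_column.split(","):
        entry = entry.strip()
        if "->" not in entry:
            continue  # not published on the host at all
        host_part = entry.split("->", 1)[0]
        # Split on the last colon so an IPv6 host such as "::" survives intact.
        host, _, port = host_part.rpartition(":")
        host = host.strip("[]")
        if ipaddress.ip_address(host).is_unspecified:  # 0.0.0.0 or ::
            exposed.add(int(port))
    return sorted(exposed)


print(publicly_bound_ports("0.0.0.0:9200->9200/tcp, :::9200->9200/tcp"))
print(publicly_bound_ports("127.0.0.1:6379->6379/tcp"))
```

Anything this reports is published on every interface, so it is worth checking from outside the host whether it is really reachable.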


> But this would actually work well if it was consistent and fast.

So not too concerned about partition tolerance, huh?


No, think about it, stolen or deleted? Which option serves your clients better given the generally awful situation?


This isn't about benefitting the single organization in the moment. This is about over time, moving everyone towards being more secure.


That depends entirely on the data and the client.


> There are multiple logical fallacies in this sentence.

No, there aren't any fallacies in that sentence and can't be.

The statement expresses a personal preference; to be fallacious there must be some logic that can be unsound. That is, it must start from some premises and then derive a conclusion. To find a fallacy, you have to show that at some point the conclusion does not follow from the premises.

Since it's a simple assertion, it is implicitly sound. (The graph of premises to conclusions is just a single node.) And since the author knows with certainty what his preferences are, we can take it as true. It's fruitless to argue with people about what their preferences are.

> First is the use of the word 'until' which is ambiguous here

Virtually all "fallacies" you see online are just people typing their thoughts in a hurry. Take advantage of interaction and ask them to clarify.

> Lastly, your personally preferred outcome for your personal data is not a measure for all of society, but you grant it that "public service" label as if your preference matters above everyone else's.

And as a member of the public, if it serves my interest, it is a public service to some extent.

Now, fair enough, you're trying to attack it as not being some broader notion of a public service. You have that broader notion in mind, but you don't explain what it is.

Instead you apply your internal definition through "as if..." which puts you in the territory of inventing a claim they simply never made. That's not even fallacious, it's pure fiction.

> A blind deletion of unknown data belonging to unknown people is not a public service.

You do make some claims, mostly couched as questions, that might lead to this conclusion. You never plainly state your premises, nor do you connect them to this conclusion.

So after all that, your conclusion is a non sequitur!


It can be, imagine I saw a fire alarm and pressed the button because I thought a fire started, it didn't and I learnt that the fire alarm only looked like it was working, knowing that this would not be fixed for 24 hrs I choose to smash the alarm so it's visibly broken. Is that vandalism?


If you can't look after people's sensitive data you don't deserve to have it.


I completely missed the poor consistency from the "I would rather" comment above. I would also prefer my data deleted and not stolen, but had to read your comment to realize there is no evidence to suggest that. It is funny how much I assume being at least partially aware of my ignorance of the topic.


>Vandalism is not a good public service.

It is, and it's better than stealing the data. You know what a really bad service is? Leaving your database wide open and exposing your customers' data (maybe?) for everyone to read.


I'm working on a personal project and not at all related to my work. I accidentally kept ports open :facepalm, sorting things out now :)


First thing I always do on any new VPS is to sort out SSH (disable root login, disable password login), set up fail2ban, install and configure ufw... and if I need to set up something like redis or similar, make sure it only listens to internal connections and also that it is decently auth'd. For deployment and other things I make users that can only write to certain directories; no sudo. It's nothing new or special but it gets lost in distributed systems.

It's a lot more work when doing it in the cloud and spinning up these things from docker containers in K8S...but you're entirely to blame if you don't know what you're deploying and don't understand any of the potential threats.
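For the "only listens to internal connections and decently auth'd" part, a rough Python sketch of a config check might look like this. It only inspects the `bind` and `requirepass` directives of a redis.conf-style text; the function names are hypothetical, and a real audit would look at much more than this.

```python
import ipaddress


def _is_loopback(addr: str) -> bool:
    """True only for literal loopback addresses such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:
        return False  # "*" or a hostname: treat as not loopback-only


def audit_redis_conf(conf_text: str) -> list:
    """Return warnings for a redis.conf-style text (bind and requirepass only)."""
    bind_addrs = []
    has_password = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split()
        directive, args = parts[0].lower(), parts[1:]
        if directive == "bind":
            # Assumption: some configs prefix an address with "-" (e.g. "-::1").
            bind_addrs = [a.lstrip("-") for a in args]
        elif directive == "requirepass" and args:
            has_password = True

    warnings = []
    if not bind_addrs:
        warnings.append("no bind directive: check which interfaces the server listens on")
    for addr in bind_addrs:
        if not _is_loopback(addr):
            warnings.append(f"bind {addr} is not loopback-only")
    if not has_password:
        warnings.append("requirepass is not set")
    return warnings


print(audit_redis_conf("bind 0.0.0.0\n"))
```

An empty list means the two directives look sane; it says nothing about firewalls or anything else in front of the server.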


Do you know of any good resources for learning this stuff? I'm interested in being able to do this sort of thing on a small scale, but there seems to be an awful lot that I don't know I don't know.


https://github.com/konstruktoid/hardening

What the parent post said is pretty much it in a nutshell, but I use that GitHub for basic Ubuntu server setup.


When I didn't know better, I was always bitten by Docker circumventing ufw.


Recommend to setup two subnets in your project. One public and one private. This prevents this sort of issue: instances in the private subnet simply don't get a public IP, so they can't be reached over the internet.

For reference, the standard practice in a company is to have a (third) separate subnet for databases, with zero internet access (no NAT gateway). Connection must be explicitly opened from/to database clients. It's a nightmare to manage on premise but it works really well in the cloud with firewalls allowing traffic based on instance tags.


> Recommend to setup two subnets in your project. One public and one private.

This is very good advice. We recently had a uni project where we had to use a MongoDB database. Somebody just apt-get installed a mongodb onto a DO droplet and called it a day. Two days later the only remaining records prompted us to transfer x amount of BTC to an address that was stored in our DB. It just contained dummy data, but it is worrying that something like this apparently happens to lots of companies as well.

The only thing I find weird is that ElasticSearch itself does not offer a way to handle authentication, it was just enabled by a plugin that was paid (it seems like it's free now).


> The only thing I find weird is that ElasticSearch itself does not offer a way to handle authentication, it was just enabled by a plugin that was paid (it seems like it's free now).

"Weird" is an interesting euphemism for "irresponsible." Defaults are very important. Insecure by default is insecure for 90+% of deployments.


I have _some_ sympathy for ElasticSearch and Redis, having designed/built their software under the assumption it isn't ever intended to be publicly accessible over the internet.

I have a bunch of fairly important personal documents in a filing cabinet with no lock. And I'm perfectly fine with that. I wouldn't keep it in my front yard, because that's obviously stupid, but keeping it inside behind my locked door and upstairs in my office? A perfectly acceptable risk (for me and my files).

I do agree that ElasticSearch do a quite poor/irresponsible job of pointing out their cabinet has no lock. I think Redis do a better job, but are seriously let down by all the internet tutorials that just say "sudo yum install redis" as a minor intermediate step in getting example-todo-list-du-jour working - without even a footnote explaining that anybody who actually visited the redis site now has instructions on how to p0wn your box. ( http://antirez.com/news/96 ) I do think the "Securing Redis" section of this page - https://redis.io/topics/quickstart - deserves to be much closer to the top - I'd have put it before the how to download/install/start instructions myself (though I _think_ recent versions of redis only bind to localhost in the default config, maybe?)


If your assumptions are repeatedly demonstrated invalid, they are wrong.

Change them.


Personally, I reckon that applies at least as much (if not more) to the devs installing random software packages onto internet connected and un-firewalled servers - as it does to database developers who document clearly that their software is not intended and is actively unsafe to install on directly internet connected servers...

Cave ne recipiens donum...


If a thing should not be run in a given configuration then it should not be runnable in that configuration.

The vendor / developer has both awareness and capability to ensure this.


> Somebody just apt-get installed a mongodb onto a DO droplet and called it a day. Two days later the only remaining records prompted us to transfer x amount of BTC to an address that was stored in our DB.

If the default install does this, then I'd blame the package/distro maintainers. It should definitely at least only listen on localhost by default, with stern warnings about what is going to happen if you change that without setting up proper security.


MongoDB only binds to localhost for at least the last four versions (4+ years). Someone would have had to install a really old version or intentionally configure it to listen to public IP.


ElasticSearch does offer authentication.

Most of our services were created like a POC & deployed to production, & I joined my company fairly recently.

We had a planned release this week to secure ES. And Saturday, we got "meow"ed


Regarding elasticsearch, that’s actually fine.

Just block access to it on your firewall to the public ports and require people SSH or VPN for access if needed.

It’s not
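To confirm the firewall is actually doing its job, you can probe the port from a machine outside your network. A minimal sketch using only the standard library; the function name and the example host and port in the comment are placeholders, not anything from the thread:

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (placeholder address): run this from outside the network.
# port_is_reachable("203.0.113.10", 9200)
```

A `False` from an outside machine combined with a `True` over the SSH tunnel or VPN is the result you are hoping for.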


Where can I find a tutorial or a guide about it for, let's say, Ubuntu? Would this be a good start: https://www.digitalocean.com/docs/networking/vpc/how-to/enab...


The DO tutorial is a good start, but as another poster mentioned further down, check out: https://github.com/konstruktoid/hardening

note: the DO tutorial will hold your hand a little; the hardening doc expects a (minor) degree of familiarity


Thanks. I saw this git repo earlier and it looks interesting (even though most of my machines are on LTS 18).

I don't see anything about subnets in there though. Did I miss something?


> It's a nightmare to manage on premise but it works really well in the cloud with firewalls allowing traffic based on instance tags.

It's not though. Subnetting and firewalling are like the foundation of any corporate network.


Issue is with AWS this setup instantaneously bumps the bill up from a few dollars a month to a few tens of dollars a month. Deal-breaker for personal projects. But, you can still secure the database with whitelisted IP addresses, which is what I do.
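In practice the whitelist lives in the provider's firewall or security group rules, not in application code, but the matching logic itself is simple. A sketch with Python's `ipaddress` module; the networks below are documentation-range placeholders, not real addresses:

```python
import ipaddress

# Hypothetical allowlist, using addresses reserved for documentation.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),   # a single home IP
    ipaddress.ip_network("198.51.100.0/24"),  # an office range
]


def is_allowed(client_ip: str, networks=ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


print(is_allowed("198.51.100.42"))
print(is_allowed("192.0.2.1"))
```

The obvious catch is that a home IP can change, so the list needs updating whenever it does.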


The top-voted answer links to this HN page. I'm stuck in an infinite loop.


Nah, you're just in an unbounded recursion - don't worry, I can already now tell you that it ends with stack overflow.


I dunno, does Chrome implement tail call optimization? :)


Nah stack overflow learned their lesson and switched over to free monads.


Ha! Clever!


You should configure a timeout.


You can cheese infinite loop detection by setting a max number of edge traversals (that way you don't just loop longer when someone speeds up the happy path).

For real code, you wouldn't generate a web page with 5 million entries in it, so you can be pretty sure that the data is bad even if it's not cyclical (but it probably is)
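A minimal sketch of that edge budget in Python, assuming each page links to at most one next page; `follow_links` and the `next_of` mapping are illustrative names only:

```python
def follow_links(start, next_of, max_edges=1000):
    """Follow single-successor links from `start`, giving up after `max_edges` hops.

    `next_of` maps a node to its successor; a node with no entry ends the chain.
    Raises RuntimeError once the budget is spent, which usually means a cycle.
    """
    path = [start]
    node = start
    edges = 0
    while node in next_of:
        if edges >= max_edges:
            raise RuntimeError(
                f"gave up after {max_edges} edge traversals; data is probably cyclic"
            )
        node = next_of[node]
        path.append(node)
        edges += 1
    return path


print(follow_links("a", {"a": "b", "b": "c"}))
```

Because the limit counts hops rather than seconds, a faster machine hits it at exactly the same point, which is the advantage over a plain timeout.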


Good tree^H^H^H^Hgraph traversal algorithms have a history stack specifically to detect and deal with loops.


>tree^H^H^H^Hgraph

If we pretend we're using readline here, ^W (kill the previous word) and ^U (kill to the start of the line) should save you some key presses.

Some recommended bedtime reading:

https://catonmat.net/ftp/readline-emacs-editing-mode-cheat-s...

https://en.wikipedia.org/wiki/GNU_Readline#Emacs_keyboard_sh...


These are not emacs commands. They aren't even unix shell commands. They are TTY commands, some of them dating back to the dot matrix teletype terminals.

My favorite is ^U, which 90% of the time lets you start over on a password prompt when you are sure you just fat fingered but not sure how badly.


Thank you, that one is really useful! The one I had always remembered was ^H for whenever no other way to delete characters works. I need it in some SQL REPLs which aren't configurable.


I think you might be conflating these (or maybe I should say the gp is). Nevertheless.

Do you have a reference to the history of key combos like ctrl f, b, n, p and a and e? Those are typically referred to as emacs style navigation and I am genuinely unaware of history of those as common tty control codes outside of emacs for cursor movement. They weren’t dec vt control codes. Ctrl-U was though and even has ASCII assignment as “NAK”. Ctrl-H and C are similar.. but people don’t typically refer to those as “emacs” keys.


> Do you have a reference to the history of key combos like ctrl f, b, n, p and a and e? Those are typically referred to as emacs style navigation and I am genuinely unaware of history of those as common tty control codes outside of emacs for cursor movement.

I've always heard of it as an ASCII control character and gets its history from Unix interpretations of really old IBM keyboards which got its history from typewriters. I've literally never used emacs for anything other than ^X to exit; I'd rather use cat than emacs. I use vim. But ^H has worked for me as intended on a serial console and on telnet.

Some diving through wikipedia:

[0] says: Pressing the backspace key on a computer terminal would generate the ASCII code 08, BS or Backspace, a control code which would delete the preceding character. That control code could also be accessed by pressing Control-H, as H is the eighth letter of the Latin alphabet.

[1] says: In some typewriters, a typist would, for example, type a lowercase letter A with acute accent (á) by typing a lowercase letter A, backspace, and then the acute accent key. This technique (also known as overstrike) is the basis for such spacing modifiers in computer character sets such as the ASCII caret (^, for the circumflex accent).

[2] says: Unix (command line and programs using readline): Ctrl+H = Delete previous character

[3] supports [0] and says: Caret notation is a notation for control characters in ASCII. The notation assigns ^A to control-code 1, sequentially through the alphabet to ^Z assigned to control-code 26 (0x1A). Often a control character can be typed on a keyboard by holding down the Ctrl and typing the character shown after the caret.

However, it's worth noting that it also says The meaning or interpretation of, or response to the individual control-codes is not prescribed by the caret notation.

But, despite that, ASCII describes control character 8 as backspace [4] [5].
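The caret mapping quoted above is easy to check by hand. A small Python sketch (function names are mine): each control code is the letter's ASCII value XOR 0x40, so ^H comes out as 8 and ^U as 21.

```python
def _is_control(code: int) -> bool:
    """ASCII control codes are 0..31 plus 127 (DEL)."""
    return 0 <= code < 32 or code == 127


def caret_to_code(notation: str) -> int:
    """Map caret notation such as "^H" to its ASCII control code."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError(f"not caret notation: {notation!r}")
    # XOR with 0x40 maps "@".."_" onto 0..31 and "?" onto 127.
    code = ord(notation[1].upper()) ^ 0x40
    if not _is_control(code):
        raise ValueError(f"no control code for {notation!r}")
    return code


def code_to_caret(code: int) -> str:
    """Inverse mapping for control codes 0..31 and 127."""
    if not _is_control(code):
        raise ValueError(f"not a control code: {code}")
    return "^" + chr(code ^ 0x40)


print(caret_to_code("^H"), code_to_caret(8))
```

As the caret notation article says, this only tells you which byte is sent; what the terminal or program does with it is a separate matter.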

According to this Wikipedia article, work on ASCII began in 1960 and its first release was in 1963 [6].

GNU Emacs' first release was in 1985 (the original EMACS dates to 1976) [7].

I suggest that perhaps in your bubble control sequences are referred to as emacs style navigation even if it's not necessarily the most historically accurate. I'm glad you're working with *nix enough to be familiar with emacs and there's always new old things to learn. There's a lot of history to learn and understand.

[0] https://en.wikipedia.org/wiki/Backspace#%5EH

[1] https://en.wikipedia.org/wiki/%5EH

[2] https://en.wikipedia.org/wiki/Control_key#Table_of_examples

[3] https://en.wikipedia.org/wiki/Caret_notation

[4] https://en.wikipedia.org/wiki/Control_character#In_ASCII

[5] https://en.wikipedia.org/wiki/ASCII_control_characters#Delet...

[6] https://en.wikipedia.org/wiki/ASCII

[7] https://en.wikipedia.org/wiki/Emacs


Oh boy, was that a rabbit hole.

I read your post and thought, "I've misremembered the story."

But the Wikipedia page for the Teletype-33 claims that it 1) had control characters, and 2) was inspiration for some of the ASCII character set, which was defined later in the same year:

https://en.wikipedia.org/wiki/Teletype_Model_33


1) If it's a tree, it ain't got no loops.

2) The stack isn't there to deal with loops; the "visited" flag at each edge is there for that. The stack (for DFS; BFS would use a queue) is there to keep track of which nodes have been visited such that you can construct a path from the starting node to the one you're looking for.

Obviously there are variants to this, depending on what you're actually trying to achieve with it. My point is that a stack would be a very inefficient way to deal with loops.


Modifying the graph turns it into shared (mutable) state. Your code is still re-entrant, but it's no longer concurrent.


1) you're right, I edited my message to reflect that I meant a graph traversal algorithm.

2) a visited flag on an edge? That won't support simultaneous traversals. Keeping a stack is a lot more efficient than permitting only one traversal at a time.


I'm not sure why you're bringing concurrency to the table.

My point still is that looking something up in a stack (did I visit this node?) costs O(n) time, so the BFS will degrade from O(m+n) to O(m*n+n).

To come back to the concurrency: if you can index your edges in some way, you can also store the visited flag in a separate data structure to support concurrent access (one "flag store" for each access).
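A minimal Python sketch of that idea, with the caveat that it tracks visited nodes rather than edges and that the names are illustrative. Each call owns its own `parent` dict, which doubles as the "flag store", so the graph itself is never modified and lookups are O(1) on average:

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a shortest path from start to goal as a list of nodes, or None.

    `graph` maps each node to an iterable of neighbours and is only read,
    so several traversals can share it without stepping on each other.
    """
    parent = {start: None}  # per-traversal visited store
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:  # hash lookup, not a scan of a stack
                parent[neighbour] = node
                queue.append(neighbour)
    return None


cyclic = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["a"]}
print(bfs_path(cyclic, "a", "d"))
```

The cycles in the example graph are harmless because a node already in `parent` is never queued again.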


> I'm not sure why you're bringing concurrency to the table.

Not using data structures that enable concurrency prevents performance improvements since modern hardware is, in general, more parallel than vertical.


I hate to laugh when people's hard work is being destroyed, but this is some impressive trolling by the attacker:

> An interesting theory as to why the attacker used the term "meow" is because cats like to drop (or knock) items from tables.


I hadn't considered this. I'm enjoying this even more.


Atlassian is not on Google Cloud, they are an AWS shop. I suspect this is an unrelated personal project.


How much would you bet against that guy having fairly highly privileged AWS IAM access in Atlassian's account?


I worked at Atlassian and have a high degree of trust in their production accounts.


Atlassian's IAM privileges were pretty damn strict from what I remember when working there.


If meant as a public service, it would have been much less destructive to use the change passwords API [0] to set random passwords for all of the users.

[0] https://www.elastic.co/guide/en/elasticsearch/reference/curr...


Given that "unsecured" means "data are accessible and modifiable by anyone", creating tremendous externalities for all referenced in the data, I'm happy with deletion.

FTA:

One of the first publicly known examples of a Meow attack is an Elasticsearch database belonging to a VPN provider that claimed not to keep any logs.


In some cases, I might be tempted to agree with you, but this is blindly being applied by an automated attack. What if some of that deleted data is volunteer-canvassed anonymized survey data of homeless people, and its loss sets back a homeless relief program by months, resulting in several people freezing to death this winter?


The data may be modified at any time without a trace, rendering it void.

Secure your damned database.

The fault and responsibility lie with the deploying organisation and tools vendor. Meow is just the messenger.


But if they had used the password changes API to assign random passwords to all accounts, as suggested, then the data couldn't be modified. Am I missing something?


Parent's point is that any conclusion one could make from the data is worthless because, being public and unsecured, it could have been modified by any Internet user at any time before a password was set.


Correct.


My understanding is that password-secured DBs aren't vulnerable to Meow remediations.


Then people should feel bad that their negligence cost lives.


The DB admins and the attackers should both feel guilt. However, if the attackers simply assigned randomly-generated passwords to all of the accounts, then no data would be lost and the DB admins would still have their DBs temporarily become inaccessible while they figured out how to force-reset their passwords. If you're going to go for disruption, I think the suggested lockout gives a much better ratio of good being done to potential damage being done.


Something tells me that this level of pain would prove insufficient for education.


I wonder how much customer info they were leaking before then.


The question was posted early Friday morning. If this drove their services, I suspect a lot of us would've known about it a lot sooner than now...



