Hacker News | abronte's comments

Coupa | San Diego | Full-Time | https://www.coupa.com

We are looking for another full-time engineer to join our data platform team. This team manages everything from extracting data to delivering insights. We also help other teams answer their questions with data. If you are a data engineer with an interest in data science, this would be a great fit for you, as we do a little bit of everything.

Stack: Python, Spark, AWS (EMR, S3, EC2, etc.), Git, Jupyter, Parquet, MySQL

You can apply here: https://jobs.lever.co/coupa/5cdd0957-37cd-4419-8537-de60b3bd...


Coupa | python/spark/data engineer | San Diego or San Mateo (pref San Diego) | ONSITE Full Time | http://www.coupa.com

Coupa helps businesses track and manage how they spend money.

Specifically, we're looking for somebody to join our Data Insights team. If you like Python/Spark/data, send me an email at adam.bronte@coupa.com


I have the 4 GB VMware VPS (http://www.ovh.com/us/vps/vps-cloud.xml)

  _04/17/2014 - VMPLAN - DATACENTER - OS - AUTHOR_
  ```
  CPU model:  AMD Opteron(tm) Processor 4386
  Number of cores: 4
  CPU frequency:  3100.000 MHz
  Total amount of RAM: 3943 MB
  Total amount of swap: 1998 MB
  System uptime:   10 days, 2:25,
  I/O speed:  169 MB/s
  Bzip 25MB: 5.34s
  Download 100MB file: 15.8MB/s
  ```


I think they have some more work to do, I just hit a limit... http://i.imgur.com/SQRvzWw.png


Yeah, as far as I can tell they removed the 300 MB limit from the limits page less than an hour ago; I'm sure they're still cleaning up the old limits as we speak.


I just pushed the custom settings feature live.

You can click the "options" link at the bottom of the email or get to it here: http://dailyhn.com/options

You can change your timezone, delivery time, the top X items to save, and the point threshold.


You read my mind :). I want to fork this code base and apply it to reddit.


1) Here's a preview image: http://i.imgur.com/6D46M.png

2) It takes any story that makes the top 10 on the front page and is over 40 points. (I plan on making this customizable.)
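The selection rule in (2) could be sketched roughly as below. This is only an illustration, not the actual dailyhn code: the `Story` class, the `select_stories` function, and the sample data are all hypothetical, and it filters a single front-page snapshot rather than tracking stories over a whole day.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    rank: int    # 1-based position on the front page (assumed representation)
    points: int


def select_stories(stories, max_rank=10, min_points=40):
    """Keep stories ranked in the top `max_rank` with more than `min_points` points."""
    return [s for s in stories if s.rank <= max_rank and s.points > min_points]


# Made-up sample snapshot of the front page.
front_page = [
    Story("A", 1, 120),   # kept: top 10 and over 40 points
    Story("B", 4, 35),    # dropped: not enough points
    Story("C", 9, 41),    # kept: just over the threshold
    Story("D", 12, 300),  # dropped: outside the top 10
]

selected = select_stories(front_page)
print([s.title for s in selected])  # ['A', 'C']
```

Keeping `max_rank` and `min_points` as parameters is one way the thresholds could later be made customizable per user.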



We have a pretty big e-commerce scraping project, and we don't really run into many problems with blacklisted IPs. There are a few sites that consistently ban us, but with elastic IPs it's pretty much a non-issue. I have yet to see a site that bans all Amazon IPs.


I run http://whirlpool.net.au and I religiously check the Amazon EC2 forum announcements[1] for new IP ranges to ban.

[1] https://forums.aws.amazon.com/ann.jspa?annID=1030


> I run http://whirlpool.net.au

An excellent and very useful forum! It seems like whatever topic I'm searching for, google(.com.au) returns a useful result on your site.


Would you mind telling us why?


Why?

Name me one good reason. Name me one.


Shitloads of rogue bots doing "social media monitoring".

Shitloads of rogue bots stealing content for black-hat SEO.

Shitloads of rogue bots harvesting email addresses.

Shitloads of rogue bots submitting spammy replies.


So maintain a blacklist of elastic IPs. If it's too big for you, make it a community effort.

Those are bad reasons to close your site to all of AWS.

As nupark2 mentioned, there are legitimate users routing traffic through EC2, even some bots that you'd want to visit your site. Archive.org comes to mind (many of their scrapers are or were behind AWS). Closing your site or app to a large swath of the web is the wrong solution. It's like killing a spider with a bazooka.


Unlike the assumptions you're limited to making, I know how much of my AWS traffic is human, and it's really very very very small. The sad reality is I'm sick and tired of rogue bots, and the tiny sliver of collateral damage can fill out the CAPTCHA validation every so often.

(I also blacklist GWS, rackspace, linode, softlayer, reliablehosting, ovh.net, node4, netdirect, layer42, all TOR exits... it's actually a pretty huge list.)
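For anyone curious what "blacklist a range, but let humans through with a CAPTCHA" might look like, here is a minimal sketch using Python's standard `ipaddress` module. It is not how whirlpool.net.au actually does it: the `classify` function and the `BLOCKED_NETWORKS` list are hypothetical, and the CIDR blocks are placeholder documentation ranges (RFC 5737), not real hosting-provider ranges.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks.
# A real list would be built from each provider's published ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def classify(ip_str):
    """Return 'captcha' for addresses inside a blocked range, else 'allow'."""
    ip = ipaddress.ip_address(ip_str)
    if any(ip in net for net in BLOCKED_NETWORKS):
        return "captcha"
    return "allow"


print(classify("192.0.2.10"))   # captcha
print(classify("203.0.113.5"))  # allow
```

A linear scan is fine for a short list; a very large blacklist would probably want a prefix tree or sorted-interval lookup instead.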


I whitelist archive.org, and they've never hit through AWS.


That's unfortunate. I don't know what the answer is, but real people do route their traffic out of AWS endpoints.


Far fewer than you might think.


It is annoying when it happens and you're on something like Heroku.


Why is there no "forgot my password" form?


Completely unrelated to dotcloud, but HN has no "forgot my password" function that I could ever find when I needed it. On that I'll agree that having one is super useful.

