tegansnyder's comments

There are a lot of folks reevaluating their crawling engines lately now that Chrome headless is maturing. To me there are some important considerations in terms of CPU/memory footprint that go into distributing a large headless crawling architecture.

What we are not seeing open-sourced are the solutions companies are building around trimmed-down, specialized versions of headless browsers like Chrome headless, Servo, and WebKit. People are running distributed versions of these headless browsers using Apache Mesos, Kubernetes, and Kafka queues.


I started one of the first 56k dial-up ISPs in Nebraska when I was 13 years old with the help of my parents. In high school I partnered with some friends after reading a CNET article about a WISP in Washington state. My friend's dad had a few businesses that needed to share a T1 line, so we connected them wirelessly and sold the excess capacity to farmers in rural Nebraska and Kansas. It was an amazing experience: working with networking gear, setting up FreeBSD servers, learning about NEMA enclosures, antennas, polarity, frequency hopping (FHSS vs. DSSS), 2.4, 5.2, and 5.8 GHz, and 900 MHz. We had our ups and downs and learned a lot about what gear worked the best. My senior year of high school we sold the company. I still browse my Google history to feel nostalgia from those days :)


+1. I had a similar experience, starting one of the first ISPs back in Brazil, in 94-95. My two co-founders and I came from an academic background, so we had been using BITNET for a few years by then, and saw firsthand when the internet started to open up for commercial access.

Overall it was a great learning experience, but incredibly challenging. There was nothing like OP's guide (or HN, or Google for that matter), so the knowledge had to come from books, Usenet news, and mailing lists. It helped that we had experience managing Unix servers and WAN networks at the University, but we still had tons of things to figure out, from setting up dial-up lines to keeping httpd servers up and running. We decided to use Linux 0.99 beta from the get-go, but it kept crashing, usually in the middle of the night. Later we discovered it was a race condition in the multi-serial port card driver, which manifested only when several ports were under heavy use, hence only happening at night.

Although we could solve most of the technical challenges, we were absolutely clueless on how to run a business: selling, bizdev, billing customers, hiring and building teams, etc. I have fond memories of those (usually bad) decisions, but can't stop thinking about how things would be different if I could magically go back and do it all over again :)

We quickly reached a few thousand customers, and realized it was turning into a capital intensive business - phone lines in Brazil in the 90's were crazily expensive, plus Cisco routers, server upgrades, etc. We decided to pivot towards B2B, thus becoming one of the first corporate ISPs: web hosting, security consulting, leased lines, some web development.

We sold the company 5 years later, when the market started consolidating, right before the dotcom crash.


Fellow Brazilian here - would love to hear your story in a longer form if it exists.


I would love to hear more about your experience. Did you write about it anywhere?


There are various posts on DSLReports from my early days. Unfortunately a lot of the pictures I posted are no longer available. Here is an example: http://www.dslreports.com/forum/r3891399-Split-Wire-WISP


That's a great story and incidentally a similar one to another Nebraska ISP I'm familiar with (KDSI). I think they got started when a larger industrial company had excess bandwidth. Sounds like your story was a lot of fun for a high schooler. Who did you sell it to and why?


I know of KDSI! We ended up selling to a guy starting a new business called RCOM in Kearney, NE. I worked with him for half a year getting everything transitioned. We sold our Kansas operations to a telco called Nex-Tech. At the time the business was funded by my good friend's dad. Collectively we decided it was best for us to focus on our further education and go to college.


>We had our ups and downs

If it was intentional, great job :)


This talk might interest you, about someone doing something similar in the 90s in Germany: https://media.ccc.de/v/34c3-9034-bbss_and_early_internet_acc...


+1 for hearing more about this experience. If you wrote about it, please do share a link.



Haha, this is legit. Nice find.


Wow. I forgot about that video!


That is amazing! Can you tell us more about the experience?!


This would make a great article if you write it up.


Very cool. How much did you sell for?


Can anyone recommend a good tutorial or reading on how to set up content-based (visual) image searching using a CNN to process images? I'm looking to build a POC of a reverse image search trained with in-house product data. In the past I've used imgSeek, but it is dated and does not use neural nets.


For a POC you could use Tagbox, which is just a REST API packaged as a Docker container from https://machinebox.io. Here is the blog post: https://blog.machinebox.io/visual-search-by-machine-box-eb30...

Disclaimer: I did both of them, so I'm a little biased :)


Manufacturing, finance, aerospace, the automotive industry, and the medical industry are a few other places hiring data scientists.


Doesn't this data already exist in the form of ISP data brokers? I'm thinking of data that makes its way into the hands of some marketing companies that show anonymized URL-level traffic for a given website, essentially giving you the ability to see analytics on a website you don't own. Anybody know who the big players/ISP data brokers are?


An old friend of mine and I visited a corn field in Nebraska ten years ago in search of some wreckage left behind from a crash in 1966. With the help of a metal detector we located metal identification plates, various small pieces of metal, and an ash tray (smoking on planes was once a thing). https://en.m.wikipedia.org/wiki/Braniff_Flight_250


Many vendors will also rent them out by the day. There used to be a company in Denver that did this back in my WISP days.


Congrats David and Team!


Interesting read. I was speaking with some colleagues just yesterday about a potential pet project to identify which of our customers have eCommerce websites.

The concept would involve processing millions of company names found in the "Bill TO" field of sales records, then using these records to populate an Elasticsearch index for use with the Graph Query API to help further normalize/dedup the company names that share similar string semantics. The next stage of the process would be to scan the normalized, deduped list of company names and attempt to locate each company's website URL by crawling the first page of Google search results. This would need to be metered because I assume Google would block me if I performed rapid attempts. After gathering a list of company URLs, the plan would then shift gears into attempting to identify whether any of the companies' websites contain the typical components that make up an eCommerce website. Think searching the HTML for all variations of "add to cart", "shopping cart", "my account", etc.
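
A rough sketch of the first and last steps, using only the Python standard library. Everything here is an assumption for illustration: the function names, the list of legal suffixes, and the simple substring matching are hypothetical stand-ins, not the Elasticsearch-based normalization described above. The marker phrases are the three mentioned in the comment.

```python
import re
from html.parser import HTMLParser

# Marker phrases taken from the comment; a real list would need more variations.
ECOMMERCE_MARKERS = ("add to cart", "shopping cart", "my account")

# Hypothetical set of legal suffixes to drop when normalizing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "co", "corp", "corporation", "company"}


def normalize_company_name(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


class _TextExtractor(HTMLParser):
    """Collects page text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def find_ecommerce_markers(html):
    """Return the marker phrases that appear in the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so phrases split across tags or lines still match.
    text = re.sub(r"\s+", " ", " ".join(parser.parts)).lower()
    return [marker for marker in ECOMMERCE_MARKERS if marker in text]
```

Names that normalize to the same string (for example "Acme, Inc." and "ACME Inc") could be grouped before the search step, and a page with at least one marker could be flagged for a closer look. Plain substring matching will produce false positives, so treat it as a first-pass filter.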


That is a more comprehensive list. Thanks for sharing.

