
I have a scraper for a site that used to offer an API for its publicly available data but removed the API with no warning. The info is still available to the general public, but only through their website. I created a scraper for the public page, but shortly after, they switched to loading some public information through Javascript, so my HTML scraper couldn't see it anymore. I ended up having to write an application around Selenium to load the Javascript and import this public information. I'm just waiting for them to start randomizing the CSS classes to make scraping even harder. The content is static: even as data changes on the server, the page doesn't refresh unless you reload it.
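For anyone fighting the same thing, a minimal Selenium sketch along these lines is usually enough; the URL and CSS selector here are just placeholders, not the actual site:

    # Sketch: load a JS-rendered page headlessly and pull out the text.
    # The URL and CSS selectors are hypothetical placeholders.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = Options()
    options.add_argument("--headless")  # no visible browser window

    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://example.com/public-data")
        # Wait until the JS has actually injected the content we care about.
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".data-table"))
        )
        for row in driver.find_elements(By.CSS_SELECTOR, ".data-table tr"):
            print(row.text)
    finally:
        driver.quit()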

There is no reason why your page should refuse to load plain text without Javascript enabled.



> There is no reason why your page should refuse to load plain text without Javascript enabled.

On a technical level, sure. SPAs should pre-render data before sending it to the client.

The problem is that's a ton of extra work when the client will have to fetch data anyway - so it's difficult to justify the time to management.

EDIT: If their page fetches the data with JS, you might actually have an easier time figuring out what their API looks like instead of scraping the rendered page. You might find there's more data available than is rendered, too.
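For example, once you've spotted the XHR/fetch call in the Network tab, you can often just replay it directly. Everything below (endpoint, params, response shape) is made up for illustration; copy the real values from dev tools:

    # Sketch of calling the page's own JSON endpoint directly instead of
    # scraping rendered HTML. URL, params, headers and keys are hypothetical.
    import requests

    resp = requests.get(
        "https://example.com/api/v2/listings",
        params={"page": 1, "per_page": 100},
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()

    for item in resp.json()["results"]:
        print(item["id"], item["title"])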


I once showed someone how to use dev tools to see an API request a web app makes and how to recreate it with your own code. He’s since done all sorts of “life hack” automations. Love teaching people this stuff!


> There is no reason why your page should refuse to load plain text without Javascript enabled.

Sure there is. You prefer writing javascript and you want to serve your site through a CDN.

You might not think that's a good reason, but that's certainly a reason.


Until the ADA comes along and demands you create a site that's accessible to the blind.

I've often wondered when the laws would start to be applied, and I think it's coming.


> It is a common misconception that people with disabilities don't have or 'do' JavaScript, and thus, that it's acceptable to have inaccessible scripted interfaces, so long as it is accessible with JavaScript disabled. A 2012 survey by WebAIM of screen reader users found that 98.6% of respondents had JavaScript enabled. [0]

and that was 7 years ago. There may be certain complicated interactions that are a challenge for screen readers, but the mere fact that a page relies on JavaScript for rendering doesn't automatically mean it is inaccessible to screen readers.

[0] https://webaim.org/techniques/javascript/#reliance


I have a website that's a full page map. I care about accessibility - is there any way I can make this meaningfully accessible to the blind?


Look at WCAG 2 (the Web Content Accessibility Guidelines); they specify tags and elements that common screen readers will understand, to help make your site accessible.

This is a really good resource: https://accessibility.18f.gov/

A lot of frameworks now have some accessibility built in if you add the correct attributes.


I suspect you're being glib, but you could look at https://wiki.openstreetmap.org/wiki/OSM_for_the_blind


I'm confused. Why would a blind person be any less likely to use JavaScript?


To test whether a blind person's setup can render your website, a good method is a CLI browser. Neither a blind person's device nor a CLI browser will render javascript.


I’m no expert in this space, but if free, common, and easily accessible tools render your site readable and usable, I’m not sure how refusing to use those tools would be a claim under ADA.

There are plenty of real ways that sites are unusable by screen readers, using Javascript to download dynamic content shouldn’t be one of them.


Why on earth would you think that? The screen readers tie into modern browsers like Safari or Chrome.


What does a CDN have to do with it?


Because if your application bundle is a fixed asset (like a JS SPA that fetches its data from an API), then you can distribute your entire application via an inexpensive CDN.

As soon as your application bundle is rendered on your servers dynamically then only part of your site can be delivered via CDN.

Basically, going all-JS gives you an app model where your server-side code doesn't even know or care about HTML or the web at all. It just pushes JSON around and is largely client-independent.

Great model when you need to support iOS, Android, web, desktop.


Using a CDN like they're talking about likely means your HTML is static and only served from the CDN.

For many SPAs the only actual HTML is a header, a container div, and a call to the app's JS. Ignoring the header, there might only be half a dozen lines total.


Chrome headless does that really well.

> google-chrome --headless --run-all-compositor-stages-before-draw --virtual-time-budget=25000 --print-to-pdf='foo.pdf' URL

Edit: Plus

> pdftotext -raw foo.pdf
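If you want to script that pipeline, something along these lines works; the URL and file names are just placeholders, and it assumes google-chrome and pdftotext (poppler-utils) are on your PATH:

    # Sketch wrapping the headless-Chrome-to-PDF-to-text pipeline above.
    import subprocess

    url = "https://example.com/some-js-heavy-page"  # placeholder

    # Render the JS page to a PDF, giving scripts up to 25s of virtual time.
    subprocess.run(
        [
            "google-chrome", "--headless",
            "--run-all-compositor-stages-before-draw",
            "--virtual-time-budget=25000",
            "--print-to-pdf=foo.pdf",
            url,
        ],
        check=True,
    )
    # Extract the text from the PDF.
    subprocess.run(["pdftotext", "-raw", "foo.pdf", "foo.txt"], check=True)

    with open("foo.txt", encoding="utf-8") as f:
        print(f.read())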


> There is no reason why your page should refuse to load plain text without Javascript enabled.

Of course there is! It’s my site and I can do what I want with it.



