PhantomJS - minimalistic headless WebKit (phantomjs.org)
96 points by yakto on March 22, 2011 | 20 comments



Note: not genuinely headless. It still requires you to run an X server, such as Xvfb.
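
For anyone who wants to try the Xvfb route, here is a minimal sketch in Python, assuming the `xvfb-run` wrapper that ships with Xvfb on many Linux distributions is installed. The `phantomjs` binary name on PATH and the `script.js` file are assumptions for illustration, not something taken from the PhantomJS docs.

```python
import shutil
import subprocess
from typing import List, Optional


def xvfb_command(program: List[str], screen: str = "1024x768x24") -> List[str]:
    """Wrap `program` so it runs under a temporary virtual X server."""
    # -a picks a free display number; --server-args is handed to Xvfb itself.
    return ["xvfb-run", "-a", f"--server-args=-screen 0 {screen}", *program]


def run_under_xvfb(program: List[str]) -> Optional[int]:
    """Run `program` under Xvfb; return its exit code, or None if xvfb-run is missing."""
    if shutil.which("xvfb-run") is None:
        return None
    return subprocess.run(xvfb_command(program)).returncode


if __name__ == "__main__":
    # Hypothetical invocation: only prints the command line that would be run.
    print(xvfb_command(["phantomjs", "script.js"]))
```

This only hides the display requirement behind a virtual framebuffer; an X server is still running, so it does not make the setup genuinely headless.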

As far as I know, there's no genuine headless Webkit.

For a genuinely headless system, consider Chris Lord's "offscreen" branch of Gecko, discussed here: http://blog.mozilla.com/ted/2010/07/29/moz-headless-screensh...

I'd be interested in sponsoring someone who wanted to bring the offscreen branch up to mainline, and Chris would be willing to mentor you in such an effort.


Yeah, I wanted something like this a couple years ago and was disappointed to find no true headless library. I spent a lot of time looking at WebCore, but eventually gave up.

I wonder if there's a way to extricate some of Chrome's rendering logic to do genuinely headless stuff. I know that things are rendered in separate processes and then piped to the browser window process, and I have to wonder if there's a way to launch one without the other, and thus without any requirement for a display server like X.


If you're okay with using Gecko instead of Webkit, the headless branch does compile into a standalone executable you can fork out to; it's what we do.
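
A rough sketch of what "forking out" to a standalone renderer can look like from Python, under stated assumptions: the helper name `fork_out` is made up here, and the real executable name and arguments of the headless Gecko build are not given in this thread, so the demo uses the Python interpreter itself as a stand-in child process.

```python
import subprocess
import sys
from typing import List


def fork_out(command: List[str], timeout_s: float = 60.0) -> str:
    """Run an external renderer and return whatever it wrote to stdout."""
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str rather than bytes
        timeout=timeout_s,    # raises TimeoutExpired if the child hangs
        check=True,           # raises CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in child process; swap in the actual renderer command line.
    print(fork_out([sys.executable, "-c", "print('rendered')"]))
```

The timeout and the non-zero exit check are there because a browser engine run as a child process can hang or crash on a bad page, and the caller usually wants to know.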

The issue with Gecko was less about rendering offscreen and more about rendering widgets (dropdowns, input boxes, etc.) which are normally handled by the windowing system.


EA's branches of WebKit, available at http://gpl.ea.com, are designed for integration into console video games and don't require any sort of X server or OS interface. They're not the most up-to-date, but they're sufficiently modularized to adhere to the LGPL without violating the Sony/Microsoft developer agreements.


This might be interesting. Remember to check all of the products listed, as we only release our latest EA WebKit version when a game ships with it.

"EA WebKit supports Win32, Win64, Playstation 3, and Xbox 360 platforms" according to the documentation, but perhaps the modularization and the EA Raster primitives get you further along than some of the other forks.

If anyone has any questions about EA WebKit, let me know and I can pass them along.

EDIT: ...Or you can ask the parent poster, who actually works on them. :)


I use htmlunit for scraping. It's a headless browser, although using WebKit would be far better.

It's sad that WebKit lacks easier integration (e.g. a good COM/.NET object on Windows).


Crowbar is a Gecko-based scraper, if you're interested in using a real browser: http://simile.mit.edu/wiki/Crowbar


The issue is using Crowbar with the latest Gecko versions.


I've used htmlunit as well, basically because it seems to be the most mature. I find it chokes on some Google Maps JS.


We created a genuinely windowless branch of Chromium two years ago named Awesomium; it's free for non-commercial use and is used in several major games and commercial projects: http://www.awesomium.com


"Runs on Windows and Mac OSX Intel platforms."

If you only run on Windows or OS X, it's windowless, but it's not headless.

Headless means something running on a command-line console, and you can't run either of those operating systems that way.


We're working on a Linux port for server-side use, but that's not to say our current framework is not headless. We developed this branch to be completely independent of any windowing system (and have even coaxed Flash and Silverlight to render correctly in such an environment). Check out our HelloAwesomium sample in the SDK; it runs straight from the command line and outputs a JPEG.


A capybara driver for this would be fantastic...


Can it be used to render pages? That's still something I desire.


Yes. Either as screenshot images, or as the full DOM with readable text, even if the text was written by JS. That's the use-case that gets me hot-n-bothered, too.

Now if only Google would use such a system for all their crawling, instead of forcing us Ajax-app authors to implement their hackish hash-bang "solution." I'm guessing they would if they could, but the CPU cycles are still just too expensive.


The offscreen Mozilla branch I linked to above can. We use it for our private bookmarking/archiving site. Our headless Gecko drives the thumbnails and previews on qumbler.com.

(I originally wrote "delicious/pinboard clone" instead of "bookmarking/archiving site" but I realized we've had it since before del.icio.us launched.)


Really? Delicious was 2003; Muxway, the predecessor, was in 2001.


Like I said, private site, not productized at all. :) We have web-based group bookmarks going back to August 2002, and we have static records going back a few years before that. We started archiving and versioning pages in September 2002. I don't see that we had tagging until sometime in 2005.



This is brilliant.

Not exactly an award-winning comment, but it's literally what I thought when I was reading the quick start and got to the rendering section.



