Ghost.py is a webkit web client written in python (jeanphix.me)
177 points by davidbrai on April 26, 2012 | 15 comments



This appears to be a wrapper for PyQt4's QtWebKit.

http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/q...


Ah, so it's going to be a fairly out-of-date version of WebKit, then. Does anyone know exactly which version? We're using PyQt and this has been a huge issue.


Here you'll find the exact revision for each Qt version: http://trac.webkit.org/wiki/QtWebKit

E.g. I use Qt 4.8 (via PyQt 4.9) and thus I have WebKit trunk from 2011-05-05 (r85855).

There is also PyQt4.QtWebKit.qWebKitVersion().

If you have the Qt source at hand, check src/3rdparty/webkit/VERSION.
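A minimal sketch of checking all three versions from Python, assuming PyQt4 with its QtWebKit module is installed. `qWebKitVersion()` is the call mentioned above; `QT_VERSION_STR` and `PYQT_VERSION_STR` are PyQt4's version constants in `QtCore`. The helper name `webkit_versions` is hypothetical. Note that `qWebKitVersion()` reports a WebKit version number, not the trunk revision listed on the wiki page.

```python
def webkit_versions():
    """Return the Qt, PyQt and WebKit version strings, or None without PyQt4."""
    try:
        from PyQt4.QtCore import PYQT_VERSION_STR, QT_VERSION_STR
        from PyQt4.QtWebKit import qWebKitVersion
    except ImportError:
        # PyQt4 (or its QtWebKit module) is not installed.
        return None
    return {
        "qt": QT_VERSION_STR,
        "pyqt": PYQT_VERSION_STR,
        # WebKit version number of the bundled QtWebKit (not a trac revision).
        "webkit": str(qWebKitVersion()),
    }
```

Calling `print(webkit_versions())` prints a dict with the three version strings, or `None` when PyQt4 isn't available.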


Thanks


I have done this somewhat more directly with PyQt4, and I wonder whether this layer in between is useful. That said, it does seem to have some useful tools. This is the main code:

* https://github.com/jeanphix/Ghost.py/blob/master/ghost/ghost...

* https://github.com/jeanphix/Ghost.py/blob/master/ghost/utils...

If you are interested, here is some of my own code where I use PyQt4 directly:

* https://github.com/albertz/google-books-export/blob/master/g...


I once used a similar technique to mass download some elevation data files from the USGS website.

I forget the exact details, but fetching the URL just kicked off a job on their server and returned some JavaScript to execute. The JavaScript did "something" while the data was being fetched/processed on the server, and eventually decided when it could start the real download.

I spent a while trying to figure out the JavaScript, but finally came up with the PyQt/WebKit approach.

It's the ugliest download code in the world, but it's up on GitHub: https://github.com/jl2/GIS-Stuff/blob/master/map_download/ne...

I'm not sure how useful something like Ghost would have been. I was basically using WebKit as a glorified urllib.request, though, and it doesn't look like that's the main use case for Ghost.
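The "let the page's JavaScript decide when the real download is ready" step boils down to polling a condition with a timeout. A minimal, library-agnostic sketch; `wait_for`, `condition`, `timeout`, `interval` and `pump` are hypothetical names, not Ghost.py's or PyQt's API. With a real WebKit page, `condition` would inspect the page (for example, look for the final download link), and `pump` would be something like `QApplication.processEvents` so the Qt event loop, and therefore the page's JavaScript, keeps running while we wait.

```python
import time


def wait_for(condition, timeout=30.0, interval=0.1, pump=None):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    pump is an optional callable run on every iteration before the check,
    e.g. QApplication.processEvents in a Qt app, so the page keeps working.
    Returns the truthy value produced by condition().
    """
    deadline = time.monotonic() + timeout
    while True:
        if pump is not None:
            pump()  # let the event loop (and the page's JavaScript) progress
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

Once `wait_for` returns the link, the actual file transfer can be handed to plain `urllib.request.urlretrieve`, which is roughly the "glorified urllib.request" split described above.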


I just tried to use dryscrape [1] for a project. It's great when it works, but it's not liberal enough in what it accepts [2], so it throws showstopping InvalidResponseErrors (which make sense when the library is used for BDD, but not when you are using it to get at JavaScript-driven download links).

This ghost.py looks great; I'll give it a go after dinner tonight.

[1] https://github.com/niklasb/dryscrape

[2] http://en.wikipedia.org/wiki/Robustness_principle

[3] https://github.com/niklasb/dryscrape/issues/6


Could someone describe the key differences from PhantomJS (indirectly referred to in the credits via Casper.js)?

PhantomJS has just recently dropped its Python support.


PhantomJS has to run as its own process, so there's no support for Node or anything similar. This looks like it can run within something like Django.


You can use PhantomJS with Node, e.g. with child_process and messages via stdout. It works pretty well. Running it in its own process might actually be a benefit, e.g. when you spawn multiple PhantomJS browsers.

There is a project on GitHub that takes this a step further via dnode (https://github.com/sgentle/phantomjs-node): you get access to the PhantomJS objects, can get/set properties, and can call the PhantomJS API methods.

Will certainly have a deeper look into ghost.py
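The child-process pattern described above (spawn the browser as a separate process and exchange messages over stdout) works the same way from Python. A minimal sketch with the standard `subprocess` module, assuming the child prints one JSON object per line. Here a short inline Python script stands in for the browser; in practice the command might be something like `["phantomjs", "scrape.js", url]`, where `scrape.js` and its message format are hypothetical.

```python
import json
import subprocess
import sys

# Stand-in for a browser script: emits one JSON message per line on stdout.
CHILD_SCRIPT = r"""
import json, sys
for url in sys.argv[1:]:
    print(json.dumps({"url": url, "status": "loaded"}), flush=True)
"""


def run_and_collect(cmd):
    """Spawn cmd and parse each non-empty stdout line as a JSON message."""
    messages = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            line = line.strip()
            if line:
                messages.append(json.loads(line))
    # Leaving the with-block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError("child exited with status %d" % proc.returncode)
    return messages


if __name__ == "__main__":
    cmd = [sys.executable, "-c", CHILD_SCRIPT, "http://example.com"]
    for msg in run_and_collect(cmd):
        print(msg["url"], msg["status"])
```

Because each browser lives in its own process, several can run side by side, and a crash in one only surfaces as a non-zero exit status in the caller.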


Is it possible to use PhantomJS or Ghost.py to access a website that uses Facebook Connect? Just wondering if anyone has tried it yet.


Is it possible to run jQuery inside Ghost.py to parse HTML with it?



To use with Django's LiveServerTestCase?


Looks pretty useful. Def. going to play around with this a bit.



