Ghost.py is a WebKit web client written in Python

pie · on April 26, 2012

This appears to be a wrapper for PyQt4's QtWebKit.

http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/q...

wahnfrieden · on April 27, 2012

Ah, then it's going to be quite an out-of-date version of WebKit. Does anyone know exactly which version? We're using PyQt and this has been a huge issue.

albertzeyer · on April 27, 2012

Here you'll find the exact revision for each Qt version: http://trac.webkit.org/wiki/QtWebKit

E.g. I use Qt 4.8 (via PyQt 4.9) and thus I have WebKit trunk from 2011-05-05 (r85855).

There is also PyQt4.QtWebKit.qWebKitVersion().

If you have the Qt source at hand, check src/3rdparty/webkit/VERSION.
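A minimal sketch of both checks, assuming PyQt4 may or may not be installed. The helper names `webkit_version` and `bundled_webkit_version` are made up for illustration; only `qWebKitVersion()` and the `src/3rdparty/webkit/VERSION` path come from the comment above.

```python
import os


def webkit_version():
    """Return the WebKit version string reported by PyQt4, or None if PyQt4 is missing."""
    try:
        from PyQt4.QtWebKit import qWebKitVersion
    except ImportError:
        return None
    # str() also covers the QString returned under the older PyQt4 API.
    return str(qWebKitVersion())


def bundled_webkit_version(qt_source_root):
    """Return the contents of src/3rdparty/webkit/VERSION under a Qt source tree, or None."""
    path = os.path.join(qt_source_root, "src", "3rdparty", "webkit", "VERSION")
    if not os.path.isfile(path):
        return None
    with open(path) as handle:
        return handle.read().strip()
```

Calling `webkit_version()` gives the runtime answer; `bundled_webkit_version(...)` takes the path to a local Qt source checkout, which is an assumption about where you unpacked it.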

wahnfrieden · on April 27, 2012

Thanks

albertzeyer · on April 27, 2012

I have done this somewhat more directly with PyQt4. I wonder whether it's useful to have this layer in between, although it does seem to have some useful tools. This is the main code:

* https://github.com/jeanphix/Ghost.py/blob/master/ghost/ghost...

* https://github.com/jeanphix/Ghost.py/blob/master/ghost/utils...

If you are interested, this is some of my own code where I just use PyQt4 directly:

* https://github.com/albertz/google-books-export/blob/master/g...

jlarocco · on April 27, 2012

I once used a similar technique to mass download some elevation data files from the USGS website.

I forget the exact details, but fetching the URL just kicked off a job on their server and returned some JavaScript to execute. The JavaScript did "something" while the data was being fetched/processed on the server, and eventually decided when it could start the real download.

I spent a while trying to figure out the JavaScript, but finally came up with the PyQt/WebKit approach.

It's the ugliest download code in the world, but it's up on GitHub: https://github.com/jl2/GIS-Stuff/blob/master/map_download/ne...

I'm not sure how useful something like Ghost would have been. I was basically using it as a glorified urllib.request, though, and it doesn't look like that's the main use case for Ghost.
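For readers unfamiliar with the PyQt/WebKit approach, here is a rough sketch of the general idea: load the page in an offscreen `QWebPage`, let WebKit run its JavaScript, and read back the resulting HTML. This is not the code from the repository linked above; the function name `render_page` is hypothetical, and it assumes PyQt4 with QtWebKit is installed and a display is available.

```python
import sys


def render_page(url):
    """Load url in an offscreen QWebPage, let its scripts run, return the final HTML."""
    # Imported lazily so the module itself can be imported without PyQt4.
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebPage

    # Reuse an existing application object if one is already running.
    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebPage()
    # loadFinished fires when the initial load completes; leave the event loop then.
    page.loadFinished.connect(lambda ok: app.quit())
    page.mainFrame().load(QUrl(url))
    app.exec_()
    # The return type depends on the PyQt4 API version (QString or str).
    return page.mainFrame().toHtml()
```

A page whose script keeps polling after the initial load, as described above, would need extra waiting before the HTML is read (for example a timer that re-checks the page), which this sketch leaves out.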

candeira · on April 27, 2012

I just tried to use dryscrape [1] for a project. It's great when it works, but it's not liberal enough in what it accepts [2], so it gives off showstopping InvalidResponseErrors (which make sense when the library is used for BDD, but not when you are using it to get at javascripty download links).

This ghost.py looks great, I'll give it a go after dinner tonight.

[1] https://github.com/niklasb/dryscrape

[2] http://en.wikipedia.org/wiki/Robustness_principle

[3] https://github.com/niklasb/dryscrape/issues/6

fpp · on April 27, 2012

Could someone describe the key differences from PhantomJS (indirectly referred to in the credits via Casper.js)?

PhantomJS has just recently dropped its Python support.

zackzackzack · on April 27, 2012

PhantomJS has to run as its own process, so no support for Node or anything similar. This looks like it can run within something like Django.

fpp · on April 27, 2012

You can use PhantomJS with Node, e.g. with child_process and messages via stdout. Works pretty well. Running this in its own process context might actually be a benefit, e.g. when you spawn multiple PhantomJS browsers.
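The same child-process-plus-stdout pattern can be sketched in Python. This is only an illustration: the child here is a stand-in Python one-liner rather than a real `phantomjs` binary, and the message format (one JSON object per line) and the name `run_worker` are assumptions, not anything PhantomJS prescribes.

```python
import json
import subprocess
import sys

# Stand-in for a headless browser script. A real setup would launch the
# phantomjs binary instead. The child prints one JSON message per line.
CHILD_CODE = r"""
import json, sys
for url in sys.argv[1:]:
    print(json.dumps({"url": url, "status": "done"}))
"""


def run_worker(urls):
    """Spawn one child process and collect its line-delimited JSON messages."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE] + list(urls),
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() waits for the child to exit and returns everything it printed.
    out, _ = proc.communicate()
    return [json.loads(line) for line in out.splitlines() if line.strip()]
```

Because each worker is its own process, several can be started side by side, one per browser instance, without sharing state.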

There is a project on GitHub where this was taken a step further via dnode ( https://github.com/sgentle/phantomjs-node ): you get access to the PhantomJS objects, can get/set properties, and can call the PhantomJS API methods.

Will certainly have a deeper look into Ghost.py.

treenyc · on April 27, 2012

Is it possible to use PhantomJS or Ghost.py to access a web site that is using Facebook Connect? Just wondering if anyone has tried it yet.

daGrevis · on April 27, 2012

Is it possible to run jQuery inside Ghost.py to parse HTML with it?

schwuk · on April 27, 2012

Why not use pyquery (https://bitbucket.org/olauzanne/pyquery/)? Plays nice with WebTest (http://webtest.pythonpaste.org/) and django-webtest (https://bitbucket.org/kmike/django-webtest/).

jMyles · on April 27, 2012

To use with Django's LiveServerTestCase?

chrishacken · on April 27, 2012

Looks pretty useful. Def. going to play around with this a bit.