Because PhantomJS can load and manipulate a web page, it is well suited to a wide range of page automation tasks.
Since the script is executed as if it were running in a web browser, standard DOM scripting and CSS selectors work just fine.
The following useragent.js example demonstrates reading the textContent property of the element whose id is qua:
var page = require('webpage').create();
console.log('The default user agent is ' + page.settings.userAgent);
// Override the User-Agent string sent with requests from this page
page.settings.userAgent = 'SpecialAgent';
page.open('http://www.httpuseragent.org', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        // Runs inside the page: read the text of the element with id "qua"
        var ua = page.evaluate(function () {
            return document.getElementById('qua').textContent;
        });
        console.log(ua);
    }
    phantom.exit();
});
The example above also shows how to customize the User-Agent string seen by the remote web server.
As of version 1.6, you can also include jQuery in your page using page.includeJs, as follows:
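var page = require('webpage').create();
page.open('http://www.sample.com', function () {
    page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
        // jQuery is now loaded in the page, so $ is available inside evaluate
        page.evaluate(function () {
            $("button").click();
        });
        // Exit only after the library has been included and used
        phantom.exit();
    });
});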
The snippet above opens a web page, includes the jQuery library in it, and then uses jQuery to click every button on the page. Finally, it exits PhantomJS.
Make sure to put the phantom.exit() call inside the page.includeJs callback; otherwise PhantomJS may exit before the JavaScript library has been included.
Suppose you have an instance of the webpage:
var page = require('webpage').create();
What can be extracted and executed on it?
page.canGoForward: Whether window.history.forward() would be a valid action
page.canGoBack: Whether window.history.back() would be a valid action
page.clipRect: Can be set to an object of the following form:
{ top: 0, left: 0, width: 1024, height: 768 }
It specifies which part of the page will be captured in the screenshot.
page.content: The whole HTML content of the page
page.cookies: The cookies, each of the following form:
{ 'name': 'Valid-Cookie-Name', 'value': 'Valid-Cookie-Value', 'domain': 'localhost', 'path': '/foo', 'httponly': true, 'secure': false }
page.customHeaders: Additional HTTP request headers, as an object of name/value pairs, sent with every request the page makes
page.event: Contains the modifier and key codes, for use with page.sendEvent
page.libraryPath: The current library path, usually the directory the script is executed from
page.loading: Whether the page is currently loading
page.loadingProgress: The percentage of the page that has loaded; 100 means the page is fully loaded.
page.navigationLocked: If set to true, the page cannot navigate away from its current URL
page.offlineStoragePath: Where the SQLite localStorage database and other offline data are stored
page.offlineStorageQuota: The quota, in bytes, of data that can be stored offline
page.paperSize: Similar to clipRect, but takes real paper sizes such as A4 (used when rendering to PDF). For an in-depth example, see printheaderfooter.js.
page.plainText: The page content as plain text, without HTML markup
page.scrollPosition: The current scrolling position, as an object of the following form:
{ left: 0, top: 0 }
page.settings: The page settings, such as the user agent string, e.g. page.settings.userAgent = 'SpecialAgent';
page.title: The page title
page.url: The page URL
page.viewportSize: The browser window size, in the following form:
{ width: 1024, height: 768 }
page.windowName: The name of the page's window (its window.name)
page.zoomFactor: The zoom factor; 1 is the normal zoom.
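Several of these attributes are typically set together before rendering. The following is a minimal sketch, assuming PhantomJS 1.6 or later, that sets viewportSize, clipRect and zoomFactor so a screenshot captures only the top-left region of the page, then logs a few read-only attributes. The helper names prepareCapture and summarizePage and the output file capture.png are hypothetical; the PhantomJS-specific part only runs when the global phantom object exists, so the helpers can also be loaded and checked outside PhantomJS.

```javascript
// Hypothetical helper: configure a webpage instance so that page.render()
// captures only a width x height region at the top-left of the page.
// Under PhantomJS, `page` comes from require('webpage').create().
function prepareCapture(page, width, height) {
  // Size of the virtual browser window
  page.viewportSize = { width: width, height: height };
  // Only this rectangle ends up in the screenshot
  page.clipRect = { top: 0, left: 0, width: width, height: height };
  // Render at normal zoom
  page.zoomFactor = 1;
  return page;
}

// Hypothetical helper: copy a few read-only attributes into a plain object.
function summarizePage(page) {
  return {
    title: page.title,
    url: page.url,
    progress: page.loadingProgress,
    canGoBack: page.canGoBack,
    canGoForward: page.canGoForward
  };
}

// Only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = prepareCapture(require('webpage').create(), 1024, 768);
  page.open('http://www.sample.com', function (status) {
    if (status === 'success') {
      console.log(JSON.stringify(summarizePage(page)));
      page.render('capture.png');
    }
    phantom.exit();
  });
}
```

Setting clipRect to the same size as viewportSize is just one choice; a different top/left offset would capture another part of the page instead.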
List of all the page events:
For more information, see the in-depth example page_event.js.
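Page events are handled by assigning callback functions to properties of the webpage instance. As a rough sketch, assuming the onLoadStarted, onConsoleMessage and onAlert callbacks are available in your PhantomJS version, the hypothetical helper attachLogging below wires each of them to a logging function of your choice. As in the previous examples, the PhantomJS-specific part is guarded by a check for the global phantom object.

```javascript
// Hypothetical helper: forward a few page events to `log`,
// which is any function that accepts a single string.
function attachLogging(page, log) {
  page.onLoadStarted = function () {
    log('load started');
  };
  page.onConsoleMessage = function (msg) {
    // Forwards console.log calls made by scripts inside the page
    log('page console: ' + msg);
  };
  page.onAlert = function (msg) {
    // Called instead of showing a dialog when the page calls alert()
    log('page alert: ' + msg);
  };
  return page;
}

// Only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = attachLogging(require('webpage').create(), function (line) {
    console.log(line);
  });
  page.open('http://www.sample.com', function () {
    phantom.exit();
  });
}
```

The handlers are attached before page.open is called so that events fired during the initial load are not missed.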