PhantomJS and JavascriptExecutor

My main project at TheStreet is a web scraper, and it relies on the PhantomJS browser to run user interactions, make DOM adjustments and take screenshots. Taking screenshots is an area where PhantomJS excels over the other Selenium WebDriver implementations.

The thing that works so well for PhantomJS is that the viewport is not constrained to a physical screen, so its screen captures are as tall as the page content. I just set the viewport to 1366x1 and let the content extend the height of the capture.

One thing that doesn't work well is font rendering. When PhantomJS runs on AWS, font rendering is extremely unstable: the same site can render in a serif or a sans-serif font, and it's not clear why. This makes the screenshots very different even when the site hasn't changed. I'm using pixel-level analysis of the screenshots to detect changes, so this sets off lots of false positives.
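To make the false-positive problem concrete, here is a minimal sketch of one way a pixel-level comparison could be done in Java. It is not the analysis my scraper actually runs; the class name ScreenshotDiff and the method changedFraction are hypothetical, and it assumes both screenshots have already been loaded as BufferedImage objects.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: measures how much two screenshots differ, pixel by pixel. */
final class ScreenshotDiff {

    /**
     * Returns the fraction of pixels (0.0 to 1.0) whose RGB value differs.
     * Images with different dimensions are treated as entirely different.
     */
    static double changedFraction(BufferedImage before, BufferedImage after) {
        if (before.getWidth() != after.getWidth()
                || before.getHeight() != after.getHeight()) {
            return 1.0;
        }
        long changed = 0;
        for (int y = 0; y < before.getHeight(); y++) {
            for (int x = 0; x < before.getWidth(); x++) {
                if (before.getRGB(x, y) != after.getRGB(x, y)) {
                    changed++;
                }
            }
        }
        long total = (long) before.getWidth() * before.getHeight();
        return (double) changed / total;
    }
}
```

With a comparison like this, a switch from a serif to a sans-serif font touches nearly every line of text, so the changed fraction jumps even though the content is identical. It can also change the page height, which this sketch reports as a complete mismatch.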

The PhantomJS driver implements Selenium's JavascriptExecutor interface, which allows arbitrary JavaScript to execute in the browser. The interface lets me pass in the script as a string along with optional arguments, which are exposed to the script as an arguments array. I find that the scripts are easier to understand when I set values by string replacement rather than use the arguments capability.
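As a rough illustration of the string-replacement approach, the sketch below fills a named placeholder in a script template before the script is handed to the browser. The class ScriptTemplates, the template, and the ${FONT_FAMILY} placeholder syntax are all made up for this example. With the arguments capability, the same script would instead read arguments[0] and be called as executeScript(script, "Arial").

```java
/** Hypothetical helper that fills named placeholders in a JavaScript template. */
final class ScriptTemplates {

    // Illustrative template; the placeholder name FONT_FAMILY is an assumption.
    static final String SET_BODY_FONT =
            "document.body.style.fontFamily = '${FONT_FAMILY}';";

    /** Replaces every literal ${name} occurrence in the template with value. */
    static String fill(String template, String name, String value) {
        return template.replace("${" + name + "}", value);
    }
}
```

The filled-in string reads like ordinary JavaScript, which is the readability benefit. The trade-off is that nothing is escaped, so this only suits values I control; a value containing a quote character would break the script.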

I use the JavascriptExecutor interface in several different ways, but the relevant example here is that I can change the style of elements on the page by manipulating their properties through the DOM tree. I have a script that recursively sets the font to Arial on all DOM nodes, since PhantomJS renders standard fonts like Arial without any issues. This script generally takes under 20 ms for the pages I work with. Now all of my screen captures are consistent and this source of false positives is gone.
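A script along these lines might look like the sketch below. It is a reconstruction of the idea, not the exact script I run: the class name FontNormalizer and the JavaScript function setFont are hypothetical. The JavaScript is kept as a plain Java string and sticks to ES5 syntax, on the assumption that PhantomJS's older WebKit engine does not handle newer syntax.

```java
/** Hypothetical holder for a script that forces one font on every element. */
final class FontNormalizer {

    // JavaScript that walks the DOM depth-first from <body> and sets an inline
    // font-family of Arial on every element node that has a style object.
    static final String FORCE_ARIAL_SCRIPT =
            "function setFont(node) {"
          + "  if (node.nodeType === 1 && node.style) {"   // 1 is Node.ELEMENT_NODE
          + "    node.style.fontFamily = 'Arial';"
          + "  }"
          + "  for (var i = 0; i < node.childNodes.length; i++) {"
          + "    setFont(node.childNodes[i]);"
          + "  }"
          + "}"
          + "setFont(document.body);";
}
```

Given a WebDriver instance named driver, the script would be run with ((JavascriptExecutor) driver).executeScript(FontNormalizer.FORCE_ARIAL_SCRIPT) just before the screenshot is taken. Inline styles win over most stylesheet rules, but a rule marked !important would still override them.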

I also use the JavascriptExecutor capabilities in PhantomJS to adjust the CSS style attributes and check document status on page load. It has proven to be a useful capability to build on.
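For the page-load check, one plausible approach is to poll document.readyState until it reports "complete". The sketch below assumes that is what "document status" means here; the class PageLoadWaiter and its method are hypothetical. It takes a Supplier so it can be shown without a live browser; in real use the supplier would be something like () -> js.executeScript(PageLoadWaiter.READY_STATE_SCRIPT), where js is a JavascriptExecutor.

```java
import java.util.function.Supplier;

/** Hypothetical helper that polls a page's ready state. */
final class PageLoadWaiter {

    // document.readyState is a standard DOM property: "loading", "interactive" or "complete".
    static final String READY_STATE_SCRIPT = "return document.readyState;";

    /**
     * Polls the supplier until it returns "complete" or the timeout elapses.
     * Returns true if the page became ready in time, false on timeout or interrupt.
     */
    static boolean waitForComplete(Supplier<Object> readyState,
                                   long timeoutMillis, long pollMillis) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (true) {
            if ("complete".equals(readyState.get())) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```

A ready state of "complete" only says the initial load has finished; content that scripts add afterwards would need its own check.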

Greg Leib on Software