Screen Capture with PhantomJS

Because PhantomJS uses WebKit, a real layout and rendering engine, it can capture a web page as a screenshot. And because it can render anything on the page, it can convert not only HTML content styled with CSS but also SVG, images, and Canvas elements.

The following script demonstrates the simplest use of page capture. It loads the GitHub homepage and then saves it as an image, github.png.
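A minimal sketch of such a script, assuming the standard webpage module that PhantomJS exposes. The capture helper and the guard around the final call are illustrative additions, there only so the file can also be loaded outside PhantomJS without side effects.

```javascript
// github.js: load a page and save it as an image.
// capture() is an illustrative helper, not part of the PhantomJS API.
function capture(webpage, url, output, done) {
  var page = webpage.create();
  page.open(url, function (status) {
    // status is the string 'success' or 'fail'
    if (status === 'success') {
      page.render(output); // the format is inferred from the file extension
    }
    done(status);
  });
}

// Only run when executed by PhantomJS, where `phantom` is a global object.
if (typeof phantom !== 'undefined') {
  capture(require('webpage'), 'http://github.com/', 'github.png', function (status) {
    phantom.exit(status === 'success' ? 0 : 1);
  });
}
```

Calling phantom.exit() at the end matters: without it, the PhantomJS process keeps running after the image has been written.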

To run this example, create a new file called github.js and paste the above code into it. Then, on the command line, run the newly created script with PhantomJS: phantomjs github.js

Besides the PNG format, PhantomJS supports JPEG, GIF, and PDF.

The examples subdirectory contains a script, rasterize.js, which demonstrates the rendering features of PhantomJS more completely. For example, it can render the famous Tiger (from SVG):

which gives the following tiger.png:

Another example renders the polar clock (from RaphaelJS):

Polar Clock

Producing PDF output is also easy, such as from a Wikipedia article:

You can change the size of the screenshot and the webpage using the page’s attributes:
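As a rough illustration, the two page properties usually involved are viewportSize (the size used for layout) and clipRect (the region that page.render rasterizes). The applySize helper and the 1024 by 768 values are assumptions chosen for the example.

```javascript
// Illustrative helper: apply layout and crop settings to a PhantomJS page.
function applySize(page, width, height) {
  // viewportSize controls the size of the viewport used for layout.
  page.viewportSize = { width: width, height: height };
  // clipRect restricts rendering to a rectangle of the page.
  page.clipRect = { top: 0, left: 0, width: width, height: height };
  return page;
}

// Only run when executed by PhantomJS, where `phantom` is a global object.
if (typeof phantom !== 'undefined') {
  var page = applySize(require('webpage').create(), 1024, 768);
  page.open('http://github.com/', function () {
    page.render('github.png');
    phantom.exit();
  });
}
```

Without a clipRect, PhantomJS renders the full height of the page, so the clip is what keeps the image at exactly the requested size.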

Canvas can be easily constructed and converted to an image. The included example colorwheel.js produces the following color wheel:

Color Wheel

It is possible to build a web screenshot service using PhantomJS. Some related projects make it easy to create such a service.


How to create a screenshot from a website or html with PhantomJS in Node.js

Carlos Delgado


  • December 31, 2016

Learn how to create a screenshot from a website or even plain html using PhantomJS in Node.js

PhantomJS is a headless WebKit browser, scriptable through a JavaScript API and available on the major operating systems: Windows, Mac OS X, Linux, and other Unices. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.

In this article, you will learn how to manipulate the PhantomJS CLI with Node.js using the webshot module.

Requirements

You will need PhantomJS (installed or as a standalone distribution) accessible from the PATH. If it isn't available in the PATH, you can specify the path to the PhantomJS executable in the configuration later.

You can obtain PhantomJS for every platform (Windows, Linux, macOS, etc.) from the download area of the official website.

Note: on most platforms there is no installation process; you get a .zip file with two folders, examples and bin (the latter contains the PhantomJS executable).

Implementation

PhantomJS is a command-line tool (CLI), so using it from Node.js would normally mean driving it through a child process. Rather than reinventing the wheel, we will use a third-party module for this task: node-webshot. Node Webshot provides a simple API for taking webpage screenshots. The module is a light wrapper around PhantomJS, which utilizes WebKit to perform the page rendering.

To install this module in your project, execute the following command in your terminal: npm install webshot

Note: the webshot module includes a prebuilt PhantomJS as a dependency, located in your-project/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs.exe. It is used automatically if no phantomPath is provided, so webshot works without any configuration.

Add the --save parameter if you want to save the module as a dependency of your project. After the installation, you'll be able to load the module using require('webshot').

As mentioned previously, you need PhantomJS accessible from the command line. If it isn't, specify the full path to the executable by providing the phantomPath option:
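As a rough sketch, the option is just another key on the options object passed to webshot. The path below is a hypothetical Windows location, not one taken from the article; substitute the real location of your executable.

```javascript
// Options object for webshot with an explicit PhantomJS binary.
var options = {
  // Hypothetical path: replace with the actual location on your machine.
  phantomPath: 'C:\\phantomjs\\bin\\phantomjs.exe'
};

// With webshot installed, the options go in as the third argument:
// require('webshot')('google.com', 'google.png', options, function (err) {
//   if (err) { console.error(err); }
// });
```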

Webshot tries to use the binary provided by the phantomjs NPM module, and falls back to the phantomPath if the module isn't available.

Create screenshot from website

You can create a screenshot of any website. Just provide the website URL as the first parameter and the output file as the second parameter:
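A minimal sketch of that call, assuming webshot has been installed with npm. The screenshotSite wrapper is an illustrative name; the module is passed in as a parameter only so the helper can be tried in isolation.

```javascript
// Illustrative wrapper around webshot(url, outputFile, callback).
function screenshotSite(webshot, url, outputFile, done) {
  webshot(url, outputFile, function (err) {
    if (err) { return done(err); }
    done(null, outputFile); // the screenshot has been written to outputFile
  });
}

// Typical use (requires the webshot module and PhantomJS):
// screenshotSite(require('webshot'), 'google.com', 'google.png', function (err, file) {
//   if (err) { return console.error(err); }
//   console.log('Saved ' + file);
// });
```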

Create screenshot from html file or plain html string

You can create a screenshot from an HTML string. Provide the markup as a string in the first parameter and the output filename as the second parameter, and specify in the options that you're using plain HTML:
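A short sketch of the HTML-string case, assuming the siteType option accepts the value 'html' as described above. The screenshotHtml wrapper and the sample markup are illustrative.

```javascript
// Illustrative wrapper: render a markup string instead of a URL.
function screenshotHtml(webshot, html, outputFile, done) {
  // siteType 'html' tells webshot that the first argument is markup.
  var options = { siteType: 'html' };
  webshot(html, outputFile, options, done);
}

// Typical use (requires the webshot module and PhantomJS):
// screenshotHtml(require('webshot'), '<html><body>Hello World</body></html>',
//   'hello_world.png', function (err) {
//     if (err) { console.error(err); }
//   });
```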

You can create a screenshot from an HTML file by setting siteType to file and providing the absolute path to the file as the first parameter of the webshot function:

Alternatively, you can read the content of the file using the filesystem module and set siteType to html:
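Both variants can be sketched side by side. The function names are illustrative, and the synchronous read is a simplification chosen to keep the example short.

```javascript
var fs = require('fs');

// Option A: let webshot load the file itself (absolute path, siteType 'file').
function screenshotFile(webshot, absolutePath, outputFile, done) {
  webshot(absolutePath, outputFile, { siteType: 'file' }, done);
}

// Option B: read the markup yourself and pass it as an HTML string.
function screenshotFileContents(webshot, filePath, outputFile, done) {
  var html = fs.readFileSync(filePath, 'utf8');
  webshot(html, outputFile, { siteType: 'html' }, done);
}

// Typical use (requires the webshot module and PhantomJS):
// screenshotFileContents(require('webshot'), __dirname + '/page.html',
//   'page.png', function (err) { if (err) { console.error(err); } });
```

Option B is handy when the markup needs to be modified before rendering, since the string can be edited between the read and the webshot call.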

You can set more options in the object; see the available options in the docs of the repository.

Change format of the screenshot

The format of the generated screenshots can be png, jpg, or jpeg. To change the output format, set streamType to a string with the format (note that the output filename needs to have the same extension):
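One way to keep streamType and the file extension in agreement is to derive the option from the filename. The formatOptions helper below is a hypothetical convenience, not part of webshot.

```javascript
var path = require('path');

// Hypothetical helper: build the streamType option from the output filename,
// so the option and the extension can never disagree.
function formatOptions(outputFile) {
  var ext = path.extname(outputFile).slice(1).toLowerCase();
  if (['png', 'jpg', 'jpeg'].indexOf(ext) === -1) {
    throw new Error('Unsupported format: ' + ext);
  }
  return { streamType: ext };
}

// Typical use (requires the webshot module and PhantomJS):
// require('webshot')('google.com', 'google.jpeg', formatOptions('google.jpeg'),
//   function (err) { if (err) { console.error(err); } });
```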

Webshot options

As with the PhantomJS CLI, you can set options dynamically within an object for the webshot module (and for PhantomJS). The following table shows all the available options for webshot and for PhantomJS:

PhantomJS callbacks

Arbitrary scripts can be run on the page before it gets rendered by using any of Phantom's page callbacks, such as onLoadFinished or onResourceRequested. For example, the script below changes the text of every link on the page:
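A sketch of such a callback, assuming it is supplied as the onLoadFinished key of the options object. The replacement text is arbitrary. The function body runs inside the rendered page, so it refers only to page globals such as document and to nothing from the surrounding Node.js scope.

```javascript
var options = {
  // Runs in the page context once loading has finished.
  onLoadFinished: function () {
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
      links[i].innerHTML = 'My custom text';
    }
  }
};

// Typical use (requires the webshot module and PhantomJS):
// require('webshot')('google.com', 'google.png', options, function (err) {
//   if (err) { console.error(err); }
// });
```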

Note that the script will be serialized and then passed to Phantom as text, so all variable scope information will be lost.

Happy coding!

Senior Software Engineer at Software Medico . Interested in programming since he was 14 years old, Carlos is a self-taught programmer and founder and author of most of the articles at Our Code World.


Scaling PhantomJS: Taking Thousands of Full Page Screenshots Every Day


This article will show you how to use PhantomJS at scale to make multiple website screenshots as a RESTful service. I implemented this service in my own way, and there are many different ways to do this, but do keep in mind that I am talking about a real-life example that serves 1000+ customers a day. The use case I’m talking about is Custodee, a SaaS app based on web crawling, analytics, and screenshots.

Challenges of creating full page screenshots

After building and running Custodee (you can read about how I built and launched it on Product Hunt), I went through the trouble of creating full-page website screenshots. While doing this, I was confronted with three problems:

  • Modern technologies like Chrome Selenium still have problems taking full-page screenshots without faking them. This means Selenium will take multiple screenshots by scrolling down and putting them together into one image afterwards. This won’t work for most websites because of fixed HTML components.
  • Older technologies like PhantomJS, as well as newer ones, are all quite heavy on CPU and RAM resources. Optimizing the service to run at scale and on small servers was quite important to me to keep server costs as low as possible.
  • PhantomJS is buggy and will crash randomly from time to time. This needs to be handled.

PhantomJS in my back-end architecture

Because I fell in love with Node.js, I experimented with a lot of different npm packages, but none of them did the job. Therefore, I decided to just use a PhantomJS wrapper and build the functionality on top of it.

The code in this article is a simplified version of my Custodee back-end, which runs on multiple servers and crawls thousands of websites per day. See the diagram representing Custodee’s architecture below.

  • The website is on the front-end server with Node.js and AngularJS.
  • There can be many more back-end servers, depending on the traffic (hence the +n ).
  • This is why there’s an AWS ELB (load balancer): it routes the traffic to the back-end servers and shares the load between them.
  • Importing and saving all the images as a REST API (the images are sent from the back-end servers)
  • Pushing premium users’ images to their Dropbox

(If you’d like to read more about this, you can refer to my previous post.)

For this post, I stripped down my current back-end implementation to just do full-page screenshots, so my examples won’t become too complex. You can get the whole project from my GitHub (https://github.com/TonySchu/) and run it on your local machine. I included a simple front-end to use the service, but you can just send POST requests as well.

To run it on your local machine, just download it here and follow the instructions.

After running the node application, you can test it on localhost:8089.

Because the process of creating a new PhantomJS instance uses a lot of CPU resources, I could not just start multiple browsers for each crawling process. After testing different approaches, I knew these aspects had to be addressed:

  • Reuse an existing PhantomJS instance for as long as possible, but close it before it crashes.
  • Because everything works asynchronously and PhantomJS can’t handle multiple operations at once, all functions have to be designed to handle async operations (create a browser tab, open a URL, extract the HTML, render the screenshots, etc.).
  • Limit the back-end to run a maximum of four parallel instances. Otherwise, the server will crash. (I tested this on small EC2 instances on Amazon Web Services. If I want to run more than four instances, I have to use a bigger server or scale the number of servers.)

This is a simple API to post website links and a username to the application. On your local machine, the endpoint will be http://localhost:8089/api/phantom/:user. The last part of the URL (:user) will be used to create a folder on your machine to store the screenshots.
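To make the role of the :user segment concrete, here is a small sketch of how a request path could be mapped to a storage folder. The screenshotDirFor helper and the restriction to letters, digits, underscores, and hyphens are assumptions for illustration; the public/images base folder matches the location mentioned later in the article.

```javascript
var path = require('path');

// Hypothetical helper: map /api/phantom/<user> to that user's screenshot folder.
// Returns null when the path does not match or the user name looks unsafe.
function screenshotDirFor(requestPath) {
  var match = /^\/api\/phantom\/([A-Za-z0-9_-]+)\/?$/.exec(requestPath);
  if (!match) { return null; }
  return path.join('public', 'images', match[1]);
}
```

Rejecting anything other than a plain name keeps a request such as /api/phantom/../etc from writing outside the images folder.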

The Crawler

Here, we’ll start with the CrawlerObject, which will be used to pass data between the different processes. It contains the current iteration, website URLs, the PhantomJS instance, and configurations. Because PhantomJS is quite hungry for resources, I limited the application to run only four processes in parallel. You can change this in the global variable maxInstances, depending on the power of your machine or server. This should work well on a small EC2 instance on AWS.
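The description above could translate into something like the following sketch. Only maxInstances is a name taken from the article; the field names, the runningInstances array, and both helper functions are guesses at a plausible shape, not the author's actual code.

```javascript
var maxInstances = 4;      // upper bound on parallel PhantomJS processes
var runningInstances = []; // hypothetical: process IDs of live instances

// Hypothetical shape of the object passed between the crawl steps.
function createCrawlObject(urls, username) {
  return {
    index: 0,               // current iteration
    websites: urls.slice(), // copy of the URLs still to be rendered
    phantomInstance: null,  // filled in once an instance has been created
    config: { user: username }
  };
}

// True while another PhantomJS instance may still be started.
function canStartInstance() {
  return runningInstances.length < maxInstances;
}
```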

Create a PhantomJS instance

This function creates a new, fresh PhantomJS instance. Its process ID will be stored in a global array, so we can always check how many instances are currently running or use this ID to kill a buggy instance on our server. The newly created CrawlObject is now passed into the next function, createWebsiteScreenshots. The first thing to check is the current index of the CrawlObject. This is important because the more websites you run through a PhantomJS instance, the more likely it is to randomly crash.

While optimizing Custodee’s back-end, I figured out that 20 iterations is a good limit to work with. After 20 iterations, I shut down the active instance and continue the crawling process with a fresh one, just to be safe. You might ask why I don’t simply use a fresh instance for each website screenshot. The reason is that creating an instance is the main cause of the high CPU usage. Reusing an instance to render multiple websites is very important for keeping the server’s resource usage low and for making the screenshot process faster.
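The recycling rule can be sketched as a small decision function. Everything here is hypothetical apart from the limit of 20: planNextStep, the rendersOnInstance counter, and the simulateCrawl driver (which stands in for the real create, render, and shut-down calls) are illustrative names.

```javascript
var maxIterations = 20; // the limit quoted above

// Decide what to do before rendering the next URL.
function planNextStep(crawl) {
  if (crawl.index >= crawl.websites.length) { return 'finish'; }
  if (crawl.rendersOnInstance >= maxIterations) { return 'restart'; }
  return 'render';
}

// Walk through a crawl without PhantomJS and count the instance restarts.
function simulateCrawl(urlCount) {
  var crawl = { index: 0, websites: new Array(urlCount), rendersOnInstance: 0 };
  var restarts = 0;
  var step;
  while ((step = planNextStep(crawl)) !== 'finish') {
    if (step === 'restart') {
      restarts += 1;
      crawl.rendersOnInstance = 0; // a fresh instance would be created here
    } else {
      crawl.index += 1;            // a page would be rendered here
      crawl.rendersOnInstance += 1;
    }
  }
  return restarts;
}
```

With this rule, a crawl of 45 URLs uses three instances in total: the first two render 20 pages each and the third renders the remaining five.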

Rendering website screenshots

Instead of just running a normal loop over this function, we are always waiting for one iteration to finish before we call the next one. This is necessary for the same reason we are reusing a PhantomJS instance multiple times. It is just super heavy on a machine and will not work at scale. The way PhantomJS works is that you can’t do operations in parallel, like taking screenshots, working with the HTML content, or even clicking and navigating on a website. (Yes, you can do inputs, clicks, file uploads, but you need to wait until each step is completed before beginning the next step.)
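The "wait for one iteration before starting the next" pattern can be sketched with plain callbacks. renderSequentially and the renderOne parameter are hypothetical names; in the real service, renderOne would open the page and render the screenshot.

```javascript
// Process URLs strictly one at a time, collecting a result per URL.
function renderSequentially(urls, renderOne, done) {
  var results = [];
  function next(i) {
    if (i >= urls.length) { return done(results); }
    renderOne(urls[i], function (err) {
      results.push({ url: urls[i], ok: !err });
      next(i + 1); // start the next page only after this one has finished
    });
  }
  next(0);
}

// Example with a stand-in renderer that always succeeds:
// renderSequentially(['http://a.example', 'http://b.example'],
//   function (url, cb) { setTimeout(cb, 100); },
//   function (results) { console.log(results); });
```

A plain for loop would call renderOne for every URL immediately, which is exactly the parallel use that a single PhantomJS instance cannot handle.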

After setting up the necessary properties and configurations, we can call page.open, which will create a browser tab with the specific URL we want to crawl. Now, we can do operations like getting the HTML content or creating full-page screenshots. The PNG file will be stored under /public/images/username, and can be directly called from the server, like on the application’s index.html.

Any kind of error will be caught and handled by starting a fresh PhantomJS instance. This ensures that the process will not fail on a specific website crawl. Keep in mind that I also wrote some helper functions, like continuePhantomLoop(). These are not provided by the PhantomJS npm package, so feel free to check the whole code on GitHub.

How to scale PhantomJS

As already mentioned, the application is limited by the configuration to run only a fixed number of instances in parallel. The way I scale this service to crawl thousands of websites per day is to run it on multiple small servers in parallel. This way the API calls will go through a load balancer, which will direct and balance the requests to various servers, depending on the current load and number of users. I’ve also automated the servers to boot and shut down depending on the number of requests. In production, the images and data are then pushed to another server, which can be called from the front-end. Images are also pushed to premium users’ Dropbox accounts.

😃

I am also happy to answer any questions in the comments below. 👇


Hello Tony, if it’s not a secret, what EC2 instances are you running? Some of the websites are really huge with many pictures and could easily take 1.5 GB of memory.

Many sites have implemented the content lazy loading feature, how are you handling this?

I am using small EC2 instances. This is the absolute minimum. Micros will sometimes work, but in most cases they will crash due to insufficient RAM. In 95% of cases, 2 GB of RAM is enough to handle my use cases.

You can handle lazy loading by automatically scrolling down the page and waiting for the content to load. You can also inject custom JavaScript into the site for this.

Thanks for this article. Did you try Puppeteer?
