How to Scrape Multiple Pages with ParseHub by Clicking

In this guide:

  • Scrape multiple pages with ParseHub by clicking

Scraping multiple pages has never been easier with ParseHub. In this guide, we'll go through how to spider your way through pages by clicking. If you're new to ParseHub, I recommend first reading my guide on [How to Scrape JavaScript Webpages with ParseHub], which covers some of its basic functionality.

This guide assumes you have signed up for a ParseHub account, downloaded and launched the desktop app, and logged in to it.

Multiple Pages by Clicking

Enter this URL into the ParseHub browser and start the project:

https://finance.yahoo.com/quote/TSLA?p=TSLA
Start a new project with Yahoo Finance

Our goal is to go through each of the tickers (e.g. TEVA, TSLA, WDC) by clicking through to its page and then extracting its price and time.

On the main_template, using the first selection, click on at least 2 or 3 tickers until all of the other tickers are highlighted in green as well. This turns the selection into a multi-selection, signaled by the "Begin a new entry" command:

Click on the tickers until all tickers are highlighted green

Next to the "Begin a new entry" command, click the + symbol to add a Click command. This tells ParseHub to click on each selected (green) ticker:

Add the click command

A pop-up will appear. On it, click "Create New Template" and give the template a name; I used "ticker_template". This tells ParseHub that, after clicking a ticker, it should continue with the new template we're about to build:

Create a new template on the pop-up after initiating the click command

Your new template should now appear right under the main_template, and the browser should show the page reached by the first click. In my case, it went to https://finance.yahoo.com/quote/TEVA?p=TEVA. If it doesn't, enter that URL into the ParseHub browser and open the new template again.
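Both quote URLs in this guide follow the same pattern: the ticker appears in the path and again in the p query parameter. If you need to open a ticker's page by hand, a small helper like the one below builds that URL. This is a minimal sketch: quote_url is a hypothetical name, and the pattern is inferred only from the two example URLs, so Yahoo Finance may use other forms.

```python
from urllib.parse import quote, urlencode

# Assumed pattern, inferred from the example URLs: /quote/<TICKER>?p=<TICKER>
BASE = "https://finance.yahoo.com/quote/"


def quote_url(ticker: str) -> str:
    """Build a quote-page URL for a ticker (hypothetical helper)."""
    symbol = ticker.strip().upper()
    # Percent-encode the symbol in the path and in the query string.
    return f"{BASE}{quote(symbol, safe='')}?{urlencode({'p': symbol})}"


print(quote_url("TEVA"))  # https://finance.yahoo.com/quote/TEVA?p=TEVA
```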

On the new template, let’s add the command “Begin New Entry” under Page:

On the new template, begin a new entry

Under the Begin New Entry command, add the “Select” command:

Add a select command

With the Select command active, click on the full ticker name and rename the command to "full_name":

Select on the title or full ticker name

Add another Select command, click on the dollar amount, and rename the command to "price":

Select on the market price

Do the same for the date/time:

Select on the date/time

I always recommend grabbing the URL, even if it duplicates data from the parent template (this will become apparent later), so that you can cross-check that you're scraping the correct pages. To do so, add an "Extract" command; by default, it extracts the URL:

Extract the URL as well
$location.href is the page URL
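One way to do that cross-check is to recover the ticker from the extracted URL and compare it with the ticker you expected for that row. The sketch below uses only Python's standard urllib.parse. It assumes the quote-page URL pattern seen in this guide (/quote/TEVA?p=TEVA), and ticker_from_url is a hypothetical helper, not part of ParseHub.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlparse


def ticker_from_url(url: str) -> Optional[str]:
    """Return the ticker from a URL like .../quote/TEVA?p=TEVA, or None."""
    parsed = urlparse(url)
    # Prefer the "p" query parameter when present (parse_qs already decodes it).
    p_values = parse_qs(parsed.query).get("p")
    if p_values:
        return p_values[0].upper()
    # Otherwise fall back to the path segment right after "/quote/".
    parts = [seg for seg in parsed.path.split("/") if seg]
    if len(parts) >= 2 and parts[0] == "quote":
        return unquote(parts[1]).upper()
    return None


# A row is consistent if the URL's ticker matches the ticker you expected.
print(ticker_from_url("https://finance.yahoo.com/quote/TEVA?p=TEVA") == "TEVA")  # True
```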

Save your project and start a Test Run:

Test run

During the test run, check the results in the bottom-right panel and make sure there is data from multiple pages:

Check the bottom right for multiple pages of data

Here's what the data looks like after running the crawler and downloading the CSV:

run_results.csv
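After downloading the CSV, a short script can confirm that the run really visited more than one page and that every extracted URL points at a quote page. This is a minimal sketch using Python's csv module. The URL column name is an assumption: pass whatever header your Extract command produced in the export ("url" below is hypothetical).

```python
import csv
from typing import Iterable, Set, Tuple
from urllib.parse import urlparse


def summarize_run(lines: Iterable[str], url_column: str) -> Tuple[int, Set[str], Set[str]]:
    """Return (row count, distinct URLs, URLs that don't look like quote pages).

    `lines` is any iterable of CSV text lines, such as an open file.
    `url_column` is the header of the extracted URL column (an assumption).
    """
    row_count = 0
    urls: Set[str] = set()
    for row in csv.DictReader(lines):
        row_count += 1
        url = (row.get(url_column) or "").strip()
        if url:  # skip rows without a URL
            urls.add(url)
    # Anything whose path doesn't start with /quote/ deserves a second look.
    suspicious = {u for u in urls if not urlparse(u).path.startswith("/quote/")}
    return row_count, urls, suspicious


def report(path: str, url_column: str = "url") -> None:
    """Print a summary of a downloaded CSV ("url" is a hypothetical header)."""
    with open(path, newline="", encoding="utf-8") as f:
        row_count, urls, suspicious = summarize_run(f, url_column)
    print(f"{row_count} rows from {len(urls)} distinct pages")
    if len(urls) < 2:
        print("Only one page was scraped; check the Click command on main_template.")
    for u in sorted(suspicious):
        print(f"Not a quote page: {u}")
```

For example, report("run_results.csv") prints the row and page counts. If the crawler followed the clicks, the number of distinct pages should be greater than one.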

If you're having trouble, try saving this as a .phj file and importing it: [https://gist.github.com/alex4hoang/c1]. Compare it with your own project to check that you're doing it right.
