In this guide:
- Scrape multiple pages with ParseHub by clicking
Multiple Pages by Clicking
Start by entering this URL into the ParseHub browser and starting the project:
Our goal here is to visit each of the tickers (e.g., TEVA, TSLA, WDC) by clicking through to its page, then get its price and time.
On the main_template, with the first selection active, click at least 2 or 3 tickers until all of the other tickers are highlighted in green as well. This turns the selection into a multi-selection, signaled by the “Begin a new entry” command:
On “Begin a new entry”, click the + symbol to add a Click command. This tells ParseHub to click each element in the selection (highlighted in green):
A pop-up will appear. On this pop-up, click on “Create New Template” and add a template name. I used “ticker_template”. This command says, upon clicking the ticker, go to a template that we’re going to create:
Your new template should now appear right under the main_template, and the page should reflect the first click. In my case, it goes to: https://finance.yahoo.com/quote/TEVA?p=TEVA. If it doesn’t, enter that URL into the browser and open the new template again.
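To make clear what the click is doing, here is a minimal Python sketch of the URL each ticker click should land on. It assumes every ticker page follows the same pattern as the TEVA URL above, which is the only one confirmed in this guide; `ticker_url` is a hypothetical helper, not part of ParseHub.

```python
from urllib.parse import quote, urlencode

# Assumption: all ticker pages share the pattern seen for TEVA.
BASE = "https://finance.yahoo.com/quote/"

def ticker_url(ticker: str) -> str:
    """Build the expected page URL for a ticker symbol (hypothetical helper)."""
    symbol = ticker.strip().upper()
    return f"{BASE}{quote(symbol)}?{urlencode({'p': symbol})}"

# The tickers named in this guide.
for symbol in ["TEVA", "TSLA", "WDC"]:
    print(ticker_url(symbol))
```

If the page shown in the new template does not match the URL you expect for the first ticker, that is a sign the click did not register.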
On the new template, let’s add the command “Begin New Entry” under Page:
Under the Begin New Entry command, add the “Select” command:
With the Select command active, click on the full ticker name and rename the command to “full_name”:
Add another Select command, click on the dollar amount, and rename the command to “price”:
Do the same for the date/time:
I always recommend grabbing the URL, even if it duplicates the parent data (this will become apparent later), so you can cross-check that you’re grabbing from the correct pages. Do so by adding the “Extract” command; its first default option is to extract the URL:
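The cross-check itself can be automated once you have the data. This is a rough sketch, assuming the extracted URLs look like the TEVA example (`/quote/<TICKER>`); `ticker_from_url` and `url_matches` are hypothetical helper names.

```python
from urllib.parse import urlparse

def ticker_from_url(url: str) -> str:
    """Return the last path segment of the URL, e.g. 'TEVA' from '/quote/TEVA'."""
    path = urlparse(url).path  # the query string (?p=...) is not part of the path
    return path.rstrip("/").rsplit("/", 1)[-1].upper()

def url_matches(url: str, expected_ticker: str) -> bool:
    """True if the scraped URL belongs to the ticker we expected to click."""
    return ticker_from_url(url) == expected_ticker.strip().upper()

print(url_matches("https://finance.yahoo.com/quote/TEVA?p=TEVA", "TEVA"))
```

A row whose URL does not match its ticker suggests the price and time were taken from the wrong page.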
Save your project and Test Run:
Check the results on the bottom right during your test run. Make sure there is data from multiple pages:
Here’s what the data looks like after running the crawler and downloading the CSV:
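If you want to confirm from the CSV that the crawl really reached more than one page, a quick check like the following may help. It is only a sketch: the column names (`full_name`, `price`, `url`) are assumptions based on the command names used in this guide, your export's headers may differ, and the two sample rows are made-up placeholders rather than real results.

```python
import csv
import io

def distinct_pages(csv_file, url_column="url"):
    """Collect the set of distinct page URLs found in the CSV export."""
    reader = csv.DictReader(csv_file)
    return {row[url_column] for row in reader if row.get(url_column)}

# Placeholder data standing in for the downloaded CSV (not real results).
sample = io.StringIO(
    "full_name,price,url\n"
    "Example Co A,10.00,https://finance.yahoo.com/quote/TEVA?p=TEVA\n"
    "Example Co B,20.00,https://finance.yahoo.com/quote/TSLA?p=TSLA\n"
)

pages = distinct_pages(sample)
print(f"{len(pages)} distinct pages scraped")
```

For a real export, replace `sample` with `open("your_file.csv", newline="")`, where the filename is whatever you saved the download as.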
If you’re having trouble, try saving this as a .phj file and importing it: [https://gist.github.com/alex4hoang/c1]. Check it against your own to see if you’re doing it right.