In this guide:
- ParseHub desktop navigation
We will go through some basic but powerful features of ParseHub to parse a single page.
Single page crawl
Let’s start by pasting this link [https://www.target.com/c/furniture/-/N-5xtnr] into the address bar of the desktop app. From here on, all actions take place in the desktop app.
We want to grab all of the products on this single page:
Next, click the “New Project” button on the top left, then click “Start Project with this URL”.
The left sidebar persists throughout the process; it is where we’ll build the commands, or steps, we want the computer to perform. We are greeted with:
- main_template: the default template name we start with
- Select page: the app initially selects the entire page, which will be our playground
- Empty selection(0): the starting selection; we begin by selecting an element we want to grab or extract
For our goal of getting all of the products on the page, there are a few routes, but let’s start with this:
- Select the first product_name on the top left. You’ll notice that it turns green; that is the element we’ll extract. You’ll also notice that the other product_names turn yellow, which marks the elements ParseHub thinks are similar to your selection.
- If you select the product_name to the right of the one you initially selected (highlighted in green), you’ll notice that some of the other product_names will be highlighted green and some won’t have highlights.
- Click on the ones that don’t have highlights until all of the product_names are green, aside from the “trending items” at the bottom. You’ll likely just have to click the top right one with the two white chairs.
- Next, let’s select the product_image of each product. We do this by clicking the + button to the right of “Begin new entry in selection1” and selecting “Relative Select”.
- Start with the top left product again: click on the product_name, and then click on the product_image of that product. In this case, we’d select “Futon with arms — Room Essentials(tm)” and then the futon image.
- Do this until all of the product_images are selected with green highlights.
- Let’s repeat this Relative Select for the product_price.
- Notice that the extracted data now appears at the bottom right.
- Alright, we have the commands needed to grab the data we want. Let’s make sure we save; the option is under the hamburger menu.
- Let’s run this project. From the same hamburger menu, select Run. Proceed with “Save and Run”.
- After the job finishes, click on the CSV button to save the .csv file. Success! Your run_results.csv file should look like this.
If you’re having trouble, try saving this as a .phj file and importing it: https://gist.github.com/alex4hoang/14
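Once the CSV is downloaded, a quick sanity check can confirm that every product row has a name, image and price. Here is a minimal Python sketch, assuming the export is a regular comma-separated file with a header row. The helper names `load_results` and `summarize` are made up for this guide, and nothing is assumed about the exact column names ParseHub writes.

```python
import csv
from pathlib import Path


def load_results(csv_path):
    """Read a CSV export into a list of dicts, one dict per row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize(rows):
    """Return (row_count, {column: number_of_empty_cells})."""
    empty_counts = {}
    for row in rows:
        for column, value in row.items():
            # Make sure every column shows up in the summary, even with 0 gaps.
            empty_counts.setdefault(column, 0)
            if not (value or "").strip():
                empty_counts[column] += 1
    return len(rows), empty_counts
```

For example, `summarize(load_results("run_results.csv"))` returns the number of rows plus a per-column count of empty cells. A non-zero count for the image or price column usually means one of the Relative Select steps missed a product.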
- See if you can add additional commands to parse through 5 more pages.
- See if you can use the “starting_value” feature to parse through 5 or more pages.
- Leave a comment if there are other data points you can capture!
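As a hint for the “starting_value” exercise, one possible approach is to hand the project a JSON object that lists the page URLs to visit. The Python sketch below builds such an object under several assumptions that you should verify yourself: that the site paginates with an offset query parameter (called `Nao` here), that each page holds 24 products, and that your project reads a key named `urls`. All three are illustrative guesses, not something ParseHub or Target documents in this guide.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(base_url, pages, page_size=24, offset_param="Nao"):
    """Build one URL per page by setting an offset query parameter.

    offset_param and page_size are assumptions; check them by clicking
    "next page" in a browser and watching how the URL changes.
    """
    parts = urlsplit(base_url)
    urls = []
    for page in range(pages):
        query = dict(parse_qsl(parts.query))
        if page > 0:
            # The first page keeps the original URL; later pages get an offset.
            query[offset_param] = str(page * page_size)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


def build_start_value(base_url, pages):
    """Return a JSON string with a hypothetical "urls" key for the project."""
    return json.dumps({"urls": page_urls(base_url, pages)})
```

Calling `build_start_value("https://www.target.com/c/furniture/-/N-5xtnr", 5)` gives a JSON string you could paste in as the starting value, provided the project is set up to loop over the `urls` list.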
Disclaimer: I am not affiliated with ParseHub, Target or target.com, and this guide is for educational purposes only.