Wednesday, 3 October 2018

Create a data-set using Google images

For the moment, this guide covers Ubuntu 16.04 only, although it may serve as a starting point for other systems and versions.

Pre-requisites

Check the official ChromeDriver page to find out which version of ChromeDriver is the latest. Let's assume the latest version is 2.42 (true at the time of writing, 2018-09-29).
Install `unzip` if you don't have it yet
sudo apt-get install unzip
Install `xvfb` to run `Chrome` in headless mode
sudo apt-get install xvfb
Download and install the latest `ChromeDriver` (if the latest version is no longer 2.42, replace 2.42 in the URL accordingly)
wget -N http://chromedriver.storage.googleapis.com/2.42/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
Install `selenium` and `pyvirtualdisplay` libraries
pip install pyvirtualdisplay selenium
Install `google-images-download`
pip install google-images-download
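Before moving on, it can help to confirm that everything above actually ended up on the machine. The following is a minimal Python sketch, not part of the original setup. The helper name `check_prerequisites` is made up for illustration, and it assumes that the executables are called `unzip`, `Xvfb` and `chromedriver`, and that the pip packages import as `selenium`, `pyvirtualdisplay` and `google_images_download`.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True if it was found."""
    # Assumed executable and module names; adjust if your system differs.
    executables = ["unzip", "Xvfb", "chromedriver"]
    modules = ["selenium", "pyvirtualdisplay", "google_images_download"]
    status = {}
    for name in executables:
        # shutil.which returns None when the executable is not on PATH
        status[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("{:<24} {}".format(name, "ok" if found else "MISSING"))
```

Anything reported as missing points back to the corresponding install step above.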

Download images

Usage examples.
Note: the path to ChromeDriver is given using the `--chromedriver` option.
Let's download 500 images for the keyword `pikachu` and another 500 images for the keyword `charmander`:
googleimagesdownload -k pikachu -l 500 -o datasets/ -e --chromedriver /usr/bin/chromedriver
googleimagesdownload -k charmander -l 500 -o datasets/ -e --chromedriver /usr/bin/chromedriver
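When more than a couple of keywords are needed, the two commands above can be generated in a loop. The sketch below only wraps the same `googleimagesdownload` invocation shown above. The function names `build_command` and `download_all` are hypothetical helpers, and the default values simply repeat the ones used in this guide.

```python
import subprocess


def build_command(keyword, limit=500, output_dir="datasets/",
                  chromedriver="/usr/bin/chromedriver"):
    """Build the argument list for one googleimagesdownload call."""
    return [
        "googleimagesdownload",
        "-k", keyword,
        "-l", str(limit),
        "-o", output_dir,
        "-e",
        "--chromedriver", chromedriver,
    ]


def download_all(keywords, **options):
    """Run one download per keyword, stopping at the first failure."""
    for keyword in keywords:
        # check=True raises CalledProcessError if the command exits non-zero
        subprocess.run(build_command(keyword, **options), check=True)
```

Calling `download_all(["pikachu", "charmander"])` would then run the same two commands as above, one after the other.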
Produced directory structure:
├── datasets/charmander/1. 250px-004charmander.png
├── datasets/charmander/2. charmander-pokemon-onesie-pak-kostuum2.jpg
├── datasets/charmander/3. charmander_single_front_e55283a2-abd7-4d21-a38c-7ef7b7e053d8.png
├── datasets/charmander/4. ba8562e15a96fb034776becd3b25d895.png
├── datasets/charmander/5. 713l97x5d9l._sx466_.jpg
├── datasets/charmander/6. main-qimg-86d0a95219674e71acd12b8a40426021.jpg
├── datasets/charmander/7. latest?cb=20160809154953.jpg
├── datasets/charmander/8. latest?cb=20140627122905.jpg
├── datasets/charmander/9. draw-charmander-step-22.jpg
├── datasets/charmander/10. 61bto1ihovl._sx425_.jpg
├── datasets/charmander/11. pokemon-charmander-face-collector-print-1.23.jpg
├── datasets/charmander/12. e70ed0b54f9230c56c2e3bd2958d68a4.jpg
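A quick sanity check after downloading is to count how many files landed in each keyword directory, since the number may differ from the requested limit if some downloads fail. This is a small sketch that assumes the `datasets/<keyword>/<files>` layout shown above; `count_images` is a made-up helper name.

```python
import os
from collections import Counter


def count_images(dataset_root):
    """Map each keyword directory to a Counter of file extensions."""
    summary = {}
    for keyword in sorted(os.listdir(dataset_root)):
        keyword_dir = os.path.join(dataset_root, keyword)
        if not os.path.isdir(keyword_dir):
            continue  # skip stray files next to the keyword directories
        extensions = Counter()
        for filename in os.listdir(keyword_dir):
            # splitext keeps the leading dot, e.g. ".png"; lower-case it
            extension = os.path.splitext(filename)[1].lower()
            extensions[extension] += 1
        summary[keyword] = extensions
    return summary
```

For example, `sum(count_images("datasets")["charmander"].values())` gives the total number of files saved for `charmander`.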
Note: along with the image files saved locally, the `-e` option saves image metadata in JSON format in the `logs` directory (relative to the current path).
Example metadata:

[
    {
        "image_description": "Charmander (Pok\u00c3\u00a9mon) - Bulbapedia, the community-driven Pok\u00c3\u00a9mon ...",
        "image_filename": "1. 250px-004charmander.png",
        "image_format": "png",
        "image_height": 250,
        "image_host": "bulbapedia.bulbagarden.net",
        "image_link": "https://cdn.bulbagarden.net/upload/thumb/7/73/004Charmander.png/250px-004Charmander.png",
        "image_source": "https://bulbapedia.bulbagarden.net/wiki/Charmander_(Pok%C3%A9mon)",
        "image_thumbnail_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQNAbOgVn5VTxFl1XVCSwuFb1EwV9GCx-Q4RIOA_MZrN4aZn8jUzg",
        "image_width": 250
    },
    {
        "image_description": "Charmander Pok\u00c3\u00a9mon onesie pak kostuum kopen? Bij FeestinjeBeest.nl!",
        "image_filename": "2. charmander-pokemon-onesie-pak-kostuum2.jpg",
        "image_format": "jpg",
        "image_height": 1148,
        "image_host": "feestinjebeest.nl",
        "image_link": "https://feestinjebeest.nl/wp-content/uploads/Charmander-Pokemon-onesie-pak-kostuum2.jpg",
        "image_source": "https://feestinjebeest.nl/product/charmander-pokemon-onesie-pak-kostuum/",
        "image_thumbnail_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSjs98SgDd9ugBmFKbzCWXBKljQYw3lBubISYMl3cpb6Bz0ltOZfQ",
        "image_width": 1079
    }
]
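The descriptions in the example above contain `Pok\u00c3\u00a9mon`, which looks like UTF-8 text that was decoded as Latin-1 somewhere along the way. The sketch below loads such a metadata file and attempts to undo that. This is an assumption about the cause, not documented behavior of `google-images-download`, and `fix_mojibake` and `load_metadata` are hypothetical helper names. The path to the JSON file is left as a parameter because the file name inside `logs` is not given here.

```python
import json


def fix_mojibake(text):
    """Undo UTF-8-read-as-Latin-1 damage; return text unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not damaged in the assumed way, so leave it alone
        return text


def load_metadata(path):
    """Load a metadata JSON file and repair the image descriptions."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    for record in records:
        record["image_description"] = fix_mojibake(record["image_description"])
    return records
```

The other fields, such as `image_filename`, `image_width` and `image_height`, are returned untouched and can be used directly, for instance to filter out images that are too small.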
