Scraping Websites with X-ray, AWS Lambda and Serverless

For my BrickCompare project, I decided to build a microservice to handle scraping websites for the pricing data I needed. AWS Lambda, from Amazon Web Services, was the perfect service for the task.

Scraping Websites with X-ray

I had already decided to run my microservice on Node.js, since I was familiar with it. Next, I had to select a module that could get me started with website scraping. Initially I chose noodlejs because it looked easy to use and had decent documentation. But after writing 10 or so scrapers for different websites, I found that it was rather buggy and did not return consistent results.


Using Ansible on Google Cloud Shell [How To]

Google Cloud Shell on Google Cloud Platform

On the Google Cloud Platform, Google provides Google Cloud Shell for free to make it easy to manage your services. Whenever you launch the shell, the platform creates a new VM instance on the backend to drive it. This instance is terminated after more than 30 minutes of inactivity, so persistence between sessions is limited. However, the 5GB of storage in your $HOME (~) directory does persist between sessions.

Using Ansible on Google Cloud Shell

Google Cloud Shell is a great place to run your Ansible playbooks, since you can clone your git repositories into the persistent home directory. However, if you install Ansible through the package manager (apt-get in this case), you lose the install every time the session terminates.


Google Cloud Shell and Private GitHub Repositories [How To]

Google Cloud Shell on Google Cloud Platform

On the Google Cloud Platform, Google provides Google Cloud Shell for free to make it easy to manage your services. Whenever you launch the shell, the platform creates a new VM instance on the backend to drive it. This instance is terminated after more than 30 minutes of inactivity, so persistence between sessions is limited. However, the 5GB of storage in your $HOME (~) directory does persist between sessions.

Using Private GitHub Repositories

To use private GitHub repositories, you first need to generate a new SSH key (or use an existing one) and add it to your GitHub account.
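A minimal sketch of the key-generation step in Cloud Shell, assuming an RSA key at ~/.ssh/id_rsa (the path used in the SSH config below) and a placeholder comment label, "cloud-shell-github". The -N "" flag creates the key without a passphrase. That is a convenience-over-security choice: without a persistent SSH agent, a passphrase would be requested on every git operation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ~/.ssh lives in $HOME, so a key generated here survives Cloud Shell restarts.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate a key only if none exists at this path, to avoid overwriting an existing one.
# -N "" sets an empty passphrase; -C adds a placeholder comment to the public key.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
  ssh-keygen -t rsa -b 4096 -C "cloud-shell-github" -f "$HOME/.ssh/id_rsa" -N ""
fi

# Print the public half; paste it into GitHub under Settings > SSH and GPG keys.
cat "$HOME/.ssh/id_rsa.pub"
```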

Because of Google Cloud Shell's limited persistence, a key added to the SSH agent will not carry over between sessions. To get around this, create an SSH config file at ~/.ssh/config and add the following lines to it:

Host github.com
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa

Replace ~/.ssh/id_rsa with the path to the SSH key you want to use with GitHub. From then on, whenever you use git to interact with your GitHub repositories, that key is used automatically. And because the config is stored in your home directory, it persists across all your Google Cloud Shell sessions.
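The config step can also be scripted. This is a sketch, not necessarily the author's approach: it appends the host block only if ~/.ssh/config does not already contain one, restricts the file's permissions (ssh refuses to use a config file that other users can write to), and then uses ssh -G to print the settings ssh would resolve for github.com without opening a connection. It assumes the key sits at ~/.ssh/id_rsa.

```shell
#!/usr/bin/env bash
set -euo pipefail

CONFIG="$HOME/.ssh/config"
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Append the GitHub host block only if the file doesn't already define one.
if ! grep -qs '^Host github.com$' "$CONFIG"; then
  cat >> "$CONFIG" <<'EOF'
Host github.com
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa
EOF
fi

# Make the config readable and writable by the owner only.
chmod 600 "$CONFIG"

# Show the hostname, user and identity file ssh resolves for github.com (no network access).
ssh -G github.com | grep -E '^(hostname|user|identityfile) '
```

Once the public key has been added to GitHub, running ssh -T git@github.com is a quick way to confirm that authentication works; GitHub should greet you by username and then close the connection, since it does not provide shell access.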