Migrating the Blog - Part 3

This is part of a series of posts about how I completed the migration of my blog from Blogger to a self-hosted solution based on AWS S3.

  • Part 1 - Decide on where to host the new blog and which platform I would use
  • Part 2 - Export all the content out of Blogger and new blog design
  • Part 3 - Import all the content into the new blog (This post)
  • Part 4 - Fix up all the content issues
  • Part 5 - Redirect all old content to the new site

If you remember from Part 2, the result of the export was an XML file with all my content in it.

The tool that I settled on for the conversion was blog2md. After trying the first two tools that were provided by the Hugo community, I found that this one suited my needs best.

Here are the steps I went through to install the package and convert the content:

  1. Install Node.js on my Mac (you can follow the instructions for the platform of your choice)
  2. Clone blog2md to my working directory
    git clone https://github.com/palaniraja/blog2md.git
  3. cd into the cloned blog2md directory
  4. Install all the required dependencies
    npm install
  5. Move the XML export of my blog to the current directory. For example:
    mv ../04/blog-08-16-2019.xml .
  6. Run blog2md to convert the content
    node index.js b blog-08-16-2019.xml blog

The b argument marks the input as a Blogger export, and blog is the output folder. The command did the following:

  1. Created a folder under the current working directory named blog.
  2. Went through the whole XML file, post by post, converted each post to Markdown format, and placed all the files in the blog folder.
  3. If a post had comments, it created another Markdown file with the same name as the post, with -comments appended to it.
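
If you want to check the result on your own export, here is a minimal sketch that counts the posts and the comment files in the output folder. It assumes the comment files end in -comments.md, and the function name count_generated is made up for this example.

```shell
# Hypothetical helper: summarize what the conversion wrote to a folder.
# Assumption: comment files are named <post>-comments.md.
count_generated() {
    dir="$1"
    # tr strips the padding that wc adds on some systems (macOS, for one).
    total=$(find "$dir" -type f -name '*.md' | wc -l | tr -d ' ')
    comments=$(find "$dir" -type f -name '*-comments.md' | wc -l | tr -d ' ')
    echo "posts: $((total - comments))"
    echo "comment files: $comments"
}
```

Running count_generated blog from the blog2md directory would print the two counts for the blog folder.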


Some Caveats

There are a few things that I would like to point out, things that bit me in the butt and that I had to work around.

  1. The conversion tool does not like draft blog posts. I did not have the time or the energy to dive into the code to find out why, so I removed the draft posts I had in Blogger, which solved the issue. If you have your whole life story and autobiography saved as draft posts, this is probably not a good solution for you.
  2. I originally ran this process on an instance in the cloud, a really small one, similar to a t2.nano. Since this was a really big file and the conversion process is highly CPU intensive, it kept on bombing out. Until I finally understood that the instance did not have enough CPU resources to perform the conversion (the devil is in the little things), it was something that I battled with for no real reason.
  3. Since I had moved the comments on my blog to Disqus a really long time ago, and had disabled the Blogger comment system when I did so, I had no interest in the -comments files that were generated, so I deleted them.
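
A minimal sketch of that last cleanup step, if you would rather not delete the files by hand. It assumes the generated comment files end in -comments.md, and the function name remove_comment_files is made up for this example.

```shell
# Hypothetical helper: delete the comment files the conversion generated.
# Assumption: comment files are named <post>-comments.md.
remove_comment_files() {
    dir="$1"
    # -print lists each matching file, then rm removes the whole batch.
    find "$dir" -type f -name '*-comments.md' -print -exec rm -f {} +
}
```

Calling remove_comment_files blog would list and remove every matching file under the blog folder and leave the posts themselves alone. Run it on a copy first if you are not sure about the naming.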

I have completely disabled comments on my blog for the time being, since 99.99% of them were spam anyway.

Next up is Part 4, the part that took by far the most of my time: fixing up the mess of content I had accumulated over almost 12 years of blogging, a.k.a. my technical debt.