Migrating the Blog - Part 3
This is part of a series of posts about how I completed the migration of my blog from Blogger to a self-hosted solution based on AWS S3.
- Part 1 - Decide on where to host the new blog and which platform I would use
- Part 2 - Export all the content out of Blogger and new blog design
- Part 3 - Import all the content into the new blog (This post)
- Part 4 - Fix up all the content issues
- Part 5 - Redirect all old content to the new site
If you remember from Part 2, the result of the export was an XML file with all my content in it.
The tool I settled on for the conversion was blog2md. I first tried two other tools that were provided by the Hugo community, but this was the one that suited my needs best.
Here are the steps I went through to install the package and run the conversion:
- Install Node.js on my Mac (you can follow the instructions for the platform of your choice)
- Clone blog2md to my working directory
git clone https://github.com/palaniraja/blog2md.git
- cd to the relevant directory
- Install all the required dependencies.
npm install
- Move the XML export of my blog into the current directory
For example:
mv ../04/blog-08-16-2019.xml .
- Run blog2md to convert the content
node index.js b blog-08-16-2019.xml blog
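The conversion step above can be wrapped in a small shell function that fails early when something obvious is missing. This is a minimal sketch, not part of blog2md: the name `convert_export` is hypothetical, and it assumes you run it from inside the blog2md clone, where `index.js` lives.

```shell
# Sketch only: convert_export is a hypothetical wrapper, not part of blog2md.
# Usage: convert_export <blogger-export.xml> <output-dir>
convert_export() {
  # Fail early if the export file is missing.
  if [ ! -f "$1" ]; then
    echo "export file not found: $1" >&2
    return 1
  fi
  # blog2md is started through its index.js, so we expect to be in the clone.
  if [ ! -f index.js ]; then
    echo "index.js not found; cd into the blog2md clone first" >&2
    return 1
  fi
  # Same invocation as the command above; "b" selects the Blogger format.
  node index.js b "$1" "$2"
}
```

Called as `convert_export blog-08-16-2019.xml blog`, it runs the same conversion as the command above.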
The conversion command did the following:
- Created a folder named `blog` under the current working directory.
- Went through the whole XML file, post by post, converted each post to Markdown format, and placed the files in the `blog` folder.
- If a post had comments, it created another Markdown file with the same name as the post, with `-comments` appended to it.
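A quick way to sanity-check the output folder is to count the posts and the comment files separately. The helpers below are a sketch with hypothetical names, and they assume the generated files use the `.md` extension, so that a comment file ends in `-comments.md`.

```shell
# Hypothetical helpers; they assume comment files end in "-comments.md".

# Count converted posts (every .md file that is not a comment file).
count_posts() {
  find "$1" -type f -name '*.md' ! -name '*-comments.md' | wc -l | tr -d ' '
}

# Count the companion comment files.
count_comment_files() {
  find "$1" -type f -name '*-comments.md' | wc -l | tr -d ' '
}
```

For the run above, `count_posts blog` and `count_comment_files blog` print the two totals, which can be compared against the number of posts Blogger reports.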
Some Caveats
There are a few things I would like to point out that bit me in the butt and that I had to work around.
- The conversion tools do not like draft blog posts. I did not have the time or the energy to dive into the code to find out why, so I solved it by removing the draft posts I had in Blogger. If you have your whole life story and autobiography saved as draft posts, this is probably not a good solution for you.
- I originally ran this process on a really small instance in the cloud, similar to a t2.nano. Since this was a really big file and the conversion process is highly CPU intensive, it kept bombing out. Until I finally understood that the instance did not have enough CPU resources for the conversion (the devil is in the little things), I battled with it for no real reason.
- Since I had moved the comments on my blog to Disqus a really long time ago, and had disabled the Blogger comment system when I did so, I had no interest in the `-comments` files that were generated, so I deleted them.
I have completely disabled comments on my blog for the time being, since 99.99% of them were spam anyway.
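Two of the caveats above lend themselves to a small script. The sketch below uses hypothetical helper names and rests on two assumptions that are not confirmed here: that the Blogger export marks a draft with an `<app:draft>yes</app:draft>` element, and that the generated comment files end in `-comments.md`.

```shell
# Hypothetical helpers for the caveats above.

# Count draft entries in the export before converting.
# Assumption: drafts appear as <app:draft>yes</app:draft> in the XML.
count_drafts() {
  # grep -o prints one line per match, even if the export is a single line.
  { grep -o '<app:draft>yes</app:draft>' "$1" || true; } | wc -l | tr -d ' '
}

# Delete the generated comment files, printing each path as it goes.
# Assumption: comment files end in "-comments.md".
remove_comment_files() {
  find "$1" -type f -name '*-comments.md' -print -delete
}
```

A non-zero result from `count_drafts blog-08-16-2019.xml` means there are still drafts to deal with in Blogger before exporting again, and `remove_comment_files blog` cleans up the output folder after the conversion.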
Next up in Part 4: the part that took by far the most of my time, fixing up the mess of content I had accumulated over almost 12 years of blogging, a.k.a. my technical debt.