Use s3stat To Troubleshoot Your Migration from Wordpress To S3

Introduction

Last month, I followed the example of Full Stack Python and migrated this blog from Wordpress to Amazon Web Services (AWS) Simple Storage Service (S3). The S3 hosting approach gives me the following features:

I configured CloudWatch to dump the logs to a separate, dedicated log bucket. CloudWatch dumps the logs in a raw format, so I need a separate architecture to parse and analyze them.

Trade

In general, I need a service to ingest the logs, a service to parse and transform the logs (i.e., create actionable key/value pairs), a service to store the key/value pairs, and finally a Graphical User Interface (GUI) to view the logs.
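As a rough illustration of the parse/transform step, the sketch below turns tab-separated log lines into key/value pairs. It assumes the log file carries a "#Fields:" header that names the columns, as CloudFront standard logs do; the function name log_to_kv and the sample line are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print each log record as space-separated key=value pairs.
# Assumes a "#Fields:" header line that names the tab-separated columns.
log_to_kv() {
  awk -F'\t' '
    /^#Fields:/ {
      # Column names follow the "#Fields:" token, separated by spaces.
      n = split($0, parts, " ")
      for (i = 2; i <= n; i++) name[i - 1] = parts[i]
      next
    }
    /^#/ { next }   # skip any other comment lines, e.g. "#Version:"
    {
      line = ""
      for (i = 1; i <= NF; i++) line = line (i > 1 ? " " : "") name[i] "=" $i
      print line
    }
  '
}

# Made-up sample data for illustration only.
printf '#Version: 1.0\n#Fields: date time sc-status cs-uri-stem\n2017-06-01\t12:00:00\t404\t/2017/03/13/foo/\n' | log_to_kv
```

A real pipeline would still need somewhere to store and display these pairs, which is the part a hosted service takes off your hands.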

Amazon does not provide a turnkey solution for this user story, so I faced two high-level approaches:

NOTE: If you represent any of these companies and would like to update the bullets above, feel free to fork, edit and create a pull request for this blog post.

S3STAT

I decided to try s3stat because their cheap bastard plan amuses me. From their website:

How It Works

  1. Sign up for a Free Trial and try out the product (making sure you actually want to use it)

  2. Blog about S3STAT, explaining to the world how awesome the product is, and how generous we are being to give it to a deadbeat like yourself for free.

  3. Send us an email showing us where to find that blog post.

  4. Get Hooked Up with a free, unlimited license for S3STAT.

Test Drive

It took about thirty seconds to connect s3stat to my CloudFront S3 logs bucket. s3stat provides both a wizard and a web app to help you get started. Once I logged in, I saw widgets for Daily Traffic, Top Files, Total Traffic, Daily Average and Daily Unique. s3stat also shows the costs charged to your AWS account.

Splash Page

Troubleshooting

I clicked the other menu items and noticed that my new S3 hosted website threw a lot of error codes.

Error Codes

I noticed that people still clicked links from my old web page. I could tell because when I migrated from Wordpress to S3, I took the dates out of the URLs. If a user bookmarked the Wordpress style link (which includes the date), they receive a 404 when they attempt to retrieve it. I highlighted the stale URLs in red below.

Error Pages
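The same check can be approximated on the command line against the raw logs. The snippet below is a minimal sketch, not a description of how s3stat works: it assumes tab-separated logs with a "#Fields:" header that includes sc-status and cs-uri-stem columns (the CloudFront standard log names), and count_404s is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: count 404 responses per requested path.
# Assumes a "#Fields:" header naming the tab-separated columns.
count_404s() {
  awk -F'\t' '
    /^#Fields:/ {
      # Remember the position of each named column.
      n = split($0, parts, " ")
      for (i = 2; i <= n; i++) col[parts[i]] = i - 1
      next
    }
    /^#/ { next }
    $(col["sc-status"]) == 404 { hits[$(col["cs-uri-stem"])]++ }
    END { for (u in hits) print hits[u], u }
  ' | sort -k1,1nr -k2   # most frequent first, then by path
}

# Made-up sample data for illustration only.
printf '#Fields: sc-status cs-uri-stem\n404\t/2017/03/13/old-post/\n200\t/new-post.html\n404\t/2017/03/13/old-post/\n' | count_404s
```

Paths that start with a year, as in the sample, are the stale Wordpress style links.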

When I moved from Wordpress to S3, I submitted a URL map to the Disqus migration tool so that my comments would follow my site's new URL approach.

I present a snippet of my URL map below. This map removes the date from the URL and sets the protocol to HTTPS.

http://freshlex.com/2017/03/13/pass-bootstrap-html-attributes-to-flask-wtforms/, https://www.freshlex.com/pass-bootstrap-html-attributes-to-flask-wtforms.html
http://freshlex.com/2017/04/06/add-timestamp-to-your-python-elasticsearch-dsl-model/, https://www.freshlex.com/add-timestamp-to-your-python-elasticsearch-dsl-model.html
http://freshlex.com/2017/04/29/connect_aws_lambda_to_elasticsearch/, https://www.freshlex.com/connect_aws_lambda_to_elasticsearch.html
http://freshlex.com/2017/05/27/install-rabbitmq-and-minimal-erlang-on-amazon-linux/, https://www.freshlex.com/install-rabbitmq-and-minimal-erlang-on-amazon-linux.html
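The mapping rule is mechanical, so map lines like these could be generated rather than typed by hand. The sketch below expresses the rule with sed: drop the /YYYY/MM/DD/ date, switch to https://www.freshlex.com, and replace the trailing slash with .html. The helper name old_to_new is hypothetical, and this is only one possible way to build such a map.

```shell
#!/bin/bash
# Hypothetical helper: derive the new URL from an old Wordpress style URL.
old_to_new() {
  printf '%s\n' "$1" | sed -E \
    's#^http://freshlex\.com/[0-9]{4}/[0-9]{2}/[0-9]{2}/([^/]+)/?$#https://www.freshlex.com/\1.html#'
}

# Print one "old, new" line in the same shape as the URL map.
old="http://freshlex.com/2017/04/29/connect_aws_lambda_to_elasticsearch/"
echo "$old, $(old_to_new "$old")"
```

A URL that does not match the old pattern passes through unchanged, which makes mistakes easy to spot in the output.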

I fix the dead link issue by uploading a copy of the current web page to a file location on S3 that matches the old Wordpress style. I wrote a script that does this. I simply feed the contents of my URL map to the script, and the script creates the necessary directory structure to ensure the old Wordpress style links work (for those who bookmarked my old URL).

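#!/bin/bash
# For each "old, new" pair in url_map.csv, create the old Wordpress style
# directory and link its index.html to the new page in the site root.
while read -r OLD NEW
do
  DIR=$(echo "$OLD" | cut -f4-7 -d'/')   # e.g. 2017/03/13/post-name
  FILE=$(echo "$NEW" | cut -f4 -d'/')    # e.g. post-name.html
  mkdir -p "$DIR"
  # The link target is relative to DIR: four levels up is the site root.
  # -f replaces an existing link so the script can be re-run.
  ln -sf "../../../../$FILE" "$DIR/index.html"
done < url_map.csv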

I upload the new files and directories to S3 and the old URLs now work. I still want to encourage users to use the new links, however, so I update robots.txt to exclude the old style URLs. Search engines that honor robots.txt will therefore not crawl the old Wordpress style links.

User-agent: *
Disallow: /2016/
Disallow: /2017/
Sitemap: https://www.freshlex.com/freshlex_sitemap.xml
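The Disallow lines need one entry per year directory that appears in the old URLs. As a small sketch, they could be derived from the URL map itself instead of being written by hand. The helper name disallow_lines and the two sample posts are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read "old, new" map lines on stdin and print one
# Disallow rule per distinct year directory found in the old URLs.
disallow_lines() {
  # Splitting "http://host/YYYY/..." on "/" puts the year in field 4.
  cut -d'/' -f4 | sort -u | sed 's#.*#Disallow: /&/#'
}

# Made-up map lines for illustration only.
printf '%s\n' \
  'http://freshlex.com/2016/12/01/example-post/, https://www.freshlex.com/example-post.html' \
  'http://freshlex.com/2017/03/13/another-post/, https://www.freshlex.com/another-post.html' \
  | disallow_lines
```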

I use s3stat to sanity check the error pages and notice that one error returns a weird URL.

Error Pages

I attempt to click a stale link (that follows the Wordpress approach) and of course get a hideous error.

Bad Link

After some investigation, I notice that I did not include this weird URL in my URL map. It turns out that a user who visits my site through the new links will see an 'also on Freshlex' callbox from Disqus that points to the old URL.

Bad Link

Thanks to s3stat, I identified the root cause of the issue. I go back to Disqus and add the weird URL to the migration tool.

Bad Link

After the migration tool works its magic, the 'also on' box now points to the correct, new URL.

Bad Link

Conclusion

Thanks again to s3stat for providing an excellent product, as well as hooking me up with a free lifetime subscription thanks to the cheap bastard plan!
