For those who don’t know, my blog runs on WordPress – an open-source platform to create websites. It’s estimated that 43% of the Internet runs on WordPress. Every couple of years my blog needs to jump over some technical hurdle. In 2008, I chose to make the move from Serendipity over to WordPress because WordPress enjoyed significantly more market share and support. I was in this for the long haul. In 2015, I migrated from a self-hosted WordPress instance to the managed WordPress.com cloud as I wanted to focus more on blogging and less on server administration and security.
Now, in 2023, I’m fully embracing Gutenberg – the modern way to structure content inside of WordPress, introduced a few years ago. Rather than large islands of HTML, Gutenberg organizes content into blocks that fit together on a page. In many ways, it’s like building content with Legos that snap together in known ways rather than a mess of Scrabble tiles all over the floor.
Within WordPress, there are two key components to consider – the theme and the content. The theme is what the blog looks like. Think of the design of the blog: menus, headers, letters, and overall layout. I’ve had several themes over the years. The content is the actual text of my posts and photos.
In the old days, authors created content inside of WordPress using HTML. The content was mostly unstructured – but if it rendered in the browser, I was good. Originally, Gutenberg was only available for the content of pages and posts. Because the content was authored with Gutenberg while the site theme still used the legacy model, I always found weird interactions between the content and the theme. Again, I was struggling more with site management than with content creation.
More recently, site owners could create Full Site Editing themes with Gutenberg – giving full creative freedom across the site. I recently updated to the Twenty Twenty-Three theme – a default theme that comes with WordPress and supports full site editing with Gutenberg. I had delayed migrating the 450 pieces of content written over nearly 20 years – until now.
I wanted all of the content on my site to work the same way. That meant finding and migrating all of the older posts into the new format. I would need to:
- Find each post using the classic format
- Migrate each post to the Gutenberg block format
- Compare the new version of each post to the old version and look for deficiencies
- Correct and republish any post that needed changes
Obviously, I wanted to automate as much of this process as I could, as manually validating each post proved tedious. I had tried in the past using the Selenium framework with limited success. I recently stumbled on an approach that makes this process significantly easier: migrating all of the content took less than a week.
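For the first item in that list, it helps to know what separates a classic post from a Gutenberg one. Gutenberg stores each block between HTML comment delimiters such as `<!-- wp:paragraph -->`, while classic content has none. The snippet below is only a minimal sketch of that idea, not how any plugin actually works. It assumes the raw post content is already available as plain strings, and the function names are made up for illustration.

```python
import re

# Gutenberg wraps each block in comment delimiters, for example
# <!-- wp:paragraph --> ... <!-- /wp:paragraph -->.
# Classic-editor content contains no such delimiters.
BLOCK_DELIMITER = re.compile(r"<!--\s+wp:[a-z]")


def uses_blocks(post_content: str) -> bool:
    """Return True if the raw content contains at least one block delimiter."""
    return BLOCK_DELIMITER.search(post_content) is not None


def find_classic_posts(posts: dict[str, str]) -> list[str]:
    """Return the titles of posts with no block delimiters.

    `posts` maps a title to raw post content (a hypothetical input shape).
    """
    return [title for title, content in posts.items() if not uses_blocks(content)]
```

A post that mixes classic HTML with a few blocks would count as "block" content here, so treat the result as a first pass rather than a verdict.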
Note: This approach requires self-hosted WordPress or a plan on WordPress.com which runs custom plugins from the marketplace. As of this writing, that includes Business, Commerce, Enterprise, or the legacy Pro plan.
1. Export all URLs
The first order of business is to export all of the URLs within my WordPress instance so I know which posts could possibly change. The Export All URLs plugin makes this easy.
I exported all of my posts, pages, and the homepage into a text file with each URL on its own line.
urls.txt
-----
https://dashedyellowline.com/2023/08/07/trainwreck-turned-trainwreck
https://dashedyellowline.com/2023/08/03/entering-canada
https://dashedyellowline.com/2023/09/04/parrotheads-motorcycles
and so on....
2. Screenshot each piece of content
I wanted a way to validate large-scale changes on my site easily. As mentioned above, I tried a few methods from my QA past, but nothing seemed easy or reproducible in this context. I give huge kudos to Ryan over on SoothSawyer for his post on how to take full page screenshots from a list of URLs. This was the missing piece that I had been searching for. Thank you, thank you, thank you, Ryan!
Running Ryan’s code against the list of URLs above creates a screenshot for each piece of content inside of my entire WordPress installation. Today that’s about 450 pieces of content. Total running time was about 30 minutes to generate all the screenshots.
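The later comparison only works if every URL maps to the same screenshot file name on every run. Ryan’s script is not reproduced here. The snippet below is just a rough sketch of that naming idea, and the functions and the naming scheme are my own assumptions, not what his code does.

```python
from urllib.parse import urlparse


def load_urls(text: str) -> list[str]:
    """Keep only the lines that look like URLs; skip blanks and notes."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith(("http://", "https://"))
    ]


def screenshot_name(url: str) -> str:
    """Derive a stable file name from the URL path (hypothetical scheme)."""
    path = urlparse(url).path.strip("/")
    # The homepage has an empty path, so give it a fixed name.
    slug = path.replace("/", "-") or "home"
    return f"{slug}.png"
```

Because the name depends only on the URL, a second run after the conversion produces files that line up one-to-one with the first run.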
3. Update all pages and posts to Gutenberg
Again, I didn’t want to do this one manually so another WordPress plugin came to the rescue:
The Bulk Block Converter finds all of the posts and pages within a WordPress installation that use the classic editor and executes, in bulk, the same migration that users can do page by page through the UI. It saves time, but I can’t validate the conversion of each post as it runs. I will have to do that after the fact. Some larger or higher-profile sites may not be able to tolerate that level of configuration limbo – but for me it will work just fine.
4. Rerun step two
Now that all of the posts and pages have been converted, it’s time to re-screenshot every piece of content on the site. Since the names and URLs have not changed, the urls.txt file still works. However, create a new copy of Ryan’s scripts in a new directory. I can’t emphasize this enough. In step five, we will compare the images generated in step four with the images generated in step two.
5. Visually compare new to old
I was looking for an application that would allow me to see the new post or page alongside the old one. Since the file names are the same, I was hoping to find a diff tool. CompareMerge2 fit the bill. I simply had to compare the old directory with the new directory.
Now, simply click on the first file to see the old blog post/page next to the new post/page and scan for visual differences. In the above case, the new blog is on the right. It’s missing the tiled gallery and the top two pictures are incorrectly sized. Alternatively, if they look the same (or close enough), click the next button. If they look different, load that post/page in the editor and make the necessary changes so that post/page renders correctly as in the example above.
If your monitor supports vertical mode, validating posts runs significantly faster.
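I did the comparison by eye, but a small script could shrink the pile before the manual pass. The sketch below is an assumption on my part, not part of the workflow above. It hashes each screenshot in the old and new directories and sorts the file names into identical, changed, and missing. The directory layout and the `.png` extension are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def triage(old_dir: Path, new_dir: Path) -> dict[str, list[str]]:
    """Sort screenshots by name into identical, changed, missing, and added."""
    old_names = {p.name for p in old_dir.glob("*.png")}
    new_names = {p.name for p in new_dir.glob("*.png")}
    report = {
        "identical": [],
        "changed": [],
        "missing": [],
        # Files that exist only in the new directory.
        "added": sorted(new_names - old_names),
    }
    for name in sorted(old_names):
        if name not in new_names:
            report["missing"].append(name)
        elif file_digest(old_dir / name) == file_digest(new_dir / name):
            report["identical"].append(name)
        else:
            report["changed"].append(name)
    return report
```

Byte-identical files are safe to skip. Files that land in "changed" may still look the same on screen, since any rendering difference changes the bytes, so they still need a human look.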
In my migration, I found that most of the content migrated correctly. Tiled galleries disappeared in every blog post and had to be manually recreated. If I hadn’t visually compared each post, I might not have caught that as early in the process as I did. Also, some images were mangled inside of headings. These were also easy to spot with the visual compare tools.
Tip: Modern blog posts embrace more features and “look cooler” than legacy content. I’m a better writer and photographer now, and the WordPress platform has more features. I didn’t enrich any of the legacy content. That would have taken forever or been impossible. I simply made it look the same as it did when it was published.
It was also great to flip through all of the adventures over the past 20 years. While I smiled more fully when I didn’t have to do migration work, only about a quarter of the posts needed a manual touch – most of which involved recreating tiled galleries.
Now that I am on the WordPress.com cloud, I hope that I’m shielded from future migration efforts. However, I know that this is just a wish. It’s not a matter of if the next migration comes – it’s a matter of when 🙂.