An import script and set of routes to help migrate from Wordpress with minimal hassle.
The import script follows redirects, builds a set of compatibility paths for each page, and downloads attachments. It expects everything to be on the same domain, and cross-domain redirects will confuse it.
- /wp-content/* -> /attachments/wp-content/*
- /feed/ -> /articles.xml
- /comments/feed/ -> http://#{short_name}.disqus.com/latest.rss
-
Make sure you have a new or existing Nesta site, with all new and modified files committed to Git. We'll call this 'your site' from now on. And we're assuming all commands specified here are executed from the root of your site. We're also expecting all files in UTF-8 form.
-
Edit your gemfile to use my branch of Nesta and the following 4 plugins. Strictly speaking, you can omit 'nesta-plugin-tags' if you don't want tag functionality.
gem 'nesta', :git => "git://github.com/nathanaeljones/nesta.git" gem 'nesta-plugin-aliases', :git => "git://github.com/nathanaeljones/nesta-plugin-aliases.git" gem 'nesta-plugin-simplicity', :git => "git://github.com/nathanaeljones/nesta-plugin-simplicity.git" gem 'nesta-plugin-wordpress', :git => "git://github.com/nathanaeljones/nesta-plugin-wordpress.git" gem 'nesta-plugin-tags', :git => "git://github.com/nathanaeljones/nesta-plugin-tags.git"
-
Run
bundle install
-
Make sure your Wordpress site is still running and available at its original location. It needs to be online for the redirect tracker and attachment downloader to work.
-
Go to Tools-> Export on your Wordpress site. Choose "All content", and save the exported file as "wordpress.xml" in the root of your new site.
-
In your Disqus account, make a site matching your current domain. Make sure the Website URL matches the primary website URL of your site.
-
In Disqus, go to Tools>Import and import the wordpress.xml file. Wait till it's finished.
-
In Disqus, go to Tools>Migrate Threads> Redirect Crawler. Start the redirect crawler; this should consolidate all your comments and associate them with the 'primary' version of each article. Remember: You'll need to do this a second time after you've pointed the domain to the new Nesta server.
- Page and Post content will be converted from HTML to XHTML. The conversion is done with Hpricot followed by Nokogiri. Hpricot has the best element closure detection, Nokogiri enforces proper entity encoding. This mimics browser 'autocorrection' very well, and shouldn't corrupt your content.
- The title is placed at the beginning of the content between <h1></h1> tags.
- Metadata pairs with null or empty values are discarded.
- Metadata pairs with keys starting in "_" are discarded. (With the exception of _wp_page_template, which becomes wp_template in the page header)
- User-defined fields are added to the end of the file as XML comments.
- Author IDs are converted to Author Display Names
- If available, the GMT post date is used for the Date: field instead of the time-zone specific value. The Date: field is omitted for 'Page' items.
- Password-protected pages/posts will be given
Status: protected
andFlags: hidden
, but the password will not be copied. - The following data is dropped: Sticky post flags, comment status, menu order, post parents, and ping status values.
Aliases: /image-resizer-installation/ /?page_id=91 /?p=91
Atom ID: http://66.29.219.39/?page_id=91
Post ID: 91
Author: Nathanael Jones
wp_date: 2010-11-11 19:44:48 -0500
wp_template: default
wp_status: private
Flags: hidden
Summary: This is the article summary
Categories: image-resizing, installation
Tags: image-resizing, image-resizer
- Public 'Posts' will be placed in /content/pages/blog/YYYY/post-title-sanitized.htmf
- Non-public 'Posts' will be placed in /drafts/post-title-sanitized.htmf and hidden with
Flags: hidden
- All 'Pages' with an assigned URL will be placed in a matching directory strucutre. I.e. /contact/donate will become /contact/donate.htmf
- If a Page doesn't have an assigned URL, it will be dumped in /drafts/
- If a Page is hidden, protected, etc, it will be hidden with
Flags: hidden
- index.htmf files are not generated for parent pages. Instead, you'll get 'name.htmf' and folder 'name' in the same directory.
- Attachments will go into /attachments/wp-content/... They will keep the same directory structure and file name, except for
/attachments/
being prepended. - Attachments will have a modified date matching their original Upload datestamp.
- Attachments will be downloaded from the live site, following any redirections issued by the current server.
Aliases for each post and page are generated by taking the following URLs, requesting them, then collecting a list of all URLs they are redirected through.
Wordpress does not always store the 'pretty url' in the database, but generates it dynamically. Often, the only way to get the 'official' url for a page is to execute the HTTP request.
- Wordpress Link (official URL for page or psot)
- Wordpress Guid (GUID Permalink)
- Wordpress Post ID (in the form /?p=[post id])
- Wordpress metadata "url" (user-defined "url" field)
The resulting set is made domain-relative and cleansed of duplicates.
This is how the Aliases: /?p=27 /1204_My_Article /2009/11/12/my-article
metadata tag is determined.
Double-check you have everything committed before starting, so you can see what has been changed. Files can get overwritten.
bundle exec wordpress-to-nesta -f -a
Use this to perform a full import. Downloads all attachments and checks for 301/302 redirects to determine what aliases need to be specified for each page
bundle exec wordpress-to-nesta
bundle exec wordpress-to-nesta -d none
bundle exec wordpress-to-nesta -d "Author","wp_status","Flags","wp_date","wp_template","Aliases","Atom ID","Post ID","Status","Categories","Tags","Summary","Date"
bundle exec wordpress-to-nesta -c wordpress-other-file.xml
Once the conversion script has finished, make sure you review the log for errors.
If you're using nesta-plugin-wordpress and nesta-plugin-aliases
- Manually update menu.txt
- Make sure disqus_short_name is set in config.yml
- Forget about redirecting comment RSS feeds from individual pages... or figure out how to calculate them. Disqus seems to use the normalized title of the page, which seems...fragile. Ex. http://shortname.disqus.com/normalized-title/latest.rss
If you're not using those plugins... you'll still need the custom version of nesta and nesta-plugin-simplicity, and you'll have to manually:
- Set up generic rewrites
- /wp-content/ to /attachments/wp-content/
- /feed/ to /articles.xml
- Add comment-specific redirects
- /comments/feed/ to http://intensedebate.com/allBlogCommentsRSS/37301 (ID of blog with intensedebate) or http://{disqus_short_name}.disqus.com/latest.rss
- /id/article/feed/ to http://intensedebate.com/postRSS/104986059 (ID of post with intensedebate) or ???
- URL redirect from each space-delimited value in 'Aliases' of each page metadata to the actual page path (or use nesta-plugin-aliases)
- Tags and categories from Wordpress are imported as-is. Maybe these need to be combined or something, haven't figured out how exactly I'm mapping them to Nesta's design.
- No views for /tag/{tag}/
- No views for /category/{category}/
- No views for /category/{category}/{child-category}/
As a long-time user of IntenseDebate, I originally wanted to port my IntenseDebate comments over directly. However, after using Disqus for a bit, I realized it was a much nicer service. And it imports directly from both IntenseDebate and Wordpress.
If you want to stay with IntenseDebate, you'll have to do some coding for it
-
Configure IntenseDebate to use the page.metadata["Post ID"] for the idcomments_post_id value (if present) (Requires view editing)
<script> var idcomments_acct = ‘YOUR ACCT ID’; var idcomments_post_id; //<- this is where you use "Post ID" var idcomments_post_url; </script> <script type=”text/javascript” src=”http://www.intensedebate.com/js/genericLinkWrapperV2.js”></script> -
Write code against the IntenseDebate API to fix your comment feeds
- /comments/feed/ to http://intensedebate.com/allBlogCommentsRSS/37301 (ID of blog with intensedebate)
- /id/article/feed/ to http://intensedebate.com/postRSS/104986059 (ID of post with intensedebate)