This takes a Tumblr archive generated with tumblr-rb and turns it into a single XML file using Jekyll.
-
If you have not done so, install tumblr-rb with the command:
$ gem install tumblr-rb
-
Clone this repo:
$ git clone https://github.com/thebestsophist/tumblrtoxml.git
-
Enter the directory and download your Tumblr archive to _posts:
$ TUMBLRHOST=yourbloghere.tumblr.com tumblr backup _posts/
-
Update the YAML front matter in index.xsl with your blog details.
-
Enter the _posts directory and rename the archived posts to filenames Jekyll will accept:
$ find *.txt -exec bash -c 'mv "$0" "2013-10-05-${0%\.txt}.markdown"' {} \;
You can replace the date 2013-10-05 with any date you prefer. Jekyll requires a date in each post filename, but it is not used in the generated XML file. I prefer to use the date of import.
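If you would rather not hard-code the date, the same rename can be written as a plain POSIX shell loop that uses today's date. This is a sketch, not a script shipped with this repo: the function name rename_posts and the variable IMPORT_DATE are hypothetical, and it assumes the backup files are named NAME.txt, as the find command above does.

```shell
#!/bin/sh
# Date prefix Jekyll needs in each post filename. The generated XML does not
# use it, so today's date (the date of import) is as good as any.
IMPORT_DATE=$(date +%Y-%m-%d)

# Hypothetical helper: rename every DIR/NAME.txt to
# DIR/IMPORT_DATE-NAME.markdown. DIR defaults to the current directory.
rename_posts() {
  dir=${1:-.}
  for f in "$dir"/*.txt; do
    if [ -e "$f" ]; then               # skip the literal pattern when nothing matches
      base=${f##*/}                    # strip the directory part
      mv -- "$f" "$dir/$IMPORT_DATE-${base%.txt}.markdown"
    fi
  done
}
```

After pasting or sourcing the snippet, run it from inside _posts:
$ rename_posts
or from the Jekyll root:
$ rename_posts _posts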
-
Mass replace id: with tumblrid: in each post.
[Need a good command for this that only matches the id: in the YAML front matter; I currently just use a mass-replace app.]
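One way to limit the replacement to the front matter is a small awk filter that tracks whether it is between the opening and closing --- lines. This is a sketch under assumptions, not part of this repo: it assumes each post starts with a --- line on line 1, that the front matter ends at the next --- line, and that the key sits at the start of its line as id:. The function name fm_rename_id is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rename a top-level "id:" key to "tumblrid:" only
# inside the YAML front matter, i.e. between a "---" on line 1 and the
# next "---" line. "id:" lines in the post body are left alone.
fm_rename_id() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; print; next }   # opening delimiter
    in_fm && $0 == "---"   { in_fm = 0; print; next }   # closing delimiter
    in_fm                  { sub(/^id:/, "tumblrid:") } # front matter only
                           { print }
  ' "$1" > "$1.tmp" && mv -- "$1.tmp" "$1"
}
```

To apply it to every post, run this from inside _posts:
$ for f in *.markdown; do fm_rename_id "$f"; done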
-
Edit the YAML front matter in archive.xml to match your site.
-
Return to your Jekyll root and build the XML file:
$ jekyll build
Jekyll will still build individual pages for your posts; you can ignore those. The important file is _site/archive.xml, which contains the whole archive as a single XML document.
-
Since Tumblr is terrible at generating clean XHTML, post contents are shoved into a <![CDATA[ ... ]]> section. You may need to clean up the data a little more if you plan on using it to generate your sites.
-
Since XML is stricter about special characters, you may have to go through and clean up your ampersands, smart quotes and such.
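To find candidates for the ampersand clean-up, one approach is to flag every & that does not begin one of the five entities XML predefines (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference. HTML named entities such as &rsquo; or &nbsp;, which often stand in for smart quotes and non-breaking spaces, are not defined in XML, so they are reported too. This is a hedged sketch: the function name find_bad_entities is hypothetical, it works line by line, and it will also report ampersands inside CDATA sections, where they are legal.

```shell
#!/bin/sh
# Hypothetical helper: print "LINE: text" for every line of FILE that still
# contains an "&" after removing the five predefined XML entities and
# numeric character references (decimal &#8217; or hex &#x2019;).
find_bad_entities() {
  awk '{
    rest = $0
    gsub(/&(amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);/, "", rest)
    if (index(rest, "&") > 0) print FNR ": " $0
  }' "$1"
}
```

Run it against the built file from the Jekyll root:
$ find_bad_entities _site/archive.xml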