A CLI to convert Medium posts to Jekyll format
This tool is a quick hack to satisfy my own needs. Absolutely no warranty.
Be sure to check the original posts and the converted Markdown files before you import into Jekyll.
- Node.js 12+
- UNIX Shell such as bash
npm install
First, download your Medium archive from medium.com, then extract the zip file.
./bin/run <path_to_archive>/posts/*.html
Each Markdown file will be saved in the same folder as the original HTML file.
After the conversion:
- Move the Markdown files to `_posts` of your Jekyll project
- Move the `images` folder to the root of your Jekyll project
Your Jekyll project folder structure should look like this:
.
├── _posts
│   └── 2019-06-20-my-awesome-article.md
└── images
    └── 2019-06-20-my-awesome-article
        └── 1*XXXXXXXX.png
Images will be replaced with a reference to a local path relative to `/images/<markdown_file_name>`. This tool will collect all the download URLs for you. Please download them separately.
For example, assuming that this tool converted your HTML file to `2019-06-20-my-awesome-article.md`:
<img src="https://cdn-images-1.medium.com/max/2560/1*XXXXXXXX.png" />
will be replaced with
<img src="/images/2019-06-20-my-awesome-article/1*XXXXXXXX.png" />
By running this tool, in addition to `2019-06-20-my-awesome-article.md`, the above image files will be downloaded to `./images/2019-06-20-my-awesome-article/1*XXXXXXXX.png`, under the same folder as the input HTML file.
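The URL rewrite described above can be sketched as follows. This is only an illustration, not the tool's actual code: `localImageSrc` is a hypothetical helper, and it assumes the image name is the last path segment of the Medium CDN URL.

```javascript
// Sketch only: rewrite a Medium CDN image URL to a local Jekyll path.
// `localImageSrc` is a hypothetical helper, not part of the tool's API.
function localImageSrc(remoteSrc, markdownFileName, urlPrefix = '/images') {
  // Assumption: the image name is the last path segment, e.g. "1*XXXXXXXX.png".
  const imageName = remoteSrc.split('/').pop();
  // Drop the ".md" extension to get the per-post image folder name.
  const postFolder = markdownFileName.replace(/\.md$/, '');
  return `${urlPrefix}/${postFolder}/${imageName}`;
}

console.log(localImageSrc(
  'https://cdn-images-1.medium.com/max/2560/1*XXXXXXXX.png',
  '2019-06-20-my-awesome-article.md'
));
// prints "/images/2019-06-20-my-awesome-article/1*XXXXXXXX.png"
```

The `urlPrefix` parameter plays the role of the image URL prefix option, so passing a different value changes only the leading part of the path.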
--image-dir=image-dir
[default: images] Image Directory prefix
--image-url-prefix=image-url-prefix
[default: /images] Image URL prefix
The tool detects the programming language of the contents of each fenced code block.
Caveat: in some cases it may detect the wrong language. Be sure to review the Markdown result.
For example:
<pre><code>
const foo = { bar: () => 'baz' };
</code></pre>
may be detected as JavaScript and generate:
```js
const foo = { bar: () => 'baz' };
```
The default language set is js,css,html,py,rb,java,sql,go.
Language detection is done by Highlight.js.
See `./bin/run --help` for the full list of languages.
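As a rough illustration of how detection against a language subset might work, the sketch below calls Highlight.js's `highlightAuto(code, languageSubset)`. The helper names `detectLanguage` and `toFencedBlock` are hypothetical, and the Highlight.js module is passed in as a parameter so the snippet does not depend on how the tool loads it.

```javascript
// Sketch only: `detectLanguage` and `toFencedBlock` are hypothetical helpers.
// `hljs` is expected to be the Highlight.js module, e.g. require('highlight.js').
function detectLanguage(hljs, code, languages) {
  // highlightAuto scores the code against each language in the subset
  // and reports the best match; fall back to no language tag otherwise.
  const result = hljs.highlightAuto(code, languages);
  return result.language || '';
}

function toFencedBlock(hljs, code, languages) {
  const fence = '`'.repeat(3); // three backticks, the default fence
  const lang = detectLanguage(hljs, code, languages);
  return `${fence}${lang}\n${code}\n${fence}`;
}
```

Restricting detection to a small subset is a trade-off: fewer candidates means fewer wrong guesses, but code in a language outside the subset will be labeled with the closest match or left without a label.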
--[no-]detect-languages
Disable programming language detection
--languages=...
[default: js,css,html,py,rb,java,sql,go] Programming languages to detect in code block
This tool uses Turndown as the Markdown converter.
The following Turndown options are supported:
--md-code=fenced|indented
[default: fenced] Markdown codeBlockStyle
--md-em=_|*
[default: _] Markdown emDelimiter
--md-fence=```|~~~
[default: ```] Markdown fence
--md-hh=atx|settext
[default: atx] Markdown headingStyle.
--md-hr=* * *|---|___
[default: ---] Markdown hr
--md-link=inlined|referenced
[default: inlined] Markdown linkStyle
--md-ref=full|collapsed|shortcut
[default: full] Markdown linkReferenceStyle
--md-strong=**|__
[default: **] Markdown strongDelimiter
--md-ul=-|+|*
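The mapping from these flags to Turndown's constructor options could look roughly like the sketch below. `toTurndownOptions` is a hypothetical helper, not the tool's code; the defaults are the ones listed above, and the `--md-ul` to `bulletListMarker` mapping is an assumption based on Turndown's option name.

```javascript
// Sketch only: `toTurndownOptions` is a hypothetical helper, not the tool's code.
// Defaults are the ones documented for the CLI flags above.
const DEFAULTS = {
  'md-code': 'fenced',
  'md-em': '_',
  'md-fence': '`'.repeat(3), // three backticks
  'md-hh': 'atx',
  'md-hr': '---',
  'md-link': 'inlined',
  'md-ref': 'full',
  'md-strong': '**',
};

function toTurndownOptions(flags = {}) {
  const f = { ...DEFAULTS, ...flags };
  const options = {
    codeBlockStyle: f['md-code'],
    emDelimiter: f['md-em'],
    fence: f['md-fence'],
    headingStyle: f['md-hh'],
    hr: f['md-hr'],
    linkStyle: f['md-link'],
    linkReferenceStyle: f['md-ref'],
    strongDelimiter: f['md-strong'],
  };
  // Assumption: --md-ul maps to Turndown's bulletListMarker option.
  if (f['md-ul'] !== undefined) options.bulletListMarker = f['md-ul'];
  return options;
}

console.log(toTurndownOptions({ 'md-em': '*' }));
```

The resulting object is shaped so it could be passed to `new TurndownService(options)`.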
By default, `<figure>` will be kept as HTML tags, to avoid losing information in the resulting Markdown file.
Unfortunately, Markdown does not have a built-in figure syntax.
The aspect ratio will not be retained.
This tool provides the following modes:
- `no` = Convert to `<figure>`
- `alt` = Convert to image tag, embed `<figcaption>` text into `alt`
- `title` = Convert to image tag, embed `<figcaption>` text into `title`
Examples:
For the following tag in Medium HTML:
<figure name="81a2" id="81a2" class="graf graf--figure graf-after--p">
<div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 721px;">
<div class="aspectRatioPlaceholder-fill" style="padding-bottom: 103%;"></div>
<img class="graf-image" data-image-id="1*xxxxxxxx.png" data-width="1582" data-height="1630" data-is-featured="true" src="https://cdn-images-1.medium.com/max/800/1*xxxxxxxx.png">
</div>
<figcaption class="imageCaption">The quick brown fox jumps over the lazy dog</figcaption>
</figure>
Setting to `no`:
<figure>
<img alt="" src="/images/1*xxxxxxxx.png" title="" />
<figcaption>The quick brown fox jumps over the lazy dog</figcaption>
</figure>
Setting to `alt`:
![The quick brown fox jumps over the lazy dog](/images/1*xxxxxxxx.png)
Setting to `title`:
![](/images/1*xxxxxxxx.png "The quick brown fox jumps over the lazy dog")
CLI option:
--md-figure=alt|title|no
[default: no] Markdown figureStyle. "no" = use <figure> tag
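The three figure styles can be summarized in a small sketch. `renderFigure` is a hypothetical function written for illustration, not the tool's implementation; it takes an already-rewritten image path and the caption text, quotes the title as Markdown requires, and does no escaping of special characters in the caption.

```javascript
// Sketch only: `renderFigure` is a hypothetical helper illustrating the
// three --md-figure modes; it does not escape quotes or brackets in captions.
function renderFigure({ src, caption }, mode = 'no') {
  switch (mode) {
    case 'alt':
      // Caption becomes the image's alt text.
      return `![${caption}](${src})`;
    case 'title':
      // Caption becomes the image's title, which Markdown expects in quotes.
      return `![](${src} "${caption}")`;
    default:
      // "no": keep the figure as HTML so the caption is preserved as-is.
      return [
        '<figure>',
        `<img alt="" src="${src}" title="" />`,
        `<figcaption>${caption}</figcaption>`,
        '</figure>',
      ].join('\n');
  }
}

const figure = {
  src: '/images/1*xxxxxxxx.png',
  caption: 'The quick brown fox jumps over the lazy dog',
};
console.log(renderFigure(figure, 'alt'));
// prints "![The quick brown fox jumps over the lazy dog](/images/1*xxxxxxxx.png)"
```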
- Only Medium Posts will be converted.
- Your comments on posts will not be converted.
- The file name will contain the slug string from the original canonical URL on Medium.
- The random string at the end of the slug will be removed.
For example:
If the original URL is
https://medium.com/@yourname/my-awesome-article-deadbeef123
then the name of the Markdown file will be:
<date>-my-awesome-article.md
where `date` is also extracted from the content of the HTML file.
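A minimal sketch of this naming rule is shown below. `markdownFileName` is a hypothetical helper; it assumes the random string is the last hyphen-separated segment of the slug and that the date is already available in `YYYY-MM-DD` form.

```javascript
// Sketch only: `markdownFileName` is a hypothetical helper.
// Assumption: the random string is the last "-"-separated part of the slug.
function markdownFileName(canonicalUrl, date) {
  // Take the last non-empty path segment of the URL as the slug.
  const slug = new URL(canonicalUrl).pathname.split('/').filter(Boolean).pop();
  // Remove the trailing "-<random string>".
  const title = slug.replace(/-[^-]+$/, '');
  return `${date}-${title}.md`;
}

console.log(markdownFileName(
  'https://medium.com/@yourname/my-awesome-article-deadbeef123',
  '2019-06-20'
));
// prints "2019-06-20-my-awesome-article.md"
```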
On Medium, the "Big Title" is rendered as `<h3>`, and "Small Title" is `<h4>`.
This converter will raise every heading level by one, so that "Big Title" becomes `##` (`h2`) in Markdown.
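The heading shift can be illustrated with a short sketch. `markdownHeading` is a hypothetical helper that assumes ATX-style headings (the default heading style) and never goes above the top level.

```javascript
// Sketch only: `markdownHeading` is a hypothetical helper.
// It promotes an HTML heading by one level and emits an ATX heading.
function markdownHeading(tagName, text) {
  // "h3" -> 3
  const htmlLevel = Number(tagName.toLowerCase().replace('h', ''));
  // One level up, but never above "#".
  const level = Math.max(1, htmlLevel - 1);
  return `${'#'.repeat(level)} ${text}`;
}

console.log(markdownHeading('h3', 'Big Title'));   // prints "## Big Title"
console.log(markdownHeading('h4', 'Small Title')); // prints "### Small Title"
```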
This tool identifies drafts by looking for "Published At" metadata in the HTML file of the post. If that date is missing, then the post will be identified as "Draft", with the following features:
- File name will start with `draft-`, same as the original exported HTML file.
- In the YAML front matter of Markdown, there will be a `published: false` flag.
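The draft rule above might be sketched like this. `describePost` is a hypothetical helper; how the "Published At" value is read from the HTML is left out, and the exact shape of the draft file name beyond the `draft-` prefix is an assumption.

```javascript
// Sketch only: `describePost` is a hypothetical helper.
// `publishedAt` is the "Published At" value as an ISO date string,
// or null/undefined when the metadata is missing.
function describePost({ slug, publishedAt }) {
  if (!publishedAt) {
    // No publish date: treat the post as a draft.
    return {
      fileName: `draft-${slug}.md`, // assumption: "draft-" + slug
      frontMatterLines: ['published: false'],
    };
  }
  // Published post: prefix the file name with the YYYY-MM-DD part of the date.
  return {
    fileName: `${publishedAt.slice(0, 10)}-${slug}.md`,
    frontMatterLines: [],
  };
}

console.log(describePost({ slug: 'my-awesome-article', publishedAt: null }));
```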
No tags will be imported to YAML front matter in the Markdown.
MIT License. See LICENSE file.