gregstoll/ljtowordpress

Titles as permalinks

Closed this issue · 15 comments

asuh commented

I noticed that URLs are set by the linkId number, which is great for posts that don't have titles, the ones that end up saying (no subject). However, it would be great if posts that do have titles could use those titles as the permalinks in their URLs.

Sounds like a good idea, although you'd have to make sure the titles are unique.

This is done in the addPost() method - I think that's controlled by the wp:post_name element, if you want to take a shot at it :-)

asuh commented

Well, I'm taking a shot but I'm fairly certain this needs modification:

    if '(no subject)' not in post.find('title').text:
        ET.SubElement(item, 'wp:post_name').text = post.get('linkId')
    else:
        ET.SubElement(item, 'wp:post_name').text = consolidateJoinedSpaces(spacesToDashes(post.find('title').text))

    def spacesToDashes(d):
        return d.replace(' ', '-')

I'm not sure the replace method to replace spaces with dashes will work, and I also think I might be missing functionality to fix duplicate titles. Let me know if I'm going in the right direction.

I'm also realizing that I need to exclude other symbols such as ' and " and most of the ones typed with Shift plus a number key. I'm guessing that a regex would be the way to go, but I'm not really familiar with them.

Yeah, I think something like this will work. The logic looks backwards, though (we should use the linkId if post.find('title').text == '(no subject)'), and you'll have to keep track of the titles we've already used in a dictionary so we don't duplicate them.
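
As a rough illustration of the dictionary idea, here is a minimal sketch. The helper name uniquePostName and the numeric-suffix scheme are assumptions made for this example; they are not taken from the ljtowordpress code.

```python
def uniquePostName(slug, usedNames):
    """Return slug unchanged the first time it is seen, otherwise
    append a numeric suffix until the result is unused.
    (Hypothetical helper, not part of the project.)"""
    candidate = slug
    while candidate in usedNames:
        # Bump the counter stored for the base slug and try the next suffix.
        usedNames[slug] += 1
        candidate = '%s-%d' % (slug, usedNames[slug])
    usedNames[candidate] = 1
    return candidate

usedNames = {}
first = uniquePostName('my-trip', usedNames)
second = uniquePostName('my-trip', usedNames)
third = uniquePostName('my-trip', usedNames)
```

The dictionary would need to live outside the per-post call so that it persists across all posts in one export.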

asuh commented

Yeah, I'm lost. I've also tried this:

    if post.find('title').text == '(no subject)':
        ET.SubElement(item, 'wp:post_name').text = post.get('linkId')
    else:
        ET.SubElement(item, 'wp:post_name').text = consolidateJoinedSpaces(spacesToDashes(post.find('title').text))

Now it's showing everything like this:

<wp:post_name>------title-goes-here ----</wp:post_name>

even the (no subject) comes out like this:

<wp:post_name>------(no-subject) ----</wp:post_name>

Not sure where those extra spaces/dashes are coming from or how to get rid of them.

My guess is that post.find('title').text is returning something with spaces at the beginning and end, like

"       title goes here     "

so that's why your comparison to '(no subject)' is failing, and why you get a bunch of dashes before and after the subjects.

Try something like this:

title = post.find('title').text.strip()

and then operating on the title variable instead of post.find('title').text
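
To make the whitespace guess concrete, here is a small self-contained sketch. The XML snippet and the post tag name are made up for illustration; only the linkId attribute and the title child come from the code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a title padded with whitespace, as guessed above.
post = ET.fromstring('<post linkId="123"><title>   (no subject)   </title></post>')

rawTitle = post.find('title').text
title = rawTitle.strip()

# The padded text does not equal '(no subject)'; the stripped text does.
rawMatches = rawTitle == '(no subject)'
titleMatches = title == '(no subject)'
```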

asuh commented

That worked, nice!

Now I need a way to strip out unneeded symbols and punctuation, such as ', ,, ., +, etc. Those are all still left in the text as is.

Yeah, I don't know exactly what characters are allowed, but I bet you could use a regular expression to only keep, say, letters and numbers and dashes...

asuh commented

Yeah, that's what I'm attempting to figure out. I think something like this should work but I don't know how to properly fit it into everything.

re.match('[^\w]', string) since string is specific

I think you'll need to use something like re.sub - maybe substitute all non-word characters with empty strings?
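
A short sketch of the difference, using a made-up title: re.match only tests whether the pattern matches at the start of the string and returns a match object (or None), while re.sub returns a new string with every match replaced.

```python
import re

title = "yes, it's here"  # made-up sample title

# re.match looks only at the start of the string; 'y' is a word
# character, so the negated class does not match and we get None.
startMatch = re.match(r'[^\w]', title)

# re.sub replaces every non-word character with the empty string.
# Spaces are non-word characters too, so they are removed as well.
cleaned = re.sub(r'[^\w]', '', title)
```

Because the spaces disappear along with the punctuation, the pattern would likely need to allow whitespace if spaces are to be turned into dashes afterwards.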

asuh commented

Makes sense. How do I fit it into this?

re.sub('[^\w]', post.find('title').text).strip() returns an error

I would just split it into multiple lines for readability - so

postText = post.find('title').text.strip()

then do the re.sub on that.

(it looks like the problem in your code is a mismatched parenthesis, fyi)

asuh commented

Yes, I fixed the parentheses issue. I think what I'm missing is what to put into the "string".

re.sub(pattern, repl, string)

I have it as this:

wordChar = re.sub('[^\w]', '', string) but obviously string isn't correct

If that was correct, I'd theoretically apply it like this:

postTitle = wordChar(post.find('title').text).strip()
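
One way to read the signature: the third argument to re.sub is simply the text to clean, and the call returns a string rather than something callable. To get the wordChar(...) usage above, the call could be wrapped in a function. This is a sketch; stripSymbols is a name invented for this example.

```python
import re

def stripSymbols(text):
    # 'text' fills the "string" slot of re.sub(pattern, repl, string).
    return re.sub(r'[^\w]', '', text)

# Strip the surrounding whitespace first, then remove the symbols.
postTitle = stripSymbols("  What's new?  ".strip())
```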

asuh commented

Think I finally figured this out except for one thing.

    postTitle = post.find('title').text.strip().lower()
    wordChar = re.sub('[^\s\w]', '', postTitle)
    if postTitle == '(no subject)':
        ET.SubElement(item, 'wp:post_name').text = post.get('linkId')
    else:
        ET.SubElement(item, 'wp:post_name').text = consolidateJoinedSpaces(spacesToDashes(wordChar))

The only thing I haven't figured out is the plus symbol. Whenever a title contains one, it gets stripped out, but two dashes are always left next to each other.

So, when I have yes + no, it would turn into yes--no. If you know what it could be, let me know. Otherwise, I'll submit another pull request for this.

Great, nice work!

I think the problem is in that last line - if those functions are named correctly, here's what it looks like will happen if we start with yes + no:
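wordChar will be "yes  no" (two spaces: the + is removed, but the spaces on either side of it remain)
spacesToDashes will return yes--no
consolidateJoinedSpaces will return yes--no, since by then there are no spaces left to consolidate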
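
One possible way around this, sketched under the assumption that a single regex pass is acceptable, is to collapse each run of whitespace into one dash instead of replacing spaces one at a time. titleToSlug is a hypothetical name and does not use the project's consolidateJoinedSpaces or spacesToDashes.

```python
import re

def titleToSlug(title):
    """Lowercase, drop symbols, and join words with single dashes.
    (Hypothetical helper for illustration only.)"""
    # Remove everything that is neither whitespace nor a word character,
    # then trim any whitespace left at the ends.
    text = re.sub(r'[^\s\w]', '', title.lower()).strip()
    # Replace each run of whitespace with exactly one dash.
    return re.sub(r'\s+', '-', text)

slug = titleToSlug('Yes + No')
```

Consolidating the spaces before calling spacesToDashes, rather than after, should have the same effect with the existing helpers.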

I have some stylistic nitpicks (and I'd also like to put this behind an option), but I can address that when I see the pull request.

Thanks!

asuh commented

With that last pull request, this issue should be resolved.