qTranslate-Team/qtranslate-x

design of function qtranxf_excludeUntranslatedPostComments

johnclause opened this issue · 38 comments

This is a continuation of WP discussion https://wordpress.org/support/topic/the-latest-beta-from-github-is-good-for-testing-thank-you on the topic initiated by side777.

side777: I get a WordPress database error Unknown column 'wp_posts.post_content' in 'where clause' for query SELECT * FROM wp_comments [...] but only on some sites of my multisite network. still investigating.

side777: Unfortunately i could not reproduce the error with a clean 2015 - BUT get_comments() still returns an empty array! The 2015 recent comment widget DOES return the comments, but ALL, also for untranslated posts, though.

johnclause: I guess in some cases this query gets executed without wp_posts table included. What are those cases? Do we abort this function in such cases? Or do we force a reference to wp_posts in somehow?

side777: I think this is not a good approach anyway - in worst case you query 1000s of comments WITH the corresponding posts, PLUS filtering the post content - I wrote my own solution a while ago... I think it's way more effective to save the current language to a comment meta when the comment is saved. this way you can easily query comments by meta.

johnclause: Sure that would be much better. Unfortunately, qTranslate is not designed for big sites with many languages and huge content to begin with. It does have severe performance problems in many places, not just this one. My priority is to make it work in a minimal, but bug free way, so that people can continue to use it. Most of qTranslate* clients, I believe, are small bi-lingual (not multi-) sites and they are ok with this version of free translation framework. I would worry about the performance issues later, when time comes. For now we need to figure out how to fix this query issue as soon as possible, so that I could release 3.0. Then I will be happy to come back and redesign this piece as well as other places in a much more efficient way. And I believe that those missing if statements, which we need to figure out now, will be useful to know when we do optimization.

On second thought, if you really have a different solution, which is tested and ready to go, submit it with pull request, and we will go from there?

What if we add these two lines at the beginning of the function:

if( !isset($clauses['join']) ) return $clauses;
if( strpos($clauses['join'], $wpdb->posts ) === FALSE) return $clauses;

Did your error messages disappear? Is that good enough for you? If you need to still filter them, then try this:

if( !isset($clauses['join']) || empty($clauses['join']) ){
  $clauses['join'] = "JOIN $wpdb->posts ON $wpdb->posts.ID = $wpdb->comments.comment_post_ID";
}elseif( strpos($clauses['join'], $wpdb->posts) === FALSE ){
  //do not break some more complex JOIN
  return $clauses;
}

Please, also tell me if option "Hide Untranslated Content" makes sense to you at all? I personally do not use it, since I think people should decide themselves whether they want to read untranslated to their language post. They might choose to translate it themselves, but when it is hidden, then they do not know that it existed.

Option "Hide Untranslated Content" perfectly makes sense. I run a couple of sites where i have some posts that are just not relevant for the other languages. plus: it's a seo issue - 'duplicate content'...

regarding the 'people should decide themselves' thing: i run a different approach - people who are logged in can choose which languages they understand in the profile settings and see posts of all languages - the main lang (the one they choose in the browser / the url) always first.

the "Hide Untranslated Content" option is especially effective for 'latest posts' widgets. if your last 3 posts are untranslated you could easily drive people away from your site showing foreign headlines in suggested posts, etc.

Ok, point taken. Other people seem to also want it. What do we do about the code?

i see a big drawback in the current approach of filtering for post_content: it just doesn't tell you anything about the actual language of a certain comment if the corresponding post is translated or not. i have multi-lingual posts, people tend to comment in the language they are reading. so the filter hides comments for untranslated posts but still shows comments in a different language.

so it makes much more sense to add a lang meta to every comment when the comment is saved. the language would be the current user language. very easy. i'll provide code asap.

We want to hide the comments for the untranslated posts, not the untranslated comments. We do not care about the language of comment, it can be anything, we just want consistency, if post is hidden, then all comments to it also hidden regardless to their language.

If we also want to filter comments by specific language, then it would be a new and different option.

Could you answer the question, if the code I cited makes your problem go away? Did you figure out under which condition a query without JOIN to posts table happens?

I would much rather to do optimization later, because I plan to re-design the whole backend to be sane, as it is insanely inefficient now. I do not wish to do it in pieces, just for comments now.

Please, answer the question if the code above makes your site work. Do you need filtering for the case when the problem happens. Would be also good if you tell me finally when and how it happens.

i used the second snippet and it works.

what i found out until now: the get_comments(); function by default doesn't JOIN posts. if you query comment meta it adds INNER JOIN wp_commentmeta and only if you query post related stuff (like post_status, post_type, etc.) it adds JOIN wp_posts.

so a save join string would be
"JOIN $wpdb->posts ON $wpdb->posts.ID = $wpdb->comments.comment_post_ID INNER JOIN $wpdb->commentmeta ON ( $wpdb->comments.comment_ID = $wpdb->commentmeta.comment_id )"

you could test for both cases and only skip if the join string is more complex...

checking the core for 'WP_Comment_Query' (https://core.trac.wordpress.org/browser/tags/4.1/src/wp-includes/comment.php) led me to another major flaw in the current filter. as you can see in line 430 a CACHED version of the query is returned - ignoring any altered clauses! therefore dynamically altering the clauses leads to wrong returns, i.e. wrong languages displayed. especially if users run object caches. i tested this in my 2015 setup. i had to clear the cache to see the correct comments.

there are 2 safe possible solutions:

  1. altering the query vars ('pre_get_comments')
  2. filter the 'comments_array'

if it's even worse, the same problem applies to the posts! still checking...

it's true. post queries get also cached. so the 'hide untranslated' option is badly broken because it shows the wrong languages...

Why don't it get cached correctly at first time?

I thought this cache is for one session only, different users do not affect each other, do they?

You had to clear cache, because you were doing testing and changed the query in between the calls on the same session, is that correct? This should not happen at live site, will it?

From the page http://codex.wordpress.org/Class_Reference/WP_Object_Cache:

By default, the object cache is non-persistent. This means that data stored in the cache resides in memory only and only for the duration of the request. Cached data will not be stored persistently across page loads unless you install a persistent caching plugin.

Does it happen in your theme, that this query changes language during the same request? Which theme do you use, btw? If this is what you really need, then yes, we can set language in query_vars['lang'] via do_action_ref_array( 'pre_get_comments', array( &$this ) );. The value query_vars['lang'] will not be used by query, but it will change the cache key.

read the source: it generates a md5 cache_key from the DEFAULT $args. any altering of the clauses happens later and is ignored by the function because the cache key is the same not matter what you do with the clauses later. so it returns the new request from the cache assuming that queryvars not changed == output the same.

with a caching plugin the cache is persistent. that means the next user gets YOUR output. wp generates cache for every different set of $args.

so this clauses altering breaks queries with ANY theme. try it with 2015! (i did.)
you are not going to tell the people they can't use caching plugins to hide untranslated posts, right? ;)

i wrote an alternative approach for posts filtering today. i set up an action to preg match the languages in use from the title of the current post in the loop and set a post meta IF the meta is not already there. (for backwards comp)
and i set up an action in save_post to update the meta not matter what.

then i alter the query with 'pre_get_posts'.

md5 cache_key from the DEFAULT $args

I understand, but default arguments uniquely defines what is done later, except us adding a fork by language. If we add query_vars['lang'], the cache key will become different enough for the purpose.

with a caching plugin

Which one do you mean? Anyway, what I described with query_vars['lang'] should solve the problem even with persistent cache.

I am very glad you that you have better ideas and wish to implement them. I will be waiting for your pull request here.

Meanwhile I will put in query_vars['lang'] to be done for the moment, so I can release 3.0 now.

BTW, you never let me know which theme do you use? I am still curious.

2.9.8.9 has your changes. Persistent cache did not work, because they cut all extra keys from query_vars. Could you, please, test if it is ok for you without persistent cache? We will work on persistent cache later.

so at least have a look at my working solution.. you get in lots of trouble if you break caching. people tend to not read any warnings and complain in the support forums later. believe me, i know exactly what i'm talking about...

if you insist on using a hacky workaround for your 3.0 that you could use any query var from the default list (in the core code, linked above) - i guess a meta value would do the trick. as mentioned before, this would be very hacky for a major release...

i have never worked with github - as soon as i find out how pulling a request works and where to do it, i'll send you my solution.

about my theme: i use a selfmade framework with child themes for the sites under my direct control. other sites i admin use several themes...
for testing i use a clean twenty fifteen setup.

any query var from the default list

I did look into it right after the discovery, but there is no one we can change safely as far as I can tell.

Which caching plugin do you use? I can see a lot of problems with persistent cache, which such plugin must deal with. What if one user submits new comment? It is not going to show up for anyone until cache is invalidated? It can get extremely complex. I would say that if caching plugin does not work in this case then it is a problem of caching plugin. Let them figure out a solution. Many other plugins can alter query result like we do, and I am sure they do not think about persistent cache either. Caching plugin must have its own ways around.

Did you try the latest version? Does it work for you except persistent caching? Press Download button at plugin page: https://github.com/qTranslate-Team/qtranslate-x, which runs this link https://github.com/qTranslate-Team/qtranslate-x/archive/master.zip

GitHub has become one of the most popular standard for doing shared development. You will have to get engaged with them in any case for many different reasons, if you are in this line of business. Fortunately, they have pretty good documentation. Shortly: you fork the repository, clone it to your local computer, modify the code, push modifications to your fork, then from your fork it is two clicks to submit pull request here. You will find documentation on each step.

Please, try the latest version before anything else ;)

any query var from the default list

add this to a meta query:
array( 'key' => 'qtranxf_language_' . qtranxf_getLanguage(), 'compare' => 'NOT EXISTS' );

i use w3tc, which is very common. afaik most caching plugins make the wp cache persitent, btw.

What if one user submits new comment? It is not going to show up for anyone until cache is invalidated?

exactly. that's why the core invalidates the cache on any new comment. read the code.

it is a problem of caching plugin.

definitely not. i'd say it's a flaw in the core because it doesn't check for altered clauses. the caching plugin has no chance at all to handle this because it simply builds up upon the core cache function.

Many other plugins can alter query result like we do

wrong - i guess this approach is pretty specific: dynamically alter the SQL for not logged in users and for the exact same URL? i can't think of one plugin that alters the SQL to output different content at the same URL.

I am sorry, @side777, about the delay in response. I got busy now with other things. Could you meanwhile explain the logic behind the following:

add this to a meta query:
array( 'key' => 'qtranxf_language_' . qtranxf_getLanguage(), 'compare' => 'NOT EXISTS' );

^^ this makes the query individual for every language and different cache data is saved.
in a better future version you would save the languages used in a post in that meta key, compare 'LIKE' and skip the dirty manipulation of the query string, i.e. increase the performance of the query, PLUS avoid the cache problem.

I understand that it makes key unique and helps caching, I am curious how does this affects the final query and why it is safe to implement. I can look it up, but I thought you would explained it to me since you already did the investigation.

it adds the meta key to the query and returns all posts where the key doesn't exist. for now it doesn't exist for all posts, so it's safe to implement. it has no effect on the result of the query, it only makes the cache language specific.

can you help me and point me in the right direction? there must be a function already in the plugin to check the posts for the languages in use. (they are listed in the backend, in the posts table...)
i would use this funtion to write the languages to the meta keys. from there it's probably just 5 lines of code and i'll make a pull request...

Hi @side777, is this issue still important to you or it went away for some reason?

point me in the right direction

You may search for 'pre_get_comments' on the latest version from GitHub and you will see commented function 'qtranxf_add_query_language'. Uncomment it and this is where you can test your ideas.

Although I do not understand why do you need to know which languages are in use? Query, which filters the comments is where you found it before qtranxf_excludeUntranslatedPostComments and 'qtranxf_excludeUntranslatedPosts'. We only need to make query cache key depend on language without breaking query. Is that correct?

is this issue still important to you or it went away for some reason?

currently i am using a patch. i remove the qtranxf filter and use my own. so it's not important to me but possibly to many other users with caching plugin active.

ironically not using a caching plugin makes the current qtranxf approach of filtering the post by altering the clauses very slow. i mean you are scanning the whole post contents of ALL posts just to determine the used languages on EVERY load of an archive or latest posts or latest comments widget!

Although I do not understand why do you need to know which languages are in use?

because i want to save the used languages in a post meta on save. this way i can query for that meta value which:

  1. is MUCH faster
  2. is the way wp queries are supposed to be done
  3. makes the query compatible with other themes or plugins that alter the clauses
  4. makes the cache key unique

besides this we are then only 2 lines of code away from filtering comments by language used in the actual comment, which perfectly makes sense on a multi language blog.

because i want to save the used languages

This is a good idea, and a bit different topic. Yes, you mentioned this before too. There currently is no place in the code to catch post saving, it goes through WP whatever outside -X, but there will be a hook a bit later, because we plan to implement "Single Language Editor Mode" to provide compatibility with TinyMCE "breakers", see #27.

Do you think having a separate table for that purpose would be better than using already overloaded meta table? There is a general performance problem anyway, which needs to be also solved, and probably we will end up with a different table(s) to keep translated data. Learn how to use GitHub, and maybe you can help to design this thing?

by altering the clauses very slow
This has to be on an option, so that people would turn it on only when cache is in use?

i was looking for qtranxf_getAvailableLanguages. i would write the results to a meta value on post save and replace current clauses manipulation with a cleaner meta query.

i would do the same thing with comments - but this would be a new feature. it's very easy and basically the same as with the post filtering. it just makes much more sense to filter the comments too. i added this to my blogs years ago. your call!

having a separate table for that purpose

definitely not for this purpose solely. it's a simple array with (in most cases) only 2 values. what do you mean, the meta table is overloaded?

different table(s) to keep translated data

you mean all data like the post_content, etc? that would be dangerous because something could change the data in the original location without qtranx noticing it. plus: you would save everything twice, bloating the database...

No, not twice, I was thinking about the split translations, saving separately in different tables for faster retrieval, but you are right, it can get very complex and dangerous.

comments - but this would be a new feature

Comments are currently not language aware. Yes, this is also a good idea, but then we would have to ask comment authors to specify which language they use, and it may not be one of the enabled. This is the whole new framework ...

saving sep8arately in different tables for faster retrieval

i don't see how opening a different table would be any faster. the only thing that would make sense would be seperate posts for every language. (this would solve the comments filtering as a side effect. and translated slugs.)

but then we would have to ask comment authors to specify which language they use

not really. in my blogs i just add a language selection under the comment form which defaults to the current language. in 99.9% of the cases this is the language the user is going to use in their comments. because they answer directly to the post or reply to a comment (and they only see comments in their language).

in my experience the first comment sets the rules. if comments are not filtered and the first few comments are in one language it's much less likely that comments in the other language (of a bilingual blog) are made after that. people don't understand and don't want to join a discussion they have no idea about. so language filtering helps a LOT, even if it's not 100% accurate.

i would not even add a select to the comment form in qtrans-x, or make it optional. as i said, in most cases the language fits the users browser setting.

the only thing that would make sense would be separate posts for every language

Then we lose the beauty and point of -X and become another WPML ;)

opening a different table would be any faster

No, that alone would not, but it may be less hurtful than parsing 40 languages to get one from each field.

When we are ready to actually work on it, we will need to do profiling.

Yes, it all makes sense. I will be waiting for your pull requests if you decide to contribute.