Suggestion: 'pre_insert_post_syndicated_item' hook for duplicate detection and post preparation prior to insert.
ThatStevensGuy opened this issue · 1 comments
The feeds we syndicate contain many duplicate posts. The incoming posts also need to be prepared before they are added to the site.
I was using the 'post_syndicated_item' (sometimes 'update_syndicated_item') hook. However, because these hooks deal with posts that are already in the database (correct me if I am wrong), it was causing significant load on the server.
I wanted a hook that would let me modify a post and detect duplicates before it is stored in the database, but I couldn't find a way to do that with the existing code, so I eventually settled on adding my own.
I found that I could achieve this by adding a custom filter, plus the short piece of code that follows it, to insert_post() just before the 'add_terms' action (line 1742 of syndicatedpost.class.php).
I had not suggested this until now because I was unsure about my code after the filter. However, it has run flawlessly for years, so with the recent WordPress 5.5 updates I thought I would share something that might be useful to someone. Maybe the filter can be considered as an addition to the plugin, with some insight from @radgeek.
// Line 1742 of syndicatedpost.class.php (insert_post() above transition_post_status 'add_terms')
$dbpost = apply_filters('pre_insert_post_syndicated_item', $this, $dbpost, $update);
// If the filter returns empty, break operation.
if (empty($dbpost)) {
return;
}
// We potentially modified the tax_input, so pass it back into the
// syndicated post object for the add_terms callback function here.
$this->post[ 'tax_input' ] = $dbpost[ 'tax_input' ];
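One thing to keep in mind about the call above: apply_filters() treats the first argument after the hook name as the value that is passed from one callback to the next. With `$this` in that position, a second callback attached to the same hook would receive the array returned by the first callback where it expects the syndicated post object. A minimal sketch of a variant that passes `$dbpost` as the filtered value follows. The wrapper name `tsg_apply_pre_insert_filter` is hypothetical and not part of FeedWordPress, and callbacks for this variant would take `($dbpost, $syndicatedpost, $update)` rather than the order used in this issue.

```php
<?php
/**
 * Hypothetical wrapper around the proposed hook, with $dbpost as the filtered value.
 * Returns null when a callback vetoes the insert by returning an empty value.
 */
function tsg_apply_pre_insert_filter(array $dbpost, object $syndicated_post, bool $update): ?array
{
    // Outside WordPress (for example in a unit test) apply_filters() does not
    // exist, so the post is passed through unchanged.
    if (function_exists('apply_filters')) {
        $dbpost = apply_filters('pre_insert_post_syndicated_item', $dbpost, $syndicated_post, $update);
    }

    // An empty or non-array result means "skip this post".
    return (is_array($dbpost) && !empty($dbpost)) ? $dbpost : null;
}
```

Inside insert_post() this would be called as `$dbpost = tsg_apply_pre_insert_filter($dbpost, $this, $update);` followed by an early `return` when the result is null.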
Here is my use case for that hook. You can see my duplicate post detection methods here too if you are curious.
/**
* FeedWordPress - Remove Duplicates. Clean the syndicated post prior to inserting into the database. Performance enhancement.
*
* Note: This does require additional code to be added at line 1742 of syndicatedpost.class.php (insert_post() above transition_post_status 'add_terms')
*
* // --- MODIFICATION: Christopher Stevens
*
* $dbpost = apply_filters('pre_insert_post_syndicated_item', $this, $dbpost, $update);
*
* // If the filter returns empty, break operation.
* if (empty($dbpost)) {
* return;
* }
*
* // We potentially modified the tax_input, so pass it back into the
* // syndicated post object for the add_terms callback function here.
* $this->post[ 'tax_input' ] = $dbpost[ 'tax_input' ];
*
* // --- END MODIFICATION: Christopher Stevens
*
* @param object $syndicatedpost
* @param array $dbpost
* @param bool $update
* @return array
*/
add_filter('pre_insert_post_syndicated_item', function (object $syndicatedpost, array $dbpost, bool $update) {
// --- Check for duplicates.
if (!$update) {
global $wpdb;
$match_buid = $match_title = false;
// Does the post have a BUID?
//
// Note: This is a custom post meta field 'syndication_buid $(buid)'
// added in Custom Post Settings (to apply to each syndicated post).
if (!empty($dbpost[ 'meta' ][ 'syndication_buid' ][ 0 ])) {
$match_buid = true;
} else {
$match_title = true;
}
// Match BUID
if ($match_buid) {
$duplicate_post = $wpdb->get_row($wpdb->prepare("
SELECT p.ID as ID FROM $wpdb->postmeta pm
JOIN $wpdb->posts p ON p.ID = pm.post_id
WHERE p.post_status = 'publish' AND p.post_type = 'post'
AND pm.meta_key = 'syndication_buid' AND pm.meta_value = %s
LIMIT 1
", [
$dbpost[ 'meta' ][ 'syndication_buid' ][ 0 ],
]));
// Match Title as a backup, can be removed once BUID transition is complete.
if (empty($duplicate_post)) {
$match_title = true;
}
}
// Match Title
if ($match_title) {
$duplicate_post = $wpdb->get_row($wpdb->prepare("
SELECT ID FROM $wpdb->posts
WHERE post_status = 'publish' AND post_type = 'post' AND post_title = %s
AND ( post_date BETWEEN DATE_SUB( %s, INTERVAL 2 HOUR ) AND DATE_ADD( %s, INTERVAL 1 HOUR ) )
LIMIT 1
", [
$dbpost[ 'post_title' ],
$dbpost[ 'post_date' ],
$dbpost[ 'post_date' ]
]));
}
// Is it a duplicate post?
if (!empty($duplicate_post)) {
// Append custom_taxonomy taxonomy terms so the duplicate appears in those terms.
if (!empty($dbpost[ 'tax_input' ][ 'custom_taxonomy' ])) {
wp_set_object_terms($duplicate_post->ID, $dbpost[ 'tax_input' ][ 'custom_taxonomy' ], 'custom_taxonomy', true);
}
// Get out of here if it's a duplicate post.
return [];
}
}
// --- Clean up the post.
// Assign the default category when a post has no categories. No idea why FWP doesn't do this by default.
if (empty($dbpost[ 'tax_input' ][ 'category' ])) {
$dbpost[ 'tax_input' ][ 'category' ][] = intval(get_option('default_category'));
}
// Imported posts will include a 'read more' from the source site at the
// end of their excerpts. Strip that out.
$dbpost[ 'post_excerpt' ] = preg_replace("/(…|…) <a href=.+<\/a>/", '', $dbpost[ 'post_excerpt' ]);
$dbpost[ 'post_excerpt' ] = preg_replace("/\\[…\\]/", '', $dbpost[ 'post_excerpt' ]);
$dbpost[ 'post_excerpt' ] = trim($dbpost[ 'post_excerpt' ]);
// Imported posts are formatted for Bootstrap 3, convert some content to Bootstrap 2.
$dbpost[ 'post_content' ] = str_replace('<span class=\"glyphicon glyphicon-chevron-right\"></span>', '<i class=\"icon-chevron-right\"></i>', $dbpost[ 'post_content' ]);
$dbpost[ 'post_content' ] = str_replace('<span class=\"glyphicon glyphicon-save\"></span>', '<i class=\"icon-download-alt\"></i>', $dbpost[ 'post_content' ]);
// Convert old HTTP linkages to HTTPS.
$dbpost[ 'post_excerpt' ] = tsg_convert_http_to_https($dbpost[ 'post_excerpt' ]);
$dbpost[ 'post_content' ] = tsg_convert_http_to_https($dbpost[ 'post_content' ]);
return $dbpost;
}, 10, 3);
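The excerpt clean-up step in the callback can also be pulled out into a small function that is testable without WordPress. This is only a sketch: the function name `tsg_strip_read_more` is hypothetical, and the patterns assume the source feed ends its excerpts with an ellipsis (literal or as an HTML entity) followed by a link, or with a bracketed ellipsis.

```php
<?php
/**
 * Strip a trailing "read more" link or bracketed ellipsis from an excerpt.
 * Hypothetical helper; the patterns are assumptions about the feed's markup.
 */
function tsg_strip_read_more(string $excerpt): string
{
    $patterns = [
        // Ellipsis followed by a link, e.g. "… <a href="...">Read more</a>".
        '/(?:…|&hellip;|&#8230;)\s*<a\s+href=.+?<\/a>/u',
        // Bracketed ellipsis, e.g. "[…]" or "[&hellip;]".
        '/\[(?:…|&hellip;|&#8230;)\]/u',
    ];

    $cleaned = preg_replace($patterns, '', $excerpt);

    // preg_replace() returns null on failure (for example invalid UTF-8 with
    // the /u flag), so fall back to the untouched excerpt in that case.
    return trim($cleaned ?? $excerpt);
}
```

The lazy `.+?` stops at the first closing `</a>`, so only the link directly after the ellipsis is removed.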
I think I found another hook (the one I should actually be using). I will reopen this if testing doesn't work out.
The hook I probably should have been using is 'syndicated_post', returning null from it for duplicates.
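For anyone landing here later, the 'syndicated_post' approach might look roughly like the sketch below. It has not been tested against FeedWordPress. The factory name `tsg_make_duplicate_filter` is hypothetical, the title lookup is a simplified version of the query earlier in this issue (without the date window), and it assumes the filtered value is the post array with a 'post_title' key.

```php
<?php
/**
 * Build a 'syndicated_post' callback that drops duplicates by returning null.
 *
 * @param callable $find_duplicate Receives the post array and returns the ID
 *                                 of an existing post, or null if none exists.
 */
function tsg_make_duplicate_filter(callable $find_duplicate): callable
{
    return function ($post) use ($find_duplicate) {
        // An earlier callback may already have filtered this item out.
        if (!is_array($post)) {
            return $post;
        }
        // Null tells the caller to skip the item; otherwise pass it through.
        return $find_duplicate($post) !== null ? null : $post;
    };
}

// Register only when WordPress is loaded, so the file can also run on its own.
if (function_exists('add_filter')) {
    add_filter('syndicated_post', tsg_make_duplicate_filter(function (array $post): ?int {
        global $wpdb;
        // Simplified title match; swap in the BUID lookup where one is available.
        $id = $wpdb->get_var($wpdb->prepare(
            "SELECT ID FROM $wpdb->posts
             WHERE post_status = 'publish' AND post_type = 'post' AND post_title = %s
             LIMIT 1",
            $post['post_title'] ?? ''
        ));
        return $id ? (int) $id : null;
    }));
}
```

Keeping the lookup in a separate callable means the filter logic can be exercised with a stub and no database.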