Filtering posts during Import
Hi Maxell,
I’ve always found it valuable for developers to see how their software is used in real-world scenarios, so I wanted to share our use case.
We manage a homeowners association website composed of three main content types:
- Pages (~250)
- Posts (~1,400 announcements over the past 10 years)
- Rules and Regulations (6 PDF documents)
Our objective is to import these into the knowledge base for residents, with some filtering:
- Pages: Only those accessible to residents should be included (excluding Property Manager-only content).
- Posts: Since announcements are often repetitive, we limit the import to the past year to reduce duplication.
- Rules and Regulations: These are handled separately via the PDF sitemap import.
For Pages and Posts, I’m using the “WordPress Content” importer. To restrict the imported posts, I implemented the mxchat_before_process_post hook with a custom snippet. See below:
add_filter('mxchat_before_process_post', function ($post, $bot_id) {
    // For pages only: skip if "gestion" is in the URL
    if ($post->post_type === 'page') {
        $permalink = get_permalink($post->ID);
        if ($permalink && stripos($permalink, 'gestion') !== false) {
            return false;
        }
    }

    // For posts only: skip if older than 1 year
    if ($post->post_type === 'post') {
        if (strtotime($post->post_date) < strtotime('-1 year')) {
            return false;
        }
    }

    return $post;
}, 10, 2);
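For reference, the decision logic in the snippet above can also be written as a plain function, which makes it easy to check outside WordPress. This is only a sketch: the name hoa_should_import and its parameters are my own and are not part of MxChat or WordPress.

```php
<?php
// Hypothetical helper (not part of MxChat or WordPress).
// Returns true when an item should be imported, false when it should be skipped.
// $now is an optional reference timestamp so the one-year cutoff can be tested.
function hoa_should_import(string $post_type, string $permalink, string $post_date, ?int $now = null): bool
{
    $now = $now ?? time();

    // Pages: skip anything with "gestion" in the URL (Property Manager area).
    if ($post_type === 'page' && stripos($permalink, 'gestion') !== false) {
        return false;
    }

    // Posts: skip anything published more than one year before $now.
    if ($post_type === 'post' && strtotime($post_date) < strtotime('-1 year', $now)) {
        return false;
    }

    return true;
}
```

Inside the filter callback this would be called as hoa_should_import($post->post_type, (string) get_permalink($post->ID), $post->post_date), returning $post when the result is true and false otherwise.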
This works partially: the content itself is excluded as intended, but entries are still created with null content, which results in empty indexed items.
I can work with that, but would it be possible to adjust the logic so that entries with null content are not indexed at all?
Alternatively, a dedicated hook to explicitly exclude posts from indexing might be a cleaner approach.
I look forward to your thoughts on the best way to handle this.
Louis