Correct the error this way:
$query->set('s', get_text_clear_of_diacritics($query->get('s')));
Doing so does mean the search will no longer match content that does have diacritics.
ETA: Oh, “pre_get_posts” is an action, not a filter. The same process is used either way. It’s a semantic difference.
-
This reply was modified 3 years, 4 months ago by
bcworkz.
Thread Starter
aldiaa
(@aldiaa)
Thank you for your reply dear @bcworkz, I thought I can access class fields directly by name without using “get” similar to other languages. Also, I was looking at some WP tutorials and found out the difference between filters and actions.
However, my logic is not correct. What I am trying to do is to make the search insensitive in terms of diacritics. Users usually do not type diacritics when searching but some contents in my site contain these diacritics. So, I need to make MySQL considers “عَلِمَ”, “عِلْمٌ”, “علم” as one word.
I found a good article in this topic, but I am not sure how to access Index.xml file in phpMyAdmin to add the required custom collation. As per the official MySQL documentations, the configuration files should be located in character_sets_dir (/usr/share/percona-server/charsets/), but I do not know how to navigate to these directories.
Arabic Case Insensitive In Database Systems: How To Solve Alef With and Without Hamza Problem | by Ahmed Essam | Medium
MySQL :: MySQL 5.7 Reference Manual :: 10.14.3 Adding a Simple Collation to an 8-Bit Character Set
-
This reply was modified 3 years, 4 months ago by
aldiaa.
Thread Starter
aldiaa
(@aldiaa)
I am trying now to normalize both sides (MySQL and the search term written in the search textbox) but with no luck. This is my code in functions.php.
add_action('pre_get_posts','searchfilter');
function searchfilter($query) {
if ($query->is_search) {
get_custom_search_query($query);
}
return $query;
}
function get_custom_search_query($query){
$search_term = get_text_clear_of_diacritics($query->get('s'));
$sql = "SELECT * FROM _DDV_posts WHERE 1=1 ";
$sql .= "AND (((" . get_normalized_field(_DDV_posts.post_title);
$sql .= " LIKE '%" . $search_term . "%') ";
$sql .= "OR (" . get_normalized_field(_DDV_posts.post_content);
$sql .= " LIKE '%" . $search_term . "%'))) ";
$sql .= "AND _DDV_posts.post_type = 'post' ";
$sql .= "AND ((_DDV_posts.post_status = 'publish')) ";
$sql .= "ORDER BY _DDV_posts.post_date DESC ";
$query->get_results($sql);
//echo 'Last Query: ' . $query->request;
}
function get_text_clear_of_diacritics($original_text){
return preg_replace("~[\x{064B}-\x{065B}]~u", "", $original_text);
}
function get_normalized_field($column_name){
return 'REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(HEX(REPLACE('
. $column_name .
', "-", "")), "D98E", ""), "D98B", ""), "D98F", ""), "D98C",""),"D991",""),"D992",""),"D990",""),"D98D","")';
}
The result query is for some reason still showing diacritics.
SELECT SQL_CALC_FOUND_ROWS _DDV_posts.ID
FROM _DDV_posts
WHERE 1=1
AND (((_DDV_posts.post_title LIKE '%وقفاتٌ%')
OR (_DDV_posts.post_excerpt LIKE '%وقفاتٌ%')
OR (_DDV_posts.post_content LIKE '%وقفاتٌ%')))
AND ((_DDV_posts.post_type = 'post'
AND (_DDV_posts.post_status = 'publish'
OR _DDV_posts.post_status = 'private'))
OR (_DDV_posts.post_type = 'page'
AND (_DDV_posts.post_status = 'publish'
OR _DDV_posts.post_status = 'private'))
OR (_DDV_posts.post_type = 'attachment'
AND (_DDV_posts.post_status = 'publish'
OR _DDV_posts.post_status = 'private'))
OR (_DDV_posts.post_type = 'e-landing-page'
AND (_DDV_posts.post_status = 'publish'
OR _DDV_posts.post_status = 'private'))
OR (_DDV_posts.post_type = 'sr_playlist'
AND (_DDV_posts.post_status = 'publish'
OR _DDV_posts.post_status = 'private')))
ORDER BY _DDV_posts.post_title LIKE '%وقفاتٌ%' DESC, _DDV_posts.post_date DESC
LIMIT 0, 20
It seems that my custom query is not being executed at all, it is being overwritten by the default search query. Why?
Edit: if I try to use the global $wpdb, it is NULL.
-
This reply was modified 3 years, 4 months ago by
aldiaa.
-
This reply was modified 3 years, 4 months ago by
aldiaa.
I don’t know much about DB configuration. I’m not sure how much is safe to change. WP expects a certain configuration for certain aspects. Other aspects it doesn’t care about and you can do what ever you like. It may not be necessary to alter collation. You can use SQL wildcard characters to allow SQL to match any number of characters for one. There is more to SQL wildcards than the % we always see.
You could compose a search term similar to %ع[للْلِ]م%. I don’t know Arabic, I may have butchered that, but hopefully you get the concept anyway. It’ll take some effort to compose a data array of possible variations for each specific character, but it only needs to be done once. You then need code to replace each applicable character with a wildcard set of allowable substitutions.
I’m not sure why global $wpdb is null. It may be moot since you can manage the above all in “pre_get_posts”. Even if you need a custom SQL query, you still don’t need a valid $wpdb object. You can force WP_Query to utilize any SQL you want through the “posts_request” filter.
For the sake of more information, if all else failed, you could create your own wpdb class object, which would make a new connection to the DB for you to use. Or rely upon PHP mysqli*() functions to make DB queries.
Thread Starter
aldiaa
(@aldiaa)
Thank you @bcworkz, “posts_request” filter was the answer! I just had to fix some syntax issues in my SQL query (missing quotes) and use this magical filter 🙂
Note: I do not think using SQL wildcards would be a good idea. Each Arabic character has nine forms, so replacing each one by nine characters would result to a long ugly query. For example, “علم” is a short word but it will be converted to “[ععَعِعُعًعٍعٌعّعْ][للَلِلُلًلٍلٌلّلْ][ممَمِمُمًمٍمٌمّمْ]”. This might be easy to write dynamically through PHP, but I am not sure whether the performance would be affected.
Below is my complete code, just for reference.
add_filter("posts_request", "custom_search_filter", 10, 2);
function custom_search_filter($sql, $query) {
if ($query->is_main_query() && is_search()) {
$sql = get_custom_search_query($query);
}
return $sql;
}
function get_custom_search_query($query){
$search_term = get_text_clear_of_diacritics($query->get('s'));
$sql = "SELECT * FROM _DDV_posts WHERE 1=1 ";
$sql .= "AND (((" . get_normalized_field('_DDV_posts.post_title');
$sql .= " LIKE '%" . $search_term . "%') ";
$sql .= "OR (" . get_normalized_field('_DDV_posts.post_content');
$sql .= " LIKE '%" . $search_term . "%'))) ";
$sql .= "AND _DDV_posts.post_type = 'post' ";
$sql .= "AND ((_DDV_posts.post_status = 'publish')) ";
$sql .= "ORDER BY _DDV_posts.post_date DESC ";
return $sql;
}
function get_text_clear_of_diacritics($original_text){
return preg_replace("~[\x{064B}-\x{065B}]~u", "", $original_text);
}
function get_normalized_field($column_name){
return 'REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE('
. $column_name .
', "D98E", ""), "D98B", ""), "D98F", ""), "D98C",""),"D991",""),"D992",""),"D990",""),"D98D",""),"ّ","")';
}
-
This reply was modified 3 years, 4 months ago by
aldiaa.