# I think I'm actually starting to get the hang of regex replacements. I know, I'm shocked!
I have been using a preg_replace function to convert hashtags into clickable search links for nearly a year now. I started off with an existing plugin but soon reworked it into a much smaller function in functions.php.
I noticed, however, that it would not convert a hashtag at the start of a line. I tried numerous variants of what I already had, but nothing worked, so I decided to trash it and start from scratch.
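The original pattern isn't reproduced here, so the following is only a hypothetical sketch of how this kind of bug can happen. It assumes the function runs on rendered HTML, where a hashtag that opens a paragraph sits right after a tag rather than after a space. Neither pattern is the original.

```php
<?php
// Hypothetical illustration only: neither pattern is the original.
// Assumes the input is rendered HTML, so a hashtag that opens a paragraph
// follows a tag ("<p>#monday") rather than a space.
$html = "<p>#monday starts here</p>\n<p>and #tuesday follows</p>";

// Naive: demands whitespace before the hash, so "<p>#monday" is skipped.
preg_match_all('/\s#(\w+)/', $html, $naive);

// Lookbehind: only forbids a word character or '&' before the hash,
// so a preceding '>' (or the very start of the string) is allowed.
preg_match_all('/(?<![\w&])#(\w+)/', $html, $fixed);

echo implode(', ', $naive[1]), "\n"; // tuesday
echo implode(', ', $fixed[1]), "\n"; // monday, tuesday
```

Swapping a required leading space for a negative lookbehind is what lets the start of a line through: the lookbehind only says what must not come before the hash, so having nothing there at all counts as a pass.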
I needed to capture a string starting with a hash up to (but not including) certain characters like a space, period, quotes, and brackets.
If I had a hash written as its ASCII value (&#35;), it had to be left alone, so I needed to ensure a hash didn't get replaced if preceded by an ampersand.
I also wanted to catch only stand-alone hashtags, so a hash shouldn't be replaced if it is immediately preceded by text rather than a space.
Because a hash is the Markdown heading character, I might use it in a code block, so I couldn't replace it if it was immediately followed by a space rather than text. I also wanted to be able to use it in other places within code, so I needed a few more exclusions.
Not much then!
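Each of those requirements can be expressed as a zero-width assertion around the hash. Here's a hypothetical sketch that tests them in isolation; the exact character sets are assumptions for illustration, not the pattern used on this blog.

```php
<?php
// Hypothetical sketch of the exclusions; the character sets are assumptions.
//   (?<![\w&])  not preceded by a word character (stand-alone hashtags only)
//               or by '&' (so an entity such as &#35; is left alone)
//   (?!\s)      not followed by whitespace (so a Markdown "# Heading" is left alone)
$hash = '/(?<![\w&])#(?!\s)/';

$samples = [
    '#coffee to start' => true,  // stand-alone, start of string
    'I like #coffee'   => true,  // stand-alone, after a space
    'coffee&#35;tea'   => false, // part of an HTML entity
    'abc#coffee'       => false, // text immediately before the hash
    '# Heading'        => false, // hash followed by a space
];

foreach ($samples as $text => $expected) {
    $linked = preg_match($hash, $text) === 1;
    printf("%-18s %s\n", $text, $linked ? 'link' : 'leave');
}
```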
After a bit of reading and some tests, it finally clicked what I had to do: the pattern had to be a combination of negative lookbehind and lookahead assertions followed by a non-greedy capture (.*?) through to a range of specific characters.
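Why non-greedy matters is easiest to see with two hashtags in one paragraph. This is a hypothetical comparison using an assumed terminator set (whitespace, period, comma, quotes, brackets and '<'):

```php
<?php
// Hypothetical demo; the terminator set [\s.,'"()\[\]<] is an assumption.
$text = '<p>Loving #regex. Also #php, obviously.</p>';

// Greedy (.*): runs on as far as it can (here, to the end of the string,
// which the |$ alternative accepts as a terminator).
preg_match('/#(.*)(?=[\s.,\'"()\[\]<]|$)/', $text, $greedy);

// Non-greedy (.*?): stops at the FIRST terminator it reaches.
preg_match('/#(.*?)(?=[\s.,\'"()\[\]<]|$)/', $text, $lazy);

echo $greedy[1], "\n"; // regex. Also #php, obviously.</p>
echo $lazy[1], "\n";   // regex
```

The greedy version swallows the second hashtag and the closing tag; the non-greedy one stops at the period straight after "regex".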
I ended up with this:
'/((?"|\)|^\.<]+)/i'
(Oh the fun I’ve had getting that to display properly.)
The hash is surrounded by negative lookbehind (?<!) and negative lookahead (?!) assertions, which ensure it isn't caught when preceded or followed by certain characters.
We then have the non-greedy capture (.*?), which grabs all the following characters up to the first instance (hence non-greedy) of one of the terminating characters.
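Putting the pieces together, here's a hypothetical reconstruction that follows the description above: lookbehind and lookahead around the hash, then a non-greedy capture up to a terminator. The exclusion and terminator sets, the link_hashtags function name, and the /?s= search URL (WordPress's default search query string) are all assumptions rather than the actual code.

```php
<?php
// Hypothetical reconstruction; the character sets, the function name and the
// "/?s=" search URL are assumptions, not the original code.
//   (?<![\w&"'])   no word char, '&' (entities) or quote (href="#top") before
//   #(?!\s)        a hash not followed by whitespace (Markdown headings)
//   (.*?)          non-greedy capture of the tag text...
//   (?=[...]|$)    ...up to the first terminator or the end of the string
const HASHTAG_PATTERN = '/(?<![\w&"\'])#(?!\s)(.*?)(?=[\s.,\'"()\[\]<]|$)/';

function link_hashtags(string $html): string
{
    $result = preg_replace_callback(
        HASHTAG_PATTERN,
        function (array $m): string {
            if ($m[1] === '') {
                return $m[0]; // e.g. "#." captured nothing: leave it as-is
            }
            // rawurlencode output is safe to drop straight into an attribute
            $url = '/?s=' . rawurlencode('#' . $m[1]);
            return '<a href="' . $url . '">#' . $m[1] . '</a>';
        },
        $html
    );
    return $result ?? $html; // preg_* functions return null on error
}

echo link_hashtags('<p>#monday blues. See &#35;1 and <a href="#top">top</a>.</p>'), "\n";
```

In WordPress, a function like this would typically be hooked with add_filter('the_content', 'link_hashtags') so it only changes the displayed output. It doesn't try to cover every case found in markup (an inline style such as color: #fff would still be linked), which is exactly where extra exclusions come in.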
Phew!
I had a mad moment of panic, thinking that the blog had been hacked or suffered a code injection, when I noticed that what should have been a link had been replaced with a seemingly random string of characters. I jumped into the post in wp-admin to see what was happening and realised that it was due to my new regex pattern for hashtag replacement. It is more aggressive than the previous pattern, so that it catches things the old one missed, but this means I have had to add more exclusions. One exclusion I forgot was a preceding forward slash, meaning any link with a fragment, such as links to comments (url/#fragment), would have the fragment replaced, thus breaking the link. Fixed! I'll probably find more instances of overly aggressive hash replacement as we go but, for now, it's panic over.
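A hypothetical before-and-after for that missing exclusion, using an assumed pattern (not the actual one) and a placeholder example.com URL. Adding / to the lookbehind's character class is enough to leave fragment links alone.

```php
<?php
// Hypothetical before/after; the patterns are assumed, not the original ones,
// and example.com is a placeholder.
$before = '/(?<![\w&"\'])#(?!\s)(.*?)(?=[\s.,\'"()\[\]<]|$)/';
$after  = '/(?<![\w&"\'\/])#(?!\s)(.*?)(?=[\s.,\'"()\[\]<]|$)/'; // '/' added

$html    = '<a href="https://example.com/post/#comment-12">reply</a>';
$replace = '<a href="/?s=%23$1">#$1</a>';

echo preg_replace($before, $replace, $html), "\n"; // fragment becomes a nested link
echo preg_replace($after, $replace, $html), "\n";  // original link left intact
```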
Sometimes I feel quite clever, only for it to be demonstrated that I might have been overthinking things. Smokey recently produced a function to automatically convert @-mentions to clickable links for micro.blog users. His regex pattern is simple and it works. He takes a different approach from my hashtag linking function in that his is applied pre-save, changing the actual post content, whereas mine only affects the output when viewed. It's crazy how performing the same action at two different times (save vs display) requires solutions of such different complexity. Note: his pattern doesn't work at display time. I'm sticking with my method, as I've only ever wanted hashtags to be a feature when viewing the blog itself rather than when distributed via RSS, but it's been a good lesson.