I think I'm actually starting to get the hang of regex replacements. I know, I'm shocked!
I have been using a preg_replace function to convert hashtags into clickable search links for nearly a year now. I started off with an existing plugin but soon reworked it into a much smaller function in
I noticed, however, that it would not convert a hashtag at the start of a line. I tried numerous variants of what I already had but nothing worked, so I decided to trash it and start from scratch.
I needed to capture a string starting with a hash up to (but not including) certain characters like a space, period, quotes, and brackets.
If I had a hash written as its ASCII value (&#35;) it shouldn't be affected, so I needed to ensure it didn't get replaced if preceded by an ampersand.
I also wanted to catch only stand-alone hashtags, so they shouldn't be replaced if there is text immediately before them rather than a space.
Because a hash is the Markdown header character, there is the possibility I might use it in a code block, so it couldn't be replaced if immediately followed by a space rather than text. I also wanted to ensure that I could use it in other places within code, so I needed a few more exclusions.
Not much then!
After a bit of reading and some tests it finally clicked what I had to do: the pattern had to be a combination of negative lookahead and lookbehind assertions followed by a non-greedy capture (.*?) through to a range of specific characters.
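To see the two assertions in isolation, here is a minimal PHP sketch. The sample string and the cut-down pattern are made up for illustration and are far simpler than the full pattern. It only shows that lookarounds check the neighbouring characters without consuming them, which is why a hash at the very start of a line can still match.

```php
<?php
// Hypothetical sample text, not taken from the real site.
$text = "#start of line, a#b, &#35; entity, # heading, and #tag.";

// (?<![&a-z])  negative lookbehind: no ampersand or letter directly before the hash
// (?![\s#])    negative lookahead: no whitespace or second hash directly after it
// Neither assertion consumes characters, so a hash at offset 0 still matches.
preg_match_all('/(?<![&a-z])#(?![\s#])/i', $text, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is [matched text, byte offset]; keep the offsets.
$offsets = array_column($matches[0], 1);

// Only the hashes of "#start" (offset 0) and "#tag" (offset 50) pass both checks.
print_r($offsets);
```

The hashes in "a#b", "&#35;" and "# heading" are skipped by the lookbehind or the lookahead, without any of the surrounding characters becoming part of a match.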
I ended up with this:
'/((?<!&|\)|\||[a-z])# (?!\s|#|\*|\$|^[a-z]).*?) ([^\s|^"|^\)|^\.<]+)/i'
(Oh the fun I’ve had getting that to display properly.)
The hash is surrounded by the negative lookbehind (?<!) and lookahead (?!) assertions, ensuring that it is not caught when preceded or followed by certain characters.
We then have the non-greedy capture (.*?), which takes as few characters as possible, followed by a negated character class ([^...]+) that collects the rest of the tag up to the first instance of one of the terminating characters.
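Putting it together, here is a sketch of how the pattern might be used with preg_replace. It rests on a few assumptions: the spaces that appear after the hash and before the final group in the pattern as displayed are taken to be there only for display, so they are removed here; the function name link_hashtags and the /?s= search URL are hypothetical, since the replacement string isn't shown above.

```php
<?php
// Pattern from the post with the display spaces removed (an assumption).
const HASHTAG_PATTERN = '/((?<!&|\)|\||[a-z])#(?!\s|#|\*|\$|^[a-z]).*?)([^\s|^"|^\)|^\.<]+)/i';

// Hypothetical helper: wraps each stand-alone hashtag in a search link.
// $1 is the hash (plus anything the lazy .*? picked up), $2 is the tag text.
function link_hashtags(string $text): string
{
    // "/?s=" is a placeholder search URL; %23 is the URL-encoded hash.
    return preg_replace(HASHTAG_PATTERN, '<a href="/?s=%23$2">$1$2</a>', $text);
}

$samples = [
    '#regex at the start of a line',  // converted
    'Loving #php.',                   // converted, full stop left outside the link
    '&#35; as an entity',             // left alone: ampersand before the hash
    '# Markdown heading',             // left alone: space after the hash
    'abc#def',                        // left alone: text before the hash
];

foreach ($samples as $sample) {
    echo link_hashtags($sample), PHP_EOL;
}
```

Running a handful of strings like these through the function is a quick way to check each exclusion one at a time whenever the pattern changes.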