I think I'm actually starting to get the hang of regex replacements. I know, I'm shocked!

I have been using a preg_replace function to convert hashtags into clickable search links for nearly a year now. I started off with an existing plugin but soon reworked it into a much smaller function in functions.php.

I noticed, however, that it wouldn't convert a hashtag at the start of a line. I tried numerous variants of what I already had, but nothing worked, so I decided to trash it and start from scratch.

I needed to capture a string starting with a hash up to (but not including) certain characters like a space, period, quotes, and brackets.
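As a starting point, here's a minimal sketch of just that capture, with none of the exclusions yet. The function name find_hashtags_basic is hypothetical. The exact set of terminating characters is my assumption based on the list above: whitespace, full stop, single and double quotes, and round or square brackets.

```php
<?php
// Hypothetical helper: capture the text after each hash, stopping at the first
// terminating character. Assumed terminators: whitespace, full stop, single
// and double quotes, and round or square brackets. No exclusions yet.
function find_hashtags_basic(string $text): array
{
    preg_match_all('/#([^\s.\'"()\[\]]+)/', $text, $matches);
    return $matches[1]; // group 1 is the tag text without the hash
}

print_r(find_hashtags_basic('Loving #regex. Also (#php) and "#wordpress"'));
```

On its own this would also grab the "35;" in &#35;, because none of those characters are terminators.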

If I had a hash written as an HTML entity (&#35;), it mustn't be touched, so I needed to ensure it didn't get replaced if preceded by an ampersand.
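A negative lookbehind is one way to express "not preceded by an ampersand". Here's a minimal sketch of that one rule on its own. The function name is hypothetical and the terminating characters are an assumed set.

```php
<?php
// (?<!&) is a negative lookbehind: the hash only matches if the character
// directly before it is NOT an ampersand, so &#35; is left alone.
// Hypothetical helper; assumed terminators as in the character class.
function find_tags_skipping_entities(string $text): array
{
    preg_match_all('/(?<!&)#([^\s.\'"()\[\]]+)/', $text, $matches);
    return $matches[1];
}

print_r(find_tags_skipping_entities('Tagged #php but not &#35;'));
```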

I also wanted to catch only stand-alone hashtags, so a hash shouldn't be replaced if there is text immediately before it rather than a space.
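A sketch of the stand-alone rule, again as a negative lookbehind, this time on letters (with the /i flag so upper case counts too). It also shows why a lookbehind suits the start-of-line case. There is no character before the first hash, so there is nothing for the check to reject. The function name and terminator set are hypothetical.

```php
<?php
// (?<![a-z]) with the /i flag: the hash only matches if it is NOT directly
// preceded by a letter, so "page#anchor" is skipped. A hash at the very start
// of the text still matches, as there is no character before it to reject.
function find_standalone_tags(string $text): array
{
    preg_match_all('/(?<![a-z])#([^\s.\'"()\[\]]+)/i', $text, $matches);
    return $matches[1];
}

print_r(find_standalone_tags('#first then page#anchor and #last'));
```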

Because a hash is also the Markdown header character, I might use it in a code block, so it couldn't be replaced if immediately followed by a space rather than text. I also wanted to be able to use it in other places within code, so I needed a few more exclusions.
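The heading rule is the mirror image: a negative lookahead on what comes after the hash. This minimal sketch rejects a hash followed by whitespace or another hash, which covers "# Title" and "## Section". The function name and terminator set are hypothetical.

```php
<?php
// (?!\s|#) is a negative lookahead: the hash only matches if the next
// character is NOT whitespace or another hash, so Markdown headings such as
// "# Title" and "## Section" are left alone.
function find_tags_not_headings(string $text): array
{
    preg_match_all('/#(?!\s|#)([^\s.\'"()\[\]]+)/', $text, $matches);
    return $matches[1];
}

print_r(find_tags_not_headings("# Title\n## Section\n#tag"));
```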

Not much then!

After a bit of reading and some tests, it finally clicked: the pattern needed a combination of negative lookbehind and lookahead assertions, followed by a non-greedy capture (.*?) through to a range of specific characters.

I ended up with this:

'/((?<!&|\)|\||[a-z])#(?!\s|#|\*|\$|^[a-z]).*?)([^\s|^"|^\)|^\.<]+)/i'
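Here's a minimal sketch of the pattern in use, joined onto a single line and dropped into preg_replace. Group 1 is the hash and group 2 is the tag text. The function name link_hashtags and the link format are my assumptions for illustration, not necessarily what my functions.php does. The link format is /?s=%23tag, which is WordPress's default search query with the hash URL-encoded.

```php
<?php
// The hashtag pattern on one line. Group 1: the hash (plus anything the lazy
// .*? had to absorb). Group 2: the tag text.
const HASHTAG_PATTERN = '/((?<!&|\)|\||[a-z])#(?!\s|#|\*|\$|^[a-z]).*?)([^\s|^"|^\)|^\.<]+)/i';

// Hypothetical helper: the "/?s=%23..." search URL is an assumption.
function link_hashtags(string $text): string
{
    return preg_replace(
        HASHTAG_PATTERN,
        '<a href="/?s=%23$2">$1$2</a>', // $1/$2 are backreferences, not PHP variables
        $text
    );
}

echo link_hashtags('#regex at the start and #php. Not &#35; or C#');
```

The lookbehind only rejects specific preceding characters, so a hashtag at the very start of the text is converted too.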

(Oh the fun I’ve had getting that to display properly.)

The hash is surrounded by the negative lookbehind (?<!) and lookahead (?!) assertions, ensuring that it isn't matched when preceded or followed by certain characters.
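To see exactly which neighbours the full pattern rejects, here's a small sketch that runs it against one character placed before or after the hash. The helper name is_hashtag is hypothetical. One observation: the ^[a-z] alternative inside the lookahead can only match at the very start of the subject, so directly after a hash it never applies.

```php
<?php
// Lookbehind rejects a hash preceded by &, ), | or a letter.
// Lookahead rejects a hash followed by whitespace, #, * or $.
const HASHTAG_PATTERN = '/((?<!&|\)|\||[a-z])#(?!\s|#|\*|\$|^[a-z]).*?)([^\s|^"|^\)|^\.<]+)/i';

// Hypothetical helper: true if the pattern finds a hashtag in $text.
function is_hashtag(string $text): bool
{
    return preg_match(HASHTAG_PATTERN, $text) === 1;
}

foreach (['#tag', '&#tag', ')#tag', '|#tag', 'a#tag', '# tag', '#*tag', '#$tag'] as $sample) {
    printf("%-6s %s\n", $sample, is_hashtag($sample) ? 'linked' : 'skipped');
}
```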

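We then have the non-greedy (.*?), which matches as few characters as possible (usually none at all). The real work is done by the second group, the negated character class [^\s|^"|^\)|^\.<]+, which grabs every following character up to the first terminating character: whitespace, a double quote, a closing bracket, a full stop or <. (Inside square brackets, | and ^ are literal characters rather than "or" and "not", so they end the tag too.)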
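A quick way to check what each group actually captures is preg_match. In this sketch (split_hashtag is a hypothetical name), the lazy .*? leaves group 1 as just the hash, and a | inside the character class ends the tag like any other terminator.

```php
<?php
const HASHTAG_PATTERN = '/((?<!&|\)|\||[a-z])#(?!\s|#|\*|\$|^[a-z]).*?)([^\s|^"|^\)|^\.<]+)/i';

// Hypothetical helper: returns [group 1, group 2] for the first hashtag
// found, or an empty array if there is none.
function split_hashtag(string $text): array
{
    if (preg_match(HASHTAG_PATTERN, $text, $m) !== 1) {
        return [];
    }
    return [$m[1], $m[2]];
}

var_dump(split_hashtag('Try #regex.')); // group 1 is "#", group 2 is "regex"
var_dump(split_hashtag('Try #a|b'));    // "|" is literal inside [...], so group 2 is "a"
```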

Phew!