14/01/2018

The archive contains older posts which may no longer reflect my current views.

# I think I'm actually starting to get the hang of regex replacements. I know, I'm shocked!

I have been using a preg_replace function to convert hashtags into clickable search links for nearly a year now. I started off with an existing plugin but soon reworked it into a much smaller function in functions.php.

I noticed, however, that it would not convert a hashtag at the start of a line. I tried numerous variants of what I already had but nothing worked, so I decided to trash it and start from scratch.

I needed to capture a string starting with a hash up to (but not including) certain characters like a space, period, quotes, and brackets.

If I had a hash written as its ASCII character code (&#35;) it shouldn't be affected, so I needed to ensure it didn't get replaced if preceded by an ampersand.

And I also wanted to catch only stand-alone hashtags, so they shouldn't be replaced if there is text immediately before them rather than a space.

Because a hash is the Markdown header character there is the possibility I might use it in a code block, so I couldn't replace it if it was immediately followed by a space rather than text. I also wanted to be able to use it in other places within code, so I needed a few more exclusions.

Not much then!
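The requirements above can be pinned down as a handful of test cases before any pattern is written. This is only a sketch in Python (the original function is PHP); the example strings, `CASES` and `failures` are made up for illustration, and the "few more exclusions" are left out because they aren't spelled out.

```python
import re

# (text, should a hashtag in it become a link?) taken from the requirements above.
CASES = [
    ("#regex at the start of a line", True),
    ("learning #regex today", True),
    ("&#35; written as an entity", False),
    ("word#anchor", False),
    ("# Markdown header", False),
]


def failures(pattern):
    """Return the texts where the pattern's verdict differs from the expected one."""
    return [
        text
        for text, expected in CASES
        if (pattern.search(text) is not None) != expected
    ]


# A naive first attempt: a hash followed by word characters.
naive = re.compile(r'#\w+')
print(failures(naive))  # ['&#35; written as an entity', 'word#anchor']
```

The naive pattern already copes with the start of a line and the Markdown header, but it trips over the entity and the embedded hash, which is where the lookbehind comes in.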

After a bit of reading and some tests it finally clicked what I had to do: the pattern had to be a combination of negative lookahead and lookbehind assertions followed by a non-greedy capture (.*?) through to a range of specific characters.

I ended up with this:

'/((?<!&|\)|\||[a-z])#(?!\s|#|\*|\$|^[a-z]).*?)([^\s|"|\)|^\.<]+)/i'

(Oh the fun I’ve had getting that to display properly.)
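To try the pattern outside WordPress, here is a rough Python sketch. The pattern itself is copied verbatim (minus the PHP delimiters, with the `i` flag becoming `re.IGNORECASE`) and split into pieces only so each part can carry a comment. The replacement markup and the `/?s=` search URL are assumptions, since the post doesn't show the replacement string, and `link_hashtags` is a hypothetical name.

```python
import re

# Adjacent string literals are joined by Python, so this is the same single pattern.
HASHTAG = re.compile(
    r'((?<!&|\)|\||[a-z])'    # group 1 opens; not preceded by &, ), | or a letter
    r'#'                      # the hash itself
    r'(?!\s|#|\*|\$|^[a-z])'  # not followed by whitespace, #, * or $
                              # (the ^[a-z] branch is an anchor here, so it never fires)
    r'.*?)'                   # non-greedy filler; group 1 closes
    r'([^\s|"|\)|^\.<]+)',    # group 2: run up to whitespace, |, ", ), ^, . or <
    re.IGNORECASE,
)


def link_hashtags(text: str) -> str:
    """Wrap each stand-alone hashtag in a search link (hypothetical markup and URL)."""
    return HASHTAG.sub(r'<a href="/?s=%23\2">\1\2</a>', text)


print(link_hashtags("#regex is fun"))
print(link_hashtags("&#35; and # Heading and word#anchor are left alone"))
```

The first call wraps `#regex` in a link; the second string comes back unchanged because every hash in it is ruled out by one of the assertions.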

The hash is surrounded by the negative lookbehind (?<!) and lookahead (?!) assertions, ensuring that it is not caught when preceded or followed by certain characters.
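The key property of these assertions is that they check the neighbouring characters without consuming them. A cut-down Python sketch with just the two lookarounds shows this; `GUARDED_HASH` and `has_linkable_hash` are illustrative names, and the pattern is a simplification rather than the one above.

```python
import re

# Only the guards: not preceded by & or a letter, not followed by whitespace.
GUARDED_HASH = re.compile(r'(?<![&a-z])#(?!\s)', re.IGNORECASE)


def has_linkable_hash(text: str) -> bool:
    """True if the text contains a hash that passes both assertions."""
    return GUARDED_HASH.search(text) is not None


print(has_linkable_hash("see #tag"))    # True
print(has_linkable_hash("&#35;"))       # False: preceded by an ampersand
print(has_linkable_hash("# Heading"))   # False: followed by a space

# The assertions are zero-width: the match is the hash alone, not its neighbours.
print(GUARDED_HASH.search("see #tag").group(0))  # '#'
```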

We then have the non-greedy capture (.*?), which takes as few characters as it can, followed by a negated character class that runs up to (but not including) the first instance of one of the terminating characters.
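For anyone unsure what non-greedy buys you, this small Python comparison may help. It uses a deliberately simplified terminator set (whitespace or a period) rather than the full list, and the function names are made up for the example.

```python
import re


def lazy_tag(text: str) -> str:
    """Capture after the hash, stopping at the FIRST whitespace or period."""
    return re.search(r'#(.*?)[\s.]', text).group(1)


def greedy_tag(text: str) -> str:
    """Same pattern without the ?, so it runs on to the LAST whitespace or period."""
    return re.search(r'#(.*)[\s.]', text).group(1)


def class_tag(text: str) -> str:
    """A negated character class stops at the first terminator with no laziness needed."""
    return re.search(r'#([^\s.]+)', text).group(1)


text = "see #regex. and more."
print(lazy_tag(text))    # regex
print(greedy_tag(text))  # regex. and more
print(class_tag(text))   # regex
```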

Phew!


# Something in the house is intermittently tripping the circuit breaker for the plug sockets so now we have the fun task of isolating everything in turn to find out what it is. Seeing as it can take hours to happen it's going to be a long day.
