Simple PHP code problem...?

Cap'n Refsmmat · July 7, 2005

I've been fooling around with a bit of code lately. It's supposed to convert links like [[link]] in a text to a link like http://www.example.com/index.php?title=link or whatever. Like a Wikipedia link, really.

Here it is:

while($linkpos = strpos($article_text, "[[", $i) && ($linkpos != FALSE))
{
$linkendpos = strpos($article_text, "]]", $linkpos + 1);
$link = substr($article_text, $linkpos + 2, ($linkendpos - 1) - ($linkpos + 2));
$newlink = '<a href="' . $site_url . 'index.php?title="' . $link . '" class="interlink">' . $link . '</a>';
$article_text = substr($article_text, 0, $linkpos - 1) . substr($article_text, $linkpos - 1, ($linkendpos + 2) - ($linkpos - 1)) . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));
$i = $linkendpos + 2;
}

Of course, it doesn't work. The text comes out unchanged.

At the moment, $article_text is the text that is being parsed. $site_url is the url of whatever the site is.

Anyways, you're probably thinking "What was he thinking when he wrote that code?" or something like that. Well, I'm not that great at PHP. I've never had to do regex or string manipulation before, really. I'm living off of the book until I can get enough experience. And I don't have that much, at the moment.

Any help would be appreciated. Thanks!

Dave · July 7, 2005

I'd say that, in this instance, regex is definitely the way to go. No while loops or anything, and it can be done with a one-liner:

preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'"index.php?title=\\1">Linkage</a>', $article_text);

Something in that order anyway. There's an excellent tutorial on regexps here.

Cap'n Refsmmat · July 7, 2005

Dang, you're good.

The book I have just doesn't do regex justice. I haven't had much need for it either.

I'll test that out in a minute.

Cap'n Refsmmat · July 7, 2005

No luck.

It still comes out unchanged.

I'm using the url [[blah]].

Aeternus · July 7, 2005

Just to detail what the original problem might have been, shouldn't the line -

$article_text = substr($article_text, 0, $linkpos - 1) . substr($article_text, $linkpos - 1, ($linkendpos + 2) - ($linkpos - 1)) . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));

have been something along the lines of -

$article_text = substr($article_text, 0, $linkpos - 1) . $newlink . substr($article_text, $linkendpos + 2, strlen($article_text) - ($linkendpos + 2));

Thereby placing the new link in the place of the old [[Link]].
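For anyone who wants the loop version anyway, here is a rough sketch of how it might be repaired. Besides the missing $newlink, two other things look suspect in the original: `&&` binds tighter than `=`, so `$linkpos` ends up holding a boolean rather than a position, and `!= FALSE` would also reject a match at position 0. The offsets are adjusted too, since strpos() already returns the index of the first `[`. This assumes PHP 7 or later; the function name convert_wiki_links and the urlencode()/htmlspecialchars() calls are additions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper; the original code ran inline rather than in a function.
function convert_wiki_links(string $article_text, string $site_url): string
{
    $i = 0;
    // Parenthesise the assignment, and compare with !== so that a match
    // at position 0 is not mistaken for "not found".
    while (($linkpos = strpos($article_text, '[[', $i)) !== false) {
        $linkendpos = strpos($article_text, ']]', $linkpos + 2);
        if ($linkendpos === false) {
            break; // unclosed [[ : leave the rest of the text alone
        }
        // Text between the brackets.
        $link = substr($article_text, $linkpos + 2, $linkendpos - ($linkpos + 2));
        // Escaping is an addition here, not something the thread asked for.
        $newlink = '<a href="' . $site_url . 'index.php?title=' . urlencode($link)
            . '" class="interlink">' . htmlspecialchars($link) . '</a>';
        // Everything before [[, then the new link, then everything after ]].
        $article_text = substr($article_text, 0, $linkpos)
            . $newlink
            . substr($article_text, $linkendpos + 2);
        // Continue scanning just past the link that was inserted.
        $i = $linkpos + strlen($newlink);
    }
    return $article_text;
}

echo convert_wiki_links('See [[blah]] now.', 'http://www.example.com/'), "\n";
```

The regex one-liner is still shorter, but this shows where each offset comes from.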

Agreed with Dave, regexps are much nicer and make a lot of sense once you get to know them. To be honest, his regexp there is easier to read than the while code made to do the same thing (not saying your code is nasty, just that the regexp is short and sweet).

Cap'n Refsmmat · July 7, 2005

Indeed. But to no avail, it seems. Still doesn't work.

I'll read up on regex and see what I can do. Thanks for the help, people.

Cap'n Refsmmat · July 7, 2005

Never mind that, Dave. I forgot to add $article_text = in the front. Although now the text it returns is completely blank... argh.

edit: anything comes out blank with your preg_replace in it, now that it actually stores the result to $article_text. Had to comment that out for the moment.

I have to give you people some credit; nobody has even replied on WebHostingTalk.

Dave · July 7, 2005

Crap, sorry - the PHP should read:

$article_text = preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'"index.php?title=\\1">Linkage</a>', $article_text);

It might have been doing the regexp replace, but it wasn't assigning the result to $article_text.

Cap'n Refsmmat · July 7, 2005

Well then. That works (I had accidentally removed something from the original one as well, so it all came out blank) but it links right back to index.php? rather than index.php?title=blah.

Hmmm...

Cap'n Refsmmat · July 7, 2005

Aha! You had an extra quotation mark there.

$article_text = preg_replace('/\[\[(.*?)\]\]/', '<a href="'.$site_url.'index.php?title=\\1">Linkage</a>', $article_text);

works. (You had added an extra " after $site_url.')
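As a quick self-contained check of the corrected line, something like the following could be run on its own. The value of $site_url is an assumed placeholder, and the link text uses \\1 instead of the fixed word "Linkage" so that the page title shows up in the link; that substitution is a variation, not what was posted.

```php
<?php
$site_url = 'http://www.example.com/'; // assumed value, for illustration only
$article_text = "I'm using the url [[blah]].";

// Non-greedy (.*?) stops at the first ]] so two links on one line stay separate.
$article_text = preg_replace(
    '/\[\[(.*?)\]\]/',
    '<a href="' . $site_url . 'index.php?title=\\1">\\1</a>',
    $article_text
);

echo $article_text, "\n";
```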

Thanks!

Now, to work on un-parsing it for when you edit the page again...

Dave · July 7, 2005

Well, store the un-parsed version in the db, and then parse it every time you want it displayed. That's the simplest way.
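A minimal sketch of that idea, with a plain array standing in for the database table (the $pages array, the render_page name, and the example URL are all made up for illustration): the raw text is what gets saved and what goes back into the edit box, and the parsed HTML is only ever produced on the way out.

```php
<?php
// Stand-in for the database: page title => raw wiki text, exactly as typed.
$pages = ['Home' => 'Welcome. See [[About]] for details.'];

// Hypothetical display helper: parses the stored raw text on every view.
function render_page(array $pages, string $title, string $site_url): string
{
    return preg_replace(
        '/\[\[(.*?)\]\]/',
        '<a href="' . $site_url . 'index.php?title=\\1">\\1</a>',
        $pages[$title]
    );
}

$html = render_page($pages, 'Home', 'http://www.example.com/'); // for viewing
$edit_box = $pages['Home'];                                      // for editing, untouched

echo $html, "\n", $edit_box, "\n";
```

With this layout there is nothing to un-parse: the edit form just reads the stored text back.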

Cap'n Refsmmat · July 7, 2005

But that's slower to display.

I'll fool around and see what I can do.

edit: sigh. I guess nothing beats the easy way...

Well, thanks a lot for the help, folks!

Dave · July 7, 2005

Well, yes. But preg_replace is stupidly fast, especially for something as simple as that regexp.

Cap'n Refsmmat · July 7, 2005

I'll be needing preg, too--bbcode and all that fun stuff.

Is ereg_replace just as fast? That's all this book covers.

(maybe I should buy one of those "Regular Expressions in a Nutshell" books)

edit: dang. Doesn't look like it is. I hate this book.

radiohead · July 7, 2005

I don't see why people always insist on buying books when ebooks are free.

http://www.radiohead.is-a-geek.org/etc/ebooks

I have PHP books in there if you do not like your book. Sorry I couldn't help you sooner, I just saw this post.

Cap'n Refsmmat · July 7, 2005

If you had a book on PCRE syntax I'd be fine. (PCRE = Perl Compatible Regular Expression syntax. preg stuff.)

Dave · July 7, 2005

Nice collection of eBooks there. I may have to steal a few to add to my collection.

Cap'n Refsmmat · July 8, 2005

More woes from regexp land...

I have this code:

$article_text = eregi_replace( "[[:<:]]((http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\.\,_\?\'/\\\+&%\$#\=~;])*)[[:>:]]",  '<a href="\\1">\\1</a>', $article_text);

to match URLs and convert them to nice neat <a href=""></a> tags. Now I'm trying to add [img] tags and proper [url] tags. How would I go about making this not parse the url if the url is already enclosed in tags?

Aeternus · July 8, 2005

Something that might be simpler: use the regex you already have from Dave to replace every instance with a symbol or string that is unique and unlikely to appear in the text, storing the originals in an array (using something like preg_match). Then run a simple regex that finds bare links and replaces them with the link markup. Finally, go back through the article and swap each special symbol for the original content stored in the array (either by finding it with strpos() and a bit of substr(), or with a slightly more involved preg_replace).
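The placeholder idea described above could be sketched roughly like this. It assumes PHP 7 or later; the function name, the NUL-delimited token format, and the simplified URL pattern are all choices made for illustration, and it protects existing <a> and <img> tags rather than [[...]] links, since that is the case being asked about.

```php
<?php
// Hypothetical helper: link bare URLs, but leave existing <a> and <img> tags alone.
function autolink_outside_tags(string $text): string
{
    $stash = [];

    // 1. Swap each existing <a ...>...</a> or <img ...> for a unique token
    //    and remember what it stood for.
    $text = preg_replace_callback(
        '~<a\b[^>]*>.*?</a>|<img\b[^>]*>~is',
        function (array $m) use (&$stash): string {
            $token = "\x00" . count($stash) . "\x00"; // NUL bytes: unlikely in article text
            $stash[$token] = $m[0];
            return $token;
        },
        $text
    );

    // 2. Link whatever bare URLs are left (deliberately simplified pattern).
    $text = preg_replace(
        '~\b(?:https?|ftp)://[^\s<>"\x00]+~i',
        '<a href="$0">$0</a>',
        $text
    );

    // 3. Put the stashed tags back in place of their tokens.
    return strtr($text, $stash);
}

echo autolink_outside_tags(
    'Visit http://example.com/a and <a href="http://example.com/b">b</a>'
), "\n";
```

One known rough edge: the simplified URL pattern will swallow a trailing full stop or comma, so it would need tightening before real use.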

A good regex tutorial can be found here.

Cap'n Refsmmat · July 8, 2005

Yes, indeed. I already do that with [nobb] tags so the text inside isn't parsed like the rest of it. I can just modify that code to do it for URLs, if necessary. But I think it would be easier to just make the thing not parse a url if it's already in an <a href=""> or <img src=""/> tag. A quick addition to the regex, I think.

Aeternus · July 8, 2005

To be honest, I'm not sure it's that simple. Adding additional things to match is fine, but negating some things while matching others at the same time becomes quite a bit more complicated. I tried doing what you said using negative lookbehind and lookahead, but it doesn't seem easy to get it to work properly in all situations.

I just can't find anything that simply requires that a string not be there for a match (this is hard to do: if the string can't be there, the regex will just match everything after the string instead). There may be an easy but inelegant way of doing it by negating specific characters, and I'll try a few things out when I get home, but I don't think it's as easy as it might seem.

Cap'n Refsmmat · July 8, 2005

$article_text = eregi_replace("\[url=([^\[]+)\]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\2</a>", $article_text);
$article_text = eregi_replace("\[url\]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\1</a>", $article_text);
$article_text = eregi_replace("\[img\]([^\[]+)\[/img\]","<img src=\"\\1\" border=\"0\" alt=\"user posted image\"/>", $article_text);

That seems to do it. Any potential problems you can see with it?
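One practical note for anyone reading this later: the ereg family (ereg_replace, eregi_replace) was deprecated in PHP 5.3 and removed in PHP 7, so on a current PHP these three lines need porting to preg_replace. A rough port might look like the following; the function name bbcode_to_html is made up, and the [url=...] pattern is tightened slightly so the address cannot contain a ].

```php
<?php
// Hypothetical wrapper around the three bbcode rules, ported to PCRE.
function bbcode_to_html(string $text): string
{
    // Pattern => replacement; the ~i flag replaces eregi's case-insensitivity.
    $rules = [
        '~\[url=([^\[\]]+)\]([^\[]+)\[/url\]~i' => '<a href="$1" target="_blank">$2</a>',
        '~\[url\]([^\[]+)\[/url\]~i'            => '<a href="$1" target="_blank">$1</a>',
        '~\[img\]([^\[]+)\[/img\]~i'            => '<img src="$1" border="0" alt="user posted image"/>',
    ];
    // preg_replace applies the patterns in order when given arrays.
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

echo bbcode_to_html('[url=http://example.com]Example[/url] [img]http://example.com/a.png[/img]'), "\n";
```

As for potential problems: nothing here checks what ends up inside href or src, so a javascript: address or a stray double quote would pass straight through. Validating the captured URL before building the tag would be worth adding.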

I love google... so easy to find free code.

(edit: it seems to have slowed my script down a bit. All this bbcode takes an effect. Let's see if it can be made more efficient...)

Cap'n Refsmmat · July 8, 2005

Would I be correct in saying that strtr() is faster than str_replace()? In my trials it seems so. In 1000 strtr or str_replace repetitions, strtr is faster by .01+ of a second every time.
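A timing comparison along those lines could be set up like this. The sample text, the replacement pairs, and the run count are arbitrary choices, and the result will depend on the PHP version and the input, so treat whatever it prints as a data point rather than a verdict.

```php
<?php
// Arbitrary sample input and replacement pairs, for illustration only.
$text  = str_repeat('The quick brown fox. ', 200);
$pairs = ['quick' => 'slow', 'brown' => 'red'];
$runs  = 1000;

// Time strtr() with an array of pairs.
$start = microtime(true);
for ($n = 0; $n < $runs; $n++) {
    $a = strtr($text, $pairs);
}
$strtr_time = microtime(true) - $start;

// Time str_replace() with the same pairs split into search/replace arrays.
$start = microtime(true);
for ($n = 0; $n < $runs; $n++) {
    $b = str_replace(array_keys($pairs), array_values($pairs), $text);
}
$str_replace_time = microtime(true) - $start;

printf("strtr: %.4fs, str_replace: %.4fs\n", $strtr_time, $str_replace_time);
```

The two functions are not always interchangeable: str_replace() applies its pairs one after another, so a later pair can rewrite the output of an earlier one, while strtr() with an array never re-scans text it has already replaced.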

Also: Having trouble making code to replace all < and > with &lt; and &gt; without doing it to <table>, <tr>, <td>, and <th> tags (don't want to have to do bbcode tables).

Aeternus · July 9, 2005

$a = preg_replace( '/<(?!\/{0,1}(?:table|tr|td|th)[a-zA-Z0-9\-\.,_\?\\\'\/\+&%$\#\=~;]*?>)/', '&lt;', $a ); // Less thans

Seems to work for the less thans; the greater thans are a little bit more tricky (due to restrictions placed on the length variability of negative lookbehinds). Bear in mind that there's probably a much simpler method than this (some of it is there to ensure that something like cheese<table, which isn't a real tag, still gets matched), and I've just got back from work and it's late. I'll take another crack at the greater than replacement in the morning (after I get back from work again around 2 o'clock GMT) when I've had some sleep.

Aeternus · July 9, 2005

$a = preg_replace( '/<(?!\/{0,1}(?:table|tr|td|th)[a-zA-Z0-9\-\.,_\?\s\\\'\/\+&%$\#\=~;]*?>)/', '&lt;', $a ); // Less thans

for the less thans, sorry (forgot to add the space to the character class in there).

However, doing the same with a negative lookbehind for the greater thans is proving more of a problem, as the lookbehind requires the regex within it to be a fixed length (i.e. the same number of characters being matched), so things such as the non-greedy character class repetition in the less than regex won't work. I'm trying to figure out a way to do it but I haven't got my hopes up.
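One way to sidestep the fixed-length lookbehind limit, offered as a different technique from the lookahead above and not something posted in the thread: match either a whole allowed tag or a single stray bracket in one alternation, and let a callback decide what to emit. A sketch, assuming PHP 7 or later and a made-up function name:

```php
<?php
// Hypothetical helper: escape < and > everywhere except in table markup tags.
function escape_brackets_one_pass(string $text): string
{
    return preg_replace_callback(
        // Group 1: a complete <table>, <tr>, <td> or <th> tag (opening or closing).
        // Otherwise: any single < or > that is not part of such a tag.
        '~(</?(?:table|tr|td|th)\b[^<>]*>)|[<>]~i',
        function (array $m): string {
            if (isset($m[1]) && $m[1] !== '') {
                return $m[1]; // allowed tag: pass through unchanged
            }
            return $m[0] === '<' ? '&lt;' : '&gt;';
        },
        $text
    );
}

echo escape_brackets_one_pass('<td class="x">a > b</td><i>'), "\n";
```

Because the allowed tag is consumed as a single match, its closing > is never seen on its own, so no lookbehind is needed. Note that this lets any attributes on the allowed tags through untouched.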

A simpler way, as I said before with the ignoring, is to replace the tags you need with a special symbol or string and store them, then go back after replacing the <'s and >'s and swap the symbols for the stored tags. But you said you wanted a single regex for it, so...
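That stash-and-restore route might look something like the sketch below. The function name and the NUL-delimited token format are assumptions made for illustration; the only requirement is that the token cannot occur in the article text.

```php
<?php
// Hypothetical helper: stash table tags, escape all brackets, restore the tags.
function escape_angle_brackets(string $text): string
{
    $stash = [];

    // 1. Pull out the tags that should survive and leave a token behind.
    $text = preg_replace_callback(
        '~</?(?:table|tr|td|th)\b[^<>]*>~i',
        function (array $m) use (&$stash): string {
            $token = "\x00" . count($stash) . "\x00";
            $stash[$token] = $m[0];
            return $token;
        },
        $text
    );

    // 2. With the allowed tags out of the way, every remaining bracket is fair game.
    $text = str_replace(['<', '>'], ['&lt;', '&gt;'], $text);

    // 3. Swap the tokens back for the original tags.
    return strtr($text, $stash);
}

echo escape_angle_brackets('<table><tr><td>1 < 2 <b>x</b></td></tr></table>'), "\n";
```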
