URLS to LINKS
In a content management system I was working on, we noticed a couple of problems where long URLs entered by users broke the layout of our web site. We also had a request to automatically turn hand-typed URLs into links, so here is what I did in C#.
First, I added the regular expressions namespace:
using System.Text.RegularExpressions;
Here is the method I added to our utility class:
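public string URLsToHyperlinks(string sInput)
{
    // Wrap each http:// or https:// URL in an anchor tag. \S+ stops at any
    // whitespace, and the trailing \b keeps sentence punctuation outside the link.
    return Regex.Replace(sInput, @"\bhttps?://\S+\b", @"<a href=""$0"">$0</a>");
}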
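The method above turns URLs into links but still shows each URL at full length, so a very long one can still stretch the page. Here is a minimal sketch of one way to shorten the visible text while keeping the full URL in the href. The class name `LinkUtil`, the method name `UrlsToShortHyperlinks`, and the `maxLength` parameter are my own placeholders, not part of the utility class above, and HTML-encoding the URL with `WebUtility.HtmlEncode` is an extra precaution I am assuming is wanted.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkUtil
{
    // Hypothetical helper: links each URL and cuts the visible text
    // down to maxLength characters (plus "...") when it is longer.
    public static string UrlsToShortHyperlinks(string input, int maxLength)
    {
        return Regex.Replace(input, @"\bhttps?://\S+\b", m =>
        {
            string url = m.Value;
            string text = url.Length > maxLength
                ? url.Substring(0, maxLength) + "..."
                : url;
            // Encode both parts so quotes or ampersands in the URL
            // cannot break out of the attribute or the link text.
            return "<a href=\"" + WebUtility.HtmlEncode(url) + "\">"
                 + WebUtility.HtmlEncode(text) + "</a>";
        });
    }
}
```

For example, `LinkUtil.UrlsToShortHyperlinks("see http://example.com/a/very/long/path ok", 18)` should link the full URL but display only `http://example.com...`.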
Here is a function that does the same thing, only in PHP:
function urls_to_hyperlinks($text)
{
    return preg_replace( "`((http)+(s)?:(//)|(www\.))((\w|\.|\-|_)+)(/)?(\S+)?`i", "<a href=\"http\\3://\\5\\6\\8\\9\" title=\"\\0\" target=\"_blank\">\\5\\6</a>", $text);
}
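The PHP pattern also picks up URLs typed without a scheme (starting with `www.`) and opens links in a new window. A rough C# counterpart might look like the sketch below. It is not a line-for-line port: the class name `WwwLinker` is a placeholder of mine, it shows the whole URL as the link text rather than just the host, and the `WebUtility.HtmlEncode` calls are an added precaution.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class WwwLinker
{
    // Hypothetical helper: links http://, https:// and bare www. URLs.
    public static string UrlsToHyperlinks(string input)
    {
        return Regex.Replace(input, @"\b(?:https?://|www\.)\S+\b", m =>
        {
            string url = m.Value;
            // A bare "www." URL needs a scheme to work as an href.
            string href = url.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url
                : url;
            return "<a href=\"" + WebUtility.HtmlEncode(href) + "\" target=\"_blank\">"
                 + WebUtility.HtmlEncode(url) + "</a>";
        }, RegexOptions.IgnoreCase);
    }
}
```

With this sketch, `visit www.example.com now` should come back with `www.example.com` wrapped in a link whose href is `http://www.example.com`.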
C#: Quickly remove unsupported HTML tags from forms
While building some forms for a web site, I realized that some users were entering HTML tags I wasn't happy about. I didn't mind some tags, since they are harmless and give users who know HTML some ability to alter presentation. So here is what I did in C# to clean up/remove HTML tags. I am sure there are many ways to improve it (so feel free to).
// this method will remove *most* malicious code leaving allowed
// HTML intact
public static string stripHTMLTags(string input)
{
    string output = "";
    // break the comments so someone cannot add an open comment
    input = input.Replace("<!--", "");
    // strip out the doctype
    Regex docType = new Regex("<!DOCTYPE[^>]*>", RegexOptions.IgnoreCase);
    output = docType.Replace(input, "");
    // add target="_blank" to hrefs and remove parts that are
    // not supported (the pattern originally posted here was eaten by the
    // page's HTML rendering; this one is a reconstruction that keeps only
    // a double-quoted href)
    output = Regex.Replace(output,
        @"<a\b[^>]*?\bhref\s*=\s*""([^""]*)""[^>]*>",
        @"<a href=""$1"" target=""_blank"">",
        RegexOptions.IgnoreCase);
    // strip out most known tags except (a|b|br|blockquote|em|h1|h2|
    // h3|h4|h5|h6|hr|i|li|ol|p|u|ul|strong|sub|sup); the \b after the
    // tag name keeps "s" from also matching <strong>, <sub> and <sup>
    Regex badTags = new Regex(
        "</?(abbr|acronym|address|applet" +
        "|area|base|basefont|bdo|big|body|button|caption|center|cite|code|col" +
        "|colgroup|dd|del|dir|div|dfn|dl|dt|embed|fieldset|font|form|frame" +
        "|frameset|head|html|iframe|img|input|ins|isindex|kbd|label|legend" +
        "|link|map|menu|meta|noframes|noscript|object|optgroup|option" +
        "|param|pre|q|s|samp|script|select|small|span|strike|style|table" +
        @"|tbody|td|textarea|tfoot|th|thead|title|tr|tt|var|xmp)\b[^>]*>",
        RegexOptions.IgnoreCase);
    return badTags.Replace(output, "");
}
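One possible improvement is to flip the logic around: instead of listing every tag to remove, keep only the tags on the allowed list and drop everything else, so tags missing from the blacklist cannot slip through. The sketch below is just that idea in its simplest form. The names `HtmlFilter` and `StripDisallowedTags` are placeholders of mine, the allowed list is copied from the comment in the method above, and it still leaves attributes on allowed tags untouched, so it should not be treated as a complete sanitizer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlFilter
{
    // Matches any opening or closing tag and captures the tag name.
    private static readonly Regex AnyTag = new Regex(
        @"</?([a-z][a-z0-9]*)\b[^>]*>", RegexOptions.IgnoreCase);

    // Tag names that are allowed to stay (same list as the comment above).
    private static readonly Regex Allowed = new Regex(
        @"^(a|b|br|blockquote|em|h[1-6]|hr|i|li|ol|p|u|ul|strong|sub|sup)$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: removes every tag whose name is not allowed.
    // The text between removed tags is kept as-is.
    public static string StripDisallowedTags(string input)
    {
        return AnyTag.Replace(input,
            m => Allowed.IsMatch(m.Groups[1].Value) ? m.Value : "");
    }
}
```

Given `<b>hi</b><script>alert(1)</script>`, this should keep the bold tags and return `<b>hi</b>alert(1)`: the script tags are gone, but the text that was between them remains.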
Here are a couple of web sites that I used as references:
Regular Expression Reference - http://www.regular-expressions.info/reference.html
A somewhat comprehensive list of HTML tags - http://www.w3schools.com/tags/default.asp
