Here is a simple ColdFusion function that extracts URLs and link titles from a given string. It returns an array of structs, one per link, each holding the URL (key a) and the link title (key title).
<!--- Fetch a page to test with --->
<cfhttp
    url = "http://www.google.com/search?hl=en&q=paris+hilton&aq=f&oq="
    userAgent = "#cgi.HTTP_USER_AGENT#">
</cfhttp>
<cfdump var="#getLinks(CFHTTP.FileContent)#">

<cffunction name="getLinks" access="public" returntype="array" output="yes" hint="Separate links from a given HTML string; output as an array">
    <cfargument name="html" hint="HTML string with links" required="yes">
    <cfset local.startpos = 1>
    <cfset local.list = ArrayNew(1)>

    <cfloop condition="local.startpos GREATER THAN 0">
        <!--- Find the next anchor element, starting at the current offset --->
        <cfset local.linkpos = reFindNoCase('<a\b[^>]*>(.*?)</a>',arguments.html,local.startpos,'true')>

        <cfif val(local.linkpos.len[1])>
            <!--- Move the offset past this anchor and pull out its full text --->
            <cfset local.startpos = local.linkpos.len[1]+local.linkpos.pos[1]>
            <cfset local.string = mid(arguments.html,local.linkpos.pos[1],local.linkpos.len[1])>
            <!--- Look for an absolute http, https or ftp URL inside the anchor --->
            <cfset local.hrefpos = reFindNoCase('(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+##]*[\w\-\@?^=%&/~\+##])?',local.string,1,'true')>
            <cfif val(local.hrefpos.pos[1])>
                <cfset local.this.a = mid(local.string,local.hrefpos.pos[1],local.hrefpos.len[1])>
                <!--- The title is the anchor text with the opening and closing tags removed --->
                <cfset local.this.title = reReplaceNoCase(local.string,'<a\b[^>]*>',"")>
                <cfset local.this.title = reReplaceNoCase(local.this.title,'</a>',"")>
                <cfset ArrayAppend(local.list,local.this)>
                <cfset StructDelete(local,'this')>
            </cfif>
        <cfelse>
            <!--- No more anchors: stop scanning --->
            <cfbreak>
        </cfif>
    </cfloop>

    <cfreturn local.list>
</cffunction>
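The function leans on the structure that reFindNoCase returns when its fourth argument is true: a struct with two arrays, pos and len, where element 1 describes the whole match and later elements describe each parenthesised subexpression. Here is a minimal sketch of that on its own, using a made-up sample string and the variable names sample, match, whole and inner, which are only for illustration and are not part of the function above.

```cfml
<!--- Hypothetical sample input, not taken from the original post --->
<cfset sample = 'Visit <a href="http://www.example.com/">Example</a> today.'>

<!--- Fourth argument true: return pos/len arrays instead of a single position --->
<cfset match = reFindNoCase('<a\b[^>]*>(.*?)</a>', sample, 1, true)>

<!--- When nothing matches, pos[1] and len[1] are both 0 --->
<cfif match.len[1] GT 0>
    <!--- Element 1 is the whole anchor, element 2 is the (.*?) group --->
    <cfset whole = mid(sample, match.pos[1], match.len[1])>
    <cfset inner = mid(sample, match.pos[2], match.len[2])>
    <cfoutput>
        Whole match: #htmlEditFormat(whole)#<br>
        Link text: #htmlEditFormat(inner)#
    </cfoutput>
</cfif>
```

Reading the link text from the second pos/len pair like this would be an alternative to stripping the tags with reReplaceNoCase, though it still leaves any nested markup inside the link untouched.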
Posted by Saman W Jayasekara at Saturday 05 December 2009 12:39 AM
ColdFusion