Extracting Links Using XPath
Tagged Under: XML, XPath
Extracting links from a piece of HTML is a very common task, and most programmers come across it at some point. I have always used regular expressions for this and they have always worked for me, no complaints there. However, I was curious to find another way to do it.
Here is what I did.
Used CFHTTP to get the HTML code.
Put it all in a CF XML object.
Got all the links using XPath.
Put everything inside a CF query.
And it works! I was delighted to see the results. The only condition is that the markup has to be well-formed XML, which in practice means valid XHTML; HTML with unclosed or badly nested tags will not parse into an XML object. Well, nothing special, I know, but at least I found out which people don't have valid markup on their sites! Ha!
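The post does not include its code, so what follows is only a rough sketch of the same steps, not the original ColdFusion. It is written in Python with the standard library so it can be run as-is. The sample markup, the function names extract_links and is_well_formed, and the row fields href and text are all assumptions made for illustration. A list of dictionaries stands in for the CF query, and a hard-coded string stands in for the page that CFHTTP would fetch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. In the post this string would come from the
# HTTP request (the CFHTTP step); it is hard-coded here so the sketch runs offline.
SAMPLE_XHTML = """<html>
  <body>
    <p>See <a href="http://example.com/one">the first page</a>.</p>
    <div><a href="/two" title="Second">Two</a> <a name="anchor-only">no href</a></div>
  </body>
</html>"""


def extract_links(markup):
    """Return one row per <a href="..."> element, as a list of dicts."""
    # Raises xml.etree.ElementTree.ParseError if the markup is not well-formed XML.
    root = ET.fromstring(markup)
    rows = []
    # ".//a[@href]" selects every <a> element below the root that has an href attribute.
    for anchor in root.findall(".//a[@href]"):
        rows.append({
            "href": anchor.get("href"),
            # itertext() also collects text from nested tags inside the link.
            "text": "".join(anchor.itertext()).strip(),
        })
    return rows


def is_well_formed(markup):
    """Report whether the markup can be parsed as XML at all."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for row in extract_links(SAMPLE_XHTML):
        print(row["href"], "|", row["text"])
```

The is_well_formed helper mirrors the side effect mentioned above: any page that fails to parse is one whose markup is not well-formed. Real XHTML pages usually declare a default namespace on the html element, in which case the path expression would need to account for that namespace; the sample here leaves it out to keep the sketch short.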