y2cRAck4
Groupie
Joined: 21 August 2005
Location: United States
Status: Offline
Points: 116
Topic: Get a list of links from HTML Posted: 18 December 2005 at 7:22am |
Hi,
It would be great if someone could tell me how to get a list of links from HTML code.
Greetings.
dpyers
Senior Member
Joined: 12 May 2003
Status: Offline
Points: 3937
Posted: 18 December 2005 at 1:26pm |
Do you want just the links or the anchor text too?
Lead me not into temptation... I know the short cut, follow me.
dpyers
Posted: 18 December 2005 at 2:55pm |
Some code I had. You could change the strUseThisURL variable to get its content from a query string.
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>GetLinks.asp</TITLE>
</HEAD>
<BODY bgcolor="#FFFFFF">
<%
Dim strStartPos
Dim strEndPos
Dim strLength
Dim objXMLHTTP
Dim strPageText
Dim strSearchString
Dim strEndStringChars
Dim strUseThisURL
Dim arrLinkArray
Dim strArrayEntry
Dim i

strUseThisURL = "http://www.microsoft.com/"

' Set up default strings
strSearchString = "<a "
strEndStringChars = "</a>"

' --------------------------- SUBROUTINES -----------------------------------
Sub subPullStrings(strArrayEntry, strEndStringChars)
    strStartPos = 0
    strEndPos = 0
    strLength = 0

    'Find the Start Position of the Search String
    strStartPos = instr(strArrayEntry, strSearchString)
    'Starting from the Search String,
    'find the beginning position of the End String of Chars
    strEndPos = instr(strStartPos + len(strSearchString), strArrayEntry, strEndStringChars)
    'Compute the length of the string - including the end-string characters
    strLength = (strEndPos - strStartPos) + len(strEndStringChars)

    On Error Resume Next 'In case anything is screwed up with the start/end strings
    'Pluck the string out of the page text
    response.write vbCRLF & "<br/>" & i & ". " & mid(strArrayEntry, strStartPos, strLength)
End Sub

Sub subGetHTML()
    Set objXMLHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")
    objXMLHTTP.Open "GET", strUseThisURL, False
    objXMLHTTP.Send
    strPageText = objXMLHTTP.responseText
    Set objXMLHTTP = Nothing
End Sub

' -----------------------MAINLINE CODE --------------------------------------
subGetHTML

arrLinkArray = split(strPageText, strSearchString)
For i = 1 to Ubound(arrLinkArray)
    strArrayEntry = strSearchString & arrLinkArray(i)
    subPullStrings strArrayEntry, strEndStringChars
Next
%>
</BODY>
</HTML>
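If you want the URL and the anchor text as separate values rather than the whole tag, a regular expression is one possible alternative to the split approach. This is only a rough sketch, not the code above: the function name GetLinks and the sample markup are made up for illustration, and the pattern assumes quoted href values and will miss unquoted ones. It uses WScript.Echo so it can be tried with cscript; in an ASP page you would use Response.Write instead.

```vbscript
Option Explicit

' Hypothetical helper: returns an array of "url | anchor text" strings.
' Assumes href values are wrapped in single or double quotes.
Function GetLinks(strHtml)
    Dim objRegExp, colMatches, objMatch, arrResult(), n

    Set objRegExp = New RegExp
    objRegExp.Pattern = "<a\s[^>]*href\s*=\s*[""']([^""']*)[""'][^>]*>([\s\S]*?)</a>"
    objRegExp.IgnoreCase = True   ' match <A HREF=...> as well as <a href=...>
    objRegExp.Global = True       ' find every link, not just the first

    Set colMatches = objRegExp.Execute(strHtml)
    If colMatches.Count = 0 Then
        GetLinks = Array()        ' no links: return an empty array
        Exit Function
    End If

    ReDim arrResult(colMatches.Count - 1)
    n = 0
    For Each objMatch In colMatches
        ' SubMatches(0) is the href value, SubMatches(1) is the anchor text
        arrResult(n) = objMatch.SubMatches(0) & " | " & objMatch.SubMatches(1)
        n = n + 1
    Next
    GetLinks = arrResult
End Function

' Usage with made-up sample markup
Dim arrLinks, strLink
arrLinks = GetLinks("<a href=""one.htm"">One</a> <A HREF='two.htm'>Two</A>")
For Each strLink In arrLinks
    WScript.Echo strLink
Next
```

A regular expression is not a full HTML parser, so treat this as good enough for simple pages rather than a general solution.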
y2cRAck4
Posted: 18 December 2005 at 3:34pm |
Thank you Pyers!
Greetings! 
Edited by y2cRAck4 - 19 December 2005 at 7:39am
dpyers
Posted: 18 December 2005 at 6:53pm |
There are a couple of hacky things going on in the array handling:
- Normally you start looping through an array from entry 0, but because of the way split builds the array, entry 0 just contains whatever came before the first "<a ". There's no link in it, so the loop starts at 1.
- When you split a string, you get array entries for the text before and after the string you split it by, but the split characters themselves aren't included in either entry. So before passing an array entry to the sub that pulls the link out of it, I had to add the split characters back to the beginning of the entry.
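To see both of those points in isolation, here is a small stand-alone sketch. The sample markup and variable names are made up for the demo, and it uses WScript.Echo so it can be run with cscript outside of ASP.

```vbscript
Option Explicit

Dim strSample, strDelim, arrParts, i

' Made-up sample markup, not from any real page
strSample = "<p>Intro</p><a href=""one.htm"">One</a> and <a href=""two.htm"">Two</a>"
strDelim = "<a "

arrParts = Split(strSample, strDelim)

' Entry 0 holds only the text before the first "<a ", so it has no link
WScript.Echo "0: " & arrParts(0)

' Later entries start just after the delimiter, so it has to be put back
For i = 1 To UBound(arrParts)
    WScript.Echo i & ": " & strDelim & arrParts(i)
Next

' Output:
' 0: <p>Intro</p>
' 1: <a href="one.htm">One</a> and 
' 2: <a href="two.htm">Two</a>
```

Note that entry 1 still carries the trailing " and " after the closing tag, which is why the sub trims each entry at "</a>".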
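EDIT: Note that images inside links may appear broken if they use relative addresses, because the browser resolves relative URLs against the page that displays the links rather than against the original site. The same applies to relative href values.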
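One possible workaround, which I haven't wired into the code above, is to emit a base tag in the HEAD of the output page so the browser resolves relative URLs against the fetched site. The helper name BuildBaseTag is made up for this sketch, and it assumes the URL you pass in is the address of the page that was fetched (strUseThisURL in the original code).

```vbscript
Option Explicit

' Hypothetical helper: builds a <base> tag so the browser resolves
' relative URLs on the output page against strUrl.
Function BuildBaseTag(strUrl)
    Dim strSafe
    ' Minimal escaping so the URL is safe inside a quoted attribute
    strSafe = Replace(strUrl, "&", "&amp;")
    strSafe = Replace(strSafe, """", "&quot;")
    BuildBaseTag = "<base href=""" & strSafe & """>"
End Function

WScript.Echo BuildBaseTag("http://www.microsoft.com/")
' prints: <base href="http://www.microsoft.com/">
```

In the ASP page you would write the result inside the HEAD section with Response.Write, before any of the links are output. Keep in mind that a base tag affects every relative URL on the page, including your own.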
Edited by dpyers - 18 December 2005 at 7:01pm