Webscrape VBA With If Condition
I am trying to import the bullet points from a website into an Excel table (each bullet point is contained in an li tag). However, I am facing a significant difficulty, as some pages I would lik
Solution 1:
Summary
I would use CSS selectors to gather the relevant nodes into two separate nodeLists: one for the generalities and another for the parts. Because the parts repeat, I would determine the number of parts and loop that many times, concatenating the HTML of each part's later repeat with its earlier occurrence. I would then put the combined HTML into a surrogate HTMLDocument variable and build a new nodeList of all the li elements it contains. A helper function returns the text of the nodeList nodes as an array, which is then written to the sheet with one combined part per row.
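The pairing step described above can be sketched in isolation, without any web request. This is only an illustration under stated assumptions: `PairRepeatedParts` and `DemoPairRepeatedParts` are hypothetical names that do not appear in the answer, and the HTML strings are made-up sample data. It assumes the list holds 2n blocks in which block i and block i + n belong to the same part.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer): given 2n blocks,
' joins block i with block i + n, mirroring
' parts.Item(i).outerHTML & parts.Item(i + numberOfParts).outerHTML.
Public Function PairRepeatedParts(ByVal blocks As Variant) As Variant
    Dim n As Long, i As Long, lo As Long
    lo = LBound(blocks)
    n = (UBound(blocks) - lo + 1) \ 2      ' number of parts (integer division)
    If n = 0 Then
        PairRepeatedParts = Array()        ' nothing to pair
        Exit Function
    End If
    Dim combined() As String
    ReDim combined(0 To n - 1)
    For i = 0 To n - 1
        combined(i) = blocks(lo + i) & blocks(lo + i + n)
    Next i
    PairRepeatedParts = combined
End Function

Public Sub DemoPairRepeatedParts()
    Dim result As Variant, block As Variant
    ' Made-up sample data: two parts (A and B), each appearing twice
    result = PairRepeatedParts(Array("<ul><li>A1</li></ul>", "<ul><li>B1</li></ul>", _
                                     "<ul><li>A2</li></ul>", "<ul><li>B2</li></ul>"))
    For Each block In result
        Debug.Print block
    Next block
End Sub
```

Each combined string could then be assigned to the surrogate document's body.innerHTML so that a single querySelectorAll("li") call returns the items of both halves in order.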
VBA:
Option Explicit
Public Sub WindInfo()
    'VBE > Tools > References:
    '1. Microsoft XML, v6.0
    '2. Microsoft HTML Object Library
    '3. Microsoft Scripting Runtime
    Dim xhr As MSXML2.XMLHTTP60: Set xhr = New MSXML2.XMLHTTP60
    Dim html As MSHTML.HTMLDocument: Set html = New MSHTML.HTMLDocument
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
    'Synchronous GET; the response text is loaded into the HTMLDocument for querying
    With xhr
        .Open "GET", "https://www.thewindpower.net/windfarm_en_7410_khizi.php", False
        .send
        html.body.innerHTML = .responseText
    End With
    Dim generalities As Object, arrGen(), partsList As Object
    Dim r As Long
    'li nodes of the generalities section
    Set generalities = html.querySelectorAll("#bloc_texte table ~ table li")
    arrGen = GetNodesTextAsArray(generalities)
    Dim parts As Object, numberOfParts As Long
    'h3 headings mark the parts; an empty list means the page has no parts
    Set partsList = html.querySelectorAll("h1 ~ h3, ul ~ h3")
    r = 1
    If partsList.Length > 0 Then
        'Each part appears twice, so halve the heading count
        numberOfParts = partsList.Length / 2
        Set parts = html.querySelectorAll("h3 + ul")
        Dim i As Long, liNodes As Object, arr()
        Dim html2 As MSHTML.HTMLDocument: Set html2 = New MSHTML.HTMLDocument
        For i = 0 To numberOfParts - 1
            ws.Cells(r, 1).Resize(1, UBound(arrGen)) = arrGen
            'Join the first occurrence of the part with its later repeat
            html2.body.innerHTML = parts.Item(i).outerHTML & parts.Item(i + numberOfParts).outerHTML
            Set liNodes = html2.querySelectorAll("li")
            arr = GetNodesTextAsArray(liNodes)
            ws.Cells(r, 5).Resize(1, UBound(arr)) = arr
            r = r + 1
        Next
    Else
        Dim alternateNodeList As Object: Set alternateNodeList = html.querySelectorAll("#bloc_texte h1 + ul")
        'Item is zero-based: Item(1) is the second match, so at least two matches are required
        If alternateNodeList.Length > 1 Then
            arr = GetNodesTextAsArray(alternateNodeList.Item(1).getElementsByTagName("li"))
        Else
            arr = Array("No", "Data", vbNullString)
        End If
        ws.Cells(r, 1).Resize(1, UBound(arrGen)) = arrGen
        ws.Cells(r, 5).Resize(1, UBound(arr)) = arr
    End If
End Sub
Public Function GetNodesTextAsArray(ByVal nodeList As Object) As Variant()
    Dim i As Long, results()
    If nodeList.Length = 0 Then
        GetNodesTextAsArray = Array("No", "Data", vbNullString)
        Exit Function
    End If
    ReDim results(1 To nodeList.Length)
    For i = 0 To nodeList.Length - 1
        results(i + 1) = nodeList.Item(i).innerText
    Next i
    GetNodesTextAsArray = results
End Function
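One detail of the write-out step may be worth a sketch. The helper returns a 1-based array, so UBound equals the element count, but the Array("No", "Data", vbNullString) fallback is zero-based under the default Option Base 0, so its UBound is 2 and sizing the range by UBound alone covers two cells rather than three (harmless here, since the third element is an empty string). The following is a minimal sketch, assuming a hypothetical helper named `WriteArrayToRow` that is not part of the answer, which sizes the target range from both bounds.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer): writes a
' one-dimensional array across a single row, whatever its lower bound.
Public Sub WriteArrayToRow(ByVal target As Range, ByVal items As Variant)
    Dim itemCount As Long
    itemCount = UBound(items) - LBound(items) + 1
    ' Skip empty arrays; Resize raises an error for a zero column count
    If itemCount > 0 Then target.Resize(1, itemCount).Value = items
End Sub

Public Sub DemoWriteArrayToRow()
    ' Assumes a worksheet named "Sheet1", as in the code above
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
    WriteArrayToRow ws.Cells(1, 5), Array("No", "Data", vbNullString)
End Sub
```

With such a helper, lines of the form ws.Cells(r, 5).Resize(1, UBound(arr)) = arr could be written as WriteArrayToRow ws.Cells(r, 5), arr.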