
Parse HTML File Using Python Without External Module

I am trying to parse an HTML file using Python without any external module. The reason is that I am triggering a Jenkins job and running into some import issues with lxml and Beau

Solution 1:

For a single element, you could use the re module or even plain string functions.

data = '''<tr class="test">
<td class="test">
<a href="no.html">track</a></td>
<td class="duration">0.390s</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate">N/A</td></tr>

<tr class="suite">
<td colspan="2" class="totalLabel">Total</td>
<td class="passed number">271</td>
<td class="zero number">0</td>
<td class="failed number">3</td>
<td class="passRate suite">98%</td>
</tr>'''

# re module

import re

print(re.search(r'suite">(\d+)%', data).group(1))

# string functions

before = 'passRate suite">'
after  = '%'
start = data.find(before) + len(before)
stop  = data.find(after, start)

print(data[start:stop])
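
Both snippets above assume the markers are always present. If `before` is missing, `find()` returns -1 and the slice silently starts at the wrong place, and `re.search()` returns None, so `.group(1)` raises AttributeError. A minimal sketch of guarded versions follows; the helper names `between` and `first_group` are made up for illustration and are not part of the original answer.

```python
import re


def between(text, before, after):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(before)
    if start == -1:
        return None
    start += len(before)
    stop = text.find(after, start)
    if stop == -1:
        return None
    return text[start:stop]


def first_group(pattern, text):
    """Return group 1 of the first regex match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical one-cell sample, shaped like the report row above
sample = '<td class="passRate suite">98%</td>'

print(between(sample, 'passRate suite">', '%'))   # 98
print(first_group(r'suite">(\d+)%', sample))      # 98
print(between(sample, 'no such marker', '%'))     # None
```

In a Jenkins job, checking for None lets the script report a clear "value not found" message instead of failing with a traceback.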

EDIT: to get the other values with re

import re

print('passed:', re.search(r'passed number">(\d+)', data).group(1))
print('zero:', re.search(r'zero number">(\d+)', data).group(1))
print('failed:', re.search(r'failed number">(\d+)', data).group(1))
print('Rate:', re.search(r'suite">(\d+)', data).group(1))

passed: 271
zero: 0
failed: 3
Rate: 98
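
If the layout of the report changes often, the standard library's `html.parser` module may be worth a look, since it also needs no external package. The following is only a sketch under some assumptions: it targets Python 3 (in Python 2 the module is named `HTMLParser`), the class name `CellCollector` is made up for illustration, and it assumes each class value of interest appears only once in the fed HTML, because later cells with the same class overwrite earlier ones in the dictionary.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>, paired with its class attribute."""

    def __init__(self):
        super().__init__()
        self.cells = []          # list of (class attribute, text) pairs
        self._current = None     # class of the <td> being read, if any
        self._text = []          # text fragments seen inside that <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._current = dict(attrs).get('class', '')
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._current is not None:
            self.cells.append((self._current, ''.join(self._text).strip()))
            self._current = None


# Only the "Total" row from the sample data is fed here
sample = '''<tr class="suite">
<td colspan="2" class="totalLabel">Total</td>
<td class="passed number">271</td>
<td class="zero number">0</td>
<td class="failed number">3</td>
<td class="passRate suite">98%</td>
</tr>'''

parser = CellCollector()
parser.feed(sample)
parser.close()

totals = dict(parser.cells)
print(totals['passed number'])   # 271
print(totals['failed number'])   # 3
print(totals['passRate suite'])  # 98%
```

This is longer than a regex, but it does not depend on the exact spacing or attribute order in the file.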

Solution 2:

import re

# HTML_FILE is the path to the report file (defined elsewhere)
with open(HTML_FILE) as f:
    data = f.read()

# Keep only the part of the "Total" row between its label cell and the pass rate
before = '<td colspan="2" class="totalLabel">Total</td>'
after  = '%<'
start = data.find(before) + len(before)
stop  = data.find(after, start)

suite_filter = data[start:stop].strip()

# \s* allows optional whitespace or newlines between the tag and the number
RATE_PASS = re.search(r'suite">\s*(\d+)', suite_filter).group(1)
PASS_COUNT = re.search(r'passed number">(\d+)', suite_filter).group(1)
SKIPPED_COUNT = re.search(r'zero number">(\d+)', suite_filter).group(1)
FAIL_COUNT = re.search(r'failed number">(\d+)', suite_filter).group(1)

TESTS_TOTAL = int(PASS_COUNT) + int(SKIPPED_COUNT) + int(FAIL_COUNT)

# print() call form, as used in Solution 1 (Python 3)
print(RATE_PASS, PASS_COUNT, SKIPPED_COUNT, TESTS_TOTAL)

Here is my solution, based on the suggestions from @furas. Any improvements or suggestions are welcome.

