How To Select Div By Text Content Using Beautiful Soup?

April 21, 2024

Trying to scrape some HTML from something like this. Sometimes the data I need is in div[0], sometimes div[1], etc. Imagine everyone takes 3-5 classes. One of them is always Biolog

Solution 1:

(1) To get only the biology grade, an almost one-line search is enough:

import bs4, re

# `html` holds the page source from the question.
soup = bs4.BeautifulSoup(html, 'html.parser')
# string= replaces the older text= argument (Beautiful Soup 4.4.0+)
score_strings = soup.find_all(string=re.compile('Biology'))
scores = [score_string.split()[-1] for score_string in score_strings]
print(score_strings)
print(scores)

The output looks like this:

['Biology A+', 'Biology B', 'Biology B', 'Biology B', 'Biology B']
['A+', 'B', 'B', 'B', 'B']

(2) If you need the tags themselves for further work, take the parent of each matched string:

import bs4, re

soup = bs4.BeautifulSoup(html, 'html.parser')
scores = soup.find_all(string=re.compile('Biology'))
# Each match is a NavigableString; .parent is the tag that contains it.
divs = [score.parent for score in scores]
print(divs)

The output looks like this:

[<div class="score">Biology A+</div>, 
<div class="score">Biology B</div>, 
<div class="score">Biology B</div>, 
<div class="score">Biology B</div>, 
<div class="score">Biology B</div>]

In conclusion, you can use .parent, find_next_siblings(), find_previous_siblings() and similar methods to move around the HTML tree.

The Beautiful Soup documentation has more information about how to navigate the tree. Good luck with your work.
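As a rough illustration of that kind of navigation, here is a minimal sketch that pairs each biology grade with a student name. The markup, the class names `student` and `name`, and the names Alice and Bob are all assumptions made up for this example, since the original page's HTML is not shown here.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure is an assumption.
html = """
<div class="student">
  <div class="name">Alice</div>
  <div class="score">Chemistry C</div>
  <div class="score">Biology A+</div>
</div>
<div class="student">
  <div class="name">Bob</div>
  <div class="score">Biology B</div>
  <div class="score">Math A</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

grades = {}
for text_node in soup.find_all(string=re.compile("Biology")):
    # Step up from the matched string to the tag that contains it.
    score_div = text_node.parent
    # Walk backwards through the siblings to the nearest name div.
    name_div = score_div.find_previous_sibling("div", class_="name")
    grades[name_div.get_text(strip=True)] = text_node.split()[-1]

print(grades)  # {'Alice': 'A+', 'Bob': 'B'}
```

The position of the Biology div inside each student block does not matter here, because the lookup starts from the matched text rather than from a fixed index.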

Solution 2:

Another way, using a CSS selector, is:

divs = soup.select('div:contains("Biology")')

EDIT:

This requires Beautiful Soup 4.7.0 or later, which uses SoupSieve for CSS selector support.
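One caveat worth knowing: SoupSieve 2.1 and later deprecate `:contains` in favor of `:-soup-contains`, and both match on an element's full text, descendants included. The sketch below uses made-up nested markup (the `student` wrapper is an assumption) to show the difference from `:-soup-contains-own`, which looks only at an element's own text.

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup, invented for this example.
html = """
<div class="student"><div class="score">Chemistry C</div><div class="score">Biology A+</div></div>
<div class="student"><div class="score">Biology B</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Matches the score divs AND the outer student divs that wrap them.
all_divs = soup.select('div:-soup-contains("Biology")')

# Matches only divs whose own text contains "Biology".
own_divs = soup.select('div:-soup-contains-own("Biology")')

print(len(all_divs))                                # 4
print([d.get_text(strip=True) for d in own_divs])   # ['Biology A+', 'Biology B']
```

If the score divs are not nested inside other divs, the two selectors return the same elements.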

Solution 3:

You can extract them by searching for every <div> element whose class attribute is score, then using a regular expression to pull out the biology score:

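from bs4 import BeautifulSoup
import sys
import re

# Read the HTML file named on the command line.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for div in soup.find_all('div', attrs={'class': 'score'}):
    # get_text() also works when the div has nested tags (div.string would be None)
    t = re.search(r'Biology\s+(\S+)', div.get_text())
    if t:
        print(t.group(1))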

Run it like:

python3 script.py htmlfile

That yields:

A+
B
B
B
B
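The same idea can be wrapped in a small function that takes the subject as a parameter. This is only a sketch: `grades_for` and the sample markup are hypothetical names and data, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

def grades_for(html, subject):
    """Return the grade from every div.score that mentions `subject`."""
    soup = BeautifulSoup(html, "html.parser")
    # re.escape keeps subjects such as "C++" from being read as regex syntax.
    pattern = re.compile(re.escape(subject) + r"\s+(\S+)")
    found = []
    for div in soup.find_all("div", class_="score"):
        match = pattern.search(div.get_text())
        if match:
            found.append(match.group(1))
    return found

# Made-up sample; the last div nests its grade inside a <b> tag.
sample = (
    '<div class="score">Biology A+</div>'
    '<div class="score">Math B</div>'
    '<div class="score">Biology <b>B</b></div>'
)

print(grades_for(sample, "Biology"))  # ['A+', 'B']
```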
