parse_tr

pyrcs.utils.parse_tr(header, trs)

Parse a list of HTML <tr> elements (already parsed by BeautifulSoup) into the rows of a table.

See also [PT-1].

Parameters
  • header (list) – column names of the requested table

  • trs (bs4.ResultSet) – the <tr> elements (each a bs4.Tag) of a web page, e.g. the result of find_all('tr')

Returns

a list of lists, each inner list holding one row of the requested table

Return type

list
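
To make the return shape concrete, the following is a minimal sketch of turning <tr> rows into a list of lists. It is not the pyrcs implementation: it uses only the standard library's html.parser instead of bs4, and the RowCollector class and the sample HTML are hypothetical, introduced purely for illustration.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, one inner list per <tr>
        self._row = None   # cells of the row being read; None outside a <tr>
        self._cell = None  # text fragments of the cell being read; None outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td> cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # skip rows with no <td> cells, such as a <th>-only header row
                self.rows.append(self._row)
            self._row = None


# Made-up sample table, not taken from any real page
html = """
<table>
  <tr><th>Code</th><th>Name</th></tr>
  <tr><td>AAA</td><td>First line</td></tr>
  <tr><td>BBB</td><td>Second line</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(html)
rows = collector.rows
print(rows)
```

Each inner list corresponds to one <tr> element, which is the shape described under Returns above. The real function additionally takes the header list and works on bs4 objects, so treat this only as an illustration of the output structure.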

Example:

>>> import bs4
>>> import requests
>>> from pyrcs.utils import fake_requests_headers, parse_tr

>>> source = requests.get('http://www.railwaycodes.org.uk/elrs/elra.shtm',
...                       headers=fake_requests_headers())
>>> parsed_text = bs4.BeautifulSoup(source.text, 'lxml')
>>> # Column names come from the <th> cells; rows come from the <tr> elements
>>> header_ = [th.text for th in parsed_text.find_all('th')]
>>> trs_dat = parsed_text.find_all('tr')

>>> tables_list = parse_tr(header_, trs_dat)  # returns a list of lists
>>> type(tables_list)
<class 'list'>
>>> print(tables_list[-1])
['AYT', 'Aberystwyth Branch', '0.00 - 41.15', 'Pencader Junction', ' ']
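
Because the result is a plain list of lists, one possible follow-up is to pair each row with the column names. The sketch below does this with zip; the column names are assumed for illustration (in practice they would be the header_ list collected from the page), and the single row is copied from the output above.

```python
# Column names are an assumption for this sketch; in practice use the
# `header_` list collected from the page's <th> elements.
header_ = ['ELR', 'Line name', 'Mileages', 'Datum', 'Notes']

# Rows in the shape returned by parse_tr: one inner list per table row.
tables_list = [
    ['AYT', 'Aberystwyth Branch', '0.00 - 41.15', 'Pencader Junction', ' '],
]

# Pair every row with the column names to get one dict per row
records = [dict(zip(header_, row)) for row in tables_list]
print(records[0]['Line name'])
```

Note that zip stops at the shorter of its inputs, so a row with fewer cells than there are column names yields a dict with fewer keys rather than raising an error.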