parse_tr

pyrcs.utils.parse_tr(header, trs)

Parse a list of parsed HTML <tr> elements into the rows of a table.

See also [PT-1].
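The sketch below illustrates the general idea of turning <tr> rows into a list of lists. It is not pyrcs's implementation. It uses the standard-library html.parser instead of bs4 so that it runs without network access or third-party packages. It rests on several assumptions: the markup is well formed, only <td> cells carry data (rows made only of <th> cells, such as a header row, are skipped), and short rows are padded with '' up to len(header). TrParser and parse_rows are hypothetical names.

```python
from html.parser import HTMLParser


class TrParser(HTMLParser):
    """Collect the stripped text of each <td> cell, grouped by <tr> row.

    Hypothetical helper for illustration; not part of pyrcs.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # completed data rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td>, e.g. a <th>-only header row
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still counts towards the open cell.
        if self._cell is not None:
            self._cell.append(data)


def parse_rows(header, html):
    """Hypothetical stand-in for parse_tr: one list per data row.

    Assumption: rows shorter than the header are padded with '' so that
    every row has at least len(header) cells.
    """
    parser = TrParser()
    parser.feed(html)
    parser.close()
    return [row + [""] * (len(header) - len(row)) for row in parser.rows]


# Illustrative input: a header row plus one data row that reuses the last
# row printed in the example; the column names are illustrative.
header = ["ELR", "Line name", "Mileages", "Datum", "Notes"]
html = """
<table>
  <tr><th>ELR</th><th>Line name</th><th>Mileages</th><th>Datum</th><th>Notes</th></tr>
  <tr><td>AYT</td><td>Aberystwyth Branch</td><td>0.00 - 41.15</td><td>Pencader Junction</td></tr>
</table>
"""
rows = parse_rows(header, html)
print(rows)  # [['AYT', 'Aberystwyth Branch', '0.00 - 41.15', 'Pencader Junction', '']]
```

Grouping cells by their enclosing <tr> is what yields one inner list per table row. The real function receives tags that bs4 has already parsed rather than raw markup.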

Parameters
  • header (list) – list of column names of a requested table

  • trs (bs4.ResultSet) – the <tr> tags (each a bs4.Tag) of a web page, e.g. as returned by BeautifulSoup.find_all('tr')

Returns

a list of lists, each of which holds one row of the requested table

Return type

list
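Each row in the returned list is a plain list, so it can be paired with the column names in header. The following is a minimal sketch. It assumes that the cell order in each row matches the order of header. rows_to_records is a hypothetical helper, not part of pyrcs, and the column names are illustrative.

```python
def rows_to_records(header, rows):
    """Pair each row's cells with the column names (hypothetical helper)."""
    # zip() stops at the shorter sequence, so surplus cells are dropped and
    # missing cells simply produce no key.
    return [dict(zip(header, row)) for row in rows]


header = ["ELR", "Line name", "Mileages", "Datum", "Notes"]  # illustrative names
rows = [["AYT", "Aberystwyth Branch", "0.00 - 41.15", "Pencader Junction", ""]]
records = rows_to_records(header, rows)
print(records[0]["ELR"])  # AYT
```

If pandas is installed, pandas.DataFrame(rows, columns=header) builds a table from the same two objects. In that case, each row must have exactly len(header) cells.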

Example:

>>> import bs4
>>> import requests
>>> from pyrcs.utils import fake_requests_headers, parse_tr

>>> example_url = 'http://www.railwaycodes.org.uk/elrs/elra.shtm'
>>> source = requests.get(example_url, headers=fake_requests_headers())

>>> parsed_text = bs4.BeautifulSoup(source.text, 'lxml')

>>> header_dat = [th.text for th in parsed_text.find_all('th')]

>>> trs_dat = parsed_text.find_all('tr')

>>> tables_list = parse_tr(header_dat, trs_dat)  # returns a list of lists

>>> type(tables_list)
list
>>> tables_list[-1]
['AYT', 'Aberystwyth Branch', '0.00 - 41.15', 'Pencader Junction', '']