parse_tr
pyrcs.utils.parse_tr(header, trs)
Parse a list of parsed HTML <tr> elements.
See also [PT-1].
- Parameters
header (list) – list of column names of a requested table
trs (bs4.ResultSet) – contents under <tr> tags (bs4.Tag) of a web page
- Returns
list of lists with each comprising a row of the requested table
- Return type
list
Example:
>>> import bs4
>>> import requests
>>> from pyrcs.utils import fake_requests_headers, parse_tr

>>> example_url = 'http://www.railwaycodes.org.uk/elrs/elra.shtm'
>>> source = requests.get(example_url, headers=fake_requests_headers())
>>> parsed_text = bs4.BeautifulSoup(source.text, 'lxml')

>>> # noinspection PyUnresolvedReferences
>>> header_dat = [th.text for th in parsed_text.find_all('th')]
>>> trs_dat = parsed_text.find_all('tr')

>>> tables_list = parse_tr(header_dat, trs_dat)  # returns a list of lists

>>> type(tables_list)
list
>>> tables_list[-1]
['AYT', 'Aberystwyth Branch', '0.00 - 41.15', 'Pencader Junction', '']
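
The example above needs network access. To show the shape of the result offline, the following is a minimal sketch that uses only the standard library. It is not the pyrcs implementation and does not call parse_tr; the RowCollector class, the rows_from_html helper and the column names in sample_html are hypothetical and exist only for illustration. It converts the <td> cells of each <tr> row into a list of strings, producing a list of lists like the one documented under Returns.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is kept only while inside a <td>; <th> text is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # skip rows without <td> cells, such as header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_html(html):
    """Return a list of lists, one inner list per table row that has <td> cells."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Column names are assumed for illustration; the data row is taken from the example above.
sample_html = """
<table>
  <tr><th>ELR</th><th>Line name</th><th>Mileages</th><th>Datum</th><th>Notes</th></tr>
  <tr><td>AYT</td><td>Aberystwyth Branch</td><td>0.00 - 41.15</td><td>Pencader Junction</td><td></td></tr>
</table>
"""

sample_rows = rows_from_html(sample_html)
print(sample_rows[-1])
```

An empty cell is kept as an empty string, so every row has the same length as the table header. How parse_tr itself treats cells that span several rows or columns is not covered by this sketch.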