parse_tr
pyrcs.utils.parse_tr(header, trs)
Parse a list of parsed HTML <tr> elements.
See also [PT-1].
- Parameters
header (list) – list of column names of a requested table
trs (bs4.ResultSet) – contents under <tr> tags (bs4.Tag) of a web page
- Returns
list of lists, each holding one row of the requested table
- Return type
list
Example:
>>> import bs4
>>> import requests
>>> from pyrcs.utils import fake_requests_headers, parse_tr
>>> source = requests.get('http://www.railwaycodes.org.uk/elrs/elra.shtm',
...                       headers=fake_requests_headers())
>>> parsed_text = bs4.BeautifulSoup(source.text, 'lxml')
>>> header_ = []
>>> for th in parsed_text.find_all('th'):
...     header_.append(th.text)
>>> trs_dat = parsed_text.find_all('tr')
>>> tables_list = parse_tr(header_, trs_dat)  # returns a list of lists
>>> type(tables_list)
<class 'list'>
>>> print(tables_list[-1])
['AYT', 'Aberystwyth Branch', '0.00 - 41.15', 'Pencader Junction', ' ']
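The return shape (a list of lists, one inner list per table row) can also be illustrated without a network request. The following is a minimal sketch that uses only the standard library; it is not the pyrcs implementation. The names RowCollector and rows_matching_header are hypothetical, and the HTML snippet is made-up placeholder data. It assumes that a data row is any <tr> whose number of <td> cells equals the number of column names, and it does not handle cells that span several rows or columns.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: collect <td> text, grouped by enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, one list of cell strings each
        self._row = None    # cells of the <tr> currently being read
        self._cell = None   # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a <td> is kept; <th> text is ignored here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_matching_header(header, html):
    """Return the rows whose cell count equals len(header) (an assumption)."""
    collector = RowCollector()
    collector.feed(html)
    return [row for row in collector.rows if len(row) == len(header)]


# Placeholder table, not real data from any website.
sample_html = """
<table>
  <tr><th>ELR</th><th>Line name</th></tr>
  <tr><td>AAA</td><td>Example Line</td></tr>
  <tr><td>BBB</td><td>Another Line</td></tr>
</table>
"""
header = ['ELR', 'Line name']
table_rows = rows_matching_header(header, sample_html)
print(table_rows)
```

The header row contains only <th> cells, so it yields an empty row and is filtered out by the length check; each remaining inner list lines up with the column names in header.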