parse_tr
- pyrcs.parser.parse_tr(trs, ths, sep=' / ', as_dataframe=False)
Parse a list of parsed HTML <tr> elements.
See also [PT-1].
- Parameters:
trs (bs4.ResultSet | list) – contents under <tr> tags of a web page.
ths (list | bs4.element.Tag) – list of column names (usually under a <th> tag) of a requested table.
sep (str | None) – separator that replaces the one in the raw data.
as_dataframe (bool) – whether to return the parsed data in tabular form.
- Returns:
a list of lists, each holding one row of the requested table (or the same data in tabular form when as_dataframe=True)
- Return type:
pandas.DataFrame | List[list]
Example:
>>> from pyrcs.parser import parse_tr
>>> import requests
>>> import bs4
>>> example_url = 'http://www.railwaycodes.org.uk/elrs/elra.shtm'
>>> source = requests.get(example_url)
>>> parsed_text = bs4.BeautifulSoup(markup=source.content, features='html.parser')
>>> ths_dat = [th.text for th in parsed_text.find_all('th')]
>>> trs_dat = parsed_text.find_all(name='tr')
>>> tables_list = parse_tr(trs=trs_dat, ths=ths_dat)  # returns a list of lists
>>> type(tables_list)
list
>>> len(tables_list) // 100
1
>>> tables_list[0]
['AAL', 'Ashendon and Aynho Line', '0.00 - 18.29', 'Ashendon Junction', 'Now NAJ3']
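For readers who want to see the shape of the result without a network request, the following is a minimal, dependency-free sketch of the general idea: collect the text of each <td> cell, row by row, and join multi-part cell text with a separator. It is not the pyrcs implementation and does not call parse_tr. The RowCollector class, the sample HTML and its values are hypothetical, and the way sep is applied here (joining text fragments split by <br>) is an assumption about the kind of normalisation the sep parameter describes.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: collect <td> text, grouped by <tr> row."""

    def __init__(self, sep=' / '):
        super().__init__()
        self.sep = sep
        self.rows = []       # finished rows, one list of cell strings each
        self._row = None     # cells of the row currently being read
        self._parts = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._parts = []

    def handle_data(self, data):
        # Keep only non-blank text that sits inside a <td> cell.
        if self._parts is not None and data.strip():
            self._parts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == 'td' and self._parts is not None:
            # Fragments split by inline tags such as <br> are joined with sep.
            self._row.append(self.sep.join(self._parts))
            self._parts = None
        elif tag == 'tr' and self._row is not None:
            if self._row:    # rows holding only <th> cells end up empty; skip them
                self.rows.append(self._row)
            self._row = None


# Made-up sample table (not real data from the website).
html = """
<table>
  <tr><th>ELR</th><th>Line name</th><th>Notes</th></tr>
  <tr><td>AAA</td><td>Example Line</td><td>first<br>second</td></tr>
  <tr><td>BBB</td><td>Another Line</td><td></td></tr>
</table>
"""

collector = RowCollector(sep=' / ')
collector.feed(html)
collector.close()

rows = collector.rows                      # a list of lists, one per data row
ths = ['ELR', 'Line name', 'Notes']        # column names, as taken from <th> tags
records = [dict(zip(ths, row)) for row in rows]

print(rows)
print(records[0])
```

Pairing each row with the column names, as in records above, is one simple way to get a tabular view; with pandas installed, passing rows and ths to pandas.DataFrame(rows, columns=ths) would give a comparable table.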