parse_tr
- pyrcs.parser.parse_tr(trs, ths, sep=' / ', as_dataframe=False)
Parse a list of parsed HTML <tr> elements.
See also [PT-1].
- Parameters
trs (bs4.ResultSet or list) – contents under <tr> tags of a web page
ths (list or bs4.element.Tag) – list of column names (usually under a <th> tag) of a requested table
sep (str or None) – separator that replaces the one in the raw data
as_dataframe (bool) – whether to return the parsed data in tabular form
- Returns
a list of lists, each comprising one row of the requested table, or the same data in tabular form if as_dataframe=True
- Return type
pandas.DataFrame or List[list]
Example:
>>> from pyrcs.parser import parse_tr
>>> import requests
>>> import bs4
>>> example_url = 'http://www.railwaycodes.org.uk/elrs/elra.shtm'
>>> source = requests.get(example_url)
>>> parsed_text = bs4.BeautifulSoup(markup=source.content, features='html.parser')
>>> ths_dat = [th.text for th in parsed_text.find_all('th')]
>>> trs_dat = parsed_text.find_all(name='tr')
>>> tables_list = parse_tr(trs=trs_dat, ths=ths_dat)  # returns a list of lists
>>> type(tables_list)
list
>>> len(tables_list) // 100
1
>>> tables_list[0]
['AAL', 'Ashendon and Aynho Line', '0.00 - 18.29', 'Ashendon Junction', 'Now NAJ3']
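For readers without network access, the general idea of turning <tr> rows into a list of lists can be sketched offline with the standard library alone. This is not the implementation of parse_tr, and it does not reproduce its handling of sep. The RowCollector class, the sample HTML, its column names and its second row are all hypothetical and made up for illustration.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper (not part of pyrcs): collect <th> and <td> text row by row."""

    def __init__(self):
        super().__init__()
        self.headers = []  # text of every <th> cell
        self.rows = []     # one list of <td> texts per <tr> that contains data cells
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <th>/<td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td'):
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <th> or <td> cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            text = ''.join(self._cell).strip()
            if tag == 'th':
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == 'tr':
            # Header-only rows leave self._row empty and are skipped
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Made-up sample table; only the first data row echoes the example above
sample_html = """
<table>
  <tr><th>ELR</th><th>Line name</th></tr>
  <tr><td>AAL</td><td>Ashendon and Aynho Line</td></tr>
  <tr><td>XYZ</td><td>Made-up line</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(sample_html)

rows = collector.rows  # list of lists, one per data row
records = [dict(zip(collector.headers, row)) for row in rows]  # rows keyed by column name

print(rows)
print(records)
```

Pairing each row with the column names, as in records, is roughly what a tabular return value adds over the plain list of lists.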