parse_tr
- pyrcs.parser.parse_tr(trs, ths, sep=' / ', as_dataframe=False)
Parses a list of HTML <tr> elements and extracts data from a table.

This function processes the rows of a table (<tr> tags) and assigns their cells to the corresponding column headers (<th> tags). It can return the data either as a list of lists or as a dataframe.

See also [PT-1].
- Parameters:
  trs (bs4.ResultSet | list) – The content of <tr> tags from a web page table.
  ths (list | bs4.element.Tag) – A list of column names (typically from <th> tags) for the table.
  sep (str | None) – The separator to replace any separators found in the raw data; defaults to ' / '.
  as_dataframe (bool) – If True, returns the data as a Pandas DataFrame; defaults to False.
- Returns:
  A list of lists representing rows of the table, or a dataframe if as_dataframe is True.
- Return type:
  pandas.DataFrame | list[list]
Examples:
    >>> from pyrcs.parser import parse_tr
    >>> import requests
    >>> import bs4
    >>> example_url = 'http://www.railwaycodes.org.uk/elrs/elra.shtm'
    >>> source = requests.get(example_url)
    >>> parsed_text = bs4.BeautifulSoup(source.content, 'html.parser')
    >>> ths_dat = [th.text for th in parsed_text.find_all('th')]
    >>> trs_dat = parsed_text.find_all(name='tr')
    >>> tables_list = parse_tr(trs=trs_dat, ths=ths_dat)  # returns a list of lists
    >>> type(tables_list)
    list
    >>> len(tables_list) // 100
    1
    >>> tables_list[0]
    ['AAL', 'Ashendon and Aynho Line', '0.00 - 18.29', 'Ashendon Junction', 'Now NAJ3']
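
To illustrate the general idea of pairing <tr> rows with <th> headers without a network request, the following is a minimal standard-library sketch. It is not the pyrcs implementation: the RowCollector class, the SAMPLE_HTML table and its contents are hypothetical, and treating a <br> inside a cell as the "separator found in the raw data" is an assumption made only for this example.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: gather <th> texts as headers and <td> texts per <tr> row."""

    def __init__(self, sep=' / '):
        super().__init__()
        self.sep = sep          # replacement for in-cell line breaks (assumption)
        self.headers = []       # text of every <th> cell
        self.rows = []          # one list of cell texts per data row
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th'):
            self._cell = []
        elif tag == 'br' and self._cell is not None:
            # Assumption: a line break inside a cell is replaced by `sep`.
            self._cell.append(self.sep)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            text = ''.join(self._cell).strip()
            if tag == 'th':
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:       # skip header-only rows, which hold no <td> cells
                self.rows.append(self._row)
            self._row = None


# Invented sample table, used only for this sketch.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>ELR</th><th>Line name</th></tr>"
    "<tr><td>AAA</td><td>First line<br>Second line</td></tr>"
    "<tr><td>BBB</td><td>Another line</td></tr>"
    "</table>"
)

collector = RowCollector(sep=' / ')
collector.feed(SAMPLE_HTML)

# Pair each row with the column headers, one dict per row.
records = [dict(zip(collector.headers, row)) for row in collector.rows]
```

Here collector.rows plays the role of the list-of-lists result, while records shows one way the headers could be attached to each row; a list of such dicts is also a form that pandas.DataFrame accepts if a dataframe is wanted.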