parse_tr

pyrcs.parser.parse_tr(trs, ths, sep=' / ', as_dataframe=False)

Parses a list of HTML <tr> elements and extracts data from a table.

This function processes the rows of a table (<tr> tags) and assigns the values in each row to the corresponding column headers (<th> tags). It returns the data either as a list of lists or as a pandas DataFrame.
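The general idea of collecting header texts and row cells from a table can be sketched with the standard library alone. This is a minimal illustration, not the implementation used by parse_tr: the class name TableSketch, the sample HTML and its column names are assumptions made for the example, and it ignores details such as separators or merged cells.

```python
from html.parser import HTMLParser


class TableSketch(HTMLParser):
    """Hypothetical helper: collect <th> texts as headers and <td> texts as rows."""

    def __init__(self):
        super().__init__()
        self.headers = []  # text of every <th> cell, in document order
        self.rows = []     # one list of cell texts per <tr> that contains <td> cells
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <th>/<td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td'):
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a header or data cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            text = ''.join(self._cell).strip()
            if tag == 'th':
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == 'tr':
            if self._row:  # skip rows that held only <th> cells
                self.rows.append(self._row)
            self._row = None


# Sample markup made up for this sketch (column names are assumed).
sample_html = """
<table>
  <tr><th>ELR</th><th>Line name</th></tr>
  <tr><td>AAL</td><td>Ashendon and Aynho Line</td></tr>
</table>
"""

sketch = TableSketch()
sketch.feed(sample_html)
print(sketch.headers)
print(sketch.rows)
```

Each inner list in rows lines up positionally with headers, which is the relationship between trs and ths that the description above refers to.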

See also [PT-1].

Parameters:
  • trs (bs4.ResultSet | list) – The content of <tr> tags from a web page table.

  • ths (list | bs4.element.Tag) – A list of column names (typically from <th> tags) for the table.

  • sep (str | None) – The string used to replace any separators found in the raw data; defaults to ' / '.

  • as_dataframe (bool) – If True, returns the data as a pandas DataFrame; defaults to False.

Returns:

A list of lists representing rows of the table, or a dataframe if as_dataframe is True.

Return type:

pandas.DataFrame | list[list]
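When the list-of-lists form is returned, each row can be paired with the column names by position. The snippet below is a small sketch of that step using only built-in functions. The column names in ths_dat are assumed for illustration, and the single row is the one shown in the Examples section of this page; neither is fetched by calling parse_tr here.

```python
# Assumed column names for illustration; in practice, reuse the `ths` passed to parse_tr.
ths_dat = ['ELR', 'Line name', 'Mileages', 'Datum', 'Notes']

# One row in the list-of-lists shape (copied from the Examples section).
tables_list = [
    ['AAL', 'Ashendon and Aynho Line', '0.00 - 18.29', 'Ashendon Junction', 'Now NAJ3'],
]

# Pair every row with the column names by position.
records = [dict(zip(ths_dat, row)) for row in tables_list]

print(records[0]['Line name'])
```

Note that zip stops at the shorter of its inputs, so a row with fewer cells than there are column names would simply yield a shorter dictionary.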

Examples:

>>> from pyrcs.parser import parse_tr
>>> import requests
>>> import bs4
>>> example_url = 'http://www.railwaycodes.org.uk/elrs/elra.shtm'
>>> source = requests.get(example_url)
>>> parsed_text = bs4.BeautifulSoup(source.content, 'html.parser')
>>> ths_dat = [th.text for th in parsed_text.find_all('th')]
>>> trs_dat = parsed_text.find_all(name='tr')
>>> tables_list = parse_tr(trs=trs_dat, ths=ths_dat)  # returns a list of lists
>>> type(tables_list)
list
>>> len(tables_list) // 100
1
>>> tables_list[0]
['AAL',
 'Ashendon and Aynho Line',
 '0.00 - 18.29',
 'Ashendon Junction',
 'Now NAJ3']