parse_table

pyrcs.parser.parse_table(source, parser='html.parser', as_dataframe=False)

Parses HTML <tr> elements to create a table from the given source.

This function extracts data from the <thead> and <tbody> elements of an HTML table and returns it either as a list of column names plus a list of lists (one inner list per table row), or as a dataframe.
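The <thead>/<tbody> split described above can be illustrated with a minimal sketch built on the standard library's html.parser module. This is not the pyrcs implementation: the class TableSketchParser and the function sketch_parse_table are hypothetical names, and the sketch assumes a single, well-formed table with no rowspan/colspan cells. It reads a hard-coded HTML string, using the column names and first record from the example at the end of this page, rather than a requests.Response.

```python
from html.parser import HTMLParser


class TableSketchParser(HTMLParser):
    """Hypothetical sketch: header cells from <thead>, data rows from <tbody>.

    Assumes one simple table without rowspan/colspan; not pyrcs's own code.
    """

    def __init__(self):
        super().__init__()
        self.columns = []     # header cell texts collected from <thead>
        self.records = []     # one list of cell texts per <tbody> row
        self._section = None  # 'thead' or 'tbody' while inside one of them
        self._row = None      # cells of the <tr> currently being read
        self._cell = None     # text fragments of the current <th>/<td>

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self._section = tag
        elif tag == "tr" and self._section:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a header or data cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._section == "thead":
                self.columns.extend(self._row)
            else:
                self.records.append(self._row)
            self._row = None
        elif tag in ("thead", "tbody"):
            self._section = None


def sketch_parse_table(html_text):
    """Return (columns, records) parsed from an HTML table string."""
    parser = TableSketchParser()
    parser.feed(html_text)
    parser.close()
    return parser.columns, parser.records


SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>ELR</th><th>Line name</th><th>Mileages</th><th>Datum</th><th>Notes</th></tr>
  </thead>
  <tbody>
    <tr><td>AAL</td><td>Ashendon and Aynho Line</td><td>0.00 - 18.29</td>
        <td>Ashendon Junction</td><td>Now NAJ3</td></tr>
  </tbody>
</table>
"""

columns, records = sketch_parse_table(SAMPLE_HTML)
print(columns)  # ['ELR', 'Line name', 'Mileages', 'Datum', 'Notes']
print(records)  # [['AAL', 'Ashendon and Aynho Line', '0.00 - 18.29', 'Ashendon Junction', 'Now NAJ3']]
```

Collecting cell text fragment by fragment means text inside nested tags such as <a> still ends up in the right cell.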

Parameters:
  • source (requests.Response) – The response object containing the HTML table from a requested URL.

  • parser (str) – The parser to use for processing the HTML; options are 'html.parser' (default), 'html5lib' or 'lxml'.

  • as_dataframe (bool) – If True, the parsed data is returned as a dataframe; if False, the column names and a list of lists (the table rows) are returned. Defaults to False.
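Of the three parser values listed above, 'html.parser' ships with Python, while 'html5lib' and 'lxml' are separate packages that must be installed. The sketch below checks which of them can be imported before a value is passed as parser=. The helpers available_parsers and pick_parser are hypothetical, and the preference order (lxml, then html5lib, then html.parser) is an assumption, not something parse_table prescribes.

```python
import importlib.util

# Parser names accepted by parse_table (per the parameter list) mapped to the
# module that must be importable for each one.
PARSER_MODULES = {
    "html.parser": "html.parser",  # standard library, always present
    "html5lib": "html5lib",        # third-party package
    "lxml": "lxml",                # third-party package
}


def available_parsers():
    """Return the parser names whose backing module can be found."""
    return [
        name
        for name, module in PARSER_MODULES.items()
        if importlib.util.find_spec(module) is not None
    ]


def pick_parser(preferred=("lxml", "html5lib", "html.parser")):
    """Return the first installed parser in `preferred` (assumed order)."""
    installed = available_parsers()
    for name in preferred:
        if name in installed:
            return name
    return "html.parser"  # fall back to the documented default


print(available_parsers())
print(pick_parser())
```

The result could then be used as, for example, parse_table(source_dat, parser=pick_parser()).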

Returns:

A tuple containing a list of column names and a list of lists, one inner list per table row; if as_dataframe=True, a dataframe is returned instead.

Return type:

tuple[list, list] | pandas.DataFrame | list
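Because the return type depends on as_dataframe, downstream code may need to handle both forms. The sketch below shows one way to normalise them into a list of dicts keyed by column name. The helper rows_as_dicts is hypothetical. It treats a tuple as (columns, records) and assumes anything else is a pandas DataFrame, converting it with DataFrame.to_dict(orient="records"). The bare list variant in the return type is not covered because the description above does not say what it contains.

```python
def rows_as_dicts(result):
    """Normalise a parse_table result into a list of {column: value} dicts.

    Hypothetical helper: a tuple is unpacked as (columns, records); any other
    value is assumed to be a pandas DataFrame.
    """
    if isinstance(result, tuple):
        columns, records = result
        return [dict(zip(columns, row)) for row in records]
    # Assumed DataFrame: to_dict(orient="records") yields one dict per row.
    return result.to_dict(orient="records")


# Tuple form, using the column names and first record from the example below.
result = (
    ["ELR", "Line name", "Mileages", "Datum", "Notes"],
    [["AAL", "Ashendon and Aynho Line", "0.00 - 18.29", "Ashendon Junction", "Now NAJ3"]],
)
print(rows_as_dicts(result)[0]["Line name"])  # Ashendon and Aynho Line
```

Duck-typing on the tuple case keeps the helper usable without importing pandas when only the list form is needed.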

Examples:

>>> from pyrcs.parser import parse_table
>>> import requests
>>> source_dat = requests.get(url='http://www.railwaycodes.org.uk/elrs/elra.shtm')
>>> columns_dat, records_dat = parse_table(source_dat)
>>> columns_dat
['ELR', 'Line name', 'Mileages', 'Datum', 'Notes']
>>> type(records_dat)
list
>>> len(records_dat) // 100
1
>>> records_dat[0]
['AAL',
 'Ashendon and Aynho Line',
 '0.00 - 18.29',
 'Ashendon Junction',
 'Now NAJ3']