get_page_catalogue

pyrcs.parser.get_page_catalogue(url, head_tag_name='nav', head_tag_txt='Jump to: ', feature_tag_name='h3', verbose=False)

Get the catalogue of the main page of a data cluster.

Parameters

url (str) – URL of the main page of a data cluster
head_tag_name (str) – tag name of the feature list at the top of the page, defaults to 'nav'
head_tag_txt (str) – text contained in the tag named by head_tag_name, defaults to 'Jump to: '
feature_tag_name (str) – tag name of the headings of each feature, defaults to 'h3'
verbose (bool or int) – whether to print relevant information to the console, defaults to False

Returns

catalogue of the main page of a data cluster

Return type

pandas.DataFrame

Example:

>>> from pyrcs.parser import get_page_catalogue
>>> from pyhelpers.settings import pd_preferences

>>> pd_preferences(max_columns=1)

>>> elec_url = 'http://www.railwaycodes.org.uk/electrification/mast_prefix2.shtm'

>>> elec_catalogue = get_page_catalogue(elec_url)
>>> elec_catalogue
                                              Feature  ...
0                                     Beamish Tramway  ...
1                                  Birkenhead Tramway  ...
2                         Black Country Living Museum  ...
3                                   Blackpool Tramway  ...
4   Brighton and Rottingdean Seashore Electric Rai...  ...
..                                                ...  ...
17                                     Seaton Tramway  ...
18                                Sheffield Supertram  ...
19                          Snaefell Mountain Railway  ...
20  Summerlee, Museum of Scottish Industrial Life ...  ...
21                                  Tyne & Wear Metro  ...

[22 rows x 3 columns]

>>> elec_catalogue.columns.to_list()
['Feature', 'URL', 'Heading']
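
The parameters and column names above suggest how such a catalogue might be assembled: find the head tag that contains the "Jump to: " text, collect its links, and relate them to the feature headings. The following is a minimal sketch of that idea using only the standard library; it is not the pyrcs implementation. The names CatalogueSketch, sketch_page_catalogue and SAMPLE_HTML, the sample markup, the example.org URL, and the pairing of links with headings by position are all assumptions made for illustration, and the sketch returns a list of dicts rather than a pandas.DataFrame.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CatalogueSketch(HTMLParser):
    """Hypothetical helper: gather head-tag links and feature headings."""

    def __init__(self, head_tag_name='nav', head_tag_txt='Jump to: ',
                 feature_tag_name='h3'):
        super().__init__()
        self.head_tag_name = head_tag_name
        self.head_tag_txt = head_tag_txt
        self.feature_tag_name = feature_tag_name
        self.links = []        # (link text, href) pairs from matching head tags
        self.headings = []     # text of each feature heading
        self._in_head = False
        self._in_heading = False
        self._head_text = []   # all text seen inside the current head tag
        self._head_links = []  # links seen inside the current head tag
        self._href = None      # href of the link currently being read
        self._buffer = []      # text of the link or heading being read

    def handle_starttag(self, tag, attrs):
        if tag == self.head_tag_name:
            self._in_head = True
            self._head_text, self._head_links = [], []
        elif tag == 'a' and self._in_head:
            self._href = dict(attrs).get('href')
            self._buffer = []
        elif tag == self.feature_tag_name:
            self._in_heading = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_head:
            self._head_text.append(data)
        if self._href is not None or self._in_heading:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.head_tag_name and self._in_head:
            # Keep the links only if the head tag holds the expected text.
            if self.head_tag_txt in ''.join(self._head_text):
                self.links.extend(self._head_links)
            self._in_head = False
        elif tag == 'a' and self._href is not None:
            self._head_links.append((''.join(self._buffer).strip(), self._href))
            self._href = None
        elif tag == self.feature_tag_name and self._in_heading:
            self.headings.append(''.join(self._buffer).strip())
            self._in_heading = False


def sketch_page_catalogue(html, base_url, head_tag_txt='Jump to: '):
    """Return rows keyed like the columns shown above (illustrative only)."""
    parser = CatalogueSketch(head_tag_txt=head_tag_txt)
    parser.feed(html)
    parser.close()
    # Assumption: the n-th link in the head tag belongs to the n-th heading.
    return [
        {'Feature': text, 'URL': urljoin(base_url, href), 'Heading': heading}
        for (text, href), heading in zip(parser.links, parser.headings)
    ]


# Hypothetical markup; the real page may be structured differently.
SAMPLE_HTML = """
<nav>Jump to: <a href="#beamish">Beamish Tramway</a> |
<a href="#birkenhead">Birkenhead Tramway</a></nav>
<h3 id="beamish">Beamish Tramway</h3>
<h3 id="birkenhead">Birkenhead Tramway</h3>
"""

rows = sketch_page_catalogue(SAMPLE_HTML, 'https://example.org/page.shtm')
for row in rows:
    print(row)
```

Because the sketch works on an HTML string, it can be tried without network access; passing a head_tag_txt that does not occur in the head tag yields an empty list.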