get_page_catalogue
- pyrcs.parser.get_page_catalogue(url, head_tag_name='nav', head_tag_txt='Jump to: ', feature_tag_name='h3', verbose=False)
Get the catalogue of the main page of a data cluster.
- Parameters
url (str) – URL of the main page of a data cluster
head_tag_name (str) – tag name of the feature list at the top of the page, defaults to 'nav'
head_tag_txt (str) – text contained in the tag given by head_tag_name, defaults to 'Jump to: '
feature_tag_name (str) – tag name of the heading of each feature, defaults to 'h3'
verbose (bool or int) – whether to print relevant information in the console, defaults to False
- Returns
catalogue of the main page of a data cluster
- Return type
pandas.DataFrame
Example:
>>> from pyrcs.parser import get_page_catalogue
>>> from pyhelpers.settings import pd_preferences
>>> pd_preferences(max_columns=1)
>>> elec_url = 'http://www.railwaycodes.org.uk/electrification/mast_prefix2.shtm'
>>> elec_catalogue = get_page_catalogue(elec_url)
>>> elec_catalogue
                                              Feature  ...
0                                     Beamish Tramway  ...
1                                  Birkenhead Tramway  ...
2                         Black Country Living Museum  ...
3                                   Blackpool Tramway  ...
4   Brighton and Rottingdean Seashore Electric Rai...  ...
..                                                ...  ...
17                                     Seaton Tramway  ...
18                                Sheffield Supertram  ...
19                          Snaefell Mountain Railway  ...
20  Summerlee, Museum of Scottish Industrial Life ...  ...
21                                  Tyne & Wear Metro  ...

[22 rows x 3 columns]
>>> elec_catalogue.columns.to_list()
['Feature', 'URL', 'Heading']
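The parameters suggest how the catalogue is put together: links are read from the head tag (head_tag_name) whose text contains head_tag_txt, and feature headings from the feature_tag_name tags. The following is a minimal, stdlib-only sketch of that idea, not the pyrcs implementation. It assumes the page HTML is already available as a string (no network request), that the n-th link pairs with the n-th heading, and that relative links are resolved against the page URL with urljoin. The names CatalogueSketchParser and sketch_page_catalogue, and the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CatalogueSketchParser(HTMLParser):
    """Hypothetical parser (not part of pyrcs).

    Collects the <a> links inside any `head_tag_name` element whose text
    contains `head_tag_txt`, plus the text of every `feature_tag_name` heading.
    """

    def __init__(self, head_tag_name="nav", head_tag_txt="Jump to: ",
                 feature_tag_name="h3"):
        super().__init__()
        self.head_tag_name = head_tag_name
        self.head_tag_txt = head_tag_txt
        self.feature_tag_name = feature_tag_name
        self._in_head = False
        self._head_text = []      # all text inside the current head tag
        self._head_links = []     # (link text, href) pairs in the current head tag
        self._href = None         # href of the <a> currently being read, if any
        self._link_text = []
        self._in_feature = False
        self._feature_text = []
        self.links = []           # links from head tags that contain head_tag_txt
        self.headings = []        # text of each feature heading, in page order

    def handle_starttag(self, tag, attrs):
        if tag == self.head_tag_name:
            self._in_head, self._head_text, self._head_links = True, [], []
        elif tag == "a" and self._in_head:
            self._href = dict(attrs).get("href")
            self._link_text = []
        elif tag == self.feature_tag_name:
            self._in_feature, self._feature_text = True, []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self._head_links.append(("".join(self._link_text).strip(), self._href))
            self._href = None
        elif tag == self.head_tag_name and self._in_head:
            self._in_head = False
            # Keep this tag's links only if its text contains head_tag_txt.
            if self.head_tag_txt in "".join(self._head_text):
                self.links.extend(self._head_links)
        elif tag == self.feature_tag_name and self._in_feature:
            self._in_feature = False
            self.headings.append("".join(self._feature_text).strip())

    def handle_data(self, data):
        if self._in_head:
            self._head_text.append(data)
            if self._href is not None:
                self._link_text.append(data)
        if self._in_feature:
            self._feature_text.append(data)


def sketch_page_catalogue(html, url, head_tag_name="nav",
                          head_tag_txt="Jump to: ", feature_tag_name="h3"):
    """Return rows with keys Feature, URL and Heading (hypothetical helper).

    Assumption: the n-th link in the head tag belongs to the n-th heading.
    """
    parser = CatalogueSketchParser(head_tag_name, head_tag_txt, feature_tag_name)
    parser.feed(html)
    parser.close()
    return [
        {"Feature": text, "URL": urljoin(url, href), "Heading": heading}
        for (text, href), heading in zip(parser.links, parser.headings)
    ]


# Made-up HTML for illustration only; the first <nav> lacks "Jump to: " and is ignored.
page = (
    "<nav><a href='/'>Home</a></nav>"
    "<nav>Jump to: <a href='#a'>Alpha Tramway</a> | <a href='#b'>Beta Metro</a></nav>"
    "<h3 id='a'>Alpha Tramway (mast prefixes)</h3><p>...</p>"
    "<h3 id='b'>Beta Metro (mast prefixes)</h3>"
)
rows = sketch_page_catalogue(page, "http://example.com/page.shtm")
for row in rows:
    print(row)
```

Passing the returned rows to pandas.DataFrame gives a frame with Feature, URL and Heading columns. On a real page, pairing links to headings by anchor id rather than by position may be more robust.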