Code

Download EDGAR filings

Open the program in Google Colab

This program downloads EDGAR filings. The user specifies the years, companies, and filing types to download, or can choose to download filings for all companies. The program builds an index of the requested filings, downloads them from the EDGAR service, compresses them with gzip, and saves them to the user's Google Drive account. The index is also saved to the user's Google Drive account as a .csv file. It lists each retrieved filing, the name of its downloaded file on Google Drive, and its company, filing date, and form type, which makes it useful in subsequent processing of the downloaded filings. Note that the filings are downloaded as .txt files that contain HTML, which is how they appear natively on the EDGAR service.
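
The index, download, and compress steps described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code. It assumes the pipe-delimited layout of EDGAR's quarterly master index (CIK, company name, form type, date filed, file path). The function names, the `SAMPLE_INDEX` rows, and the output column names are all hypothetical. The `fetch` helper sends a `User-Agent` header because the SEC asks automated clients to identify themselves.

```python
import csv
import gzip
import urllib.request
from pathlib import Path

ARCHIVES = "https://www.sec.gov/Archives/"

# Made-up rows in the master-index layout, for illustration only.
SAMPLE_INDEX = """CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000001|Example Corp|10-K|2020-02-14|edgar/data/1000001/0001000001-20-000001.txt
1000002|Sample Inc|8-K|2020-03-02|edgar/data/1000002/0001000002-20-000007.txt
"""


def parse_master_index(text):
    """Turn master-index text into a list of dicts, skipping header lines."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) == 5 and parts[0].isdigit():
            cik, company, form_type, date_filed, path = parts
            entries.append({
                "cik": str(int(cik)),  # normalise away leading zeros
                "company": company,
                "form_type": form_type,
                "date_filed": date_filed,
                "edgar_path": path,
            })
    return entries


def select_filings(entries, form_types, ciks=None):
    """Keep the requested form types; ciks=None means all companies."""
    wanted_forms = set(form_types)
    wanted_ciks = None if ciks is None else {str(int(c)) for c in ciks}
    return [
        e for e in entries
        if e["form_type"] in wanted_forms
        and (wanted_ciks is None or e["cik"] in wanted_ciks)
    ]


def fetch(url, user_agent):
    """Download a URL and return the raw bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read()


def save_filing(entry, out_dir, user_agent, fetcher=fetch):
    """Download one filing, gzip it into out_dir, and record the file name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    local_name = Path(entry["edgar_path"]).name + ".gz"
    data = fetcher(ARCHIVES + entry["edgar_path"], user_agent)
    with gzip.open(out_dir / local_name, "wb") as fh:
        fh.write(data)
    entry["local_file"] = local_name
    return local_name


def write_index(entries, csv_path):
    """Save the index of retrieved filings as a .csv file."""
    fields = ["cik", "company", "form_type", "date_filed",
              "edgar_path", "local_file"]
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(entries)


# Offline demonstration: no network access is needed for this part.
selected = select_filings(parse_master_index(SAMPLE_INDEX), ["10-K"])
print([e["company"] for e in selected])
```

In a Colab session, `out_dir` and `csv_path` would typically point to a folder under the mounted Google Drive. The `fetcher` parameter is there only so the download step can be swapped out for testing.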


Parse HTML in EDGAR filings

Open the program in Google Colab

This program parses downloaded raw EDGAR filings with Beautiful Soup to extract their text, which is saved to JSON files for later analysis. The user specifies the years, companies, and filing types to parse. All of the corresponding filings must already have been downloaded.
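
As a rough sketch of the parsing step, the snippet below reads a gzip-compressed filing, strips the HTML, and writes the text to a JSON file. It is an illustration under assumptions rather than the program's own code. The function names and the JSON keys (`source_file`, `text`) are hypothetical, and dropping `<script>` and `<style>` content is a choice made here. If Beautiful Soup is not installed, the sketch falls back to the standard library's `HTMLParser` so that it still runs.

```python
import gzip
import json
from html.parser import HTMLParser
from pathlib import Path

try:
    from bs4 import BeautifulSoup
except ImportError:  # Beautiful Soup not installed
    BeautifulSoup = None


class _TextCollector(HTMLParser):
    """Standard-library fallback: collects text outside script/style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style"]):
            tag.decompose()  # remove non-visible content
        raw = soup.get_text(separator=" ")
    else:
        collector = _TextCollector()
        collector.feed(html)
        raw = " ".join(collector.parts)
    return " ".join(raw.split())


def parse_filing(gz_path, json_path):
    """Read a gzipped filing, extract its text, and save it as JSON."""
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as fh:
        html = fh.read()
    record = {"source_file": Path(gz_path).name, "text": extract_text(html)}
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(record, fh)
    return record
```

The index .csv saved by the download program could drive a loop over `parse_filing`, restricting it to the years, companies, and form types of interest.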

Download Companies House Filing PDFs

Open the program in Google Colab

This program bulk-downloads Companies House filing PDFs for a list of company registration numbers and a range of years. The PDFs can be either downloaded locally or saved to the user's Google Drive.
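
One way to select the PDFs for a registration number and year range is sketched below. It is an illustration only, and the program itself may work differently. It assumes the filings are listed through the Companies House public API, that each filing-history item carries a `date`, a `type`, and a `links.document_metadata` URL, that the PDF is served from that URL with `/content` appended, and that the API key is sent as the username in HTTP Basic authentication. These details should be checked against the API documentation before use. All function names and the sample items are hypothetical.

```python
import base64
import urllib.request

# Assumed base URL of the Companies House public data API.
API_BASE = "https://api.company-information.service.gov.uk"


def normalise_number(raw):
    """Upper-case a registration number; zero-pad purely numeric ones to 8."""
    number = str(raw).strip().upper()
    return number.zfill(8) if number.isdigit() else number


def auth_headers(api_key, accept="application/json"):
    """Build request headers, assuming the key is the Basic-auth username."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return {"Authorization": f"Basic {token}", "Accept": accept}


def fetch(url, headers):
    """Download a URL with the given headers and return the raw bytes."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()


def pdf_targets(number, items, first_year, last_year):
    """List download URLs and file names for filings in the year range."""
    targets = []
    for position, item in enumerate(items):
        meta = item.get("links", {}).get("document_metadata")
        year = int(item["date"][:4])
        if meta and first_year <= year <= last_year:
            targets.append({
                "url": meta + "/content",  # assumed PDF location
                "filename": f"{number}_{item['date']}_{item['type']}_{position}.pdf",
            })
    return targets


# Made-up filing-history items, for illustration only.
SAMPLE_ITEMS = [
    {"date": "2019-06-30", "type": "AA",
     "links": {"document_metadata": "https://example.invalid/document/abc"}},
    {"date": "2015-01-10", "type": "AR01",
     "links": {"document_metadata": "https://example.invalid/document/def"}},
    {"date": "2020-02-01", "type": "AA", "links": {}},  # no document attached
]

number = normalise_number(6)
for target in pdf_targets(number, SAMPLE_ITEMS, 2018, 2020):
    print(target["filename"], target["url"])
```

Each target could then be retrieved with `fetch(target["url"], auth_headers(api_key, accept="application/pdf"))` and written either to a local folder or to a folder on the mounted Google Drive.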