AUSCrawl

AUSCrawl is a web scraper and crawler that collects data from AUS Banner on every course, instructor, level, and attribute for every AUS semester since 2005, and saves it in an SQLite database for querying.

Note: A Python rewrite of this project is a work in progress.

Why create this project?

I created this project to practice using a headless browser to scrape large amounts of data, while also learning asynchronous code, the Sequelize ORM, and code optimization in general. I also think the dataset this project produces can help others practice data science or build applications that use this data.

Prerequisites

To run this project, you will need NodeJS. I recommend any version newer than v14.

How to get started

1. Download the repository: git clone https://github.com/DeadPackets/AUSCrawl
2. Enter the project directory and install the required libraries: cd AUSCrawl && npm install
3. Run the project: node crawl.js
   - For verbose output, run: VERBOSE=true node crawl.js
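The VERBOSE switch above is an ordinary environment variable. A minimal sketch of how a Node script might read such a flag, assuming only the exact value `true` enables it; the real handling in `crawl.js` may differ, and `isVerbose` and `createLogger` are hypothetical names:

```javascript
// Hypothetical sketch: the actual flag handling in crawl.js may differ.
// Returns true only when VERBOSE is exactly the string "true",
// matching the documented `VERBOSE=true node crawl.js` usage.
function isVerbose(env = process.env) {
  return env.VERBOSE === 'true';
}

// Returns a logger whose debug() calls do nothing unless verbose mode is on.
function createLogger(verbose = isVerbose()) {
  return {
    info: (...args) => console.log(...args),
    debug: verbose ? (...args) => console.log('[verbose]', ...args) : () => {},
  };
}

const log = createLogger();
log.info('Starting crawl...');
log.debug('This line only prints when VERBOSE=true');
```

Reading the flag once and swapping in a no-op `debug` keeps the per-call cost of disabled logging negligible.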

Libraries used in the project

Chalk is used for coloring the console output.
Sequelize is the database ORM used to save the crawled data into SQLite.
Puppeteer is the headless browser library used to browse Banner and crawl its data.
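Because the crawler covers every semester since 2005, it has many pages to visit. One common way to handle this with asynchronous code is a bounded-concurrency pool, which keeps a fixed number of tasks in flight at once. The sketch below is a generic illustration, not the approach used in `crawl.js`; `mapWithConcurrency` is a hypothetical helper, the semester labels are placeholders, and the worker is a stand-in for real Puppeteer page logic:

```javascript
// Hypothetical sketch of bounded-concurrency crawling; not taken from crawl.js.
// `tasks` is an array of items (e.g. semesters) and `worker` is an async
// function that processes one item, for example by driving a Puppeteer page.
async function mapWithConcurrency(tasks, limit, worker) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task

  // Each runner claims tasks until none remain. The claim (next++) happens
  // synchronously, so no two runners ever process the same task.
  async function runner() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await worker(tasks[index], index);
    }
  }

  // Start between 1 and `limit` runners and wait for all of them to finish.
  const count = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: count }, runner));
  return results; // results keep the same order as `tasks`
}

// Usage with a stand-in worker; a real crawler would load a Banner page here.
const semesters = ['Fall 2005', 'Spring 2006', 'Summer 2006']; // placeholder labels
mapWithConcurrency(semesters, 2, async (term) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `${term}: done`;
}).then((results) => console.log(results));
```

Capping concurrency this way limits how many browser pages are open at once, which bounds memory use and load on the server while still overlapping network waits.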

How does it work?

I am planning to write a blog post explaining how it works soon.

Contribution

Sure! Simply fork the project, add your feature or fix, and open a pull request. I will review it as soon as possible.
