HTML-Parser

Objective

  • The HTML-Parser tool was designed to let others scrape the HTML code of websites.

  • WARNING! This tool is meant for educational purposes ONLY. It should not be used to scrape data from company websites that do not allow scraping, or with malicious intent.

    • THIS TOOL IS NOT LIABLE FOR ANY IP BANS OR LEGAL DOCUMENTS RECEIVED FROM COMPANIES BY THE USER OF THIS TOOL.
  • The tool was created using Node.js. It reads the URLs placed in the urls.txt file, requests each one, and writes the HTML code received in the response to a new plain text file.

    • Each file created is named after the host of the URL requested. For example, if you visit https://www.google.com, your text file will appear as google.com.
  • To go with the point above, I have provided a few safe URLs in the urls.txt file so you can see how the tool works firsthand. This is all open source, so if you have ideas for making this better, let me know!
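
As a rough illustration of the file naming rule described above, here is a minimal sketch. The helper name outputFileName is hypothetical and not taken from the tool's source, and dropping a leading "www." is an assumption made only to match the google.com example.

```javascript
// Hypothetical helper (not from the tool's source): derive the output
// file name from a URL by taking its hostname.
// Assumption: a leading "www." is dropped, matching the google.com example.
function outputFileName(rawUrl) {
  // The built-in URL class throws a TypeError on malformed input.
  const { hostname } = new URL(rawUrl);
  return hostname.replace(/^www\./, "");
}

console.log(outputFileName("https://www.google.com")); // google.com
```

Using the built-in URL class avoids hand-written string splitting, and paths or query strings in the URL do not end up in the file name.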


Dependencies

  • Node.js
  • Axios

User Flow

  • The dependencies above are all that the tool requires in order to be used successfully from your terminal.

    • NOTE: This tool is terminal based so you'll be working from the command line / terminal.
  • Once the dependencies needed are installed, the tool can be run using the command node urls.js urls.txt

  • If there are any URLs that are invalid, you will see the following message in your terminal:

TERMINAL:
Couldn't download http://www.asdfskadjf.com
  • It is important to note that invalid URLs will NOT cause the application to stop. The tool simply moves on to the next line in the file.

  • Once the process has completed, you will see new text files appear in your directory, each named after its URL's hostname.


Technologies

  • JavaScript
  • Node.js
  • Axios