Collecting and storing data from websites as JSON files

moodizone/gift-scrap

Web Scraping Project

This project collects data from websites and stores it as JSON files. It uses Python with Requests, BeautifulSoup4, and Playwright to fetch pages, extract data, and handle dynamic content such as JavaScript-rendered pages and lazy loading. It can also download associated assets such as images.


Features

  • Scrape product details such as title, description, price, images, and source.
  • Handle dynamic web pages with Playwright.
  • Download and store images locally.
  • Save scraped data as JSON files.
  • Configurable and easy-to-use folder structure.
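As a rough illustration of the scrape-and-save flow listed above, the sketch below parses one product page with BeautifulSoup4 and writes the result to a JSON file. It is not taken from the repository: the `Product` class, the `parse_product` and `save_product` helpers, the CSS selectors, and the sample HTML are all hypothetical, and `source` is assumed here to mean the URL the product was scraped from.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical markup; real sites will need their own selectors.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Ceramic Mug</h1>
  <p class="description">Hand-glazed 350 ml mug.</p>
  <span class="price">12.50</span>
  <img class="product-image" src="https://example.com/img/mug-1.jpg">
  <img class="product-image" src="https://example.com/img/mug-2.jpg">
</body></html>
"""


@dataclass
class Product:
    """Fields mirror the feature list: title, description, price, images, source."""
    title: str
    description: str
    price: str
    images: list[str] = field(default_factory=list)
    source: str = ""


def parse_product(html: str, source: str) -> Product:
    """Extract product fields from HTML; missing elements become empty strings."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector: str) -> str:
        node = soup.select_one(selector)
        return node.get_text(strip=True) if node else ""

    images = [img["src"] for img in soup.select("img.product-image") if img.get("src")]
    return Product(
        title=text_of("h1.title"),
        description=text_of("p.description"),
        price=text_of("span.price"),
        images=images,
        source=source,
    )


def save_product(product: Product, out_dir: Path, name: str) -> Path:
    """Write the product as <out_dir>/<name>.json and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.json"
    path.write_text(
        json.dumps(asdict(product), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return path


if __name__ == "__main__":
    product = parse_product(SAMPLE_HTML, source="https://example.com/mug")
    print(json.dumps(asdict(product), indent=2))
```

The price is kept as a string so that currency symbols or locale-specific formatting survive unchanged; converting it to a number is left to whoever consumes the JSON.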

Technologies Used

  • Python 3.9+
  • Requests: For making HTTP requests.
  • BeautifulSoup4: For parsing HTML and extracting data.
  • Playwright: For handling dynamic pages and JavaScript interactions.
  • JSON: For structured data storage.
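For dynamic pages, one common approach with Playwright is to keep scrolling until the page height stops growing, so that lazily loaded content has a chance to appear before the HTML is read. The following is a minimal sketch under that assumption, not the repository's actual implementation; `scroll_until_stable` and `render_page` are hypothetical names, and the round limit and pause are arbitrary defaults.

```python
def scroll_until_stable(page, max_rounds: int = 10, pause_ms: int = 500) -> int:
    """Scroll down until the document height stops growing.

    `page` is expected to behave like a Playwright sync `Page`.
    Returns the number of scroll rounds performed.
    """
    last_height = page.evaluate("document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        # Scroll by one full document height to reach the current bottom.
        page.mouse.wheel(0, last_height)
        # Give lazy-loading scripts time to fetch and insert new content.
        page.wait_for_timeout(pause_ms)
        height = page.evaluate("document.body.scrollHeight")
        if height == last_height:
            return round_no
        last_height = height
    return max_rounds


def render_page(url: str) -> str:
    """Open `url` in headless Chromium, trigger lazy loading, return the HTML."""
    # Imported here so scroll_until_stable can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            scroll_until_stable(page)
            return page.content()
        finally:
            browser.close()
```

The HTML returned by `render_page` can then be handed to BeautifulSoup4 in the same way as a page fetched with Requests. Running it requires the Playwright browsers to be installed first (`playwright install chromium`).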

Scripts

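Run a script as a module, replacing {script name} with the name of a module in the scripts package:

python -m scripts.{script name}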
