Web crawling and scraping assignment implemented in C# with HtmlAgilityPack
This project contains the source code for scraping five Nepali news portals:
- https://nayapatrikadaily.com/
- https://www.kantipurdaily.com/
- https://www.onlinekhabar.com/
- https://www.setopati.com/
- https://www.nepalaaja.com
All extracted articles are saved to the database.
The project then calculates the term frequency of the words in the scraped articles.
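
As a rough illustration of the term frequency step, the sketch below counts how often each word occurs in a piece of text and divides by the total number of words. The class name `TermFrequency` and its methods are hypothetical and not taken from the project; splitting on whitespace and punctuation is also an assumption, chosen so that Devanagari words with combining vowel signs stay in one piece.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the project source.
public static class TermFrequency
{
    // Splits text into lowercase tokens, treating whitespace and
    // punctuation (including the Devanagari danda) as separators.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c) || char.IsPunctuation(c))
            {
                if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                current.Append(char.ToLowerInvariant(c));
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    // Returns each distinct word mapped to (occurrences / total words).
    public static Dictionary<string, double> Compute(string text)
    {
        var tokens = Tokenize(text);
        var result = new Dictionary<string, double>();
        if (tokens.Count == 0) return result;
        foreach (var group in tokens.GroupBy(t => t))
        {
            result[group.Key] = (double)group.Count() / tokens.Count;
        }
        return result;
    }
}
```

Calling `TermFrequency.Compute(articleText)` on the text of one saved article would give a dictionary of relative frequencies for that article. Whether the project stores raw counts or normalized values is not stated here, so the normalization is only one possible choice.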
I have also scanned the links on hamropatro.com to identify which pages act as hubs and which as authorities.
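
One common way to score hubs and authorities is the HITS algorithm. The text above does not say which method the project uses, so the following is only a minimal sketch under that assumption. The class name `HubAuthority`, the `links` input shape (page URL mapped to the URLs it links to), and the iteration count are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating HITS; not part of the project source.
public static class HubAuthority
{
    // links: page URL -> list of URLs that page links to.
    public static (Dictionary<string, double> Hubs, Dictionary<string, double> Authorities)
        Score(Dictionary<string, List<string>> links, int iterations = 20)
    {
        // Every page that appears as a source or as a target.
        var pages = links.Keys.Concat(links.Values.SelectMany(v => v)).Distinct().ToList();
        var hubs = pages.ToDictionary(p => p, p => 1.0);
        var auths = pages.ToDictionary(p => p, p => 1.0);

        for (int i = 0; i < iterations; i++)
        {
            // Authority score: sum of hub scores of the pages linking in.
            var newAuths = pages.ToDictionary(p => p, p => 0.0);
            foreach (var pair in links)
                foreach (var target in pair.Value)
                    newAuths[target] += hubs[pair.Key];
            Normalize(newAuths);

            // Hub score: sum of authority scores of the pages linked to.
            var newHubs = pages.ToDictionary(p => p, p => 0.0);
            foreach (var pair in links)
                foreach (var target in pair.Value)
                    newHubs[pair.Key] += newAuths[target];
            Normalize(newHubs);

            hubs = newHubs;
            auths = newAuths;
        }
        return (hubs, auths);
    }

    // Scales the scores so that their squares sum to 1.
    private static void Normalize(Dictionary<string, double> scores)
    {
        double norm = Math.Sqrt(scores.Values.Sum(v => v * v));
        if (norm == 0) return;
        foreach (var key in scores.Keys.ToList())
            scores[key] /= norm;
    }
}
```

With this sketch, a page that links out to many well-referenced pages ends up with a high hub score, and a page that many good hubs link to ends up with a high authority score.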
External libraries:
- Hangfire (for scheduling extraction in the background)
- DateConversion (for converting Bikram Sambat dates to A.D.)
- EntityFramework (for saving to SQL Server)
- HtmlAgilityPack (HTML parser written in C# to read/write the DOM)