@KingAkeem
Member

Issue #102

Changes Proposed

  • Add functions for traversing the links on a webpage using breadth-first search

Explanation of Changes

Two functions have been added. The first accepts the HTML of a webpage and an integer representing the depth at which to stop. It invokes the second, a traversal function that searches the links using breadth-first search.
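
For readers unfamiliar with the approach, here is a minimal sketch of a depth-limited breadth-first search over links. It is not the code from this PR: the names `bfs_depth`, `get_links`, `target`, and `stop_depth` are hypothetical, and `get_links` stands in for whatever fetches a page and extracts its links.

```python
from collections import deque


def bfs_depth(start_links, get_links, target, stop_depth):
    """Return the depth at which `target` is first reached, or None.

    `get_links` is a hypothetical callable mapping a link to the links
    found on that page. Links deeper than `stop_depth` are never expanded.
    """
    queue = deque((link, 0) for link in start_links)
    seen = set(start_links)
    while queue:
        link, depth = queue.popleft()
        # Guard against an unset target so a stray None never "matches".
        if target and link == target:
            return depth
        if depth >= stop_depth:
            continue  # Depth limit reached: do not expand this link.
        for child in get_links(link):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return None


# Usage with an in-memory link graph instead of real HTTP requests.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}
found_at = bfs_depth(["a"], lambda link: graph.get(link, []), "d", 3)
```

Because the queue is first-in, first-out, every link at depth n is visited before any link at depth n + 1, so the returned depth is the shallowest one at which the target appears.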

@PSNAppz
Member

PSNAppz commented Aug 8, 2018

@KingAkeem Awesome work 👏🏻.

@PSNAppz added this to the TorBot v1.3 milestone Aug 8, 2018
Contributor

@agrepravin left a comment

LGTM otherwise


toVisit = list()
for link in links:
    if targetLink == link and targetLink:
Contributor

I don't see the value of the `and` condition here. If targetLink == link, targetLink will always be set, right?

Member Author

I want to make sure targetLink is actually set. Since Python is a dynamic language, it's impossible to tell ahead of time what items a list may contain. If a None were to somehow get inserted, I don't want it to return a false positive.
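
A small sketch of the false positive described above, assuming the target is left as None and a None ends up in the list. The helper name `is_target` and the example URLs are hypothetical. Unlike the condition in the diff, the helper returns a strict bool.

```python
def is_target(link, target_link):
    """Return True only when a real target was given and it equals `link`."""
    return bool(target_link) and target_link == link


links = ["http://a.example", None, "http://b.example"]
target = None  # No target was supplied.

# Bare equality: the unset target "matches" the stray None entry.
naive_hits = [link for link in links if target == link]

# Guarded check: an unset target never matches anything.
guarded_hits = [link for link in links if is_target(link, target)]
```

With bare equality `naive_hits` contains the stray None, while `guarded_hits` is empty.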

for link in links:
    if targetLink == link and targetLink:
        return depth
    resp = requests.get(link)
Contributor

What if this errors out?

Member Author

Good catch, I didn't think about that. I'm going to wrap the request in a try-except block and just pass on errors. If there are errors, we can assume the link isn't valid.
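
One way the try-except idea could look, as a sketch rather than the change that was merged. The function `fetch_all` and its `fetch` and `errors` parameters are hypothetical. The fetcher is injected so the example runs without network access. With Requests, one would likely pass `requests.get` as `fetch` and `requests.exceptions.RequestException` as `errors`.

```python
def fetch_all(links, fetch, errors=(Exception,)):
    """Fetch each link, skipping any link whose request raises.

    `fetch` is a hypothetical callable (for example `requests.get`), and
    `errors` is the exception type, or tuple of types, to treat as
    "link is not valid".
    """
    responses = {}
    for link in links:
        try:
            responses[link] = fetch(link)
        except errors:
            # Assume the link isn't valid and move on to the next one.
            continue
    return responses


def fake_fetch(link):
    """Stand-in fetcher that fails for links containing 'bad'."""
    if "bad" in link:
        raise ConnectionError("unreachable: " + link)
    return "<html>" + link + "</html>"


pages = fetch_all(["a", "bad-link", "b"], fake_fetch, errors=ConnectionError)
```

Catching a narrow exception type instead of a bare `except` keeps programming errors such as a `TypeError` visible, while still treating unreachable links as invalid.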

@PSNAppz merged commit 5ca2d21 into DedSecInside:dev Aug 10, 2018
@KingAkeem deleted the bfs_crawl branch August 10, 2018 12:09