Wget Spider Crawl

  11:44 pm  Linux

wget --spider -m --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains luclaverdure.com --no-parent luclaverdure.com

wget --spider -m http://luclaverdure.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -vE '\.(css|js|png|gif|jpg|JPG)$' > urls.txt
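
The filtering half of that pipeline can be tried offline before crawling a real site. Below is a minimal sketch, assuming the log format of recent wget versions, where each request line starts with "--", followed by a date, a time, and then the URL. The function name extract_urls, the example.com URLs, and the sample log lines are made up for illustration. The sketch uses grep -E so the alternation in the extension pattern is treated as a regular expression, and adds sort -u to drop repeated URLs.

```shell
# Hypothetical helper (not part of wget): reads wget log text on stdin
# and prints the unique URLs that are not static assets.
extract_urls() {
  grep '^--' |                                      # keep only request lines
    awk '{ print $3 }' |                            # third field is the URL
    grep -vE '\.(css|js|png|gif|jpg|JPG)$' |        # drop stylesheets, scripts, images
    sort -u                                         # remove duplicates
}

# Made-up sample log lines standing in for real "wget --spider" output.
printf '%s\n' \
  '--2021-11-23 21:07:00--  http://example.com/' \
  'Spider mode enabled. Check if remote file exists.' \
  '--2021-11-23 21:07:01--  http://example.com/style.css' \
  '--2021-11-23 21:07:02--  http://example.com/about.html' \
  '--2021-11-23 21:07:03--  http://example.com/about.html' |
  extract_urls
```

With the sample input this prints the two page URLs and skips the stylesheet and the duplicate. Against a real site, the printf part would be replaced by the wget --spider command with 2>&1, since wget writes its log to stderr.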


23 November 2021 at 9:07 pm

To download only zip files from a site:

wget -c -e robots=off -nd -nv -r -A .zip http://www.mysite.com
