Which website do you want to download?
But keep in mind: the larger the site, the bigger the download. We don't recommend downloading huge sites like Qiita, because you would need thousands of megabytes to store all the media files they use.
The best sites to download are those with lots of text and few images, and those that don't regularly add or change pages. Ideally, that means a static information site, an online ebook site, or a site you want to archive in case it ever goes down.
It's easy to save individual web pages for offline reading, but what if you want to download an entire website? Well, it's easier than you think! Here are some useful tools you can use to download a website for offline reading.
Wget
Available for Windows, Mac, and Linux.
Wget is a command-line utility that can retrieve all kinds of files over the HTTP and FTP protocols. Since websites are served over HTTP and most web media files are accessible over HTTP or FTP, Wget is an excellent tool for ripping websites.
While Wget is typically used to download single files, it can also be used to recursively download all the pages and files found from an initial page:
wget -r -p https://www.joeyoder.com
However, some sites may detect and block what you're trying to do, because ripping a website can cost them a lot of bandwidth. To get around this, you can disguise Wget as a web browser with a user-agent string:
wget -r -p -U Mozilla https://www.joeyoder.com
If you want to be polite, you should also limit your download speed (so you don't hog the web server's bandwidth) and pause between downloads (so you don't overwhelm the server with too many requests):
wget -r -p -U Mozilla --wait=10 --limit-rate=35K https://www.joeyoder.com
Wget comes bundled with most Unix-based systems. On a Mac, you can install Wget with a single Homebrew command: brew install wget (see how to set up Homebrew on a Mac). On Windows, you'll need to use a ported version instead.
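If the copied pages don't link to each other properly when browsed offline, it can help to let Wget rewrite links to point at your local copy and to keep the crawl inside the starting site. A minimal sketch combining the options above with the standard --convert-links (-k) and --no-parent options (the URL and the numbers are just placeholders to adjust):
# -r: recurse, -p: grab page requisites (CSS, images), -k: rewrite links for local browsing
# --no-parent keeps the crawl inside the starting directory; --wait and --limit-rate keep the load polite
wget -r -p -k --no-parent -U Mozilla --wait=10 --limit-rate=35K https://www.joeyoder.com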
WebCopy
Available for Windows only.
WebCopy by Cyotek takes a website URL and scans it for links, pages, and media. As it finds pages, it recursively looks for more links, pages, and media until it has discovered the whole website. You can then use the configuration options to decide which parts to download offline.
The interesting thing about WebCopy is that you can set up multiple "projects", each with its own settings and configurations. This makes it easy to re-download many different sites at any time.
One project can copy many websites, so use them with an organized plan (e.g. a "Tech" project for copying tech sites).
How to Download an Entire Website With WebCopy
Install and launch the app.
Navigate to File > New to create a new project.
Type the URL into the Website field.
Change the Save folder field to where you want the site saved.
Play around with Project > Rules… (learn more about WebCopy Rules).
Navigate to File > Save As… to save the project.
Click Copy Website in the toolbar to start the process.
Once the copying is done, you can use the Results tab to see the status of each individual page and media file. The Errors tab shows any problems that occurred, and the Skipped tab shows files that weren't downloaded.
Most important, though, is the Sitemap, which shows the full directory structure of the website as discovered by WebCopy.
To view the website offline, open File Explorer and navigate to the save folder you specified. Open index.html (or index.htm in some cases) in your browser of choice to start browsing.
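If a saved site doesn't render properly when opened straight from disk (scripts and absolute links can behave differently over file://), one workaround is to serve the save folder over a local HTTP server and browse it at http://localhost:8000 instead. A minimal sketch, assuming Python 3.7 or newer is installed and ./saved-site is a placeholder for your save folder:
# serve the mirrored folder locally, then open http://localhost:8000 in a browser
python3 -m http.server 8000 --directory ./saved-site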
HTTrack
Grab a webpage for offline reading with WinHTTrack
Available for Windows, Linux, and Android.
HTTrack is better known than WebCopy, and arguably better because it's open source and available on platforms other than Windows, but the interface is a bit clunky and leaves much to be desired. Still, it works well, so don't let that turn you away.
Like WebCopy, it uses a project-based approach that lets you copy multiple websites and keep them all organized. You can pause and resume downloads, and you can update copied websites by re-downloading old and new files.
How to Download a Website With HTTrack
Install and launch the app.
Click Next to begin creating a new project.
Give the project a name, category, and base path, then click Next.
Select Download web site(s) for Action, then type each website's URL in the Web Addresses box, one URL per line. You can also store URLs in a TXT file and import it, which is convenient when you want to re-download the same sites later. Click Next.
Adjust parameters if you want, then click Finish.
Once everything is downloaded, you can browse the site as usual by navigating to where the files were downloaded and opening index.html or index.htm in your browser.
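If you're on Linux and prefer the command line, HTTrack also ships as an httrack console program, so you can mirror a site without the GUI. A minimal sketch (the URL, output folder, and filter are placeholders to adjust for your own site):
# mirror the site into ./joeyoder-mirror, staying within the joeyoder.com domain (-v prints progress)
httrack "https://www.joeyoder.com/" -O "./joeyoder-mirror" "+*.joeyoder.com/*" -v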
Credit to the original author: Joel Lee (1,604 articles published)
Japanese source: https://tinyurl.com/website-backup-wget-japanese