The site you would like to download has the following configuration:
Of these, I want to download only the files under section_a.
However, the page in section_a contains links to pages in sibling directories such as section_c.
So I ran wget against https://files.example/works/section_a with the -r and -np options.
However, the results were not what I expected: files under section_c were downloaded as well as those under section_a.
Files in the parent hierarchy were not retrieved, which is what I expected.
Why are files from a directory at the same level downloaded even though I specified https://files.example/works/section_a with the -np option?
Also, is there a way to download only section_a files?
When I added / to the end of the URL and ran wget, it ended up as 404 Not Found.
Accessing the URL with a slash at the end likewise reports that the page cannot be found.
If the specified URL is a directory, try running wget with / at the end of the URL.
See also: no-parent does not work on wget
However, wget let me down and climbed into the parent hierarchy, so I looked into why.
In the end, you have to add / to the end of the URL.
Why are files in the same hierarchy downloaded even though I specify https://files.example/works/section_a with the -np option?
-np means --no-parent, so it only prevents wget from retrieving the parent hierarchy.
Directories at the same level are still subject to retrieval.
Also, is there a way to download only section_a files?
section_c is retrieved because -r specifies recursive retrieval, so you should either omit -r or limit the recursion depth with -l1, as follows:
I had assumed section_a was a file, but it is a directory.
Given the URL https://files.example/works/section_a, the base directory is /works/, because section_a is interpreted as a file.
If section_a is a directory, the web server usually returns a response that redirects to https://files.example/works/section_a/ to tell the client that it is a directory.
With https://files.example/works/section_a/, the base directory becomes /works/section_a/, so the -np option works as expected.
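To make the effect of the trailing slash concrete, here is a minimal sketch that derives a base directory by dropping everything after the last slash. It only approximates the behavior described above in plain shell and is not wget's actual code; the function name base_dir is made up for illustration.

```shell
# Hypothetical helper: approximate the "base directory" of a URL by
# removing everything after the last slash (not wget's real implementation).
base_dir() {
  printf '%s/\n' "${1%/*}"
}

base_dir "https://files.example/works/section_a"    # prints https://files.example/works/
base_dir "https://files.example/works/section_a/"   # prints https://files.example/works/section_a/
```

Without the slash, the base is /works/, so section_c sits below it and is not treated as a parent.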
However, you say that specifying https://files.example/works/section_a/ in wget results in Not Found.
The server may be redirecting straight to a file rather than to the directory URL.
Do you see the following redirect message when you run wget?
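One way to check for a redirect is to ask wget for the response headers without downloading anything. This is a sketch, assuming the --spider, --server-response and --max-redirect options of GNU wget; the function name check_redirect is hypothetical.

```shell
# Hypothetical helper: show the response headers for a URL without saving it.
# --max-redirect=0 stops wget from following a redirect, so a Location
# header (if the server sends one) stays visible in the output.
# wget writes headers to stderr, hence the 2>&1.
check_redirect() {
  wget --spider --server-response --max-redirect=0 "$1" 2>&1
}

# Usage (requires network access):
# check_redirect "https://files.example/works/section_a" | grep -i 'location:'
```

If a Location header appears, its value is the redirect destination.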
If the redirect destination is a file inside section_a, you can pass that URL to wget.
The problem arises if the redirect destination is a file outside section_a, or if there is no redirect at all. If so, try specifying /works/section_a in the --accept-regex option as follows:
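As a sketch, the command could look like the one below, assuming the same -r and -np options as in the question. The grep check only approximates which URLs the pattern would accept; the helper name accepts and the sample URLs are made up for illustration.

```shell
# Assumed invocation (needs network access, so it is left commented out):
# wget -r -np --accept-regex '/works/section_a' "https://files.example/works/section_a"

# Offline approximation of the filter: wget matches the pattern against the
# complete URL, which grep -E imitates here. Sample URLs are hypothetical.
accepts() {
  printf '%s\n' "$1" | grep -Eq '/works/section_a'
}

accepts "https://files.example/works/section_a/page.html" && echo "accepted"
accepts "https://files.example/works/section_c/page.html" || echo "rejected"
```

The first URL matches the pattern and the second does not, so only links under section_a would be followed.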
However, if files required to display the HTML under section_a are located outside section_a, you will not be able to retrieve those files.
In that case, you might want to exclude section_b with the --reject-regex option as follows:
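A possible form of that command is sketched below, again assuming the -r and -np options from the question. The grep check approximates which URLs the pattern would exclude; the helper name rejects and the sample URLs are made up for illustration.

```shell
# Assumed invocation (needs network access, so it is left commented out):
# wget -r -np --reject-regex '/works/section_b' "https://files.example/works/section_a"

# Offline approximation of the filter: a URL is skipped when the pattern
# matches anywhere in the complete URL. Sample URLs are hypothetical.
rejects() {
  printf '%s\n' "$1" | grep -Eq '/works/section_b'
}

rejects "https://files.example/works/section_b/page.html" && echo "skipped"
rejects "https://files.example/works/section_a/page.html" || echo "kept"
```

Anything whose URL does not match the pattern remains eligible, so supporting files elsewhere under /works/ can still be fetched.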
© 2022 OneMinuteCode. All rights reserved.