I want to know how web crawlers work and how to make them.

Asked 5 months ago, Updated 5 months ago, 17 views

I want to make a crawler, but I don't know where to start.
I would appreciate it if you could answer either of the questions below.

  • How to make a web crawler
  • How a web crawler works

Please tell me how web crawlers work.
Also, please tell me how web crawlers collect web information.


2022-09-30 12:06

1 Answer

A crawler basically works like this:

  • Retrieve the HTML source of a web page over http/https
  • Separate the visible part (the body) from the part that is only used internally by the computer
  • Register the body in the database (along with related information) after running it through natural language analysis
  • Filter out ads and anything you're not interested in
  • Do the same for each link on the page (the amount of data explodes so quickly that you need considerable computing resources)
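The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not how any particular search engine does it: the names `PageParser`, `fetch_html` and `crawl` are made up for this example, a plain dict stands in for the database, UTF-8 is assumed for decoding, and the natural language analysis and ad filtering steps are left out entirely.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects text and outgoing links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def fetch_html(url):
    """Download a page over http/https and decode it (assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=10):
    """Breadth-first crawl from start_url; returns {url: page_text}."""
    queue = deque([start_url])
    seen = {start_url}
    store = {}  # stand-in for the database
    while queue and len(store) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = PageParser()
        parser.feed(html)
        store[url] = " ".join(parser.text_parts)
        # Do the same thing for every link found on this page.
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return store
```

The `max_pages` limit is there because of the last point in the list: following every link makes the queue grow very quickly, so a sketch like this needs a hard cap. The `fetch` parameter lets you swap in a different download function, for example one that reads from saved files while you experiment.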

Of these steps:

  • The natural language analysis is where each search engine has its own know-how; it is the core of each engine
  • The rest is not that difficult (though it takes some skill to hide that the client is a machine)

That is how I see it. In fact, a crawler is just the data acquisition part, so making use of the acquired data in later processing (such as targeted ads) is much more difficult and expensive.

The curl command lets you fetch that first "source", and that may be enough as a first step.
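If you would rather take that first step from a program than from the command line, the following is a rough Python counterpart to running curl against a URL. It is only a sketch: the function name `fetch_source` is made up for this example, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
from urllib.request import urlopen


def fetch_source(url, timeout=10):
    """Return the raw HTML source of a page as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the response headers, if any;
        # otherwise fall back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `print(fetch_source("https://example.com/")[:500])` would print the first 500 characters of that page's source; `https://example.com/` is a placeholder for whatever page you want to look at.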

2022-09-30 12:06

© 2023 OneMinuteCode. All rights reserved.