I want to get Amazon's HTML and ASIN codes with PHP.

Asked 2 months ago, Updated 2 months ago, 0 views

  $html=file_get_contents("http://www.amazon.co.jp/s/fst=nb___mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&url=node%3D466280&field-keywords=%E6%96%B0%E5%88%8A");
  //sjis conversion
  $domDocument = new DOMDocument();
  $domDocument->loadHTML($html);
  $xmlString=$domDocument->saveXML();
  $xmlObject=simplexml_load_string($xmlString);
  var_dump($xmlObject);

Results

 bool(false)


When I tried a website other than Amazon, I got properly formatted output.
What is the difference?

php api

2022-09-30 11:48

2 Answers

As noted in the comments on the question, trying the URL shown produces a parse error.
The HTML that Amazon returns contains a syntax error.

I can't think of a way to parse input that contains errors like this, so I will only point out the cause.

The following errors occurred in my environment (PHP 5.6.9-pl0-gentoo). The output is long, so only the first part is shown:

PHP Warning: simplexml_load_string(): Entity: line 1771: parser error : Double hyphen within comment: <!--
<div id="main" skeleton-key="results in php shell code on line 1

Warning: simplexml_load_string(): Entity: line 1771: parser error : Double hyphen within comment: <!--
<div id="main" skeleton-key="results in php shell code on line 1
PHP Warning: simplexml_load_string(): <div id="main" skeleton-key="results --searchTemplate defaultLayout so_jp_ja --ref in php shell code on line 1

If you look at the XML at that point, the comment is invalid because -- appears between the opening <!-- and the closing -->. This is what causes the parse error.
If you remove everything from <!-- to --> in this XML source, it should parse.
(The rule against -- inside comments, which exists for compatibility with SGML, is fairly easy to forget.)
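One way to apply the removal described above is to drop the comment nodes from the DOM before serializing it to XML. The following is only a minimal sketch: htmlToSimpleXml() is a hypothetical helper name, and the sample markup is made up to stand in for Amazon's HTML. It also assumes you do not need the comment contents.

```php
<?php
// Sketch: drop comment nodes before converting the DOM to SimpleXML.
// htmlToSimpleXml() is a hypothetical helper name, not part of PHP.
function htmlToSimpleXml($html)
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Copy the node list to an array first so that removing nodes
    // cannot disturb the iteration.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    $xml = simplexml_load_string($doc->saveXML());

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml; // SimpleXMLElement on success, false on failure
}

// Made-up sample: the comment contains "--", which XML forbids.
$sample = '<html><body><!-- layout --ref=x --><div id="main"><p>Hello</p></div></body></html>';
$xml = htmlToSimpleXml($sample);
var_dump($xml !== false);
```

Removing the nodes through DOMXPath rather than with a regular expression on the XML string avoids accidentally matching text that merely looks like a comment, for example inside a script element.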

Once you know which code fragment fails, you can get to the cause easily by launching an interactive shell with php -a, executing one line at a time, and checking the variable contents with var_dump() as needed. In fact, that is how I found the cause.
In this example, warning/error messages are displayed at $domDocument->loadHTML($html); and at $xmlObject=simplexml_load_string($xmlString);, so they are a hint that the HTML is malformed.
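If you would rather inspect the parser messages in a script than read them off the console, libxml can collect them for you. This is a minimal sketch under stated assumptions: loadXmlWithErrors() is a hypothetical helper name, and the input string is a made-up example of a comment containing "--".

```php
<?php
// Sketch: capture libxml parse errors instead of letting PHP print warnings.
// loadXmlWithErrors() is a hypothetical helper name, not part of PHP.
function loadXmlWithErrors($xmlString)
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $result = simplexml_load_string($xmlString);

    // Each entry is a LibXMLError object with line and message properties.
    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array($result, $messages);
}

// Made-up input: "--" inside a comment is not allowed in XML.
list($xml, $errors) = loadXmlWithErrors('<root><!-- a -- b --></root>');
var_dump($xml);
print_r($errors);
```

The same pattern can be wrapped around $domDocument->loadHTML($html) to see what the HTML parser complained about, since both functions report through libxml.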


2022-09-30 11:48

It is usually a bad idea to parse HTML that is meant for human readers, so you should use an API instead. You presumably want to look up products from the search results, and the Product Advertising API may be able to do that.


2022-09-30 11:48



© 2022 OneMinuteCode. All rights reserved.