sitemap.xml build wrong urls for sitemapindex via cron #9440

Closed
christian-forgacs opened this issue Apr 27, 2017 · 12 comments
Labels
    • bug report
    • Issue: Cannot Reproduce (Cannot reproduce the issue on the latest `2.4-develop` branch)
    • Issue: Clear Description (Gate 2 Passed. Manual verification of the issue description passed)
    • Issue: Format is valid (Gate 1 Passed. Automatic verification of issue format passed)
    • Progress: needs update

Comments

@christian-forgacs

In our setup, sitemap generation via cron builds wrong URLs in the sitemap index.

Preconditions

  1. Magento 2.1.6
  2. PHP 7.0.8
  3. The home directory of the www-data user (which executes the cron) is /var/www/
  4. Sitemap generation via cron is activated
    • Generation from the Magento 2 Admin Panel works correctly
    • Generation via the Magento 2 cron produces wrong URLs

Steps to reproduce

  1. Start sitemap generation via cron

Expected result

  1. The URLs in the sitemap index are correct and based on the Magento 2 main directory.

Actual result

  1. In the sitemap index generated via cron, the URLs contain the complete path from the home directory of the www-data user to the Magento 2 directory.

@TomashKhamlai
Contributor

Hello.
I tried to reproduce the issue you have reported.

Steps to reproduce:

  1. Execute this to get the time on the server: php -r '$date = date("m/d/Y h:i:s a", time()); echo "Server time is: " . $date . "\r\n"; exit;'
  2. Create the directory 'sitemap' in the root folder of your site.
  3. Navigate to Stores -> Settings -> Configuration -> Catalog -> XML Sitemap.
  4. Expand 'Generation' panel.
  5. Change 'Enable' to 'Yes'.
  6. Change 'Start Time' to 10 minutes later than the time from step 1.
  7. Select 'Hourly' from 'Frequency' dropdown in 'Products Options', 'Categories Options' and 'CMS Pages Options'.
  8. Save config.
  9. Navigate to Marketing -> SEO & Search -> Site Map.
  10. Press 'Add Sitemap'.
  11. Change 'Filename' to 'sitemap.xml' and 'Path' to '/sitemap/'
  12. Press 'Save and Generate'.
  13. Wait for the cron generation to be executed at the time you specified in step 6.

Here is my sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:content="http://www.google.com/schemas/sitemap-content/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url><loc>http://magento216.vg/home</loc><lastmod>2017-07-04T08:48:45+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
<url><loc>http://magento216.vg/enable-cookies</loc><lastmod>2017-07-04T08:48:43+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
<url><loc>http://magento216.vg/privacy-policy-cookie-restriction-mode</loc><lastmod>2017-07-04T08:48:43+00:00</lastmod><changefreq>hourly</changefreq><priority>0.2</priority></url>
</urlset>

Please correct me if you tried it another way.

@hostep
Contributor

hostep commented Jul 4, 2017

@TomashKhamlai, just FYI: it feels like this is a duplicate of #5321 (comment), but @christian-forgacs isn't talking about images in his opening post, so I'm not entirely convinced that he is reporting the same bug.

@TomashKhamlai
Contributor

@christian-forgacs are you using NGINX as a server?

@christian-forgacs
Author

@TomashKhamlai yes, we're using NGINX as the server.

@andrewkett
Contributor

andrewkett commented Sep 11, 2017

I can also replicate this. The problem happens for me when running the sitemap generation through bin/magento cron:run from outside the Magento directory. For example, our Magento code is in /var/www/src; if I run php src/bin/magento cron:run from /var/www, the URLs will contain src as part of the base URL (https://www.example.com/src). Running from the Magento directory, e.g. php bin/magento cron:run from /var/www/src, works as expected.

I believe the problem can be traced back to \Magento\Sitemap\Model\Sitemap::_getStoreBaseDomain: the $storeDomain variable it returns is incorrect under the conditions described. I believe it is due to the logic in this condition

Similarly, if I run php /var/www/src/bin/magento cron:run from the root of our server, the job actually fails with the following message:

Notice: Undefined property: Magento\Sitemap\Model\Observer::$_translateModel in /var/www/src/vendor/magento/module-sitemap/Model/Observer.php

This is due to $documentRoot being empty in the same condition, which causes this error:

Warning: strpos(): Empty needle in /var/www/src/vendor/magento/module-sitemap/Model/Sitemap.php
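
A rough shell sketch of the behaviour described above may help. It assumes, without checking against the Magento source, that the base domain is built by stripping the document root from the base directory and appending whatever remains to the store URL. The helpers `trim_slashes` and `store_base_domain` are hypothetical names written for illustration, not Magento functions.

```shell
#!/bin/sh
# Illustrative only: mirrors the logic described in the comment above,
# not the actual Magento implementation.

# trim_slashes STR: strip leading and trailing "/" characters.
trim_slashes() {
    s=$1
    while [ "${s#/}" != "$s" ]; do s=${s#/}; done
    while [ "${s%/}" != "$s" ]; do s=${s%/}; done
    printf '%s' "$s"
}

# store_base_domain URL DOC_ROOT BASE_DIR
# Prints the URL prefix that sitemap entries would receive.
store_base_domain() {
    url=$1
    doc_root=$(trim_slashes "$2")
    base_dir=$(trim_slashes "$3")

    # A document root of "/" trims down to an empty string, which is the
    # case reported as "strpos(): Empty needle".
    if [ -z "$doc_root" ]; then
        echo "empty document root" >&2
        return 1
    fi

    case $base_dir in
        "$doc_root"*)
            # Base dir lies under the document root: the remainder is
            # treated as an installation folder and leaks into the URL.
            rest=${base_dir#"$doc_root"}
            folder=$(trim_slashes "$rest")
            if [ -n "$folder" ]; then
                printf '%s/%s\n' "$url" "$folder"
            else
                printf '%s\n' "$url"
            fi
            ;;
        *)
            # Assumption: otherwise the store URL is used unchanged.
            printf '%s\n' "$url"
            ;;
    esac
}

# cron started in /var/www, Magento installed in /var/www/src:
store_base_domain https://www.example.com /var/www /var/www/src
# prints https://www.example.com/src

# cron started in the Magento directory itself:
store_base_domain https://www.example.com /var/www/src /var/www/src
# prints https://www.example.com
```

Under these assumptions the URL is only correct when the resolved document root and the Magento base directory coincide, which matches the observation that running from /var/www/src works.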

@magento-engcom-team magento-engcom-team added 2.1.x bug report Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Clear Description Gate 2 Passed. Manual verification of the issue description passed labels Sep 11, 2017

@RomaKis
Contributor

RomaKis commented Sep 12, 2017

@christian-forgacs , thank you for your report.
We were not able to reproduce this issue by following the steps you provided.

Steps to reproduce with NGINX and Magento version 2.1.6 :

  1. Execute this to get the time on the server: php -r '$date = date("m/d/Y h:i:s a", time()); echo "Server time is: " . $date . "\r\n"; exit;'
  2. Create the directory 'sitemap' in the root folder of your site.
  3. Navigate to Stores -> Settings -> Configuration -> Catalog -> XML Sitemap.
  4. 'Generation' -> Change 'Enable' to 'Yes'.
  5. Change 'Start Time' to 2 minutes later than the time from step 1.
  6. Save config. Flush cache.
  7. Navigate to Marketing -> SEO & Search -> Site Map. Press 'Add Sitemap'.
  8. Change 'Filename' to 'sitemap.xml' and 'Path' to '/sitemap/'.
  9. Press 'Save'.
  10. Wait until the time from step 1 plus 2 minutes, then run the command php magento2ce/bin/magento cron:run && magento2ce/bin/magento cron:run several times.

Please provide more details regarding your environment, or try to reproduce this issue on a clean installation.

@magento-engcom-team
Contributor

@christian-forgacs, thank you for your report.
We were not able to reproduce this issue by following the steps you provided. If you'd like to update it, please reopen the issue.

@magento-engcom-team magento-engcom-team added the Issue: Cannot Reproduce Cannot reproduce the issue on the latest `2.4-develop` branch label Sep 19, 2017

@mihaiaperghis

I can confirm this happens when the sitemap itself is an index file that contains sitemap children (in shops with 50k+ pages).

So if I create a sitemap.xml file under the /pub/ path, its address will be

/pub/sitemap.xml

but it will contain sitemap children such as

http://example.com/websites/example.com/public_html/pub/sitemap-1-1.xml
http://example.com/websites/example.com/public_html/pub/sitemap-1-2.xml
...

which aren't valid paths.

Hope that helps!

@kurtinge

kurtinge commented Dec 14, 2017

I can also confirm that this is an issue with the sitemap index.
The issue can be traced back to \Magento\Sitemap\Model\Sitemap::_getDocumentRoot.
When run from cron, $this->_request->getServer('DOCUMENT_ROOT') is empty, and realpath with an empty input returns the current working directory, i.e. the directory in which the cron job starts. Normally this is the home directory of the user running the cron job.

A workaround would be to change the cron job like this:

* * * * * cd /path/to/magento/root; /usr/bin/php /path/to/magento/root/bin/magento cron:run

And if your document root is inside pub:

* * * * * cd /path/to/magento/root/pub; /usr/bin/php /path/to/magento/root/bin/magento cron:run
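
The same workaround can be sketched as a small wrapper. This is a sketch under assumptions, not an official fix: `run_in_dir` is a hypothetical helper, and the paths in the usage comments are the same placeholders as in the cron lines above. It uses `&&` rather than `;` so that the command is skipped when the `cd` fails, instead of running from whatever directory cron happened to start in.

```shell
#!/bin/sh
# run_in_dir DIR CMD [ARGS...]
# Run CMD with DIR as the working directory. The subshell keeps the
# caller's own working directory untouched, and "&&" ensures CMD never
# runs if DIR cannot be entered.
run_in_dir() {
    dir=$1
    shift
    ( cd "$dir" && "$@" )
}

# Hypothetical usage (placeholder paths, as in the cron lines above):
#   run_in_dir /path/to/magento/root /usr/bin/php /path/to/magento/root/bin/magento cron:run
# or, when the document root is pub:
#   run_in_dir /path/to/magento/root/pub /usr/bin/php /path/to/magento/root/bin/magento cron:run
```

Directly in a crontab, the equivalent change is to write `cd /path/to/magento/root && /usr/bin/php ...` instead of separating the two commands with `;`.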

@JavierYD

More errors:

  1. The product URLs inside the sitemap files are not fully SEO-friendly: index.php should be removed from the URLs.
  2. Images must point to the source image path, not to the cache folder.

<loc>https://piezas-portatiles.com/index.php/bateria-para-acer-aspire-e1-522-e1-530-li-ion-14-8v-2600mah-bt28.html</loc><lastmod>2017-05-31T08:30:48+00:00</lastmod><changefreq>daily</changefreq><priority>1.0</priority>
<image:image><image:loc>https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28.jpg</image:loc><image:title>Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28</image:title></image:image>
<image:image><image:loc>https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-1.jpg</image:loc><image:title>Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28</image:title></image:image>
<image:image><image:loc>https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-2.jpg</image:loc><image:title>Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28</image:title></image:image>
<image:image><image:loc>https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-3.jpg</image:loc><image:title>Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28</image:title></image:image>
<image:image><image:loc>https://piezas-portatiles.com/pub/media/catalog/product/cache/03b00c63a37940b60c758eb1601e45c8/b/t/bt28-4.jpg</image:loc><image:title>Bateria para Acer ASPIRE E1-522 E1-530 Li-ion 14,8v 2600mAh BT28</image:title></image:image>

@JavierYD

Some products and categories have unfriendly URLs:

<loc>
https://piezas-portatiles.com/index.php/catalog/product/view/id/221071
</loc>

@Bhavik-kumar

#9440 (comment)

The above solution works for me.

magento-devops-reposync-svc pushed a commit that referenced this issue Dec 28, 2024
…-19-2024

[Support Tier-4-Kings glo16746] 12.19.2024 Regular delivery of bugfixes and improvements