Improving SEO: Generate a sitemap.xml for public projects
Make your public projects show up in search results.

If you host open source projects on a self-hosted GitLab instance, you might have noticed that Google and other search engines do not index them very well. You can improve this by providing a sitemap.xml that lists your public projects. GitLab does not offer a native solution for this yet and has placed the feature request on the backlog. Luckily, we can use GitLab itself to generate a sitemap with some simple configuration. The solution we offer here might not be perfect, but it helps a lot.
Generate the sitemap
First, we generate the sitemap itself. While GitLab doesn't generate a sitemap automatically on a self-hosted instance, it does contain the code to generate one, because GitLab offers a sitemap at gitlab.com/sitemap.xml. To generate a sitemap, place the Ruby script below somewhere on your GitLab server. We use "/var/opt/gitlabhost/generate_sitemap.rb" in this example.
# generate_sitemap.rb
include Gitlab::Routing

file = Gitlab::Sitemaps::SitemapFile.new

# Add generic URLs
file.add_elements([explore_projects_url, explore_snippets_url, explore_groups_url])

# Add group and subgroup URLs (a nil user means only public groups are found)
groups = GroupsFinder.new(nil).execute
file.add_elements(groups)

# Add project URLs
groups.each do |group|
  projects = GroupProjectsFinder.new(
    current_user: nil,
    group: group,
    params: { non_archived: true, visibility_level: Gitlab::VisibilityLevel::PUBLIC },
    options: { exclude_shared: true }
  ).execute.include_project_feature.inc_routes
  file.add_elements(projects)

  # Add the GitLab Pages URL of every project that has a Pages deployment
  projects.each do |project|
    deployment = PagesDeployment.find_by(project_id: project.id)
    next if deployment.nil?

    parts = project.path_with_namespace.split('/')
    namespace = parts.shift
    file.add_elements([format('https://%s.pages.example.com/%s', namespace, parts.join('/'))])
  end
end

# Render sitemap
if file.empty?
  abort('No URLs found to generate the sitemap')
else
  File.write('/var/opt/gitlabhost/public/sitemap.xml', file.render)
  puts 'Saved generated sitemap to /var/opt/gitlabhost/public/sitemap.xml'
end
As you can see, we use GitLab's internal sitemap and group-finder functionality to generate the sitemap. Be sure to replace "pages.example.com" with the domain name where your GitLab Pages are served if you have them.
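To make the Pages URL construction easier to check before you change the domain, here is a minimal standalone sketch of the same logic. The helper name pages_url and the project paths are hypothetical and used only for illustration. It assumes your Pages sites are served per top-level namespace, as in the script above.

```ruby
# Hypothetical helper (not part of GitLab): builds the Pages URL for a
# project path, assuming Pages are served as
# https://<top-level namespace>.<pages_domain>/<rest of the path>.
def pages_url(path_with_namespace, pages_domain)
  namespace, *rest = path_with_namespace.split('/')
  format('https://%s.%s/%s', namespace, pages_domain, rest.join('/'))
end

# Example paths are made up for illustration.
puts pages_url('acme/website', 'pages.example.com')
# prints https://acme.pages.example.com/website
puts pages_url('acme/docs/handbook', 'pages.example.com')
# prints https://acme.pages.example.com/docs/handbook
```

If your Pages setup uses a different URL layout, this is the part of the script you would need to adapt.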
You can run this script with the gitlab-rails runner from your GitLab server like this:
$ gitlab-rails runner /var/opt/gitlabhost/generate_sitemap.rb
This can easily be run in a daily cronjob like this:
0 2 * * * /usr/bin/gitlab-rails runner /var/opt/gitlabhost/generate_sitemap.rb
Now you have a sitemap.xml file generated in "/var/opt/gitlabhost/public/sitemap.xml" which is updated every day at 02:00.
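After the first run, it can be useful to check that the file actually contains the URLs you expect. The sketch below is one simple way to do that with plain Ruby. It assumes the file follows the standard sitemap layout with <loc> elements. The helper name sitemap_locations and the sample XML are made up for illustration.

```ruby
# Hypothetical helper: extracts every <loc> value from sitemap XML with a
# simple pattern match. Assumes the standard <url><loc>...</loc></url> layout.
def sitemap_locations(xml)
  xml.scan(%r{<loc>\s*(.*?)\s*</loc>}m).flatten
end

# Made-up sample; on a real server you would read the generated file instead:
#   xml = File.read('/var/opt/gitlabhost/public/sitemap.xml')
sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://gitlab.example.com/explore/projects</loc></url>
    <url><loc>https://gitlab.example.com/acme/website</loc></url>
  </urlset>
XML

locations = sitemap_locations(sample)
puts "#{locations.size} URLs found"
puts locations
```

A pattern match is enough for a quick sanity check. For anything stricter, a real XML parser would be the better choice.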
Serve the sitemap
Generating the sitemap is only half of the solution. We also have to serve it and let crawlers know where to find it. GitLab's documentation includes an example for serving a custom robots.txt, and the same approach works for a custom sitemap.xml. Add the following to your "/etc/gitlab/gitlab.rb" configuration file and run "gitlab-ctl reconfigure".
nginx['custom_gitlab_server_config'] = "\nlocation = /sitemap.xml { alias /var/opt/gitlabhost/public/sitemap.xml; }\n\nlocation = /robots.txt { alias /var/opt/gitlabhost/public/robots.txt; }\n"
This looks a bit messy with all the "\n" entries, but they are necessary: the value is a single-line string that gets injected into the nginx configuration, and the newlines keep each location block on its own line so that nginx can parse it correctly. This configuration already references the robots.txt file, which we will set up now.
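If you would rather not write the "\n" sequences by hand, the string can be built from a list of paths. The following is a minimal sketch. The helper name nginx_alias_locations is hypothetical, and we have only used it to print the string and paste the result into gitlab.rb, not as part of gitlab.rb itself.

```ruby
# Hypothetical helper: turns a { request path => file on disk } map into
# the newline-delimited string of exact-match nginx location blocks.
def nginx_alias_locations(aliases)
  aliases.map { |uri, file| "\nlocation = #{uri} { alias #{file}; }\n" }.join
end

config = nginx_alias_locations({
  '/sitemap.xml' => '/var/opt/gitlabhost/public/sitemap.xml',
  '/robots.txt' => '/var/opt/gitlabhost/public/robots.txt'
})

# inspect shows the escaped form, ready to paste as a one-line Ruby string
puts config.inspect
```

Each entry becomes one exact-match location block, so adding another static file later only takes one more line in the map.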
Copy GitLab's robots.txt to a new location so we can edit it:
$ cp /opt/gitlab/embedded/service/gitlab-rails/public/robots.txt /var/opt/gitlabhost/public/robots.txt
Open the copy and add "Sitemap: https://gitlab.example.com/sitemap.xml" at the bottom. Replace "gitlab.example.com" with the domain on which your GitLab instance is available.
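If you want to script this step, for example so it can be repeated after a GitLab upgrade changes the default robots.txt, the sketch below shows one way to add the Sitemap line only when it is missing. The helper name with_sitemap_directive and the sample content are made up for illustration.

```ruby
# Hypothetical helper: returns robots.txt content with a Sitemap directive
# appended, or the content unchanged if the directive is already present.
def with_sitemap_directive(robots_txt, sitemap_url)
  directive = "Sitemap: #{sitemap_url}"
  return robots_txt if robots_txt.lines.any? { |line| line.strip == directive }

  # Make sure the directive starts on its own line
  body = (robots_txt.empty? || robots_txt.end_with?("\n")) ? robots_txt : "#{robots_txt}\n"
  "#{body}#{directive}\n"
end

# Made-up sample content; the real file is much longer.
sample = "User-Agent: *\nDisallow: /admin\n"
puts with_sitemap_directive(sample, 'https://gitlab.example.com/sitemap.xml')
```

To apply it to the copied file, read the file with File.read, pass the content through the helper, and write the result back with File.write. Running it twice leaves the file unchanged.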
Conclusion
It takes a bit of custom configuration, but it is possible to generate a sitemap.xml for your GitLab instance using the steps described here. We hope this helps you solve this problem until GitLab offers a native solution.

Daniel
High Availability Engineer - Team Lead