Are content scrapers targeting your WordPress website? Content scrapers are bots that crawl the internet and copy the content they find, and they can prove troublesome for several reasons. They consume bandwidth and server resources, which could lead to longer page load times.
Even if they don’t harm your website’s page load times, content scrapers may jeopardize your site’s search rankings. After copying a piece of content on your website, they may republish it elsewhere. Search engines may then rank one of these duplicate versions rather than the original version found on your website. By protecting your WordPress website from content scrapers, you won’t have to worry about these bots harming your site’s page load times or rankings.
Disable RSS Feed
Disabling your website’s Really Simple Syndication (RSS) feed can protect it from content scrapers. There are different types of content scrapers. While they all copy content, some of them target specific types of content, such as RSS feeds. They seek out RSS feeds, and upon discovering one, they copy all of the included content so that it can be republished on another website.
Your WordPress website probably has at least one RSS feed. RSS feed creation is a native feature of the well-known content management system (CMS). When you install WordPress, it will automatically create an RSS feed consisting of your website’s most recent posts. You can disable this and all other RSS feeds, however, by adding code to your active theme’s functions.php file, such as the code mentioned at wordpress.stackexchange.com/questions/162811/how-to-secure-or-disable-the-rss-feeds.
Change the RSS Feed Setting
You can still use an RSS feed, but you should consider changing its settings to protect against content scrapers. With the default settings, all of the content from your most recent posts will be included in an RSS feed. Whether a post consists of 200 words or 2,000 words, content scrapers can completely copy it by targeting this RSS feed.
WordPress offers two settings for RSS feeds. Full text is the default setting that places all of your recent posts’ content in an RSS feed. The other setting is summary, which only places an excerpt of your recent posts’ content in an RSS feed. With the summary setting, content scrapers that target your website’s RSS feed will only be able to copy a small portion of its content.
You can change from the full text to the summary setting in the admin dashboard. After logging in to the admin dashboard, select “Settings” in the left sidebar menu and choose “Reading.” On the Reading settings page, find the option that controls what each post in a feed includes, and choose either full text or summary for your website’s RSS feed (recent WordPress versions label the summary option “Excerpt”).
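After changing the setting, you may want to confirm what your feed actually exposes. The following is a minimal Python sketch, assuming the feed follows the common RSS 2.0 convention of carrying the full post body in a `content:encoded` element and only an excerpt in `description`. The function name and both sample feeds are hypothetical and written by hand for illustration; they are not taken from a real site.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def feed_exposes_full_text(feed_xml: str) -> bool:
    """Return True if any <item> in the feed carries a <content:encoded> element."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        if item.find(f"{{{CONTENT_NS}}}encoded") is not None:
            return True
    return False


# Hypothetical feed fragment resembling a full-text feed.
FULL_TEXT_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example post</title>
      <description>Short excerpt...</description>
      <content:encoded><![CDATA[<p>The entire post body.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

# Hypothetical feed fragment resembling a summary-only feed.
SUMMARY_FEED = """<rss version="2.0">
  <channel>
    <item>
      <title>Example post</title>
      <description>Short excerpt...</description>
    </item>
  </channel>
</rss>"""
```

Calling `feed_exposes_full_text(FULL_TEXT_FEED)` reports that the full body is present, while the summary sample does not. To check a live site, you would pass in the XML downloaded from your own feed URL instead of the samples.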
Add Lots of Internal Links
When creating content, be generous with internal links. Internal links can discourage scraper operators from targeting your website. Content scrapers are automated, so they don’t selectively choose pieces of content to copy. Rather, they automatically copy all of a given page’s or post’s content, links included.
Internal links are links that connect one page on your website to another page on your site. When content scrapers copy and republish content with lots of internal links, those links come along with it, and as long as they are written as full URLs, each one becomes a backlink to your website. The people who operate the content scrapers may not want to link to your website. When they see your website’s internal link-filled content, they may stop targeting it with content scrapers.
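Only links written as full URLs keep pointing at your site once the content is republished elsewhere; a relative link such as `/pricing/` would resolve against the scraper’s domain instead. The sketch below is one way to audit a post’s HTML for this, using only Python’s standard library. The `LinkAudit` class name and the `example.com` host are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAudit(HTMLParser):
    """Collects href values and sorts them by whether they survive being copied."""

    def __init__(self, site_host: str):
        super().__init__()
        self.site_host = site_host
        self.absolute_internal = []  # would still point at the site when scraped
        self.relative = []           # would resolve against the scraper's domain

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        if parsed.netloc == self.site_host:
            self.absolute_internal.append(href)
        elif not parsed.netloc and not parsed.scheme:
            # No host and no scheme: a relative path or in-page fragment.
            self.relative.append(href)


# Hypothetical post content with one absolute internal link,
# one relative internal link, and one external link.
SAMPLE_POST = (
    '<p><a href="https://example.com/guide/">Guide</a>, '
    '<a href="/pricing/">Pricing</a>, '
    '<a href="https://other.example.org/">Elsewhere</a></p>'
)

audit = LinkAudit("example.com")
audit.feed(SAMPLE_POST)
```

After the run, `audit.absolute_internal` holds the links that would become backlinks, and `audit.relative` holds the ones worth rewriting as full URLs. External links are ignored on purpose.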
Install a Security Plugin
Installing a security plugin like Jetpack can give your website the upper hand on content scrapers. To copy your website’s content, content scrapers must visit it. Content scrapers behave differently than human visitors, though. They often have shorter page viewing sessions and send more Hypertext Transfer Protocol (HTTP) requests than human visitors. Security plugins are designed to look for suspicious behavior such as this.
A security plugin alone is often enough to keep content scrapers at bay. Once installed, it will monitor your website’s traffic while looking for signs of bot activity. If a security plugin believes a visitor is a bot, it will block all traffic originating from that Internet Protocol (IP) address. Some of the top security plugins for WordPress include Wordfence and Sucuri.
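To make the idea of request-rate detection concrete, here is a minimal Python sketch of a sliding-window counter that blocks an IP address once it sends too many requests in a short period. It is an illustration of the general technique only and does not describe how Jetpack, Wordfence, Sucuri, or any other plugin is implemented. The class name and the threshold values are assumptions.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flags an IP address that sends too many requests within a time window."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 10.0):
        # Both thresholds are illustrative assumptions, not values from any plugin.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one request; return True if the request should be allowed."""
        if ip in self.blocked:
            return False
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.blocked.add(ip)
            return False
        return True


# A burst of ten requests in half a second from one (documentation-range) address.
monitor = RequestRateMonitor(max_requests=5, window_seconds=1.0)
burst = [monitor.record("203.0.113.7", i * 0.05) for i in range(10)]

# A slower visitor requesting one page every two seconds.
steady = [monitor.record("198.51.100.2", i * 2.0) for i in range(10)]
```

With these settings, the burst is cut off at its sixth request and the address stays blocked, while the slower visitor is never flagged. A real plugin would likely weigh more signals than request rate alone, such as user agents or known bot lists.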
Another approach is to ignore content scrapers altogether. Assuming you have a high-quality web hosting package with plenty of bandwidth and server resources, content scrapers shouldn’t slow down your website. Speed-related performance issues typically only occur with low-end shared hosting packages.
If you’re worried that content scrapers will cause your website to lose some of its search rankings, you can use a sitemap. A sitemap is a file that serves as a directory for your website. It lists the URLs of all your website’s published pages. With a sitemap in place, search engines are more likely to crawl the original versions of your website’s content before they crawl the duplicate versions created by content scrapers. As a result, content scrapers are less likely to cause a loss of rankings.
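For a sense of what a sitemap file contains, the following Python sketch builds a small one in the standard sitemaps.org XML format. The `build_sitemap` function and the `example.com` URLs are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs; last_modified may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if last_modified:
            # Dates are expected in W3C format, for example YYYY-MM-DD.
            ET.SubElement(entry, "lastmod").text = last_modified
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages: one with a last-modified date, one without.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/about/", None),
])
```

Each published page becomes one `url` entry with its address in `loc`. Every page you add, remove, or update means editing this file again.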
Don’t just create a sitemap manually; use a plugin. A sitemap plugin will create a sitemap for your website, and it will update this newly created file whenever you make changes to your site. If you delete a page, for instance, it will remove that page from your website’s sitemap. If you publish a new page, on the other hand, the plugin will add that page to your website’s sitemap.
The XML Sitemaps plugin by Auctollo is a great choice. It’s completely free and distributed under the General Public License (GPL). Most importantly, the XML Sitemaps plugin will update your website’s sitemap automatically so that you don’t have to.
Content scrapers have been around for decades. They are bots that spider websites while copying their content. While content scrapers target all types of websites, those running WordPress are particularly at risk due to their RSS feed. You can protect your WordPress-powered website from content scrapers by disabling the RSS feed, changing the RSS feed settings, adding lots of internal links and using a security plugin. Alternatively, you can choose to ignore content scrapers if you take precautions to protect against slow speeds and a loss of ranking.
Pressable Helps Keep WordPress Websites Protected
As a premium managed WordPress hosting provider, Pressable provides customers with a web application firewall designed to prevent all types of cyber threats and keep your website up and running 24/7/365. Additionally, all Pressable hosting plans include Jetpack Security Daily for free (a $239 per year value) to provide an added layer of safety and protection.
Amanda serves as the Head of Sales and Enablement for Pressable. She's worked in the tech space for well over a decade and has spent the majority of that time building/training/leading teams. She loves travel and adventure and when she's not working, you can find her spending time with her family, lounging pool/beach-side, playing tennis, working out, and meeting people/making friends all along the way!