Posts

Showing posts from December, 2016

How you can improve online visibility by optimizing your website for scraper bots

Digging further into internet data mining: how website structure affects your website's visibility and ease of use for web scrapers (and why it should matter to you). This post continues my previous excursion into web scraping, which I posted about here: https://techartworks.blogspot.com/2016/12/an-excursion-into-web-scraping-with.html While taking some first, albeit tentative, steps into web scraping with Python this week (making use of the fantastic "requests" and "BeautifulSoup4" modules), I ran into a few walls, but also made a good bit of headway. In particular, I gained some insights into ways you could optimize your website's structure to make it friendlier for both human visitors and web scrapers, thereby increasing the chances of your pages popping up when someone googles your name. Before going any further with this, there are a few notes I would like to make: Networks like LinkedIn provide ways to...

An excursion into web scraping with Python 3 and BeautifulSoup

A quick note to start: for the following web scraping program, I used Python 3.5.2, the requests module for Python 3, the BeautifulSoup4 module for Python 3, and Sublime Text 3. If you want to duplicate the process or code, please feel free :) Earlier this week I was going over some ideas in my head for fun little experiments I could do with Python - every time I do one of these, the language opens up just a little more for me, and I grow to love it even more than I already do (which is a lot). This time, I decided I would try my hand at web scraping. The first thing I noticed was how easy Python makes the process, given the modules already available for it: in this case BeautifulSoup4, an enormously powerful library for parsing and navigating HTML that's been around for a while, combined with "requests", a module that lets you fetch a URL's page source as plain text, was all I needed to get things going. This all started when I...
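To give a rough idea of how the two modules fit together, here is a minimal sketch. It is not the program from the post itself: the helper names (fetch_html, page_title, extract_links) and the example.com URL are placeholders made up for illustration, and the only assumption is that requests and BeautifulSoup4 are installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page with requests and return its source as text."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def page_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


def extract_links(html):
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    # Placeholder URL; swap in a page you are allowed to scrape.
    html = fetch_html("https://example.com")
    print(page_title(html))
    for text, href in extract_links(html):
        print(text, "->", href)
```

Keeping the download step separate from the parsing step means the parsing helpers can be tried on any HTML string, without touching the network.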