GDPR Compliance Missive

Today, May 25th (a Friday, amusingly enough for those of us who know that Friday is the worst day to deploy new things), is the day that the General Data Protection Regulation (GDPR) goes into effect for the European Union, so, since I have a web site on The Internet, and I know there are Europeans who occasionally visit this site, and supposedly I’m a professional in the information technology field, I wanted to write about it quickly.

Then I wrote over fifteen hundred words about it, so I failed.

I had been ignoring it entirely, since I didn’t think it applied to this blog. I had read about it briefly before, but I came away with the impression that it would only apply to web sites that collect and store what in the U.S. is called “PII” or “personally identifiable information.” Things like names, addresses, phone numbers, social security numbers, birth dates, and things of that nature. (I’ve had some occasion in my career to peruse and apply federal government regulations regarding PII.) Things that only insurance companies and banks and medical facilities and social media sites typically collect.

But @Jaedia mentioned this morning that it also applies to sites that simply track page views, which would probably be Every Site On The Entire Internet. She kindly provided a link to an article on WordPress’s compliance with GDPR. And another one.

I went to a GDPR site (I can’t tell if that’s an “official” government GDPR site or not, to be honest; I don’t think it is) and started reading over the definitions, which are, not surprisingly for a government regulation, stupefyingly vague. I’m not completely convinced that it legally applies to blogs like this, but the links above suggest it does, or at least that it might. (I have no idea who would even enforce GDPR violations on my site or how that mechanism would work… it seems unlikely that EU police would show up at my doorstep.) I feel like there is a bit of a cottage industry right now in giving out overly cautious advice (and selling products and services) about GDPR compliance to nervous web site owners. Here is another page on that GDPR site discussing how IP addresses might be personally identifiable information in Europe.

I opened up the actual PDF of the GDPR, as linked on that site, and searched for “ip address” and found nothing.

In any case, WordPress went ahead and built in compliance anyway. This site is currently running WordPress 4.9.6 as of this writing. If you leave a comment on this site, there is supposed to be a checkbox that confirms it’s okay to store your name and email address. There are no registered user accounts on this site except for mine. Incidentally I would encourage you never to use your real identifying name on any WordPress blog comment anywhere on the entire Internet.

Based on my quick perusal of the GDPR, I’m not entirely sure WordPress’s checkbox goes far enough, but if it does, then great.

So in the interests of full disclosure and transparency, all of your comments on this blog are stored in a database table with the name, email address, and web site you typed in (which is also displayed publicly with every comment, so it’s certainly not “private”), along with the IP address and user agent of the web client you were using when the comment was entered (that part is private). [UPDATE: The name and URL are public, the email, IP, and user agent are private.] That means I can geographically locate every commenter on this site, not precisely, but probably down to the country or even city, unless you are obfuscating it with proxies.

That database resides on a shared (probably virtual) server at my web host. Other folks with other web sites that I don’t know also have access to that server. The credentials for accessing my database are stored in a plain text PHP configuration file on my web server, just like probably every other WordPress site. Yes, it’s true, that’s a notable security risk, but every self-hosted WordPress site has it. Theoretically I am the only one who can access that file using my account login credentials on the server, and the file is protected by OS access controls. However, folks who work at the web host can probably figure out how to open that config file, then open the database, and read all of your comments, names, and IP addresses. So, you would be wise not to post your credit card information or home address in a comment on this or any other WordPress site. (I would delete it anyway.)

I also have the credentials for the database stored in KeePass on my PC. The password for that KeePass file is not especially secure, if you want to know the honest truth, but my house is locked, and my wi-fi is reasonably secure, and I am the only person that I know of who can access my computers, and my computer passwords are much more secure.

The point is that I also can log in to the WordPress database and read all of your IP addresses and user agents at will. I can edit your comments and change them around to make you sound dumb if I want to. I’ve never done so and never would, but I could and any other WordPress site owner could too. (I have occasionally edited grammar mistakes in my own comments but I have never edited anyone else’s comments here that I can remember.) (Incidentally I’ve never logged into and looked at my WordPress database until just now, only to verify that what I’ve said here is true.)
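
As a rough illustration of the public/private split described above, here is a minimal Python sketch. The field names are hypothetical shorthand chosen for this example, not the actual WordPress column names, and the code is not how WordPress itself works; it only models which parts of a stored comment a visitor sees and which parts only the site owner (or the web host) can read.

```python
from dataclasses import dataclass, asdict

# Hypothetical field names for illustration only; the real WordPress
# comments table uses its own column names.
PUBLIC_FIELDS = {"name", "url", "text"}


@dataclass
class Comment:
    name: str
    email: str
    url: str
    ip_address: str
    user_agent: str
    text: str


def public_view(comment: Comment) -> dict:
    """Return the fields displayed publicly next to a comment."""
    return {k: v for k, v in asdict(comment).items() if k in PUBLIC_FIELDS}


def private_fields(comment: Comment) -> dict:
    """Return the fields stored in the database but never displayed."""
    return {k: v for k, v in asdict(comment).items() if k not in PUBLIC_FIELDS}


if __name__ == "__main__":
    # Made-up sample data (203.0.113.0/24 is a documentation-only IP range).
    sample = Comment(
        name="Alice",
        email="alice@example.com",
        url="https://example.com",
        ip_address="203.0.113.7",
        user_agent="ExampleBrowser/1.0",
        text="Nice post",
    )
    print("public:", public_view(sample))
    print("private:", private_fields(sample))
```

Anyone with database access sees both halves, which is the whole point of the paragraphs above.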

Most of the above probably applies to every WordPress site you’ve ever visited, by the way. Certainly the self-hosted ones. The ones hosted on WordPress are probably less vulnerable to the site owner itself, but probably more vulnerable to people who work at WordPress.

I do have statistics enabled on this WordPress installation and on WordPress.com through Jetpack, so IP addresses and user agent strings are monitored for that purpose. I use this information periodically to see which posts get more traffic than others. (In the vast majority of cases, every post gets the same relatively non-existent traffic regardless.) WordPress didn’t make any changes to bring itself into GDPR compliance there, which seems odd to me. But if WordPress says it’s okay to track everyone’s IP for statistics, then I guess it’s okay. Or maybe it isn’t. I’ll let you know if anyone from the EU knocks on my door.

In addition, so you can make an informed decision about whether or not to click on my web site, HTTP statistics are also tracked by my web host. Every time you connect to my web site, your browser sends a ton of data to the web server, and my web host captures at least some of that. I can go to a page generated by AWStats and read all kinds of aggregated data about my web site. It’s likely your Internet Provider is capturing some data about your visits to my site as well. Sadly my web host does not make it easy or cheap to enable HTTPS connections to this site to protect us both, which is one reason I am considering moving.
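
To give a sense of what a web server can record from a single request, here is a minimal Python sketch that picks apart one access-log line. It assumes the host writes logs in the widely used Apache "combined" format, which is a guess and not something confirmed for this particular host. The sample line and every value in it are made up.

```python
import re
from typing import Optional

# Assumption: the server logs in Apache "combined" format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the captured fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up sample line using a documentation-only IP address.
    sample = (
        '203.0.113.7 - - [25/May/2018:10:15:32 +0000] '
        '"GET /some-post/ HTTP/1.1" 200 5120 '
        '"https://example.com/" "ExampleBrowser/1.0"'
    )
    for field, value in parse_log_line(sample).items():
        print(f"{field}: {value}")
```

Even this one line carries an IP address, a timestamp, the page requested, the page you came from, and your browser's user agent, which is the raw material a statistics tool aggregates.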

I believe I also have Google Analytics installed, although to be honest I haven’t looked at it in ages. So your web client headers are going to Google as well. At this point in 2018, you should probably assume that your every online interaction, no matter how big or small, is making its way to Google somehow.

I suppose I can work on a privacy policy, although there is not much to say. I just told you all the data I collect above, if that’s even “data.” I don’t sell any information to anyone, and nobody has ever asked to buy anything. I don’t collect emails for mailing lists. If this site ever becomes popular (which is still decades away on its current trajectory), I would probably enable some ads, which would of course change things a bit because God only knows what those JavaScript-laced ads would collect from you, and I’d be obliged to try to disclose that.

Allegedly I’m supposed to let you know if there’s a “data breach” at this site.

“unless the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons.”

The compromising of commenter IP addresses and user agents doesn’t seem like it would risk the rights and freedoms of anyone, so I don’t feel like I need to try to pull out all of my commenters’ email addresses from the database and send a mass email. But I will write a blog post if it happens. (My web host would have to find out about it first and send me an email, because it’s highly unlikely I would notice, unless someone actually started changing my site. Even then you would probably notice before I did.)

My podcast’s MP3 files are hosted on Archive.org, which as far as I know tracks nothing except a count of downloads. I’m not able to access any other tracking information, at least. Presumably it would be their problem to deal with whatever storage of personal data occurs on their site.

That’s all I can think of to say about GDPR.

P. S. I’ve been investigating migrating to another blog platform and/or another blog host, and I’m debating dropping support for blog comments altogether. I think blog comments are approaching the point of obsolescence in the modern world. I would rather discuss my blog posts on Twitter or Discord, personally. And yes, part of the reason is that it absolves me of the legal responsibility of managing comments, in an age when governments are starting to put increasing pressure on web site owners to police their users.