How to Bulk Edit Your Photos’ EXIF Data with 10 Lines of Python
https://blog.matthewgove.com/2022/05/13/how-to-bulk-edit-your-photos-exif-data-with-10-lines-of-python/

Keeping up-to-date EXIF data is critically important for managing large libraries of photos. In addition to keeping your library organized, EXIF data also lets you sort, filter, and search your photo library on numerous criteria, making it easy to find the images you want, and fast. Unfortunately, many photographers, including myself, tend to let things slip when it comes to keeping metadata up to date. As a result, when it comes time to edit your EXIF data, you need to do it in bulk, which can be a tedious and time-consuming task.

EXIF stands for Exchangeable Image File Format. It’s a standard that defines the metadata attached to any media you capture with a digital camera. It can include data such as:

EXIF Data Seen in Adobe Lightroom
  • Image Size
  • File Name and Location on Your Computer
  • Camera and Lens Make/Model
  • Exposure Settings (Aperture, Shutter, ISO, etc)
  • Date and Time the Photo was Taken
  • Location and Elevation Where the Photo was Taken
  • Photographer’s Name and Contact Info
  • Copyright Info
  • Software Used to Post-Process the Image
  • Much More

Why Do You Need to Edit EXIF Data?

Regardless of whether you need to add or remove EXIF data, there are plenty of reasons to edit it. In my nearly two decades doing photography, here are some of the more common reasons I’ve had to edit EXIF data.

  • Strip out sensitive information when you post photos publicly on the web
  • Add location data or geotag images for cameras that don’t have GPS
  • Add or update titles, descriptions, and captions
  • Rate, label, and tag images
  • Add or update your contact info and/or copyright information
Maintaining Fully-Populated EXIF Data Makes Browsing and Searching Your Photo Library a Breeze

If you’re planning to sell your photography in any way, shape, or form, you better have fully populated EXIF data. For example, let’s consider stock photography websites. They use the EXIF data embedded in your images to return the most relevant images in search results. Without fully-populated EXIF data, your images won’t be returned in their searches, and you won’t make any sales as a result.

Available Tools to Bulk Edit EXIF Data

Thankfully, there are numerous tools available so you can edit your photos’ EXIF data. They all support bulk editing, so it doesn’t matter whether you’re updating one photo or a million.

  • Photo editors and organizers such as Adobe Lightroom
  • EXIF Editors are available for all operating systems, including iOS and Android. Many are free.
  • Python

Do be aware that while many EXIF editors advertise themselves as free, they often come with heavy restrictions if you don’t want to pay for the full software. Because of these restrictions, Python is one of the few tools that can edit EXIF data both for free and without restrictions. In this tutorial, we’re going to use Python to add EXIF data to a series of images.

Python Image Libraries

In previous tutorials, we’ve used Python’s Pillow Library to do everything from editing and post-processing to removing noise from and adding location data to photos. And we’ll show in this tutorial that you can use Pillow to add, edit, and remove EXIF data. However, there’s a better Python tool to manage and edit your EXIF data: the exif library.

So what makes the exif library better than Pillow? For that, we have to look at how the computer stores and reads EXIF data. In the computer, EXIF parameters are stored as numeric codes instead of English words. For example, instead of “Camera Make”, the computer just sees 271. Likewise, the computer sees 36867 instead of “Date/Time Image Taken”.

To edit the EXIF data using Pillow, you need to know the numeric key for each field you want to edit. Considering that there are thousands of these numeric keys in use, you’ll spend an incredible amount of time just searching for the numeric keys you want. On the other hand, the exif library uses human-readable keys to edit the EXIF data. We’ll go over how to use both libraries, so you can decide which one you prefer.
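You can see this numeric-to-English mapping for yourself with Pillow’s ExifTags lookup table:

from PIL import ExifTags

# Pillow ships a dictionary mapping numeric EXIF tag IDs to their names
print(ExifTags.TAGS[271])    # Make
print(ExifTags.TAGS[36867])  # DateTimeOriginal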

Install the Pillow and Exif Libraries (If You Haven’t Already)

Before diving into EXIF data, you’ll need to install the Python Pillow and Exif libraries if you haven’t already. With pip, it’s a couple quick commands in a Terminal or Command Prompt.

pip3 install pillow
pip3 install exif

Import the Image Property from both the Pillow and Exif Libraries into Your Python Script

In order to run code from the Pillow and Exif libraries, you’ll need to import the Image property from each library into your Python script. To avoid name conflicts, we’ll call them ExifImage and PillowImage. From the Pillow library, we’ll also import the ExifTags property, which converts the numeric EXIF tags into human-readable tags.

from exif import Image as ExifImage
from PIL import Image as PillowImage
from PIL import ExifTags

Images for this Demo

I’ve included three images with this tutorial, but you can add as many of your own as you please. There are two copies of each image. One set has the metadata intact, which we’ll use for reading the metadata. I stripped the EXIF data out of the other set, so we can add it back with Python. I also took the images with three different cameras.

Image Description                                                    | Camera         | Geotagged
Chicago Cubs vs Boston Red Sox Spring Training game in Mesa, Arizona | Samsung Galaxy | Yes
Stonehenge Memorial in Washington State                              | Nikon D3000    | No
Beach Scene on Cape Cod, Massachusetts                               | Canon EOS R5   | No

Back Up Your Images Before You Begin

Before you do anything with the Python code, make a copy of the folder containing your original images. You never know when something will go wrong. With a backup, you can always restore your images to their original state.
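A minimal sketch of that backup, using Python’s built-in shutil module and assuming your originals live in the with-metadata folder we use below:

import shutil

# Copy the entire folder of originals before editing anything
shutil.copytree("with-metadata", "with-metadata-backup")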

Reading EXIF Data with Python

First, we’ll loop through the images and extract the camera make and model, the date and time the image was taken, as well as the location data.

Parameter    | Pillow Property | Exif Property
Camera Make  | 271             | make
Camera Model | 272             | model
Timestamp    | 36867           | datetime_original
GPS Info     | 34853           | gps_latitude, gps_longitude

The steps to read the EXIF data from each image and output it to the terminal window are as follows.

  1. Open the image
  2. Extract the value of each metadata tag displayed in the table above
  3. Print the human-readable tag and the value in the terminal window.

Define the Universal Parameters You’ll Use to Extract EXIF Data in the Python Script

In our Python script, the first thing we need to do is define the universal parameters we’ll use throughout the script. First, we’ll create a list of the image filenames.

images = ["baseball.jpg", "cape-cod.jpg", "stonehenge.jpg"]

Next, define the EXIF tags for the Pillow library that will extract the data we want. Remember that Pillow uses the numeric EXIF tags.

PILLOW_TAGS = [
    271,    # Camera Make
    272,    # Camera Model
    36867,  # Date/Time Photo Taken
    34853,  # GPS Info
]

Finally, create a variable to store the EXIF tags for the Exif library. The Exif library uses human-readable tags, so please consult their documentation for the full list of tags.

EXIF_TAGS = [
    "make",
    "model",
    "datetime_original",
    "gps_latitude",
    "gps_latitude_ref",
    "gps_longitude",
    "gps_longitude_ref",
    "gps_altitude",
]

Read EXIF Data with the Pillow Library

To extract the EXIF data from all of the images at once, we’ll loop through the images variable we defined above. We’ll print the image filename and then set the image path inside the with-metadata folder.

for img in images:
    print(img)
    image_path = "with-metadata/{}".format(img)

Next, open the image with Pillow and extract the EXIF data using the getexif() method.

pillow_img = PillowImage.open(image_path)
img_exif = pillow_img.getexif()

Now, we’ll loop through the tags. Pillow has a property called ExifTags that we’ll use to get the human-readable definition of each numeric tag. Do note that you’ll need to wrap it in a try/except block to skip properties that are not set. Without it, you’ll get an error and the script will crash if a property is not set. For example, the Cape Cod and Stonehenge images do not have GPS/location data. Finally, print the human-readable tag and the value to the Terminal window.

for tag in PILLOW_TAGS:
    try:
        english_tag = ExifTags.TAGS[tag]
        value = img_exif[tag]
    except KeyError:
        continue
    print("{}: {}".format(english_tag, value))

Final Pillow Code

Put it all together into a nice, compact block of code.

for img in images:
    print(img)
    image_path = "with-metadata/{}".format(img)
    pillow_img = PillowImage.open(image_path)
    img_exif = pillow_img.getexif()
    
    for tag in PILLOW_TAGS:
        try:
            english_tag = ExifTags.TAGS[tag]
            value = img_exif[tag]
        except KeyError:
            continue
        print("{}: {}".format(english_tag, value))

When you run the script, it will output the info about each photo.

baseball.jpg
Make: samsung
Model: SM-G965F
DateTimeOriginal: 2019:03:25 18:06:05
GPSInfo: {0: b'\x02\x02\x00\x00', 1: 'N', 2: (33.0, 25.0, 50.0), 3: 'W', 4: (111.0, 52.0, 53.0), 5: b'\x00', 6: 347.0, 7: (1.0, 6.0, 0.0), 27: b'ASCII\x00\x00\x00GPS', 29: '2019:03:26'}

cape-cod.jpg
Make: Canon
Model: Canon EOS R5
DateTimeOriginal: 2022:03:22 20:58:52

stonehenge.jpg
Make: NIKON CORPORATION
Model: NIKON D3000
DateTimeOriginal: 2022:02:16 21:36:08

A Quick Word on Interpreting the GPS Output

When you look at the GPS output, you’re probably wondering what the hell you’re looking at. To decipher it, let’s break it down and look at the important components. FYI, Pillow returns latitude and longitude coordinates as tuples of (degrees, minutes, seconds).

1: 'N', 
2: (33.0, 25.0, 50.0), 
3: 'W', 
4: (111.0, 52.0, 53.0),
6: 347.0,

Here’s what it all means.

  1. 'N' indicates that the latitude coordinate is in the northern hemisphere. It returns 'S' for southern hemisphere latitudes.
  2. (33.0, 25.0, 50.0) contains the degrees, minutes, and seconds of the latitude coordinates. In this case, it’s 33°25’50″N.
  3. 'W' indicates that the longitude coordinate is in the western hemisphere. It returns 'E' for eastern hemisphere longitudes.
  4. The (111.0, 52.0, 53.0) tuple contains the degrees, minutes, and seconds of the longitude coordinates. Here, it’s 111°52’53″W.
  5. 347.0 is the altitude at which the photo was taken, in meters.

Remember, it’s the baseball picture that’s geotagged. If we plot those coordinates on a map, it should return the Chicago Cubs’ Spring Training ballpark in Mesa, Arizona. Indeed, it even correctly shows us sitting down the first base line.
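If you want to plot the coordinates yourself, converting the (degrees, minutes, seconds) tuples to decimal degrees only takes a few lines. Here’s a small sketch; the dms_to_decimal function is my own helper, not part of either library.

def dms_to_decimal(dms, ref):
    """Convert a (degrees, minutes, seconds) tuple to decimal degrees."""
    degrees, minutes, seconds = dms
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative
    return -decimal if ref in ("S", "W") else decimal

print(dms_to_decimal((33.0, 25.0, 50.0), "N"))   # 33.4306
print(dms_to_decimal((111.0, 52.0, 53.0), "W"))  # -111.8814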

Chicago Cubs vs. Boston Red Sox Spring Training Game in March, 2019

Read EXIF Data with the Exif Library

Extracting EXIF data from your photos using the Exif library is very similar to the Pillow library. Again, we’ll start by printing the image filename and set the image path inside the with-metadata folder.

for img in images:
    print(img)
    image_path = "with-metadata/{}".format(img)

Next, we’ll read the image into the Exif library. However, unlike Pillow, the Exif library automatically extracts all the EXIF data when you instantiate the Image object. As a result, we do not need to call any additional methods or functions.

with open(image_path, "rb") as input_file:
    exif_img = ExifImage(input_file)

Because the Exif library automatically extracts the EXIF data, all you need to do is just loop through the tags and extract each one with the get() method. And unlike the Pillow library, the Exif library also automatically handles instances where data points are missing. It won’t throw an error, so you don’t need to wrap it in a try/except block.

for tag in EXIF_TAGS:
    value = exif_img.get(tag)
    print("{}: {}".format(tag, value))

Final Exif Library Code

When you put everything together, it’s even cleaner than using the Pillow library.

for img in images:
    print(img)
    image_path = "with-metadata/{}".format(img)
    with open(image_path, "rb") as input_file:
        exif_img = ExifImage(input_file)

    for tag in EXIF_TAGS:
        value = exif_img.get(tag)
        print("{}: {}".format(tag, value))

When you run the script, the output from the Exif library should be identical to the output from the Pillow library, with one exception. The Exif Library breaks down the GPS data into its components. You’ll still get the same tuples you do with the Pillow library, but it labels what each component of the GPS data is using English words instead of numeric codes.

baseball.jpg
make: samsung
model: SM-G965F
datetime_original: 2019:03:25 18:06:05
gps_latitude: (33.0, 25.0, 50.0)
gps_latitude_ref: N
gps_longitude: (111.0, 52.0, 53.0)
gps_longitude_ref: W
gps_altitude: 347.0

cape-cod.jpg
make: Canon
model: Canon EOS R5
datetime_original: 2022:03:22 20:58:52
gps_latitude: None
gps_latitude_ref: None
gps_longitude: None
gps_longitude_ref: None
gps_altitude: None

stonehenge.jpg
make: NIKON CORPORATION
model: NIKON D3000
datetime_original: 2022:02:16 21:36:08
gps_latitude: None
gps_latitude_ref: None
gps_longitude: None
gps_longitude_ref: None
gps_altitude: None

Writing, Editing, and Updating EXIF Data Using Python

To demonstrate how to write and edit EXIF data, we’re going to add a simple copyright message to all three images. That message will simply say “Copyright 2022 Matthew Gove. All Rights Reserved.” We’ll also add your name to the EXIF data as the artist/photographer.

Universal Tags We’ll Use Throughout the Python Script

Just like we did when we read the EXIF data from the image, we’ll define the artist and copyright tags we’ll use to edit the EXIF data in each library. We’ll also store the values we’ll set the tags to in the VALUES variable.

PILLOW_TAGS = [
    315,     # Artist Name
    33432,   # Copyright Message
]

EXIF_TAGS = [
    "artist",
    "copyright",
]

VALUES = [
    "Matthew Gove",    # Artist Name
    "Copyright 2022 Matthew Gove. All Rights Reserved.",  # Copyright Message
]

How to Edit EXIF Data with the Pillow Library

In order to edit the EXIF data, you need to open the image with the Pillow library and load the EXIF data using the getexif() method. This code is identical to when we read the metadata. The only difference is that we’re loading the image from the without-metadata folder.

for img in images:
    image_path = "without-metadata/{}".format(img)
    pillow_img = PillowImage.open(image_path)
    img_exif = pillow_img.getexif()

Now, all we have to do is loop through the tags we want to set (which are in the PILLOW_TAGS variable) and set them to the corresponding values in VALUES.

for tag, value in zip(PILLOW_TAGS, VALUES):
    img_exif[tag] = value

Finally, just save the changes to your image. For the purposes of this tutorial, we are saving the final images separate from the originals. When you update your EXIF data, feel free to overwrite the original image. You can always restore from the backup we made if needed.

output_file = img
pillow_img.save(output_file, exif=img_exif)

That’s all there is to it. When you put it all together, you have a nice, efficient, and compact block of code.

for img in images:
    image_path = "without-metadata/{}".format(img)
    pillow_img = PillowImage.open(image_path)
    img_exif = pillow_img.getexif()

    for tag, value in zip(PILLOW_TAGS, VALUES):
        img_exif[tag] = value

    output_file = img
    pillow_img.save(output_file, exif=img_exif)

How to Edit EXIF Data with the Exif Library

Editing EXIF data with the Exif library is even easier than it is using Pillow. We’ll start by loading the image without the metadata into the Exif library. You can cut and paste this code from the script that reads the EXIF data. Just don’t forget to change the with-metadata folder to without-metadata.

for img in images:
    image_path = "without-metadata/{}".format(img)
    with open(image_path, "rb") as input_file:
        exif_img = ExifImage(input_file)

Here’s where it gets really easy to edit the EXIF data and set new values. If you have a lot of EXIF data to edit, by all means put everything into a loop. However, for simplicity, you can also do this.

exif_img.artist = "Matthew Gove"
exif_img.copyright = "Copyright 2022 Matthew Gove. All Rights Reserved."

Then save the file. Like we did with the Pillow library, we’ll save everything to a new file for purposes of the tutorial. However, feel free to overwrite the images when you use it in the real world.

output_filepath = img
with open(output_filepath, "wb") as ofile:
    ofile.write(exif_img.get_file())

Put it all together and you can update and edit your EXIF data with just 10 lines of Python code.

for img in images:
    image_path = "without-metadata/{}".format(img)
    with open(image_path, "rb") as input_file:
        exif_img = ExifImage(input_file)
    
    exif_img.artist = "Matthew Gove"
    exif_img.copyright = "Copyright 2022 Matthew Gove. All Rights Reserved."

    with open(img, "wb") as ofile:
        ofile.write(exif_img.get_file())

Confirming Your EXIF Edits Worked

The final step in editing your EXIF data is to confirm that the Python code actually worked. In the script, I copied logic from when we read the EXIF data to confirm that our edits were added and saved correctly. Indeed, when you run the script, you’ll see the following confirmation in the Terminal window. Alternatively, you can open the photo in any photo editor, such as Adobe Lightroom, to confirm that the new EXIF data has been added to it.

PILLOW
=======
baseball.jpg 
Artist: Matthew Gove
Copyright: Copyright 2022 Matthew Gove. All Rights Reserved.

cape-cod.jpg
Artist: Matthew Gove
Copyright: Copyright 2022 Matthew Gove. All Rights Reserved.

stonehenge.jpg
Artist: Matthew Gove
Copyright: Copyright 2022 Matthew Gove. All Rights Reserved.

################################

EXIF
======
baseball.jpg
artist: Matthew Gove
copyright: Copyright 2022 Matthew Gove. All Rights Reserved.

cape-cod.jpg
artist: Matthew Gove
copyright: Copyright 2022 Matthew Gove. All Rights Reserved.

stonehenge.jpg
artist: Matthew Gove
copyright: Copyright 2022 Matthew Gove. All Rights Reserved.
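If you’d rather re-check the files yourself, a minimal sketch of that verification logic with the Exif library might look like this, assuming the edited files were saved under their original filenames as in the code above:

for img in images:
    with open(img, "rb") as saved_file:
        check = ExifImage(saved_file)
    print(img)
    print("artist: {}".format(check.artist))
    print("copyright: {}".format(check.copyright))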

Download the Code in This Tutorial

You can download the code we wrote in this tutorial from our Bitbucket repository. Please feel free to play around with it and update it to suit your needs. If you have any questions, leave them in the comments below.

Conclusion

Python is an incredibly powerful tool to update and edit your EXIF data. And best of all, it’s one of the few EXIF editing tools that is completely free, without any restrictions on what you can do with it. It’s fast, easy to use, and infinitely scalable. EXIF metadata is not the sexiest aspect of photography by any means. But it is one of the most critical. When you don’t manage it correctly, you are costing yourself both time and money.

If you want help getting started with your EXIF data, please get in touch with us today. As experts in both photography and data science, there are not many people who know the ins and outs of EXIF data better than we do. Alternatively, if you would just like to see more tutorials, I invite you to please join our email list and subscribe to our YouTube channel. See you in the next tutorial.

How to Remove Noise from Photos with 14 Lines of Python…and Blow Lightroom Out of the Water
https://blog.matthewgove.com/2022/02/11/how-to-remove-noise-from-photos-with-14-lines-of-python-and-blow-lightroom-out-of-the-water/
As a photographer, you will run into the frustration of noise in your low-light photos and having to remove it at some point. It’s as much of a guarantee as death and taxes. No matter what you do in post processing, it seems like every adjustment you make only makes the noise in your photos worse. As a result, you only grow more frustrated.

Thankfully, we can turn to our secret weapon, Python, to remove noise from our photos. While it’s not well documented, Python has some incredibly powerful image processing libraries. With a proper understanding of the algorithms, we can use Python to remove nearly all the noise from even the grainiest of photos. And to put our money where our mouth is, we’re going to put our Python script up against Adobe Lightroom to see which one can better remove noise from photos.

The Problem with Noise in Low Light Photos

No matter your skill level, whenever you head out to take photos in low light, you probably dream of coming home with a photo that looks like this.

Post-Sunset Light at Arches National Park in Utah

Instead, you come home with a monstrosity like this.

Grainy Post-Sunset Light at Arches National Park in Utah

Despite these two pictures being taken with the same camera less than 20 minutes apart, why did the first one come out so much better than the second? Yes, post-processing does play a small role in it, but the main culprits are the camera settings and the photo composition. No amount of post-processing can bring back lost data in a photo that’s incorrectly composed. Because the second photo is not correctly composed or exposed, much of the data in the bottom half of the frame is lost.

Let’s compare the two photos.

Parameter              | First Photo       | Second Photo
Time of Sunset (MST)   | 4:57 PM           | 4:57 PM
Photo Timestamp (MST)  | 5:13 PM           | 5:29 PM
Sun Angle              | Sun Behind Camera | Looking into Sun
Shutter Speed          | 1/20 sec          | 1/10 sec
Aperture               | f/4.0             | f/5.3
ISO Level              | 800               | 800
Focal Length           | 55 mm             | 160 mm
I took both photos with my Nikon DSLR camera.

Poor Composition Leads to Noise in Photos

From the photo metadata, we can easily conclude that the difference between the two shots is indeed the composition. More specifically, it’s the sun angle. When you take a picture looking into the sun, it will be higher contrast than a picture that’s taken with the sun behind you. When taken to extremes, you can actually have data loss at both the dark and light ends of the spectrum.

And that’s exactly the result when you take the photo a half hour vs 15 minutes after sunset. Because the second photo looks into the sunset, being further past sunset exacerbates the increase in contrast. As a result, you need to choose whether you want to properly expose the dark land or the colorful sky. You can’t have both. On the other hand, the first photo is able to capture the full spectrum of light that’s available, resulting in the spectacular dusk colors.

What Causes Noise: A Crash Course in ISO Levels

The ISO level sets how sensitive your camera is to light. Most cameras set the ISO level automatically by default. The more sensitive your camera is to light, the brighter your photos will be. Lower ISO levels result in sharper images, while high ISO levels lead to grain in your photos.

Exactly how much grain appears in your photos depends on your camera’s sensor size. Professional cameras with large sensors often don’t have much of an issue with grain. On the other end of the spectrum, small or entry-level cameras are much more sensitive to grain because their sensors are so much smaller. Tiny sensors are why cell phone cameras struggle so much in low light. The technology has certainly gotten better over the past five years, but it’s still far from perfect.

On a bright, sunny day, use low ISO levels to keep photos sharp and avoid overexposing them. Alternatively, use high ISO levels for low light or night photography. Under normal conditions, your ISO levels should be between 200 and 1600. However, ISO levels on some very high end cameras can reach as high as 2 million.

Even Professional Image Processors Like Adobe Lightroom Can Only Do So Much to Remove Noise from Your Photos

As powerful as Adobe Lightroom is, it has its limits. You can’t just blindly take photos in low light and expect to turn them into masterpieces with some combination of Lightroom and Photoshop. As we mentioned earlier, no amount of post-processing can recover lost data in your photos. It’s up to you to properly compose your photos and use the correct camera settings.

However, even with proper composition, professional image processors like Adobe Lightroom can only get rid of so much noise. Adobe Lightroom does a spectacular job getting rid of much of the noise in your photos, but you’ll eventually find yourself in a situation where there’s just too much noise for it to handle.

However, where Adobe Lightroom leaves off, our secret weapon takes over.

Python Has Powerful Image Processing Capabilities

It’s not well advertised, but Python has incredibly powerful image processing libraries that you, as a photographer, can use to boost both your productivity and income. However, I want to caution that you should use Python as a tool to complement Adobe Lightroom, not replace it. Being able to write your own scripts, functions, and algorithms in Python to extend the functionality of Lightroom is incredibly powerful and will set you apart from just about every other photographer.

Indeed, I do my post processing with Adobe Lightroom. Afterwards, I use Python to format, scale, and watermark pictures that I post both to this blog and to the Matt Gove Photo website. When I used to write blog posts that had lots of pictures (and before I had Adobe), it often took me upwards of an hour or more to manually scale and watermark each image. Then I had to make sure nothing sensitive was being put on the internet in the metadata. That all can now be accomplished in just a few seconds, regardless of how many pictures I have. Furthermore, my Python script will automatically remove sensitive metadata that I don’t want out on the internet as well.

You may recall that in some of our previous Python tutorials, we have used the Python Imaging Library, or Pillow, to process photos. Today, we’ll be using the OpenCV library to remove noise from our photos.

How Python Algorithms Remove Noise From Photos

Whenever you write Python code, you should try to understand what built-in functions are doing. This will not only give you a better understanding of what your script is doing, but you will also write code that is faster and more efficient. That’s especially critical when processing large images that require lots of computing power.

Example: Removing Noise from COVID-19 Data

Before diving into our photos, let’s look at a very simple example of removing noise from a dataset. Head over to our COVID-19 dashboard and look at the time series plots of either new daily cases or new daily deaths. Without any noise removal, the plots of the raw data are messy to say the least.

Raw Curves of New Daily COVID-19 Cases in Several Countries

To smooth the data curves and remove noise, we’ll use a moving average. For every data point on the curve, we’ll calculate the average number of new daily cases over the previous seven days. You can actually average as many days as you want, but the industry standard for COVID-19 data is seven days. We’ll plot that 7-Day Moving Average instead of the raw data. The resulting curves are much cleaner and presentable.

New Daily COVID-19 Case Curves, using the 7-Day Moving Average to Remove Noise

People use moving averages for much more than just COVID-19 data. They’re often used to smooth time series in the stock market, scientific research, professional sports, and much more. And we’ll use that exact same concept to remove noise in our photos.

How to Average Values with Python to Remove Noise in Photos

There are several ways to go about doing this. The easiest way is if you take several versions of the same shot, lay them on top of each other, and average the corresponding pixels in each shot. The more shots you take, the more noise will be removed. To ensure that your scene does not shift in the frame as you take the shots, use a tripod.

Mathematically, noise is random, so averaging noise pixels will effectively remove the noise. The scene that is actually in your shot does not change, so the non-noise pixels should far outnumber the noise pixels when you calculate the average. As a result, calculating the average removes the noise.

Consider the following equations. For the sake of this argument, let’s say you’re looking at just a single pixel. The actual value of the pixel in the scene is 10. However, in four of your shots, noise is introduced, and the camera records values of 4, 15, 9, and 18. Remember that the average is the sum of the values divided by the number of values.

In your first attempt, you take 10 shots of the scene. How would you do in noise removal?

average = ((6*10) + 4 + 15 + 9 + 18) / 10 = 106 / 10 = 10.6

Not bad, seeing as the actual value of the pixel should be 10. But we can do better. Instead of taking 10 shots of the scene, let’s take 100 instead.

average = ((96*10) + 4 + 15 + 9 + 18) / 100 = 10.06

That’s much better. It may not seem like much, but even just a small change in value can make a big difference for removing noise.

What Does This Method of Removing Noise From Photos Look Like in Python

Believe it or not, we can write the “stacking average” algorithm to remove noise from photos in just 14 lines of Python. We’ll use numpy for the calculations because it can natively store, calculate, and manipulate grids or matrices of data with just a line or two of code. As a result, all of the photo data will remain in the grid or matrix of pixels we’re familiar with. We don’t need to break it down into rows, columns, or anything else.

First let’s make sure you have installed numpy and OpenCV. If you haven’t, you can easily install them with pip. Open a Terminal or Command Prompt and execute the following commands.

pip3 install numpy
pip3 install opencv-python

Next, it’s time to write our Python script. Let’s start by importing everything we need. The cv2 library we’re importing is part of OpenCV.

import os
import numpy as np
import cv2

Second, tell Python which folder the images you’ll be averaging are stored in. Then list their filenames, skipping any hidden files that are in the image directory.

folder = "noise-imgs"
image_files = [f for f in os.listdir(folder) if not f.startswith('.')]

Third, open and read the first image into the average variable using OpenCV. Store pixel data as a numpy floating point number. You’ll use this variable to store image data and calculate the average of the image pixels.

path = "{}/{}".format(folder, files[0])
average = cv2.imread(path).astype(np.float)

Fourth, add all of the remaining images in the directory to your average variable. Don’t forget to skip the first image because we’ve already added it in the previous step.

for f in image_files[1:]:
    path = "{}/{}".format(folder, f)
    image = cv2.imread(path)
    average += image

Fifth, divide your average variable by the number of images to calculate your average value.

average /= len(image_files)

Finally, normalize the averaged image and output it to a JPEG file. The cv2.normalize() function rescales the averaged pixel values back to the full 0–255 range, ensuring the colors are not dark, faded, or washed out.

output = cv2.normalize(average, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("output.jpg", output)

That’s it. There are only 14 lines of code. It’s one of the easiest scripts you’ll ever write.
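For reference, here’s the full script assembled from the snippets above. Note that it uses np.float64, since newer versions of numpy have removed the plain np.float alias.

import os
import numpy as np
import cv2

folder = "noise-imgs"
image_files = [f for f in os.listdir(folder) if not f.startswith('.')]

# Read the first image to initialize the running total
path = "{}/{}".format(folder, image_files[0])
average = cv2.imread(path).astype(np.float64)

# Add the remaining images, then divide for the pixel-wise average
for f in image_files[1:]:
    path = "{}/{}".format(folder, f)
    average += cv2.imread(path)

average /= len(image_files)

# Rescale to the full 0-255 range and save
output = cv2.normalize(average, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("output.jpg", output)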

Example: Dusk in the Oregon Pines

We’ll throw our Python algorithm a tough one to start. Let’s use a photo of a stand of pine trees in Oregon taken at dusk on a cloudy winter night. Here’s the original.

Original low light photo has plenty of noise

The photo is definitely grainy and lacks that really crisp sharpness and detail. I don’t know about you, but I’d be tempted to just throw it away at first glance. However, what happens if we let our Python algorithm have a crack at removing the noise from the photo?

Photo after removing the noise with our Python algorithm

That looks much better! I’ll admit, the first time I ran the algorithm, I was absolutely floored at how well it worked. The detail was amazingly sharp and crisp. I unfortunately had to shrink the final image above to optimize it for the web, so the detail doesn’t appear quite as well as it does in the original. For the ultimate test, we’ll put our Python algorithm up against Adobe Lightroom’s noise removal shortly.

A Travel Photography Problem: What Happens If You Only Have a Single Shot and It’s Impossible to Recreate the Scene to Get Multiple Shots?

Good question. This is a common problem with travel photography, and is why I always encourage you to take multiple shots of things while you’re traveling. You never know when you might need them. Unfortunately, the above method really doesn’t work very well in this case. However, there are other ways to remove the noise.

We’ll use the same strategy to remove the noise as we did for the COVID-19 data. But instead of averaging over the previous seven days, we’ll average each pixel or cluster of pixels with the pixels that surround it. However, there’s a catch here. The more layers of pixels you include in your average, the less sharp your image will be. You’ll need to play around to find the exact balance for your specific photo, but the OpenCV documentation recommends a 21-pixel search window.

Thankfully, the OpenCV library has this algorithm built into it, so we don’t need to write it.

cv2.fastNlMeansDenoisingColored(source, destination, h, hColor, templateWindowSize, searchWindowSize)
  • source: The original image
  • destination: The output image. Must have the same dimensions as the source image.
  • h: Luminance filter strength. A bigger h value removes more noise, but also removes image detail. A smaller h value preserves detail, but also preserves some noise.
  • hColor: Color filter strength. 10 should be enough to remove colored noise without distorting the colors in most images.
  • templateWindowSize: Size in pixels of the template patch used to compute weights. Should be odd. Defaults to, and is recommended to be, 7 pixels.
  • searchWindowSize: Size in pixels of the window used to compute the weighted average for a given pixel. Should be odd. Defaults to, and is recommended to be, 21 pixels.
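As a rough sketch, a call from Python with the recommended defaults and a hypothetical filename looks like this. Note that in the Python binding, h and hColor come before the window sizes:

import cv2

img = cv2.imread("arches-sunset.jpg")  # hypothetical filename

# h=10, hColor=10, templateWindowSize=7, searchWindowSize=21
denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)
cv2.imwrite("arches-sunset-denoised.jpg", denoised)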

When we run the image through the OpenCV algorithm, it outputs the following.

Sunset at Arches National Park After OpenCV Removed the Noise

The noise is certainly removed, but the image is still very dark and you can see the fuzziness around the edges. To sharpen the image back up, go back into Lightroom and find the original image. Remove as much of the noise from the original image as possible in Lightroom, and then export it. Next, average the OpenCV image and the Lightroom image using the stacking method from the previous section. That will both sharpen the image and brighten the colors.
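A sketch of that averaging step, with hypothetical filenames for the two exports (both must have the same dimensions):

import cv2
import numpy as np

# Average the OpenCV-denoised image with Lightroom's noise-reduced export
opencv_img = cv2.imread("opencv-denoised.jpg").astype(np.float64)
lightroom_img = cv2.imread("lightroom-denoised.jpg").astype(np.float64)

average = (opencv_img + lightroom_img) / 2

# Rescale and save, just like the stacking script
output = cv2.normalize(average, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("final.jpg", output)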

Sunset at Arches National Park After Final Processing with Python Algorithm

That looks much better than the original photo. Other than a little touch-up post processing in Lightroom, that’s about as much as we can help this photo.

How Does Our Python Script Hold Up Against Adobe Lightroom’s Noise Reduction?

All right, it’s time for the ultimate test. It’s time to see how our Python algorithm does against Lightroom’s noise reduction capabilities. If you read the title of this post, you can probably guess how this went. Turns out it wasn’t even close. Our Python script blows Lightroom out of the water.

Left: Noise Removed from Photo with Adobe Lightroom
Right: Noise Removed from Photo with our Python Algorithm

Despite the results, I want to caution you that this comparison is a bit like comparing apples to oranges. Sticking with the pine tree reference, it would be like cutting a tree down with a chainsaw vs. a hand saw. Because Lightroom only has access to the single photo, it must use the algorithm that takes a cluster of pixels and averages the pixels surrounding it to remove the noise. Do you notice how much sharper the Python image is compared to the Lightroom image? It’s because our Python algorithm has far more resources available to it to remove the noise from the photo. 63 times the resources to be exact. That’s why it’s not exactly a fair comparison.

Lightroom vs. Python Comparison on a Level Playing Field

To level the playing field, forget about averaging over multiple photos to remove the noise. Let’s say we only took one photo of the pine forest in Oregon. As a result, we can use only the single original image. We’ll process it using the same method we did for the sunset at Arches National Park in the above section. When we put it up against Lightroom this time, it’s a much closer contest. However, I still give the edge to the Python algorithm because the final image is noticeably sharper.

Left: Noise Removed with Adobe Lightroom
Right: Noise Removed with Python, without averaging over multiple shots

Want to Try the Python Script Yourself?

If you want to try out any of the Python algorithms we covered in this post, please download the scripts from our Bitbucket repository.

Conclusion

Removing noise from photos is an endless frustration for photographers at all skill levels. To add insult to injury, high-end image processors such as Adobe Lightroom can only do so much to remove noise. However, with mathematical knowledge of how noise works, we can write an algorithm that does even better than Lightroom. And best of all, it’s only 14 lines of Python code. You can actually apply these algorithms to videos as well, but that’s a discussion for another day.

However, even though we put our algorithm up against Lightroom, we mustn’t forget that as photographers and image processors, Python must be used as a tool to complement Lightroom, not replace it. Because when we pit the two against each other, it’s only us that suffer from reduced productivity and a less effective workflow. If you’d like to boost your productivity by adding Python to your photography workflow, but don’t know where to start, please reach out to me directly or schedule a free info session today. I can’t wait to see what the power of Python can do for you.

How to Boost Your GIS Productivity with Python Automation in 5 Minutes
https://blog.matthewgove.com/2021/11/05/how-to-boost-your-gis-productivity-with-python-automation-in-5-minutes/
Python Automation is one of the most powerful ways to improve your GIS workflow. In the past, many tasks in traditional GIS applications have had minimal support for writing your own code, and often required crude hacks to install obscure libraries.

As Python has rapidly grown in both functionality and popularity, it is now widely supported across, and even built into, many GIS platforms. Adding Python scripting to your GIS workflow can accomplish tedious hours-long tasks in seconds. Full automation of your GIS processes with Python will free you up to focus on the more important aspects of your project, regardless of what industry you’re in.

Automate Your Desktop GIS Application

Did you know that ESRI ArcGIS and QGIS both have Python built directly into them? As a result, Python automation integrates effortlessly with both GIS platforms. The Python libraries for each platform are incredibly powerful, fast, and easy to use.

However, be aware that the Python libraries for ArcGIS and QGIS are specific to each platform. If you ever decide to change platforms, you’ll need to rewrite all of your Python scripts.

QGIS Window with Python Console

I recommend starting small to get your feet wet with GIS Python automation. Start by automating the symbology and color of your data before diving into data manipulations, calculations, and analysis. Then you can start tackling more complicated processes, such as file conversions, modifying layers, switching projections, and much more.
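For a taste of what that looks like, here’s a minimal PyQGIS sketch you can run in the QGIS Python Console. It assumes the active layer is a vector layer with a single-symbol renderer:

from qgis.utils import iface
from qgis.PyQt.QtGui import QColor

# Recolor the active layer's symbology and redraw it
layer = iface.activeLayer()
layer.renderer().symbol().setColor(QColor("steelblue"))
layer.triggerRepaint()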

Automate Your Web-Based GIS Application

Automating web-based GIS applications with Python is not quite as seamless as with ArcGIS or QGIS. However, you can easily argue that it’s even more powerful. Web-based GIS applications are a bit more complicated than desktop-based platforms. In addition to the GIS software, you often need special servers and databases that are designed specifically for geospatial data.

Thankfully, this increased complexity also means that there are more opportunities for automation. I use Python automation on nearly all of my web-based GIS applications. I don’t have tutorials for all of these yet, but here are a few of my favorites.

Python Automation Updates Our COVID-19 Dashboard Every Day

Remote Sensing Automation with Python

Most sensors these days come with Python libraries when you buy them. You should absolutely take advantage of those libraries. With Python, you can calibrate and configure the sensors exactly how you want them, not the way the manufacturer wants them.

In May of 2019, I installed sensors on the weather station I have at my house. The weather station runs on a network of Raspberry Pi’s. A Python script reads the data from each sensor, QA/QC’s it, and records it in the weather station’s database. If a sensor goes offline or makes a bad reading, the weather station pulls the data from the National Weather Service.

  • DIY Weather Station: Building a Solar Radiation Shield from Scratch
  • Wiring Power and Internet to the Sensors
  • Installing the Data Logger and Connecting the Sensors
  • Database Configuration and Programming the Sensor Readings
  • Troubleshooting a Sensor Gone Awry
https://youtube.com/watch?v=twZNWximYd0

Take your remote sensing automations even further. Use Python GeoPandas to plot your data on a map. Perform a high-level data analysis using pandas or matplotlib. You can easily automate the whole process or leave yourself as much manual control as you wish.

Data Entry Automation with Python

Without data, you don’t have a GIS application. You just have a map. Furthermore, geodatabases and data repositories come in all different shapes and sizes. Thankfully, Python can easily handle all of these data types and schemas thanks to its robust and dynamic data science libraries.

Python’s pandas library is one of the most powerful data analysis libraries available in any programming language. The fact that it’s free and open source is even more incredible, given how expensive licenses to proprietary software can be. pandas can handle just about any data format and size you throw at it.

However, pandas on its own does not support any geographical or location-based data. Enter Python’s GeoPandas extension of the pandas library. GeoPandas gives you the ability to analyze geospatial data and generate maps using the same tools you have in pandas. Easily populate a geodatabase or assemble a repository of any supported GIS format, including shapefiles, CSV, GeoJSON, KMZ, and much more. For more information, please visit our collection of GeoPandas tutorials.

Python GeoPandas can create beautiful maps without a GIS application
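As a minimal sketch, assuming a GeoJSON file of world countries, rendering a map with GeoPandas takes only a few lines:

import geopandas as gpd
import matplotlib.pyplot as plt

# Read any supported GIS format and render it straight to an image
world = gpd.read_file("world-countries.geojson")
world.plot(figsize=(12, 6))
plt.savefig("world-map.png")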

Data Analysis Automation with Python

With over 12 years of experience in professional data analysis, I know firsthand how tedious repetitive tasks can be. Instead of enduring the monotony of repeating those tasks over and over, why not automate them with Python? After all, Python developers created both pandas and matplotlib for that exact purpose. In the context of GIS, you can fully or partially automate many common tasks.

  • Repetitive tasks to prepare and/or format the data for analysis
  • Create maps of different areas using the same data parameters
  • Generate multiple maps of the same areas using different data parameters
Python has plenty of powerful data analysis libraries available for geospatial data

How to Trigger Your GIS Automation

To reach the nirvana of full automation, a Python script alone is not enough. You also need to automate the automation. Fear not, though: triggering your automation is the easy part. You have two options to choose from.

Trigger Your Automation to Run at a Set Time

The majority of GIS automations run at the same time every day. Our COVID-19 dashboard is the perfect example of this. We have a Python script that downloads the COVID-19 data from the Johns Hopkins University GitHub repository, parses it, and adds it to our database. Unfortunately, our web hosting plan does not allow us to fully automate the script, so we automate it on a local machine and then upload the database to the production server.

Scheduling the automation on your computer or server is quick and easy. On Linux and Mac OS, use the crontab command to schedule a cron job, which is a job that runs at fixed dates, times, and/or intervals. Alternatively, use the Task Scheduler on Windows. Both schedulers give you full flexibility to schedule jobs exactly when you want them to run.
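For example, a crontab entry to run a script every morning at 2:30 AM might look like this (the script path is hypothetical):

# Run the update script every day at 2:30 AM
30 2 * * * /usr/bin/python3 /home/user/scripts/update_covid_dashboard.py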

Trigger the Script to Run When a Specific Event Occurs

Alternatively, not all jobs run at a specific time or interval. Have a look at the map of the Matt Gove Photo albums and videos. There is no logical need to run the job at a set time or interval. Instead, we update the map whenever we add a photo album or video to the database. As soon as the new album or video lands in the database, it automatically adds the data to the map.

In Python, the simplest way to trigger your GIS automation is a call to a function that runs the automation. For example, let’s look at the logic of adding photos and videos to the Matt Gove Photo map. In its simplest form, the logic for adding a photo album would look something like this.

# Add a Photo Album to the Database
add_photo_album_to_database(album_parameters)

# Once the database is updated, update the map
update_map(album_parameters)

This example is very oversimplified, but you get the point. For even finer control, use conditional logic and loops to trigger your scripts exactly when you want.

Don’t Forget to Test Your Automation Scripts Before Putting Them into Production

We all make this mistake at one point or another. You beam with pride when you finish your automation script, and schedule it to run for the first time overnight. The next morning, you log in eagerly expecting to see the output of your automation. Instead, you see nothing, or even worse, you see an error message. You facepalm yourself because you forgot to test everything!

The best way to test your automation is to write a few quick unit tests once you finish your script. If you’re unfamiliar with a unit test, it tests individual units of code in your script. You tell the test the expected outcome for a given parameter, and then run that parameter through the block of code. If the script output matches the expected output, the test passes. If not, it fails.

For example, let’s say you programmed a calculator application. To set up a unit test for addition, execute 2 + 2 with the calculator, and see if you get 4. Repeat the process with unit tests for subtraction, multiplication, and division. The best part about unit tests is that you can run a lot of them in a short amount of time. If you’ve written them correctly, they’ll tell you exactly where in the script any problems are.
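Using the calculator example, a minimal unit test with Python’s built-in unittest module might look like this, where the add function stands in for your own code:

import unittest

def add(a, b):
    # Stand-in for the unit of code under test
    return a + b

class TestCalculator(unittest.TestCase):
    def test_addition(self):
        # The expected outcome for the given parameters
        self.assertEqual(add(2, 2), 4)

if __name__ == "__main__":
    unittest.main()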

Use Creativity and Innovation in Your Python Automation

Once you get your feet wet with GIS automation using Python, keep automating. I encourage you to get creative and come up with new, innovative ways that will improve your workflow even further. The sky really is the limit when it comes to automation.

Conclusion

The days of managing bloated and complicated workflows with expensive software are a thing of the past. Python automation is the future, not just in GIS, but in nearly every industry out there. Start out with simple tasks to whet your appetite. Once you get a taste of it, don’t be afraid to scratch that creative or innovative itch we all have. You’ll be amazed at the amount of time and money it can save. Let us help you get started today.

Top Photo: View of Death Valley from Sea Level
Death Valley National Park, California – February, 2020

How to Automate Region Mapping in TerriaJS with 49 Lines of Python
https://blog.matthewgove.com/2021/10/15/how-to-automate-region-mapping-in-terriajs-with-49-lines-of-python/
A little over a month ago, we examined the benefits of using region mapping in your TerriaJS applications. Region mapping allows you to reduce your GIS application’s data usage by over 99%, permitting you to display massive datasets on two and three-dimensional maps that load quickly and are highly responsive.

Example region mapping in TerriaJS
Our COVID-19 Dashboard Map Makes Extensive Use of Region Mapping in TerriaJS

Despite the power and benefits region mapping offers, setting it up manually can be tedious, time consuming, and rife with typos. This can be particularly frustrating if you have a large number of regions you want to map. Thankfully, there’s a much easier way. With just 49 lines of Python code, you can generate region maps of any size in just seconds, freeing up valuable time for you to focus on more important tasks.

What You’ll Need

You’ll need a few items to get started with your TerriaJS region mapping automation.

  • The ESRI Shapefile or GeoJSON file that you used to generate your Mapbox Vector Tiles (MVTs) for the region mapping
  • The Python script you’ll write in this tutorial
  • A Terminal or Command Prompt

What is a Region Mapping File

In TerriaJS, a region map consists of two JSON files.

  1. The actual region map
  2. The region mapping configuration file

The Python script we’re writing in this tutorial generates the actual region map. The region map tells TerriaJS the order that each polygon appears in the vector tiles. The configuration file instructs TerriaJS which vector tile parameters contain the region’s unique identifier, name, and more. You can easily write a short Python script to generate the configuration. However, unless you have a lot of region maps you’re generating, I find that the configuration is so short, it’s easier to do manually.

Convert all Shapefiles to GeoJSON Before Automating Region Mapping

If you have ESRI Shapefiles, convert them to GeoJSON before automating your region mapping. We do this for two reasons.

  1. Python can read and parse GeoJSON files natively. To parse shapefiles, you’ll need a third-party library such as GeoPandas.
  2. Mapbox’s Tippecanoe program, which generates the Mapbox Vector Tiles we use for region mapping, requires files to be input in GeoJSON format.
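A minimal conversion sketch with GeoPandas, assuming a shapefile named world-countries.shp:

import geopandas as gpd

# Read the ESRI Shapefile and write it back out as GeoJSON for Tippecanoe
gdf = gpd.read_file("world-countries.shp")
gdf.to_file("world-countries.geojson", driver="GeoJSON")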

The GeoJSON should contain at least three properties for each feature. We covered these properties in detail in our previous article about region mapping, so I won’t repeat them here. You’re welcome to add as many properties as you want, but at the bare minimum it should contain the following three.

  • A Feature ID, or FID.
  • A unique identifier you’ll use to identify the feature in the CSV data files you load into Terria.
  • The name of the feature

Most of our map tiles actually contain several unique identifiers that can be used. For example, countries actually have two sets of ISO (International Organization for Standardization) standards, plus a United Nations code that can be used to uniquely identify them. We can use any of these three, plus our own unique identifier to map them to our Mapbox Vector Tiles.

Country       | ISO Alpha 2 Code | ISO Alpha 3 Code | UN Numeric Code
Australia     | AU               | AUS              | 036
Brazil        | BR               | BRA              | 076
Canada        | CA               | CAN              | 124
France        | FR               | FRA              | 250
Mexico        | MX               | MEX              | 484
New Zealand   | NZ               | NZL              | 554
United States | US               | USA              | 840
Unique Country ID Examples

All right, let’s dive into the Python code.

First, Import Python’s Built-In json Library

The real magic of region mapping in TerriaJS is that all of the files are just JSON files, which are compatible with every popular programming language today. As a result, all we need to do the automation is Python’s json library. The json library comes standard with every installation of Python. We just need to import it. We will also be using Python’s os library to ensure the correct paths exist to output our region mapping files.

import json
import os

Next, Write a Python Function to Extract the Feature ID from each Feature in the GeoJSON

The Feature ID sits several layers down in the GeoJSON, and we need to access it a lot. It’s also best practice to avoid repeating string literals in programming, so we’ll use a function to extract the Feature ID from any given feature in the GeoJSON file.

def fid(f):
    return f["properties"]["FID"]

Define Your Input Parameters

We define the GeoJSON file name, the layer name in the TerriaJS region mapping configuration (regionMapping.json), and the property we’ll use as the unique identifier in our region mapping. Even if your MVT tiles have multiple unique identifiers, you may only choose one for each region map. You’ll need to generate a second region map if you want to use multiple unique identifiers.

GEOJSON = "world-countries.geojson"
LAYER = "WORLD_COUNTRIES"
MAPPING_ID_PROPERTY = "alpha3code"
OUTPUT_FOLDER = "regionIds"

If you prefer not to have to manually edit the Python script every time you want to change files, you can easily update the Python script to extract this information from the filename or receive it through command line arguments.

Define the Output Path for Your Region Map File

I like to do this right away because it uses the input parameters. This both gets it out of the way so we don’t need to deal with it later and eliminates the need to scroll up and down looking for variable names if we were to define these at the end. We’ll structure our output paths so they can just be copied and pasted into Terria when we’re done. No renaming or reshuffling of files required.

geojson_filename = GEOJSON.replace(".geojson", "")
output_json_filename = "region_map-{}.json".format(geojson_filename)
output_fpath = "{}/{}".format(OUTPUT_FOLDER, output_json_filename)

In this example, the output filepath of our region mapping file will be regionIds/region_map-world-countries.json.

Read the GeoJSON into Python

Because GeoJSONs are just a specific type of JSON file, we can use Python’s built-in json module to read and parse the GeoJSON file.

with open(GEOJSON, "r") as infile:
    raw_json = json.load(infile)

Sort the GeoJSON features by Feature ID

When you convert the GeoJSON to MVT tiles, Tippecanoe sorts the GeoJSON by Feature ID. For the region mapping to work correctly, the features in the region map must appear in the exact same order as they appear in the Mapbox Vector Tiles. If they’re not in the exact same order, your data will be mapped to the wrong features in TerriaJS. What’s particularly insidious about this issue is that it will appear as if your data is mapped to the correct country, even though it’s not.

When I first tried to set up region mapping for the World Countries vector tiles, I did not have the features in the right order. Data for the United States was mapped to Ukraine and labeled as Ukraine’s data. France’s data showed up in Fiji, and New Zealand’s data appeared to be Nepal’s. And that’s just to name a few. Except for a few countries at the top of the alphabet that start with “A”, this issue plagued every country. Once I sorted the GeoJSON features by Feature ID, the problem magically went away.

raw_features = raw_json["features"]
features = sorted(raw_features, key=fid)

Note that we pass the fid function we defined above as the sort key in the sorted() call.

TerriaJS Region Maps are just Arrays of Unique IDs Sorted by Feature ID

To generate the region map for TerriaJS, all we need to do is loop through the sorted features and create an array of the unique ID for each feature. Remember that the unique ID is different from the Feature ID. For world countries, TerriaJS comes with the Alpha 2 codes built in. For this tutorial, we’ll use the ISO Alpha 3 code as our unique identifier, which we defined in the MAPPING_ID_PROPERTY variable in the user input. If you’ve forgotten, the Alpha 3 code is just a 3-letter code that identifies each country. For example, use “USA” for the United States, “CAN” for Canada, “JPN” for Japan, and so forth.

We’ll generate the region map with a simple for loop.

region_mapping_ids = []

for f in features:
    properties = f["properties"]
    region_map_id = properties[MAPPING_ID_PROPERTY]
    region_mapping_ids.append(region_map_id)

Assemble the full TerriaJS Region Mapping JSON in a Python Dictionary

TerriaJS region mapping files are breathtakingly simple. They require three parameters.

  • layer: The name of the layer in the TerriaJS Region Mapping configuration (regionMapping.json). We defined it above with the LAYER variable in the user input.
  • property: The name of the unique identifier in your vector tiles that you’ll use in the CSV files you load into Terria. For the world countries, we’re using the alpha3code identifier. We defined this in the user input using the MAPPING_ID_PROPERTY variable.
  • values: The array of unique identifiers, sorted by Feature ID, that we created in the previous section.

It’s important to note that Python does not have a native JSON type. Instead, when you read a JSON file into Python, the json module converts the JSON into a Python dictionary. When we export the dictionary to JSON format, Python simply does the conversion in the opposite direction.
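
You can see the round trip for yourself in an interactive Python session, using the json module we imported earlier:

json.loads('{"layer": "WORLD_COUNTRIES"}')
# -> {'layer': 'WORLD_COUNTRIES'}

json.dumps({"layer": "WORLD_COUNTRIES"})
# -> '{"layer": "WORLD_COUNTRIES"}'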

Anyway, the TerriaJS region mapping JSON should look like this as a Python dictionary.

output_json = {
    "layer": LAYER,
    "property": MAPPING_ID_PROPERTY,
    "values": region_mapping_ids,
}

Also, don’t forget to create the output directory if it doesn’t exist. Your Python script will crash without it.

if not os.path.isdir(OUTPUT_FOLDER):
    os.makedirs(OUTPUT_FOLDER)

Finally, Write the TerriaJS Region Mapping JSON to a .json File

Thanks to Python’s json module, we can output the region mapping file with just a couple lines of code.

with open(output_fpath, "w") as ofile:
    json.dump(output_json, ofile, indent=4)

If you did everything correctly, the region mapping JSON you just created should look like this.

{
    "layer": "WORLD_COUNTRIES",
    "property": "alpha3code",
    "values": [
        "AFG",
        "ALB",
        "DZA",
        "AND",
        "AGO",
        "ATG",
        ...
    ]
}

But we’re not quite done, yet!

Add Your New TerriaJS Region Mapping to the Configuration File

The final step is to add your new region map to the TerriaJS Region Mapping configuration file. By default, the configuration file is located at data/regionMapping.json. However, if you can’t find it, its location is defined in the config.json file at the root of the Terria application. You are more than welcome to automate this, too, but I find that it’s so simple, it’s often easier to just do it manually. (If you’d rather script it, there’s a short sketch after the configuration example below.)

The configuration file instructs TerriaJS how to interpret each region map. You’ll need to include several parameters.

Parameter | Description
layerName | The name of the layer in the Mapbox Vector Tiles. If you can’t remember it, check your vector tiles’ metadata.
server | The URL from where your Mapbox Vector Tiles are served
serverType | The type of files the server is serving. For this tutorial, use MVT, which stands for “Mapbox Vector Tiles”.
serverMinZoom | The minimum zoom level of the vector tiles
serverMaxNativeZoom | The maximum zoom level the vector tiles support natively
serverMaxZoom | The maximum zoom level the vector tiles support
regionIdsFile | The path to the TerriaJS Region Mapping JSON we created in this tutorial
regionProp | The name of the property in the vector tiles that contains the unique identifier we’re using in the region map
aliases | Any column headers in the CSV data file that TerriaJS should interpret as the unique identifier of your region map
description | A description of the region map
bbox | The bounding box of your vector tiles, in the format [west, south, east, north]
nameProp | The name of the property in the vector tiles that contains the name of each feature

Using the layer we defined in the LAYER variable as the JSON key, your regionMapping.json configuration file should look like the following.

{
    "regionWmsMap": {
        ...
        "WORLD_COUNTRIES": {
            "layerName": "world-countries",
            "server": "https://yourvectortileserver.com/worldcountries/{z}/{x}/{y}.pbf,
            "serverType": "MVT",
            "serverMinZoom": 0,
            "serverMaxNativeZoom": 3,
            "serverMaxZoom": 8,
            "regionIdsFile": "data/regionIds/region_map-world-countries.json",
            "regionProp": "alpha3code",
            "aliases": ["alpha3code", "country_id", "COUNTRY_ID"],
            "description": "World Countries",
            "bbox": [
                -179.99999999999999999999999,
                -85,
                179.999999999999999999999999,
                85
            ],
            "nameProp": "name"
        },
        ...
    }
}
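
If you do decide to script this step, here’s a minimal sketch of what the automation might look like. The config path and the WORLD_COUNTRIES entry mirror the example above, but treat them as assumptions and double-check them against your own Terria setup.

import json

REGION_MAPPING_CONFIG = "data/regionMapping.json"  # assumed default location

new_layer = {
    "layerName": "world-countries",
    "server": "https://yourvectortileserver.com/worldcountries/{z}/{x}/{y}.pbf",
    "serverType": "MVT",
    "regionIdsFile": "data/regionIds/region_map-world-countries.json",
    "regionProp": "alpha3code",
    "aliases": ["alpha3code", "country_id", "COUNTRY_ID"],
    "nameProp": "name",
}

# Read the existing configuration, add the new layer, and write it back out
with open(REGION_MAPPING_CONFIG, "r") as cfg_file:
    config = json.load(cfg_file)

config["regionWmsMap"]["WORLD_COUNTRIES"] = new_layer

with open(REGION_MAPPING_CONFIG, "w") as cfg_file:
    json.dump(config, cfg_file, indent=4)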

Create a Dummy CSV File to Test it Out

The most fun part of any project is seeing it all come to life once you’re done. To do this, create a dummy CSV file to test that your region mapping actually works. Pick a bunch of countries at random, assign them some random values as a dummy parameter, and see if they show up on the map.

alpha3code | Value
USA | 7
CAN | 4
DEU | 15
THA | 12
AUS | 9
BRA | 3
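
If you’d rather generate the dummy file in Python, a quick sketch with the standard csv module does the trick. The filename and the values are arbitrary:

import csv
import random

countries = ["USA", "CAN", "DEU", "THA", "AUS", "BRA"]

with open("dummy-region-map-test.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["alpha3code", "Value"])
    for country in countries:
        writer.writerow([country, random.randint(1, 20)])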

To load the file into TerriaJS, just click on the upload button at the top of the workbench, next to the “Browse Data” or “Explore Map Data” button. If the region mapping is working properly, you should see your data appear on the map.

Our dummy CSV data displayed on a choropleth world map
The Dummy CSV Data on a World Map

Conclusion

Region mapping is one of the most powerful and efficient ways to display enormous amounts of data in TerriaJS. Automating the region mapping process not only saves you valuable time, but also makes your application even more powerful.

While manually region mapping feature sets such as the 50 US states or the roughly 200 world countries may initially seem manageable, it rapidly becomes a nightmare once you try to scale it up. Sticking just within the United States, what if instead of the 50 states, you were mapping America’s more than 3,000 counties? Our COVID-19 Dashboard map does just that. Or even worse, the United States has over 41,000 postal codes and more than 73,000 census tracts. Can you imagine having to assemble those region maps manually, or the opportunity for typos when entering tens of thousands of data points by hand?

Instead, save yourself the time, money, and hassle. We’ve made the Python script available for free on Bitbucket so you can configure region maps of any size in TerriaJS in just seconds. And if you ever run into issues, we’re always here to help you with any questions you may have. Happy mapping!

Top Photo: Snow-Capped Sierra Nevada Provide a Spectacular Backdrop for Lake Tahoe’s Brilliant Turquoise Waters
Meeks Bay, California – February, 2020

The post How to Automate Region Mapping in TerriaJS with 49 Lines of Python appeared first on Matthew Gove Blog.

]]>
https://blog.matthewgove.com/2021/10/15/how-to-automate-region-mapping-in-terriajs-with-49-lines-of-python/feed/ 1
13 Stunning Examples Showing How Easy It Is to Spread Disinformation without Manipulating Any Data https://blog.matthewgove.com/2021/07/30/13-stunning-examples-showing-how-easy-it-is-to-spread-disinformation-without-manipulating-any-data/ Fri, 30 Jul 2021 16:00:00 +0000 https://blog.matthewgove.com/?p=2580 The spread of disinformation and fake news seems like it’s about as American as apple pie these days. As a data scientist, it’s beyond horrifying watching so much disinformation rip through every facet of our society like wildfire. Sure, you grow to expect it from the idiots on the internet. […]

The post 13 Stunning Examples Showing How Easy It Is to Spread Disinformation without Manipulating Any Data appeared first on Matthew Gove Blog.

]]>
The spread of disinformation and fake news seems like it’s about as American as apple pie these days. As a data scientist, it’s beyond horrifying watching so much disinformation rip through every facet of our society like wildfire. Sure, you grow to expect it from the idiots on the internet. But the fact that it now dominates everything from the news media to our education system to our jobs? That’s much more concerning.

Before we get too far, I want to say that the content of this post is designed for educational purposes only. I do not endorse the spread of disinformation or any conspiracy theories in any way. You should always back up your arguments with strong logic and easily-verifiable facts.

Recent statistics about disinformation over the past year or two are eye opening.

  • 67% of Americans have interacted with disinformation or fake news on social media.
  • 56% of Facebook users cannot identify fake news that aligns with their own beliefs.
  • Less than 30% of American adults trust the news media.
  • In the third quarter of 2020 alone, Facebook saw over 1.8 billion engagements with fake news.

And that’s not even the tip of the iceberg.

How Do We Create and Spread Disinformation?

Sadly, it’s far too easy to create, publish, and spread disinformation these days. There is an endless list of different methods to create disinformation, but here are a few of the more popular ones.

  • Manipulating Data or Statistics
  • Using Logical Fallacies
  • Making an argument that uses flawless logic, but the statements that are input into the argument are false
    • Example: Rocks are vegetables. I like to eat vegetables. Therefore, I like to eat rocks.
  • Injecting technical jargon and fancy words into a statement that is otherwise complete BS
  • Just making something up off the top of your head.

One of My First Memorable Encounters with Real World Disinformation

One of my first encounters with disinformation in the “real world” came after graduating into the teeth of the Great Recession in 2009. Like so many people at the time, I struggled mightily to find work. As the election season began heating up, it was quite clear that Republicans were going to do very well in the 2010 midterms. At the time, Democrats controlled the House, the Senate, and the White House. The economic recovery was moving painfully slowly, and unemployment remained stubbornly high.

Then, all of a sudden, shortly before the 2010 midterms, the unemployment rate mysteriously dropped, and it dropped a lot. What happened? Was the recovery finally kicking into high gear? Not really. Turns out, the number of unemployed people hadn’t really changed at all.

Instead, the Obama administration had decided that they didn’t like the optics of high unemployment levels, so they changed how the unemployment rate was calculated so it looked lower than it actually was. Long term unemployment was a particular problem coming out of the Great Recession, so they simply stopped including the long-term unemployed when they calculated the unemployment rate. Thankfully, the media called them out on it. As a result, the different methods of calculating the unemployment rate became much more transparent.

The Most Insidious Way to Spread Disinformation: A Look at the 2020 Election and the COVID-19 Pandemic

Today, we’re going to look at one of the most subtle, insidious, and incredibly effective ways to spread disinformation. You don’t need to manipulate any data or statistics. Nor do you need to tie yourself in knots using pretzel logic to make your argument.

Indeed, all you need to use is a little equivocation. When you equivocate, you tell part of the truth, but not the whole truth. The part of the truth you don’t want revealed is usually obfuscated in vague language. When done effectively, you’re not telling the whole truth, but you’re not telling a bold-faced lie, either.

Disinformation Spread in the 2020 Election: It All Starts with a Simple Map

Take yourself back to election night. You’ve cast your vote, and it’s time to sit down and watch the election returns. Regardless of which TV network or website you’re watching, they’re filling in this map.

2020 Election Results by County Can be Misleading

On the surface, this map looks completely harmless. More importantly for the TV networks, their audience understands this map without needing any explanation.

In reality, this map is one of the most misleading ways to present election returns that exists. It infuriates me to no end that people still use it. One of the most common arguments I hear from people who look at this map is that there is so much more red than blue on the map, there is no possible way Trump lost the election.

There’s A Lot This Map Does Not Show

It’s true, there is far more red than blue on the map. And that’s exactly why the map is so misleading. To poke holes in that argument, let’s look at what the map shows and what it doesn’t show.

What the Map Shows

  • The winner of each county

What the Map Doesn’t Show

  • How many votes were cast
  • The population of each county
  • The margin of victory
  • The percentage of the vote each candidate received

To further show how useless that map is, let’s compare it to the results of the 2016 election. Recall that in the 2020 election, Biden won 306-232 in the Electoral College. In 2016, Trump won by that exact same margin. Now compare the two maps using the slider. Can you easily tell which candidate won?

2016
2020

Not only can you not easily tell which candidate won, the 2016 and 2020 maps are practically identical. The only county with any significant population that changed colors between the two elections was Maricopa County in Arizona. This map has played a significant role in Maricopa County being the target of so many election-related conspiracy theories.

Introduce Population and Vote Tallies into the Map to Improve It

In order to better present the election results, you’ll need to incorporate at least one of either population or number of votes cast. Ideally you can incorporate both. First, let’s look at a map of population by county.

US Population Map by County

If you overlay the population map on either map of election results above, you should notice a very distinct correlation. The Democrat candidate won the more populous counties almost exclusively. When the correlation is that strong, it means you’ve figured out which statistic is skewing the data on your maps and leading to the spread of disinformation.

So exactly how do we show population on our map? The easiest way is to put a colored dot inside each county instead of shading the entire county. Then scale the diameter of the dot based not on population, but instead on the number of votes cast for the winning candidate. Like our choropleth map, the dots will be shaded blue or red to indicate which candidate won.

It’s not perfect, but it gives a much more accurate picture of the 2020 election results.

For comparison, here’s the same map for the 2016 election.

But Wait, Trump Won the 2016 Election 306-232. This Map Doesn’t Reflect That!

Good catch! You’re partially correct. Trump did win the 2016 election 306-232. And the 2016 map does show a lot more blue on it. So what gives? Trump won the Electoral College vote in 2016, but Hillary Clinton won the popular vote. The election maps with the scaled dots on them reflect the popular vote, not the Electoral College vote.

Vote | Donald Trump | Hillary Clinton
Electoral College | 306 | 232
States Won | 30 | 20, plus DC
Total Votes Cast | 62.9 million | 65.8 million
Percentage of Vote | 46.1% | 48.2%
2016 Election Voting Statistics

A Look at 2004: The Most Recent Election the Republican Candidate Won the Popular Vote

The 2004 presidential election marks the only time in recent history that the Republican Candidate won the popular vote. In 2004, President George W. Bush won both the Electoral College (286-251) and 50.7% of the popular vote (62 million to 59 million). Our map does correctly indicate that Bush won the popular vote that year.

2004 Election Results Normalized by Votes Cast by the Winning Candidate fights disinformation

So Can We Create an Electoral College Map That Does Not Spread Disinformation?

Because the Electoral College is a state-level process, it’s impossible to do so at the county level. However, we can recreate the map using scaled dots to represent the Electoral College. Like the county-level choropleth maps, population skews the Electoral College choropleth maps, leaving them ripe for the spread of disinformation as well.

2020
2016

Can Any Maps Debunk the Spread of Election Disinformation and Conspiracy Theories?

Maps can certainly explain what happened in Trump’s rise to power in 2016 and Biden’s triumph in 2020. Unfortunately, people who believe in conspiracy theories are often so irrational that even the best map is unlikely to convince them.

To show what led to Trump’s rise as well as his demise, let’s brainstorm a few changes we may want to look at when comparing the 2020 election to 2016.

  • Demographics
  • Voter behavior
  • Candidate popularity
  • Voter turnout

To save you the hassle, we’re going to look at the total voter turnout between the two elections, as well as who those voters were voting for. We’ll do this for each county. The math is simple, just addition and subtraction.

total_vote_difference = total_votes_2020 - total_votes_2016
dem_vote_difference = dem_votes_2020 - dem_votes_2016
rep_vote_difference = rep_votes_2020 - rep_votes_2016

To determine which candidate gained the most ground, simply compare the Democrat vote differences to the Republican vote differences.

vote_difference = dem_vote_difference - rep_vote_difference

If vote_difference is a positive number, it means the Democrats gained votes. If it’s negative, the Republicans gained votes. The larger the magnitude of vote_difference, the bigger those gains were.
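
If you want to run the numbers yourself, here’s a hedged sketch in Pandas. The file and column names are hypothetical; the MIT election dataset listed in the data sources below uses its own schema.

import pandas as pd

# Hypothetical CSVs, each with county_fips, dem_votes, and rep_votes columns
votes_2016 = pd.read_csv("county_votes_2016.csv")
votes_2020 = pd.read_csv("county_votes_2020.csv")

# Overlapping column names pick up the _2016/_2020 suffixes after the merge
merged = votes_2016.merge(votes_2020, on="county_fips", suffixes=("_2016", "_2020"))

dem_gain = merged["dem_votes_2020"] - merged["dem_votes_2016"]
rep_gain = merged["rep_votes_2020"] - merged["rep_votes_2016"]

# Positive values mean Democrats gained ground; negative means Republicans did
merged["vote_difference"] = dem_gain - rep_gain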

Let’s Look at Maps of Vote Gains

Let’s look at those maps. In addition to comparing 2020 to 2016, I’ve included a map that compares 2016 to 2012.

Let’s also look at total voter turnout in each county.

There are a few conclusions I can draw from these maps to combat disinformation and conspiracy theories.

Metric | 2020 | 2016
Voter Turnout | Trump was so polarizing, he turned out massive numbers of voters on both sides. | Many “on-the-fence” voters, especially those that lean Democrat, stayed home for various reasons.
Candidate Popularity | As ferociously devoted as Trump’s base was, Democrat voters hated him even more. | Both candidates were wildly unpopular. Many voters felt Trump was the lesser of two evils.
Independents | The independents that went for Trump in 2016 turned on him in 2020. Many moderate Republicans voted for Biden, too. | Many independents, especially across the Rust Belt, voted for Trump. The numbers out of Detroit are particularly fascinating.
Suburban Voters | Suburban voters revolted against Trump. There are huge Democratic gains in nearly every major city. | Dem-leaning suburban voters stayed home or went for Trump, particularly in Detroit and Milwaukee.
Where the Election Flipped | Large Democratic turnout in 6 metropolitan areas won the election for Biden: Philadelphia, Pittsburgh, Detroit, Atlanta, Phoenix, and Milwaukee. | Rust Belt voters that felt abandoned by Obama came out in droves for Trump, and flipped Pennsylvania, Ohio, Michigan, and Wisconsin, a total of 64 Electoral Votes.
Florida | Trump picked up significant votes in Miami-Dade County (likely Cuban Americans voting against socialism), giving him a comfortable win in the state. | The Interstate 4 Corridor (Tampa to Daytona) that delivered the state to Obama in 2012 swung significantly back to the right and went for Trump.

All right, enough about the election. Let’s move on and look at some COVID-19 data.

The COVID-19 Pandemic: A Stunning Exercise in the Spread of Disinformation

If there’s anything that’s torn through the United States faster than COVID-19 itself, it’s the disinformation associated with it. No matter what facet of the pandemic we’re talking about, we cannot agree with our fellow Americans on anything.

Want to know what’s even more frightening? It’s even easier to spread disinformation about COVID-19 than it is about the election. And we don’t have to worry about the election putting us in the hospital or killing us.

The Default COVID-19 Maps are Plagued by the Same Population Issue as the Election Maps

By default, most media outlets show new daily COVID-19 cases by either state or county. While that’s perfectly fine if that’s what you’re looking for, it is a terrible map if you’re trying to identify hot spots. Here’s a recent map of new daily COVID-19 cases in the United States. Take a guess as to where the hottest spot for COVID-19 is.

Map of new COVID-19 cases by US county has been used to spread disinformation.
New Daily COVID-19 Cases in the United States – 18 July, 2021

Looking at this map, you’ll likely identify two hotspots: Florida and the Southwest. Yes, COVID-19 is raging in Florida, Los Angeles, and Las Vegas, but none of those spots is where the worst outbreak is. And where is that outbreak right now? It’s in Missouri and Arkansas, but you wouldn’t know it looking at this map.

Color Schemes: The Most Insidious Way to Spread Disinformation

The color bar on any map seems innocent enough. Its primary purpose is to make your map look really good. How bad can it be?

Turns out, the color scheme is particularly deceptive. You don’t need to do anything to the actual data. Nor do you need to twist yourself up in pretzel logic just to make your point. Even worse, people choose bad color schemes accidentally all the time, spreading disinformation without even realizing it.

While there are all kinds of ways to manipulate the color bar, here are the three most common.

Change the Upper and/or Lower Limits of Your Color Bar

Look at the map of new daily cases above. The data range goes from 0 to 1,462 new daily cases. Now what would happen if I increased the upper limit by an order of magnitude, from 1,500 to 15,000? All of the counties would be shaded either white or very light green, and it would look like there’s no COVID-19 at all.

Disinformation COVID-19 Daily New Cases Map with Manipulated Color Bar makes it appear there's no COVID-19

Conversely, what if I reduced the upper limit from 1,500 down to 5? It would look like the world is about to end, with COVID-19 spreading everywhere. That’s clearly not an accurate representation of what’s going on, either.

Don’t forget, both maps show the exact same dataset. All we did was change the color bar.

Disinformation COVID-19 Daily New Cases Map with Manipulated Color Bar to Show It Worse than it actually is

Change the Break Points of Your Color Bar

By default, most mapping and GIS programs break the color bar up in even increments, or so that points are distributed evenly throughout the color bar. While neither approach is perfect, both work well in many cases.

Now let’s take this to the extreme. For this example, you’re a corrupt leader who wants to publish a map showing no COVID-19, despite the fact that it’s raging in your area. Using the same 0 to 1,500 scale, you set the first section of the color bar to cover 0 to 1,300. The remaining colors are set in increments of 50: 1,301 to 1,350; 1,351 to 1,400, and so forth.

That map makes it look like there is basically no COVID-19 spreading in the United States.

Manipulating the color bar breaks on a map is an easy way to spread disinformation.

Alter the Number of Breaks in Your Color Bar

While there are certainly isolated circumstances when you want to increase the number of breaks, this method is far more effective when you reduce the number of breaks in your color bar. In our original map, there are 7 breaks for a total of 8 colors.

Now, let’s reduce the color bar from 8 colors to 2. The light yellow color will cover 0 to 750 new cases per day. Likewise, the dark blue color will cover 751 to 1,500 new daily cases.

As for the result? Once again, it looks like there is no COVID-19 in the United States. On other days, some areas that are raging can look like nothing is happening, while other areas that don’t have a problem look like COVID-19 is exploding out of control. Talk about disinformation!

COVID-19 New Daily Case Map: reducing the number of colors in the color bar spreads disinformation.

I Shouldn’t Give You Any More Ideas to Spread Disinformation, But…

I know what you’re thinking. There’s no way people can so blatantly manipulate the color bar and get away with it. Your intuitions are correct, but those examples we just looked at are extreme examples.

You can easily combine these methods to much more subtly mislead your audience. There are also plenty of other ways to mess with the color scheme that I haven’t touched on here. One easy way is to invert the colors. You can also use an illogical progression of colors throughout the color bar.

This is why when you look at any kind of figure, you should always verify both the color scheme and its limits before you make any assumptions about it. All it takes is a quick glance at the legend.

Use Logarithmic Scaling to Reduce Color Bar Manipulation

So is there anything we can do to reduce such easy color bar manipulation? If you’re dealing with a large range of data, use logarithmic scaling. For those of you who are unfamiliar with the logarithmic scale, it’s simple.

Instead of incrementing your axis in multiples of a number, you’re incrementing it by powers of that number. For example, a linear scale using multiples of 10 would be 10, 20, 30, 40, 50, 60, and so on. A logarithmic scale using powers of 10 would be 1, 10, 100, 1,000, 10,000, 100,000, 1,000,000, and so on.

Why a logarithmic scale? First off, it has preset intervals, so it’s very difficult to subtly alter the breaking points in your color bar. The logarithmic scale’s preset intervals also limit or prevent the data from shifting if you change the limits of the color bar. For example, on the COVID-19 map, 400 new daily cases will fall in the 100 to 1,000 section, no matter how high I set the upper limit of the color bar.
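
As a sketch of how you might apply this in Python, matplotlib’s LogNorm locks the color scale to powers of 10. The counties GeoDataFrame and its new_cases column below are hypothetical, and this assumes GeoPandas passes the norm keyword through to matplotlib, which it does in recent versions.

import geopandas
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

counties = geopandas.read_file("us-counties.geojson")  # hypothetical file

# Values fall into powers-of-10 bands no matter where vmax is set.
# Note that LogNorm requires positive values, so counties reporting
# zero new cases would need special handling.
ax = counties.plot(column="new_cases", cmap="YlGnBu", legend=True,
                   norm=LogNorm(vmin=1, vmax=10000))
plt.savefig("covid19_log_scale.png")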

What Color Scale Do I Use?

On my COVID-19 Dashboard Map, I use a hybrid logarithmic scale. It’s simply a logarithmic scale with breaks halfway through each section of the scale. So instead of break points being at 1, 10, 100, 1,000, and so forth, they are at 1, 5, 10, 50, 100, 500, 1,000, 5,000, and so on.

The reason I chose a hybrid logarithmic scale is because the data range was not big enough to use a straight logarithmic scale. As a result, the map would have been too misleading, and would not have accurately shown areas where COVID-19 is surging.
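
One way to sketch that hybrid scale in matplotlib is with BoundaryNorm, which pins the color breaks to an explicit list. The break values below mirror the scale described above:

from matplotlib.colors import BoundaryNorm

# Explicit breaks at 1, 5, 10, 50, ... instead of plain powers of 10
breaks = [1, 5, 10, 50, 100, 500, 1000, 5000]
hybrid_norm = BoundaryNorm(breaks, ncolors=256)

You’d then pass hybrid_norm as the norm argument in place of the LogNorm from the previous sketch.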

Look at Other Parameters to Counter Disinformation

Listen to your gut. If it’s telling you a map or figure is misleading, it likely is. Regardless of whether you’re looking at a published map or creating a map to publish, look at other parameters in the same dataset. The more parameters that back up your reasoning, the stronger your argument will be.

Normalize the Data by Population

In our COVID-19 dataset, the easiest way to get around the population issue is to normalize the data by population. Instead of the raw number of new daily cases, plot the number of new daily cases per million people.
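
In the same pseudocode style as the election example, the normalization is a single line of arithmetic:

cases_per_million = (new_daily_cases / population) * 1000000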

New Daily COVID-19 Cases per 1 Million People – 18 July, 2021

That’s a big step in the right direction. You can at least see the big outbreak of cases in Missouri and Arkansas. However, Florida is also getting hit very hard right now, and this map makes Florida look a lot better than it actually is.

14-Day Change in New Daily Cases

Next up, let’s look at the two-week change in new daily cases. It’s a great map for identifying which way cases are trending, but it can be very misleading if you don’t know how to interpret it.

For example, if a county has just peaked and is starting to decline, the county will show bright green. Woo-hoo, right! Not so fast. You’re just past the peak. COVID-19 is still raging.

Here’s what the recent map looks like.

14-Day Change in New COVID-19 Cases – 18 July, 2021

You should never rely on this map alone to make any decisions related to COVID-19. When you start analyzing the map, keep in mind that this map only shows the trends. It does not show how much COVID-19 is in the counties. Look at Massachusetts. It looks like it’s in worse shape than Missouri and Arkansas.

The map doesn’t show that Massachusetts has incredibly low case loads because it’s the most vaccinated state in the country. On the other hand, Missouri and Arkansas have some of the lowest vaccination rates in the country, which is why the Delta variant is ripping through their communities at such an astonishing rate.

Active Cases Per Million People

The number of active cases per million people looks very similar to the new daily case loads per million people. As a result, you can see the big surge in Missouri and Arkansas, but the surges in both Florida and Las Vegas are lost in the noise.

Active COVID-19 Cases per 1 Million People – 18 July, 2021

Odds Any One Person You Interact With in Public is Infected

When I drove across the country at the peak of the COVID-19 pandemic last winter, I wanted to minimize my risk of contracting the virus. Calculating the odds that any one random person you cross paths with is infected is a great way to do that. All you need to do is divide the number of active cases by the population.
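
In pseudocode:

odds_infected = active_cases / population

For example, a county with 5,000 active cases and 1,000,000 residents gives odds of 0.005, or 0.5%, roughly 1 in 200.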

Again, it’s plagued by the same issue. You can see the big COVID-19 outbreak in Missouri and Arkansas. However, it doesn’t pop off the page and instantly draw your eye to it. Nor can you really see the ongoing surges in Florida or Las Vegas.

Odds Any 1 Random Person is Infected with COVID-19 – 18 July, 2021

None of These Plots Show Hot Spots Well. What Now?

I know what you’re thinking. You just spent this entire post explaining how easy it is to spread disinformation through color bar manipulation. You can’t be about to suggest manipulating it now just to show where the COVID-19 outbreaks are.

Rest assured, we will not be doing anything to the color bars. Doing otherwise would be flat-out hypocritical. Instead, we can use Matt’s Risk Index. The index is essentially a weighted average of all of the parameters we just looked at. It’s designed to make hot spots and high-risk areas really jump off the page. If you’re interested in the math behind Matt’s Risk Index, we discussed it in detail when I first unveiled the index last winter.

Before looking at Matt’s Risk Index, recall where the hot spots in the United States are right now.

  • Missouri and Arkansas
  • Florida Peninsula
  • Clark County, Nevada (Las Vegas)
  • Los Angeles County, California

LA County’s huge population likely keeps its risk level quite low for now, but the other three areas should leap off the page when you look at Matt’s Risk Index.

Matt’s COVID-19 Risk Index – 18 July, 2021

The Matt’s Risk Index map also seems to confirm health officials’ concerns that the southeast US is at very high risk for a Delta variant surge. Louisiana, Mississippi, Alabama, and Tennessee are some of the least vaccinated states in the country, and there are significant outbreaks of the Delta variant on either side of them right now.

My Favorite Example: Georgia’s Stunningly Boneheaded Decision to Spread COVID-19 Disinformation

What goes through the minds of some people when they make graphics is beyond me. In May 2020, the Georgia Department of Public Health tried to convince its citizens that it was okay to reopen everything and resume our normal day-to-day lives. COVID-19 was a thing of the past.

To support their argument, the State of Georgia published a chart that at first glance showed steadily declining COVID-19 cases. Unfortunately, when you took a closer look, one small problem appeared. The dates were in the wrong order.

Where does Sunday take place twice a week? And May 2 come before April 26?

The State of Georgia, as it provides up-to-date data on the COVID-19 pandemic.

In the latest bungling of tracking data for the novel coronavirus, a recently posted bar chart on the Georgia Department of Public Health’s website appeared to show good news: new confirmed cases in the counties with the most infections had dropped every single day for the past two weeks.

In fact, there was no clear downward trend.

Atlanta Journal Constitution

You can read the full story from the Atlanta Journal Constitution.

Thankfully, Governor Brian Kemp’s office quickly fixed the error as soon as they got called out for spreading disinformation. But there is no reasonable excuse at all for publishing that garbage in the first place, let alone in the middle of a major public health emergency.

Not surprisingly, the late night comedians had a field day with it.

Data and Source Code That Generates the Maps in This Post

I believe in transparency, especially when it comes to the spread of disinformation. You can find the Python code and the data that is used to generate every map in this post in our Bitbucket Repository.

Data Sources

Dataset | Source
County Presidential Election Results | MIT Election Data and Science Lab
Electoral College Results | US Federal Government National Archives
COVID-19 Data | Queried from our COVID-19 Dashboard database, which gets its data from Johns Hopkins University

Conclusion

In today’s era of fake news, it’s shockingly easy to spread disinformation, and maps are one of the easiest, subtlest, and most effective vehicles for it. The double-barreled combination of the 2020 election and the COVID-19 pandemic hit the United States with a tsunami of stupidity that has proven time and time again to have deadly consequences.

Thanks to data gurus around the world, disinformation is being called out more than ever before. Armed with the proper knowledge and logic, you can easily recognize, call out, and disprove disinformation. Today, I ask you for one small favor. Reach out to your favorite data guru, and express your appreciation for their work. Follow them on social media, donate some money to their cause, or simply thank them for their efforts. It’s a small gesture that can make a big impact both in your world and theirs.

Top Photo: The Snow-Capped Sierra Nevada Provide a Stunning Backdrop to a Beautiful Winter Day at Lake Tahoe
Glenbrook, Nevada – February, 2020

The post 13 Stunning Examples Showing How Easy It Is to Spread Disinformation without Manipulating Any Data appeared first on Matthew Gove Blog.

]]>
Python Tutorial: How to Create a Choropleth Map Using Region Mapping https://blog.matthewgove.com/2021/07/23/python-tutorial-how-to-create-a-choropleth-map-using-region-mapping/ Fri, 23 Jul 2021 16:00:00 +0000 https://blog.matthewgove.com/?p=2567 Several weeks ago, you learned how to create stunning maps without a GIS program. You created a map of a hurricane’s cone of uncertainty using Python’s GeoPandas library and an ESRI Shapefile. Then you created a map of major tornadoes to strike various parts of the United States during the […]

The post Python Tutorial: How to Create a Choropleth Map Using Region Mapping appeared first on Matthew Gove Blog.

]]>
Several weeks ago, you learned how to create stunning maps without a GIS program. You created a map of a hurricane’s cone of uncertainty using Python’s GeoPandas library and an ESRI Shapefile. Then you created a map of major tornadoes to strike various parts of the United States during the 2011 tornado season. You also generated two bar charts directly from the shapefile to analyze the number of tornadoes that occurred in each state that year. However, we did not cover one popular type of map: the choropleth map.

2011 tornado paths across the southeastern United States, created with Python GeoPandas.
2011 Tornado Tracks Across Dixie Alley

Today, we’re going to take our analysis to the next level. You’ll be given a table of COVID-19 data for each US State in CSV format for a single day during the COVID-19 pandemic. The CSV file has the state abbreviations, but does not include any geometry. Instead, you’ll be given a GeoJSON file that contains the state boundaries. You’ll link the data to the state boundaries through a process called region mapping and create a choropleth map of the data.

Why Do We Use Region Mapping to Create Choropleth Maps?

The main reason we use region mapping is for performance. When you use region mapping, you only need to load your geometry once, regardless of how many data points use that geometry. Each data point uses a unique identifier to “map” it to the geometry. You can use the ISO state or country codes, or you can make your own ID’s. Without region mapping, you need to load the geometry for each data point that uses it.

To show you the performance gains, let’s use COVID-19 data as an example. In our COVID-19 Dashboard’s Map, you can plot data by state for several countries. For Canada, the GeoJSON file that contains the provincial boundaries is 150 MB. We’re roughly 500 days into the COVID-19 pandemic. A quick back-of-the-envelope calculation shows just how much data you’d need to load without region mapping.

data_load_size = size_of_geojson * number_of_days
data_load_size = (150 MB) * (500 days)
data_load_size = 75,000 MB = 75 GB 

Keep in mind that 75 GB is just for the provincial boundaries. It does not include any of the COVID-19 data. And it only grows bigger and bigger every day.

Region Mapping helps us efficiently load data into our COVID-19 dashboard.
Region Mapping and Vector Tiles Allow Us to Load Canada’s Provincial Boundaries into our COVID-19 Map using Less Than 2 MB of Data.

Using region mapping, you only need to load the provincial boundaries once. With the GeoJSON file, that’s only 150 MB. In our COVID-19 map, we actually take it a step further. Instead of GeoJSON format, we use Mapbox Vector Tiles (MVT), which is much more efficient for online maps. The MVT geometry for the Canadian provincial boundaries is only 2 MB. Compared to possibly 75 GB of geometry data, 2 MB wins hands down.

What is a Choropleth Map?

A choropleth map displays statistical data on a map using shading patterns on predetermined geographical areas. Those geographic areas are almost always political boundaries, such as country, state, or county borders. They work great for representing the variability of a given measurement across a region.

Choropleth Map of Worldwide COVID-19 data
A Sample Choropleth Map Showing New Daily Worldwide COVID-19 Cases on 14 July, 2021

An Overview of Creating a Choropleth Map in Python GeoPandas

The process we’ll be programming in our Python script is breathtakingly simple using GeoPandas.

  1. Read in the US State Boundaries from the GeoJSON file.
  2. Import the COVID-19 data from the CSV file.
  3. Link the data to the state boundaries using the ISO 3166-2 code (state abbreviations)
  4. Plot the data on a choropleth map.

Required Python Dependencies

Before we get started, you’ll need to install four Python modules. You can easily install them using either anaconda or pip. If you have already installed them, you can skip this step.

  • geopandas
  • pandas
  • matplotlib
  • contextily

The first item in our Python script is to import those four dependencies.

import geopandas
import pandas
import matplotlib.pyplot as plt
import contextily as ctx

Define A Few Constants That We’ll Use Throughout Our Python Script

There are a few values we’ll use throughout the script. Let’s define a few constants so we can easily reference them.

GEOJSON_FILE = "USStates.geojson"
CSV_FILE = "usa-covid-20210102.csv"

# 3857 - Mercator Projection
XBOUNDS = (-1.42e7, -0.72e7)
YBOUNDS = (0.26e7, 0.66e7)

The XBOUNDS and YBOUNDS constants define the bounding box for the map in the x and y coordinates of the Mercator projection, which we’ll be using in this tutorial. They are not in latitude and longitude. We’ve set them so the left edge of the map is just off the west coast (~127°W) and the right edge is just off the east coast (~65°W). Likewise, the top of the map is just above the US-Canada border (~51°N), and the bottom edge is far enough south (~23°N) to include the Florida peninsula and the Keys.

Read in the US State Boundaries Using GeoPandas

GeoPandas is smart enough to be able to automatically figure out the file format of most geometry files, including ESRI Shapefiles and GeoJSON files. As a result, we can load the GeoJSON the exact same way as we loaded the ESRI Shapefiles in previous tutorials.

geojson = geopandas.read_file(GEOJSON_FILE)

Read in Data From the CSV File

You may have noticed that we did not import Python’s built-in csv module. That was intentional. Instead, we’ll use Pandas to read the CSV.

On the surface, it may look like the main benefit is that you only need a single line of code to read in the CSV data with Pandas. After all, it takes a block of code to do the same with Python’s standard csv library. However, you’ll really reap the benefits in the next step when we go to map the data to the state boundaries.

data = pandas.read_csv(CSV_FILE)

Map the CSV Data to the State Boundaries in the GeoJSON File

When you read the GeoJSON file in with the geopandas.read_file() method, Python stores it as a GeoDataFrame object, a subclass of the Pandas DataFrame. If you were to read in the CSV data using Python’s built-in csv library, Python would store the data as a csv.reader object.

Here’s where the magic happens. By reading in the CSV data with Pandas instead of the built-in csv library, Python also stores the CSV data as a Pandas DataFrame object. If we had used Python’s built-in csv library, mapping the CSV data to the state boundaries would be like trying to combine two recipes, where one is in imperial units and the other is in metric units.

The Pandas developers built the DataFrame objects to be easily split, merged, and manipulated, which means that once again, we can do it with just a single line of code.

full_dataset = geojson.merge(data, left_on="STATE_ID", right_on="iso3166_2")

Let’s go over what that line of code means.

  • geojson.merge(data, ... ): Merge the CSV data stored in the data variable into the US State boundaries stored in the geojson variable.
  • left_on="STATE_ID": The property that contains the common unique identifier in the GeoJSON file is called STATE_ID.
  • right_on="iso3166_2": The property (column) that contains the corresponding unique identifier in the CSV data is called iso3166_2.

The ISO 3166-2 Code: What’s in the Mapping Identifier?

In this tutorial, we’re using each state’s unique ISO 3166-2 code to map the CSV data to the state boundaries in the GeoJSON. So what exactly is an ISO 3166-2 code? It’s a unique code that contains the country code and a unique ID for each state. The International Organization for Standardization, or ISO, maintains a standardized set of codes that every country in the world uses.

In many countries, including the United States and Canada, the ISO 3166-2 codes use the same state and province abbreviations that their respective postal services use. As you’ll see in the table, though, not all countries do.

ISO 3166-2 Code | State/Province | Country
US-CA | California | United States
US-FL | Florida | United States
US-NY | New York | United States
US-TX | Texas | United States
CA-BC | British Columbia | Canada
CA-ON | Ontario | Canada
AU-NSW | New South Wales | Australia
AU-WA | Western Australia | Australia
ZA-MP | Mpumalanga | South Africa
IT-BO | Bologna | Italy
RU-CHE | Chelyabinskaya Oblast | Russia
IN-MH | Maharashtra | India
TH-50 | Chiang Mai | Thailand
JP-34 | Hiroshima | Japan
FR-13 | Bouches-du-Rhône | France
AR-X | Córdoba | Argentina
KG-C | Chuy | Kyrgyzstan
Sample ISO 3166-2 Codes from Various Countries

Write a Function to Generate a Choropleth Map

Once the CSV data has been successfully linked to the state boundaries in the GeoJSON, everything is stored in a single Pandas DataFrame object. As a result, the code to plot the data will be nearly identical to the maps we created in previous GeoPandas tutorials.

Like the tornado track tutorial, you’ll be creating several different maps. To avoid running afoul of the DRY (Don’t Repeat Yourself) principle, let’s put the plotting code into a function that we can call.

First, let’s define the function. We’ll pass it three parameters: the merged dataset, the name of the column to plot, and a human-readable plot type for the title.

def choropleth_map(mapped_dataset, column, plot_type):

Initialize the Figure

Inside that function, let’s first initialize the figure that will hold our choropleth map.

ax = mapped_dataset.plot(figsize=(12,6), column=column, alpha=0.75, legend=True, cmap="YlGnBu", edgecolor="k")

There’s a lot in this step, so let’s unpack it.

  • figsize=(12,6): Plot should be 12 inches wide by 6 inches tall
  • column=column: Plot the column name that was passed to the choropleth_map() function.
  • alpha=0.75: Make the map 75% opaque (25% transparent) so you can see through it slightly.
  • legend=True: Include the color bar legend on the figure
  • cmap="YlGnBu": Use a Yellow-Green-Blue color map
  • edgecolor="k": Color the state outlines/borders black

Remove Axis Ticks and Labels From Your Choropleth Map

If we use the standard WGS-84 (EPSG:4326) projection to plot the continental US, the map comes out short and wide. For a better aspect ratio, we’ll convert the data into the Mercator Projection (EPSG:3857). Unfortunately, that means the x and y axes will no longer be in latitude and longitude, and will instead be in the coordinates of the Mercator Projection. To avoid any confusion, let’s just hide the labels on the x and y axes.
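
The script doesn’t show that conversion as its own step, so for clarity, here’s the GeoPandas one-liner. You’d run it right after the merge, before calling the plotting function:

full_dataset = full_dataset.to_crs(epsg=3857)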

ax.set_xticks([])
ax.set_yticks([])

Set the Title of Your Choropleth Map

Next, we’ll set the title, exactly like we’ve done in previous tutorials.

title = "COVID-19: {} in the United States\n2 January, 2021".format(title)
ax.set_title(plot_type)

Because we’re only working with a specific date, we’ve hard-coded the date into the function. However, if you’re working with multiple dates, you can easily update the code so that the correct dates display on the maps.

Zoom the Map to Show the Continental United States

Now, let’s set the bounding box to show only the Lower 48.

ax.set_xlim(*XBOUNDS)
ax.set_ylim(*YBOUNDS)

Add the Basemap For Your Choropleth Map

Penultimately, add the basemap for the choropleth map. We’ll use the same Stamen TonerLite basemap that we used in both the Hurricane Dorian Cone of Uncertainty and the maps of the 2011 tornado tracks. We’ll get the projection from the dataset so we don’t have to worry about the basemap and the data being in different projections.

ctx.add_basemap(ax, crs=mapped_dataset.crs.to_string(), source=ctx.providers.Stamen.TonerLite, zoom=4)

Save Your Choropleth Map to a png File

Finally, save the plots to a png image file.

output_path = "covid19_{}_usa.png"
plt.savefig(output_path)

Let’s Generate 4 Choropleth Maps

Now that we have our function to generate the choropleth maps, let’s make 4 maps of COVID-19 data on 2 January, 2021, which was the peak of the winter wave in the United States.

  • New Daily Cases
  • Total Cumulative Cases
  • New Daily Deaths
  • Total Cumulative Deaths

columns_to_plot = [
    "new_cases",
    "confirmed",
    "new_deaths",
    "dead",
]

plot_types = [
    "New Daily Cases",
    "Total Cumulative Cases",
    "New Daily Deaths",
    "Total Cumulative Deaths",
]

for column, plot_type in zip(columns_to_plot, plot_types):
    choropleth_map(full_dataset, column, plot_type)
    print("Successfully Generated Choropleth Map for {}...".format(plot_type))
    print("Successfully Generated Choropleth Map for {}...".format(plot_type))

Let There Be Maps

After running the script, you’ll find 4 choropleth maps in the script directory.

Download the Script and Run It Yourself

We encourage you to download the script from our Bitbucket Repository and run it yourself. Play around with it and see what other kinds of choropleth maps you can come up with.

Conclusion

Region mapping is an incredibly powerful way to efficiently display massive amounts of data on a map. For example, when we load the Canadian provincial data in our COVID-19 map, the combination of region mapping plus the Mapbox Vector tiles has resulted in a 99.997% reduction in the size of the provincial boundary being loaded. These savings are critical to the success of online GIS projects. Nobody in their right mind is going to sit around and wait for 75 GB of state boundaries to download every time the map loads.

Many people think that high-level tasks such as region mapping are confined to tools like ESRI ArcGIS. While Python GeoPandas is certainly not a replacement for a tool like ArcGIS, it’s a perfect solution for organizations that don’t have the budget for expensive software licenses or don’t do enough GIS work to require those licenses. If you’re ready, we can help you get started building maps with GeoPandas today.

If you’re ready to try a few exercises yourself, we’ve got a couple challenges for you.

Next Steps, Challenge 1:

Revisit our tutorial plotting 2011 tornado data. Revise that script so that instead of generating a map of the tornado tracks, you create a choropleth map of the number of tornadoes to strike each state in 2011. I’ll give you a hint to get started. You don’t need to use region mapping for this because the data is already embedded in the shapefile.

Next Steps, Challenge 2:

In the Bitbucket Repository, you’ll find a CSV File of COVID-19 data for each country for 2 January, 2021. Go online and find a GeoJSON or ESRI Shapefile of world country borders. Then use region mapping to create the same 4 choropleth maps we generated in this tutorial, except you should output a map of the world countries, not a map of US States. I’ve included all of the ISO Country Codes in the CSV file so you can use the Alpha 2, Alpha 3, or Numeric codes.

Top Photo: Beautiful Geology in Red Rock Country
Sedona, Arizona – August, 2016

The post Python Tutorial: How to Create a Choropleth Map Using Region Mapping appeared first on Matthew Gove Blog.

]]>
Color Theory: A Simple Exercise in Mathematics and Graphic Design https://blog.matthewgove.com/2021/07/02/color-theory-a-simple-exercise-in-mathematics-and-graphic-design/ Fri, 02 Jul 2021 16:00:00 +0000 https://blog.matthewgove.com/?p=2476 We are exposed to color theory every single day of our lives. Most of the time, we don’t think twice about it. However, did you know that judgements of your credibility are 75% based on the design of your website? First impressions are 94% related to the look and design […]

The post Color Theory: A Simple Exercise in Mathematics and Graphic Design appeared first on Matthew Gove Blog.

]]>
We are exposed to color theory every single day of our lives. Most of the time, we don’t think twice about it. However, did you know that judgements of your credibility are 75% based on the design of your website? First impressions are 94% related to the look and design of your website. And 46% of customers base their purchasing decisions on the aesthetic appeal of your website. Indeed, color theory and graphic design are that important.

Unfortunately, the vast majority of people have a terrible eye when it comes to graphic design, and many organizations cut corners on their designs. While any of us can choose colors, professional graphic designers have a knack for choosing colors that are just stunning together. So how do they do it? They don’t just pull these colors out of thin air.

It turns out we can use mathematics to better understand color. The mathematical color theory we’re looking at today is aimed at web and application design, but you can certainly use it for painting your house, coordinating fashion outfits, or any other type of design.

The Color Wheel: The Foundation of Color Theory

Isaac Newton invented the color wheel in 1666. Mapping the color spectrum onto a circle easily allows us to identify relationships between colors. You probably saw a color wheel when you were in elementary school. However, you can easily find color wheels that professionals use through a quick Google Images search. Here’s an example of one.

The color wheel forms the basis for explaining color theory mathematically.

A New Take on Primary Colors

Time for another flashback to elementary school. Do you remember the primary colors? I know you do. If you’ve forgotten, they are red, yellow, and blue, or RYB for short. You cannot make primary colors by combining mixtures of other colors.

Did you know that that’s not the only set of primary colors? The primary colors you learned in elementary school only tell part of the story. The other set of primary colors consists of red, green, and blue, or RGB. Wait, what? Nope, that’s not a typo. You actually use the RGB primary color scheme more in your day-to-day life than you do RYB.

So how on earth can green possibly be a primary color? I thought yellow and blue made green. You’re right, but only partially right. Here’s the rest of the story. The RYB primary color scheme you were taught in elementary school applies to mixing colors of paint or ink. Whenever you mix colors of light, you use the RGB color scheme.

Where Do We Use the RGB Primary Color Scheme?

It’s everywhere. First and foremost, your eyes use the RGB primary color scheme to interpret color. Anything with a screen also uses it. Your phone does. So does your computer and your television. So do those electric signs you see on the freeway. The next time you’re in a studio or at the theatre, look up at the lights. You’ll see they are red, green, and blue.

Thankfully, mathematical color theory remains the same, regardless of which primary color scheme you’re using.

Denoting Color Mathematically

In order to apply mathematical theory to color, we’re going to have to put some numbers behind it. Isaac Newton, the man who invented the color wheel, was one of the greatest mathematicians of all time, so most of the heavy lifting is done for us. We simply use the RGB model to break the color down into its red, green, and blue components. You can think of it either as a three-dimensional vector or as a 1×3 matrix.

color = RGB(red, green, blue)

So what numbers do we put in each component? 0 to 100 would be a good guess. That’s actually an accepted way to do it, but when you’re working with computers, there’s a better way. Computers use the binary system, which uses powers of two. A single byte consists of 8 parts called bits. As a result, a byte can represent 2⁸, or 256, distinct values. We’ll set each component of our color to a number between 0 and 255.

black = RGB(0, 0, 0)
white = RGB(255, 255, 255)
red = RGB(255, 0, 0)
green = RGB(0, 255, 0)
blue = RGB(0, 0, 255)

Choosing a Coordinate System for Our Color Wheel

To fully understand color theory, we’ll need to plot the color wheel on a graph. By default, most people start with a cartesian (x, y) coordinate system. And cartesian coordinates work just fine for color theory. However, there’s one big catch. The color wheel is circular, which means we need to deal with angles. And in a cartesian grid, that means trigonometry, and lots of it.

I don’t know about you, but I’d rather not have to bring sines and cosines into this. Thankfully, there’s a much better coordinate system to use. And best of all, the only math you’ll need is addition and subtraction. There’s no trigonometry required.

Enter the Polar Coordinate System

Instead of a rectangular grid, the polar coordinate system is based on concentric circles around the (0, 0) coordinate. Instead of (x, y), polar coordinates are given as (r, θ). The r coordinate refers to the radius, or how far you are from the origin. Theta (θ) is the angle from a horizontal line that extends to the right from the origin. In degrees, theta is a number between 0 and 360.

Color theory can easily be explained mathematically using a polar coordinate system.

Convert RGB to Hue, Saturation, Lightness (HSL) to Make the Mathematics of Color Theory Even Easier

Here’s where the magic happens. The Hue, Saturation, Lightness, or HSL, model is just another way to denote and analyze colors. Like RGB, it is composed of three components. Can you tell which component will complement our polar coordinate system perfectly?

  • Hue: The angle on the color wheel, from 0 to 360°. Red is 0°, green is 120°, and blue is 240°.
  • Saturation: The grey level, as a percentage. 0% is a shade of grey. 100% is full color.
  • Lightness: How light or dark the color is, as a percentage. 0% is black, while 100% is white.

The hue component overlays perfectly onto our polar coordinate system. To apply color theory, all we need to do is add or subtract hue angles to obtain complementary colors. No trigonometry required. You don’t even have to touch the saturation or lightness values. The process is breathtakingly simple.

  1. Convert your primary color to HSL notation. The nuts and bolts of that conversion are beyond the scope of this tutorial. However, I wrote a Python script that does the conversion so you can perform your own color analysis (see the sketch after this list).
  2. Add and/or subtract the hue angles to determine your complementary colors. We’ll do some hands-on exercises with that below.
  3. Convert the colors in your color scheme back to RGB notation.
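
If you’d rather not wait for that script, Python’s built-in colorsys module can handle the RGB-to-HSL conversion for quick experiments. A minimal sketch (note that colorsys works on 0–1 floats and returns components in hue, lightness, saturation order):

import colorsys

def rgb_to_hsl(red, green, blue):
    """Convert 0-255 RGB components to (hue in degrees, saturation %, lightness %)."""
    h, l, s = colorsys.rgb_to_hls(red / 255, green / 255, blue / 255)
    return h * 360, s * 100, l * 100

print(rgb_to_hsl(255, 0, 0))    # (0.0, 100.0, 50.0) -- pure red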

For the examples below, let’s use red as the primary color because θ = 0 for red. That way, the angles on the plots will make much more sense.

Complementary Color Theory

The complementary color is the color that sits directly across the color wheel from your primary color. Mathematically, just add 180° to the hue of your primary color and plot it on your polar coordinate system.

hue_complementary = hue_primary + 180
saturation_complementary = saturation_primary
lightness_complementary = lightness_primary
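
One detail the equation glosses over: if your primary hue is above 180°, adding 180 pushes the result past 360°, so wrap it with a modulo. A small helper (the name rotate_hue is my own) covers this, and we can reuse it for every scheme below:

def rotate_hue(hue, angle):
    """Rotate a hue around the color wheel by angle degrees, wrapping past 0 and 360."""
    return (hue + angle) % 360

print(rotate_hue(0, 180))      # 180 -- cyan is red's complement
print(rotate_hue(240, 180))    # 60  -- yellow is blue's complement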

Real World Example of Complementary Colors

Look no further than the world of North American sports to find logos that use complementary colors.

Tricolor: Adjacent Color Theory

With adjacent colors, the goal is to have two additional colors that are compatible with your primary color. All three colors should be near each other on the color wheel. One of the adjacent colors should be slightly cooler than your primary color. The other should be slightly warmer than your primary color.

Adjacent colors work best with more subdued colors. When used with bright, vivid colors, they can really overwhelm your senses.

A Word About Angles in Color Theory

Using polar coordinates, your two adjacent colors are offset from your primary color by the same angle on the color wheel. Most designers find that 30° to 45° is the optimum range, but anywhere between 20° and 60° is acceptable. If you use an angle less than 20°, the colors will be so similar you’ll have a hard time telling them apart. Use an angle greater than 60°, and they’re really not adjacent colors any more.

In the equations below, the phi variable (φ) represents the angle your adjacent colors are offset from the primary. In the plot below, φ = 30°.

hue1 = hue_primary + phi
hue2 = hue_primary - phi

saturation1 = saturation2 = saturation_primary
lightness1 = lightness2 = lightness_primary
Adjacent Colors for Red, Offset by 30° on the Color Wheel
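
Using the rotate_hue() helper from the complementary section, the adjacent pair for red works out like this:

# Adjacent colors for red (hue 0), offset by 30 degrees
hue1 = rotate_hue(0, 30)     # 30
hue2 = rotate_hue(0, -30)    # 330 -- wraps around instead of going negative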

Real World Examples of Adjacent Colors

Many companies you interact with in day-to-day life use adjacent colors.

Tricolor: Triad Color Theory

The theory behind triad colors is identical to adjacent colors, with one distinct difference. Instead of being offset φ degrees from your primary color, triad colors are offset φ degrees from its complementary color. Instead of just a single complementary color, you’ll have two. If you’re looking for a tricolor scheme and have bright colors, triad colors work much better than adjacent colors.

Using a single triad color is also a great alternative to using a complementary color. Which one to use will largely depend on your project, but if you don’t like the look of your color scheme using a complementary color, try it with a single triad color. Out in the real world, there is no more textbook example of using a single triad color than the iconic bleu, blanc, et rouge of the Montréal Canadiens. Well, at least the bleu and rouge parts.

The Montréal Canadiens Iconic Red Home Sweaters, accented with a Blue Stripe. Image courtesy of NBC Sports.

Mathematically, triad colors are calculated the same way as adjacent colors, except they’re offset from the complementary color. The same rules for the angles apply here. In the plots below, let’s again use 30° as the offset angle.

hue_complementary = hue_primary + 180
hue1 = hue_complementary + phi
hue2 = hue_complementary - phi

saturation1 = saturation2 = saturation_primary
lightness1 = lightness2 = lightness_primary
Can you see where the Montréal Canadiens got their color scheme?
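
In code, again leaning on the rotate_hue() helper from above:

# Triad colors for red (hue 0), offset 30 degrees from its complement
hue_complementary = rotate_hue(0, 180)        # 180
hue1 = rotate_hue(hue_complementary, 30)      # 210
hue2 = rotate_hue(hue_complementary, -30)     # 150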

Real World Examples of Triad Colors

We’ll go back to sports team logos here. Can you spot the common thread in the color schemes? Do note that many of these teams use a single triad color like the Montréal Canadiens do.

So did you spot the common thread in the color schemes? Red, white, and blue are an incredibly popular triad color scheme in North American sports. It makes sense given the color scheme of the American flag. However, don’t forget that the Canadian flag had blue in it as recently as the 1960s.

Tetrad Color Theory: Make a Rectangle on the Color Wheel

Tetrad colors combine the best of complementary, adjacent, and triad colors into a beautiful 4-color scheme. With the naked eye, it’s incredibly difficult to pull off a good tetrad color scheme, but it becomes much simpler when you put the math behind it to work. When done correctly, if you plot the four points on the color wheel and then connect the dots, you’ll have a perfect rectangle.

Also called “Double Complementary Colors”, it’s actually quite easy to come up with a tetrad color scheme mathematically.

  1. Start with your primary color.
  2. Select one of the adjacent colors to your primary color. It does not matter which one.
  3. Calculate the complementary color from your primary color.
  4. The triad color is simply the complementary color of the adjacent color you chose in Step 2.

hue_complementary = hue_primary + 180
hue_adjacent = hue_primary + phi
hue_triad = hue_adjacent + 180 = hue_complementary + phi

saturation_adjacent = saturation_complementary = saturation_triad = saturation_primary
lightness_adjacent = lightness_complementary = lightness_triad = lightness_primary

While the plot below uses an offset angle of 30°, I find tetrad colors work much better with a larger angle. Many of our real-world examples use angles of at least 45 to 60°.

Color theory states that tetrad colors should make a rectangle on your color wheel using two sets of complementary colors
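
Here’s the same four-step recipe in code, using the rotate_hue() helper and a 45° offset:

phi = 45    # tetrads tend to look better with larger offsets
hue_primary = 0
hue_adjacent = rotate_hue(hue_primary, phi)          # 45
hue_complementary = rotate_hue(hue_primary, 180)     # 180
hue_triad = rotate_hue(hue_adjacent, 180)            # 225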

Real World Examples of Tetrad Colors

You’ll need to look to the tech industry to find the best examples of tetrad colors in the real world.

Interestingly, there is one North American professional sports team that pulls off a tetrad color scheme incredibly well. Any guesses as to which team it is? I’ll give you a hint. It’s an NBA team.

The Oklahoma City Thunder logo uses tetrad colors extremely well.

Monochromatic Shading

Our final color theory is the simplest. You don’t have to worry about converting RGB to HSL notation; you’ll work directly in RGB. Throw away your polar coordinates, too. They won’t be needed here.

Monochromatic shading is nothing more than making a lighter and darker version of your primary color. The secret to pulling it off is that both the lighter and darker versions must be scaled the same amount from your primary color.

The scaling factor should be a percent, in decimal form. In other words, use 0.25 to scale your colors by 25%. Like the tricolor schemes, scale in moderation. 25 to 60% is a pretty safe range.

red_light = red_primary * (1 + scaling_factor)
green_light = green_primary * (1 + scaling_factor)
blue_light = blue_primary * (1 + scaling_factor)

red_dark = red_primary * (1 - scaling_factor)
green_dark = green_primary * (1 - scaling_factor)
blue_dark = blue_primary * (1 - scaling_factor)
Monochromatic shading for a standard red color.
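
One caveat the equations above don’t show: lightening can push a component past 255 (imagine scaling pure red up by 25%), so a real implementation should clamp each channel. A minimal sketch, with the function name and clamping behavior being my own additions:

def monochromatic_shades(red, green, blue, scaling_factor):
    """Return (lighter, darker) versions of an RGB color, with channels clamped to 0-255."""
    def scale(component, factor):
        return max(0, min(255, round(component * factor)))

    lighter = tuple(scale(c, 1 + scaling_factor) for c in (red, green, blue))
    darker = tuple(scale(c, 1 - scaling_factor) for c in (red, green, blue))
    return lighter, darker

print(monochromatic_shades(180, 30, 30, 0.25))
# ((225, 38, 38), (135, 22, 22))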

Our Strategy

On our websites, we employ a simple strategy. We start with a tetrad color scheme, choose primary and secondary colors from it, and complement them with greys as necessary. However, we do actually use all four colors.

  1. Primary Color
  2. Secondary Color
  3. Accent Color that is used sparingly to complement the primary and secondary colors.
  4. Alert Color: A high-contrast color used to draw attention to certain items, such as error messages, warnings, sale announcements, etc.
  5. Greys accent headers, footers, and other parts of the website as needed

We also use monochromatic scaling to make buttons and links darker when you hover over them, and make them lighter when they’re disabled.

Conclusion

Choosing a color scheme can be incredibly difficult. While it’s no substitute for a professional designer, knowledge of basic color theory goes a long way towards your success. Use color theory mathematics to know what you want prior to hiring a designer. They’ll be able to work much more efficiently, and you’ll save yourself some money as well.

If you want to explore color theory further and try out some of the color theory math for yourself, I’ve put Python scripts on the Bitbucket repository. And if you have any graphic design needs or just a general question, please don’t hesitate to get in touch today.

Top Photo: Brilliant Colors Light Up the Desert Sky During a Spectacular Winter Sunrise
Wittmann, Arizona – December, 2017

The post Color Theory: A Simple Exercise in Mathematics and Graphic Design appeared first on Matthew Gove Blog.

]]>
The Ultimate in Python Data Processing: How to Create Maps and Graphs from a Single Shapefile https://blog.matthewgove.com/2021/06/18/the-ultimate-in-python-data-processing-how-to-create-maps-and-graphs-from-a-single-shapefile/ Fri, 18 Jun 2021 16:00:00 +0000 https://blog.matthewgove.com/?p=2420 Last week, using Python GeoPandas, we generated two simple geographic maps from an ESRI Shapefile. After plotting a simple map of the 32 Mexican States, we then layered several shapefiles together to make a map of the Cone of Uncertainty for Hurricane Dorian as she brushed the coast of Florida, […]

The post The Ultimate in Python Data Processing: How to Create Maps and Graphs from a Single Shapefile appeared first on Matthew Gove Blog.

]]>
Last week, using Python GeoPandas, we generated two simple geographic maps from an ESRI Shapefile. After plotting a simple map of the 32 Mexican States, we then layered several shapefiles together to make a map of the Cone of Uncertainty for Hurricane Dorian as she brushed the coast of Florida, Georgia, and the Carolinas.

Today, we’re going to get a bit more advanced. We’re going to look at filtering and color data based on different criteria. Then, we’ll make some bar charts from the data in the shapefile. And we’ll do it all without using a GIS program. Once again, we’ll be looking at weather data. This time, we’ll be mapping tornado data from one of the busiest and deadliest tornado seasons ever.

A Review of the 2011 Tornado Season

We’re going to look at 2011 because it was one of the busiest and also one of the deadliest tornado seasons on record. In addition to the Super Outbreak across Dixie Alley on 27 April, there were two violent EF-5 tornadoes within 48 hours of each other in late May.

One struck Joplin, Missouri on 22 May, tragically killing 158 people. The other struck Piedmont, Oklahoma on 24 May. That tornado hit the El Reno Mesonet Site, which recorded a wind gust of 151 mph prior to going offline. To this day, that record stands as the strongest wind gust the Oklahoma Mesonet has ever recorded.

Tornado Damage in Moore, Oklahoma Less Than 10 Days After the Horrific EF-5 Tornado on 20 May, 2013

Tornado Track Data

The National Weather Service’s Storm Prediction Center in Norman, Oklahoma handles everything related to tornadoes and severe weather in the United States. In addition to its forecasting and research operations, it maintains an extensive archive of outlooks, storm reports, and more.

In that archive, you can find shapefiles of all tornado, wind, and hail reports since 1950. We’ll use the “Paths” shapefile for tornado data, but you can repeat the exercise with wind and hail data, too. From that file alone, we will generate:

  • Map of 2011 tornado tracks in the Eastern and Central United States, colored by strength
  • Zoomed in Map of May, 2011 tornado tracks for Oklahoma, Kansas, Missouri, and Arkansas, colored by strength
  • Zoomed in map of the 27 April, 2011 Super Outbreak across Mississippi and Alabama, colored by strength

Python Libraries and Dependencies

For this exercise, you’ll need to make sure you’ve installed several Python libraries. You can easily install the libraries with pip or anaconda.

  • GeoPandas
  • Pandas
  • Matplotlib
  • Contextily

Digging into the Raw Data

Before we begin, we need to figure out exactly which parameters are in the shapefile. Because we don’t have a GIS program, we’ll use GeoPandas instead. To list all columns, run the following Python code.

import geopandas

shp_path = "1950-2018-torn-aspath/1950-2018-torn-aspath.shp"
all_tornadoes = geopandas.read_file(shp_path)
print(all_tornadoes.columns)

This outputs a list of all column names in the shapefile. Because we’re filtering by year, counting by state, and coloring by strength, we’re only interested in three of those columns.

  • yr is the year
  • st defines the 2-letter state abbreviation
  • mag represents the tornado strength, using the Enhanced Fujita Scale

Read in the Data From the Shapefile

Before we dive into the Python code, let’s first import all of the libraries that we’ll need to make our geographic maps and graphs.

import geopandas
import pandas as pd
import matplotlib.pyplot as plt
import contextily as ctx

We’ll import the data the same way we did last week.

shp_path = "1950-2018-torn-aspath/1950-2018-torn-aspath.shp"
all_tornadoes = geopandas.read_file(shp_path)

Filtering the Raw Data

Applying a filter to the raw data is incredibly simple. All you need to do is define the filtering criteria, and then apply it. You can do it in a single line of code. However, to make it easier to understand, we’ll do it in two.

Recall our filtering criteria. We’re using data from 2011.

filter_criteria = (all_tornadoes["yr"] == 2011)

Applying the filter is as simple as parsing any other list.

filtered_data = all_tornadoes[filter_criteria]

Finally, don’t forget to convert it to the WGS-84 projection. If you’ve forgotten it, the EPSG code for WGS-84 is 4326.

filtered_data = filtered_data.to_crs(epsg=4326)
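
For what it’s worth, the entire filter-and-reproject step really does fit on a single line once you’re comfortable with the syntax:

# The same filter and projection conversion, condensed to one line
filtered_data = all_tornadoes[all_tornadoes["yr"] == 2011].to_crs(epsg=4326)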

Plotting the Data on a Map

Using Python, we’re going to create three geographic maps of the 2011 tornado paths with GeoPandas.

  1. The eastern 2/3 of the United States
  2. Southern Great Plains (Oklahoma, Kansas, Missouri, and Arkansas)
  3. Dixie Alley (Mississippi, Alabama, Georgia, and Tennessee)

Because we’re generating three geographic maps, let’s put the map generation code into a Python function. We’ll call it plot_map, and we’ll pass it 4 arguments.

  • data: A list of the data we filtered from the shapefile in the previous section
  • region: One of the three regions defined above. We’ll define them as follows.
    • United States
    • Tornado Alley
    • Dixie Alley
  • xlim: An optional list or tuple defining the minimum and maximum longitudes to plot, in the format [min, max]. If omitted, the map will be scaled to fit the entire dataset.
  • ylim: An optional list or tuple defining the minimum and maximum latitudes to plot, in the format [min, max]. If omitted, the map will be scaled to fit the entire dataset.

In Python, the skeleton of our plot_map() function looks like this.

def plot_map(data, region, xlim=None, ylim=None):
    # Put Your Code Here

Extra Arguments Needed to Generate the Plot

We’ll use the same methods we used last week to plot the data on a map. First, we’ll plot the data, passing it a few extra parameters compared to last time.

  • column: Tells Python which column in the shapefile to plot.
    • We’ll use the mag column for tornado magnitude.
  • legend: A boolean argument that tells Python whether or not to display the legend on the map.
    • We will include the legend (legend = True)
  • cmap: The color map to use to shade the data on the map.
    • We’ll use the “cool” color map, which will shade the tornado paths from blue (weakest) to pink (strongest).

Put it all together into a single line of code.

ax = data.plot(figsize=(12,6), column="mag", legend=True, cmap="cool")

Zoom In On Certain Areas

To zoom in on our three areas, we’ll need to set the bounding box. All a bounding box does is define the minimum and maximum latitudes and longitudes to include on the map. If the xlim and ylim arguments are passed to the function, let’s use them to set the bounding box.

if xlim:
    ax.set_xlim(*xlim)
if ylim:
    ax.set_ylim(*ylim)

Next, we’ll set the map’s title using the region variable we passed to the function.

title = "2011 Tornado Map: {}".format(region)
ax.set_title(title)

While we’re looking at the region, let’s also use the region to set the output filename for our map. We’ll replace spaces with dashes and make everything lowercase.

fname_region = region.replace(" ", "-").lower()

The final piece of the map to add is the basemap, using the same arguments as the Hurricane Dorian cone of uncertainty. If you need to jog your memory, here are those parameters again.

  • ax: The plotted data
  • crs: The coordinate reference system, or projection
  • source: The basemap style (we’ll use the Stamen TonerLite style again)

ctx.add_basemap(ax, crs=data.crs.to_string(), source=ctx.providers.Stamen.TonerLite)

Finally, just save the map in png format. Incorporate the fname_region variable we defined above to give each file a unique name.

figname = "2011-tornadoes-{}.png".format(fname_region)
plt.savefig(figname)
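
Assembled from the snippets above (and assuming the imports from the top of the post), the complete plot_map() function looks like this:

def plot_map(data, region, xlim=None, ylim=None):
    """Plot tornado tracks for a region, colored by EF-scale magnitude."""
    ax = data.plot(figsize=(12, 6), column="mag", legend=True, cmap="cool")

    # Zoom in if a bounding box was provided
    if xlim:
        ax.set_xlim(*xlim)
    if ylim:
        ax.set_ylim(*ylim)

    ax.set_title("2011 Tornado Map: {}".format(region))

    # Add the greyscale basemap in the shapefile's projection
    ctx.add_basemap(ax, crs=data.crs.to_string(), source=ctx.providers.Stamen.TonerLite)

    # Save with a filename derived from the region name
    fname_region = region.replace(" ", "-").lower()
    plt.savefig("2011-tornadoes-{}.png".format(fname_region))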

Create Bar Charts

To further demonstrate the incredible power of Python and GeoPandas, let’s make a few bar graphs using data we’ve parsed from the shapefile. While top-line GIS programs such as ESRI ArcGIS include graphing capabilities, most do not. After this, you’ll be able to create publication-ready bar charts from shapefile data in less than 50 lines of code. And it didn’t cost you a dime.

Let’s create two bar charts.

  • One with the 10 states that recorded the most tornadoes in 2011
  • The other with the 10 states that recorded the fewest tornadoes in 2011

Count the Number of Tornadoes that Struck Each State in 2011

Because we’re only looking for raw tornado counts, all we have to do is loop through the state (st) column in the shapefile and count how many times each state appears. First, let’s initialize a dictionary to store our state counts. The dictionary keys will be the 2-letter state abbreviations. For example, to get the number of tornadoes in Kansas, you would simply look up state_counts["KS"].

states = filtered_data["st"]
state_counts = dict()

Next, all you have to do is count. Do note that the shapefile data contains data for Puerto Rico and the District of Columbia, so we will skip those because they are not states. Then, if a state is already in the state_counts dictionary, just add one to its count. If it’s not yet in the state_counts dictionary, initialize the count for that state, and set it to 1.

for state in states:
    # Filter Out DC and Puerto Rico
    if state in ["DC", "PR"]:
        continue

    # Count the Data
    if state in state_counts.keys():
        state_counts[state] += 1
    else:
        state_counts[state] = 1
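
As an aside, Python’s built-in collections.Counter can do the same tally in a couple of lines. It returns a dictionary subclass, so the sorting code in the next section works on it unchanged:

from collections import Counter

# Tally tornadoes per state, skipping DC and Puerto Rico
state_counts = Counter(s for s in states if s not in ("DC", "PR"))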

Sort the Counts in Ascending and Descending Order

Sorting data with Python is easy thanks to the sorted() function. You’ll need to pass it several arguments.

  • The raw data to sort. In our case, we’re using the state_counts dictionary.
  • Which parameter to base the sorting on
  • The sorted() function sorts from least to greatest by default, so if we’re sorting from greatest to least, we’ll need to tell Python to reverse the sort order.
fewest_counts = sorted(state_counts.items(), key=lambda x: x[1])[:10]
most_counts = sorted(state_counts.items(), key=lambda x: x[1], reverse=True)[:10]

I know this may look a little confusing, so let’s break down what everything means.

  • state_counts.items(): A list of key, value tuples of data in the dictionary to sort. For the state_counts dictionary, the tuples would be (state_abbreviation, number_of_2011_tornadoes).
  • key=lambda x: x[1]: The key argument tells Python on which parameter to sort the data. lambda x: x[1] tells Python to sort based on the second element of each tuple in state_counts.items(), which is the number of tornadoes that occurred in each state in 2011.
  • reverse=True: The graph of states with the most tornadoes in 2011 sorts the counts from greatest to smallest. Because sorted() sorts from smallest to greatest by default, reverse=True simply tells Python to reverse the order of the sort.
  • [:10]: Instructs Python to only take the first 10 items of each sorted dataset.

A Function to Generate the Bar Graphs

The most powerful aspect of the GeoPandas library is that it comes with all of the Pandas Data Analysis tools already built into it. As a result, you can perform your entire data analysis from within GeoPandas, regardless of whether or not each piece has a GIS component to it. We’ll use the Pandas library to create our bar graphs. Because we’re creating multiple plots, let’s create a function to generate each plot.

def plot_top10(sorted_list, title):
    # Code Goes Here

Our function requires two arguments.

  • sorted_list is either the fewest_counts or most_counts variable we defined in the previous section
  • title is the title that goes at the top of the graph

Python’s Data Analysis Libraries Use Parallel Arrays to Plot (x, y) Pairs

Both pandas and matplotlib make heavy use of parallel arrays to define (x, y) coordinates to plot. Converting our (state_abbreviation, number_of_2011_tornadoes) tuples into parallel arrays is easy. We’ll loop through each tuple in the sorted data, put the states into one array, and put the tornado counts into the other.

states = []
num_tornadoes = []

for pair in sorted_list:
    state = pair[0]
    count = pair[1]

    states.append(state)
    num_tornadoes.append(count)

Pandas expects the data to be passed to it in a dictionary. The dictionary keys are the labels that go on each axis. Add key/value pairs to the dictionary for each parameter you’ll be analyzing.

data_frame = {
    "Number of Tornadoes": num_tornadoes,
    "States": states,
}

Next, read the data into Pandas. The index parameter tells Pandas which variable to plot on the independent (x) axis. We want to plot the states variable on the x-axis.

df = pd.DataFrame(data_frame, index=states)

Then, create the bar chart and label the title and axes. The rot parameter tells Pandas how far to rotate the x-axis labels, in degrees. Our state abbreviations are short enough to stay horizontal, so we’ll set it to zero.

ax = df.plot.bar(rot=0, title=title)
ax.set_xlabel("State")
ax.set_ylabel("Number of Tornadoes")

Finally, save the graph to your hard drive in png format.

figname = "{}.png".format(title)
plt.savefig(figname)
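
Stitched together (again assuming the imports from the top of the post, including the pd alias for pandas), the whole plot_top10() function comes out to roughly a dozen lines:

def plot_top10(sorted_list, title):
    """Create a bar chart from a sorted list of (state, tornado_count) tuples."""
    # Split the tuples into parallel arrays
    states = [pair[0] for pair in sorted_list]
    num_tornadoes = [pair[1] for pair in sorted_list]

    # Build the data frame, with states on the x-axis
    data_frame = {
        "Number of Tornadoes": num_tornadoes,
        "States": states,
    }
    df = pd.DataFrame(data_frame, index=states)

    # Plot, label, and save the chart
    ax = df.plot.bar(rot=0, title=title)
    ax.set_xlabel("State")
    ax.set_ylabel("Number of Tornadoes")
    plt.savefig("{}.png".format(title))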

Generate the Maps and Bar Graphs

Now that we have defined our Python functions to generate the geographic maps and bar graphs, all we have to do is call them. Before we do that, let’s recall the parameters we have to pass to each function.

The plot_map() function to generate the map requires 4 arguments. If you can’t remember what each argument is, we defined them above.

plot_map(data, region, xlim, ylim)

And for the plot_top10 function to create the bar graphs, we need 2 arguments.

plot_top10(sorted_list, title)

Now, just call the functions. First we’ll do the maps.

# East-Central US
plot_map(filtered_data, "United States", (-110, -70))

# Oklahoma - Kansas - Missouri - Arkansas
plot_map(filtered_data, "Tornado Alley", (-100, -91), (33, 38))

# Dixie Alley
plot_map(filtered_data, "Dixie Alley Super Outbreak", (-95, -81), (29, 37))

And the bar charts…

# Highest Number of Tornadoes
plot_top10(most_counts, "US States with the Most Tornadoes in 2011")

# Fewest Number of Tornadoes
plot_top10(fewest_counts, "US States with the Fewest Tornadoes in 2011")

When you run the scripts, you should get the following output. First, the three maps.

And here are the two bar charts.

Try It Yourself: Download and Run the Python Scripts

As with all of our Python tutorials, feel free to download the scripts from our Bitbucket Repository and run them yourself.

Additional Exercises

If you’re up for an extra challenge, modify the script to plot any or all of the following, or create your own.

  • 2011 Hail and Wind Reports
  • Plot only significant (EF-3 and above) tornadoes
  • Display the May, 2013 tornado outbreaks (Moore and El Reno) in Central Oklahoma
  • Create a map of the 1974 Super Outbreak across the Midwest and southern United States
  • Re-create the same maps and graphs for Canada. You can download Canadian tornado track data from the Environment Canada archives.

Conclusion

If you only need to generate static geographic maps, Python GeoPandas is one of the most powerful tools you can use. It lets you plot nearly any type of geospatial data on a publication-ready map. Even better, it comes with one of the most complete and dynamic data analysis libraries that exists today.

While it’s certainly not a replacement for a full GIS suite like ESRI’s ArcGIS, I highly recommend GeoPandas if you want to avoid expensive licensing fees or have heavy data processing needs that some GIS applications can struggle with.

If you’d like to get started with GeoPandas (or any other GIS application), get in touch today to talk to our GIS experts about your geospatial data analysis. Once you get started with GeoPandas, you’ll be impressed with what you can do with it.

Top Photo: A Wedge Tornado Forms Over an Open Prairie
Chickasha, Oklahoma – May, 2013

The post The Ultimate in Python Data Processing: How to Create Maps and Graphs from a Single Shapefile appeared first on Matthew Gove Blog.

]]>
Python GeoPandas: Easily Create Stunning Maps without a GIS Application https://blog.matthewgove.com/2021/06/11/python-geopandas-easily-create-stunning-maps-without-a-gis-program/ https://blog.matthewgove.com/2021/06/11/python-geopandas-easily-create-stunning-maps-without-a-gis-program/#comments Fri, 11 Jun 2021 16:00:00 +0000 https://blog.matthewgove.com/?p=2415 Python is the world’s third most popular programming language. It’s also one of the most versatile languages available today. Not surprisingly, Python has incredible potential in the field of Geographic Information Systems (GIS). That potential has only barely begun to get tapped with libraries like GeoPandas. In the past, we’ve […]

The post Python GeoPandas: Easily Create Stunning Maps without a GIS Application appeared first on Matthew Gove Blog.

]]>
Python is the world’s third most popular programming language. It’s also one of the most versatile languages available today. Not surprisingly, Python has incredible potential in the field of Geographic Information Systems (GIS). That potential has only barely begun to get tapped with libraries like GeoPandas.

In the past, we’ve looked at many different uses for Python, including how to make basic GIS maps. Unfortunately, the Basemap module is quite limiting on its own. My biggest complaint about it is actually how the maps look. It’s far too easy to make low-quality maps that look like they’re stuck in the 1980s.

Enter Python’s GeoPandas Project

The Python Pandas library is an incredibly powerful data processing tool. Built on the popular numpy and matplotlib libraries, it’s a sleek combination of power, speed, and efficiency. You can easily perform complex data analysis in just a fraction of the time you could with Microsoft Excel or even raw Python.

GeoPandas is an add-on for Pandas that lets you plot geospatial data on a map. In addition to all of the benefits of Pandas, you can create high-quality maps that look incredible in any publication and on any medium. It supports nearly all GIS file formats, including ESRI Shapefiles. And do you know what the best part is? You don’t even need a GIS program to do it.

Today, we’re going to learn how to use GeoPandas to easily make simple, but stunning GIS maps. We’ll generate each map with less than 40 lines of code. The Python code is easy to read and understand, even for a beginner.

Getting Started with GeoPandas

Before getting started, you’ll need to install GeoPandas using either pip or anaconda. You can find the installation instructions from their website below. Please note their warning that pip may not install all of the required dependencies. In that case, you’ll have to install them manually.

All right, let’s dive in to the fun stuff. As always, you can download the full scripts from the Bitbucket repository.

Exercise #1: Display an ESRI Shapefile on a Map

Before we do any kind of number crunching and data analysis, we need to make sure we can load, read, and plot a shapefile using GeoPandas. In this example, we’ll use a shapefile of Mexican State borders, but you can use any shapefile you desire.

First, let’s import the libraries and modules we’ll be using. In addition to GeoPandas, we’ll be using matplotlib, as well as contextily. The contextily library allows us to set a modern, detailed basemap.

import geopandas
import matplotlib.pyplot as plt
import contextily as ctx

Diving into the code, we’ll first read the shapefile into GeoPandas.

shp_path = "mexico-states/mexstates.shp"
mexico = geopandas.read_file(shp_path)

A Word on Projections

If we were to plot the map right now, we’d run into a major issue. Do you have any guesses as to what that issue might be? The boundaries in the shapefile will not align with the boundaries on the basemap because they use different coordinate reference systems (CRS’s), or projections. In that event, you’ll wind up with a map like this.

The outline of US States lies projected over Canada when the basemap is in a different projection from the shapefile.
What happens when your shapefile is not in the same projection as your basemap.

While the basemap is in the Pseudo-Mercator projection, the shapefile uses the North American Datum of 1983 (NAD-83) coordinate reference system. The European Petroleum Survey Group (EPSG) maintains a standardized database of all coordinate reference systems. Thankfully, you only need one line of code to convert the shapefile to the Pseudo-Mercator projection. The EPSG code for the Pseudo-Mercator projection is 3857.

mexico = mexico.to_crs(epsg=3857)

Plot the State Borders and Basemap

With both the basemap and the shapefile in the same coordinate reference system, we can plot them on the map. First, we’ll do the shapefile. We’ll pass the plot() method three parameters.

ax = mexico.plot(figsize=(12,8), alpha=0.5, edgecolor="k")

Now, let’s use the contextily library to add the basemap. You can set the zoom level of the basemap, but I prefer to let contextily figure it out automatically. If you zoom in too much, you can easily crash your Python script.

ctx.add_basemap(ax)

Finally, save the figure to your hard drive.

plt.savefig("mexico-state-borders.png")
Map of Mexican State borders, created with GeoPandas.

There’s still plenty of room for improvement, but our map is off to a great start!
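
For reference, here’s the complete Exercise #1 script from start to finish:

import geopandas
import matplotlib.pyplot as plt
import contextily as ctx

# Read the shapefile and convert it to the basemap's Pseudo-Mercator projection
shp_path = "mexico-states/mexstates.shp"
mexico = geopandas.read_file(shp_path)
mexico = mexico.to_crs(epsg=3857)

# Plot the state borders, add the basemap, and save the figure
ax = mexico.plot(figsize=(12, 8), alpha=0.5, edgecolor="k")
ctx.add_basemap(ax)
plt.savefig("mexico-state-borders.png")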

Exercise #2: Use Layers to Map a Hurricane’s Cone of Uncertainty

Layers make GIS, graphic design, and much more incredibly powerful. In this example, let’s stack three layers. We’ll generate a map of the cone of uncertainty of a hurricane. If we can generate the plot quickly, warnings and evacuation orders can be issued, and we can save lives. We’ll look at Hurricane Dorian as it bears down on Florida and the Bahamas in 2019.

The National Hurricane Center GIS Portal

The National Hurricane Center maintains a portal of both live GIS data as well as archives dating back to 2008. While I included the Hurricane Dorian shapefiles in the Bitbucket repository, I encourage you to browse the NHC archives and run our script for other hurricanes. Their file naming system can be a bit cryptic, so you can look up advisory numbers in their graphics archive.

Using the Hurricane Dorian shapefile as an example, here’s how the filename breaks down. The filename is al052019-033_5day_pgn.shp

  • al: Atlantic Hurricane
  • 05: The fifth storm of the season
  • 2019: The 2019 calendar year
  • 033: NHC Advisory #33
  • 5day: 5-Day Forecast
  • pgn: The polygon of the cone of uncertainty.

The NHC also provides shapefiles for the center line and points of where the center of the hurricane is forecast to be at each subsequent advisory. We’ll use all three in this example.

Python Code

Like the Mexico map, let’s start by importing the modules and libraries we’ll be using.

import geopandas
import matplotlib.pyplot as plt
import contextily as ctx

There are three components of the cone of uncertainty: the polygon, the center line, and the points where the center of the storm is forecast at each subsequent advisory. Each has its own shapefile. We’ll use a string substitution shortcut here so we don’t have to retype the filename three times. The .format() method substitutes the parameter passed to it into the curly brackets in the filepath.

SHP_PATH = "shp/hurricane-dorian/al052019-033_5day_{}.shp"
polygon_path = SHP_PATH.format("pgn")
line_path = SHP_PATH.format("lin")
point_path = SHP_PATH.format("pts")

Now, read the three shapefiles into GeoPandas.

polygons = geopandas.read_file(polygon_path)
lines = geopandas.read_file(line_path)
points = geopandas.read_file(point_path)

For the projections, we’re going to change things up slightly from the Mexico map. Look at the x and y-axis labels of the Mexico map. They’re in the units of the projection instead of latitude and longitude. Instead of using the Pseudo-Mercator projection, let’s use the WGS-84 projection, which uses EPSG code 4326. WGS-84 uses latitude and longitude as its coordinate system, so the axis labels will be latitude and longitude.

polygons = polygons.to_crs(epsg=4326)
lines = lines.to_crs(epsg=4326)
points = points.to_crs(epsg=4326)

Layering the Shapefiles on a Single Map

Before plotting the shapefiles, think about how you may want to color them. Because we’re dealing with a Category 5 hurricane that’s an imminent threat to population centers, let’s shade the cone red. While we’re at it, let’s make the center line and points black so they stand out.

The facecolor parameter defines the color a polygon is shaded. We’ll also make the cone more transparent so you can see the basemap underneath it better. That way, there’s no doubt as to where the storm is heading.

To stack the layers on a single map, define a figure (fig) variable with the initial layer. Then reference that variable to tell GeoPandas to plot each subsequent layer on the same map (ax=fig).

fig = polygons.plot(figsize=(10,12), alpha=0.3, facecolor="r", edgecolor="k")
lines.plot(ax=fig, edgecolor="k")
points.plot(ax=fig, facecolor="k")

Since this map would be published to the public in the real world, let’s spruce it up with a title and axis labels so there are no doubts about our warnings and messaging.

plot_title = "Hurricane Dorian Advisory #33\n11 AM EDT      1 September, 2019"
fig.set_title(plot_title)
fig.set_xlabel("Longitude")
fig.set_ylabel("Latitude")

Correctly Project the Basemap into WGS-84

The last map layer is the basemap. Because the basemap is not in the WGS-84 projection by default, we’ll need to pass that as well. To avoid typos, we’ll reference it from one of the shapefiles that’s already in the WGS-84 projection. We’ll also manually set the zoom level to optimize the size and placement of state and city names on the basemap.

ctx.add_basemap(fig, crs=polygons.crs.to_string(), zoom=7)

Finally, save the figure to your hard drive.

plt.savefig("hurricane-dorian-cone-33.png")
GeoPandas map of Hurricane Dorian's cone of uncertainty, with colored basemap.

Plenty of Room for Improvement

That’s a perfectly fine map, but I believe we can do better. While the cone itself looks great, the basemap leaves a bit to be desired. City and state labels are hard to read, especially when they’re inside the cone. The basemap looks blurry, no matter what zoom level you set it to.

Additionally, the green terrain draws the eye away from the cone. In emergency situations like a major hurricane making landfall, you want to convey a bit of urgency. The red just doesn’t “pop” off the page.

So how do you fix the map to convey more urgency? Change the basemap. The terrain is too distracting. Ideally, when you look at the map, you should instantly be able to identify the map’s location using the basemap. After that, though, the basemap should “fade” into the background, allowing your reader to focus on the data. Using muted or greyscale colors on the basemap is the best way to accomplish that.

Thankfully, GeoPandas provides a good selection of basemaps. Have a look at the basemaps in that link (at the bottom of the page). Can you identify any basemaps that use muted or greyscale colors? The one that catches my eye is called “Stamen Toner Lite”.

Updating Our Python Code

Updating our script is easy. When you call the add_basemap() method, you can specify which basemap you use by passing it the source parameter.

ctx.add_basemap(fig, crs=polygons.crs.to_string(), source=ctx.providers.Stamen.TonerLite)

After running the script, the difference is striking. It’s amazing the difference just changing a few colors makes.

GeoPandas map of Hurricane Dorian's cone of uncertainty, with greyscale basemap.

So how did we do? Instantly identify the location on the map? Check. Eye instantly drawn to the red? Check. The red pops off the page? You bet. For a final comparison, here’s the actual graphic from the National Hurricane Center. I’ll let you decide which map you like best.

Official NHC Advisory for Hurricane Dorian – 11 AM EDT on 1 September, 2019

The Hurricane Center actually provides all of the GIS data and layers to recreate their official advisory graphics. Unfortunately, that’s outside the scope of this tutorial, but we’ll create those maps in a future lesson.

Pro Tip: Use this Python script in real time this hurricane season. You just need to change the shapefile path to the live URL on the National Hurricane Center Website.
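
Recent versions of GeoPandas can read a zipped shapefile straight from a URL, so the change is a one-liner. The exact URL below is a placeholder — check the NHC GIS archive for the current storm and advisory number:

# Hypothetical live URL -- the storm ID and advisory number are placeholders
live_url = "https://www.nhc.noaa.gov/gis/forecast/archive/al052019_5day_033.zip"
polygons = geopandas.read_file(live_url)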

Conclusion

It wasn’t too long ago that you needed expensive GIS software to make high-quality, publication-ready figures. Thankfully, those days are forever behind us. With web-based and open-source GIS platforms coming online, geospatial data processing is not only becoming much more affordable. It’s also gotten exponentially more powerful.

This tutorial doesn’t even begin to scratch the surface of what you can do with Python GeoPandas. As a result, we’ll be exploring the GeoPandas tool much more as we go through this summer and into the fall.

In our next tutorial, we’ll be analyzing data from one of the busiest and deadliest tornado seasons in US history. Make sure you come back later this month when that drops. In the meantime, please sign up for our email newsletter to stay on top of industry news. And if you have any questions, please get in touch or leave a comment below. See you next time.

Top Photo: A Secluded Stretch of Beach on the Intracoastal Waterway
St. Petersburg, Florida – March, 2011

The post Python GeoPandas: Easily Create Stunning Maps without a GIS Application appeared first on Matthew Gove Blog.

]]>
https://blog.matthewgove.com/2021/06/11/python-geopandas-easily-create-stunning-maps-without-a-gis-program/feed/ 1
Learn Python the Fun Way: A Simple Rock-Paper-Scissors Game https://blog.matthewgove.com/2021/04/16/learn-python-the-fun-way-a-simple-rock-paper-scissors-game/ Fri, 16 Apr 2021 16:00:00 +0000 https://blog.matthewgove.com/?p=2316 Python is currently one of the most in-demand and versatile programming languages out there. Because it’s so heavily used across so many different industries, sectors, and technologies, that trend is expected to continue for the foreseeable future. Python is also one of the easiest programming languages to learn. Learning Python […]

The post Learn Python the Fun Way: A Simple Rock-Paper-Scissors Game appeared first on Matthew Gove Blog.

]]>
Python is currently one of the most in-demand and versatile programming languages out there. Because it’s so heavily used across so many different industries, sectors, and technologies, that trend is expected to continue for the foreseeable future. Python is also one of the easiest programming languages to learn. Learning Python is an investment that will pay off for you for years to come.

I am a firm believer that the best way to learn anything is to not only dive in and get your hands dirty with it, but also to do something that’s fun. Instead of slogging through hours of lectures, let’s dive right in. And what better way to have fun than to code your own game? Today, you’re going to learn how to code a simple Rock-Paper-Scissors game using Python.

Rock-Paper-Scissors

In addition to being a great application to learn Python, Rock-Paper-Scissors is a simple, fun, and entertaining game for 2 players. Each player chooses rock, paper, or scissors. They simultaneously reveal their choice in a face off-style showdown to determine the winner. Rock-Paper-Scissors is often played as a Best-of-3.

  • Rock crushes Scissors
  • Scissors cuts Paper
  • Paper covers Rock

Plan Out the Python Script for the Rock-Paper-Scissors Game

Before writing any code, you need to sit down and plan out your program first. Without proper planning, your code will be bloated, slow, and inefficient. Break the task into a series of blocks or modules to start. Then, you can dive into the details and work out the exact logic of each module.

Think about how you might modularize the Rock-Paper-Scissors game. What steps or sections did you come up with? Here’s what I have.

  1. Input the User’s Move
  2. The Computer Makes Its Move
  3. Determine Which Player Won
  4. Inform the User Who Won

Break the Logic Down into the Smallest Nuts and Bolts You Can Think Of

Planning the logic is the hardest part of writing any program. The first version of logic for any program will likely not be the most efficient. It’s much easier to patch flaws and inefficiencies in your logic before you’ve written any code. We’ll write our logic out in a form of pseudocode before translating it into Python.

Thankfully, the simplicity of Rock-Paper-Scissors severely limits opportunities for bad logic. However, it does not eliminate them.

Input the User’s Move

Put your design hat on for a second. How would you input the user’s move? Think about games you’ve played before. What did you come up with? I thought of the following.

  • Prompt the user to enter their move by typing it in
  • Display three buttons and have the user click on a button

Since our Rock-Paper-Scissors game will initially not have a graphical user interface (GUI), let’s go with the first option. Adding a GUI is beyond the scope of this tutorial and will be covered in a future, more advanced lesson. When we prompt the user for their move, we’ll have them enter “R” for rock, “P” for paper, and “S” for scissors.

USER INPUT PROMPT: ENTER "R", "P", or "S"

Before we go any further, there’s already a flaw in the logic, here. Any guesses as to what it might be?

Fixing the Flaw in the User Input Logic

Have you figured out what the flaw is? What happens if the user enters something other than “R”, “P”, or “S”? The game has no idea what to do with the input. The easiest fix is to stick the input into a while loop to validate the user’s input. A while loop will repeat until one of the conditions it’s given is met. In our case, that loop will break when the user enters “R”, “P”, or “S”.

SET user_input TO ""
WHILE user_input NOT EQUAL TO "R", "P", or "S" THEN
    PROMPT TO SET user_input: ENTER "R", "P", or "S"
END WHILE

The Computer’s Turn

If you’re unsure of how to program the computer’s turn, you’re in luck. The logic is very simple. First, think about the last time you played Rock-Paper-Scissors. When it was your turn, what steps did you go through to make your move? I bet they looked something like this:

  1. Choose rock, paper, or scissors at random.
  2. Put your hand out in the shape of which item you chose

This is exactly how we’re going to code up the computer’s turn. We’ll map the numbers 1, 2, and 3 to “R”, “P”, and “S”, respectively. The computer will choose a number between 1 and 3 at random, and its selection will be compared to the user’s input to determine which player won.

SET rps_map TO [
    1 => "R",
    2 => "P",
    3 => "S",
]
SET computer_number TO RANDOM NUMBER BETWEEN 1 AND 3
SET computer_input TO rps_map[computer_number]

Determine Which Player Won

To determine which player won, simply compare the user’s choice (user_input) to the computer’s choice (computer_input). Before we do that, write out all possible outcomes.

User ChoiceComputer ChoiceWinner
Same as ComputerSame as UserTie
RockPaperComputer
RockScissorsUser
PaperRockUser
PaperScissorsComputer
ScissorsPaperUser
ScissorsRockComputer
All Possible Outcomes of Rock, Paper, Scissors

We can write the logic for each outcome in a series of if/else statements. The easiest way to group the statements is to look at the user’s input, and then compare the computer’s move for each scenario.

IF user_input = computer_input THEN
    IT'S A TIE
ELSE IF user_input = ROCK THEN
    IF computer_input = PAPER THEN COMPUTER WINS
    ELSE IF computer_input = SCISSORS THEN USER WINS
    END IF
ELSE IF user_input = PAPER THEN
    IF computer_input = ROCK THEN USER WINS
    ELSE IF computer_input = SCISSORS THEN COMPUTER WINS
    END IF
ELSE IF user_input = SCISSORS THEN
    IF computer_input = PAPER THEN USER WINS
    ELSE IF computer_input = ROCK THEN COMPUTER WINS
    END IF
END IF

Notify the User of the Result

Our Python script will output the results of the game to the Terminal window. However, before we can output anything, we first need to set the message that the user will see at the end of the game. When we determine which player won the game, we’ll set the final results output message as well.

SET result_msg TO ""
IF user_input = computer_input THEN
    SET result_msg TO "We both chose {rock/paper/scissors}. It's a tie!"
ELSE IF user_input = ROCK THEN
    IF computer_input = PAPER THEN 
        SET result_msg TO "Paper covers Rock. I win!"
    ELSE IF computer_input = SCISSORS THEN 
        SET result_msg TO "Rock crushes Scissors. You win!"
    END IF
ELSE IF user_input = PAPER THEN
    IF computer_input = ROCK THEN 
        SET result_msg TO "Paper covers Rock. You win!"
    ELSE IF computer_input = SCISSORS THEN
        SET result_msg TO "Scissors cuts Paper. I win!"
    END IF
ELSE IF user_input = SCISSORS THEN
    IF computer_input = PAPER THEN 
        SET result_msg TO "Scissors cuts Paper. You win!"
    ELSE IF computer_input = ROCK THEN 
        SET result_msg TO "Rock crushes Scissors. I win!"
    END IF
END IF

PRINT result_msg TO TERMINAL WINDOW

Translate Your Pseudocode into Python

Now, it’s time for the fun part. Writing out your logic in pseudocode comes with more benefits than just clean and optimized code. Once your logic is finalized, it’s incredibly easy to translate it into any programming language. While we’re using Python, you can just as easily translate it to PHP, Java, R, Perl, C#, or any other language of your choosing. Additionally, separating the logic design from the programming makes it easier to learn Python, since you can focus exclusively on Python.

Python Modules You Need to Import

Before you write any code, go through your pseudocode and see if any Python modules need to be installed and/or imported. For the Rock-Paper-Scissors game, you’ll need to import Python’s built-in random module. The random module generates the random numbers we need for the computer to take its turn.

#!/usr/bin/env python3
import random

The User’s Turn

To prompt the user for their turn, we’ll use Python’s input() function. Don’t forget to wrap it in a while loop to validate the input. Additionally, we’ll use the .lower() method to make the user’s input case-insensitive. The .lower() method converts a string to all lowercase.

user_input = ""
acceptable_inputs = ["r", "p", "s"]
user_prompt = "Make Your Move. Enter 'R' for rock, 'P' for paper, or 'S' for scissors: "

while user_input not in acceptable_inputs:
    user_input = input(user_prompt)
    user_input = user_input.lower()

The Computer’s Turn

Before we do anything, we first need to define the map that ties the numbers 1, 2, and 3 to rock, paper, and scissors, respectively. While you could use a list or array, the map is much easier to understand if we define it as a dictionary.

rps_map = {
   1: "R",
   2: "P",
   3: "S",
}

Next, we need to choose a random number. This is where the random module we imported comes into play. The randint() method chooses an integer at random based on the range you pass it. For example, to choose a random number between 30 and 70, use randint(30, 70). In the Rock-Paper-Scissors game, the computer is choosing a random number between 1 and 3.

computer_number = random.randint(1, 3)

Finally, we use the randomly generated number to determine the computer’s choice from the map. For example, if the computer chooses 1, the map states that rps_map[1] = "R". In other words, the computer chose “rock”.

computer_input = rps_map[computer_number]
computer_input = computer_input.lower()

Determining Who Won

Translating the pseudocode to determine the winner into Python is straight-forward. However, there is still one outstanding item that needs to be addressed. In the event of a tie, we need to output what both players chose. Recall this line from the pseudocode.

SET result_msg TO "We both chose {rock/paper/scissors}. It's a tie!"

Thankfully, Python makes substituting variables into strings very easy. All you need to do is insert a pair of open/closed curly braces where you want to put the variable. Then use the .format() method on the string to tell Python which variable you want to put into the string. You can add as many variables to a string as you wish.

For example, let’s say you prompt the user for their name. You want to say hello to them using their name. This is a perfect place to use variable substitution. If I entered my name at the prompt, it would output “Hello, Matt”.

name = input("Please Enter Your Name: ")
hello_message = "Hello, {}".format(name)
print(hello_message)

Since we are using “r”, “p”, and “s” to denote the user and computer’s choices, we need another map to give the full name of our choices. It would look pretty weird if the message read, “We both chose r. It’s a tie!”. We instead want it to read, “We both chose Rock. It’s a tie!” Define the map just like we did for the computer’s turn.

msg_map = {
   "r": "Rock",
   "p": "Paper",
   "s": "Scissors,
}

Then use variable substitution to generate the message for a tie game.

result_msg = "We both chose {}. It's a tie!".format(msg_map[user_input])

Fully translated into Python, the module determining the winner reads as follows.

result_msg = ""
msg_map = {
    "r": "Rock",
    "p": "Paper",
    "s": "Scissors",
}

if user_input == computer_input:
    result_msg = "We both chose {}. It's a tie!".format(msg_map[user_input])
elif user_input == "r":
    if computer_input == "p":
        result_msg = "Paper covers Rock. I win!"
    elif computer_input == "s":
        result_msg = "Rock crushes Scissors. You win!"
elif user_input == "p":
    if computer_input == "r":
        result_msg = "Paper covers Rock. You win!"
    elif computer_input == "s":
        result_msg = "Scissors cuts Paper. I win!"
elif user_input == "s":
    if computer_input == "p":
        result_msg = "Scissors cuts Paper. You win!"
    elif computer_input == "r":
        result_msg = "Rock crushes Scissors. I win!"

Display the Game’s Result in the Terminal Window

Once we know the result of the game, all we need to do is use Python’s print() function to print the result_msg variable to the Terminal window.

print(result_msg)

Additional Upgrades to Finalize the Script

Even though we have now translated our pseudocode into Python, we’re not done just yet. We still need to clean up and finalize the code, as well as add one more feature.

One Last Feature: The Game Runs Until You Lose

Let’s face it. Restarting the game after every single turn gets old fast. Instead, why not reward yourself for winning and keep taking turns until you lose. All you need to do is wrap the entire script in a while loop that breaks when the computer wins.

First, define a boolean (true/false) variable called user_is_victorious to monitor who wins. Initialize it at the beginning of the script and use it as the condition to determine if the player has won another turn.

#!/usr/bin/env python3
import random

user_is_victorious = True
while user_is_victorious:
    # Game Code Goes Here

How sharp is your logic? If you’re on top of your game, you should have noticed that we introduced another major flaw into the logic here. We initialized the user_is_victorious variable as true, but never set it later in the code. As a result, the while loop will run forever because the user_is_victorious variable never gets set to false, which is the condition necessary to break the loop. In the world of programming, we call that an infinite loop.

To fix it, we need to revise the code that determines which player won the game. Whenever the computer wins the game, just set the user_is_victorious variable to False.

if user_input == computer_input:
    result_msg = "We both chose {}. It's a tie!".format(msg_map[user_input])
elif user_input == "r":
    if computer_input == "p":
        result_msg = "Paper covers Rock. I win!"
        user_is_victorious = False
    elif computer_input == "s":
        result_msg = "Rock crushes Scissors. You win!"
elif user_input == "p":
    if computer_input == "r":
        result_msg = "Paper covers Rock. You win!"
    elif computer_input == "s":
        result_msg = "Scissors cuts Paper. I win!"
        user_is_victorious = False
elif user_input == "s":
    if computer_input == "p":
        result_msg = "Scissors cuts Paper. You win!"
    elif computer_input == "r":
        result_msg = "Rock crushes Scissors. I win!"
        user_is_victorious = False

Remove String Literals for Easier Readability and Maintenance

In our initial version of the Rock-Paper-Scissors game, we used a lot of string literals. What is a string literal you ask? It’s any string used directly in the logic.

While a single instance of a unique string literal is perfectly fine, string literals that are repeated throughout the Python script quickly become a nightmare to maintain. What happens if you have to change one of them? You have to go through the code and change every single one of them. If you miss one, your code will break.

To remove them, initialize a constant at the beginning of the script that stores each repeated string literal. Then use that constant in the logic. In the Rock-Paper-Scissors game, the string literals "r", "p", and "s" are the repeat offenders. Let’s define a few constants for them at the beginning of the script.

ROCK = "r"
PAPER = "p"
SCISSORS = "s"

Then, just go through the logic of your script and replace every instance of "r" with ROCK. Anywhere you find "p", put PAPER in its place. Swap out every instance of "s" with SCISSORS. That way, if you ever need to change their values, you only change them once in the constants at the top of the script. When you choose descriptive constant names, your code becomes much easier to read as well.
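
As an illustration, here’s roughly what the start of the result logic looks like after the swap. The sample values at the top are assumptions I’ve added so the snippet runs on its own; in the game they come from input() and random.choice().

ROCK = "r"
PAPER = "p"
SCISSORS = "s"

msg_map = {ROCK: "Rock", PAPER: "Paper", SCISSORS: "Scissors"}

# Sample values so the snippet is self-contained
user_input = ROCK
computer_input = PAPER

if user_input == computer_input:
    result_msg = "We both chose {}. It's a tie!".format(msg_map[user_input])
elif user_input == ROCK:
    if computer_input == PAPER:
        result_msg = "Paper covers Rock. I win!"
        user_is_victorious = False
    elif computer_input == SCISSORS:
        result_msg = "Rock crushes Scissors. You win!"

print(result_msg)  # Prints: Paper covers Rock. I win!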

Document the Code for Future Reference

Once the current version of your code is finalized, go back and document it. I cannot stress this enough for anyone who is trying to learn Python. When I first started writing code back in the day, I was horrible at documenting code. I didn’t do it much, and the little bit that I did was next to worthless.

Then, one day, I had to go back and modify some code I hadn’t touched in over a year. You’d be amazed at how much you can forget in just a year. Because the code wasn’t documented properly, I had to spend over a week just rereading the code trying to figure out what the hell it was doing before I could even start editing it. The fact that I was a rookie at the time and the code was pretty crappy and unreadable didn’t help any either, but I digress.

Since that incident, I have always made sure to properly document every block of code I write. If you also write readable code, you will expend far less effort documenting it. I have had clients come to me asking to revise code that I haven’t touched in years. With proper documentation, I can go in, know what the code is doing, and make the edits right away. As a result, turnaround times are measured in hours, not weeks.

To document the Rock-Paper-Scissors game, add a docstring to the top that describes what the script does and how to run it. Then go through the code and add comments anywhere it’s not clear what a block of code does.
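
Here’s one way the top of the documented script might look. The wording is illustrative, not the repository’s exact docstring.

#!/usr/bin/env python3
"""Rock-Paper-Scissors: play against the computer in the Terminal.

Usage:
    python3 rock-paper-scissors.py

Enter r (Rock), p (Paper), or s (Scissors) at each prompt. The game
keeps dealing you new turns until the computer wins one.
"""
import random

# Constants for the valid moves, so each letter only has to change here
ROCK = "r"
PAPER = "p"
SCISSORS = "s"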

The Final Result

You can download or clone the Rock-Paper-Scissors game from my Bitbucket Repository. The syntax is properly highlighted and the lines are numbered.

Run the Game

To run the Rock-Paper-Scissors game, open a Terminal or Command Prompt window and navigate to the folder with the Python script in it. Then execute the following command.

python3 rock-paper-scissors.py
[Screenshot: a Terminal window playing the Rock, Paper, Scissors Python game.]
What’s your record for consecutive wins?

A Big Bang Theory Challenge

If you think you’ve mastered the tutorial and feel comfortable working with the Python code, I have a challenge for you. Watch the YouTube clip from The Big Bang Theory at the top of this post. Then modify the Python code to create Sheldon’s Rock, Paper, Scissors, Lizard, Spock game. You can download the code from my Bitbucket Repository.

I’ll give you one hint to get started. Stick with entering single letters as the input. Since Scissors and Spock both start with the same letter, I would continue to denote Scissors as "s" and use "k" to indicate Spock.
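
If you want a head start, here’s one hypothetical way to encode the five moves and their win relationships. Everything beyond the "s" and "k" hint is my assumption, not code from the repository.

# One possible encoding of the five moves. "k" avoids the
# Scissors/Spock collision; the rest use each move's first letter.
ROCK = "r"
PAPER = "p"
SCISSORS = "s"
LIZARD = "l"
SPOCK = "k"

# Each move beats exactly two others under Sheldon's rules
BEATS = {
    ROCK: (SCISSORS, LIZARD),    # Rock crushes Scissors and Lizard
    PAPER: (ROCK, SPOCK),        # Paper covers Rock and disproves Spock
    SCISSORS: (PAPER, LIZARD),   # Scissors cuts Paper and decapitates Lizard
    LIZARD: (PAPER, SPOCK),      # Lizard eats Paper and poisons Spock
    SPOCK: (ROCK, SCISSORS),     # Spock vaporizes Rock and smashes Scissors
}

With a lookup table like that, the winner check collapses to a single membership test instead of a dozen elif branches.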

Future Learn Python Tutorials

In future tutorials, we’ll use some more advanced techniques to improve our Rock, Paper, Scissors game.

  • Add a graphical user interface (GUI) using Tkinter
  • Set the number of rounds instead of playing until you lose
  • Let 2 players play against each other instead of the computer

If you want to take a crack at any of those improvements before I can write the tutorials, please be my guest. I’d love to see what you come up with.
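
For the second idea, a minimal, hypothetical sketch of the round-count structure might look like this. The coin flip stands in for a full turn of the game so the snippet runs on its own.

import random

ROUNDS = 3
wins = 0

for round_number in range(1, ROUNDS + 1):
    print("Round {} of {}".format(round_number, ROUNDS))
    user_won_round = random.choice([True, False])  # Stub for one game turn
    if user_won_round:
        wins += 1

print("You won {} of {} rounds.".format(wins, ROUNDS))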

Conclusion

Coding your own game is one of the most fun and satisfying ways to learn Python. Because Python is so in-demand, it’s also an incredible investment you can make in yourself. Have you built anything cool with Python recently? Let me know either in the comments or on my Facebook page.

Top Photo: Rugged Cliffs Along the Pacific Coastline
Newport, Oregon – August, 2017

The post Learn Python the Fun Way: A Simple Rock-Paper-Scissors Game appeared first on Matthew Gove Blog.
