Visualize my Training Sessions with Scikit-Mobility

Photo by sporlab on Unsplash

Hi everyone! This is my very first blog post. Most of the articles you will find here will be related to my passions: running, traveling, and human mobility analysis. I am a second-year Ph.D. student at the Free University of Bolzano, Italy, and Fondazione Bruno Kessler, Italy. During my Ph.D. I will work on cool stuff related to human mobility and deep learning. If you want to take a look at my work, please visit my website.

This post shows how I visualized my training logs using scikit-mobility [1], an open-source Python library to visualize and analyze mobility data.

Goal

The goal of this first blog post is simple: retrieve some data about my training sessions from the app I use to track them, and visualize it with scikit-mobility as
- the trajectory of a single training session
- a heatmap of the areas in which I run most often

Data

I started running almost one year ago, thanks to a friend who introduced me to this amazing sport. Since then I have collected the logs of about 150 training sessions (the ones used for the visualizations), which I downloaded through the APIs of the app.

In my case, the APIs return a JSON file for each training session containing, among a lot of other information (e.g., ascent, descent, calories, …), the spatio-temporal information we are looking for: latitude, longitude, pace (speed), and a timestamp for each measurement.

Visualization

For the visualizations, we will use scikit-mobility [1], an excellent open-source Python library for working with human mobility data.

First things first, I created an environment and installed scikit-mobility using pip (pip install scikit-mobility).

Scikit-Mobility provides two ways to represent mobility data: trajectories (cool if you want to represent the data of an individual) or flows of people (more useful for aggregated data).

In our case, we want to use a TrajectoryDataFrame (TDF) to work with the trajectories generated by my training sessions. As written in the documentation, we are going to feed the TDF with latitude, longitude, datetime, and trajectory_id.

After a quick preprocessing of the JSON files downloaded through the APIs, I have a list of lists, each containing a sample:
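The original snippet is not available in this copy of the post, so here is a minimal sketch of what such a preprocessing step could look like. The JSON layout and the key names (id, samples, lat, lng, time) are assumptions made for illustration, since every app exports a different structure. Only the output shape, one [latitude, longitude, datetime, session id] list per sample, follows the text.

```python
import json

# Hypothetical export of one session; the real key names depend on the app.
raw_session = json.dumps({
    "id": "session-001",
    "samples": [
        {"lat": 46.0112, "lng": 11.3031, "time": "2021-03-01 07:30:00", "pace": 5.4},
        {"lat": 46.0120, "lng": 11.3050, "time": "2021-03-01 07:30:05", "pace": 5.3},
    ],
})


def session_to_rows(json_text):
    """Turn one session's JSON into [lat, lng, datetime, session_id] rows."""
    session = json.loads(json_text)
    return [
        [sample["lat"], sample["lng"], sample["time"], session["id"]]
        for sample in session["samples"]
    ]


rows = session_to_rows(raw_session)
print(rows[0])
```

With about 150 files, the same function could be applied to each file and the resulting lists concatenated.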

Now we can create the desired TDF as follows:
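The original snippet is not available here. As a sketch, and assuming each sample is laid out as [latitude, longitude, datetime, session id], the TrajDataFrame constructor can be told which position in the row holds which field. The sample values are made up. Following the text, the session identifier is passed as trajectory_id; passing it as user_id instead may be more convenient if you want the plot to treat each session as a separate group.

```python
# Two made-up samples in the [lat, lng, datetime, session_id] layout.
rows = [
    [46.0112, 11.3031, "2021-03-01 07:30:00", "session-001"],
    [46.0120, 11.3050, "2021-03-01 07:30:05", "session-001"],
]


def build_tdf(samples):
    """Build a TrajDataFrame; the integers are positions inside each row."""
    # Imported inside the function so this sketch loads even if
    # scikit-mobility is not installed in the current environment.
    import skmob

    return skmob.TrajDataFrame(
        samples, latitude=0, longitude=1, datetime=2, trajectory_id=3
    )

# Usage (requires scikit-mobility):
# tdf = build_tdf(rows)
# print(tdf.head())
```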

Plotting my trajectories is simpler than ever. I just need to call the plot_trajectory method of the TDF and, optionally, specify parameters such as the number of trajectories I want to visualize simultaneously, the number of points, or the others listed in the documentation.

Here is the code to visualize four of my training sessions and the output:
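The original snippet and its output are not available here, so this is only a sketch of the call. It assumes the tdf was built as described above. In the versions of scikit-mobility I am aware of, max_users caps how many uid groups are drawn, so this works best when the session identifier is stored in the uid column; if it was loaded as trajectory_id, the grouping may differ depending on the installed version. The helper name and the output file name are hypothetical.

```python
N_SESSIONS = 4  # how many sessions to draw at once


def plot_sessions(tdf, n_sessions=N_SESSIONS, out_html="sessions.html"):
    """Draw up to n_sessions trajectories and save the map as an HTML file."""
    fmap = tdf.plot_trajectory(
        max_users=n_sessions,     # upper bound on the groups that are drawn
        max_points=1000,          # each trajectory is down-sampled to this size
        start_end_markers=True,   # green marker at the start, red at the end
    )
    fmap.save(out_html)  # plot_trajectory returns a folium map
    return fmap

# Usage (requires scikit-mobility and a TrajDataFrame called tdf):
# plot_sessions(tdf)
```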

Example of a plot with 4 trajectories

As you can see here, there are four trajectories pictured with different colors. The colors distinguish the trajectories, the green markers show the start positions, and the red ones the end positions. If I plot more trajectories, you can spot that I more or less always run in the same areas:

Example of a plot with 40 trajectories. Look at the yellow one! The app collected my GPS position for more than one day even if I was not running!

There is just one exception: the yellow trajectory! In that case, the app collected my GPS signal for more than one day even if I wasn’t running (maybe it has something to do with the permissions for accessing the GPS?).

There are also the trajectories of my first non-competitive race (right) and the ones generated while I was on holiday in Tuscany (left).

As you saw in the previous image, the more trajectories I plot, the more challenging it becomes to figure out whether I run more on the cycleway or around the beautiful Lake Levico (Trento, Italy).

I thought about a way to answer this question using the tools that scikit-mobility offers! The idea is to create a squared tessellation over the municipality of Levico and map the trajectories onto it to plot a heatmap. In general, a tessellation is nothing more than a way to divide an area into meaningful subareas in which to aggregate the data. Here are some examples for New York City (image from the paper “Deep Learning for Human Mobility: a Survey on Data and Models” [2]).

Examples of tessellations

Here are the things to do:

  1. Create a squared tessellation on Levico.
  2. Map the trajectories to the tessellation.
  3. Plot the heatmap.

Regarding the first point, scikit-mobility provides a way to create a tessellation automatically using the tiler module. Here I generate a squared tessellation over Levico with squares of 1000 meters:
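The original snippet is not available here. A sketch of the call, assuming the municipality is looked up by name: the exact query string ("Levico Terme, Italy") is my assumption, and the lookup needs an internet connection because the boundary is fetched online.

```python
def levico_tessellation(meters=1000):
    """Return a GeoDataFrame of square tiles covering the municipality."""
    # Imported inside the function so this sketch loads even if
    # scikit-mobility is not installed in the current environment.
    from skmob.tessellation import tilers

    return tilers.tiler.get(
        "squared",
        base_shape="Levico Terme, Italy",  # assumed place-name query
        meters=meters,                     # side of each square
    )

# Usage (requires scikit-mobility and network access):
# tessellation = levico_tessellation()
# print(tessellation.head())  # one row per tile, with a tile_ID and a geometry
```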

The first step is now complete. For step 2, we need to map the points of the training sessions to the corresponding squares of the tessellation. We can use the mapping function provided by skmob, but we have to change the TDF a bit first. At the moment, it contains points collected at an extremely high frequency, and we do not need all of them for the scope of this project. Thus, given the computational cost of a spatial join, we first extract the so-called stop locations (e.g., I keep only one point per 1 km of path):
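The original snippet is not available here. scikit-mobility ships its own routines for this in skmob.preprocessing (the detection and compression modules), and their names and parameters are best checked against the installed version. To show the idea of keeping one point per kilometre of path, here is a plain-Python stand-in that is not the library's algorithm: it walks along a trajectory, accumulates the haversine distance, and keeps a point each time 1 km has been covered. The function names are hypothetical.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def thin_by_distance(points, step_km=1.0):
    """Keep the first point, then one point per step_km of travelled path."""
    if not points:
        return []
    kept = [points[0]]
    travelled = 0.0
    for prev, cur in zip(points, points[1:]):
        travelled += haversine_km(prev[0], prev[1], cur[0], cur[1])
        if travelled >= step_km:
            kept.append(cur)
            travelled = 0.0  # start measuring the next kilometre
    return kept


# Made-up straight path heading north, one point roughly every 111 m.
points = [(46.0 + i * 0.001, 11.3) for i in range(31)]
thinned = thin_by_distance(points)
print(len(points), "->", len(thinned))
```

Fewer points means fewer geometries in the spatial join that follows, which is where the saving comes from.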

Here is an example of the same trajectory visualized before and after the detection of the stop locations:

To map and then count the points in each cell, we use the mapping function:
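The original snippet is not available here. A minimal sketch, assuming tdf is the reduced TrajDataFrame and tessellation is the one built earlier: the mapping method adds a tile_ID column to each point, and counting the occurrences of each tile_ID gives the value to show on the heatmap. The helper names are hypothetical.

```python
from collections import Counter


def count_per_tile(tile_ids):
    """Count how many points fall into each tile, ignoring missing ids."""
    return Counter(t for t in tile_ids if t is not None)


def map_and_count(tdf, tessellation):
    """Spatially join the points to the tiles, then count points per tile."""
    # remove_na=True drops the points that fall outside the tessellation.
    mapped = tdf.mapping(tessellation, remove_na=True)
    return count_per_tile(mapped["tile_ID"])


# The counting step on its own, with made-up tile ids:
print(count_per_tile(["3", "3", "7", None]))
```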

The last thing to do is create a colormap that highlights in red the areas in which I spent less time and in blue the ones in which I spent more time. Finally, we plot the heatmap using the colormap and the tessellation, with the precious help of Folium:
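The original snippet is not available here. One possible way to do it, as a sketch: build a linear red-to-blue colormap with branca (the colormap package used by Folium) and use it in the style function of a GeoJson layer. It assumes counts is a dictionary from tile_ID to number of points and tessellation is the GeoDataFrame with a tile_ID column; the map center, opacities, and helper names are arbitrary choices of mine.

```python
def tile_style(n_points, color_for):
    """Folium style for one tile; tiles without points stay transparent."""
    if n_points is None:
        return {"fillOpacity": 0.0, "weight": 0.2, "color": "gray"}
    return {
        "fillColor": color_for(n_points),
        "fillOpacity": 0.6,
        "weight": 0.2,
        "color": "gray",
    }


def plot_heatmap(tessellation, counts, center=(46.01, 11.30)):
    """Color each tile from red (few points) to blue (many points)."""
    # Imported inside the function so this sketch loads even if
    # folium and branca are not installed in the current environment.
    import branca.colormap as cm
    import folium

    colormap = cm.LinearColormap(
        colors=["red", "blue"],
        vmin=min(counts.values()),
        vmax=max(counts.values()),
    )
    fmap = folium.Map(location=list(center), zoom_start=12)
    folium.GeoJson(
        tessellation,
        style_function=lambda feature: tile_style(
            counts.get(feature["properties"]["tile_ID"]), colormap
        ),
    ).add_to(fmap)
    colormap.add_to(fmap)  # adds the legend to the map
    return fmap

# Usage (requires folium, branca, and the objects built in the previous steps):
# plot_heatmap(tessellation, counts).save("heatmap.html")
```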

The output looks like this:

The result actually reflects reality. I have done most of my training on the cycleway, covering short distances (10–12 km) in the direction of Novaledo. In autumn I love to run around the lake, while in winter I usually run much less than in the rest of the year, and I do it in the city center because the streets are more practicable.

References

[1] Pappalardo, Luca, et al. “scikit-mobility: A Python library for the analysis, generation and risk assessment of mobility data.” arXiv preprint arXiv:1907.07062 (2019).

[2] Luca, Massimiliano, et al. “Deep Learning for Human Mobility: a Survey on Data and Models.” arXiv preprint arXiv:2012.02825 (2020).


Ph.D. Student at the Free University of Bolzano working on human mobility and deep learning. Really like 🌍, 🍔, 🏃‍♂️

Massimiliano Luca
