Learning to Get Over Roadblocks in Coding

One of my biggest roadblocks with my first Python project, the Trail Usage Analysis project, was getting through some lines of code that took forever to run. Being inexperienced, at first I thought maybe I just needed to be patient, so I would hit “run” on my Jupyter notebook and then go cook dinner. Inevitably, when I would get back I would find that nothing had changed. This prevented me from moving forward with my project, and my frustration grew. The dataset had about 2000 records, so it was large, but I knew there must be a way to work with much larger datasets if I was going to get serious about data science.

The first question to answer was: Does this code work with even a very small dataset, or is it in some kind of infinite loop where it will not work no matter the size? Thus, I created a new list with just five records and I ran the code with that list. It worked fine, but even that wasn’t instantaneous as it seemed like it should be. So I knew the code wasn’t completely wrong, but that something was still taking longer than it should.
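A minimal sketch of that small-sample check follows. It assumes a hypothetical `process` function standing in for my slow code, which was not written this way. Running the same step on a few growing sample sizes shows both that the code finishes and how quickly the run time grows.

```python
import time

def process(records):
    """Hypothetical stand-in for the slow notebook step (deliberately quadratic)."""
    return [sum(1 for other in records if other <= r) for r in records]

records = list(range(2000))  # roughly the size of the dataset described above

timings = {}
for size in (5, 50, 500):
    sample = records[:size]           # work on a small slice first
    start = time.perf_counter()
    result = process(sample)
    timings[size] = time.perf_counter() - start
    print(f"{size:>4} records: {timings[size]:.6f} s")
```

If ten times more records takes far more than ten times longer, the code is probably repeating work for every record rather than being stuck in an infinite loop.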

The second question to answer was: Which line of code is causing the problem? Slowness is hard to debug because slow code doesn’t produce an error message you can Google. At the time I had too many lines of code in one block, and I have since learned it’s a best practice to split these up as much as possible. I split the code up so each function was in a separate code block. This allowed me to pin the problem on one function I had written earlier in the program.
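Splitting the code into separate cells amounts to timing each step on its own. Here is a rough sketch of the same idea in plain Python. The `timed` helper and the three steps are hypothetical, made up for illustration, and are not the functions from my notebook.

```python
import time
from contextlib import contextmanager

step_times = {}

@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        step_times[label] = time.perf_counter() - start

# Hypothetical steps standing in for separate notebook cells.
def load_records(n):
    return list(range(n))

def count_matches(records, big_dict):
    # Scans every key of the dictionary for each record: the kind of
    # repeated work that is easy to miss when everything is in one cell.
    return [sum(1 for key in big_dict if key == r) for r in records]

with timed("load_records"):
    records = load_records(200)

with timed("build_dict"):
    big_dict = {i: str(i) for i in range(2000)}

with timed("count_matches"):
    matches = count_matches(records, big_dict)

# Print the steps from slowest to fastest.
for label, seconds in sorted(step_times.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<15} {seconds:.6f} s")
```

The step at the top of the printed list is the one worth looking at first.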

Through trial and error with each individual segment of the program, I identified the function that was slowing things down the most. I suspected the cause was that it looped through a big dictionary on every iteration. But I still did not know how to write it so that it wouldn’t need to do that.

With my questions defined, I turned to StackOverflow. Most questions I’d had throughout this process could be answered with existing threads, as there’s nearly always someone out there with your same question. However, in this case I was not able to find my answer in threads about generic slowness with dictionaries in Python, and I really needed someone to look at my code and suggest a better way.

The problem was that I could not post my whole notebook to StackOverflow, and I soon found out that pasting one cell of code without all of its dependencies did not provide the Minimal, Reproducible Example, or reprex, that StackOverflow users need to be able to reproduce and understand the problem. My question was closed, but I was given the chance to edit it. This required me to rework the code, paste a small snippet of my data, and trim down both the data and the code to only those parts truly necessary to understand my problem.

Of course, the next problem was that with a very trimmed-down data source and code, it executed just fine. How could I reproduce a problem in a simple way, when the very complexity was what caused the problem? I finally posted a message to StackOverflow with my reprex, explaining my conundrum and asking whether someone could identify the inefficiencies in my code even if they couldn’t run it themselves. Fortunately, the internet came through, and I am very grateful for the advice I received, which greatly sped up my notebook. From it, I learned a few things:

  • Cache unchanging function calls: rather than calling a function inside another function, where it will get called again and again with the same result, if possible assign the result of that call to a variable once and then use that variable in the other function to save processing time.
  • Generators can be slower than list comprehensions: in my case, a simple syntax change sped things up greatly while iterating through many records. I still don’t fully understand what generators are, but I refer back to this advice when needed.
  • List comprehensions can be combined: this speeds things up and cleans up code. After this experience, list comprehensions are at the top of my list to learn more about.
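
To make these three lessons concrete, here is a small sketch. The trail data and every name in it (`trail_names`, `records`, `build_lookup`, and so on) are hypothetical, invented for illustration, and are not the code from my notebook or from the StackOverflow answer.

```python
# Hypothetical data: trail IDs mapped to names, plus usage records.
trail_names = {1: "river loop", 2: "ridge path", 3: "creek walk"}
records = [(1, 12), (2, 0), (3, 7), (1, 5)]  # (trail_id, visitor_count)

def build_lookup():
    """Stand-in for an expensive call whose result never changes."""
    return {tid: name.title() for tid, name in trail_names.items()}

# 1. Cache unchanging function calls.
def label_slow(records):
    # build_lookup() runs again for every single record.
    return [build_lookup()[tid] for tid, _ in records]

def label_fast(records):
    lookup = build_lookup()  # called once, result kept in a variable
    return [lookup[tid] for tid, _ in records]

# 2. Generator expression vs. list comprehension: both give the same
#    values, but the second builds the list directly.
gen_counts = list(count for _, count in records)
list_counts = [count for _, count in records]

# 3. Combine list comprehensions: two passes over the data...
lookup = build_lookup()
nonzero = [r for r in records if r[1] > 0]
labels_two_pass = [lookup[tid] for tid, _ in nonzero]

# ...become one pass that filters and transforms together.
labels_one_pass = [lookup[tid] for tid, count in records if count > 0]
```

When the whole list is needed anyway, a list comprehension is often somewhat faster in CPython than wrapping a generator expression in `list()`, although a generator uses less memory when the values are only consumed once. The difference depends on the data, so it is worth timing both versions on your own code.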

As I am still at the beginning of my journey, I’m so grateful for people willing to take the time to help others learn. I try to meet them halfway by doing as much research as I can ahead of time. I hope to be able to give technical advice to others in the near future.

Published by Kelly Dunn

Blogger about transportation and analytics.