I know people joke around about this a lot. But I was blown away by a recent switch I made from using cv2 with Python to using cv2 with C++.
I had literally hundreds of thousands of images to analyze for a dataset. My Python script would have taken 12 hours.
I ported it to C++ and it literally destroyed it in 20 minutes.
I’m sure I was doing something that really wasn’t optimized well for Python. Somewhere in the backend the C++ build was probably using a completely different library with multithreading optimizations. Or maybe TurboJPEG is just garbage in Python. I’m still not sure what the bottleneck was, and I don’t know enough to really explain why.
But holy shit. I never had that much of a performance difference in such a simple task.
Was very impressed.
If you had let me write the C++ code, I could have literally destroyed your dataset in a couple of seconds.
I don’t even need C++ for this. Just give me an rm with a side dish of -rf.
I had a similar experience processing PDFs of building plans. 4-8k PDFs took 5-10 minutes in Python.
I ended up switching to Node.js and it processed the same PDFs in 120 seconds.
Over the years I’ve only really found Python useful for interviewing and occasionally in CI pipelines.
It’s a great tinkering language, which is a lot of what I do for personal projects no one else will ever see. I find its biggest strength is also its biggest weakness: it’s so easy to write that you assume you don’t have to care about what’s under the hood.
But something as simple as using a list instead of a set can turn a 2-minute script into a 2-hour script pretty quickly.
I remember when I first started using it, I was building a list and then checking whether each element of another list was contained in it.
My list was static during the comparison, so I just did an “x in Y”.
Y was massive though, and every “x in Y” on a list scans the whole thing.
If I had been using any other language I would have thought about the data type more and obviously used a set/hash for O(1) lookup. But since I was new to Python I didn’t even think about it, because it didn’t seem to give a fuck about data types.
A simple
set_a = set(list_a)
was all I needed. I think Python is so easy to pick up that no one even bothers to optimize what they are writing, so you get even worse performance than you should.
Exactly what this comic is saying. C++ can handle in 20 minutes what takes Python 12 hours, but something gets destroyed.
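To make the list vs. set thing from a couple of comments up concrete, here’s a rough sketch. It is not the original script: the function names (count_hits_list, count_hits_set), the sizes, and the integer data are all made up for illustration. It only shows the general effect of “x in Y” scanning a list versus doing a hash lookup in a set.

```python
import time


def count_hits_list(needles, haystack_list):
    # `x in haystack_list` scans the list element by element,
    # so each lookup costs O(len(haystack_list)).
    return sum(1 for x in needles if x in haystack_list)


def count_hits_set(needles, haystack_list):
    # Build the set once up front; each `x in haystack_set` is then
    # a hash lookup, O(1) on average.
    haystack_set = set(haystack_list)
    return sum(1 for x in needles if x in haystack_set)


if __name__ == "__main__":
    # Hypothetical sizes, kept small so the slow version still finishes quickly.
    list_a = list(range(10_000))
    queries = list(range(0, 20_000, 20))  # half of these are in list_a

    start = time.perf_counter()
    hits_list = count_hits_list(queries, list_a)
    list_seconds = time.perf_counter() - start

    start = time.perf_counter()
    hits_set = count_hits_set(queries, list_a)
    set_seconds = time.perf_counter() - start

    # Both versions find the same matches; only the running time differs.
    print(f"list: {hits_list} hits in {list_seconds:.4f}s")
    print(f"set:  {hits_set} hits in {set_seconds:.4f}s")
```

The gap grows with the size of Y, since the list version does work proportional to len(queries) * len(Y) while the set version is roughly len(queries) + len(Y). The one catch is that the elements have to be hashable to go in a set.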
But make sure to bill for the full 12 hours.
Was a personal project. But absolutely. Half my job is trying to explain why something is taking so long when in reality it’s already done and I just don’t want to do anything for the next few days.
Managers never really know. And other engineers don’t care. It’s all about balancing expectations.