• wheezy@lemmy.ml
    52 up, 1 down · edited · 8 hours ago

    I know people joke around about this a lot, but I was blown away by a recent switch I made from using cv2 with Python to using cv2 with C++.

    I had literally hundreds of thousands of images to analyze for a dataset. My Python script would have taken 12 hours.

    I ported it to C++ and it literally destroyed it in 20 minutes.

    I’m sure I was doing something that really wasn’t optimized well for Python. I know somewhere in the backend it was probably using a completely different library with multithreading optimizations. Or maybe TurboJPEG is just garbage in Python. I’m still not sure what the bottleneck was, and I don’t know enough to really explain why.

    But holy shit. I’ve never seen that much of a performance difference in such a simple task.

    Was very impressed.

    • bitwolf@sh.itjust.works
      English · 5 up · 4 hours ago

      I had a similar experience processing PDFs of building plans. 4-8k PDFs took 5-10 minutes in Python.

      I ended up switching to Node.js and it processed the same PDFs in 120 seconds.

      Over the years I’ve only really found Python useful for interviewing and occasionally in CI pipelines.

      • wheezy@lemmy.ml
        3 up · 1 hour ago

        It’s a great tinkering language, which is a lot of what I do for personal projects no one else will ever see. I find its biggest strength is also its biggest weakness: it’s so easy to write that you assume you don’t have to care about the under-the-hood stuff.

        But something as simple as using a list instead of a set can turn a 2 minute script into a 2 hour script pretty quickly.

        I remember when I first started using it, I was building a list and then checking whether each element of another list was contained in it.

        My list was static during the comparison, so I just did an “x in Y”.

        Y was massive though.

        If I had been using any other language, I would have thought about the data type more and obviously used a set/hash for O(1) lookup. But since I was new to Python, I didn’t even think about it because it didn’t seem to give a fuck about data types.

        A simple set_a = set(list_a) was all I needed. I think Python is so easy to pick up that no one even bothers to optimize what they are writing, so you get even worse performance than you should.
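
        For anyone who wants to see the difference, here is a rough sketch of that list-versus-set lookup. The sizes, the `queries` list, and the `count_hits` helper are made up for illustration (only `set_a = set(list_a)` is from the comment above), and the exact timings will vary by machine:

```python
import time

# Hypothetical data: sizes and names are assumptions for illustration only.
list_a = list(range(20_000))          # the big static collection ("Y")
queries = list(range(0, 40_000, 20))  # 2,000 values to look up; half are in list_a


def count_hits(container, items):
    """Count how many items are found in container using the `in` operator."""
    return sum(1 for x in items if x in container)


# List membership: each `in` scans the list, O(n) per lookup.
start = time.perf_counter()
hits_list = count_hits(list_a, queries)
list_seconds = time.perf_counter() - start

# Set membership: hash lookup, O(1) on average per lookup.
set_a = set(list_a)
start = time.perf_counter()
hits_set = count_hits(set_a, queries)
set_seconds = time.perf_counter() - start

print(f"list: {hits_list} hits in {list_seconds:.4f}s")
print(f"set:  {hits_set} hits in {set_seconds:.4f}s")
```

        Both versions find the same matches. The only change is the container type, which is why this kind of slowdown is so easy to miss.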

    • jsomae@lemmy.ml
      25 up · 7 hours ago

      Exactly what this comic is saying. C++ can handle in 20 minutes what takes Python 12 hours, but something gets destroyed.

      • wheezy@lemmy.ml
        9 up · edited · 5 hours ago

        Was a personal project. But absolutely. Half my job is trying to explain why something is taking so long when in reality it’s already done and I just don’t want to do anything for the next few days.

        Managers never really know. And other engineers don’t care. It’s all about balancing expectations.