Is there any way to make it use less power as it gets more advanced, or will there soon be huge power plants dedicated just to AI all over the world?

  • vrighter@discuss.tchncs.de
    15 hours ago

    That’s why they need huge datacenters and thousands of GPUs, and, pretty soon, dedicated power plants. It is insane just how wasteful this all is.

    • hisao@ani.social
      15 hours ago

      So do they load all those matrices (totalling 175B params in this case) onto the available GPUs for every token of every user?

      • vrighter@discuss.tchncs.de
        14 hours ago

        Yep. You could of course swap weights in and out, but that would slow things down to a crawl. So they get lots of VRAM and keep the weights loaded (edit: for example, an H100 has 80 GB of VRAM).
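To put rough numbers on that, here is a back-of-the-envelope sketch of how many 80 GB cards it takes just to hold 175B parameters. The 2 bytes per parameter (fp16) and the decimal reading of "80 GB" are assumptions, not something stated in the thread, and the estimate ignores activations, KV cache, and any other runtime memory.

```python
import math

def min_gpus_for_weights(n_params: float,
                         bytes_per_param: int = 2,   # assumption: fp16/bf16 weights
                         vram_gb: float = 80.0) -> int:  # assumption: 80 GB = 80e9 bytes
    """Lower bound on GPU count needed to hold the weights alone."""
    weight_bytes = n_params * bytes_per_param
    vram_bytes = vram_gb * 1e9
    return math.ceil(weight_bytes / vram_bytes)

# 175B parameters at 2 bytes each is 350 GB of weights.
weights_gb = 175e9 * 2 / 1e9
gpus_fp16 = min_gpus_for_weights(175e9)
gpus_fp32 = min_gpus_for_weights(175e9, bytes_per_param=4)
print(weights_gb, gpus_fp16, gpus_fp32)
```

So even before serving a single user, a model of that size has to be spread over several cards, which is why swapping weights in and out per request is not attractive.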

        • hisao@ani.social
          13 hours ago

          I also asked ChatGPT itself, and it listed a number of approaches. One that sounded good to me is pinning layers to GPUs. For example, say we have 500 GPUs: cards 1-100 permanently hold layers 1-30 of the model, cards 101-200 permanently hold layers 31-60, and so on. This way there is no need to keep reloading the huge matrices, since they stay on the GPUs permanently; you basically just pipeline the user's prompt through the appropriate sequence of GPUs.
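The layer-pinning idea described above can be sketched as a toy in plain Python. Everything here is hypothetical and for illustration only: the `Stage` class, the `gpu0`..`gpu4` names, and the scalar "layers" stand in for real GPUs and real transformer layers, and the sketch uses one device per stage rather than the 100 cards per stage in the example. It only shows the control flow: each stage gets its slice of layers once, and requests then flow through the stages in order.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Layer = Callable[[float], float]

def make_layer(i: int) -> Layer:
    # Toy stand-in for a transformer layer (a real one is a set of big matrices).
    return lambda x: 0.5 * x + i

@dataclass
class Stage:
    """One pipeline stage: a (hypothetical) device holding a fixed slice of layers."""
    device: str
    layers: List[Layer]

    def forward(self, x: float) -> float:
        for layer in self.layers:
            x = layer(x)
        return x

def build_pipeline(layers: List[Layer], n_stages: int) -> List[Stage]:
    # Assign contiguous slices of layers to stages once, up front.
    per_stage = math.ceil(len(layers) / n_stages)
    return [
        Stage(device=f"gpu{s}", layers=layers[s * per_stage:(s + 1) * per_stage])
        for s in range(n_stages)
    ]

def run(pipeline: List[Stage], x: float) -> float:
    # Only the activation x moves between stages; the layers never move.
    for stage in pipeline:
        x = stage.forward(x)
    return x

layers = [make_layer(i) for i in range(150)]   # 150 layers, as in 5 groups of 30
pipeline = build_pipeline(layers, n_stages=5)

# Reference: run all layers in order on a single "device".
expected = 1.0
for layer in layers:
    expected = layer(expected)

result = run(pipeline, 1.0)
print(result == expected)
```

The point of the design is that what crosses the GPU boundary is the small per-request activation, not the weights, so the expensive matrices are loaded exactly once.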

          • howrar@lemmy.ca
            10 hours ago

            I can confirm as a human with domain knowledge that this is indeed a commonly used approach when a model doesn’t fit into a single GPU.