Is there any way to make it use less as it gets more advanced, or will there be huge power plants dedicated just to AI all over the world soon?

  • hisao@ani.social
    1 day ago

    I also asked ChatGPT itself, and it listed a number of approaches. One that sounded good to me is pinning layers to GPUs. Say we have 500 GPUs: cards 1-100 permanently hold layers 1-30 of the model, cards 101-200 permanently hold layers 31-60, and so on. That way there's no need to keep reloading the huge weight matrices, since they stay on the GPUs permanently; the user's prompt is just pipelined through the appropriate sequence of GPUs.

    • howrar@lemmy.ca
      1 day ago

      I can confirm as a human with domain knowledge that this is indeed a commonly used approach when a model doesn’t fit into a single GPU.
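
A rough sketch of the layer-pinning idea discussed in this thread (usually called pipeline parallelism) might look like the following. It is a toy in plain Python, not how any real serving stack is written: `assign_layers`, `Stage`, `build_stages`, and `run_pipeline` are hypothetical names, each "layer" is just a function on a number, and the 150-layer total is an assumption chosen so that 5 groups of 30 layers match the "500 GPUs, 100 cards per group" example.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def assign_layers(num_layers: int, num_stages: int) -> List[range]:
    """Split layer indices 0..num_layers-1 into contiguous chunks, one per GPU group."""
    base, extra = divmod(num_layers, num_stages)
    chunks, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)  # spread any remainder over the first groups
        chunks.append(range(start, start + size))
        start += size
    return chunks


class Stage:
    """Stand-in for one GPU group. Its layers are loaded once and never swapped out."""

    def __init__(self, layers: List[Layer]):
        self.layers = layers  # "pinned" weights: set at startup, reused for every prompt

    def forward(self, x: float) -> float:
        for layer in self.layers:
            x = layer(x)
        return x


def build_stages(layers: List[Layer], num_stages: int) -> List[Stage]:
    """Pin each contiguous chunk of layers to its own stage."""
    return [Stage([layers[i] for i in chunk])
            for chunk in assign_layers(len(layers), num_stages)]


def run_pipeline(stages: List[Stage], x: float) -> float:
    """Pass one input through the stages in order; only activations move between groups."""
    for stage in stages:
        x = stage.forward(x)
    return x


if __name__ == "__main__":
    toy_layers = [lambda x: x + 1 for _ in range(150)]  # assumed 150 layers
    stages = build_stages(toy_layers, num_stages=5)      # 5 groups of 30 layers
    print(run_pipeline(stages, 0.0))
```

The point of the sketch is only that weights are placed once in `build_stages`, and each request afterwards moves a small activation from stage to stage rather than moving the weights. Real systems also keep several requests in flight at once so that later stages are not idle while earlier ones work, which this toy leaves out.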