"Generative AI" or "generative artificial intelligence" is a
term for the "large language model" (LLM) style of predictive content generation. These systems started out as predictive text but are now used for images, audio, video, and more. You can read more about it on the
Wikipedia page for
genAI, but here's my best, simple explanation based on having followed LLM news as a non-expert since
around 2020:
The explainers I've read used text as the basis for their explanation, so I'll do the same, since that's what I understand best.
Generative AI/LLM systems "generate" (i.e. "guess") what would be
said next based on their "training data". Training data is the text, images, video, etc. that make up the dataset used to train the LLM.
Instead of building their guesses out of whole words or phrases (the way a human would), they work in small chunks of text called "tokens", each usually only a few characters long (around four, on average, for English text). A token is just enough information to be useful in guessing the next thing, while not being so much that the system becomes inflexible and unable to deal with new information.
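To make that concrete, here's a deliberately tiny sketch of the "guess what comes next based on training data" idea. Real LLMs use enormous neural networks over token sequences rather than a simple lookup table, and everything in this snippet (the little training text, the names) is made up purely for illustration:

```python
# A toy "next word guesser": count which words follow which in the training
# text, then generate by repeatedly sampling a likely continuation.
# Real LLMs work on tokens with neural networks; this is only an analogy.
from collections import Counter, defaultdict
import random

training_text = "the cat sat on the mat . the dog sat on the rug ."
words = training_text.split()

# For each word, tally which words have followed it in the training data.
next_word_counts = defaultdict(Counter)
for current, following in zip(words, words[1:]):
    next_word_counts[current][following] += 1

def generate(prompt_word, length=6):
    """Repeatedly guess a continuation based on what the training data contained."""
    output = [prompt_word]
    for _ in range(length):
        candidates = next_word_counts.get(output[-1])
        if not candidates:
            break  # never saw this word in training, so nothing to guess from
        choices, weights = zip(*candidates.items())
        output.append(random.choices(choices, weights=weights)[0])
    return " ".join(output)

print(generate("the"))  # e.g. "the dog sat on the mat ."
```

Notice that the output can only ever recombine what was in the training text, which is exactly why the problems below follow from what goes into the dataset.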
At its core, generative AI works by looking at a whole
bunch of stuff that already exists, and then, when given a prompt, guessing what would fit based on the
stuff it already knows about. This leads to several problems:
- Any biases inherent in the training data will be replicated in the
output of the LLM. Racism, sexism, ableism... it's all magnified by the data.
- The dataset is full of images of men, contains few images of women, and has none tagged as nonbinary? Now the LLM thinks most humans are men.
- The dataset is full of clothed men and scantily clad women in sexually
suggestive poses? Now the LLM thinks female nudity is inherently sexual and is more likely to categorize images of femme people showing any skin as sexually suggestive.
- The dataset is mostly full of white faces, and
most of the darker (color/hue) images are of animals? Now the LLM is likely to think that Black people
are not human (BBC article, Nature Article, Radford,
A. et al.).
- Generative AI needs a lot of data in order to work. A lot of the early
LLMs used Reddit for the data, which meant what the LLM would generate was constrained by whatever
moderation policies kept the worst overtly racist, sexist, queerphobic/queermisic, etc. content out of
the subreddits. This let early LLMs seem kind of wholesome if you stuck to their text outputs.
- As LLM makers have needed more and more data, they've started
"scraping" (grabbing content from) anywhere they can possibly get it. If you've spent much time online
at all, either you're very aware that there's a lot of really terrible stuff said online, or you've
been lucky enough to exist in well-moderated spaces where the truly vile shit is kept away by admins and moderators. Either way, when scraping grabs content from anywhere, that vile stuff ends up in the training data too.
- LLMs don't understand context or jokes, and are likely to
misunderstand the same word or phrase used in a different context (e.g. "clean" as in removing dirt or
debris, "clean" as in prepare for cooking), and different words used for the same underlying thing
(e.g. "terrorist" and "freedom fighter" as terms for the same group, depending on whether the speaker
is sympathetic to the group's actions and/or goals). This issue is especially pronounced when LLMs are used to summarize information on the same topic from a variety of sources: they can mix together different meanings or miss otherwise relevant information (there's a toy illustration of this after the list).
- People who may have agreed to something being shared in one context
(e.g. posting a story online for people to read for free) have not consented to it being used in
another context (e.g. building an LLM that claims to generate text mimicking someone's writing style).
This is even more egregious when the work is copyright protected, and so the LLM-builders are
specifically stealing other people's work in order to try and get around paying them for training
data.
- Additional problems accrue at the intersections of these issues and
others which are inherent to the nature of LLMs as programs which guess what an average respondent is
likely to say based on the prompt.
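Here's the toy illustration promised above: a crude word-overlap score (a stand-in for pattern matching without understanding, not how real LLMs actually compute similarity) can't tell that "clean" means something completely different in two nearly identical sentences. The sentences and function name are invented for this example:

```python
def word_overlap(a: str, b: str) -> float:
    """Fraction of shared words between two sentences (Jaccard similarity)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)

housework = "clean the counters before dinner"
cooking = "clean the fish before dinner"

# Prints roughly 0.67: very high overlap, even though "clean" means scrub
# in the first sentence and gut in the second. The statistics carry no
# sense of which meaning is intended, only which words co-occur.
print(word_overlap(housework, cooking))
```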
TL;DR: The best-case scenario is that generative AI takes
information people freely chose to provide to it, and then makes guesses based on that information as to what
someone might say in a given situation as outlined by the prompt. In practice, the information is usually
stolen, the outputs are frequently incorrect and full of bigotries, and the whole thing is a massive
(capitalist) project to avoid paying people for their work.