Revision Sheet: AI Language Interaction and Technologies

AI Practice and Technologies Interacting with Humans and the Real World — Revision Sheet

1. 📌 Essentials

  • NLP enables machines to interpret, understand, and generate human language naturally.
  • Classical NLP pipeline: tokenization, morphology/POS tagging, syntax, semantics, pragmatics.
  • Word embeddings (static and contextual) represent words as vectors capturing meaning.
  • Transformers revolutionized NLP with parallel processing and self-attention.
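  • Large Language Models (LLMs): BERT (geared toward understanding), GPT (geared toward generation), T5 (text-to-text, covers both).
  • Practical tools: spaCy (fast, production-oriented pipelines combining statistical and rule-based components), Hugging Face (state-of-the-art neural models).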
  • Responsible NLP: address bias, fairness, privacy, and energy consumption.
  • Challenges include ambiguity, context-dependence, and intent recognition.
  • Hierarchical flow: raw text → structured understanding → response or action.
  • Future trends: instruction tuning, retrieval augmentation, multilingual models, safety.

2. 🧩 Key Structures & Components

  • Tokenization — splits text into units (words, subwords).
  • Morphology & POS tagging — identifies grammatical forms and parts of speech.
  • Syntax parsing — builds sentence structure (trees, dependencies).
  • Semantics mapping — assigns meaning to words/phrases.
  • Pragmatics — infers speaker intent based on context.
  • Word Embeddings — dense vector representations of words.
  • Static embeddings — Word2Vec, GloVe; limited polysemy handling.
  • Contextual embeddings — BERT, GPT; dynamic, context-aware.
  • Neural sequence models — RNNs, LSTMs, attention mechanisms.
  • Transformers — parallel, self-attention-based models.
  • Large Language Models — encoder-only, decoder-only, encoder–decoder architectures.
  • Tools — spaCy, Hugging Face Transformers.
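To make the tokenization entry above concrete, here is a minimal sketch of word-level and subword-level splitting. The function names and the toy vocabulary are hypothetical, and the greedy longest-match strategy with `##` continuation markers only imitates WordPiece-style tokenizers; real tokenizers learn their vocabulary from data.

```python
import re

def word_tokenize(text):
    """Split text into lowercase word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def subword_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into subword pieces.

    Continuation pieces carry a leading '##' (WordPiece-style convention).
    Returns ['[UNK]'] if the word cannot be covered by the vocabulary.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"token", "##ization", "un", "##break", "##able", "is", "fun"}

words = word_tokenize("Tokenization is fun!")
subwords = [subword_tokenize(w, TOY_VOCAB) for w in words]
print(words)     # word-level units
print(subwords)  # subword-level units, '!' is unknown to the toy vocabulary
```

Subword splitting lets a fixed vocabulary cover rare or unseen words by composing them from known pieces, which is one reason it is common in Transformer-based models.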

3. 🔬 Functions, Mechanisms & Relationships

  • Pipeline flow: raw text → tokenization → morphology/POS → syntax parsing → semantics → pragmatics.
  • Embeddings: convert words into vectors; proximity indicates similarity.
  • Static vs. contextual embeddings: static (Word2Vec) are fixed; contextual (BERT) change with context.
  • Neural models: process sequences, with RNNs/LSTMs capturing order; attention highlights relevant info.
  • Transformers: use self-attention to model global context in parallel.
  • LLMs: scale up Transformer models so that one model can cover diverse tasks such as classification, translation, and generation.
  • Tools: implement NLP tasks efficiently; spaCy for speed, Hugging Face for flexibility.
  • Responsible NLP: balance accuracy, interpretability, and ethical considerations.
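The self-attention mechanism listed above can be sketched as scaled dot-product attention. This is a simplified illustration, not a full Transformer layer: it assumes the token vectors serve directly as queries, keys, and values (real models apply learned projection matrices first), and the input vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence.

    Each output vector is a weighted average of all value vectors,
    so every position can draw on the whole sequence at once.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional token vectors (assumption: no learned projections).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each query row is computed independently of the others, which is what allows the parallel processing mentioned in the list; an RNN, by contrast, has to step through the sequence in order.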

4. 📊 Comparative Table

Item | Key Features | Notes / Differences
Classical NLP pipeline | Tokenization → Morphology/POS → Syntax → Semantics → Pragmatics | Layered analysis from raw text to meaning
Bag of Words / TF–IDF | Unordered, simple, fast; weights important words | Ignores word order and structure
Static embeddings | Word2Vec, GloVe; fixed vectors for words | Limited by polysemy; context-independent
Contextual embeddings | BERT, GPT; dynamic, context-dependent | Handle polysemy; adapt meaning based on context
Neural sequence models | RNNs, LSTMs; process sequences with memory | Struggle with long dependencies
Attention mechanisms | Focus on relevant parts of input | Improve relevance in sequence processing
Transformers | Parallel, self-attention; foundation of modern NLP | Efficient, scalable, handle long-range dependencies
Large Language Models | Encoder-only (BERT), decoder-only (GPT), encoder–decoder (T5) | Capable of understanding and generating language
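The Bag of Words / TF–IDF row can be illustrated with a small sketch. The weighting used here (relative term frequency times `log(N / df)`) is one common textbook variant chosen as an assumption; libraries often use smoothed variants. The example documents and the function name `tf_idf` are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute one {term: weight} dictionary per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["the cat sat", "the dog sat", "the cat ran"]
vectors = tf_idf(docs)
print(vectors[1])  # 'the' gets weight 0, the rarer 'dog' gets the highest weight

# Bag of words discards order: both sentences yield the same counts.
same_bag = Counter("dog bites man".split()) == Counter("man bites dog".split())
print(same_bag)
```

A term that appears in every document receives weight zero, while rare terms are emphasized; the last two lines show why this representation cannot distinguish sentences that differ only in word order.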

5. 🗂️ Hierarchical Diagram (ASCII)

NLP & HCI
 ├─ Interaction paradigms
 │   ├─ Button/menu commands
 │   └─ Natural language understanding
 ├─ Classical pipeline
 │   ├─ Tokenization
 │   ├─ Morphology & POS
 │   ├─ Syntax parsing
 │   ├─ Semantics mapping
 │   └─ Pragmatic inference
 ├─ Word representations
 │   ├─ Bag of Words / TF–IDF
 │   ├─ Static embeddings (Word2Vec, GloVe)
 │   └─ Contextual embeddings (BERT, GPT)
 ├─ Neural models
 │   ├─ RNNs / LSTMs
 │   ├─ Attention mechanisms
 │   └─ Transformers
 ├─ Large language models
 │   ├─ Encoder-only (BERT)
 │   ├─ Decoder-only (GPT)
 │   └─ Encoder–decoder (T5, BART)
 ├─ Practical tools
 │   ├─ spaCy
 │   └─ Hugging Face
 └─ Responsible NLP
     ├─ Accuracy vs interpretability
     ├─ Bias, fairness, privacy
     └─ Sustainability

6. ⚠️ High-Yield Pitfalls & Confusions

  • Confusing static and contextual embeddings; static cannot handle polysemy well.
  • Overlooking the importance of syntax parsing in semantic understanding.
  • Assuming larger models always outperform smaller ones without considering resource constraints.
  • Misinterpreting bag of words as capturing syntax or context.
  • Ignoring bias and fairness issues in large models.
  • Believing tokenization is trivial; it varies greatly across languages.
  • Overestimating the interpretability of neural models.
  • Confusing encoder-only (BERT) with decoder-only (GPT) architectures.
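The first pitfall, static embeddings and polysemy, can be shown with a toy lookup table. The three-dimensional vectors below are invented for illustration and are not taken from Word2Vec or GloVe; the point is only that a static embedding is a fixed lookup, so "bank" receives the same vector in every sentence.

```python
import math

# Hypothetical toy vectors; real static embeddings are learned and much larger.
STATIC_EMBEDDINGS = {
    "bank": [0.5, 0.5, 0.1],
    "money": [0.9, 0.1, 0.0],
    "river": [0.1, 0.9, 0.0],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embed_static(sentence):
    """Look up each known word; the surrounding words play no role."""
    return {
        word: STATIC_EMBEDDINGS[word]
        for word in sentence.lower().split()
        if word in STATIC_EMBEDDINGS
    }

finance = embed_static("She deposited money at the bank")
nature = embed_static("They sat on the river bank")

# Same vector for 'bank' in both contexts: the two senses are merged.
same_vector = finance["bank"] == nature["bank"]
print(same_vector)
print(cosine_similarity(STATIC_EMBEDDINGS["bank"], STATIC_EMBEDDINGS["money"]))
print(cosine_similarity(STATIC_EMBEDDINGS["bank"], STATIC_EMBEDDINGS["river"]))
```

A contextual model would instead compute the vector for "bank" from the whole sentence, so the financial and the riverside uses could end up in different regions of the vector space.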

7. ✅ Final Exam Checklist

  • Understand the stages of the classical NLP pipeline.
  • Differentiate between static and contextual word embeddings.
  • Know key models: RNNs, LSTMs, Transformers, BERT, GPT.
  • Be familiar with practical NLP tools: spaCy, Hugging Face.
  • Recognize the importance of responsible AI: bias, fairness, privacy.
  • Comprehend how attention mechanisms improve relevance.
  • Know evaluation metrics: F1, BLEU, ROUGE.
  • Be aware of sustainability practices: model compression, caching.
  • Understand future trends: instruction tuning, retrieval-augmented generation.
  • Grasp the hierarchical flow from raw text to meaningful response.
  • Recognize challenges: ambiguity, context-dependence, multilinguality.
  • Know the differences between model architectures and their applications.
  • Be prepared to discuss ethical considerations in deploying NLP systems.
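For the evaluation-metrics item in the checklist, the following sketch computes precision, recall, and F1 for a binary classification task. The label lists are made up, and the function name is hypothetical; BLEU and ROUGE are overlap-based metrics for generated text and are not covered here.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Made-up gold labels and model predictions.
gold = [1, 0, 1, 1, 0, 1]
predicted = [1, 1, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it more informative than accuracy on imbalanced data.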

End of Revision Sheet


