Latent Dirichlet Allocation

LDA

Latent Dirichlet Allocation (LDA) is a generative probabilistic model used in natural language processing and text mining to discover topics in a collection of documents. It is an unsupervised machine learning method and one of the standard approaches to topic modeling. It assumes that each document is a mixture of topics, and that each topic is a probability distribution over words.
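To make the "mixture of topics" idea concrete, here is a minimal NumPy sketch of the generative story LDA assumes. The vocabulary, number of topics, and the Dirichlet priors `alpha` and `eta` are illustrative assumptions, not values from this project.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha=0.5, eta=0.5, seed=0):
    """Sample a toy corpus by following LDA's generative story."""
    rng = np.random.default_rng(seed)
    n_terms = len(vocab)
    # Each topic k is a distribution over words: phi_k ~ Dirichlet(eta).
    phi = rng.dirichlet(np.full(n_terms, eta), size=n_topics)
    docs = []
    for _ in range(n_docs):
        # Each document is a mixture of topics: theta_d ~ Dirichlet(alpha).
        theta = rng.dirichlet(np.full(n_topics, alpha))
        words = []
        for _ in range(doc_len):
            z = rng.choice(n_topics, p=theta)   # draw a topic for this word slot
            w = rng.choice(n_terms, p=phi[z])   # draw a word from that topic
            words.append(vocab[w])
        docs.append(words)
    return phi, docs

# Hypothetical vocabulary chosen only for illustration.
vocab = ["energy", "nuclear", "planet", "farm", "pesticides", "habitat"]
phi, docs = generate_corpus(n_docs=3, doc_len=8, vocab=vocab, n_topics=2)
print(docs[0])
```

Fitting LDA runs this story in reverse: given only the observed words, it infers the topic-word distributions (`phi`) and each document's topic mixture (`theta`).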

Data Formatting for LDA

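To perform LDA, the data must be preprocessed and formatted. This involves tokenizing the text, removing stop words, and creating a document-term matrix (one row per document, one column per word, with each cell holding how often that word appears in that document).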

I used my favorite vectorizer for this, CountVectorizer from sklearn!

Here is an example of how the data looks:

Data Example

Since LDA is unsupervised, I strip the labels from the data before modeling. Here you can see that the labels are the subreddits I gathered the text from!
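A minimal pandas sketch of dropping the labels. The column names `text` and `subreddit` and the label values are hypothetical placeholders for however the scraped data is actually stored.

```python
import pandas as pd

# Hypothetical layout: one row per post, with its source subreddit as the label.
df = pd.DataFrame({
    "text": ["Nuclear power cuts CO2.", "Pesticides harm bugs."],
    "subreddit": ["subreddit_a", "subreddit_b"],
})

# Remove the label column so LDA only ever sees the text;
# the labels are kept aside in case they are useful for later inspection.
labels = df.pop("subreddit")
texts = df["text"].tolist()
print(texts)
```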

Data Example

Analysis visualization

After running LDA, I visualize the topics using pyLDAvis, which provides an interactive interface to explore the topics and their relationships.
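Before visualizing, the model has to be fitted. Here is a minimal sketch using scikit-learn's `LatentDirichletAllocation`; the posts are made up, and the number of topics (`n_components=2`) is an illustrative assumption rather than the setting used in this project.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in posts; the real corpus is the unlabeled subreddit text.
posts = [
    "Nuclear power and clean energy can cut CO2 emissions.",
    "The planet needs energy without burning more carbon.",
    "Pesticides on the farm are killing bugs and destroying habitat.",
    "Farm runoff and pesticides threaten insect habitat.",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(posts)

# n_components (the number of topics) is a modeling choice; 2 is assumed here.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(dtm)  # one row per post: its mixture over topics

print(doc_topic.round(2))
```

pyLDAvis can then build the interactive view from the fitted model, the document-term matrix, and the vectorizer. In recent pyLDAvis releases this lives in `pyLDAvis.lda_model.prepare`, while older releases used `pyLDAvis.sklearn`, so check the installed version.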

Here is what the pyLDAvis visualization looks like:

Data Example

Or you can interact with it here!

Here is a sample of the topics that I found:

Data Example

Overall Themes

Topic 1: Energy, Emissions, and Planetary Risk

Top terms: energy, nuclear, power, years, planet, world, water, earth, co2, civilization
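Top-term lists like this come from ranking each topic's word weights. The sketch below ranks terms by raw topic-word weight (for example, the rows of a fitted scikit-learn model's `components_`). This matches pyLDAvis's ranking only when its relevance slider λ is set to 1; other λ values also penalize words that are common across all topics. The weights here are made up for illustration.

```python
import numpy as np

def top_terms(topic_word, terms, n=10):
    """Return the n highest-weight terms for each topic (row) of topic_word."""
    # argsort ascending, reverse each row to get descending order, keep first n
    order = np.argsort(topic_word, axis=1)[:, ::-1][:, :n]
    return [[terms[i] for i in row] for row in order]

# Toy topic-word weights (rows: topics, columns: terms), invented for this example.
terms = ["energy", "nuclear", "farm", "pesticides", "planet"]
topic_word = np.array([
    [9.0, 7.0, 0.5, 0.2, 4.0],
    [0.3, 0.1, 6.0, 8.0, 1.0],
])
print(top_terms(topic_word, terms, n=3))
# → [['energy', 'nuclear', 'planet'], ['pesticides', 'farm', 'planet']]
```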

This topic accounted for over 40% of all tokens, which suggests how central energy debates are to climate discussions in these subreddits. Terms like nuclear and power point to policy-oriented conversations around solutions and tradeoffs, while words like civilization, humans, and co2 suggest an existential framing of climate change.
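The "share of tokens" figure is pyLDAvis's estimate of topic size. As I understand its approach, each document's topic mixture is weighted by the document's length and summed over the corpus, then normalized. The sketch below reproduces that arithmetic with made-up numbers; `doc_topic` and `doc_lengths` are hypothetical inputs.

```python
import numpy as np

def topic_token_share(doc_topic, doc_lengths):
    """Estimate each topic's share of all tokens in the corpus."""
    # Expected tokens per topic: sum over docs of length_d * theta_{d,k}
    tokens_per_topic = (doc_topic * doc_lengths[:, None]).sum(axis=0)
    return tokens_per_topic / tokens_per_topic.sum()

# Hypothetical: 3 documents, 2 topics.
doc_topic = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])
doc_lengths = np.array([100, 50, 10])

shares = topic_token_share(doc_topic, doc_lengths)
print(shares)  # → [0.6625 0.3375]
```

Because longer documents contribute more tokens, a topic that dominates a few long posts can outweigh one spread across many short posts.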

This reflects public awareness of climate change as not just a scientific challenge, but a global, ethical, and technological one. Posts within this topic often express urgency, long-term thinking, and uncertainty about the future of the planet.

Topic 3: Agriculture and Ecological Threats

Top terms: years, pesticides, bugs, farm, humans, warming, wrong, habitat, problem

Topic 3 reflects concerns about the ecological and health impacts of climate change. Words like pesticides, farm, and bugs suggest a focus on agriculture and food systems, while terms like habitat and warming point to ecosystem degradation.

These discussions show how climate change is understood not only in terms of policy or weather, but also through everyday systems people depend on, such as farming and biodiversity. There is often an undertone of frustration or moral concern, reflected in words like wrong and problem.