David LaPaglia

Support Vector Machine

Support Vector Machines (SVMs) are supervised learning models used for classification and regression tasks. For classification, they work by finding the hyperplane that separates data points of different classes with the largest possible margin, often in a high-dimensional feature space.

SVMs are particularly effective in high-dimensional spaces and remain effective even when the number of dimensions exceeds the number of samples, although the kernel and the regularization strength still need to be chosen carefully to avoid overfitting.

In this project, I used SVMs to classify climate-related text data. The SVM model was trained on the preprocessed data, and the performance was evaluated using metrics such as accuracy, precision, recall, and F1-score.
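A minimal sketch of this kind of workflow is shown below. It assumes scikit-learn, TF-IDF features, and a linear-kernel `SVC`; the toy sentences and the label names (`impacts`, `energy`, `policy`) are hypothetical stand-ins, not the project's actual data or labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for the preprocessed climate text.
train_texts = [
    "sea level rise threatens coastal cities",
    "rising oceans flood coastal towns",
    "solar panels cut carbon emissions",
    "wind farms reduce carbon emissions",
    "new climate bill passes the senate",
    "senate debates the climate bill",
]
train_labels = ["impacts", "impacts", "energy", "energy", "policy", "policy"]
test_texts = [
    "coastal flooding from sea level rise",
    "solar and wind cut emissions",
    "senate votes on climate bill",
]
test_labels = ["impacts", "energy", "policy"]

# Turn text into TF-IDF vectors, then fit a linear-kernel SVM on them.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(train_texts, train_labels)
predicted = model.predict(test_texts)

# Accuracy plus macro-averaged precision, recall, and F1-score.
accuracy = accuracy_score(test_labels, predicted)
precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, predicted, average="macro", zero_division=0
)

# Confusion matrix: rows are true labels, columns are predicted labels.
label_order = sorted(set(train_labels))
cm = confusion_matrix(test_labels, predicted, labels=label_order)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(label_order)
print(cm)
```

Macro averaging weights every class equally, which is one reasonable choice when the labels are imbalanced; a weighted average would be another.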

Here is the confusion matrix (cm) for this model:

Then I remembered from my applied machine learning class that I can use scikit-learn's GridSearchCV to find the best possible hyperparameters (those tiny settings you choose before training). I did so, and I got the best results for this module so far:

As you can see, the SVM model performed better than both the Decision Tree model and the Multinomial Naive Bayes model, but there is still room for improvement.

45% was the best accuracy I found across all my models. The differences between these two SVMs are big: one uses a linear kernel and the other uses a radial basis function (RBF) kernel, which can fit data that is not linearly separable. I also decreased the regularization for the run that reached 45% accuracy.
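A grid search over the kernel and the regularization strength can be sketched roughly as follows. This is an illustration under assumptions, not the exact search used in this project: the toy texts, the label names, and the grid values are hypothetical. Note that in scikit-learn's `SVC`, `C` is the inverse of the regularization strength, so a larger `C` means less regularization.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus with two classes, six documents each.
texts = [
    "sea level rise threatens coastal cities",
    "rising oceans flood coastal towns",
    "heat waves damage crops and forests",
    "drought and wildfire spread across the region",
    "glaciers melt as temperatures rise",
    "storms grow stronger over warm oceans",
    "solar panels cut carbon emissions",
    "wind farms reduce carbon emissions",
    "electric cars replace gasoline engines",
    "new batteries store renewable power",
    "utilities invest in solar and wind",
    "clean energy jobs grow every year",
]
labels = ["impacts"] * 6 + ["energy"] * 6

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC()),
])

# Hypothetical grid: two kernels and three regularization settings.
# Step parameters are addressed as "<step name>__<parameter>".
param_grid = {
    "svm__kernel": ["linear", "rbf"],
    "svm__C": [0.1, 1, 10],
}

# 3-fold cross-validation, scored by accuracy, over every grid combination.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 2))
```

Putting the vectorizer inside the pipeline means TF-IDF is refit on each training fold, so the cross-validated score is not inflated by vocabulary leaking in from the held-out fold.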

Despite all of this, I would still argue that I need to review my data and gather more of it, using more distinct labels.

Sentiment Analysis:

In addition to classification, I also performed sentiment analysis on the text data using SVMs. The goal was to determine the sentiment (positive or negative) expressed in text related to climate change.

Sentiment analysis can provide valuable insights into public opinion and attitudes towards climate-related issues. The SVM model was trained on (manually) labeled sentiment data, and the performance was evaluated using metrics such as accuracy, precision, recall, and F1-score.
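For a two-class problem like positive versus negative sentiment, all four metrics can be computed directly from the counts in the confusion matrix. The sketch below shows the arithmetic; the helper name `binary_metrics` and the counts passed to it are hypothetical and are not results from this project.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp: positives predicted positive    fp: negatives predicted positive
    fn: positives predicted negative    tn: negatives predicted negative
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much really was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 70/100, precision is 40/50, and recall is 40/60, which shows how a model can look precise while still missing a third of the positive examples.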

Here is the confusion matrix (cm) for the sentiment model:

The issue with manually labeling data is that, as a human, I carry inherent bias. That bias can easily lead to an analysis that isn't fully objective.

Garbage in, garbage out. The flip side holds too: this sentiment analysis is proof that good data leads to good results, and the model's ability to learn well becomes evident.

Take a look at my code and the data I collected on my GitHub here:

GitHub Code