Repository containing data science projects.
- U.S. Patent Phrase to Phrase Matching - Kaggle: Notebooks created for the Kaggle PTP matching competition: EDA, a Siamese LSTM network, 'Bert-for-patents' using Hugging Face and Keras, and Sentence-Transformers with 'AI-Growth-Lab/PatentSBERTa'.
- Distributed semantic representations: In this project I experiment with distributed semantic representations on different analogy tests using Word2vec, GloVe 50d, and GloVe 100d implementations.
- Text_sentiment_analysis_with_spark: This project implements a text sentiment analysis pipeline in PySpark that classifies tweet polarity (positive/negative), then applies the pipeline to tweets streamed from the Twitter API using Spark’s Structured Streaming and writes the output to Parquet files.
- Recommender_System: This project uses PySpark ALS to generate movie recommendations for users based on the MovieLens dataset.
- Analysis_of _40_Years_of_Evolution_data: Data analysis of the 40 Years of Evolution data published by Peter and Rosemary Grant of Princeton University in 2014.
- Heart_Disease_DecisionTree_classifier: This project uses decision trees to classify the Heart Disease dataset from the UCI Machine Learning Repository.
- Jigsaw Rate Severity of Toxic Comments: Work in progress on the Jigsaw Rate Severity of Toxic Comments Kaggle competition.
- Audio_Sentiment_analysis: Audio sentiment analysis using deep learning.
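
To give a feel for the analogy tests mentioned under "Distributed semantic representations", here is a minimal sketch of the usual vector-offset approach ("a is to b as c is to ?", answered by the word nearest to `b - a + c` under cosine similarity). It is not the code from the project: the 3-dimensional vectors are invented toy values rather than real Word2vec or GloVe embeddings, and the names `TOY_VECTORS`, `cosine`, and `solve_analogy` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real Word2vec / GloVe embeddings have 50+ dimensions and are loaded from disk.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Builds the target vector b - a + c and returns the vocabulary word
    closest to it by cosine similarity, excluding the three query words.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# man is to king as woman is to ?
answer = solve_analogy("man", "king", "woman", TOY_VECTORS)
print(answer)
```

With real embeddings the same function could be applied to a dictionary mapping words to their pretrained vectors, and accuracy on an analogy test set is the fraction of questions where the returned word matches the expected one.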