Abd-elr4hman/Data-Science

DataScience Projects

Repository containing DataScience projects.

Contents

  • U.S. Patent Phrase to Phrase Matching - Kaggle: Notebooks created for the Kaggle U.S. Patent Phrase to Phrase Matching competition. They cover EDA, a Siamese LSTM network, 'Bert-for-patents' using Hugging Face and Keras, and Sentence-Transformers with 'AI-Growth-Lab/PatentSBERTa'.
  • Distributed semantic representations: In this project I experiment with distributed semantic representations on different analogy tests, using Word2vec, GloVe 50d and GloVe 100d embeddings.
  • Text_sentiment_analysis_with_spark: This project implements a text sentiment analysis pipeline in PySpark that classifies tweet polarity (positive/negative). It then applies the pipeline to tweets streamed from the Twitter API using Spark's Structured Streaming and writes the output to Parquet files.
  • Recommender_System: This project uses PySpark ALS to predict movie recommendations for users based on the MovieLens dataset.
  • Analysis_of _40_Years_of_Evolution_data: Data analysis of the 40 Years of Evolution data published by Peter and Rosemary Grant of Princeton University in 2014.
  • Heart_Disease_DecisionTree_classifier: This project uses decision trees to classify the Heart Disease dataset from the UCI Machine Learning Repository.
  • Jigsaw Rate Severity of Toxic Comments: Work in progress on the Jigsaw Rate Severity of Toxic Comments Kaggle competition.
  • Audio_Sentiment_analysis: Audio sentiment analysis using deep learning.
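
As a rough illustration of the kind of analogy test mentioned under "Distributed semantic representations", the sketch below solves "a is to b as c is to ?" by vector arithmetic and cosine similarity. It is not taken from the project notebooks: the 3-dimensional vectors in TOY_VECTORS and the helper names cosine and solve_analogy are invented for this example, whereas a real test would presumably load pretrained Word2vec or GloVe embeddings.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# Real experiments would load pretrained Word2vec or GloVe vectors instead.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the offset b - a + c.

    The three query words are excluded from the candidates, which is the
    usual convention in analogy benchmarks.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    # man : king :: woman : ?
    print(solve_analogy("man", "king", "woman", TOY_VECTORS))
```

With these hand-made vectors the query man : king :: woman : ? resolves to "queen". Accuracy on a full analogy test set would be the fraction of questions answered correctly this way.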
