Here you'll find mostly packages for the R programming language I've been working on. Some of them have been created for the European Data Journalism Network - EDJNet, many others have been developed for a variety of more or less serious reasons.
They include:
- tidywikidatar - Interact with Wikidata and get tidy data frames in response (it's also on CRAN)
- ganttrify - Create beautiful Gantt charts with ggplot2 (far from perfect, but apparently very popular, with hundreds of GitHub users kind enough to give it a star)
- castarter - Content analysis starter toolkit for R. A more modern, fully featured, and consistent iteration of a package dedicated to text mining (castarter.legacy, the legacy version, is still functioning and available). It facilitates text mining and web scraping by taking care of many of the most common file management issues, keeps track of download progress in a local database, facilitates extraction through dedicated convenience functions, and allows for basic exploration of textual corpora through a Shiny interface. It is fit for daily use and has rich documentation, although more is planned to cover a larger range of use cases.
- quackingllama - Use locally deployed LLMs to categorise text and images, making use of schema-based, consistent structured outputs, or to conduct other generative-LLM tasks. The workflow is optimised for text categorisation based on local deployments and caching. Caching relies on DuckDB and the LLM must be deployed with Ollama, hence the package name.
- ytdlpr - R wrapper for yt-dlp, focused on extracting and processing subtitles of videos posted on YouTube; it allows, for example, extracting all video segments posted by a user that include a given keyword.
- latlon2map - Facilitates matching lat/lon data with administrative units and other geographic shapes (it also includes a lot of convenience functions for downloading and caching geo-spatial datasets... not a beauty, but it gets its job done and I use it in so many of my everyday projects)
- rbackupr - An R package to back up to Google Drive with limited permissions, useful e.g. for uploads from remote servers; speedy, thanks to local caching of metadata (not fully documented, but reasonably functional)
- nomnomlgraph - Create nomnoml diagrams in R based on data frames with edges and nodes
- riskviewer - Show risks and probability in real world contexts (conceptually, this may be one of the most valuable things I've worked on. Check out the theoretical background and give a quick spin to the Shiny app showcasing basic functionality. Unfortunately, the package does not yet work consistently, but I hope to make it better)
- networkedwebsitesdetector - A structured approach for finding networked websites (I don't even know what to say... this is kind of great, but also I never had the time to really polish and finalise it, so...)
- genderedstreetnames - Automatically find the gender of street names, manually fix what the automatic part got wrong.
- streetnamer - Match street names to the people or objects they are dedicated to. Not fully polished, but this is an advanced project, with a functioning Shiny interface.
- shinyshoppinglist - A shopping list app made in R Shiny (this is really basic, but also, it actually works)
- cornucopia - Facilitate reporting on sponsored and organic activities on Facebook, Instagram, and LinkedIn (will possibly include other platforms)
- zoteror - Access the Zotero API in R (it does what it says on the tin, reasonably functional with a clear README... I may put it on CRAN one day)
- plausibler - Access Plausible Analytics API from R
- huecontroller - Control Philips Hue lights using the R programming language (so... one night I was on my couch, and wanted to soften the lights, but didn't want to get up, so I ended up writing an R package to control lights... it even has a reasonably functional Shiny app, but it's only intensity and warmth for the time being - setting colour is possible, but not yet integrated in the Shiny app)
- Prigozhin audio files, transcribed - An automatic transcription of all the audio messages posted on Prigozhin’s official Telegram channel
- Textual datasets (mostly) from Russia - available in full, or as metadata only. A more formal release will follow.
- tifkremlinen - A corpus with all items published on the website of the Kremlin (1999-2020) (it even has a hex logo, and a DOI, so it's surely serious stuff; to be honest, I do have big plans for expanding this one)
- olympics2020nuts - Retrieve details about Olympics 2020 medalists via Wikipedia and Wikidata (check out the readme for more details as well as this nice map)
- european_routes - Matching data from Eurostat's datasets on flights to hubs with coordinates and data on train routes (see also the step-by-step process)
- lau_centres - Population-weighted centres of local administrative units and consistent concordance with NUTS regions (website with all details)
- When Europeans go to the cinema, what do they watch? - An interactive exploration of cinema-goers' habits in Europe based on twenty years of data (1996-2016) on 40 996 films (just visuals, no dataset).
- gpx2pdf - R package and Shiny interface to create a PDF printout based on a GPX track (you know, with elevation charts and black and white maps? I made it for a very specific project and started to transform it into an R package, but it still shows that it's halfway through... perhaps still useful if somebody aims to achieve something like it)
I am happy to have contributed to some of the most used R packages. These are small contributions, but I remain nonetheless proud to be featured in the "acknowledgments" section of the release notes of tidyr (version 1.0), readr (version 2.0), and dbplyr (version 2.3.0). I have also contributed to other packages, such as workflowr, rtweet, labeleR, and wikidataR, and reported confirmed bugs in others, including arrow (one still open) and fs.
- Animating ‘One Degree Warmer’ time series with ggplot2 and gganimate - 9 November 2018
- European Elections 2019 and Italy's varying size - 11 June 2019
- How to feel lucky on a Monday morning: calculating the travel distance between places and each point of the European population grid - 27 November 2019
- How to find the population-weighted centre of local administrative units - 27 March 2020
- Beautiful Gantt charts with ggplot2 - 4 June 2020
- Google Earth Studio as a data visualisation tool (with R) - 8 October 2020
- A new R package for exploring the wealth of information stored by Wikidata: tidywikidatar - 23 April 2021
- Visualising risk: a modern implementation of the Risk Characterisation Theatre - 29 April 2021
- Finding gendered street names. A step-by-step walkthrough with R - 16 July 2021
- The data you need to win the Olympics if you go NUTS - 3 August 2021
- I maintain a reasonably updated Docker image of Omeka S
- I am one of the main translators into Italian of Omeka S and Tropy
- Text as data & data in the text - Studying conflicts in post-Soviet spaces through structured analysis of textual contents available on-line - tadadit.xyz
More about me on my website, giorgiocomai.eu