#TidyTuesday is a weekly data project from the R community, aimed at developing skills in the tidyverse ecosystem. Tidier.jl aims to bring the tidyverse ecosystem to Julia!
I’ll be processing and visualising some of the #TidyTuesday data sets in Julia here!
2023/05/02: The Portal Project
The Portal Project is a long-term ecological research site studying the dynamics of desert rodents, plants, ants and weather in Arizona. This chart shows the number of Merriam’s kangaroo rats in 8 different plots. Plots 3, 15, 19, and 21 are exclosure plots whilst the rest are control plots. Merriam’s kangaroo rat has been surveyed the most often, especially in control plots.
2023/04/25: London Marathon
Since the first London Marathon in 1981, the number of people applying for a place in the race has increased drastically, especially in recent years. Over 450,000 people applied for the 2020 race, which ended up taking place with only elite athletes due to Covid-19.
```julia
using Tidier
using UrlDownload
using DataFrames
using AlgebraOfGraphics, CairoMakie

# Download the London Marathon data set and load it into a DataFrame
london_marathon = urldownload("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-25/london_marathon.csv") |> DataFrame;

# Keep the two columns to plot and drop rows where Applicants is the string "NA"
plot_data = @chain london_marathon begin
    @select(Year, Applicants)
    @filter(Applicants != "NA")
end

# Applicants is read in as strings, so convert it to integers
plot_data[!, :Applicants] = [parse(Int, x) for x in plot_data[!, :Applicants]]

# Thin black bars with a red dot on top of each
xy1 = data(plot_data) * mapping(:Year, :Applicants) * visual(BarPlot, color=:black, width=0.1)
xy2 = data(plot_data) * mapping(:Year, :Applicants) * visual(Scatter, color="#e00601")

with_theme(theme_ggplot2()) do
    draw(xy1 + xy2; axis=(; title="London Marathon", ylabel="Number of applicants", xlabel=""))
end
```
2023/04/18: Neolithic Founder Crops
Eight founder crops — emmer wheat, einkorn wheat, barley, lentil, pea, chickpea, bitter vetch, and flax — have long been thought to have been the bedrock of Neolithic economies. The world map below shows site locations considered in the Origins of Agriculture database, with sites highlighted based on their highest proportion of crops from different categories shown in the magnified versions on the right.
The line chart shows the production (in millions) of cage-free organic eggs in the USA. The data used in this infographic is based on reports produced by the United States Department of Agriculture, which are published weekly or monthly.
2023/03/28: Time Zones
Time zones tend to follow the boundaries between countries and their subdivisions instead of strictly following longitude. For every hour that passes, a point on the Earth rotates through 15 degrees of longitude. Each point on the map relates to one of the 337 time zones listed in the IANA time zone database. The colours show which time zones are in the Africa, America, Antarctica, Asia, Atlantic, Australia, Europe, Indian, and Pacific zones.
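The 15 degrees figure comes from dividing a full rotation of 360 degrees by 24 hours. Here is a minimal sketch of that arithmetic in plain Julia. `nominal_offset` is a hypothetical helper, not part of the code on this page, and it returns the offset a purely longitude-based zone would have. Real zones follow borders, so actual offsets often differ.

```julia
# Degrees of longitude the Earth rotates through in one hour
const DEGREES_PER_HOUR = 360 / 24

# Hypothetical helper: nominal UTC offset in whole hours for a longitude
# given in degrees (east positive). Political boundaries are ignored.
nominal_offset(longitude) = round(Int, longitude / DEGREES_PER_HOUR)

nominal_offset(139.7)   # roughly Tokyo's longitude
nominal_offset(-74.0)   # roughly New York's longitude
```

With these inputs the helper gives 9 and -5, which match the standard offsets for those two cities, but that agreement is not guaranteed elsewhere.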
```julia
using UrlDownload
using DataFrames
using GeoMakie, CairoMakie
using Colors
using GLMakie

# Download the time zone data set and load it into a DataFrame
timezones = urldownload("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-03-28/timezones.csv") |> DataFrame;

lons = -180:180
lats = -90:90

fig = Figure()
ax = GeoAxis(fig[1,1], title = "Time Zones of the World")

# Draw country outlines from a GeoJSON file as the base map
using GeoMakie.GeoJSON
countries_file = download("https://datahub.io/core/geo-countries/r/countries.geojson")
countries = GeoJSON.read(read(countries_file, String))
poly!(ax, countries; strokecolor = "#2F4F4F", strokewidth = 0.5, color="#b2cfcf")

# Plot one point per time zone at its longitude and latitude
slons = timezones[:, "longitude"]
slats = timezones[:, "latitude"]
scatter!(ax, slons, slats, color="#E30B5C", markersize=10)
fig
```
2023/03/21: Programming Languages
Of the 4,303 programming languages listed in the Programming Language DataBase, 205 use //, 101 use #, and 64 use ; to define which lines are comments. 3,831 languages do not have a comment token listed. The plots below show when a language first appeared, and when its last activity was.
2023/03/14: European Drug Development
The European Medicines Agency (EMA) is the official regulator that directs drug development for both humans and animals, and decides whether or not to authorise the marketing of a new drug in Europe. Medicines for dogs are being authorised at a faster rate than those for other animals, including pigs, cats, and chickens.
```julia
using Tidier
using UrlDownload
using DataFrames
using PyPlot

# Download the drugs data set and load it into a DataFrame
drugs = urldownload("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-03-14/drugs.csv") |> DataFrame;

# Count authorised drugs for five selected therapeutic areas
plot_data = @chain drugs begin
    @select(therapeutic_area, authorisation_status)
    @filter(therapeutic_area in ["Epilepsy", "HIV Infections", "Parkinson Disease", "Diabetes Mellitus", "Pulmonary Disease, Chronic Obstructive"])
    @filter(authorisation_status == "authorised")
    @group_by(therapeutic_area)
    @summarize(n = nrow())
    @ungroup
    @arrange(n)
end

# Horizontal bar chart, one bar per therapeutic area
barh(plot_data[:, :therapeutic_area], plot_data[:, :n], color="#508080", align="center", alpha=0.5)
suptitle("European Drug Development");
title("Number of drugs authorised for use in treatment of each condition.");
xlabel("Number of authorisations")
grid("on")
```
2023/03/07: Numbats
Numbats are small, distinctively-striped, insectivorous marsupials found in Australia. The species was once widespread across southern Australia, but is now restricted to several small colonies in Western Australia. They are therefore considered an endangered species. The calendar below shows the number of sightings of numbats per day between 2016 and 2022, using data from the Atlas of Living Australia. The full dataset includes data from 1856 to 2023 and, of the 805 observations, only 552 had dates recorded. Therefore the calendar may not reflect all numbat sightings.
Over 100,000 tweets in 14 different African languages were analysed to uncover the sentiment of the text. Sentiment analysis was performed and each tweet was labelled as either positive, negative, or neutral. Nigerian pidgin is particularly notable for its very few neutral tweets.