LAB

What


As my knowledge of data science has grown, I've kept coming up with new ways to satisfy my curiosity and test my intuitions. For example: which are the most important words in Dante's Divine Comedy, or how many times was a newborn baby fed in her first months of life?
This lab was created to show some examples of how curiosity and data science can transform information into knowledge.

Enjoy and have fun!

2021

Vaccine location Distance

March 2021


In my job I work with billions of location data points, so my team and I need a way to run analyses and present them.
I discovered Kepler.Gl, a powerful open source geospatial analysis tool for large-scale data sets, and I really fell in love with it.
It integrates fully with Python and Jupyter, so we can manipulate, analyze and aggregate millions of records and save the result as an HTML file, all within a Jupyter notebook.
Then I discovered that the developers of Kepler.Gl created Unfolded.ai. Since we are evaluating many map providers for presentations and analysis, and my team loves Kepler, I decided to try Unfolded Studio with a project based on open data about vaccination sites in Italy.
First I covered the whole of Italy with hexagons with an edge length of 65 m, then I calculated the distance between each hexagon and the closest hospital that can administer vaccines. I took these sites from the Italian government's GitHub and enriched them with latitude and longitude through Google's APIs.
Unfortunately, the GitHub list doesn't include every site that can vaccinate: my wife, who is a doctor, is vaccinating in 2 hospitals that are not on that list...
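The distance step can be sketched roughly as follows. This is only an illustration, not the notebook I actually used: it assumes each hexagon is represented by the (latitude, longitude) of its center, it uses the haversine great-circle formula, and the function names and sample coordinates (Rome and Milan city centers, not real vaccination sites) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site_km(cell_center, sites):
    """Distance from one hexagon center to the closest site (brute force)."""
    lat, lon = cell_center
    return min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in sites)


# Hypothetical sample data: city centers standing in for vaccination sites.
sites = [(41.9028, 12.4964), (45.4642, 9.1900)]
cell_center = (43.7696, 11.2558)  # a hexagon center placed in Florence
print(round(nearest_site_km(cell_center, sites), 1), "km to the closest site")
```

A brute-force scan like this is fine for a handful of sites; with millions of hexagons a spatial index would probably be the more sensible choice.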

Sometimes it takes a few minutes to show up...
Wait for it... (cit)
... or click here

Text Mining Divina Commedia

September 2021


The Divina Commedia is widely considered to be the pre-eminent work in Italian literature and one of the greatest works of world literature.
I have always been impressed by how important the numbers are for Dante:
- 3 cantiche with 33 canti each (plus an introductory canto in the Inferno, for 100 canti in total).
- 14233 verses of 11 syllables each.
- Each tercet therefore has 3 × 11 = 33 syllables in total.
So I wanted to check some KPIs and explore new ones.
Moreover, I've created a tool that generates a summary once you choose a cantica and a canto. I asked a friend of mine to try it and the game began:
I send her a text and she guesses the cantica and canto, so I also created a quiz area.
Have fun, and text me with suggestions for new analyses or if you discover something using the dashboard.

2020

Audio file to text

Quarantine 2020


During the COVID-19 quarantine, I heard my wife talking with her colleagues about a recorded lesson that she then had to transcribe.
The lesson lasted 2 hours, so I decided to try converting the audio into text in order to help her and her colleagues.
I had never tried the Google Speech APIs and I didn't know about the 15-minute limit.
So I built a Python loop that splits the audio into blocks of 200 seconds and then gathers all the pieces.
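The loop looked more or less like the sketch below. It is a simplified reconstruction, not the original script: the actual call to the speech service is replaced by a hypothetical `transcribe(start, end)` callable that you would have to supply, and only the splitting and gathering logic is shown.

```python
def chunk_bounds(total_seconds, chunk_seconds=200):
    """Return (start, end) second offsets covering the whole recording."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [
        (start, min(start + chunk_seconds, total_seconds))
        for start in range(0, total_seconds, chunk_seconds)
    ]


def transcribe_in_chunks(total_seconds, transcribe, chunk_seconds=200):
    """Transcribe each block in order and join the pieces into one text.

    `transcribe` is a hypothetical callable taking (start, end) in seconds
    and returning the text for that slice of audio.
    """
    pieces = [transcribe(start, end) for start, end in chunk_bounds(total_seconds, chunk_seconds)]
    return " ".join(piece.strip() for piece in pieces if piece.strip())


# A 2-hour lesson (7200 s) split into 200 s blocks.
blocks = chunk_bounds(2 * 60 * 60)
print(len(blocks), "blocks, first:", blocks[0], "last:", blocks[-1])
```

Cutting at fixed offsets can split a word in half at a block boundary, so the joined text may need a quick manual check around those points.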

2019

HACKATHON TOPIC MODELING

June 2019


In June 2019 I attended a hackathon with 4 friends of mine. The hackathon was about document modelling and consisted of 3 tasks:
- Assign each document (~700 in total) to one of the 6 available topics
- Create an unsupervised model to create links between documents
- Design and build a UI to explore and use the analysis
The documents were handed out at 8 a.m. and the deadline was 7 p.m., so we didn't have much time. In the previous weeks we prepared as best we could: we wrote many scripts beforehand and set up Spark environments in case there was Big Data. We discovered the amazing Tika from Apache, which we used to parse the data, and it changed everything! Then, thanks to LDA, we assigned a topic to each document, and with cosine similarities we created the network. The main attraction was a Telegram bot that applied the classification model to text written by users. You can see it by clicking the image below.
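The network step can be illustrated with a small sketch. It is not the hackathon code: it assumes each document has already been turned into a numeric vector (for example its LDA topic weights), and the document ids, vectors, threshold and function names below are made up for the example.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def build_links(doc_vectors, threshold=0.8):
    """Return (doc_a, doc_b, similarity) for every pair above the threshold."""
    links = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(doc_vectors.items(), 2):
        similarity = cosine_similarity(vec_a, vec_b)
        if similarity >= threshold:
            links.append((id_a, id_b, similarity))
    return links


# Hypothetical topic weights for three documents over three topics.
doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.8, 0.2, 0.0],
    "doc_c": [0.0, 0.1, 0.9],
}
for id_a, id_b, similarity in build_links(doc_vectors):
    print(id_a, "<->", id_b, round(similarity, 2))
```

With ~700 documents the pairwise comparison is still cheap (about 245,000 pairs), which is why a plain double loop is enough here.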

MARGHERITA'S BREASTFEEDING

Last part of 2019


My wife kept notes every time she fed our little daughter (born on the 2nd of September 2019). And I thought: texts are data, data means information, and information needs a dashboard. So I stole her notes to create a dashboard.
On some days data was missing because of data entry errors… I should consider firing her…
P.S. When I showed my wife my work, she stopped taking notes… -.-