A central goal of this project is to use an LLM to interpret images and write photo captions. The captions are then used to synthesize the day’s activities.
4.1 Setup and Initialize
The usual stuff that gets the libraries and data ready for use.
## Standard Packages
library(tidyverse)   ## Lots of useful stuff
library(gt)          ## Make Tables
library(ggplot2)     ## Create charts
library(devtools)    ## Load packages from GitHub

## Specialized libraries
library(httr)        ## Send requests and receive responses
library(jsonlite)    ## Handle the request formatting
library(pdftools)    ## Handle PDF files
library(base64enc)   ## Convert image files to base64 encoding
library(googledrive) ## Download Google Doc files
library(curl)        ## The force behind the httr functions

## Get the accessOAI package (do this just once)
## install_github("kimbridges/accessOAI")

## Initialize the accessOAI library
library(accessOAI)

## Initialize some things that rarely change.
LLM     <- "gpt-4o"
LLM_alt <- "gpt-3.5-turbo" ## (text only)
temp    <- 1
apiKey  <- Sys.getenv("OPENAI_API_KEY")

## Get Baseinfo (originates in photos chapter)
baseinfo      <- read.table("baseinfo.txt")
source        <- baseinfo$source
folder        <- baseinfo$folder
thumbs_folder <- baseinfo$thumbs_folder
files_folder  <- baseinfo$files_folder
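Every later step depends on the API key, so a quick check before the first request can save a confusing failure further on. The following is a minimal sketch, assuming the key is stored in the OPENAI_API_KEY environment variable as in the setup code. The helper `check_api_key` is hypothetical and is not part of accessOAI.

```r
## Hypothetical helper (not part of accessOAI): stop early when the
## environment variable holding the API key is missing or empty.
check_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set. ",
         "Add it to .Renviron and restart R.")
  }
  ## Return the key invisibly so it is not printed to the console.
  invisible(key)
}

## Possible usage in the setup block:
## apiKey <- check_api_key()
```

Returning the key invisibly keeps it out of the rendered output if the call is left unassigned.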
4.2 Create Photo Captions
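Here the LLM interprets each photo. Each thumbnail is sent together with its recorded location and time, and the caption that comes back is stored with the photo data.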
## Set up the request for the LLM.
prompt <- "The input file is a photo taken during a trip as part of a set of photos documenting the day's activities. Each photo attempts to record either an activity or a location or both. Please create a caption for the photo. Don't include any comments in your response. Here is the location and time of the photo: "

## Define the LLM role.
role <- "You are a photo interpretation expert. You know how to extract information like the photo location, things in the scene, the overall weather or ambiance of the scene and other relevant facts. You try to limit your responses to things in which you have confidence and avoid speculation. If provided with a time, please convert it to a 12-hour clock and use this in your response."

## Get information about the photos.
file_location <- paste0(files_folder, "/photo_info.txt")
data <- read.table(file = file_location)

## Initialize the new element in the data frame.
data$caption <- ""

## Find out how many photos.
n_photos <- nrow(data)

## Get photo directory
working_dir <- getwd()

## Loop through the photos.
for (i in 1:n_photos) {

  ## Build the photo name.
  photo <- paste0(working_dir, "/", thumbs_folder, "/photo_", i, ".png")

  ## Build information to be used in the prompt.
  hint <- paste0("location: ", data$location[i], " time: ", data$time[i])

  ## Enhance the prompt.
  full_prompt <- paste0(prompt, hint)

  ## Reset the response.
  response <- NULL

  ## Do the image analysis.
  response <- analyzeIMG(analysis_image  = photo,
                         AI_role         = role,
                         analysis_prompt = full_prompt,
                         LLM             = LLM,
                         apiKey          = apiKey,
                         connecttimeout  = 90)

  ## Watch progress.
  ## cat("photo", i, ":", response, "\n\n")

  ## Add the caption to the data.
  data$caption[i] <- response

  ## Pause three seconds.
  Sys.sleep(3)

} ## end photo loop

## Save another temporary data file.
file_location <- paste0(files_folder, "/photo_info.txt")
write.table(data, file = file_location)
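The loop makes one network request per photo, so a single timeout could interrupt the whole run. The sketch below shows one way to retry a failed call. It assumes that a failed request raises an ordinary R error; how analyzeIMG actually reports failures is not documented here. The names `with_retry` and `make_flaky` are hypothetical and are not part of accessOAI.

```r
## Hypothetical helper (not part of accessOAI): call `fn` until it
## succeeds or `max_tries` attempts are used, pausing between attempts.
with_retry <- function(fn, max_tries = 3, pause_sec = 3) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(pause_sec)
  }
  ## Every attempt failed: return NA so the loop can carry on.
  NA_character_
}

## Stand-in for a request that fails a set number of times, then works.
make_flaky <- function(fail_times) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls <= fail_times) stop("simulated timeout")
    "A caption"
  }
}

## Fails twice, succeeds on the third attempt.
caption <- with_retry(make_flaky(2), max_tries = 3, pause_sec = 0)
print(caption)
```

Inside the photo loop, the call could be wrapped as `with_retry(function() analyzeIMG(...))` with the same arguments as before. A photo that still fails would then get an `NA` caption instead of stopping the run.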
4.3 Captions Table
The captions can be seen in a table.
## Get the data.
file_location <- paste0(files_folder, "/photo_info.txt")
data <- read.table(file = file_location)

## Extract the photo number and the caption.
cap_data <- data |> select(number, caption)

## Make the table.
gt(cap_data) |>
  tab_style(
    style = "vertical-align:top",
    locations = cells_body()
  )
| number | caption |
|---|---|
| 1 | Enjoying a delicious Belgian waffle for breakfast at Pelikaanstraat, Antwerpen. |
| 2 | Boarding the train at Tuinstraat, ready for a day of adventures in Delft at 11:57 AM. |
| 3 | Exploring the architectural beauty of Den Haag Centraal Station at 12:19 PM. |
| 4 | Exploring Gravenstraat in Den Haag at 12:40 PM, featuring vibrant street activities and a passing tram. |
| 5 | Enjoying a leisurely lunch with wine in Den Haag at 12:55 PM. |
| 6 | Admiring the Girl with a Pearl Earring at the Mauritshuis Gallery, 15:15. |
| 7 | Exploring art at the museum in Den Haag, Netherlands at 3:18 PM. |
| 8 | Exploring the "Girl with a Pearl Earring" exhibition at the Mauritshuis Museum in Den Haag, Netherlands. |
| 9 | Visiting the historic Mauritshuis museum in The Hague at 4:10 PM. |
| 10 | Exploring the Binnenhof courtyard in The Hague under overcast skies at 4:12 PM. |
| 11 | Exploring the charming streets of Den Haag with its historic buildings and cozy cafes on a rainy afternoon. |
| 12 | Enjoying a cozy evening snack at Denneweg 130 in Den Haag. |
| 13 | Enjoying a cozy evening with a glass of Chakana wine at Denneweg 130, Den Haag. |
| 14 | Enjoying a delightful dinner at Denneweg 130, Den Haag around 6:25 PM. |
| 15 | Enjoying a beautifully plated dessert at Denneweg 130, Den Haag during the evening. |
| 16 | "Evening stroll past Dekxels restaurant on Denneweg, Den Haag – 7:49 PM on Friday, May 3, 2024." |
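Caption 16 came back wrapped in quotation marks while the others did not. If a uniform style matters, the captions could be tidied before they are saved. This is a minimal sketch using base R only; `clean_caption` is a hypothetical helper and is not part of accessOAI.

```r
## Hypothetical helper: trim whitespace and drop one pair of straight
## double quotes, but only when they wrap the entire caption.
clean_caption <- function(x) {
  x <- trimws(x)
  sub('^"(.*)"$', "\\1", x)
}

## A fully quoted caption loses its outer quotes.
clean_caption('"Evening stroll past Dekxels restaurant on Denneweg, Den Haag"')

## Quotes around a title inside the caption are left alone.
clean_caption('Exploring the "Girl with a Pearl Earring" exhibition')
```

The function is vectorized, so it could be applied to the whole column in one step, for example `data$caption <- clean_caption(data$caption)`. A caption that both starts and ends with a quoted phrase would also lose those quotes, so the result is worth a quick look.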
4.4 Synthesize the Captions
Here, the set of captions and data on the movement (time and distance) are put into a synthesized story about the day.
The result is shown in the Synthesis Chapter.
## Get the data.
file_location <- paste0(files_folder, "/photo_info.txt")
data <- read.table(file = file_location)

## Figure out the number of entries.
n_rows <- nrow(data)

## Initialize the data.
text_input <- ""

## Create a table with structured info for each photo.
for (i in 1:n_rows) {

  line1 <- paste("Photo number", i)
  line2 <- paste("Location:", data$location[i])
  line3 <- paste("Meters moved:", data$dist_meters[i])
  line4 <- paste("Minutes since last move:", data$timespan[i])
  line5 <- paste("Photo caption:", data$caption[i])
  line6 <- " "

  ## Add the current row to the table.
  text_input <- c(text_input, line1, line2, line3, line4, line5, line6)

} ## End for loop on data

## Check progress.
## text_input

prompt <- "Create a description of the day's events based on the information in the file."

role <- "You are an expert at inferring daily activities of a person traveling from a log of their photos. The log has information for each photo including the location at which the photo was taken, the distance (in meters) moved from the previous photo location, the time (in minutes) between this and the previous photo, and the photo caption. You know how days are structured with meals and activities and you can aggregate similar events. You do not make up activities or events that are not part of the set of photos."

## Do the analysis.
response <- analyzeTXT(analysis_text   = text_input,
                       AI_role         = role,
                       analysis_prompt = prompt,
                       LLM             = LLM,
                       apiKey          = apiKey,
                       connecttimeout  = 90)

## Check on progress.
## cat(response)

## Save the results.
file_location <- paste0("full_story.txt")
write.table(response, file = file_location)
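To see what the LLM receives, it can help to build the photo log for a small example and print it. The sketch below assembles the same six lines per photo without an explicit loop. The function `build_photo_log` is hypothetical, and the two demo rows are invented for illustration; they are not taken from the trip data.

```r
## Hypothetical helper: build the six-line log entry for every photo.
## Assumes columns location, dist_meters, timespan and caption.
build_photo_log <- function(data) {
  if (nrow(data) == 0) return(character(0))
  lines <- rbind(
    paste("Photo number", seq_len(nrow(data))),
    paste("Location:", data$location),
    paste("Meters moved:", data$dist_meters),
    paste("Minutes since last move:", data$timespan),
    paste("Photo caption:", data$caption),
    " "
  )
  ## Each column holds one photo; reading column by column keeps
  ## the six lines of a photo together.
  as.vector(lines)
}

## Invented demo rows, for illustration only.
demo <- data.frame(
  location    = c("Station", "Museum"),
  dist_meters = c(0, 120),
  timespan    = c(0, 15),
  caption     = c("Arriving by train.", "Visiting the museum."),
  stringsAsFactors = FALSE
)

cat(build_photo_log(demo), sep = "\n")
```

Unlike the loop version, the result does not start with an empty first element. Whether analyzeTXT treats a character vector differently from a single string is not documented here. If a single string were needed, `paste(build_photo_log(data), collapse = "\n")` would produce one.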