This theme is about re-imagining how we do field research. We’re targeting the steps that most often interrupt or slow the work: data collection, data entry, data analysis, and chart creation.
These changes are made possible by a Large Language Model (LLM).
You can see this with a multi-step example.
A New Way
Imagine you’re in a remote village. A research interest in rural healthcare has brought you here. An outdoor clinic has patients waiting on chairs and benches. You and a local colleague stand nearby but out of hearing range of the patients.
You take out your phone and start the Google Docs app. You’re wearing earbuds, so when you press the Docs microphone icon, your words are transcribed as you talk.
Voice entry is a powerful tool. AI-based speech-to-text engines do a good job of capturing your words.
Here’s what appears on the phone’s screen when you’re done.
I’m at the Village clinic. This is my fifth day here. As I look around the waiting room I see an old woman sitting by herself. There is a middle-age woman, also by herself. There is a man with his young daughter. He’s trying to keep her calm, so I’m assuming the daughter is the patient. An older man is with his wife. It looks like the wife has brought her husband for treatment. There is a middle age woman with her son. The boy has his hand wrapped in cloth so I expect that he is the patient. There is a single elderly man sitting by himself.
On returning to your hotel, you open Anthropic’s Claude app on your laptop computer and drag and drop the file with the observations from the clinic.
You type: Can you draw inferences and find commonalities from this text?
Claude responds.
Certainly, I can draw some inferences and find commonalities from the text provided. Here are some observations:
Setting: The scene takes place in a village clinic waiting room, suggesting a rural or small community healthcare setting.
Duration: The narrator mentions it’s their fifth day at the clinic, implying they are either working there or have been visiting regularly for some reason.
Patient demographics:
There’s a mix of ages represented: elderly, middle-aged, and young patients.
Both genders are present: men, women, and children.
There are single individuals as well as family groups.
Family support: Several patients are accompanied by family members, suggesting a strong support system within the community. Examples include a father with his daughter, a wife with her husband, and a mother with her son.
Probable medical issues:
The young girl seems distressed, requiring her father to keep her calm.
The boy has a wrapped hand, indicating a potential injury.
The elderly patients might be there for age-related health issues or regular check-ups.
Waiting room atmosphere: The scene appears relatively quiet and orderly. People are described as sitting, suggesting a calm environment.
Community dynamics: The presence of various age groups and family units gives a snapshot of the village’s population and social structure.
Healthcare access: The existence of a village clinic suggests that basic healthcare is available to the community, though it may be limited compared to urban areas.
Commonalities:
Need for medical care across all age groups
Family involvement in healthcare
Patience (everyone is waiting their turn)
Reliance on local healthcare facilities
These observations paint a picture of a typical day in a village clinic, showing how healthcare needs bring together different members of the community and how family support plays a role in seeking medical attention.
If you’re like me, this was just a “wow!” moment.
This first step is valuable. The report reinforces what you remember seeing. What if we ask for more?
The Next Steps
The LLM can do more. Indeed, we should ask it to continue.
Prompt
Can you structure the patient data into categories so that it can be analyzed?
Response
Certainly, I can structure the patient data into categories for analysis. Let’s organize the information into a more structured format:
Village Clinic Patient Data (Day 5)
Patient Categories
Elderly Patients
Old woman (alone)
Elderly man (with wife)
Single elderly man
Middle-Aged Patients
Middle-aged woman (alone)
Middle-aged woman (with son)
Adult Patients
Man (with young daughter)
Child Patients
Young daughter (with father)
Boy (with mother)
Accompanying Persons
Self (unaccompanied):
Old woman
Middle-aged woman
Single elderly man
Family member:
Wife (with elderly husband)
Father (with young daughter)
Mother (with son)
Observed Health Issues
Young girl: Distressed, needs calming
Boy: Hand wrapped in cloth, possible injury
Others: No visible symptoms noted
Patient Groupings
Single patients: 3
Family groups: 3
Gender Distribution
Female patients: 3
Male patients: 4
Undetermined: 1 (elderly patient with spouse, gender not specified)
Age Distribution
Elderly: 3
Middle-aged: 2
Adult: 1
Children: 2
Additional Notes
Total observed individuals: 8
Setting: Village clinic waiting room
Observation day: Day 5
I’ve created a structured format for the patient data, categorizing it based on various factors such as age groups, accompanying persons, observed health issues, patient groupings, gender distribution, and age distribution. This structure allows for easier analysis and comparison of the data. Here are some key points from this structured data:
There’s a fairly even distribution across age groups, with a slight skew towards elderly patients.
Most patients are accompanied by family members, indicating strong family support in healthcare.
There are more male patients than female patients observed on this day.
Only two patients have noticeable health issues (the distressed girl and the boy with a wrapped hand).
There’s an even split between single patients and family groups.
This structured format makes it easier to identify patterns, track changes over time, or compare with data from other days or clinics. Would you like me to elaborate on any specific category or draw any further conclusions from this structured data?
Prompt
Can you make a table with the patient data using the kinds of categories you’ve observed in the data?
Response
Certainly, I can create a table with the patient data using the categories observed in the data.
| Patient Description | Age Group | Gender | Accompanied By | Observed Health Issue | Patient Grouping |
|---|---|---|---|---|---|
| Old woman | Elderly | Female | None | None noted | Single |
| Middle-aged woman | Middle-aged | Female | None | None noted | Single |
| Young daughter | Child | Female | Father | Distressed | Family group |
| Elderly man | Elderly | Male | Wife | None noted | Family group |
| Middle-aged woman | Middle-aged | Female | None | None noted | Family group |
| Son | Child | Male | Mother | Hand wrapped in cloth | Family group |
| Elderly man | Elderly | Male | None | None noted | Single |
| Man (father) | Adult | Male | Daughter | None noted | Family group |
Note: The “Patient Description” column includes both patients and accompanying family members to provide a complete picture of all individuals mentioned in the waiting room.
I’ve created a table that organizes the patient data into the following categories:
Patient Description
Age Group
Gender
Accompanied By
Observed Health Issue
Patient Grouping
This table provides a clear and concise overview of the patient data observed in the village clinic. It allows for easy comparison across different categories and can be useful for identifying patterns or trends.
Some observations from this table:
There’s a mix of age groups, with representation from elderly, middle-aged, adult, and child patients.
Gender distribution is relatively balanced.
Most patients are either unaccompanied or accompanied by a single family member.
Only two patients have noted health issues.
There’s an even split between single patients and those in family groups.
Would you like me to perform any specific analysis on this data or make any modifications to the table?
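Summary statements like these are easy to check once the table is in a data frame. The following is a minimal sketch in base R, not part of the original session: it re-enters the eight table rows by hand and counts each category so the counts can be compared with the LLM's remarks. The column names `age_group`, `gender`, and `grouping` are illustrative shorthand chosen for this sketch.

```r
# Hypothetical re-entry of the eight rows from the table above.
# Column names are illustrative shorthand, not part of the original session.
patients <- data.frame(
  description = c("Old woman", "Middle-aged woman", "Young daughter",
                  "Elderly man", "Middle-aged woman", "Son",
                  "Elderly man", "Man (father)"),
  age_group   = c("Elderly", "Middle-aged", "Child", "Elderly",
                  "Middle-aged", "Child", "Elderly", "Adult"),
  gender      = c("Female", "Female", "Female", "Male",
                  "Female", "Male", "Male", "Male"),
  grouping    = c("Single", "Single", "Family group", "Family group",
                  "Family group", "Family group", "Single", "Family group"),
  stringsAsFactors = FALSE
)

# Count the rows in each category
age_counts      <- table(patients$age_group)
gender_counts   <- table(patients$gender)
grouping_counts <- table(patients$grouping)

print(age_counts)
print(gender_counts)
print(grouping_counts)
```

In this table, three rows are marked Single and five are marked Family group, because the rows list individuals rather than groups. Counting the rows directly is a quick way to see what a phrase such as "even split" is actually measuring.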
Shifting to R
Doing analyses in the R language is an important step. This is where you create the code that’s a critical part of the Reproducible Document strategy. While it’s nice to get early data reports from the LLM, the “real” analysis needs to be done in an environment like R.
A request to Claude will get us there.
Note the following:
The R code is revealed by clicking on “Show the code”.
Requests can be made to modify the R code so it better serves the analysis requirements.
Prompt
Can you make a CSV file of this? Then, I’d like the R code so I can see the Age Group as a bar chart using this CSV file.
Response
Certainly! I’ll create a CSV file with the patient data and then provide you with R code to create a bar chart of the Age Group distribution.
First, let’s create the CSV file:
Patient Description,Age Group,Gender,Accompanied By,Observed Health Issue,Patient Grouping
Old woman,Elderly,Female,None,None noted,Single
Middle-aged woman,Middle-aged,Female,None,None noted,Single
Young daughter,Child,Female,Father,Distressed,Family group
Elderly man,Elderly,Male,Wife,None noted,Family group
Middle-aged woman,Middle-aged,Female,None,None noted,Family group
Son,Child,Male,Mother,Hand wrapped in cloth,Family group
Elderly man,Elderly,Male,None,None noted,Single
Man (father),Adult,Male,Daughter,None noted,Family group
These lines were placed in a CSV file named “village_clinic_patient_data.csv”.
Now, let’s create the R code to read this CSV file and create a bar chart of the Age Group distribution:
Show the code
# Load required libraries
library(readr)
library(ggplot2)

# Read the CSV file
data <- read_csv("village_clinic_patient_data.csv")

# Create a frequency table of Age Groups
age_group_freq <- table(data$`Age Group`)

# Convert the frequency table to a data frame
age_group_df <- as.data.frame(age_group_freq)
colnames(age_group_df) <- c("Age_Group", "Frequency")

# Create the bar chart
ggplot(age_group_df, aes(x = Age_Group, y = Frequency, fill = Age_Group)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Distribution of Age Groups in Village Clinic",
       x = "Age Group",
       y = "Frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Save the plot (optional)
# ggsave("age_group_distribution.png", width = 8, height = 6)
To use this R code:
Save the CSV data to a file named “village_clinic_patient_data.csv” in your R working directory.
Run the R code in your R environment (e.g., RStudio).
This code will:
Read the CSV file.
Create a frequency table of Age Groups.
Generate a bar chart using ggplot2, showing the distribution of Age Groups.
The resulting chart will display:
X-axis: Age Groups (Elderly, Middle-aged, Adult, Child)
Y-axis: Frequency (count of patients in each age group)
Each age group will be represented by a bar with a different color
You can uncomment the last line (ggsave) if you want to save the plot as an image file.
This visualization will help you quickly see the distribution of age groups among the patients in the village clinic. Is there anything else you’d like me to add or modify in the code or data?
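Because the CSV was produced by the LLM and pasted by hand, it can be worth checking the file before trusting the chart. The sketch below is one way to do that in base R; it is an illustration added here, not something from the original session. It writes the same nine lines to a temporary file so the example runs on its own, and the names `csv_path`, `clinic`, `expected_cols`, and `expected_ages` are assumptions made for the sketch.

```r
# Sketch: sanity-check the CSV before charting it.
# The data is written to a temporary file so the example is self-contained;
# in practice csv_path would point at village_clinic_patient_data.csv.
csv_lines <- c(
  "Patient Description,Age Group,Gender,Accompanied By,Observed Health Issue,Patient Grouping",
  "Old woman,Elderly,Female,None,None noted,Single",
  "Middle-aged woman,Middle-aged,Female,None,None noted,Single",
  "Young daughter,Child,Female,Father,Distressed,Family group",
  "Elderly man,Elderly,Male,Wife,None noted,Family group",
  "Middle-aged woman,Middle-aged,Female,None,None noted,Family group",
  "Son,Child,Male,Mother,Hand wrapped in cloth,Family group",
  "Elderly man,Elderly,Male,None,None noted,Single",
  "Man (father),Adult,Male,Daughter,None noted,Family group"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# check.names = FALSE keeps headers such as "Age Group" unchanged
clinic <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

expected_cols <- c("Patient Description", "Age Group", "Gender",
                   "Accompanied By", "Observed Health Issue",
                   "Patient Grouping")
expected_ages <- c("Child", "Adult", "Middle-aged", "Elderly")

# Stop with an error if the file does not have the expected shape
stopifnot(
  identical(names(clinic), expected_cols),       # six headers, in order
  nrow(clinic) == 8,                             # one row per table row
  all(clinic[["Age Group"]] %in% expected_ages)  # no unexpected labels
)
```

If a row were dropped or a label misspelled during copy and paste, `stopifnot()` would halt with an error instead of letting the chart silently show the wrong counts.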
Prompt
Can you modify the R code in the following ways. Remove the legend. Make all the bars the same color, such as lightblue. Make the order of the bars “Child”, “Adult”, “Middle-aged”, and “Elderly”.
Response
Certainly! I’ll modify the R code to incorporate your requested changes. I’ll update the existing artifact with the new code.
Show the code
# Load required libraries
library(readr)
library(ggplot2)

# Read the CSV file
data <- read_csv("village_clinic_patient_data.csv")

# Create a frequency table of Age Groups
age_group_freq <- table(data$`Age Group`)

# Convert the frequency table to a data frame
age_group_df <- as.data.frame(age_group_freq)
colnames(age_group_df) <- c("Age_Group", "Frequency")

# Define the desired order of age groups
age_group_order <- c("Child", "Adult", "Middle-aged", "Elderly")

# Convert Age_Group to a factor with the specified order
age_group_df$Age_Group <- factor(age_group_df$Age_Group, levels = age_group_order)

# Create the bar chart
ggplot(age_group_df, aes(x = Age_Group, y = Frequency)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  theme_minimal() +
  labs(title = "Distribution of Age Groups in Village Clinic",
       x = "Age Group",
       y = "Frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_discrete(limits = age_group_order)

# Save the plot (optional)
# ggsave("age_group_distribution.png", width = 8, height = 6)
I’ve made the following modifications to the R code:
Removed the legend by eliminating the fill = Age_Group aesthetic from the aes() function.
Set all bars to the same color (blue) by adding fill = "lightblue" to the geom_bar() function.
Created a custom order for the age groups: “Child”, “Adult”, “Middle-aged”, and “Elderly”.
Converted the Age_Group column to a factor with the specified order.
Used scale_x_discrete(limits = age_group_order) to ensure the x-axis follows the desired order.
This updated code will create a bar chart with the following characteristics:
All bars will be lightblue.
There will be no legend.
The order of the bars on the x-axis will be “Child”, “Adult”, “Middle-aged”, and “Elderly” (from left to right).
The y-axis will show the frequency (count) of patients in each age group.
To use this code:
Ensure you have the “village_clinic_patient_data.csv” file in your R working directory.
Run the code in your R environment.
The resulting plot will reflect the changes you requested, providing a clear and ordered visualization of the age group distribution in the village clinic.
Is there anything else you’d like me to modify or explain about this code?
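For comparison, here is a more compact variant, offered as a sketch and not as the code Claude produced. It assumes the same age-group values as the "Age Group" column above, entered directly so that it runs without the CSV file. It relies on `geom_bar()` counting rows itself, so the separate frequency table is not needed, and it uses the factor levels alone to fix the bar order.

```r
library(ggplot2)

# Age groups entered directly so the sketch runs without the CSV file;
# the values mirror the "Age Group" column of the patient data.
patients <- data.frame(
  age_group = c("Elderly", "Middle-aged", "Child", "Elderly",
                "Middle-aged", "Child", "Elderly", "Adult")
)

# Fixing the factor levels sets the left-to-right order of the bars
age_levels <- c("Child", "Adult", "Middle-aged", "Elderly")
patients$age_group <- factor(patients$age_group, levels = age_levels)

# geom_bar() counts the rows per level; a constant fill produces no legend
age_plot <- ggplot(patients, aes(x = age_group)) +
  geom_bar(fill = "lightblue") +
  theme_minimal() +
  labs(title = "Distribution of Age Groups in Village Clinic",
       x = "Age Group",
       y = "Frequency")

print(age_plot)
```

Either version should give the same chart. The longer one makes each step explicit, which can be easier to audit when the code comes from an LLM.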
But Wait, There’s More …
So far, the use of the LLM has shown the following:
Data can be input using a natural language dialog.
The LLM can interpret and summarize data.
R code can be generated with simple English commands.
What’s possible, but not yet shown, is using saved information to simplify the interaction with the LLM further and to better customize its responses.