##Overview
It is envisaged that different users will start at different points of their data preparation. THis section is intended to explain the fake data I have created so the type of data used for the examples can be better understood.
There are several data files used. These are detailed below
#####TheOGDReportFinal A dataset containing fake upper GI endoscopy reports. The report field is provided as a whole report without any fields having been already extracted. There are 2000 rows
#####PathDataFrameFinal A dataset containing fake upper GI pathology reports. The report field is provided as a whole report without any fields having been already extracted. There are 2000 rows
####Pre-extracted datasets:
#####Myendo This has been extracted using the Extractor method as follows from the raw text within Mypath:
mywords <- c("OGDReportWhole","HospitalNumber","PatientName", "GeneralPractitioner","Dateofprocedure","Endoscopist", "Secondendoscopist","Medications","Instrument","ExtentofExam", "Indications","ProcedurePerformed","Findings") Extractor(TheOGDReportFinal,"OGDReportWhole",mywords)
####Mypath
This has been extracted using the Extractor method as follows from the raw text within Mypath:
mywords<-c("HospitalNumber","PatientName","DOB","GeneralPractitioner", "Dateofprocedure","ClinicalDetails","Macroscopicdescription", "Histology","Diagnosis") Extractor(PathDataFrameFinal,"PathReportWhole",mywords)
The original dataset has also been added as “PathReportWhole”,
##Colonoscopy
###Raw datasets
####ColonFinal
A dataset containing fake lower GI endoscopy reports. The report field is provided as a whole report without any fields having been already extracted. There are 2000 rows
####PathDataFrameFinalColon
A dataset containing fake lower GI pathology reports. The report field is provided as a whole report without any fields having been already extracted. There are 2000 rows.