Methodology

Data Selection

The data used in this study are from GSE30165, a microarray dataset in the Gene Expression Omnibus (GEO) that focuses on differentially expressed genes in the proximal nerve stumps at 0.5 h, 1 h, 3 h, 6 h, 9 h, 1 d, 4 d, 7 d, and 14 d post-injury. The dataset contains gene expression profiles from dorsal root ganglia (DRG) and sciatic nerve (SN) tissues.

Raw Data Retrieval

Raw signal intensity files were downloaded from the GEO repository and extracted using R. Files were organized by tissue type and time point so that each sample could be matched correctly to its metadata.

Preprocessing and Normalization

The raw microarray data underwent background correction, quantile normalization, and log₂ transformation to reduce technical variation and stabilize variance. Quality control was performed using distribution plots and principal component analysis (PCA) to detect outliers and potential batch effects.
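The normalization and transformation steps can be illustrated with a minimal sketch. It is written in Python for illustration only (the study's processing was done in R), and it assumes a background-corrected probes-by-samples matrix with no tied values within a sample. The function names and the `offset` parameter are hypothetical and do not come from the original pipeline.

```python
import math

def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows.

    Each sample (column) is forced onto a shared reference distribution:
    the mean of the sorted values across samples at each rank.
    Ties within a column are broken by row order (a simplification).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference value for each rank: average across all samples.
    reference = [sum(sc[i] for sc in sorted_cols) / n_cols for i in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized

def log2_transform(matrix, offset=0.0):
    """Apply log2 to every value; `offset` is a hypothetical guard for zeros."""
    return [[math.log2(value + offset) for value in row] for row in matrix]

# Toy intensities: 3 probes (rows) x 3 samples (columns).
raw = [[100, 380, 120],
       [200, 80, 420],
       [400, 260, 180]]
normalized = quantile_normalize(raw)
log_expr = log2_transform(normalized)
```

After normalization every column contains the same set of values, so differences in overall intensity between arrays no longer dominate the comparison.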

Sample Annotation

A phenodata table was created from the sample metadata to classify each sample by tissue type, time point, and experimental condition (injured or control). This ensured accurate grouping for downstream analysis.
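As a rough illustration of this annotation step, the sketch below builds a small phenodata table and groups samples by tissue, time point, and condition. It is written in Python rather than R, and the sample identifiers, field names, and example rows are hypothetical placeholders, not values taken from GSE30165.

```python
from collections import defaultdict

VALID_TISSUES = {"DRG", "SN"}
VALID_CONDITIONS = {"injured", "control"}

def build_phenodata(samples):
    """Turn (sample_id, tissue, time_point, condition) tuples into table rows."""
    table = []
    for sample_id, tissue, time_point, condition in samples:
        # Reject unexpected labels early so grouping errors cannot pass silently.
        if tissue not in VALID_TISSUES:
            raise ValueError(f"unknown tissue for {sample_id}: {tissue}")
        if condition not in VALID_CONDITIONS:
            raise ValueError(f"unknown condition for {sample_id}: {condition}")
        table.append({
            "sample_id": sample_id,
            "tissue": tissue,
            "time_point": time_point,
            "condition": condition,
        })
    return table

def group_samples(phenodata):
    """Map each (tissue, time_point, condition) group to its sample IDs."""
    groups = defaultdict(list)
    for row in phenodata:
        key = (row["tissue"], row["time_point"], row["condition"])
        groups[key].append(row["sample_id"])
    return dict(groups)

# Hypothetical example rows; real IDs and labels come from the GEO metadata.
phenodata = build_phenodata([
    ("sample_01", "DRG", "1 h", "injured"),
    ("sample_02", "DRG", "1 h", "injured"),
    ("sample_03", "SN", "1 d", "injured"),
    ("sample_04", "SN", "0 h", "control"),
])
groups = group_samples(phenodata)
```

Validating the labels while the table is built means a mistyped tissue or condition raises an error instead of creating a spurious extra group.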

Differential Expression Analysis

Differentially expressed genes (DEGs) were identified by fitting a linear model to each gene and applying empirical Bayes moderation to the resulting test statistics. The p-values were then corrected for multiple testing using the Benjamini–Hochberg method to control the false discovery rate (FDR).
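The Benjamini–Hochberg adjustment itself is simple enough to show directly. The following Python sketch is illustrative only and covers just the multiple testing step, not the linear modeling or empirical Bayes moderation; the function name and the example p-values are made up for demonstration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The p-value at ascending rank r (1-based) out of m tests is scaled to
    p * m / r, then a running minimum from the largest rank downward keeps
    the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Made-up p-values for four genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A gene is then called differentially expressed when its adjusted p-value falls below the chosen FDR threshold.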

Functional Enrichment Analysis

Gene Ontology (GO) biological process analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were conducted to identify biological processes and pathways significantly enriched among DEGs.
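The text does not state which statistical test or tool was used for enrichment. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below in Python under that assumption. The function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_degs, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    n_background: genes in the background set (e.g. all genes on the array)
    n_in_term:    background genes annotated to the GO term or KEGG pathway
    n_degs:       differentially expressed genes drawn from the background
    n_overlap:    DEGs that are annotated to the term
    """
    total = comb(n_background, n_degs)
    upper = min(n_in_term, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 10 background genes, 4 in the term, 3 DEGs, 2 of them in the term.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

Because one such test is run per term or pathway, the resulting p-values would normally also be adjusted for multiple testing before terms are reported as enriched.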

Visualization

Results were visualized using volcano plots, hierarchical clustering heatmaps, and pathway enrichment plots to highlight gene expression trends and functional associations.
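For the volcano plots, each gene is placed at its log₂ fold change on the x-axis and the negative log₁₀ of its adjusted p-value on the y-axis. The Python sketch below prepares those coordinates and a significance label for each gene. The cutoffs (`lfc_cutoff`, `fdr_cutoff`), the `floor` guard, and the gene names are illustrative assumptions, since no thresholds are stated here.

```python
import math

def volcano_points(results, lfc_cutoff=1.0, fdr_cutoff=0.05, floor=1e-300):
    """Convert (gene, log2_fc, adj_p) tuples into volcano plot points.

    Returns (gene, x, y, label) tuples, where x is the log2 fold change,
    y is -log10 of the adjusted p-value, and label is "up", "down", or "ns".
    `floor` avoids log10(0) for p-values reported as exactly zero.
    """
    points = []
    for gene, log2_fc, adj_p in results:
        significant = adj_p < fdr_cutoff
        if significant and log2_fc >= lfc_cutoff:
            label = "up"
        elif significant and log2_fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"
        y = -math.log10(max(adj_p, floor))
        points.append((gene, log2_fc, y, label))
    return points

# Made-up results for three placeholder genes.
points = volcano_points([
    ("gene_a", 2.0, 0.01),
    ("gene_b", -1.5, 0.001),
    ("gene_c", 0.3, 0.2),
])
```

The returned points can be passed to any plotting library, with the label used to color up-regulated, down-regulated, and non-significant genes.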