Hejie Cui

PhD Student in CS

Emory University


I am a second-year PhD student in Computer Science at Emory University, currently working with Dr. Carl Yang in the Emory Graph Mining Lab. I have also been working closely with Dr. Eugene Agichtein in the Emory Intelligent Information Access Lab (IRLab).

Before joining Emory, I received my bachelor’s degree in Software Engineering from Tongji University, where I worked with Dr. Lin Zhang.

My current research interests lie in graph data mining and structured information systems.


  • Graph Data Mining
  • Structured Information Systems
  • Information Retrieval


  • PhD in Computer Science, 2024

    Emory University

  • BEng in Software Engineering, 2019

    Tongji University







Research Intern


Dec 2018 – Jul 2019 Beijing, China

Pulmonary Vessel Segmentation based on Orthogonal Fused U-Net++ of Chest CT Images:

  • Worked as an intern algorithm engineer in the Intelligent Medical Group.
  • Developed a pulmonary vessel segmentation algorithm for chest CT images based on my modified network, an orthogonal fused U-Net++.
  • Published a patent based on my internship work and had one first-author paper accepted at MICCAI 2019 (International Conference on Medical Image Computing and Computer Assisted Intervention, a top-tier conference in medical imaging).

Mitacs Global Research Intern

Queen's University

Jul 2018 – Oct 2018 Kingston, Canada

Improve Center Line Tutor by Deep Learning:

  • Built an extension for classifying webcam video images using TensorFlow in 3D Slicer.
  • Used TensorFlow for real-time workflow detection to provide real-time feedback in central venous catheterization training.
  • Applied random distortions such as deforming, cropping, and brightening to the training inputs to improve the model, and analyzed the influence of each parameter to obtain the best retrained model.

Software Engineer Intern


Jul 2017 – Aug 2017 Shanghai, China
Helped develop an SAP ERP system and used the HANA database to process enterprise management data.


COVID-19 Search

  • Goal: given a topic related to COVID-19, produce a ranked list of documents from a collection of biomedical literature articles, ordered by decreasing likelihood that each document matches the information need (ongoing).

Mining of Potential Influencing Factors for COVID-19 Spread

  • Use a GAM model to check whether environmental factors influence the spread of the coronavirus.
  • Use an SIR model to check whether non-pharmaceutical interventions can help prevent the spread of the coronavirus.
  • Fit a Recurrent Neural Network (RNN) model to predict the next day's new confirmed cases from the government response and the current day's new confirmed cases.
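
For readers unfamiliar with the SIR model, the following is a minimal illustrative sketch, not the project's actual code. It assumes a closed population, forward Euler integration, and that a non-pharmaceutical intervention can be approximated by a lower transmission rate. The function name `simulate_sir` and all parameter values are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in the basic SIR model
    steps_per_day = int(round(1.0 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I flow
            new_recoveries = gamma * i * dt         # I -> R flow
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: an intervention is modeled as a halved beta.
baseline = simulate_sir(beta=0.4, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
intervention = simulate_sir(beta=0.2, gamma=0.1, s0=9990, i0=10, r0=0, days=160)

peak_baseline = max(i for _, i, _ in baseline)
peak_intervention = max(i for _, i, _ in intervention)
print(f"peak infected without intervention: {peak_baseline:.0f}")
print(f"peak infected with intervention:    {peak_intervention:.0f}")
```

Comparing the two peaks gives a rough sense of how a reduced transmission rate flattens the infection curve under these assumptions.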

Automatic Commit Messages Generation from Diffs

  • A hybrid method: a TF-IDF ranking method improved with a Seq2Seq model based on a pointer-generator network.
  • TF-IDF part: given a diff string, find the most similar diffs and take their commit messages as candidates.
  • Generative model part: use the pointer-generator network to predict the next word in the target sequence, then re-rank the top 10 matching results from the IR method using the probability matrix obtained from the Seq2Seq model.
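
The retrieval step can be sketched roughly as follows. This is an illustration only, not the project's code: it covers just the TF-IDF candidate lookup (the pointer-generator re-ranking is omitted), uses whitespace tokenization and a smoothed IDF as simplifying assumptions, and the helper names and toy diffs are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would likely be diff-aware.
    return text.lower().split()


def inverse_document_frequency(corpus_tokens):
    # Smoothed IDF so that terms present in every document keep a weight.
    n = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector as a dict; terms unseen in the corpus are dropped.
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_messages(query_diff, diffs, messages, top_k=1):
    """Return the top_k (message, score) pairs whose diffs best match query_diff."""
    corpus_tokens = [tokenize(d) for d in diffs]
    idf = inverse_document_frequency(corpus_tokens)
    corpus_vectors = [tfidf_vector(tokens, idf) for tokens in corpus_tokens]
    query_vector = tfidf_vector(tokenize(query_diff), idf)
    scored = [(msg, cosine(query_vector, vec)) for msg, vec in zip(messages, corpus_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy corpus of past diffs and their commit messages (made up for illustration).
diffs = [
    "- return a - b + return a + b",
    "+ import logging + logging.info('start')",
    "- timeout = 10 + timeout = 30",
]
messages = ["Fix addition bug", "Add logging", "Increase timeout"]

candidates = retrieve_messages("- timeout = 5 + timeout = 60", diffs, messages, top_k=2)
print(candidates[0][0])  # prints "Increase timeout"
```

In a hybrid setup like the one described above, the returned candidates would then be passed to the generative model for re-ranking.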

Detection and Distance Measurement of Speed Bumps

  • Developed an integrated speed bump detection and distance measurement system for unmanned sweeper vehicles in Python.
  • Used a YOLOv3 network to detect speed bumps and obtain the position and size of each bounding box.
  • Detected speed bumps in video and output the distance between the detected bump and the bracket in real time; optimized the model by redefining the distance calculation.

Anomaly Detection Framework using Machine Learning Methods

  • Established a new framework for anomaly (CPU, Memory, IO) detection and stress testing, which can forecast potential failures and pressure spills based on performance data.
  • Collected randomly injected failure data and normal data using the Clear Water platform, and trained machine learning classifiers (SVM, Random Forest, NN, etc.) on the classic KDDCUP99 dataset and on data collected in a real environment.
  • Compared the precision, recall, and F1 score of different classifiers, and found that NN methods achieved the highest accuracy (0.997).
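
As a reminder of how the comparison metrics are defined, here is a small sketch for the binary case (anomaly = 1, normal = 0). The labels below are made-up toy data, not results from this project, and the function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # prints 0.75 0.75 0.75
```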

DCOL for Nonlinear Distance Calculation Applied to PCA and T-SNE

  • Proposed a new kernel dimension reduction method based on DCOL (Distance Based on Conditional Ordered List), which can reveal strong nonlinear dependencies in the data.
  • Used squares instead of absolute values and applied a transformation to the DCOL matrix to give the kernel a new property.
  • Compared the dimension reduction results of the new method with kernel PCA, PCA, and T-SNE, and found that information consistency increased when the new nonlinear distance was introduced.

Recent Posts

Pytorch Geometric Environment

Pitfalls of PyTorch, CUDA, and GCC version conflicts

Tmux and Screen Common Commands

Frequently used commands, multiple SSH sessions