Hejie Cui

PhD Student in CS

Emory University

Biography

Hejie is a first-year PhD student in the Computer Science Department of Emory University. Her research interests include information retrieval, machine learning, and data mining. She currently works with Dr. Eugene Agichtein. Before joining Emory, she received her bachelor's degree in Software Engineering from Tongji University. During her undergraduate studies she worked on AI in medical imaging and computer vision, focusing on lung vessel segmentation from CT images.

Interests

  • Information Retrieval
  • Machine Learning
  • Data Mining
  • AI in Medical Imaging

Education

  • PhD in Computer Science, 2024

    Emory University

  • BEng in Software Engineering, 2019

    Tongji University

Skills

Python

Git

Linux

Experience

Research Intern

SenseTime

Dec 2018 – Jul 2019 Beijing, China

Pulmonary Vessel Segmentation Based on Orthogonal Fused U-Net++ for Chest CT Images:

  • Worked as an intern algorithm engineer in the Intelligent Medical Group.
  • Developed a pulmonary vessel segmentation algorithm for chest CT images based on my modified network, an orthogonal fused U-Net++.
  • Published a patent on the internship work and had one first-author paper accepted at MICCAI 2019 (International Conference on Medical Image Computing and Computer Assisted Intervention, a top-tier conference in the medical imaging field).

Mitacs Global Research Intern

Queen's University

Jul 2018 – Oct 2018 Kingston, Canada

Improving Central Line Tutor with Deep Learning:

  • Built an extension for classifying webcam video images using TensorFlow in 3D Slicer.
  • Used TensorFlow for real-time workflow detection to provide real-time feedback in central venous catheterization training.
  • Applied random distortions such as deforming, cropping, and brightening to the training inputs to make the model more robust, and analyzed the influence of each parameter to select the best retrained model.

Software Engineer Intern

SAP

Jul 2017 – Aug 2017 Shanghai, China

Helped develop the SAP ERP system and used the HANA database to process enterprise management data.

Projects

COVID-19 Search

  • Goal: given a topic related to COVID-19, produce a ranked list of documents from a collection of biomedical literature articles, ordered by decreasing likelihood that each document matches the information need (ongoing).

Mining of Potential Influencing Factors for COVID-19 Spread

  • Used a generalized additive model (GAM) to test whether environmental factors influence the spread of the coronavirus.
  • Used an SIR model to test whether non-pharmaceutical interventions help to prevent the spread of the coronavirus.
  • Fit a Recurrent Neural Network (RNN) model to predict the next day's new confirmed cases from the government response and the current day's new confirmed cases.
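
As a rough illustration of the SIR comparison above, the sketch below integrates the standard SIR equations with a forward Euler step and compares the epidemic peak with and without an intervention. It is not the project's code: the function name simulate_sir, the parameter values, and the choice to model an intervention as a halved transmission rate beta are all assumptions made for this example.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate the SIR equations with forward Euler.

    S, I, R are population fractions; returns one (S, I, R) tuple per day,
    including day 0.
    """
    s, i, r = s0, i0, r0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt   # S -> I flow
            new_recoveries = gamma * i * dt      # I -> R flow
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: the intervention is assumed to halve beta.
baseline = simulate_sir(beta=0.30, gamma=0.10, s0=0.999, i0=0.001, r0=0.0, days=200)
with_npi = simulate_sir(beta=0.15, gamma=0.10, s0=0.999, i0=0.001, r0=0.0, days=200)

peak_baseline = max(i for _, i, _ in baseline)
peak_with_npi = max(i for _, i, _ in with_npi)
print(f"peak infected fraction: baseline={peak_baseline:.3f}, with NPI={peak_with_npi:.3f}")
```

In this toy setting a lower transmission rate produces a lower infection peak; a real analysis would fit beta and gamma to case data rather than fix them by hand.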

Automatic Commit Message Generation from Diffs

  • A hybrid method: a TF-IDF ranking method improved with a Seq2Seq model based on a pointer-generator network.
  • TF-IDF part: given a diff string, find the most similar diffs and take their commit messages as candidates.
  • Generative part: use the pointer-generator network to predict the next word in the target sequence, then re-rank the top 10 matches from the IR method using the probability matrix obtained from the Seq2Seq model.
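
The retrieval half of the method above can be sketched as follows: each known diff becomes a TF-IDF vector, and the commit messages of the diffs most similar to the query (by cosine similarity) are returned as candidates. This is a minimal sketch, not the project's implementation; the tokenizer, the smoothed IDF formula, the function names, and the three-item corpus are assumptions for illustration, and the Seq2Seq re-ranking step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic/underscore runs only.
    return re.findall(r"[A-Za-z_]+", text.lower())


def build_index(diffs):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF dict per diff."""
    docs = [Counter(tokenize(d)) for d in diffs]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query_diff, diffs, messages, top_k=10):
    """Return up to top_k (score, message) pairs, most similar diff first."""
    idf, vectors = build_index(diffs)
    counts = Counter(tokenize(query_diff))
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(cosine(query_vec, v), m) for v, m in zip(vectors, messages)]
    return sorted(scored, reverse=True)[:top_k]


# Made-up toy corpus of (diff, commit message) pairs.
diffs = [
    "- return a - b\n+ return a + b",
    "+ import logging\n+ logging.basicConfig()",
    "- timeout = 30\n+ timeout = 60",
]
messages = ["Fix sign error in add", "Add logging setup", "Increase timeout"]

candidates = rank_candidates("- timeout = 10\n+ timeout = 30", diffs, messages)
print(candidates[0][1])
```

In the hybrid method, the candidate list produced here would then be re-scored by the generative model instead of being used directly.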

Detection and Distance Measurement of Speed Bumps

  • Developed an integrated speed bump detection and distance measurement system in Python for unmanned sweeper vehicles.
  • Used a YOLOv3 network to detect speed bumps and obtain the position and size of each bounding box.
  • Detected speed bumps in video and output the distance between the detected bump and the bracket in real time; optimized the model by redefining the distance calculation.

Anomaly Detection Framework using Machine Learning Methods

  • Established a new framework for anomaly (CPU, memory, I/O) detection and stress testing, which can forecast potential failures and pressure spills based on performance data.
  • Collected randomly injected failure data and normal data using the Clear Water platform, then trained machine learning classifiers (SVM, Random Forest, NN, etc.) on the classic KDDCUP99 dataset and on data collected in a real environment.
  • Compared the precision, recall, and F1 score of the different classifiers and found that NN methods achieved the highest accuracy (0.997).
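
For reference, the metrics used in the comparison above can be computed from true and predicted labels as in the sketch below. The labels and the two "classifiers" are made-up toy data, not results from the project, and binary_metrics is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class treated as the anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = anomaly, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
pred_a = [1, 1, 0, 0, 0, 0, 1, 1]  # one missed anomaly, one false alarm
pred_b = [1, 0, 0, 0, 0, 0, 0, 1]  # no false alarms, two missed anomalies

metrics_a = binary_metrics(y_true, pred_a)
metrics_b = binary_metrics(y_true, pred_b)
for name, m in (("A", metrics_a), ("B", metrics_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Classifier B has perfect precision but lower recall, which is why F1 is reported alongside the other two when ranking models.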

DCOL for Nonlinear Distance Calculation Applied to PCA and t-SNE

  • Proposed a new kernel dimension reduction method based on DCOL (Distance Based on Conditional Ordered List), which can reveal strong nonlinear dependencies in the data.
  • Used squares instead of absolute values and transformed the DCOL matrix to give the kernel a new property.
  • Compared the dimension reduction results of the new method with kernel PCA, PCA, and t-SNE, and found that introducing the new nonlinear distance increased information consistency.

Contact

  • 404-661-8863
  • 201 Dowman Drive, Atlanta, GA 30033
  • Computer Science and Informatics Department, Office N401