Recent years have seen growing interest in Graph Contrastive Learning (GCL), which trains a Graph Neural Network (GNN) model to discriminate between similar and dissimilar pairs of nodes without human annotations. Most prior GCL work focuses on homogeneous graphs, and little attention has been paid to Heterogeneous Graphs (HGs), which involve different types of nodes and edges. Moreover, earlier studies reveal that explicitly using the structure information of the underlying graph is useful for learning representations. Conventional GCL methods, however, measure the likelihood of contrastive pairs from node representations alone, which may not align with true semantic similarities, and how to leverage structure information for GCL is not yet well understood. To address these challenges, this paper presents a novel method dubbed STructure-EnhaNced heterogeneous graph ContrastIve Learning, STENCIL for brevity. First, we generate multiple semantic views of an HG based on metapaths. Unlike most methods, which maximize the consistency among different views, we propose a novel multiview contrastive aggregation objective that adaptively learns information from each view. In addition, we advocate the explicit use of structure embedding, which enriches the model with local structural patterns of the underlying HG, so as to better mine true and hard negatives for GCL. Empirical studies on three real-world datasets show that our proposed method consistently outperforms existing state-of-the-art methods and even surpasses several supervised counterparts.
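
As a rough illustration of what a metapath-based semantic view can look like, the sketch below builds two views of a toy bibliographic HG. It is not the STENCIL implementation: the function name `metapath_neighbors`, the node identifiers, and the choice of the Paper-Author-Paper and Paper-Subject-Paper metapaths are all assumptions made for this example. It shows only the generic idea that two target nodes are linked in a view when a length-2 metapath connects them.

```python
from collections import defaultdict
from itertools import combinations


def metapath_neighbors(edges):
    """Return the symmetric view induced by a length-2 metapath.

    `edges` holds (target, intermediate) pairs, e.g. (paper, author).
    Two target nodes become neighbors when they share an intermediate
    node, which corresponds to a metapath such as Paper-Author-Paper.
    """
    # Group target nodes by the intermediate node they attach to.
    by_intermediate = defaultdict(set)
    for target, mid in edges:
        by_intermediate[mid].add(target)

    # Connect every pair of targets that share an intermediate node.
    view = defaultdict(set)
    for targets in by_intermediate.values():
        for u, v in combinations(sorted(targets), 2):
            view[u].add(v)
            view[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in view.items()}


# Hypothetical toy data: three papers, two authors, two subjects.
paper_author = [("p1", "a1"), ("p2", "a1"), ("p3", "a2")]
paper_subject = [("p1", "s1"), ("p3", "s1"), ("p2", "s2")]

pap_view = metapath_neighbors(paper_author)   # Paper-Author-Paper
psp_view = metapath_neighbors(paper_subject)  # Paper-Subject-Paper
```

In this toy graph the two views disagree about which papers are related (`p1` is linked to `p2` through a shared author but to `p3` through a shared subject), which is the kind of view-specific information a multiview objective would need to weigh.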