To execute shared-memory-based parallel programs efficiently, we introduce two compiler-assisted software cache schemes that are well suited to automatic optimization of remote communications. One scheme is a full user-level software cache (User-level Distributed Shared Memory: UDSM); the other is a page-based cache (Asymmetric Distributed Shared Memory: ADSM) that exploits the TLB/MMU only on read-access misses. Under these schemes we can apply several optimizing techniques, which exploit the capabilities of middle-grained or coarse-grained remote memory accesses, to reduce both the number and the volume of communications. We also introduce a high-speed user-level communication and synchronization scheme, ``Memory-Based Communication Facilities (MBCF)'', which provides these capabilities in a general-purpose system with off-the-shelf communication hardware. In this paper, we outline our approach and describe the UDSM and the ADSM, the MBCF, and the optimizing techniques for remote communications. Finally, we present experimental results on the effects of the proposed approach, obtained with our prototype optimizing compiler ``Remote Communication Optimizer (RCOP)'' and the MBCF on Fast Ethernet.