Dept. of Electrical and Computer Engineering
Data Storage Lab (DSL)
Center for Cybersecurity Innovation & Outreach (CyIO)
Center for Wireless, Communities and Innovation (WiCI)
Phone: (515) 294-6285
Email: mai AT iastate DOT edu
www.ece.iastate.edu/~mai
Data Storage Lab
Our research is motivated by problems in mission-critical systems that jeopardize data, e.g.:
- Crashes, Corruption, & Bugs across system layers ==> [SSD: TOCS'16/FAST'13 | OS: TOS'23 | FS: FAST'23, TOS'18/FAST'18 | DB: OSDI'14]
- Data Loss & Service Disruption in HPC systems/data centers at scale ==> [HotStorage'24, TOS'22/ICS'18, HotStorage'21, ATC'19]
- Data Provenance, Observability, & Scalability challenges ==> [TPDS'24/HPDC'22, TBench'21, ASPLOS'23, ARA Platform]
We build systems/tools to attack such problems and open source our research prototypes/datasets.
Selected Publications
- Revisiting Erasure Codes: A Configuration Perspective. [HotStorage'24]
- PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems. [TPDS'24]
- λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions. [ASPLOS'23]
- Understanding Persistent-Memory Related Issues in the Linux Kernel. [TOS'23]
- ConfD: Analyzing Configuration Dependencies of File Systems for Fun and Profit. [FAST'23]
- Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis. [IPDPS'23]
- FaultyRank: A Graph-based Parallel File System Checker. [IPDPS'23]
- Data Distribution for Heterogeneous Storage Systems. [TC'22]
- PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems. [HPDC'22]
- Understanding Configuration Dependencies of File Systems. [HotStorage'22] Best paper nominee!
- A Study of Failure Recovery and Logging of High-Performance Parallel File Systems. [TOS'22]
- Benchmarking for Observability: The Case of Diagnosing Storage Failures. [TBench'21] [BugBenchk]
- ARA: A Wireless Living Lab Vision for Smart and Connected Rural Communities. [WiNTECH'21] [ARA Platform]
- SentiLog: Anomaly Detection on Parallel File Systems via Log-based Sentiment Analysis. [HotStorage'21] Best paper nominee!
- A Study of Persistent Memory Bugs in the Linux Kernel. [SYSTOR'21]
- Lessons and Actions: What We Learned from 10K SSD-Related Storage System Failures. [USENIX ATC'19]
- A Performance Study of Lustre File System Checker: Bottlenecks and Potentials. [MSST'19]
- Towards Robust File System Checkers. [TOS'18] Fast-tracked!
- Data Storage Research Vision 2025. [NSF Visioning Workshop]
- Understanding SSD Reliability in Large-Scale Cloud Systems. [SC'18-PDSW]
- PFault: A General Framework for Analyzing the Reliability of High-Performance Parallel File Systems. [ICS'18]
- Towards Robust File System Checkers. [FAST'18] Best paper nominee!
- Understanding the Fault Resilience of File System Checkers. [HotStorage'17]
- Reliability Analysis of SSDs under Power Fault. [TOCS'16]
- Torturing Databases for Fun and Profit. [OSDI'14]
- GMRace: Detecting Data Races in GPU Programs via A Low-Overhead Scheme. [TPDS'14]
- Understanding the Robustness of SSDs under Power Fault. [FAST'13]
- 2ndStrike: Towards Manifesting Hidden Concurrency Typestate Bugs. [ASPLOS'11]
- GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs. [PPoPP'11]
Prospective Students
I'm always looking for self-motivated, intellectually strong, & reliable students who are curious about how computer systems work and are interested in improving the design, implementation, evaluation, and application of various computer systems. Please check out our recent publications & projects and let me know if anything interests you. If you have experience in building systems, that's great! Let's talk and see if we have mutual interests. If you come from a different background, that's OK, too. I'm more than happy to pass on my hands-on experience to you and help you grow and succeed, as long as you are hardworking, responsible, determined, and eager to become an expert in a challenging, high-impact area in the near future.
Teaching
- ISU CprE563 Advanced Data Storage Systems [Spring'20, Spring'21, Spring'22, Spring'23]
- ISU CprE308 Operating Systems [Fall'18, Fall'19, Fall'20, Fall'21, Spring'22, Fall'22, Spring'23]
- ISU CprE588 Embedded Computer Systems [Spring'19]
- NMSU CS479/579 Special Topics: Reliable Storage Systems [Fall'17]
- NMSU CS479/579 Special Topics: Modern Storage Systems: Flash, Cloud, & Beyond [Spring'16]
- NMSU CS574 Operating Systems II [Spring'17, Spring'18]
- NMSU CS474 Operating Systems I [Fall'15, Fall'16]
- NMSU CS573 Computer Architecture II [Fall'17]
- NMSU CS473 Computer Architecture I [Spring'18]
- NMSU CS491/521 Parallel Programming [Fall'16]
- OSU CSE4251 The UNIX Programming Environment [Fall'14, Spring'15]
Miscellaneous
- Why (Not) Do a PhD (in Computer Science/Engineering): [CRA | Professor@Purdue | Student@Harvard | The PhD Grind]
- [Why Iowa State: Facts & Rankings | Computer Engineering: 34th best in the nation (20th among public schools) | An award-winning campus in a top-ranked city]
- Resources: [Advice on Research and Writing | Advice Collection | Best free OS textbook: OSTEP | Measure, Then Build: Tips on the Process of Systems Research | Building Secure & Reliable Systems | Free ECE Textbook]
- Failures of real-world systems: [Amazon | Chameleon/OpenStack | Kyoto | Algolia | HPCC | OSC1 | OSC2] [Why Does the Cloud Stop Computing]
- Personal Gallery