boltdb/bolt

BoltDB.Open crashes when opening a partial file; an integrity check function is needed

xxr3376 opened this issue · 6 comments

I am writing a storage agent based on BoltDB. If the agent is killed during network recovery, it can never restart, because at startup it tries to open the partial file left behind and check its version.

I migrate a BoltDB database between two hosts with the code below. If the migration fails partway through, it may leave a partial db file on disk, and BoltDB crashes when it tries to open that file.

You can reproduce this by truncating a BoltDB file in the middle and then opening it. The file will have a valid magic header but incomplete content.

Various errors can occur depending on where the file is truncated, so pasting any single one here would not be useful. Is there a way to check the integrity of a file?

Sender:

func (b *boltDB) ExportTo(acceptType []string, meta MetaWriter, writer io.Writer) error {
    export := false
    for _, t := range acceptType {
        if t == boltType {
            export = true
        }
    }
    if !export {
        return errUnknownFormat
    }
    return b.db.View(func(tx *bolt.Tx) error {
        if meta != nil {
            if err := meta(boltType, tx.Size()); err != nil {
                return err
            }
        }
        _, err := tx.WriteTo(writer)
        return err
    })
}

Receiver:

func ImportBoltDB(filename string, contentType string, reader io.Reader) (KV, error) {
    if contentType != boltType {
        return nil, errUnknownFormat
    }
    file, err := os.OpenFile(filename, os.O_RDWR|os.O_CREATE, 0600)
    if err != nil {
        return nil, err
    }
    _, err = io.Copy(file, reader)
    file.Close()
    if err != nil {
        // XXX An incomplete file must be deleted to prevent a BoltDB crash.
        // This is best effort only: the partial file stays on disk if the process receives SIGKILL.
        os.Remove(filename)
        return nil, err
    }
    return NewBoltDB(filename)
}

BLAKE2b features tree-based (updatable/incremental cryptographic) hashes that were designed for checksumming entire filesystems, so you could use it to build a solution here. See

https://blake2.net/

and Go libs are available:

https://github.com/glycerine/blake2b-simd

https://github.com/dchest/blake2b

(update: specifically, see section 2.10 of https://blake2.net/blake2_20130129.pdf)

I will compare checksums to verify integrity during transmission.

I would still like to know: is there any way to avoid a SEGFAULT when opening a partial file?

Use defer and recover.

No, you can't recover from a SEGFAULT, no matter which language you use.

Don't be ridiculous. Only SIGKILL and SIGSTOP cannot be caught. recover works fine for segfaults:

package main

import "fmt"

type s struct {
    a int
}

func main() {

    var p *s

    defer func() {
        if caught := recover(); caught != nil {
            fmt.Printf("recovered from segfault")
        }
    }()

    p.a = 10
}

Sorry for saying that a SEGFAULT can't be handled in any language; we can definitely recover by handling the signal.

It is hard to recover from the SEGFAULT in the following code, though. Give it a try.

package main

import (
	"io/ioutil"
	"log"
	"math/rand"
	"os"

	"github.com/boltdb/bolt"
)

func main() {
	// Remove previous data
	os.Remove("/tmp/test1.db")
	os.Remove("/tmp/test2.db")

	b, err := bolt.Open("/tmp/test1.db", 0600, nil)
	if err != nil {
		log.Panic("can't open source db: ", err)
	}
	log.Println("Writing data.")
	err = b.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("haha"))
		if err != nil {
			return err
		}
		d := make([]byte, 128)

		for i := 0; i < 10000; i += 1 {
			n, err := rand.Read(d)
			if n != 128 {
				panic("bad len")
			}
			if err != nil {
				return err
			}
			err = b.Put(d, d)
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic("Inserting.")
	}
	err = b.Close()
	if err != nil {
		panic("can't close file")
	}

	log.Println("Testing")
	data, err := ioutil.ReadFile("/tmp/test1.db")
	if err != nil {
		log.Println(err)
		panic("can't read source db")
	}
	err = ioutil.WriteFile("/tmp/test2.db", data[:len(data)/2], 0600)
	if err != nil {
		panic("can't write truncated db")
	}
	testDB("/tmp/test2.db")
}

func testDB(fn string) {
	defer func() {
		if r := recover(); r != nil {
			log.Println("Recovered in testDB", r)
		}
	}()
	b, err := bolt.Open(fn, 0600, nil)
	if err != nil {
		return
	}
	_ = b
	return
}

(update):
I don't want to handle low-level signals in my main function; in-place recovery is really hard to do that way.
Your code works because the Go runtime detects the nil pointer dereference for you and turns it into a panic. If the fault comes from the Linux kernel (e.g. when accessing mmap'ed memory), I believe we can't simply recover by calling the recover function.
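
For faults at non-nil addresses, such as touching a memory-mapped page that lies beyond the end of a truncated file, the standard library offers `debug.SetPanicOnFault`, which asks the runtime to raise a panic instead of crashing the program. The sketch below shows the general pattern. `guard` is a hypothetical wrapper, and whether it catches the specific fault inside `bolt.Open` is an assumption that would need testing: the setting applies only to the current goroutine, so the fault has to happen on the goroutine that calls `guard`.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// guard (hypothetical helper) runs open with fault-to-panic conversion
// enabled for the current goroutine and turns any panic into an error.
func guard(open func() error) (err error) {
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old) // restore the previous setting
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("open failed: %v", r)
		}
	}()
	return open()
}

func main() {
	// Stand-in for opening a truncated database. A nil dereference is used
	// here only so that the example runs without a database file.
	err := guard(func() error {
		var p *struct{ a int }
		p.a = 10
		return nil
	})
	fmt.Println(err)
}
```

In the reproduction program above, the body of `testDB` could be passed to such a wrapper, so that a corrupt file surfaces as an ordinary error value.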