Splitting files with split

By default, split writes 1000 lines to each output file and names the files with the prefix x:
$ split accesslog
$ ls | head -n 5
accesslog
xaa
xab
xac
xad
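
The defaults are easy to confirm on a small generated file. The sketch below is only an illustration: sample.txt and the temporary directory are hypothetical stand-ins for accesslog, and seq is assumed to be available (it is not part of POSIX).

```shell
# Work in a throwaway directory so the generated x* files stay out of the way.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt is a hypothetical 2500-line input file.
seq 1 2500 > sample.txt

# No options: 1000 lines per piece, prefix "x", two-letter suffixes.
split sample.txt

pieces=$(ls x* | wc -l)        # number of pieces produced
first_lines=$(wc -l < xaa)     # a full piece
last_lines=$(wc -l < xac)      # the remainder
echo "pieces=$pieces first=$first_lines last=$last_lines"
```

With 2500 input lines, the last piece holds whatever is left after the full 1000-line pieces.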

Use numeric suffixes instead of letters (-d, or --numeric-suffixes):
$ split -d accesslog
$ ls | head -n 5
accesslog
x00
x01
x02
x03
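
Numeric suffixes combine with a suffix length and a custom prefix. A minimal sketch, assuming GNU coreutils split (-d may be missing in other implementations; -a is the standard suffix-length option); sample.txt and the prefix part_ are hypothetical names:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 2500-line input.
seq 1 2500 > sample.txt

# -d: numeric suffixes, -a 3: three-character suffixes, part_: output prefix.
split -d -a 3 -l 1000 sample.txt part_

ls part_*
```

Zero-padded numeric suffixes keep the pieces in the right order for a later glob such as part_*.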

Split into pieces with a fixed number of lines:
$ wc -l accesslog
733928 accesslog
$ split -l 300000 accesslog
$ ll
total 317640
-rw-r--r-- 1 jaseywang jaseywang 162625987 2012-06-02 20:45 accesslog
-rw-rw-r-- 1 jaseywang jaseywang  66399043 2012-06-02 20:47 xaa
-rw-rw-r-- 1 jaseywang jaseywang  66551898 2012-06-02 20:47 xab
-rw-rw-r-- 1 jaseywang jaseywang  29675046 2012-06-02 20:47 xac
$ wc -l xaa
300000 xaa
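
The number of pieces is the line count divided by the chunk size, rounded up (733928 / 300000 gives 3 above). A scaled-down sketch of that check, using a hypothetical sample.log in place of accesslog:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for accesslog, scaled down by 100.
seq 1 7339 > sample.log
chunk=3000

split -l "$chunk" sample.log

total=$(wc -l < sample.log)
# Ceiling division: how many pieces split should produce.
expected_pieces=$(( (total + chunk - 1) / chunk ))
actual_pieces=$(ls x* | wc -l)
# Lines across all pieces should add up to the original count.
pieces_total=$(cat x* | wc -l)
echo "expected=$expected_pieces actual=$actual_pieces lines=$pieces_total"
```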

Split into pieces of a fixed size:
$ split -b 20M accesslog
$ ll -h
total 311M
-rw-r--r-- 1 jaseywang jaseywang 156M 2012-06-02 20:45 accesslog
-rw-rw-r-- 1 jaseywang jaseywang  20M 2012-06-02 20:51 xaa
-rw-rw-r-- 1 jaseywang jaseywang  20M 2012-06-02 20:51 xab
-rw-rw-r-- 1 jaseywang jaseywang  20M 2012-06-02 20:51 xac
-rw-rw-r-- 1 jaseywang jaseywang  20M 2012-06-02 20:51 xad
-rw-rw-r-- 1 jaseywang jaseywang  20M 2012-06-02 20:51 xae
-rw-rw-r-- 1 jaseywang jaseywang  20M 2012-06-02 20:51 xaf
-rw-rw-r-- 1 jaseywang jaseywang  20M 2012-06-02 20:51 xag
-rw-rw-r-- 1 jaseywang jaseywang  16M 2012-06-02 20:51 xah
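
One thing to keep in mind with -b is that it counts bytes only, so a piece can end in the middle of a log line. GNU split also offers -C (--line-bytes), which caps the piece size but keeps lines whole. The sketch below compares the two on a hypothetical sample.log made of 10-byte lines; it assumes GNU coreutils, and the prefixes byte_ and line_ are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 100 lines of exactly 10 bytes each ("line 0001" plus a newline).
i=1
while [ "$i" -le 100 ]; do
    printf 'line %04d\n' "$i"
    i=$((i + 1))
done > sample.log

# -b 95: exactly 95 bytes per piece, regardless of line boundaries.
split -b 95 sample.log byte_
# -C 95: at most 95 bytes per piece, but only whole lines (GNU extension).
split -C 95 sample.log line_

# Command substitution strips a trailing newline, so an empty result
# means the piece ends on a line boundary.
byte_last=$(tail -c 1 byte_aa)
line_last=$(tail -c 1 line_aa)
[ -n "$byte_last" ] && echo "byte_aa ends mid-line"
[ -z "$line_last" ] && echo "line_aa ends on a line boundary"
```

For log files that will be processed piece by piece, -C or -l is usually the safer choice; -b is fine when the pieces are only going to be concatenated again.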

Change the output prefix:
$ split --bytes=50M accesslog data
$ ll -h
total 311M
-rw-r--r-- 1 jaseywang jaseywang 156M 2012-06-02 20:45 accesslog
-rw-rw-r-- 1 jaseywang jaseywang  50M 2012-06-02 20:52 dataaa
-rw-rw-r-- 1 jaseywang jaseywang  50M 2012-06-02 20:52 dataab
-rw-rw-r-- 1 jaseywang jaseywang  50M 2012-06-02 20:52 dataac
-rw-rw-r-- 1 jaseywang jaseywang 5.1M 2012-06-02 20:52 dataad

Merge the pieces back together:
$ cat data* > accesslog
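
When the original is still around, it can be safer to merge into a new file and compare the two before deleting anything. A minimal sketch with hypothetical file names (sample.log, restored.log); the shell expands data* in sorted order, which matches the order split wrote the pieces in:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input, split into 4096-byte pieces with the prefix "data".
seq 1 5000 > sample.log
split -b 4096 sample.log data

# Reassemble into a separate file instead of overwriting the original.
cat data* > restored.log

# cmp -s is silent and only sets the exit status.
if cmp -s sample.log restored.log; then
    status=identical
else
    status=different
fi
echo "$status"
```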

There is also a question about split on Stack Overflow: cut a file into two parts, with the requirement that the first file be larger than the second.

This can be done with head and tail:
$ wc -l accesslog
$ head -n 600000 accesslog > top
$ tail -n 133928 accesslog > bottom
$ ll
total 317636
-rw-r--r-- 1 jaseywang jaseywang 162625987 2012-06-02 20:45 accesslog
-rw-rw-r-- 1 jaseywang jaseywang  29675046 2012-06-02 21:00 bottom
-rw-rw-r-- 1 jaseywang jaseywang 132950941 2012-06-02 20:59 top
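
The tail count above (133928) is 733928 minus 600000, worked out by hand. A variant that avoids the subtraction is tail -n +N, which prints everything from line N onward. A scaled-down sketch, with sample.log as a hypothetical stand-in for accesslog:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for accesslog, scaled down by 100.
seq 1 7339 > sample.log

first=6000   # number of lines that go into the first (larger) file

head -n "$first" sample.log > top
# "+6001" means: start output at line 6001, i.e. everything after the first 6000.
tail -n "+$((first + 1))" sample.log > bottom

wc -l top bottom
```

Only the size of the first file has to be chosen; the second file gets the rest, however long the input is.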

split can do the same thing:
$ split -l 600000 accesslog
$ ll
total 317636
-rw-r--r-- 1 jaseywang jaseywang 162625987 2012-06-02 20:45 accesslog
-rw-rw-r-- 1 jaseywang jaseywang 132950941 2012-06-02 20:58 xaa
-rw-rw-r-- 1 jaseywang jaseywang  29675046 2012-06-02 20:58 xab

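Besides the stock split, there is also a small program of roughly 100 lines that is said to be faster than the official implementation; the difference is reportedly more noticeable when splitting by size.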