jim-emacs-fun-hylisp-keras: my brain never memorizes formulas; it only remembers the Lisp that is not in any book, and stores philosophy in every Lisp atom

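From hacking computational code to computational papers
Expressing a problem in functional Lisp makes the problem much clearer
Use the philosophy of Bruce Lee and Steve Jobs to derive and attract hack computing code: first you are a philosopher, and only then a Lisp programmer
An awakening on the sofa before an afternoon nap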

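A view on sound and images: the limit of feature reuse in sound, a calculus of meta-features, complex sounds and images!

Modeling time and space! You have a left-channel layer and a right-channel layer, and the fully connected layer sits inside the ear.

... So for stereo sound, how do we separate the left-channel layer from the right-channel layer? Feature capture needs at least two microphones, one in the left corner and one in the right... It is just like a 3D movie, where several cameras are merged into one video: a composition of several feature collectors!

Pulling out the left-channel layer on its own is like wearing only the left earphone: sometimes you cannot make sense of what you hear... So we have to recombine the layers to understand the left channel, which addresses the problem that the tensor of an extracted layer is unreadable to us. The computer, however, can read these counterintuitive, localized feature layers.

Deep learning is like a multi-layer information distiller for an elephant: tail features, trunk features, an information distiller for the blind men feeling the elephant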

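Machine learning self-talk and corrections on the morning commute

Looking at CNN probabilistic digit classification from the viewpoint of SVM handwritten digit recognition!

Probability is area: the area distribution of a 2, the area distribution of an 8...

Brute-force fitting learns the probabilities of the digits 0-9: brute-force feature fitting, expressed with the fluxion method of "weighted sum, then activation"! Geometry can solve it too, but calculus with backpropagation is easier!

A hyperplane in a high-dimensional space does the classification (SVM); multiple layers brute-force fit a classification boundary in polar coordinates!

Classification and regression are the core concepts of machine learning; deep learning uses multi-dimensional data to brute-force fit the separating hypersurface...

The simplest regression is least squares: a line that passes as close as possible to most of the points is a regression. (Logistic regression, despite its name, is normally used for classification.)

Classification is a line or hyperplane that best separates two kinds of points!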
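
The two ideas above (a line through the points, and a line that splits two kinds of points) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data, the helper names `fit_line` and `classify`, and the weights are hypothetical and do not come from the original notes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def classify(point, w, bias):
    """Report which side of the hyperplane w.x + bias = 0 a point falls on."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + bias
    return 1 if score > 0 else 0


# Hypothetical data lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Hypothetical separating line x + y - 3 = 0
above = classify((2, 2), (1, 1), -3)
below = classify((1, 1), (1, 1), -3)
```

Regression returns the parameters of the line itself, while classification only asks on which side of a line a point lies.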

Elisp-ification

hy2py3repl2

hy2py3repl2 () {
	# Read one Hy expression per line, translate it with hy2py3, then evaluate the resulting Python
	rlwrap sh -c 'while read -r line; do pycode=$(echo "$line" | hy2py3); echo "Translation: $pycode"; echo "Run:"; python3 -c "print($pycode)"; echo "------------" ; done'
}
#  ------------
#  (-> 1 (+ 2) (- 1) (/ 4))
#  Translation: (1 + 2 - 1) / 4
#  Run:
#  0.5
#  ------------
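
The `->` threading macro used in the session above feeds a value through a chain of calls, inserting it as the first argument of each. A rough Python equivalent, as a sketch only (`thread_first` is a hypothetical helper name, not part of Hy or of this repo):

```python
import operator
from functools import reduce


def thread_first(value, *steps):
    """Feed value through steps; each step is (func, *extra_args)
    and the running value is passed as the first argument."""
    return reduce(lambda acc, step: step[0](acc, *step[1:]), steps, value)


# Same computation as (-> 1 (+ 2) (- 1) (/ 4)), i.e. (1 + 2 - 1) / 4
result = thread_first(
    1,
    (operator.add, 2),
    (operator.sub, 1),
    (operator.truediv, 4),
)
```

Reading the steps top to bottom gives the order of evaluation, which is why the threaded form is easier to follow than the nested one.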

Hylisp functional programming list

import

(import
 [tensorflow :as tf]
 keras
 [keras [backend :as K]]
 [keras.models [Model load_model]]
 [numpy :as np]
 [keras [losses]]
 [keras.layers [Input LSTM GRU Dense Embedding Bidirectional BatchNormalization Lambda Activation]])

Input

(Input :shape (, None) :name "Decoder-Input")
;;=> <tf.Tensor 'Decoder-Input_2:0' shape=(?, ?) dtype=float32>
(Input :shape (, 11) :name "Encoder-Input")
;;=> <tf.Tensor 'Encoder-Input_1:0' shape=(?, 11) dtype=float32>

summary

((. decoder_model summary))

shape

(import [keras.datasets [mnist]])
(setv (, (, train_images train_labels) (, test_images test_labels)) ((. mnist load_data)))

(. train_images shape) ;;=> (60000, 28, 28)
(len train_labels) ;;=> 60000
train_labels ;;=> array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

(. test_images shape) ;;=> (10000, 28, 28)
(len test_labels) ;;=> 10000
test_labels ;;=> array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

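The neural network black box is not black, get_layer & layers: call every layer, model, mapping, matrix, or function as if it were a pure function

get_layer

;; Pull a layer out and use it as a model: the black-box mapping becomes a white box
(setv model_get_layer (fn [name] (-> model (.get_layer name))))

;; Add an x -> x^2 layer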
(model.add (Lambda (fn [x] (** x 2))))

(** (np.array [[[1 8] [3 5]] [[9 7] [6 4]]]) 2)
;; => array([[[ 1, 64],
;;         [ 9, 25]],
;;        [[81, 49],
;;         [36, 16]]])

(K.eval
 ((Lambda (fn [x] (** x 2)))
  (K.variable
   (np.array [[[1 8] [3 5]] [[9 7] [6 4]]]))))
;; before K.eval => <tf.Tensor 'lambda_8/pow:0' shape=(2, 2, 2) dtype=float32>
;; after K.eval =>
;; array([[[  1.,  64.],
;;        [  9.,  25.]],
;;       [[ 81.,  49.],
;;        [ 36.,  16.]]], dtype=float32)

layers

(import [keras.applications.vgg16 [VGG16]]
        [keras.models [Model]]
        [keras.preprocessing [image]]
        [keras.applications.vgg16 [preprocess_input]]
        [numpy :as np])

;; github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
(setv base_model (VGG16 :weights "imagenet" :include_top True))

;; Print all the layers first, so that a single layer can be pulled out with get_layer
(for [(, i layer) (enumerate base_model.layers)]
  (print i ": " layer.name ", " layer.input_shape ", " layer.output_shape))
;; 0 :  input_1 ,  (None, 224, 224, 3) ,  (None, 224, 224, 3)
;; 1 :  block1_conv1 ,  (None, 224, 224, 3) ,  (None, 224, 224, 64)
;; ...
;; 21 :  fc2 ,  (None, 4096) ,  (None, 4096)
;; 22 :  predictions ,  (None, 4096) ,  (None, 1000)

;;keras get weights of dense layer
(setv (, weights biases)
      (-> base_model (.get_layer "fc2") .get_weights))
;; weights=> weights.shape (4096, 4096)
;; biases=> biases.shape (4096,)
;; array([ 0.64710701,  0.48036072,  0.58551109, ...,  0.50245267,
;;         0.41782504,  0.66609925], dtype=float32)

;; Feature extraction: predict with the output of a single layer
(setv mmodel (Model :inputs base_model.input ;;<tf.Tensor 'input_2:0' shape=(?, 224, 224, 3) dtype=float32>
                    :outputs (-> base_model (.get_layer "block4_pool") (. output)) ;;<tf.Tensor 'block4_pool_1/MaxPool:0' shape=(?, 14, 14, 512) dtype=float32>
                    ))
(setv features
      (->
       "cat.jpg"
       (image.load_img :target_size (, 224 224))
       (image.img_to_array)
       (np.expand_dims :axis 0)
       (preprocess_input)
       (mmodel.predict)))

(-> features first len) ;;=> 14
(-> features (. shape)) ;; => (1, 14, 14, 512)
(-> features first first (. shape)) ;;=> (14, 512)

A layer is essentially a function

(setv inputs (Input :shape (, 784)))
(->
 inputs
 ((Dense 32))
 ((Activation K.sigmoid))
 ((Dense 10))
 ((Activation K.softmax))
 ((fn [predictions]
    (Model :inputs inputs :outputs predictions)))
 (.compile :loss keras.losses.categorical_crossentropy :optimizer (keras.optimizers.Adam)))
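
To make "a layer is a function" concrete without Keras, the same pipeline shape (dense, sigmoid, dense, softmax) can be sketched with plain Python closures. The weights and the helper names `dense` and `compose` are hypothetical; real Keras layers also carry trainable state, which this sketch leaves out.

```python
import math
from functools import reduce


def dense(weights, biases):
    """Return a layer: a function mapping a vector x to weights . x + biases."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, biases)]
    return layer


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def softmax(x):
    peak = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [v / total for v in exps]


def compose(*layers):
    """Chain layers left to right, like the -> macro does."""
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)


# Hypothetical tiny network: 2 inputs -> 2 hidden units -> 3 classes
model = compose(
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    sigmoid,
    dense([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    softmax,
)
probs = model([2.0, 1.0])
```

Each stage can be called on its own, which is the same property that `get_layer` exposes in Keras.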

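Save layers step by step; graft and transfer layers and models

;; Save the layer weights, every intermediate numpy result, and the model after each run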

(np.save "max_emb_dim500_v2.npy" max_hs)

(seq2seq_Model.save "code_summary_seq2seq_model.h5")
(load_model "code_summary_seq2seq_model.h5")

(model.load_weights weights_path)

;; 
(np.save "a.npy" (np.random.randint :low 90 :high 100 :size 10)) ;; 160B  a.npy
(np.load "a.npy") ;;=> array([91, 92, 91, 95, 91, 90, 97, 91, 95, 94])

Dense softmax

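;; Densely connected (fully connected):
;; the last layer is a 14002-way softmax layer that returns an array of 14002 probabilities (summing to 1);
;; each probability is the probability that the current code vector belongs to one of the 14002 sentence-vector classes
;; keras.activations.softmax or K.softmax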
(Dense 14002 :activation keras.activations.softmax :name "Final-Output-Dense")
(fn [data]
  (setv (, dec_bn2 _) data)
  ((model_get_layer "Final-Output-Dense") dec_bn2))

compile

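;; sparse_categorical_crossentropy takes integer (sparse) class labels;
;; one-hot (categorically encoded) labels need categorical_crossentropy instead
(seq2seq_Model.compile :optimizer (keras.optimizers.Nadam :lr 0.00005)
                       :loss "sparse_categorical_crossentropy")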
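
The difference between the two losses is only the label format. A minimal sketch in plain Python for a single sample (the probabilities are made up, and these two functions are simplified stand-ins, not the Keras implementations):

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    """Loss for an integer class label."""
    return -math.log(probs[label])


# Hypothetical softmax output over three classes; the true class is 1
probs = [0.1, 0.7, 0.2]
loss_sparse = sparse_categorical_crossentropy(1, probs)
loss_one_hot = categorical_crossentropy([0, 1, 0], probs)
```

Both calls give the same number; the sparse form simply avoids building a 14002-wide one-hot vector for every target token.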

optimizer

(model.compile :optimizer (keras.optimizers.SGD) :loss keras.losses.categorical_crossentropy)

predict: the black-box mapping

(encoder_model.predict raw_tokenized)
(decoder_model.predict [state_value encoding])

fit: fitting the data

(seq2seq_Model.fit [encoder_input_data decoder_input_data]
                   (np.expand_dims decoder_target_data -1))

evaluate

(setv (, test_loss test_acc) (network.evaluate test_images test_labels))
;;=> test_acc: 0.9785

get_weights & set_weights & load_weights & save_weights

(->
 (tf.placeholder tf.float32 [None 10] :name "input_x")
 ;; OR: (K.placeholder [None 10] :name "input_x")
 ((fn [input_x]
    (setv dense1 (Dense 10 :activation K.relu))
    (dense1 input_x) ;;=> <tf.Tensor 'dense_2/Relu:0' shape=(?, 10) dtype=float32>
    dense1))
 (.get_weights)
 (first)
 len
 ) ;;=> 10

np.array tensors, 0D to 3D

(import [numpy :as np])
(-> 12 np.array (. ndim)) ;; 0D, ()
(-> [1 3 5 8] np.array (. ndim)) ;;=> 1D, (4,)
(-> [[1 3] [5 8]] np.array (. ndim)) ;;=> 2D, (2, 2)
(-> [[[1 2] [3 5]] [[9 7] [6 4]]] np.array (. ndim)) ;;=> 3D , (2, 2, 2)

Slicing tensors

(-> train_images (get 4) (. shape)) ;;=> (28, 28)
(-> train_images (get (slice 10 100)) (. shape)) ;;=> (90, 28, 28)
;; Select the 14x14 pixel region in the bottom-right corner of every image
;; (index with a tuple of slices; recent NumPy versions reject a list of slices)
(-> train_images (get (, (slice None) (slice 14 None) (slice 14 None))) (. shape)) ;;=> (60000, 14, 14)
;; Crop the central 14x14 pixel region of every image
(-> train_images (get (, (slice None) (slice 7 -7) (slice 7 -7))) (. shape)) ;;=> (60000, 14, 14)
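
The `slice` objects used above are the same objects Python builds for `a[start:stop]`. A small sketch on one hypothetical 28x28 "image" stored as nested lists (`crop` is an illustrative helper, not a NumPy function):

```python
# Hypothetical 28x28 image whose pixel value encodes its position: row * 28 + col
image = [[r * 28 + c for c in range(28)] for r in range(28)]


def crop(img, rows, cols):
    """Apply one slice to the rows and one to the columns."""
    return [row[cols] for row in img[rows]]


bottom_right = crop(image, slice(14, None), slice(14, None))
center = crop(image, slice(7, -7), slice(7, -7))
```

`slice(7, -7)` drops 7 pixels from each side, which is why a 28-pixel axis shrinks to 14.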

Tensor operations AND OR (like set operations)

Element-wise relu & add

(K.eval
 (K.relu (np.array [[[1 8] [3 5]] [[9 7] [6 4]]])
         :alpha 0.0 :max_value 5))
;;=> <tf.Tensor 'Relu:0' shape=(2, 2, 2) dtype=int64>
;;K.eval => array([[[1, 5],
;;        [3, 5]],
;;       [[5, 5],
;;        [5, 4]]])

(np.add (np.array [[[1 8] [3 5]] [[9 7] [6 4]]])
        (np.array [[[2 8] [5 2]] [[9 8] [6 4]]]))
;; array([[[ 3, 16],
;;         [ 8,  7]],
;;        [[18, 15],
;;         [12,  8]]])

maximum with broadcasting

(np.maximum (np.array [[[1 8] [3 5]] [[9 7] [6 4]]]) 0.0)
;;array([[[ 1.,  8.],
;;        [ 3.,  5.]],
;;       [[ 9.,  7.],
;;        [ 6.,  4.]]])

dot: dot product

(np.dot (np.array [[[1 8] [3 5]] [[9 7] [6 4]]])
        (np.array [[[2 8] [5 2]] [[9 8] [6 4]]]))
;; array([[[[ 42,  24],
;;          [ 57,  40]],
;;         [[ 31,  34],
;;          [ 57,  44]]],
;;        [[[ 53,  86],
;;          [123, 100]],
;;         [[ 32,  56],
;;          [ 78,  64]]]])
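
Why two (2, 2, 2) arrays give a (2, 2, 2, 2) result: for N-D inputs, `np.dot` sums over the last axis of the first array and the second-to-last axis of the second. A plain-Python sketch of that rule for the 3-D case (`dot_nd` is a hypothetical name used only here):

```python
def dot_nd(a, b):
    """np.dot rule for two 3-D nested lists:
    out[i][j][k][m] = sum over t of a[i][j][t] * b[k][t][m]."""
    return [[[[sum(a[i][j][t] * b[k][t][m] for t in range(len(b[k])))
               for m in range(len(b[k][0]))]
              for k in range(len(b))]
             for j in range(len(a[i]))]
            for i in range(len(a))]


a = [[[1, 8], [3, 5]], [[9, 7], [6, 4]]]
b = [[[2, 8], [5, 2]], [[9, 8], [6, 4]]]
out = dot_nd(a, b)
```

For example `out[0][0][0][0]` is `1*2 + 8*5 = 42`, the first entry of the array printed above.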

uniform

(np.random.uniform 0 1 (, 1 10 32 32))

reshape: reshaping tensors

(. (train_images.reshape (, 60000 (* 28 28))) shape) ;;=> (60000, 784)
(/ (train_images.astype "float32") 255)

(->
 [[0. 1.]
  [2. 3.]
  [4. 5.]]
 np.array
 (.reshape (, 2 3)))
;; array([[ 0.,  1.,  2.],
;;        [ 3.,  4.,  5.]])

(-> (np.zeros (, 300 200))
    np.transpose
    (. shape))
;; (200, 300)

empty

(np.empty (, 3 2)) ;; 2-D array: 3 rows, 2 columns (the values are uninitialized memory)
;;array([[  4.94065646e-324,   9.88131292e-324],
;;       [  1.48219694e-323,   1.97626258e-323],
;;       [  2.47032823e-323,   2.96439388e-323]])

randint

(np.random.randint :low 1 :high 100 :size 5) ;;=> array([90, 78, 17, 24, 81])

argmax: index of the maximum value along an axis

(np.argmax (model.predict im))

(setv a (-> (np.arange 6)
            (.reshape 2 3)))
;;array([[0, 1, 2],
;;       [3, 4, 5]])
(np.argmax a) ;;=>  5
(np.argmax a :axis 0) ;; per column => array([1, 1, 1])
(np.argmax a :axis 1) ;; per row => array([2, 2])
(-> (np.arange 6)
    ((fn [b]
       (setv (get b 1) 5)
       b)) ;;=> array([0, 5, 2, 3, 4, 5])
    np.argmax) ;;=> 1
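
What `:axis 0` and `:axis 1` mean can be spelled out on nested lists. This is a sketch in plain Python, where `argmax` is a local helper that mimics the 1-D behaviour of `np.argmax` (first maximum wins):

```python
def argmax(values):
    """Index of the first maximum, like np.argmax on a 1-D array."""
    return max(range(len(values)), key=lambda i: values[i])


a = [[0, 1, 2],
     [3, 4, 5]]

flat = argmax([v for row in a for v in row])   # no axis: flatten first
per_column = [argmax(col) for col in zip(*a)]  # axis 0: compare down each column
per_row = [argmax(row) for row in a]           # axis 1: compare along each row
tie = argmax([0, 5, 2, 3, 4, 5])               # ties resolve to the first index
```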

negative: negation

(np.negative [1. -1.]) ;;=> array([-1.,  1.])

expand_dims: insert a new dimension at position 'axis'; the existing dimensions from that position on are pushed to the right

(np.expand_dims im :axis 0)

(-> (np.array [1 2 3])
    ;;(np.expand_dims :axis 0) ;;=> array([[1, 2, 3]]), shape (1, 3)
    ;; axis 2 or 3 is out of range for a 1-D input: older NumPy silently treated it as axis 1,
    ;; NumPy >= 1.18 raises AxisError
    (np.expand_dims :axis 1) ;;=> array([[1], [2], [3]]), shape (3, 1)
    )

(-> (np.array [[1 2 3] [4 5 6]])
    ;;(np.expand_dims :axis 0) ;;=> array([[[1, 2, 3], [4, 5, 6]]]), shape (1, 2, 3)
    ;;(np.expand_dims :axis 1) ;;=> array([[[1, 2, 3]], [[4, 5, 6]]]), shape (2, 1, 3)
    ;; axis 3 or 4 is out of range for a 2-D input: older NumPy silently treated it as axis 2,
    ;; NumPy >= 1.18 raises AxisError
    (np.expand_dims :axis 2) ;;=> array([[[1], [2], [3]], [[4], [5], [6]]]), shape (2, 3, 1)
    )
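
The effect of `expand_dims` is easiest to see on the shape tuple alone. A sketch of the shape rule in plain Python (`expanded_shape` is a hypothetical helper; it follows the behaviour of recent NumPy, where an out-of-range axis is an error):

```python
def expanded_shape(shape, axis):
    """Shape after inserting a length-1 dimension at position axis."""
    ndim = len(shape)
    if not -ndim - 1 <= axis <= ndim:
        raise ValueError(f"axis {axis} is out of range for a {ndim}-D input")
    if axis < 0:
        axis += ndim + 1
    return shape[:axis] + (1,) + shape[axis:]


row_vector = expanded_shape((3,), 0)
column_vector = expanded_shape((3,), 1)
middle = expanded_shape((2, 3), 1)
last = expanded_shape((2, 3), -1)
```

This is the same move as `(np.expand_dims im :axis 0)`: a single image of shape (224, 224, 3) becomes a batch of one, (1, 224, 224, 3).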

squeeze

(np.squeeze out)

argsort: the indices that sort the array in ascending order

(-> [3 1 2]
    np.array
    np.argsort) ;;=> array([1, 2, 0])

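argpartition (every arg* function returns indices): find the position of the k-th smallest number, with the positions of the smaller numbers placed before it and the positions of the larger numbers after it (the order within each side is not guaranteed)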

(setv a (np.array [9 4 4 3 3 9 0 4 6 0]))

(-> a
    (np.argpartition 6)
    ;; or (np.argpartition :kth 6)
    ((fn [ids] (get a ids)))) ;;=> array([0, 0, 3, 3, 4, 4, 4, 9, 6, 9])

;;Top5
(-> a
    (np.argpartition -5)
    (get (slice -5 None))
    ((fn [ids] (get a ids)))) ;;=> array([4, 4, 9, 6, 9])
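
For comparison, the same top-5 selection can be done with the standard library's `heapq`, which also avoids a full sort. This is an alternative technique shown as a sketch, not what `np.argpartition` does internally:

```python
import heapq

a = [9, 4, 4, 3, 3, 9, 0, 4, 6, 0]

# The five largest values, returned in descending order
top5_values = heapq.nlargest(5, a)

# The positions of the five largest values (which of the tied 4s is picked is an implementation detail)
top5_indices = heapq.nlargest(5, range(len(a)), key=lambda i: a[i])
```

Unlike `argpartition`, `nlargest` returns its result sorted, so the values come back as 9, 9, 6, 4, 4 rather than in partition order.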

Joining arrays: extend & append & concatenate

(setv a (np.array [[1 2 3] [4 5 6]])
      b (np.array [[11 21 31] [7 8 9]]))
(np.concatenate (, a b) :axis 0)
;;array([[ 1,  2,  3],
;;       [ 4,  5,  6],
;;       [11, 21, 31],
;;       [ 7,  8,  9]])
(np.concatenate (, a b) :axis 1)
;;array([[ 1,  2,  3, 11, 21, 31],
;;       [ 4,  5,  6,  7,  8,  9]])
(np.concatenate (, a b) :axis 2) ;;IndexError: axis 2 out of bounds [0, 2)

(np.append a b) ;;=> array([ 1,  2,  3,  4,  5,  6, 11, 21, 31,  7,  8,  9])
(np.append a b :axis None) ;;=> array([ 1,  2,  3,  4,  5,  6, 11, 21, 31,  7,  8,  9])
(np.append a b :axis 1)
;;array([[ 1,  2,  3, 11, 21, 31],
;;       [ 4,  5,  6,  7,  8,  9]])

(setv list_a (list a)) ;;=> [array([1, 2, 3]), array([4, 5, 6])]
(.extend
 list_a
 (list b)) ;; extend returns None and mutates list_a in place
list_a ;;=> [array([1, 2, 3]), array([4, 5, 6]), array([11, 21, 31]), array([7, 8, 9])]

Cosine of the angle between vectors (move every formula and algorithm from every paper into this jim-emacs-fun)

(setv vector1 (np.array [1 2 3])
      vector2 (np.array [4 5 6]))

(np.divide
 (np.dot vector1 vector2)
 (np.multiply (np.linalg.norm vector1) (np.linalg.norm vector2)))
;;=> 0.97463184619707621
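
The formula behind the snippet above is cos(theta) = (u . v) / (|u| |v|). Written out in plain Python as a sketch (`cosine_similarity` is a local helper name, and the vectors are the same ones used above):

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of the two Euclidean norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


similarity = cosine_similarity([1, 2, 3], [4, 5, 6])  # 32 / sqrt(14 * 77)
orthogonal = cosine_similarity([1, 0], [0, 1])
```

A value near 1 means the vectors point in almost the same direction; perpendicular vectors give 0.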

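The biggest problem with distributed representations is connection: how to connect multiple layers and make full use of the simple computational strengths of each kind of cell

It is like trying to get the C-g data of a PDF: you cannot get it the way you would from a web page, yet a PDF can be written on with an Apple Pencil while a web page cannot
Logical imagination -> hypothesis -> proof at REPL speed: "In every systematic inquiry there is a first principle: a most basic proposition or assumption that cannot be omitted or deleted, and cannot be violated." (first principles)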

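Gradient-based optimization

;; W (weight, the trainable parameter: kernel) and b (the bias attribute) are tensors that are attributes of the layer
(K.relu (+ b (K.dot W input)))
;; 1. The weights W are updated from the objective (loss) function
;; 2. Run this layer on the training samples (forward pass) to get predictions, and compare them with the true values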

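Slope -> derivative -> gradient (the derivative of a tensor operation)

Slope -> derivative

;; Notation only, not runnable Hy: (setv (f x) y) reads as "f(x) = y"
(setv (f x) y)
(setv (f (+ x epsilon-x)) (+ y epsilon-y))
(setv epsilon-y (* a epsilon-x)) ;; near x, f is approximately linear with slope a =>
(setv (f (+ x epsilon-x)) (+ y (* a epsilon-x)))
(setv (f_derivative x) a)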
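
The relation f(x + epsilon_x) ≈ y + a * epsilon_x can be checked numerically: divide the change in f by a small epsilon and the slope a appears. A minimal sketch, assuming f(x) = x^2 as a hypothetical example function:

```python
def derivative(f, x, epsilon=1e-6):
    """Forward-difference estimate of the slope a in f(x + eps) ~ f(x) + a * eps."""
    return (f(x + epsilon) - f(x)) / epsilon


def square(x):
    return x ** 2


slope = derivative(square, 3.0)  # the exact derivative of x^2 at x = 3 is 6
```

The estimate is only approximate, since a finite epsilon leaves a small error; shrinking epsilon brings it closer to the true derivative until floating-point rounding takes over.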

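Gradient

;; *Express the abstract formula by instantiating it in code*
(setv y-pred (K.dot W x))
(setv loss-value (keras.losses.categorical_crossentropy y y-pred)) ;; argument order is (y_true, y_pred)
(setv loss-value (f W))
(setv W1 (- W0 (* step ((gradient f) W0))))
;; 1. W0 is the current value of W; W1 is the updated value
;; 2. The tensor ((gradient f) W0) is the derivative of (f W), i.e. of loss-value, at the point W0, and it has the same shape as W. It describes how steeply (f W) rises or falls around W0, and it is only a local approximation near W0
;; 3. W1 is W0 minus the step size times the gradient, not W0 minus the loss value: (- W0 (* step ((gradient f) W0)))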
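
The update rule W1 = W0 - step * gradient(f)(W0) can be run on a one-parameter toy problem. This sketch assumes a hypothetical loss f(w) = (w - 3)^2, whose gradient 2 * (w - 3) is known in closed form:

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss above."""
    return 2.0 * (w - 3.0)


step = 0.1
w0 = 0.0
w1 = w0 - step * gradient(w0)  # one update: moves from 0.0 towards 3.0

# Repeating the same rule walks w down to the minimum
w = w0
for _ in range(100):
    w = w - step * gradient(w)
```

After a single step the loss is already lower than at `w0`, and after 100 steps `w` is within rounding distance of 3.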

Stochastic gradient descent (SGD)
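
A minimal mini-batch SGD sketch in plain Python, under stated assumptions: the data is a hypothetical noiseless line y = 2x, the model has a single weight, and the function name `sgd`, the step size, and the batch size are illustrative choices rather than anything from the original notes.

```python
import random


def sgd(data, step=0.05, epochs=50, batch_size=2, seed=0):
    """Fit y = w * x by mini-batch stochastic gradient descent on squared error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # the "stochastic" part: a new random batch order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) over this batch only
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= step * grad
    return w


samples = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
w = sgd(samples)
```

Each update looks at one small batch instead of the whole data set, so every step is cheap and the batch order adds the randomness that gives the method its name.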

Examples: a side-by-side comparison

seq2seq model

(defn build_seq2seq_model [word_emb_dim
                           hidden_state_dim
                           encoder_seq_len
                           num_encoder_tokens
                           num_decoder_tokens]
  (setv
   ;; Encoder Model
   encoder_inputs (Input :shape (, encoder_seq_len) :name "Encoder-Input")
   seq2seq_encoder_out
   (->> encoder_inputs
        ((Embedding num_encoder_tokens word_emb_dim :name "Body-Word-Embedding" :mask_zero False))
        ((BatchNormalization :name "Encoder-Batchnorm-1"))
        ((GRU hidden_state_dim :return_state True :name "Encoder-Last-GRU" :dropout 0.5))
        ((fn [gru_state]
           (setv (, _ state_h) gru_state)
           ((Model :inputs encoder_inputs :outputs state_h :name "Encoder-Model") encoder_inputs))))
   ;; Decoder Model
   decoder_inputs (Input :shape (, None) :name "Decoder-Input")

   seq2seq_Model
   (->> decoder_inputs
        ;; Embedding is a higher-order function: calling it returns a layer, which is then applied to the input
        ((Embedding num_decoder_tokens word_emb_dim :name "Decoder-Word-Embedding" :mask_zero False))
        ((BatchNormalization :name "Decoder-Batchnorm-1"))
        ((fn [dec_bn]
           (setv (, decoder_gru_output _)
                 ((GRU hidden_state_dim :return_state True :return_sequences True :name "Decoder-GRU" :dropout 0.5)
                  dec_bn :initial_state seq2seq_encoder_out))
           decoder_gru_output))
        ((BatchNormalization :name "Decoder-Batchnorm-2"))
        ((Dense num_decoder_tokens :activation "softmax" :name "Final-Output-Dense"))
        ((fn [decoder_outputs]
           (Model [encoder_inputs decoder_inputs] decoder_outputs))))
   )
  seq2seq_Model
  )

extract model

(defn extract_decoder_model [model]
  ;; Reconstruct the input into the decoder
  (setv model_get_layer (fn [name] (-> model (.get_layer name)))
        decoder_inputs (. (model_get_layer "Decoder-Input") input)
        gru_inference_state_input
        (Input :shape (, (get (. (model_get_layer "Encoder-Model") output_shape) -1))
               :name "hidden_state_input"))
  (setv decoder_model
        (-> decoder_inputs
            ((model_get_layer "Decoder-Word-Embedding"))
            ((model_get_layer "Decoder-Batchnorm-1"))
            ((fn [dec_bn]
               ((model_get_layer "Decoder-GRU") [dec_bn gru_inference_state_input])))
            ((fn [data]
               (setv (, gru_out gru_state_out) data)
               (, ((model_get_layer "Decoder-Batchnorm-2") gru_out) gru_state_out)))
            ((fn [data]
               (setv (, dec_bn2 gru_state_out) data)
               (, ((model_get_layer "Final-Output-Dense") dec_bn2) gru_state_out)))
            ((fn [data]
               (setv (, dense_out gru_state_out) data)
               (Model [decoder_inputs gru_inference_state_input]
                      [dense_out gru_state_out])))))
  decoder_model
  )

Keras vs. PyTorch

# Keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPool2D())
model.add(Conv2D(16, (3, 3), activation='relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

# PyTorch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 16, 3)
        self.fc1 = nn.Linear(16 * 6 * 6, 10)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6)
        x = F.log_softmax(self.fc1(x), dim=-1)
        return x
model = Net()
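
The `16 * 6 * 6` in the PyTorch version is the flattened feature count that Keras works out on its own in `Flatten`. The arithmetic can be sketched in plain Python, assuming unpadded 3x3 convolutions with stride 1 and 2x2 pooling with stride 2, as in both models (`conv_out` and `pool_out` are hypothetical helper names):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output width of max pooling along one spatial axis."""
    return (size - kernel) // stride + 1


size = 32
size = pool_out(conv_out(size, 3))  # 32 -> 30 -> 15
size = pool_out(conv_out(size, 3))  # 15 -> 13 -> 6
flat_features = 16 * size * size    # 16 channels from the second convolution
```

This is the main practical difference between the two listings: PyTorch asks for the input size of `nn.Linear` explicitly, while Keras infers it from the preceding layer.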

chanshunli/jim-emacs-fun-hylisp-keras
