Chinese Handwriting Recognition - HSK1 v5 (ResNet CNN + Embedding Verification)

A ResNet-style CNN trained on HWDB1.0 to recognise 178 Chinese characters plus an Unknown class, with an embedding-template verification step for handwriting-practice apps.

What's new in v5

| Feature | v4 | v5 |
| --- | --- | --- |
| Mode | Classification only (which char is this?) | + Verification (is this a correct rendering of TARGET?) |
| Detects wrong char | Only via low confidence | Yes: explicit cosine similarity to target template |
| Detects missing strokes | No (classifier picks closest) | Yes: low sim to target template even if classifier picks target |
| New artifact | None | templates_v5.npz (mean embedding per Chinese class) |
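The card describes templates_v5.npz as one mean embedding per Chinese class but does not show how it was produced. The following is a minimal sketch of that idea, not the author's build script: `build_templates` is a hypothetical helper, and normalising each sample before averaging and re-normalising the mean are assumptions made so that a later dot product behaves as a cosine similarity.

```python
import numpy as np

def build_templates(embeddings, labels, eps=1e-8):
    """Return {label: unit-norm mean embedding} for each distinct label.

    embeddings: (N, D) float array, one row per training sample.
    labels:     length-N sequence of class labels (here, characters).
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    labels = np.asarray(labels)
    # Assumption: L2-normalise each sample so every sample weighs the same.
    unit = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    templates = {}
    for label in np.unique(labels):
        mean = unit[labels == label].mean(axis=0)
        # Assumption: re-normalise so dot(unit embedding, template) is a cosine.
        templates[str(label)] = mean / (np.linalg.norm(mean) + eps)
    return templates

# Toy data: two classes in a 3-dimensional embedding space.
emb = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
labs = ["一", "一", "二"]
toy_templates = build_templates(emb, labs)
```

With real data, the rows would be 512-dimensional embeddings of training images, and the result could be stored under the `chars` and `embeddings` keys that the quick-start code reads.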

What's new in v4

| Feature | v3 | v4 |
| --- | --- | --- |
| EMNIST polarity | Same as raw (bright strokes on dark bg) | Inverted to match HWDB (dark strokes on bright bg) |
| Brightness shortcut | Mean ratio 5.2x | Mean ratio <1.2x |
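The point of the v4 change is that a network can otherwise separate EMNIST from HWDB by overall brightness alone, without looking at strokes. A rough sketch of the inversion and of a brightness check follows. It assumes uint8 images and reads "mean ratio" as the ratio of the two datasets' mean pixel intensities, which the card does not define; both helper names are hypothetical, and the arrays are synthetic stand-ins rather than real HWDB or EMNIST samples.

```python
import numpy as np

def invert_polarity(images):
    """Flip uint8 images so bright-on-dark strokes become dark-on-bright."""
    return 255 - np.asarray(images, dtype=np.uint8)

def mean_brightness_ratio(a, b):
    """Ratio of the larger dataset mean intensity to the smaller (>= 1)."""
    mean_a, mean_b = float(np.mean(a)), float(np.mean(b))
    hi, lo = max(mean_a, mean_b), min(mean_a, mean_b)
    return hi / max(lo, 1e-8)

# Synthetic stand-in for HWDB: a dark stroke on a bright background.
hwdb_like = np.full((16, 40, 40), 230, dtype=np.uint8)
hwdb_like[:, 10:30, 18:22] = 20

# Synthetic stand-in for raw EMNIST: a bright stroke on a dark background.
emnist_like = np.full((16, 40, 40), 10, dtype=np.uint8)
emnist_like[:, 10:30, 18:22] = 240

before = mean_brightness_ratio(hwdb_like, emnist_like)
after = mean_brightness_ratio(hwdb_like, invert_polarity(emnist_like))
```

On this toy data `before` is large and `after` is close to 1, which is the same direction of change the table reports for the real datasets.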

Model details

| Item | Value |
| --- | --- |
| Input | 40×40 grayscale image |
| Classes | 179 (178 Chinese characters + Unknown) |
| Embedding dim | 512 |
| Templates | 178 (one per Chinese char) |
| Framework | Keras / TensorFlow |
| Confidence threshold (classify) | 0.3 |
| Similarity threshold (verify) | 0.9 |
| OOD training data | EMNIST Balanced (8% of training set, polarity-inverted) |
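The card gives a classification confidence threshold of 0.3 without showing how it is applied. One plausible reading, sketched here, is that the top softmax probability must reach the threshold or the prediction falls back to Unknown. `decide_class` and the class-name ordering are hypothetical; the real label order ships with the model and is not shown in this card.

```python
import numpy as np

UNKNOWN = "Unknown"

def decide_class(probs, class_names, conf_threshold=0.3):
    """Return (label, confidence); fall back to Unknown below the threshold."""
    probs = np.asarray(probs, dtype=np.float32)
    idx = int(np.argmax(probs))
    conf = float(probs[idx])
    if conf < conf_threshold:
        return UNKNOWN, conf
    return class_names[idx], conf

# Toy 4-class example; the real output vector would have 179 entries.
names = ["一", "二", "三", UNKNOWN]
print(decide_class([0.70, 0.10, 0.10, 0.10], names))  # confident prediction
print(decide_class([0.28, 0.26, 0.24, 0.22], names))  # below 0.3, so Unknown
```

In practice `probs` would be `model.predict(x, verbose=0)[0]` for a preprocessed 40×40 input.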

Quick start: verification mode

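import numpy as np
from tensorflow import keras

# Load the classifier and expose its penultimate layer as the embedding output.
model           = keras.models.load_model('chinese_hsk1_model_v5.keras')
embedding_model = keras.Model(model.input, model.layers[-2].output)

# Map each character to its template (mean embedding for that class).
tpl_npz   = np.load('templates_v5.npz')
templates = dict(zip(tpl_npz['chars'].tolist(), tpl_npz['embeddings']))

# Note: Model details lists 0.9 as the verify threshold; pass sim_threshold
# explicitly to override the 0.65 default used here.
def verify(img_gray, target_char, sim_threshold=0.65):
    """Return (status, cosine similarity) for a 40x40 grayscale image."""
    # Reject unsupported targets before running inference.
    if target_char not in templates:
        return 'invalid_target', 0.0
    # Scale pixels to [0, 1] and add batch and channel axes.
    x = img_gray.astype('float32') / 255.0
    x = x.reshape(1, 40, 40, 1)
    emb = embedding_model.predict(x, verbose=0)[0]
    # L2-normalise so the dot product is a cosine similarity
    # (this assumes the stored templates are unit-norm as well).
    emb = emb / (np.linalg.norm(emb) + 1e-8)
    sim = float(np.dot(emb, templates[target_char]))
    return ('correct' if sim >= sim_threshold else 'incomplete_or_unclear'), sim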
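
A practice app will probably want to combine the classifier output with the similarity score from verify, so that a confidently recognised different character is reported differently from a weak or incomplete attempt at the target. The sketch below is one possible way to do that and is not part of the released code: `practice_feedback` and the `wrong_character` label are hypothetical, and the default thresholds are simply the 0.3 from the model details and the 0.65 default of verify.

```python
def practice_feedback(predicted_char, confidence, similarity, target_char,
                      conf_threshold=0.3, sim_threshold=0.65):
    """Combine classifier and verification signals into one feedback label."""
    # Close enough to the target template: accept the attempt.
    if similarity >= sim_threshold:
        return "correct"
    # The classifier is confident the learner wrote some other character.
    if confidence >= conf_threshold and predicted_char != target_char:
        return "wrong_character"
    # Either the classifier is unsure, or it picked the target but the
    # embedding is far from the template (for example, missing strokes).
    return "incomplete_or_unclear"

# Example: the classifier picks the target, but similarity is low.
label = practice_feedback("一", 0.9, 0.4, "一")
```

The last branch corresponds to the "missing strokes" case in the v5 comparison: the classifier still picks the target, yet the similarity to its template stays below the threshold.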