PaddlePaddle 3.1.0 Release Note EN
PaddlePaddle framework version 3.1 further optimizes and polishes the core functionality of automatic parallelism, improving both usability and performance. It adds FP8 low-precision training support, which speeds up large-model training by 10-20%. It improves the hardware extension mechanism, reducing the cost of adapting CUDA-like hardware: users only need to register kernels. In addition, the basic capabilities of the framework have been strengthened to improve its stability. The key updates are as follows:
- Automatic Parallel Architecture: The automatic parallel architecture has been further refined to enhance the usability of its core mechanism and improve dynamic graph performance. Improvements to the core mechanism include new operator splitting derivation rules, support for splitting the same dimension of a distributed tensor across multiple mesh dimensions, and support for dynamic graph parallel strategies (PP, CP, SEP, TP-CONV), among others. At the same time, the performance of the dynamic graph automatic parallel system has been systematically optimized, achieving performance essentially on par with manual parallelism on models such as Llama2, Qwen, and Baichuan.
- Low-precision training: Supports low-precision training based on a blockwise FP8 GEMM operator, achieving accuracy comparable to BF16 while speeding up large-model training by 10-20%.
- Heterogeneous multi-chip adaptation: Provides a kernel reuse mechanism for CUDA-like hardware, where only kernel registration is required to use the corresponding CUDA kernels.
- Framework stability enhancement: Fixed operator computation errors for 0-size Tensors and extremely large tensors.
API enhancements, bug fixes, and improvements are aimed at enhancing user experience and API usability. The `paddle.randn_like` API has been added, multiple API functional defects have been fixed, and support for complex types and 0-size Tensors has been enhanced. Documentation and code have also been updated and optimized accordingly to improve overall accuracy and professionalism.
- Added the `paddle.randn_like` API (a short usage sketch follows this list). #72492
- Fixed the issue of inconsistent input and output types in the `tensordot` API. #72139
- Fixed the issue where the output of the `atleast` API was a Tensor list. #73102
- Fixed the issue with the `nonzero` API. #72003
- Fixed the memory leak issue in `dualpipev`. #72070
- Fixed the overflow issue in `softmax` calculation. #71935
- Fixed the shape checking issue in `take_along_axis` when `broadcast=False`. #72436
- Fixed the incorrect handling of NaN input in the `maximum` and `minimum` functions. #71933
- Fixed the issue with `visit_type`. #72782
- Fixed the int32 out-of-bounds issue in `gather_scatter_functor`. #72905
- Fixed the inplace implementation of `Bernoulli`. #73271
- Fixed issues with `moe_permute` and `moe_unpermute`. #73365
- Fixed the syntax checking issue of `ast.parse` for .pyi files. #71872
- Fixed the issue of complex division. #73331
- Fixed issues related to TensorRT integration. #72302, #72278
- Enhanced API functionality and usability to improve the user experience, including but not limited to expanding the data types supported by APIs, adding API parameter checks, correcting default values of API parameters, and refining API return values. #71997, #72911, #72985, #73240, #72927, #73451, #73416, #73420, #73347, #73050, #73246, #73123, #73336, #73062, #72201, #72190
- Enhanced API support for complex types. #72279, #72308, #72518, #72391, #72239, #72286, #72169, #72577, #72619
- Enhanced API support for 0-Size Tensor. #72570, #72692, #72138, #72410, #72565, #72262
- Corrected spelling errors in the API code to enhance overall accuracy and professionalism. #71780, #71786, #72093, #72113, #72241, #72237, #72590, #72591, #72769, #72858, #73045, #72195, #72627, #72657, #73162, #73402, #72208, #72659, #72658, #72660, #72661, #72656
- Optimized communication to reduce peak memory usage. #72035
- Updates to code style check rules. #72896, #73179, #73060, #72553, #72915, #72916, #73338, #72935, #72325, #72935
- Code variable naming updates and code migration. #73048, #73148, #73149, #73264, #73159, #73124, #73160, #73161, #73374, #73395, #73076, #73163, #73255
- LodTensor is being phased out. #71968, #72152, #72145
- Cleaned up useless code. #71795, #71792, #71794, #71793, #72265, #73167, #73115, #73049, #72162, #72321, #72336, #72952, #72828
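As noted in the first item of the list above, `paddle.randn_like` is newly added in this release. The snippet below is a minimal usage sketch; the default-dtype behavior noted in the comments is an assumption based on the sibling `*_like` creation APIs rather than a documented guarantee.

```python
# Minimal sketch of the newly added paddle.randn_like API: it returns a tensor
# of standard-normal random values with the same shape as the input.
import paddle

x = paddle.ones([2, 3], dtype="float32")
y = paddle.randn_like(x)   # same shape as x, values drawn from N(0, 1)
print(y.shape)             # [2, 3]
print(y.dtype)             # paddle.float32 (assumed: dtype follows the input by default)
```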
Supports FP8 matrix operations to improve model training efficiency, and strengthens multiple models to improve stability; also provides a C_ops-style interface for calling backward operators directly, facilitating memory optimization and functional experimentation.
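As a hedged illustration of the C_ops-style backward interface mentioned above, the sketch below drives a backward kernel by hand instead of going through autograd. The exact exposed name and signature of the grad entry (assumed here to be `paddle._C_ops.relu_grad(out, out_grad)`) are assumptions; only the calling pattern is the point.

```python
# Sketch: invoking a backward kernel directly through the C_ops-style interface,
# e.g. for memory-saving tricks such as manual recomputation.
import paddle

x = paddle.randn([4, 8])
x.stop_gradient = False
out = paddle.nn.functional.relu(x)

# Assumption: relu's backward is exposed as paddle._C_ops.relu_grad(out, out_grad);
# the actual name/signature may differ in your build.
out_grad = paddle.ones_like(out)
x_grad = paddle._C_ops.relu_grad(out, out_grad)
print(x_grad.shape)  # expected: [4, 8]
```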
- Support FP8 matrix multiplication acceleration to enhance computational performance and precision adaptability. #73092
- Support for 0-size Tensor execution (see the brief sketch after this list). #71829, #72263, #72244, #72814
- DeepEP support. #73495
- Enable CINN backend by default. #71838
- Support for SOT-related execution. #72472, #72559, #72466, #73269, #73329, #73405, #73399, #73424, #73509
- Support for converting dynamic to static. #73417, #73081
- Added support for kernels with stride mechanism. #73053
- Performance optimization and stability: Optimized training stability, enhanced support for Python 3.11+, improved automatic activation logic of the CINN compiler in dynamic graph mode, fixed issues with dynamic shape inference and gradient backpropagation, optimized GPU kernel execution efficiency (such as for_range and constant folding), improved NPU memory copy and context management, and enhanced large-scale model training performance and hardware utilization. #71777, #71837, #71834, #71950, #71960, #72103, #70652, #72313, #72405, #72581, #73418
- Large Tensor Support Extension: The extension operator supports extremely large-sized tensors, including mathematical operations (lerp/mean/bmm/trapezoid), tensor operations (arg_min_max/diag/prelu), padding, comparisons (allclose/isclose), and fusion operators (softmax_mask_fuse), addressing compatibility issues in mixed-precision training. #71916, #71970, #72516, #72517, #72638, #72652, #73046, #73093, #73136, #72679, #73174, #73198, #73121, #73096, #73261, #73201, #73291, #73373, #73318, #73436, #72705, #72276, #73135, #73304, #73381, #72712, #72717, #72634, #72562, #72628, #72706, #72831, #72888, #72753, #72931, #73021, #73064, #73069, #73153, #73118, #73252, #73253, #73262, #73259, #73288, #73105, #73275, #73284, #73110, #73335, #73342, #73447, #73460, #73194
- 0-Size Tensor issue fix: Fixed the calculation anomalies caused by 0-Size Tensor, covering pooling (max_pool1d/lp_pool1d), sorting (matrix_rank), statistics (std/nanmedian), and element-level operations (elementwise compare), ensuring numerical stability and API consistency under extreme input scenarios. #71961, #72017, #72785, #73214, #73263, #73267, #73280, #72444, #72437, #72460, #73090, #73516, #72807, #72799, #72800, #72809, #73497
- API Enhancements and Compatibility: Added support for Python standard library types (dataclasses), expanded API data type compatibility (creation of bfloat16 parameters, automatic inference of -1 dimension), fixed NumPy API interaction errors, and optimized BatchNorm memory layout. #72059, #72283, #72451, #72512, #72618, #72976, #73084, #73205, #73250, #73111, #73260, #72094, #71844, #71357
- Memory management and bug fixes: Address high-risk issues such as memory overflow (set_value/nonzero), null pointer (data nullptr), and CUDA graph allocation failure. Fix memory leaks and computational errors in core operations such as gradient clipping (clip_grad), tensor assignment (assign), and broadcasting (broadcast). Optimize NPU asynchronous execution and predictor GIL release logic to enhance system robustness. #71895, #72101, #72133, #72149, #72176, #72314, #72256, #72757, #72749, #72792, #72815, #72819, #72958, #73023, #73103, #73014, #73137, #73256, #73211, #73251, #73210, #73415, #73206, #71983, #72485, #72561
- Other important fixes: Fixed defects in scientific computation, save/load, and other modules, improved the Slice operator kernel configuration, optimized the fallback strategy for dynamic shape inference, and refined the exception throwing and type checking logic. #71810, #72246, #72378, #72467, #72635, #72751, #72044, #72051, #73231, #73109
- Fixed issues related to SOT. #71932, #71971, #72194, #72288, #72306, #72367, #72495, #72522, #72704, #72631, #72737, #73067, #73030, #73059, #73282, #73511, #73526, #73549, #73515
- Construction of the Paddle API 0-size mechanism. #72721, #72756, #72790, #72806, #72764, #72786, #72853, #72826, #72851, #72928, #72912, #72922, #72924, #72887, #72921, #72906, #72895, #72821, #72914, #72936, #72943, #72694, #72919, #72940, #72820, #72934, #72975, #72872, #72984, #72988, #72972, #72977, #72937, #73086, #73042, #73017, #73044, #73077, #73108, #73027, #72970, #73008, #72996, #73165, #73166, #73170, #73122, #73204, #73207, #73186, #73197, #73168, #73172, #73125, #73181, #73270, #73028, #73094, #73180, #73276, #73333, #73341, #73299, #73346, #73361, #73375, #73152, #73377, #73355, #73382, #73385, #73386, #73352, #73387, #73401, #73384, #73450, #73437, #73503, #73507, #73477, #73513, #73525, #73528, #73517, #72898, #72880, #72864, #72993, #72954, #72866, #72878, #72889, #72861, #72837
- SOT-related enhancements: Enhanced functionality (such as NumPy interoperability and `super` support), improved training stability, and fixed multiple issues to enhance code robustness. #71763, #71666, #71858, #71865, #72474, #72154, #72784, #72956, #73038, #73066, #73287, #73278, #73332, #73372, #73412, #73407, #73506
- Code style refactoring: Improved code quality and maintainability by refactoring code and unifying cross-platform kernel behaviors, and added a YAML-format pre-commit check tool. #72216, #72360, #72816, #72969, #73106, #72825, #73150, #73151, #73158, #73101, #73326, #72580, #72424
- Paddle CPU/GPU kernel accuracy issues were addressed across the team. #72879, #72894, #73012, #72973, #73018, #72965, #73128, #73229, #72992, #73344, #73274, #73295, #73293, #73317, #73320, #73454, #73492, #73535
- Slice issue fixes: Fixed issues related to slices, including indexing logic, performance optimization, etc. #72644, #72676, #72838, #72966, #73095, #72840, #73112, #73367, #73390, #73307, #73465, #73362, #72733, #72886
- Performance optimization: Improved overall performance by optimizing indexing logic. #72707, #73485
- Other significant improvements: including dynamic shape support, fixing meshgrid and adding unit tests, upgrading CUB to version 2.1.0, improving FP8 numerical processing, optimizing the CUDA graph shared pool mechanism, removing ShadowFeedOp to simplify data flow, enhancing version compatibility for PIR model saving/loading, fixing flip and reverse kernel issues, improving NaN propagation logic for paddle.angle, introducing an asynchronous GC check mechanism, optimizing the Scope lock-free interface for Dy2St, cleaning up unused third-party dependencies (absl), and further promoting the decoupling of PHI and Fluid to enhance the framework's stability, performance, and scalability. #72356, #72380, #72633, #72794, #72917, #72920, #72945, #72620, #73011, #73051, #73052, #73075, #73176, #73191, #73337, #73311, #73173, #73239, #73448, #73478, #73522, #73369
- SOT-related: Through improvements such as optimizing the Guard condition mechanism, enhancing dynamic shape processing capabilities, and adding no_grad support, execution efficiency has been enhanced, functional features have been expanded, and the code structure and performance have been optimized. #70362, #70154, #71748, #72004, #72159, #72174, #71994, #72250, #72285, #72322, #72272, #72417, #72438, #72462, #72463, #72503, #72501, #72521, #72509, #72544, #73469, #73471, #73555
- Code cleanup: Cleaned up Python 3.8 support declarations, and completed related code cleanup, dependency reduction, and syntax modernization updates to optimize code maintainability and compatibility. #71815, #72802, #72856, #72854, #72855, #72873, #72870, #72868, #72891
- Optimized CINN backend integration and dynamic shape processing logic, improved framework stability through code structure refactoring and test reinforcement, and added debugging log functionality to enhance maintainability. #71817, #71896, #71984, #72067, #72165, #72207, #72235, #72273, #72326, #72400, #72381, #72560, #72783, #73530
- Others: Added kernel support for FP16/BF16 data types in CPU sections, optimized error handling and tolerance configuration in test modules, etc. #71764, #71951, #72944
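For readers unfamiliar with the 0-size Tensor work referenced in the list above, the sketch below shows the kind of degenerate input these changes target; the specific ops are illustrative, not an exhaustive list of what was fixed.

```python
# Illustrative sketch of a 0-size Tensor: one dimension is 0, so the tensor holds
# no elements, yet ops are expected to return correctly shaped results.
import paddle

x = paddle.zeros([0, 4])                       # 0-size tensor with shape [0, 4]
y = paddle.nn.functional.relu(x)               # elementwise op on an empty input
z = paddle.concat([x, paddle.ones([2, 4])])    # concat with an empty input
print(y.shape, z.shape)                        # [0, 4] [2, 4]
```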
Optimize compiler performance and enhance stability
- Support automatic conversion and optimization of Layout in training scenarios. #71891
- Kernel compilation optimizations for operators such as argmin, argmax, and arange have been added to the backend. #71956, #72598
- Support for fused optimization of matrix multiplication. #72846
- Optimize the computation performance of some operators, specifically the Kernel. #72871
Fix some processing logic bugs in various scenarios. #71813, #71886, #71927, #71915, #71946, #71949, #71955, #71942, #71939, #71973, #72001, #72020, #72014, #72021, #72027, #72061, #72025, #72095, #72108, #72132, #71985, #72106, #72140, #72167, #72037, #72178, #72143, #72175, #72191, #72213, #72189, #72214, #72166, #72180, #72284, #72267, #72348, #72332, #72307, #72353, #72204, #72457, #72426, #72536, #72541, #72365, #72621, #72630, #72669, #72682, #72732, #72811, #72941, #72795, #73536
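The compiler items above apply when a model is run through the CINN backend. The sketch below shows one way to opt in explicitly via dynamic-to-static conversion; the `backend="CINN"` and `full_graph=True` arguments to `paddle.jit.to_static` are assumptions that may differ by build (CINN also requires a build with the compiler enabled).

```python
# Hedged sketch: running a small function through dynamic-to-static conversion
# with the CINN compiler backend enabled.
import paddle

def fn(x):
    # A tiny computation the compiler can fuse and optimize.
    return paddle.nn.functional.relu(x) * 2.0 + 1.0

# Assumption: backend="CINN" (and full_graph=True) are accepted by to_static here.
static_fn = paddle.jit.to_static(fn, backend="CINN", full_graph=True)
out = static_fn(paddle.randn([8, 16]))
print(out.shape)  # [8, 16]
```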
In version 3.1, we further refined the automatic parallel architecture to enhance the usability of automatic parallelism and the performance of dynamic graphs. Specifically, we improved the core mechanism of automatic parallelism, including adding new splitting derivation rules for multiple operators, supporting the splitting of the same dimension of distributed tensors by multiple mesh dimensions, and supporting dynamic graph parallel strategies (PP, CP, SEP, TP-CONV), etc. At the same time, we systematically optimized the performance of the automatic parallel system for dynamic graphs, achieving performance that is basically on par with manual parallelism on models such as Llama.
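To make the distributed-tensor notions above concrete, here is a minimal, hedged sketch using the dynamic-graph auto-parallel API (`paddle.distributed.ProcessMesh`, `shard_tensor`, and `Shard`/`Replicate` placements). It shows ordinary sharding on a 2D mesh; the exact placement spelling for splitting the same dimension across multiple mesh dimensions (the new capability mentioned below) is not shown and may differ.

```python
# Minimal auto-parallel sketch; intended to be launched on multiple devices, e.g.:
#   python -m paddle.distributed.launch --devices=0,1,2,3 demo.py
import paddle
import paddle.distributed as dist

# A 2D mesh: 2 "dp" processes x 2 "mp" processes.
mesh = dist.ProcessMesh([[0, 1], [2, 3]], dim_names=["dp", "mp"])

w = paddle.randn([1024, 1024])
# Shard dim 0 of w along the "dp" mesh axis and replicate it along "mp".
dist_w = dist.shard_tensor(w, mesh, [dist.Shard(0), dist.Replicate()])
print(dist_w.placements)
```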
- Support for distributed tensors where the same dimension is split across multiple mesh dimensions. #73233
- Support for converting automatic parallel communication topology descriptions (ProcessMesh) into manual parallel communication groups. #72052
- Support send/recv of any serializable Python object. #72098
- Completed the parallel strategies for dynamic graphs:
  - Support for pipeline parallelism strategies 1F1B and VPP scheduling. #72155, #72480, #72179
  - Support for parallel processing of long texts. #73195
  - Support automatic parallelism in communication along the data parallelism dimension. #72540
- Added splitting derivation rules for the following operators:
  - `min`, `min_grad` #72269
  - `bitwise_or`, `atan2`, `fmax`, `fmin`, `reciprocal` #72310
  - `argmin`, `abs`, `cosh` #72264
  - `mean_all`, `mean_all_grad` #72479
  - `topk`, `topk_grad` #72499
  - `argsort` #72388
  - `round`, `mish`, `elu`, `selu`, `celu`, `stanh`, `softplus`, `softshrink`, `thresholded_relu`, `logit`, `nonzero` #72312
  - `unique` ops #72824
  - `put_along_axis` #72766
  - `round_grad`, `trunc_grad`, `ceil_grad`, `floor_grad`, `poisson_grad` #72677
  - `log_softmax`, `cummax`, `cummin` #72720
  - `unary` #72177
  - `unary_grad` #72260
  - `index_select`, `index_select_grad` #72727
  - `roll`, `roll_grad` #72740
  - `empty_like` #73169
  - `roi_align`, `roi_align_grad` #72925
  - `expand_as`, `expand_as_grad` #73107
  - `fused_gemm_epilogue` #73126
  - `label_smooth`, `label_smooth` #72845
  - `group_norm`, `group_norm_grad` #72946
  - `instance_norm`, `instance_norm_grad` #72938
  - `batch_norm`, `sync_batch_norm` #72918
  - `reduce_any` #73175
  - `fused_gemm_epilogue_rule` #73494
- Support for the tensor_fusion optimization strategy and overlap optimization strategy with grouped parallel segmentation. #72551, #72902, #73142, #71785
- Optimize the reshard module to reduce communication overhead. #71969, #73024, #71868
- Optimize the slicing derivation rule for multiply to reduce communication overhead. #73408
- Optimize the reverse communication when the distributed partition status is set to Partial, to reduce communication overhead. #73236
- Communication fusion optimization during gradient update. #72120 and #72745
- Optimize the derivation of gelu slicing to reduce communication overhead. #73279
- Optimize the slicing derivation rule of fused_rms_norm when there is Partial status in the input, to reduce communication and computation overhead. #73054
- Fixed a communication hang bug in the virtual pipeline parallel strategy on H-series GPUs. #71104, #73470
- Fixed the bug in save/load. #72023
- Fixed the bug that the linear_fused_grad_add strategy did not work in dynamic graph mode. #72708
- Fixed issues where the fused_rms_norm operator failed to run, as well as precision bugs. #72663
- Fixed a bug in the splitting derivation rule for the expand operator. #73154
- Clean up dead code to facilitate code maintenance. #71814, #72538
- Added the local_map API, which allows distributed tensors to be passed to functions written for ordinary tensors. #71804
- Added checks for operator fused_linear_param_grad_add. #72483
- Gradient and automatic differentiation optimization: Initially supports dual gradient computation for put_along_axis and repeat_interleave operations, enhances the numerical stability of complex operators in automatic differentiation scenarios, and implements operator decomposition for masked_fill operations. #72789, #73056, #73225
- Operator mechanism extension: Added custom support for `__radd__` and `__rmul__`, enhancing the framework's ability to overload asymmetric (reflected) operators. #73119
- FP8 module support and operator development: Added support for FP8 block quantization GEMM, introduced multiple fused operators, and provided efficient operator-level implementation for mixed expert (MoE) models, enhancing training and inference performance. #73228, #73285, #73133, #73364, #73520, #73531
- Gradient and automatic differentiation stability improvement: Fixed errors in the gradient computation of some backward operators, enhancing numerical stability and functional correctness in automatic differentiation scenarios. #71716, #72299, #72358, #73037, #73140, #73185
- Numerical accuracy and overflow protection: Addresses issues such as numerical overflow, loss of precision, and large tensor overflow, ensuring the reliability of low-precision computations and large tensor operations. #72584, #72608, #72681, #72639, #73245, #73359, #72456
- Operator logic and framework alignment: Aligned operator computation logic, fixed issues such as abnormal operator inputs, and added checks to ensure the correctness of framework functionality. #72282, #71863, #72650, #72843, #73070, #73141, #73203, #73350, #73440, #73539, #73339
- CUDA kernel and hardware adaptation optimization: Supports NVIDIA SM90 architecture, fixes issues such as overflow, removes redundant CUDA error checks, and enhances GPU computing efficiency and adaptability to new hardware. #72507, #72849, #72959, #73130, #73489
- Added a fast division and modulo implementation for the int64_t version, improving computational performance and numerical stability in large-integer scenarios. #72530
- Optimized the strided tensor copy kernel to improve data copy efficiency under non-contiguous memory layouts. #72662
- Unified the usage of the quantization API in dynamic and static graph modes, simplifying the quantization model development process. #73100
- Optimize the decomposition performance of the Gelu operator to enhance computational efficiency. #72812
- Fluid operator normalization and retirement. #71789, #71818, #71808, #71860, #71806, #72011, #72043, #72034, #72047, #72056, #72087, #72086, #72083, #72079, #72078, #72076, #72057, #72077, #72096, #72085, #72092, #72110, #72127, #72111, #72126, #72135, #72112, #72131, #70358, #72125, #72171, #72160, #72188, #72197, [#7221
- The `acc_steps` of `sharding_overlap` is configurable. #72395
- Fixed the inplace issue of the `c_softmax_with_cross_entropy_grad` operator. #72366
- Performance optimization and acceleration: Enabled cuDNN support for deep convolution, enhancing convolution operation efficiency. Updated pooling operation strategies and optimized permute memory operations to reduce CUDA memory usage. Optimized printing speed, accelerating debugging and log output processes. #71796, #73442, #73563
- Feature Enhancements and Operational Support: Added the masked_fill operation and Boolean index optimization to enhance tensor masking processing capabilities. Implemented the index_elementwise operation to support index-based element-level operations. Added pooling and reshape execution strategies to enhance the flexibility of model operations. #72788, #72942
- Bug fixes and stability improvements: Fixed partial state support issues of fused_rms_norm in SPMD parallel mode. Corrected index errors in output dimension calculation and IndexGetStride during slice operations to ensure computational correctness. #72118, #72223, #73184, #73237, #73054
- Faster Guard adaptation: Reduced SOT end-to-end overhead. #71900, #71979, #72081, #72327, #72564, #72823
- Performance optimization and acceleration: Optimize operator scheduling strategy. Upgrade Flash Attention to version 3 to reduce computational overhead. Fix model performance bottlenecks and improve inference and training speed. #71937, #71828, #71461, #72039, #72228, #72225, #72623, #72666, #73147, #73393
- Parallel computing: Optimize the grid re-sharding strategy in automatic parallelism, achieve communication integration and optimization logic in the Sharding Stage, enhance the stability of distributed training, and reduce the communication overhead of distributed training. #71969, #72120, #73279, #73406
- Feature enhancements and fixes: Optimized operator indexing and kernel scheduling logic. #72625, #72741, #73082, #73501
- Model and operation support: Support for deep convolution in NHWC format, adapting to more hardware memory layouts. #72121
Optimize hardware mechanisms and provide a kernel reuse solution for CUDA-like hardware.
Based on the CustomDevice integration solution, this release introduces a low-cost support scheme for CUDA-like hardware backends. Such backends can be plugged into Paddle in a modular manner, allowing most CUDA kernels from the NVIDIA ecosystem to be reused in Paddle at low cost, and they can be decoupled from feature upgrades within the Paddle framework, significantly reducing the cost of hardware backend integration and iteration, increasing vendors' willingness to adopt Paddle, and fostering a positive collaborative ecosystem between Paddle and hardware manufacturers. #72604, #72668, #72758, #72865, #72910, #73033, #73145, #73281, #73079
Enhanced XPU basic capabilities: added kernels, expanded data types, and supplemented branches in the XPU environment. #71424, #71809, #71594, #71779, #71756, #71573, #71883, #71954, #71931, #72280, #72361, #72406, #72528, #72752, #72852, #72982, #73357, #73414, #73464, #73234, #71776
Extended data type support for DCU kernels. #73129
Fixed XPU execution issues. #71852, #71966, #72005, #71908, #72431, #72519, #72734, #72763, #72762, #72890, #72867, #73071, #73004, #72726, #73113, #73127, #73025, #73301, #73292, #73272, #73305, #73356, #73438, #72041, #72275, #72787, #73504, #73290
We have optimized the stability and cross-platform compatibility of the framework, fixed issues related to compilation and installation failures on different platforms, upgraded key dependencies such as CUDA, further optimized the CI/CD process, improved the build speed, and enhanced the overall stability of the system. We have also discontinued the maintenance of compilation and installation in the Python 3.8 environment.
- Fixed compilation errors when using clang17 to compile third-party libraries. #72524
- Fixed compilation issues when using CUDA 12.9. #72808, #72841, #72978, #73360
- Fixed compilation issues when using GCC 13.3. #73144
- Fixed compilation issues when WITH_PIP_CUDA_LIBRARIES=ON. #72907
- Fixed compilation issues when WITH_NVSHMEM=ON. #73368
- Avoid copying temporary files generated during the compilation of custom operators. #73196
- Warning message optimization. #72877
- Compilation, installation, maintenance, and upgrade. #71911, #73005
- Image maintenance and updates. #71065, #71821
- Import, export, and update of symbols on the Windows platform. #72497, #72498, #72500
- Windows platform supports CUDA 12.8. #72433
- CI maintenance and upgrade. #72443, #72836, #72563, #72653, #72477, #72778, #72960, #73289, #73422, #73514, #72748,
- Github Action CI construction. #71738, #70602, #71958, #71959, #71992, #72013, #72153, #72031, #72141, #72104, #72182, #72342, #72352, #72249, #72068, #72441, #72392, #72446, #72435, #72515, #72514, #72396, #72547, #72345, #72236, #72586, #72537, #72609, #72632, #72642, #72673, #72647, #72696, #72771, #72711, #72680, #72774, #72813, #72804, #72903, #72900, #72932, #72967, #72991, #72115, #73242, #72801, #73433, #73391, #73456, #73376, #73453, #73481, #73546, #73446, #72744
- Discontinue support for compilation in Python 3.8 environment. #72827
0x3878f, A-nnonymous, AndSonder, ApricityXX, aquagull, author, baoqiwen, BeingGod, blacksheep-Aristotle, BoShen5, bukejiyu, cangtianhuang, carryyu, chang-wenbin, changeyoung98, chen2016013, ckl117, co63oc, cqulilujia, crashbussy, cszdrg, Cutelemon6, cyy536, DanielSun11, danleifeng, datutu-L, deepllz, Dmovic, DrRyanHuang, dynamicheart, Eddie-Wang1120, eggman-1024, emmanuel-ferdman, Enigmatisms, enkilee, fangfangssj, feixi21, FeixLiu, ForFishes, Function-Samuel, ggggxm, GITD245, Glencsa, GoldenStain, gongshaotian, gouzil, gzy19990617, hanlintang, Hongqing-work, houj04, huangjiyi, hxzd5568, HydrogenSulfate, jzhang533, LCStayingdullCircuit, leon062112, lifulll, linkk08, LittleHeroZZZX, liufengwei0103, Liujie0926, liuruyan, lixinqi, LiYuRio, lizexu123, lizhenyun01, lj970926, lshpku, megemini, mikethegoblin, ming1753, mzj104, NKNaN, ooooo-create, pesionzhao, phlrain, pkuzyc, PolaKuma, Qin-sx, RichardWooSJTU, risemeup1, runzhech, RuohengMa, sasaya123, shanjiang7, SigureMo, sneaxiy, swgu98, SylarTiaNII, tianhaodongbd, tianshuo78520a, timminator, tizhou86, umiswing, waliwali777, wanghuancoder, Waynezee, Wennie396, xiaoguoguo626807, XieYunshen, Xing-lil, xkkkkkk23, Xreki, xuxinyi389, Yeenyeong, yongqiangma, YqGe585, yuanlehome, YuanRisheng, yulangz, yuwu46, zeroRains, zhangbo9674, zhanghonggeng, zhangting2020, ZhangX-21, zhangyk0314, zhangyuqin1998, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhupengyang, zrr1999, zty-king, zyfncg